The lower-division data science courses are organized into the following four sequences. Course descriptions (and in some cases course numbers) are being revised as the details of the program evolve.

- Introduction and overview (COGS 9)
- Data, Inference, Prediction, and Computation (DSC 10, 20, and 30)
- Data meets Theory (currently special editions of CSE 20 and 21; DSC 40A and 40B once the course revisions are completed)
- Practice and Application of Data Science (DSC 80)

**Introduction and Overview**

#### COGS 9: Introduction to Data Science

Instructor: Brad Voytek

Prerequisites: None

**Data, Inference, Prediction, and Computation**

The following three courses together provide the basic skills for working with data statistically and computationally.

#### DSC 10: Principles of Data Science

This introductory course develops computational thinking and tools necessary to answer questions that arise from large-scale datasets. This course emphasizes an end-to-end approach to data science, introducing programming techniques in Python that cover data processing, modeling, and analysis.

*Extended description*: First, how can data be extracted that describes real-world phenomena? This part of the course includes data collection, processing, and cleaning ("munging"), and dealing with formatted and semi-formatted data (e.g., JSON). Second, how can data be modeled and used to make predictions? This includes methods in regression and classification, and experimental design. Third, how can the results of this analysis be understood and reasoned about? This includes topics in visualization, and methods for hypothesis testing and validation. The course will involve hands-on analysis of a variety of real-world datasets, including economic data, document collections, geographical data, and social networks.

Instructors: Marina Langlois, Janine Tiefenbruck, Julian McAuley

Prerequisites: None

#### DSC 20: Programming and Basic Data Structures for Data Science

Provides an understanding of the structures that underlie the programs, algorithms, and languages used in data science by expanding the repertoire of computational concepts introduced in DSC 10 and exposing students to techniques of abstraction. The course is taught in Python and covers topics including recursion, higher-order functions, function composition, object-oriented programming, interpreters, classes, and simple data structures such as arrays, lists, and linked lists.

Instructors: Marina Langlois, Julian McAuley

Prerequisites: DSC 10

#### DSC 30: Data Structures and Algorithms for Data Science

Builds on topics covered in DSC 20 and provides practical experience in composing larger computational systems through several significant programming projects using Java. Students will study advanced programming techniques including encapsulation, abstract data types, interfaces, algorithms and complexity, and data structures such as stacks, queues, priority queues, heaps, linked lists, binary trees, binary search trees and hash tables.

Instructors: Marina Langlois, Julian McAuley

Prerequisites: DSC 20

**Data meets Theory**

The following two courses provide a theoretical foundation for data science at the lower-division level.

#### DSC 40A: Theoretical Foundations of Data Science I

While the course revision is being approved, a special edition of CSE 20 aimed at data science majors and minors serves as DSC 40A.

This course, the first of a two-course sequence (DSC 40A and DSC 40B), will introduce the theoretical foundations of data science. Students will become familiar with mathematical language for expressing data analysis problems and solution strategies, and will receive training in probabilistic reasoning, mathematical modeling of data, and algorithmic problem solving. DSC 40A will introduce fundamental topics in machine learning, statistics and linear algebra with applications to data analysis.

*Extended description*: DSC 40A and 40B connect to DSC 10, 20, and 30 by providing the theoretical foundation for the methods that underlie data science.

Instructors: Janine Tiefenbruck, Sanjoy Dasgupta and Mohan Paturi

Prerequisites: DSC 10, and (MATH 20C or MATH 31BH), and (MATH 18 or MATH 20F or MATH 31AH). Restricted to students within the DS25 major. All other students will be allowed as space permits.

#### DSC 40B: Theoretical Foundations of Data Science II

This course will introduce the theoretical foundations of data science. Students will become familiar with mathematical language for expressing data analysis problems and solution strategies, and will receive training in probabilistic reasoning, mathematical modeling of data, and algorithmic problem solving. DSC 40B introduces fundamental topics in combinatorics, graph theory, probability, and continuous and discrete algorithms with applications to data analysis.

Instructors: Janine Tiefenbruck, Sanjoy Dasgupta and Mohan Paturi

Prerequisites: DSC 40A. Restricted to students within the DS25 major. All other students will be allowed as space permits.

**Note:** DSC 40B was initially introduced as DSC 42. DSC 42 no longer exists; it has been renamed DSC 40B.

**Practice and Application of Data Science**

#### DSC 80: The Practice and Application of Data Science

The marriage of data, computation, and inferential thinking, or "data science", is redefining how people and organizations solve challenging problems and understand the world. This intermediate-level class bridges DSC 10, 20, and 30 and the upper-division data science courses, as well as methods courses in other fields. Students master the data science life cycle and learn many of the fundamental principles and techniques of data science, spanning algorithms, statistics, machine learning, visualization, and data systems.

*Extended description*: Compared to DSC 10, 20 and 30, this class adopts an end-to-end approach to data science, focused on building large-scale, working systems on real data, and putting knowledge from previous courses into practice. Skills and expertise developed in this course enable students to pursue careers in data science or apply it to research.

Instructors: Marina Langlois, Julian McAuley

Prerequisites: DSC 30, DSC 40A