Chris Hua

“Data” “Scientist”

Data Science for College Students - Courses


Giving advice has to be one of the hardest things somebody can do, and yet people ask for it at an alarming rate. Perhaps the best way to start, then, is with a reminder that life is hard and it’s okay to not know what you’re doing. Your peers will pretend that they have everything figured out, and they will say as much, but know that you’re not seeing their struggles.

The other caveat I want to set out up front is that advice is contextual. This advice is for undergrads attending a top US university; that’s not a prerequisite for doing data science (at all!) but it’s what I’m familiar with.

Okay, so data science. In general, optimize for learning.

At the undergraduate level, learning for data science usually means taking as many statistics courses as you can. Math is good, as is computer science, but they are not as important as people make them out to be. For the most part, you don’t need to derive things, so math beyond linear algebra is nice but not required. In addition, learning to write statistical code and data pipeline tasks is more about patience than theory. A minor in computer science is probably enough for you to be comfortable writing code and to have a general idea of programming paradigms.

Learning also means taking good classes, even if you expect a GPA hit. Tech companies don’t care that much about your GPA, and are in general accustomed to lower GPAs from engineering students. Prerequisites are also rarely hard requirements: if you think you can hack the course and are willing to learn what you don’t know as you go, try to take the course. Professors notice if you’re a sophomore in an upper-level course, and it’s a good way to stand out.

A lot of schools are starting “business analytics” or “data science” majors, but it’s hard to evaluate them as a group. The good implementations follow my advice above and focus on statistics courses with a few CS courses. A really good major would include a “capstone” project-based or research-based course. I think a lot of these dual majors, though, enable students to take easy classes: every major has easy electives, and when you allow courses from two majors to count, you double the easy electives you can take. This is a bad design from an organizational standpoint, because if you allow some students to graduate with nonsense courses, the major’s reputation will be diluted. Reputation matters especially while these majors are in their infancy.

I think an ideal curriculum would look something like:

Wherever your curriculum is lacking, it would be good to shore it up with self-study.

Did I forget something? Do you disagree? Let me know at hua.christopher+hl[@]