Welcome to Data Science II

Check out the course schedule for times, places, topics, and deadlines.


We will be using Python 3 (version 3.10 or later) for most assignments.


There are no required textbooks for the course. Course materials will be linked from the lecture schedule and archived in the GitHub repository; other recommended texts will be cited.


You are not expected to know any programming before taking this course, but you are expected to pick it up very quickly.


You are expected to be familiar with probability theory, statistics, linear algebra, and basic machine learning.

Course Details

This is the beating heart of course expectations, policies, and schedules. If something isn't clear, please ask; claiming you didn't know is not an acceptable defense.

Discord

Discord is our primary method of communication. The email you receive from me inviting you to the Discord server will likely be the last email I will proactively send you. While you are welcome to email me, I receive dozens--sometimes hundreds--of emails each day. Discord cuts through the noise, making it much more likely I will respond to questions on the server. Plus, in Discord, you have your student colleagues who can probably help even more quickly.


Grading

There will be five assignments. You are required to turn in three; any more are extra credit.

Everyone is required to sign up for and deliver at least one workshop. Any more is extra credit.

Choose one:

  • Midterm Exam (Thursday, Feb 27)
  • Final Project (proposal, updates, presentation)
Doing both is extra credit.

Assignments: 45%
Workshop: 15%
Midterm or Final Project: 40%

There is NO FINAL EXAM for this class.
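
As a rough illustration of how the weights combine, here is a minimal sketch. It assumes each component is scored from 0 to 100 and that you chose exactly one of the midterm or the final project. The function name `course_grade` is hypothetical, and extra credit is deliberately left out because this page does not say how it is applied.

```python
# Weights from the grading breakdown: assignments 45%, workshop 15%,
# and 40% for whichever of the midterm or final project you chose.
WEIGHTS = {"assignments": 0.45, "workshop": 0.15, "exam_or_project": 0.40}


def course_grade(assignments: float, workshop: float, exam_or_project: float) -> float:
    """Weighted course grade on a 0-100 scale (hypothetical helper).

    Assumes each argument is already a 0-100 score for that component.
    Extra credit is not modeled.
    """
    return (
        WEIGHTS["assignments"] * assignments
        + WEIGHTS["workshop"] * workshop
        + WEIGHTS["exam_or_project"] * exam_or_project
    )


# Example: 90 on assignments, 100 on the workshop, 80 on the midterm.
print(round(course_grade(90, 100, 80), 2))
```

Treat this as a back-of-the-envelope estimate only; the official calculation may differ.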


Assignments

There will be five assignments over the course. Each assignment will be released on a Thursday morning and will be due by 11:59:59pm on a Tuesday two and a half weeks later.

For every 24 hours an assignment is late, 25% will be deducted from the final grade.
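
To make the late-penalty arithmetic concrete, here is a minimal sketch. It rests on three assumptions not spelled out above: the deduction applies to that assignment's score, it is taken as 25 percentage points of a 0-100 score, and any started 24-hour period counts in full. The function names and the example deadline are hypothetical.

```python
import math
from datetime import datetime, timedelta

SECONDS_PER_DAY = 24 * 60 * 60
PENALTY_PER_PERIOD = 25.0  # percentage points per 24 hours late


def late_periods(due: datetime, submitted: datetime) -> int:
    """Count 24-hour periods late; a started period counts in full (assumption)."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / SECONDS_PER_DAY)


def penalized_score(raw_score: float, due: datetime, submitted: datetime) -> float:
    """Subtract 25 points per late period from a 0-100 score, floored at 0 (assumption)."""
    deduction = PENALTY_PER_PERIOD * late_periods(due, submitted)
    return max(0.0, raw_score - deduction)


# Hypothetical deadline: a Tuesday at 11:59:59pm.
due = datetime(2025, 1, 28, 23, 59, 59)
print(penalized_score(90.0, due, due - timedelta(hours=2)))   # on time: 90.0
print(penalized_score(90.0, due, due + timedelta(hours=1)))   # one period late: 65.0
print(penalized_score(90.0, due, due + timedelta(hours=49)))  # three periods late: 15.0
```

Under these assumptions, an assignment submitted more than three days late earns nothing, which is one more reason to ask for an extension ahead of time.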

Follow directions precisely in terms of code design; autograders will run your code! If the autograder fails, the cause may not be a bug in your code, but rather that you did not adhere to the input/output design guidelines of the assignment.

If you need a deadline extension on a homework, just ask! Just please ask before said deadline.


AutoLab

We are using AutoLab for assignment submission and autograding.

Take note: AutoLab is run from an internal UGA server, and can therefore only be accessed from campus, or via VPN. AutoLab can be found at this link:

autolab.cs.uga.edu


Workshops

Workshops are held most Wednesdays. Each student is required to organize and lead at least one workshop, most likely requiring workshop presentation teams of 2-3 students each week.

Workshops are meant to be an opportunity to dig into the nuts and bolts of topics we are covering in lecture by demonstrating working code and how these topics can be implemented in practice. This can take several forms--students are encouraged to be imaginative!

You can demo a new Python package, show a better way of doing something from lecture or homework, or even run a live-coding session on how to solve a certain problem (though it should go without saying that you need to practice the live coding ahead of time).


Midterm

The midterm exam will be Thursday, February 27. Expect a mix of multiple choice, short answer, and coding questions.

If you opt to do the Final Project instead of the midterm, you do not need to attend class on the day of the midterm exam.


Final Projects

You are encouraged to form teams of 2 or 3. Larger teams, or teams of a single individual, are not forbidden but are discouraged.

There are three components: the proposal, two periodic updates, and the presentation. The proposal is a 1-2 page roadmap of your project, detailing how you plan to do it, any contingencies and outcomes you can surmise, and your teammates. The updates entail two deadlines between the proposal and the presentation where you specify in 1 page what you have accomplished so far, any obstacles you have encountered, how you plan to handle them, and any deviations you anticipate from the plan spelled out in your proposal. The presentation is a 25-minute talk summarizing your problem, motivating your approach, and discussing your results.


Final Exam

There isn't one! Hooray!


Recommended Textbooks

The course has no required textbook. However, there are several recommended textbooks that this course will draw on over the semester. Should you want more information on a topic, these would be good places to start.

Title | Author(s) | Links

Elements of Statistical Learning | Trevor Hastie, Robert Tibshirani, and Jerome Friedman | [amazon] [pdf]
Often called the "machine learning bible," this book is the one-stop shop for all fundamentals of machine learning. It gets a bit dense at points, and other topics are sometimes only glossed over (e.g. spectral clustering), but it has breadth and depth in most of the basics. Best of all, the PDF is currently (as of 2025) freely available. Highly recommend.

Pattern Recognition and Machine Learning | Christopher Bishop | [amazon]
This is a slightly shorter, albeit no less detailed, treatment of all the basics of machine learning. However, whereas Hastie et al. would warn you before diving into particularly gruesome theory, Bishop assumes you're ok with all of it and so can feel a bit condescending at times. Still a great overview of all the basics. Highly recommend.

Probabilistic Machine Learning | Kevin Murphy | [github]
Murphy is one of the best textbook writers on machine learning, as far as couching everything in basic probability theory goes. This is a series of textbooks with a difficulty curve, but readers are rewarded for sticking with all of them. A big time commitment, but well worth it. Highly recommend.

Mathematics for Machine Learning | Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong | [github]
This is one of my favorite books on the fundamentals of machine learning. Not only is the book freely available as a PDF, but the authors have created numerous Jupyter notebooks for interactively following the material. It's less broad in scope than Hastie et al., but if you really want to learn the basics hands-on, this is it. Highly recommend.

Statistical Learning with Sparsity | Trevor Hastie, Robert Tibshirani, and Martin Wainwright | [amazon] [pdf]
A more specialized textbook on statistical sparsity. Highly relevant in today's large-scale machine learning platforms. PDF is freely available. Recommend.

Convex Optimization | Stephen Boyd and Lieven Vandenberghe | [amazon] [pdf]
A great read on convex optimization procedures. PDF is freely available. Recommend.

Computer Vision: Algorithms and Applications | Richard Szeliski | [amazon] [pdf]
Fantastic overview of all things Computer Vision. The 2020 edition was revised to include substantial content on deep neural networks, including GANs. PDF is freely available. Recommend.

Active Contours | Andrew Blake and Michael Isard | [amazon] [pdf]
An older reference for computer vision techniques cast in a signal processing context. The fundamentals are solid but may be a bit too mathy for many.

Probabilistic Graphical Models | Daphne Koller and Nir Friedman | [amazon]
A good reference on graphical models, but there's no free PDF.

Lecture Notes on Spectral Graph Methods | Michael W. Mahoney | [pdf]
Fantastic reference on spectral clustering, including history, derivation, and applications. Freely available PDF.

Statistical Analysis of Network Data | Eric D. Kolaczyk | [amazon]
Analysis techniques and strategies for network data.

Deep Learning | Ian Goodfellow, Yoshua Bengio, and Aaron Courville | [amazon] [html]
A decent reference on all things deep learning, but somewhat dated at this point. Still, content is freely available in HTML format (no PDFs, unfortunately).

Machine Learning | Tom Mitchell | [amazon] [html]
Also a good reference for the basics of machine learning, but dated. Still, PDF is freely available.

Here's the short version: don't copy code from the internets or from your friends. Coding is like writing: everyone has their own style, and it's easily recognizable. As such, forgery and plagiarism are also easily sniffed out. Plus, there are tools that do this for us now (this is, after all, a data science class).

If cheating is uncovered, I am obligated to report the incident, no questions asked. As in, I don't warn you beforehand or give you a chance to apologize; the first you'll hear that I think you've been dishonest is from UGA, not me.

The UGA Academic Honesty Policy is the final word on these matters. Lack of knowledge of these policies is not sufficient justification for violations. If in doubt, ask me.

Use of chatbots is not prohibited, but it is not encouraged either. I won't go into the irony of using an ML-based chatbot specifically not optimized for conceptual correctness to help you ace an ML course, but suffice it to say: like any other form of "presenting someone else's work as your own," it's pretty easy to notice. I'll notice. And while I won't mark down for using it (unless it's incorrect, of course), don't expect a glowing recommendation letter from me.