Welcome to Data Science II

Check out the course schedule for times, places, topics, and deadlines.


We will be using Python 3 (version 3.8 or later) for most assignments.


There are no required textbooks for the course. Course materials will be linked from the lecture schedule and archived in the GitHub repository; other recommended texts will be cited.


You are not expected to know any programming before taking this course, but you are expected to pick it up very quickly.


You are expected to be familiar with probability theory, statistics, and basic machine learning.

Course Details

This is the beating heart of course expectations, policies, and schedules. If something isn't clear, please ask; claiming you didn't know is not an acceptable defense.

Discord

Discord is our primary method of communication. The email you receive from me inviting you to the Discord server will likely be the last email I will proactively send you. While you are welcome to email me, I receive dozens--sometimes hundreds--of emails each day. Discord cuts through the noise, making it much more likely I will respond to questions on the server. Plus, in Discord, you have your student colleagues who can probably help even more quickly.


Grading

There will be five assignments. You are required to turn in three; any more are extra credit.

Everyone is required to sign up for and deliver at least one workshop. Any more is extra credit.

Choose one:

  • Midterm Exam (Tuesday, Oct 12)
  • Final Project (proposal, updates, presentation)
Doing both is extra credit.

Assignments: 45%
Workshop: 15%
Midterm or Final Project (whichever you choose): 40%
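
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored from 0 to 100 and that exactly one of the midterm or final project fills the 40% slot. The function name `course_grade` and its parameters are hypothetical, and extra credit is left out because how it is applied is not spelled out here.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"assignments": 0.45, "workshop": 0.15, "exam_or_project": 0.40}


def course_grade(assignments, workshop, exam_or_project):
    """Return the weighted course grade on a 0-100 scale.

    Assumption: each argument is a component score between 0 and 100.
    Extra credit is deliberately not modeled.
    """
    scores = {
        "assignments": assignments,
        "workshop": workshop,
        "exam_or_project": exam_or_project,
    }
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example: 90 on assignments, 100 on the workshop, 80 on the midterm or project.
print(round(course_grade(assignments=90, workshop=100, exam_or_project=80), 2))  # → 87.5
```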

There is NO FINAL EXAM for this class.


Assignments

There will be five assignments over the course. Each assignment will be released on a Tuesday morning and will be due by 11:59:59pm on a Thursday two and a half weeks later.

For every 24 hours an assignment is late, 25% will be deducted from the final grade.
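
To make the arithmetic concrete, here is a small Python sketch of one possible reading of the late policy. It rests on assumptions the policy does not state: that the deduction applies to the score of the late assignment, that any started 24-hour period counts as a full one, and that the total deduction is capped at 100. The function name `late_penalty` and the dates are hypothetical. If in doubt about how your submission will be treated, ask.

```python
import math
from datetime import datetime, timedelta

PENALTY_PER_PERIOD = 25.0  # percentage points per 24 hours, from the policy above


def late_penalty(due, submitted):
    """Return the percentage points deducted for a submission.

    Assumptions (not stated in the policy): any started 24-hour period
    counts as a full one, and the deduction is capped at 100.
    """
    if submitted <= due:
        return 0.0
    periods_late = math.ceil((submitted - due) / timedelta(hours=24))
    return min(100.0, PENALTY_PER_PERIOD * periods_late)


# Hypothetical deadline, with a submission that arrives 30 hours late.
due = datetime(2000, 1, 6, 23, 59, 59)
print(late_penalty(due, due + timedelta(hours=30)))  # → 50.0
```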

Follow each assignment's code design directions precisely; auto-graders will be used to run your code! If the auto-grader fails, the cause is not necessarily a bug in your code. It may be that you did not adhere to the input/output design guidelines of the assignment.


AutoLab

We are using AutoLab for assignment submission and autograding.

Take note: AutoLab is run from an internal UGA server, and can therefore only be accessed from campus, or via VPN. AutoLab can be found at this link:

autolab.cs.uga.edu

Workshops

Workshops are held most Mondays. Each student is required to organize and lead at least one workshop, which will most likely require presentation teams of 2-3 students each week.

Students taking CSCI 6360 are required to present two workshops.

Workshops are meant to be an opportunity to dig into the nuts and bolts of topics we are covering in lecture by demonstrating working code and how these topics can be implemented in practice. This can take several forms--students are encouraged to be imaginative!

You can demo a new Python package, show a better way of doing something from lecture, or even run a live-coding session on how to solve a certain problem (though it should go without saying that you need to practice the live coding ahead of time).

The current schedule can be found here.


Midterm

The midterm exam will be Tuesday, October 12. Expect a mix of multiple choice, short answer, and coding questions.

If you opt to do the Final Project instead of the midterm, you do not need to attend class on the day of the midterm exam.


Final Projects

You are encouraged to form teams of 2 or 3. Larger teams, or teams of a single individual, are not forbidden but are discouraged.

There are three components: the proposal, two periodic updates, and the presentation.

  • Proposal: a 1-2 page roadmap of your project, detailing how you plan to do it, any contingencies and outcomes you can surmise, and your teammates.
  • Updates: two deadlines between the proposal and the presentation. For each one, you describe in 1 page what you have accomplished so far, any obstacles you have encountered, how you plan to handle them, and any deviations you anticipate from the plan spelled out in your proposal.
  • Presentation: a 30-minute talk summarizing your problem, motivating your approach, and discussing your results.


Final Exam

There isn't one! Hooray!


Recommended Textbooks

The course has no required textbook. However, there are several recommended textbooks that this course will draw on over the semester. Should you want more information on a topic, these would be good places to start.

Title | Author(s) | Links
Elements of Statistical Learning | Trevor Hastie, Robert Tibshirani, and Jerome Friedman | [reviews] [amazon] [pdf]
Statistical Learning with Sparsity | Trevor Hastie, Robert Tibshirani, and Martin Wainwright | [amazon] [pdf]
Pattern Recognition and Machine Learning | Christopher Bishop | [reviews] [amazon]
Machine Learning: A Probabilistic Perspective | Kevin Murphy | [amazon]
Convex Optimization | Stephen Boyd and Lieven Vandenberghe | [amazon] [pdf]
Computer Vision: Algorithms and Applications | Richard Szeliski | [amazon] [pdf]
Active Contours | Andrew Blake and Michael Isard | [amazon] [pdf]
Probabilistic Graphical Models | Daphne Koller and Nir Friedman | [amazon]
Lecture Notes on Spectral Graph Methods | Michael W. Mahoney | [pdf]
Statistical Analysis of Network Data | Eric D. Kolaczyk | [amazon]
Deep Learning | Ian Goodfellow, Yoshua Bengio, and Aaron Courville | [amazon] [html]
Machine Learning | Tom Mitchell | [amazon] [html]

Academic Honesty

Here's the short version: don't copy code from the internets or from your friends. Coding is like writing: everyone has their own style, and it's easily recognizable. As such, forgery and plagiarism are also easily sniffed out. Plus, there are tools that do this for us now (this is, after all, a data science class).

If cheating is uncovered, I am obligated to report the incident, no questions asked. As in, I don't warn you beforehand or give you a chance to apologize; the first you'll hear that I think you've been dishonest is from UGA, not me.

The UGA Academic Honesty Policy is the final word on these matters. Lack of knowledge of these policies is not sufficient justification for violations. If in doubt, ask me.