Welcome to Data Science II

Check out the course schedule for times, places, topics, and deadlines.


We will be using Python 3 (version 3.6 or later) for most assignments.


There are no required textbooks for the course. Course materials will be linked from the lecture schedule and archived in the GitHub repository; other recommended texts will be cited.


You are not expected to know any programming before taking this course, but you are expected to pick it up very quickly.


You are expected to be familiar with probability theory, statistics, and basic machine learning.

Course Details

This is the beating heart of course expectations, policies, and schedules. If something isn't clear, please ask; claiming you didn't know is not an acceptable defense.

Slack

Slack is our primary method of communication. The email you receive from me to register on Slack will likely be the last email I proactively send you. While you are welcome to email me, I receive dozens (sometimes hundreds) of emails each day. Slack cuts through the noise, so I am much more likely to respond to questions posted there. Plus, on Slack you have your fellow students, who can often help even more quickly.

eds-uga-csci4360.slack.com


Grading

There will be five assignments, a required workshop for each student, a midterm exam, and a final project.

Assignments 45%
Workshop 10%
Midterm 20%
Final Project 25%

There is NO FINAL EXAM for this class.
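To illustrate how the weights above combine into a course grade, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale and that the five assignments have already been combined into a single assignment score; the function name `course_grade` and that combining step are hypothetical, not the course's official grading code.

```python
from typing import Dict

# Component weights as listed in the Grading section (they sum to 100%).
WEIGHTS = {
    "assignments": 0.45,
    "workshop": 0.10,
    "midterm": 0.20,
    "final_project": 0.25,
}

# Sanity check: the weights should cover the whole grade.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def course_grade(scores: Dict[str, float]) -> float:
    """Return the weighted average of per-component scores (0-100 scale).

    Assumption: each component is already a single score, e.g. the five
    assignment grades have been combined into one "assignments" score.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Multiply each component score by its weight and add them up.
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


if __name__ == "__main__":
    example = {"assignments": 90.0, "workshop": 100.0,
               "midterm": 80.0, "final_project": 88.0}
    print(course_grade(example))
```

With the example scores, the contributions are 40.5 + 10 + 16 + 22, for a course grade of 88.5.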


Assignments

There will be five assignments throughout the course. Each assignment will be released on a Tuesday morning and will be due by 11:59:59pm two weeks later.

For every 24 hours an assignment is late, 25% will be deducted from that assignment's grade.

Follow the code-design directions precisely: your code will be run by auto-graders! If the auto-grader fails, the cause may not be a bug in your code but rather that you did not adhere to the assignment's input/output design guidelines.


Workshops

Workshops are held most Mondays. Each student is required to organize and lead at least one workshop, which will most likely mean presentation teams of 2-3 students each week.

Workshops are an opportunity to dig into the nuts and bolts of the topics we cover in lecture by demonstrating working code and showing how these topics can be implemented in practice. This can take several forms; students are encouraged to be imaginative!

You can demo a new Python package, show a better way of doing something from lecture, or even run a live-coding session on how to solve a certain problem (though it should go without saying that you need to practice the live coding ahead of time).

The current schedule can be found here.


Midterm

The midterm exam will be Thursday, October 3. Expect a mix of multiple choice, short answer, and coding questions.


Final Projects

The final project is a collaboration between computational scientists (you) and domain scientists (humanities students). You'll put skills you've learned in this class to work analyzing datasets and asking questions far beyond your areas of expertise!

You are encouraged to form teams of 2 or 3. You'll also be teamed up with at least one student from another course! You will have to work together to bridge the "domain divide", using your respective areas of expertise to help the team as a whole.

There are three components: the proposal, the presentation, and the deliverable. The proposal is a 1-2 page roadmap of your project, detailing your planned approach, any contingencies and outcomes you can anticipate, and your teammates. The presentation is a 30-minute talk summarizing your problem, motivating your approach, and discussing your results. The deliverable has two parts: your code, and a 6-10 page NIPS-style paper on your project.


Final Exam

There isn't one! Hooray!


Recommended Textbooks

The course has no required textbook. However, there are several recommended textbooks that this course will draw on over the semester. Should you want more information on a topic, these would be good places to start.

Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman [reviews] [amazon] [pdf]
Statistical Learning with Sparsity, by Trevor Hastie, Robert Tibshirani, and Martin Wainwright [amazon] [pdf]
Pattern Recognition and Machine Learning, by Christopher Bishop [reviews] [amazon]
Machine Learning: A Probabilistic Perspective, by Kevin Murphy [amazon]
Convex Optimization, by Stephen Boyd and Lieven Vandenberghe [amazon] [pdf]
Computer Vision: Algorithms and Applications, by Richard Szeliski [amazon] [pdf]
Active Contours, by Andrew Blake and Michael Isard [amazon] [pdf]
Probabilistic Graphical Models, by Daphne Koller and Nir Friedman [amazon]
Lecture Notes on Spectral Graph Methods, by Michael W. Mahoney [pdf]
Statistical Analysis of Network Data, by Eric D. Kolaczyk [amazon]
Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville [amazon] [html]
Machine Learning, by Tom Mitchell [amazon] [html]

Academic Honesty

Here's the short version: don't copy code from the internet or from your friends. Coding is like writing: everyone has their own style, and it's easily recognizable. As such, forgery and plagiarism are also easily sniffed out. Plus, there are tools that do this for us now (this is, after all, a data science class).

If cheating is uncovered, I am obligated to report the incident, no questions asked. That is, I won't warn you beforehand or give you a chance to apologize; the first you'll hear that I think you've been dishonest will be from UGA, not me.

The UGA Academic Honesty Policy is the final word on these matters. Lack of knowledge of these policies is not sufficient justification for violations. If in doubt, ask me.