Welcome to Data Science II

Fall 2017 Rendition

All course announcements will be made in the Slack chat.

Lectures are Tuesdays and Thursdays from 11:00am - 12:15pm, and Wednesdays from 11:15am - 12:05pm. All meetings are in Hardman Hall, Room 101.

Course assignments will use AutoLab. We will be using Python 3.6+ for most assignments.

There are no required textbooks for the course. Course materials will be linked from the lecture schedule and archived in the GitHub repository; other recommended texts will be cited.

You are not expected to know any programming before taking this course, but you are expected to pick it up very quickly.

You are expected to be familiar with probability theory, statistics, and basic machine learning.

Course Details

This is the beating heart of course expectations, policies, and schedules. If something isn't clear, please ask; claiming you didn't know is not an acceptable defense.

  • AutoLab

    We are using AutoLab for assignment submission and autograding.

    Take note: AutoLab is run from an internal UGA server, and can therefore only be accessed from campus, or via VPN. AutoLab can be found at this link:

    autolab.cs.uga.edu

  • Slack

    Slack is our primary method of communication. The email you receive from me to register on Slack will likely be the last email I proactively send you. While you are welcome to email me, I receive dozens--sometimes hundreds--of emails each day; Slack cuts through that noise, so I am much more likely to respond to questions there. Plus, in Slack, you have your student colleagues, who can probably help even more quickly.

    eds-uga-csci4360.slack.com

  • Grading

    There will be five assignments, a required workshop for each student, a midterm exam, and a final project.

    Assignments 45%
    Workshop 10%
    Midterm 20%
    Final Project 25%

    There is NO FINAL EXAM for this class.
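
    To make the weighting concrete, the final grade is simply a weighted average of the four categories. A minimal sketch in Python (the category scores here are made up, purely for illustration):

        # Hypothetical illustration of the grade weighting above; the scores are made up.
        weights = {"assignments": 0.45, "workshop": 0.10, "midterm": 0.20, "final_project": 0.25}
        scores  = {"assignments": 88.0, "workshop": 95.0, "midterm": 81.0, "final_project": 90.0}

        final_grade = sum(weights[k] * scores[k] for k in weights)
        print(f"Final grade: {final_grade:.1f}")  # 0.45*88 + 0.10*95 + 0.20*81 + 0.25*90 = 87.8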

  • Assignments

    There will be five assignments over the course of the semester. Each assignment will be released on a Thursday morning and will be due by 11:59:59pm two weeks later.

    Assignments are downloaded and submitted through AutoLab. Assignments not submitted through AutoLab will not be graded; do not email me your assignments. For every 24 hours an assignment is late, 25% will be deducted from the final grade.

    Because AutoLab incorporates an auto-grader, you need to follow the assignment's directions for structuring your code precisely. If the auto-grader fails, it may not be due to a bug in your code, but rather to a failure to adhere to the assignment's input/output guidelines; a purely hypothetical example of such a guideline is sketched below.
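
    For instance, an assignment's guidelines might specify that your script take an input file path as its first command-line argument and print one result per line to standard output. A minimal sketch of that pattern (the file format and output here are hypothetical, not from any actual assignment):

        import sys

        def main():
            # Hypothetical I/O contract (not from any actual assignment): the first
            # command-line argument names a file with one number per line; print the
            # running mean after each line to standard output.
            total, count = 0.0, 0
            with open(sys.argv[1]) as f:
                for line in f:
                    total += float(line.strip())
                    count += 1
                    print(total / count)

        if __name__ == "__main__":
            main()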

  • Workshops

    Workshops are held most Wednesdays. Each student is required to organize and lead at least one workshop; students can pair up if needed.

    Workshops are meant to be an opportunity to dig into the nuts and bolts of topics we are covering in lecture by demonstrating working code and showing how these topics can be implemented in practice. This can take several forms--students are encouraged to be imaginative!

    You can demo a new Python package, show a better way of doing something from lecture, or even run a live-coding session on how to solve a certain problem (though it should go without saying that you need to practice the live coding ahead of time). A sketch of what one such demo might look like appears at the end of this section.

    The current schedule can be found here.
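
    As a flavor of what a workshop demo could cover, here is a minimal sketch in the spirit of Workshop 1, a scikit-learn pipeline with a hyperparameter grid search; it is illustrative only and not taken from any workshop's actual materials:

        from sklearn.datasets import load_iris
        from sklearn.model_selection import GridSearchCV
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Chain preprocessing and a classifier, then grid-search the SVM's hyperparameters.
        X, y = load_iris(return_X_y=True)
        pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
        params = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}
        grid = GridSearchCV(pipe, params, cv=5)
        grid.fit(X, y)
        print(grid.best_params_, grid.best_score_)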

  • Midterm

    The midterm exam will be Tuesday, October 3. Expect a mix of multiple choice, short answer, and coding questions.

  • Final Project

    The final project is an opportunity for you to really flex your data science skills and tackle a project you're passionate about.

    You are encouraged to form teams of 2 or 3. Teams of 1 are strongly discouraged, but exceptions to the 2-3 person guideline are permitted with justification.

    There are three components: the proposal, the presentation, and the deliverable. The proposal is a 1-2 page roadmap of your project, detailing how you plan to carry it out, any contingencies and outcomes you anticipate, and who your teammates are. The presentation is a 30-minute talk summarizing your problem, motivating your approach, and discussing your results. The deliverable has two parts: your code and a 6-10 page NIPS-style paper on your project.

    Graduate students (taking 6360) are required to also submit their papers to a journal or conference; the submission venue must be included in the project proposal. Undergraduate students (taking 4360) can do this for extra credit.

  • Final Exam

    There isn't one! Hooray!

  • Recommended Textbooks

    The course has no required textbook. However, there are several recommended textbooks that this course will draw on over the semester. Should you want more information on a topic, these would be good places to start.

    Title -- Author(s) -- Links
    Elements of Statistical Learning -- Trevor Hastie, Robert Tibshirani, and Jerome Friedman -- [reviews] [amazon] [pdf]
    Statistical Learning with Sparsity -- Trevor Hastie, Robert Tibshirani, and Martin Wainwright -- [amazon] [pdf]
    Pattern Recognition and Machine Learning -- Christopher Bishop -- [reviews] [amazon]
    Machine Learning: A Probabilistic Perspective -- Kevin Murphy -- [amazon]
    Convex Optimization -- Stephen Boyd and Lieven Vandenberghe -- [amazon] [pdf]
    Computer Vision: Algorithms and Applications -- Richard Szeliski -- [amazon] [pdf]
    Active Contours -- Andrew Blake and Michael Isard -- [amazon] [pdf]
    Probabilistic Graphical Models -- Daphne Koller and Nir Friedman -- [amazon]
    Lecture Notes on Spectral Graph Methods -- Michael W. Mahoney -- [pdf]
    Statistical Analysis of Network Data -- Eric D. Kolaczyk -- [amazon]
    Deep Learning -- Ian Goodfellow, Yoshua Bengio, and Aaron Courville -- [amazon] [html]

Lecture Schedule

Lectures are held on Tuesdays, Wednesdays, and Thursdays. All lectures are in Hardman Hall, Room 101.

Tues/Thurs lectures are 11:00am - 12:15pm. Wed lectures are 11:15am - 12:05pm.

Date Topic Links
Tues, 8/15 Lecture 1: Course Introduction pptx | pdf
Wed, 8/16 Workshop 0: Setting up your Python Environment ipynb
Thurs, 8/17 Homework 1 Released pdf
Thurs, 8/17 Lecture 2: Python Crash Course html | pdf | ipynb
Tues, 8/22 Guest Lecturer Charles Morn pdf | pptx
Wed, 8/23 Guest Lecturer John Miller: Linear Regression
Thurs, 8/24 Guest Lecturer John Drake: Computational Botany pdf
Tues, 8/29 Guest Lecturer Khaled Rasheed: Evolutionary Computation pdf | pptx
Wed, 8/30 Workshop 1: ML Pipelines and Hyperparameter Gridsearch with scikit-learn
Justin Hooker & Sy Ahmed
materials | gdoc1 | gdoc2
Thurs, 8/31 Homework 1 Due ; Homework 2 Released pdf
Tues, 9/5 Guest Lecturer Khaled Rasheed: Evolutionary Computation (continued) pdf | pptx
Wed, 9/6 Workshop 2: AutoML with tpot
William Sanders & Rajeswari Sivakumar
materials
Thurs, 9/7 Lecture 9: Dense Motion Analysis pdf | pptx
Tues, 9/12 UGA CLASSES CANCELED
Wed, 9/13 Workshop 3: Object Segmentation and Tracking with OpenCV
Weiwen Xu
materials
Thurs, 9/14 Homework 2 Due
Thurs, 9/14 Guest Lecturer Tianming Liu: HAFNI pdf | ppt
Mon, 9/18 Homework 3 Released pdf
Tues, 9/19 Lecture 12: Linear Dynamical Systems pdf | pptx
Tues, 9/19 Assignment 1 Postmortem Discussion pdf | pptx
Wed, 9/20 Guest Lecturer Mike Scarbrough (CEO, Nextech)
Thurs, 9/21 Workshop 4: Bayesian ML with PyMC3 and Edward
Taylor Smith & Jonathan Hayne
slides | ipynb
Tues, 9/26 Lecture 14: Graphs pdf | pptx
Wed, 9/27 Workshop 5: Scalable analytics with PySpark and Dask
Nicholas Klepp
materials
Thurs, 9/28 Homework 3 Due
Thurs, 9/28 Lecture 15: Spectral clustering pdf | pptx
Tues, 10/3 Midterm Exam
Wed, 10/4 Post-mortem Review
Thurs, 10/5 Homework 4 Released pdf
Thurs, 10/5 Lecture 16: Semi-supervised learning on graphs pdf | pptx
Tues, 10/10 Lecture 17: Metric learning pdf | pptx
Wed, 10/11 Workshop 6: Approximate nearest-neighbors with annoy materials
Thurs, 10/12 Final Project Proposals Due
Thurs, 10/12 Lecture 18: Kernel and Sparse PCA pdf | pptx
Tues, 10/17 Lecture 19: Randomized SVD pdf | pptx
Wed, 10/18 Workshop 7: Auto-differentiation with Autograd
I-Huei Ho
materials
Thurs, 10/19 Homework 4 Due ; Homework 5 Released pdf
Thurs, 10/19 Lecture 20: Dictionary learning pdf | pptx
Tues, 10/24 Lecture 21: Kernel Methods pdf | pptx
Wed, 10/25 Workshop 8: Autoencoders with H2O
Prajay Shetty
materials
Thurs, 10/26 Lecture 22: Neural networks pdf | pptx
Tues, 10/31 Lecture 23: Backpropagation pdf | pptx
Wed, 11/1 Workshop 9: Deep learning with Keras
Aditya Shinde & Christopher Barrick
materials
Thurs, 11/2 Homework 5 Due
Thurs, 11/2 Lecture 24: Information theory for deep learning pdf | pptx
Tues, 11/7 Lecture 25: Convolutional neural networks pdf | pptx
Wed, 11/8 Workshop 10: Introduction to deep learning with TensorFlow
Jonathan Waring & Xiaojia He
materials
Thurs, 11/9 Lecture 26: Recurrent neural networks pdf | pptx
Tues, 11/14 Lecture 27: Autoencoders pdf | pptx
Wed, 11/15 Workshop 11: GANs in PyTorch
Charlie Lu
materials | slides
Thurs, 11/16 Lecture 28: Deep generative models pdf | pptx
Tues, 11/28 Final Project Presentations
  • Justin Hooker
  • Jonathan Waring, Jonathan Hayne
  • Charles Lu
Wed, 11/29 Final Project Presentations
  • Sy Ahmed
  • Xiaojia He
  • Chris Barrick, Prajay Shetty, Aditya Shinde
Thurs, 11/30 Final Project Presentations
  • Weiwen Xu, I-Huei Ho, Nick Klepp
  • Rajeswari Sivakumar, William Sanders
  • Taylor Smith
Thurs, 12/7 Final Project Deliverables Due

Contact

The best and most reliable way to reach out to me or the course TA is the course Slack chat.

Dr. Shannon Quinn (instructor)