Introduction to Machine Learning and Data Mining

Tufts University COMP 135 is designed to give students a comprehensive introduction to machine learning and data mining. The course covers a breadth of topics in machine learning, providing hands-on experience and theoretical understanding.

Course Overview

Machine learning is the study of algorithmic methods for learning and prediction based upon data. Approaches range from extracting patterns from large collections of data to online learning in real-time applications. ML is becoming increasingly widespread due to the accessibility of computational power and datasets, along with recent advances in ML algorithms. This course offers a broad introduction to ML and demands substantial effort from students.

Ideal candidates are upper-level undergraduates or beginning graduate students comfortable with mathematical techniques and programming. Relevant mathematical knowledge includes statistics, probability, calculus, and linear algebra.

Class Times

Tu, Th 10:30AM - 11:45AM
Location: Tisch Library, 304-Auditorium

Instructor

Kyle Harrington - [email protected]
Office Hours: By appointment

Teaching Assistants

Sepideh Sadeghi - [email protected]
- Office Hours: Mon noon-1pm, Fri 10am-noon, Location: Halligan 121
Hao Cui - [email protected]
- Office Hours: Tue 4:30-5:30 pm, Thu 4:30-5:30 pm, Location: Halligan 121

Grading

Written homework assignments (20%)
Quizzes (20%)
In-class midterm exam (20%): March 17
Final project (40%)
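
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the final grade is a simple weighted sum; the function name and the example scores are hypothetical, and the actual grading procedure may differ.

```python
# Weights taken from the grading breakdown above (fractions of the final grade).
WEIGHTS = {
    "homework": 0.20,
    "quizzes": 0.20,
    "midterm": 0.20,
    "final_project": 0.40,
}


def weighted_grade(scores):
    """Combine per-component scores (assumed 0-100) into one weighted grade."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 90, "quizzes": 80, "midterm": 70, "final_project": 85}
print(round(weighted_grade(example), 2))
```

Because the final project carries twice the weight of any other component, a 10-point change in the project score moves the overall grade by 4 points under this assumption.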

Rules for Late Submissions

All work must be turned in on the date specified. Notify Kyle Harrington of special circumstances at least two days in advance.

Collaboration Policy

Homework assignments and projects: Discussion about problems and concepts is encouraged, but each assignment must be completed individually.
Quizzes and exams: No collaboration allowed.

Tentative List of Topics

Supervised learning: nearest neighbors, decision trees, linear classifiers, Bayesian classifiers; feature processing and selection; avoiding over-fitting; experimental evaluation.
Unsupervised learning: clustering algorithms; generative probabilistic models; the EM algorithm; association rules.
Theory: basic PAC analysis for classification.
More supervised learning: neural networks; backpropagation; dual perceptron; kernel methods; support vector machines.
Additional topics: active learning; aggregation methods; time series models; reinforcement learning.
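
To give a flavor of the first topic listed above, the following is a minimal sketch of a k-nearest-neighbors classifier in plain Python. It is not course material: the function name, the choice of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy, made-up data: two well-separated groups in two dimensions.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((4.0, 4.2), "b"),
    ((4.1, 3.9), "b"),
    ((3.8, 4.0), "b"),
]

print(knn_predict(train, (1.1, 1.0)))  # nearest points all carry label "a"
print(knn_predict(train, (4.0, 4.0)))  # nearest points all carry label "b"
```

The method stores the training data and defers all work to prediction time, which keeps the sketch short but makes each query cost a full pass over the training set.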

Reference Material

Primary text: Machine Learning. Tom M. Mitchell, McGraw-Hill, 1997.
Introduction to Machine Learning. Ethem Alpaydin, 2010.
Data Mining: Practical Machine Learning Tools and Techniques. Ian H. Witten and Eibe Frank, 2005.

Programming and Software

Weka: A machine learning package used for some assignments.
Languages: Python, Java, Julia, Matlab, Clojure, R
Jupyter: A notebook-based programming environment used for in-class demos and assignments.

Assignments, Quizzes, and Exams

Assignment 1

Download Weka
Open Weka and choose the Explorer mode
Load the dataset with “Open file…”

Submission: Write a one-paragraph description of findings, including visualizations and clustering results.

Assignment 2 (Bonus)

Git and GitHub: Clone the repository, complete the assignment in a Jupyter Notebook, and submit a pull request.

Final Projects

Proposals Due: March 7
Project Due: May 5

Submission: An 8-12 page paper with a detailed project report.

Resources

Faculty resources: Ask around for interesting datasets.
Google Scholar: Search for articles published in “ICML”, “NIPS”, or “Machine Learning”.
Datasets: A comprehensive list of datasets for machine learning projects.

Course Content

Detailed slides, assignments, and project guidelines can be found on the course GitHub page.

For more information and resources, visit the course GitHub repository.

Last updated on Jan 1, 2016
