---
layout: page
title: Mathematical Foundations of Data Science
description: Listing of course modules and topics.
has_children: true
mathjax: true
tags:
---
- Data Mining and Analysis: Fundamental Concepts and Algorithms by Mohammed J. Zaki and Wagner Meira Jr., 2021 (Data Mining & Analysis) PDF
- Pattern Recognition, by Sergios Theodoridis and Konstantinos Koutroumbas, 2009
- The Elements of Statistical Learning (ESL)
- Foundations of Data Science (FDS), by Avrim Blum, John Hopcroft, and Ravindran Kannan, 2018
HW1{: .label .label-red }Generate random points with uniform distribution in the unit sphere, due date: 1403/07/12 Extended: 1403/07/18
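One standard approach to HW1 (not the course's reference solution) draws directions from normalized Gaussians, which are rotation-invariant, and scales radii as $U^{1/d}$ so the ball's volume is filled uniformly:

```python
import numpy as np

def uniform_in_unit_sphere(n, d, rng=None):
    """Draw n points uniformly from the d-dimensional unit ball."""
    rng = np.random.default_rng(rng)
    g = rng.standard_normal((n, d))
    # Normalized Gaussians give uniformly random directions
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Radius CDF is r^d, so sample r = U^(1/d)
    radii = rng.random(n) ** (1.0 / d)
    return directions * radii[:, None]

points = uniform_in_unit_sphere(1000, 3, rng=0)
```

Rejection sampling from the enclosing cube also works in low dimension, but its acceptance rate collapses as $d$ grows, which previews the curse-of-dimensionality topic below.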
HW2{: .label .label-red }Satisfiability Table, due date: 1403/07/15 Extended: 1403/07/18
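For HW2, a truth table can be built by brute-force enumeration of all assignments; the formula below is only an illustrative example, not the assigned one:

```python
from itertools import product

def truth_table(formula, n_vars):
    """Evaluate a boolean formula on all 2^n_vars assignments."""
    return [(bits, formula(*bits))
            for bits in product([False, True], repeat=n_vars)]

# Example CNF formula: (a OR NOT b) AND (b OR c)
table = truth_table(lambda a, b, c: (a or not b) and (b or c), 3)
satisfying = [bits for bits, val in table if val]
```

Enumeration costs $2^n$ rows, which is exactly why SAT reappears later as the canonical NP-complete benchmark for the metaheuristics module.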
Further Reading{: .label .label-yellow }
- Slide: Introduction to Data Science by Zico Kolter
- Slide: Introduction to Data Science by Kevin Markham
- Slide: Clustering by Matt Dickenson
- Section 2.5 of ESL
  - Page 41/764: Local Methods in High Dimensions
- Slides: Chap. 6 of Zaki
Colab{: .label .label-green }High Dimensional Data - The curse of dimensionality
Colab{: .label .label-green }High Dimensional Data - KNN
Colab{: .label .label-green }Clustering of images, as high dim. data
HW-xx{: .label .label-red }Page 15 of FDS - Orthogonality of d-dimensional Gaussian vectors, due: 1403/08/26
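The FDS result behind this homework can be checked numerically: the cosine of the angle between two independent standard Gaussian vectors concentrates near $0$ at rate roughly $1/\sqrt{d}$, so high-dimensional Gaussian vectors are nearly orthogonal. A minimal demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
# Two independent standard Gaussian vectors in high dimension
u, v = rng.standard_normal(d), rng.standard_normal(d)

# Cosine of the angle concentrates near 0, roughly at scale 1/sqrt(d)
cosine = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
```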
Further Reading{: .label .label-yellow }
- Slides Chap. 1 of Zaki
- Random Projection: Theory and Implementation in Python with Scikit-Learn
- Johnson–Lindenstrauss lemma
- Gaussian random projection
- Scikit-learn: The Johnson-Lindenstrauss bound for embedding with random projections
Paper{: .label .label-blue }Supervised dimensionality reduction for big data
Paper{: .label .label-blue }An Introduction to Johnson–Lindenstrauss Transforms
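The random-projection readings above can be tried directly with scikit-learn; this sketch projects 5000-dimensional points down to the dimension suggested by the Johnson–Lindenstrauss bound and checks one pairwise distance:

```python
import numpy as np
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))   # 100 points in 5000 dimensions

# JL bound: target dimension preserving pairwise distances within eps
k = int(johnson_lindenstrauss_min_dim(n_samples=100, eps=0.5))
proj = GaussianRandomProjection(n_components=k, random_state=0)
X_low = proj.fit_transform(X)

# Distortion of one pairwise distance after projection
orig = np.linalg.norm(X[0] - X[1])
ratio = np.linalg.norm(X_low[0] - X_low[1]) / orig
```

Note that the target dimension depends only on the number of points and the tolerance `eps`, not on the original dimension, which is the striking content of the lemma.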
Further Reading{: .label .label-yellow }
Discrete Optimization: Draft version of My Book: Meta-Heuristic Algorithms
Some Examples:
- N-Queen Problem
- Knight's Tour, my old Delphi program, previous century!
- Traveling Salesman Problem
- Packing & Cutting Problems, my old Delphi program, previous century! My MSc. project.
Random Search: Chapters 1 & 2 of My Book + My NP-Complete Paper
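A generic pure random search (not the book's Program 1.2, whose details are not reproduced here) samples uniformly from a box and keeps the best candidate:

```python
import numpy as np

def random_search(f, bounds, n_iter=10_000, rng=None):
    """Pure random search: sample uniformly in the box, keep the best."""
    rng = np.random.default_rng(rng)
    low, high = np.asarray(bounds, float).T
    best_x, best_f = None, np.inf
    for _ in range(n_iter):
        x = rng.uniform(low, high)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Minimize the sphere function on [-5, 5]^2
x_star, f_star = random_search(lambda x: np.sum(x**2),
                               [(-5, 5), (-5, 5)], rng=0)
```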
HW-xx{: .label .label-red }Python Code of Program 1.2, Page 11 of My book - RS, due: 1403/07/25
- SAT
- SA, continued: Chapter 3 of My Book
HW-xx{: .label .label-red }Python Code of Program 1.3, Page 23 of My book - SA, due: 1403/07/26
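A minimal simulated annealing sketch, in the spirit of the chapter but not a transcription of Program 1.3: Gaussian proposal moves, Metropolis acceptance, geometric cooling, tried on the multimodal Rastrigin function:

```python
import numpy as np

def simulated_annealing(f, x0, n_iter=20_000, step=0.5,
                        t0=1.0, cooling=0.999, rng=None):
    """Basic SA: Gaussian moves, Metropolis acceptance, geometric cooling."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, float)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    t = t0
    for _ in range(n_iter):
        cand = x + rng.normal(scale=step, size=x.shape)
        fc = f(cand)
        # Accept improvements always; accept uphill moves with prob e^(-df/t)
        if fc < fx or rng.random() < np.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling
    return best_x, best_f

def rastrigin(x):
    """Many local minima; global minimum 0 at the origin."""
    x = np.asarray(x)
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

x_star, f_star = simulated_annealing(rastrigin, [3.0, -3.0], rng=0)
```

The early high-temperature phase lets SA escape the local minima that trap pure greedy descent on this function.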
Further Reading{: .label .label-yellow }
PSO: Chapter 6 of My Book
HW-xx{: .label .label-red }Use one of the Python packages to find the minimum of
Some Python packages for PSO:
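Independent of any particular package, a global-best PSO with inertia weight can be sketched in plain NumPy; `w`, `c1`, `c2` are common default-style parameters, not values prescribed by the book:

```python
import numpy as np

def pso(f, bounds, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, rng=None):
    """Global-best PSO: inertia w, personal pull c1, swarm pull c2."""
    rng = np.random.default_rng(rng)
    low, high = np.asarray(bounds, float).T
    d = low.size
    x = rng.uniform(low, high, (n_particles, d))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, low, high)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# Minimize the sphere function in 3 dimensions
g_best, g_f = pso(lambda x: np.sum(x**2), [(-5, 5)] * 3, rng=0)
```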
Further Reading{: .label .label-yellow }
Paper{: .label .label-blue }A Fish School Clustering Algorithm: Applied to Student Sectioning Problem
- Chapters 11 & 12 of Pattern Recognition, Theodoridis
- Chapter 11:
- Page 602, Section 11.2 PROXIMITY MEASURES - Page 604
- Page 606, Section B. Similarity Measures: The inner product & Pearson’s correlation coefficient
- Page 607, Discrete-Valued Vectors & contingency table
- Page 616, 11.2.3 Proximity Functions between a Point and a Set
- Chapter 12:
- 12.1 INTRODUCTION
- 12.3 SEQUENTIAL CLUSTERING ALGORITHMS
- Exercises
- SAT Table & Brute Force Algorithm for Clustering
- Colab{: .label .label-green }Brute Force Alg for Clustering
HW-xx{: .label .label-red }Generate data & Clustering, due: 1403/08/04
HW-xx{: .label .label-red }BSAS Algorithm, due: 1403/08/07
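A sketch of the Basic Sequential Algorithmic Scheme from Theodoridis Ch. 12 (one possible reading of the HW, with mean-vector representatives assumed): each point joins the nearest cluster representative if it lies within a threshold `theta`, otherwise it starts a new cluster.

```python
import numpy as np

def bsas(X, theta, max_clusters=np.inf):
    """Basic Sequential Algorithmic Scheme: one pass, threshold theta."""
    reps, counts, labels = [], [], []
    for x in X:
        if reps:
            dists = [np.linalg.norm(x - r) for r in reps]
            j = int(np.argmin(dists))
        if not reps or (dists[j] > theta and len(reps) < max_clusters):
            reps.append(x.astype(float).copy())   # start a new cluster
            counts.append(1)
            labels.append(len(reps) - 1)
        else:
            counts[j] += 1
            reps[j] += (x - reps[j]) / counts[j]  # running-mean update
            labels.append(j)
    return np.array(labels), np.array(reps)

X = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5.0], [0, 0.1]])
labels, reps = bsas(X, theta=1.0)
```

Note BSAS is order-dependent: presenting the points in a different order can change both the number of clusters and the assignments, which is a classic exercise in the chapter.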
Colab{: .label .label-green }Image Segmentation 01
- Section 14.3.5 of ESL
- Page 527/764 ESL, Eq. 14.28: W(C)
- The problem with one unknown variable becomes a problem with two unknowns!
- Section 8.3 of K-means Clustering
Colab{: .label .label-green }Image Segmentation 02 - k-means clustering
- Chapter 13 of Data Mining & Analysis
- Slides (Representative-based Clustering)
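The remark above about "one unknown becoming two" is exactly what Lloyd's algorithm exploits: $W(C)$ is minimized by alternating between the two unknowns, fixing the means to update assignments and fixing assignments to update the means. A minimal NumPy sketch:

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=None):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point (means fixed)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes its cluster mean (labels fixed)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(4, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2, rng=0)
```

Each step can only decrease $W(C)$, which is why the alternation converges, though only to a local minimum.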
Further Reading{: .label .label-yellow }
- Lloyd’s, MacQueen’s and Hartigan-Wong’s k-Means
- Convergence in Hartigan-Wong k-means method and other algorithms
- sklearn.datasets.make_blobs
- Section 14.3.6 of ESL
- Page 528/764 ESL, K-means
- Section 14.3.9 of ESL
- Page 533/764 Vector Quantization
Colab{: .label .label-green }Image Segmentation 03 - k-means clustering
Colab{: .label .label-green }LVQ
HW-xx{: .label .label-red }K-means on color images, due: 1403/08/19
Further Reading{: .label .label-yellow }
Further Reading{: .label .label-yellow }
- MLU-Explain double-descent, part 2
- The Bias-Variance Tradeoff: A Newbie’s Guide, by a Newbie
- bias-variance-trade-off
Paper{: .label .label-blue }VC Theoretical Explanation of Double Descent
Paper{: .label .label-blue }Reconciling modern machine-learning practice and the classical bias–variance trade-off
Paper{: .label .label-blue }Understanding the double descent curve in Machine Learning
- Chapter 5 of VanderPlas: In Depth: k-Means Clustering
- Chapter 17 of Data Mining & Analysis
- Clustering Validation
- Matching in Bipartite Graphs
- Silhouette, Clustering Evaluation
- Clustering Evaluations
Colab{: .label .label-green }bipartite-graph-maximum-matching
Colab{: .label .label-green }Silhouette
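The two evaluation ideas above fit in a few lines: matching predicted cluster labels to true classes is a maximum-weight bipartite matching on the contingency table (solvable with the Hungarian algorithm in SciPy), and the silhouette score needs no labels at all:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import silhouette_score

true = np.array([0, 0, 0, 1, 1, 1])
pred = np.array([1, 1, 1, 0, 0, 0])      # same clustering, labels renamed

# Contingency table, then maximum matching (negate to maximize)
cont = np.zeros((2, 2), dtype=int)
for t, p in zip(true, pred):
    cont[t, p] += 1
rows, cols = linear_sum_assignment(-cont)
accuracy = cont[rows, cols].sum() / len(true)

# Silhouette: close to +1 for tight, well-separated clusters
X = np.array([[0, 0], [0, 0.1], [0.1, 0], [4, 4], [4, 4.1], [4.1, 4]])
sil = silhouette_score(X, pred)
```

The matching step shows why label permutation is harmless: the renamed clustering still achieves accuracy 1.0 after matching.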
Further Reading{: .label .label-yellow }
- MSc. Project: Graph Cut
- Wiki: Graph Matching
- Bipartite Graphs and Stable Matchings
- MIT, NOTES ON MATCHING
Paper{: .label .label-blue }Graph Matching and local search
Paper{: .label .label-blue }Graph Feature Selection for Anti-Cancer Plant Recommendation
- Principal Component Analysis explained visually
- In Depth: Principal Component Analysis, Python Data Science Handbook
- PRML-PCA Slides
- Matrix Differentiation by Randal J. Barnes
- Chapter 7 of Zaki (Slides)
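Following the covariance-matrix view in the readings above, PCA reduces to an eigendecomposition of the sample covariance; a from-scratch sketch on synthetic data stretched along one direction:

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                # columns = principal axes
    return Xc @ components, eigvals[order]

# Data stretched along a line plus small noise:
# the first PC should capture almost all the variance
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([3 * t, t]) + rng.normal(0.0, 0.1, (200, 2))
scores, variances = pca(X, n_components=2)
explained = variances[0] / variances.sum()
```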
Colab{: .label .label-green }PCA-01
Kaggle{: .label .label-green }Country Profiling Using PCA and Clustering
- An Introduction to Principal Component Analysis (PCA) with 2018 World Soccer Players Data, PDF
- Using PCA to See Which Countries have Better Players for World Cup Games, PDF
HW-xx{: .label .label-red }PCA Algorithm, due: 1403/10/02
Further Reading{: .label .label-yellow }
- A geometric interpretation of the covariance matrix
- A geometric interpretation of ... (In Persian)
- PCA in SKLearn
- PCA on IRIS
- Faces recognition example using eigenfaces and SVMs
Paper{: .label .label-blue }Eigenbackground Revisited
Colab{: .label .label-green }SVD-01
Colab{: .label .label-green }SVD for Image Compression
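The idea behind the SVD compression notebook: keep only the top-$k$ singular triplets, giving the best rank-$k$ approximation (Eckart–Young) at a fraction of the storage. A sketch on a random matrix standing in for an image:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((64, 64))                 # stand-in for a grayscale image
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 16
A_k = (U[:, :k] * s[:k]) @ Vt[:k]        # best rank-k approximation

# Error is governed by the discarded singular values
rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
# Store k*(m + n + 1) numbers instead of m*n
storage_ratio = k * (A.shape[0] + A.shape[1] + 1) / A.size
```

On a real image the singular values decay much faster than for random data, so the same $k$ gives a far smaller relative error.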
- A good image for hierarchical clustering
- Chapter 14 of Data Mining & Analysis
- Slides (Hierarchical Clustering): PDF
- sklearn.cluster.AgglomerativeClustering
Colab{: .label .label-green }Clustering of images
Further Reading{: .label .label-yellow }
- Slide: Hierarchical Clustering by Jing Gao
- Chapter 20 of Data Mining & Analysis
- Slides (Linear Discriminant Analysis): PDF
- Comparison of LDA and PCA
- HW-xx{: .label .label-red }Compare LDA and PCA first axis (classification by SVM)
Mid Term{: .label .label-purple }
Colab{: .label .label-green }Gaussian Mixture Models
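Where k-means makes hard assignments, a Gaussian mixture fitted by EM yields soft responsibilities; a minimal scikit-learn sketch on two synthetic blobs:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(5, 0.5, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = gmm.predict_proba(X)   # soft responsibilities, each row sums to 1
labels = gmm.predict(X)       # hard labels = argmax of responsibilities
```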
Further Reading{: .label .label-yellow }
- Duda
- Naïve Bayes Algorithm -Implementation from scratch in Python, Medium
- Segmentation using Bayesian Decision Theory
Paper{: .label .label-blue }BayeSeg: Bayesian modeling for medical image segmentation with interpretable generalizability
Colab{: .label .label-green }Add Pixels' coordinates for image segmentation
EXAM{: .label .label-purple }
- Website: Image Processing in Python with Scikit-image by M. Jaderian
- Website: Image Processing in Python with OpenCV by M. Kiani
- Github: Tutorial for Image Processing in Python by Shaoning Zeng
- Book: Image processing tutorials
Further Reading{: .label .label-yellow }: Some published papers
- GitHub repo: A Python library for alpha matting https://pymatting.github.io/ by Y. Gavet & J. Debayle
- Sec 5.11 of JakeVanderPlas
Paper{: .label .label-blue }Optimal fuzzy classification of students using a fuzzy function in solving the genetic weekly university course timetabling problem (in Persian)
- Bilateral K-Means for Superpixel Computation
- Balanced clustering - Wikipedia
- balanced-kmeans · PyPI
- K-means using PyTorch (github.com)
- Balanced K-Means for Clustering
- Balanced k-Means Revisited
- K-Means Clustering in Python: A Practical Guide – Real Python
- Data clustering: 50 years beyond K-means - ScienceDirect
- K-Means Factorization
- Clustering IRIS dataset with particle swarm optimization(PSO)
- Chapter 13 of Data Mining & Analysis
- HW-xx{: .label .label-red }Exercises 13.5: Q2, Q4, Q6, Q7
- Slides (Representative-based Clustering): PDF
- Slide: Introduction to Machine Learning (Clustering and EM) by Barnabás Póczos & Aarti Singh
- Tutorial: The Expectation Maximization Algorithm by Sean Borman
- Tutorial: What is Bayesian Statistics? by John W Stevens
Further Reading{: .label .label-yellow }
- Slide: Tutorial on Estimation and Multivariate Gaussians by Shubhendu Trivedi
- Slide: Mixture Model by Jing Gao
- Paper: Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D
- Paper: k-Means Requires Exponentially Many Iterations Even in the Plane by Andrea Vattani
- Book: Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David
- Chapter 2 of Zaki, page 54, eq. 2.43
- What is Mahalanobis distance?
- Mahalanobis Distance – Understanding the math with examples (python)
- Unlocking the Power of Mahalanobis Distance: Exploring Multivariate Data Analysis with Python
- Outlier detection (Faradars)
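Following Zaki eq. 2.43, the Mahalanobis distance is a Euclidean distance after whitening by the covariance, so elongated, correlated directions are not over-counted; a sketch on strongly correlated 2-D data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Strongly correlated data: main axis roughly along direction (2, 1)
X = rng.multivariate_normal([0, 0], [[4, 1.9], [1.9, 1]], size=500)

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X.T))

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# A point far along the data's main axis vs. a nearer point off-axis:
on_axis = mahalanobis(np.array([4.0, 2.0]))    # larger Euclidean norm
off_axis = mahalanobis(np.array([2.0, -2.0]))  # smaller Euclidean norm
```

The off-axis point is closer in Euclidean terms yet farther in Mahalanobis terms, which is exactly why this distance is used for outlier detection on correlated data.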
- Chapter 17 of Data Mining & Analysis
- Slides of Section 17.1 (Clustering Validation): PDF
- Slide: Clustering Analysis by Enza Messina
- Slide: Information Theory by Jossy Sayir
- Slide: Normalized Mutual Information: Estimating Clustering Quality by Bilal Ahmed
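The NMI measure from the slides above is available directly in scikit-learn; a quick check of its key property, invariance to label permutation:

```python
from sklearn.metrics import normalized_mutual_info_score

true = [0, 0, 0, 1, 1, 1]
perm = [1, 1, 1, 0, 0, 0]   # identical clustering, labels swapped
rand = [0, 1, 0, 1, 0, 1]   # clustering unrelated to the classes

nmi_perm = normalized_mutual_info_score(true, perm)   # should be 1
nmi_rand = normalized_mutual_info_score(true, rand)   # should be low
```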
Further Reading{: .label .label-yellow }
- Slide: Clustering Evaluation (II) by Andrew Rosenberg
- Slide: Evaluation (I) by Andrew Rosenberg
- Chapter 15 of Data Mining & Analysis
- Slides of Section 15.1 (Density-based Clustering): PDF
- Slide: Spatial Database Systems by Ralf Hartmut Güting
- Chapter 5 of Data Mining & Analysis
- Kernel-Kmeans Chapter 13 of Data Mining & Analysis
- HW-xx{: .label .label-red }TBA
EXAM{: .label .label-purple }
- Chapter 16 of Data Mining & Analysis
- Exercises 16.5: Q2, Q3, Q6
- Slides (Spectral and Graph Clustering): PDF
- Slide: Spectral Clustering by Andrew Rosenberg
- Slide: Introduction to Spectral Clustering by Vasileios Zografos and Klas Nordberg
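A quick demonstration of why spectral clustering is on the syllabus: two interleaving half-moons are not linearly separable, so k-means splits them wrongly, while spectral clustering on a nearest-neighbour graph recovers them (the `n_neighbors=10` setting here is an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                            n_neighbors=10,
                            random_state=0).fit_predict(X)

# Accuracy up to label permutation (binary case)
acc = max(np.mean(labels == y), np.mean(labels != y))
```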
Further Reading{: .label .label-yellow }
- Slide: Spectral Methods by Jing Gao
- Tutorial: A Tutorial on Spectral Clustering by Ulrike von Luxburg
- Tutorial: Matrix Differentiation by Randal J. Barnes
- Lecture: Spectral Methods by Sanjoy Dasgupta
- Paper: Positive Semidefinite Matrices and Variational Characterizations of Eigenvalues by Wing-Kin Ma
- Chapter 8 of Data Mining & Analysis
- Ranking Graph Vertices, Page Rank
- Linear Algebra and Technology
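PageRank itself is a short power iteration on the damped transition matrix; this sketch handles dangling nodes by spreading their mass uniformly, one common convention:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10):
    """PageRank by power iteration on the damped transition matrix."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transitions; dangling nodes jump uniformly
    M = np.where(out > 0, adj / np.where(out == 0, 1, out), 1.0 / n)
    P = M.T                                  # column-stochastic
    r = np.full(n, 1.0 / n)
    while True:
        r_new = damping * P @ r + (1 - damping) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# Tiny graph: every other node links to node 3, which links nowhere
adj = np.array([[0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)
r = pagerank(adj)
```

The damping term is what guarantees a unique stationary vector and geometric convergence of the iteration.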
Further Reading{: .label .label-yellow }
- Chapter 5 of Mining of Massive Datasets
- Slide of Sections 5.1, 5.2 (PageRank, Efficient Computation of PageRank): Analysis of Large Graphs 1
- Slide of Sections 5.3-5.5 (Topic-Sensitive PageRank, Link Spam, Hubs and Authorities): Analysis of Large Graphs 2
- Slide: The Linear Algebra Aspects of PageRank by Ilse Ipsen
- Paper: A Survey on Proximity Measures for Social Networks by Sara Cohen, Benny Kimelfeld, Georgia Koutrika
- Practical Data Science by Zico Kolter
- Course: Data Mining by U Kang
- Statistical Data Mining Tutorials by Andrew W. Moore
- Lecture: Finding Meaningful Clusters in Data by Sanjoy Dasgupta
- Paper: An Impossibility Theorem for Clustering by Jon Kleinberg