---
layout: page
title: Mathematical Foundations of Data Science
description: Listing of course modules and topics.
has_children: true
mathjax: true
tags:
---
- Data Mining and Analysis: Fundamental Concepts and Algorithms by Mohammed J. Zaki and Wagner Meira Jr., 2021 (Data Mining & Analysis) PDF
- Pattern Recognition, by Sergios Theodoridis and Konstantinos Koutroumbas, 2009
- The Elements of Statistical Learning (ESL)
- Foundations of Data Science (FDS), by Avrim Blum, John Hopcroft, and Ravindran Kannan, 2018
HW1{: .label .label-red }Generate random points with uniform distribution in the unit sphere, due date: 1403/07/12 Extended: 1403/07/18
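One standard approach to HW1 (not the course's reference solution) draws directions from normalized Gaussians, which are rotation-invariant, and scales radii as $U^{1/d}$ so the ball's volume is filled uniformly:

```python
import numpy as np

def uniform_in_unit_sphere(n, d, rng=None):
    """Draw n points uniformly from the d-dimensional unit ball."""
    rng = np.random.default_rng(rng)
    g = rng.standard_normal((n, d))
    # Normalized Gaussians give uniformly random directions
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Radius CDF is r^d, so sample r = U^(1/d)
    radii = rng.random(n) ** (1.0 / d)
    return directions * radii[:, None]

points = uniform_in_unit_sphere(1000, 3, rng=0)
```

Rejection sampling from the enclosing cube also works in low dimension, but its acceptance rate collapses as $d$ grows, which previews the curse-of-dimensionality topic below.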
HW2{: .label .label-red }Satisfiability Table, due date: 1403/07/15 Extended: 1403/07/18
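For HW2, a truth table can be built by brute-force enumeration of all assignments; the formula below is only an illustrative example, not the assigned one:

```python
from itertools import product

def truth_table(formula, n_vars):
    """Evaluate a boolean formula on all 2^n_vars assignments."""
    return [(bits, formula(*bits))
            for bits in product([False, True], repeat=n_vars)]

# Example CNF formula: (a OR NOT b) AND (b OR c)
table = truth_table(lambda a, b, c: (a or not b) and (b or c), 3)
satisfying = [bits for bits, val in table if val]
```

Enumeration costs $2^n$ rows, which is exactly why SAT reappears later as the canonical NP-complete benchmark for the metaheuristics module.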
Further Reading{: .label .label-yellow }
- Slide: Introduction to Data Science by Zico Kolter
- Slide: Introduction to Data Science by Kevin Markham
- Slide: Clustering by Matt Dickenson
- Section 2.5 of ESL
  - Page 41/764: Local Methods in High Dimensions
- Slides: Chap. 6 of Zaki
Colab{: .label .label-green }High Dimensional Data - The curse of dimensionality
Colab{: .label .label-green }High Dimensional Data - KNN
Colab{: .label .label-green }Clustering of images, as high dim. data
HW-xx{: .label .label-red }Page 15 of FDS - Orthogonality of d-dimensional Gaussian vectors, due: 1403/08/26
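The FDS result behind this homework can be checked numerically: the cosine of the angle between two independent standard Gaussian vectors concentrates near $0$ at rate roughly $1/\sqrt{d}$, so high-dimensional Gaussian vectors are nearly orthogonal. A minimal demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
# Two independent standard Gaussian vectors in high dimension
u, v = rng.standard_normal(d), rng.standard_normal(d)

# Cosine of the angle concentrates near 0, roughly at scale 1/sqrt(d)
cosine = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
```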
Further Reading{: .label .label-yellow }
- Slides Chap. 1 of Zaki
- Random Projection: Theory and Implementation in Python with Scikit-Learn
- Johnson–Lindenstrauss lemma
- Gaussian random projection
- Scikit-learn: The Johnson-Lindenstrauss bound for embedding with random projections
Paper{: .label .label-blue }Supervised dimensionality reduction for big data
Paper{: .label .label-blue }An Introduction to Johnson–Lindenstrauss Transforms
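The random-projection readings above can be tried directly with scikit-learn; this sketch projects 5000-dimensional points down to the dimension suggested by the Johnson–Lindenstrauss bound and checks one pairwise distance:

```python
import numpy as np
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))   # 100 points in 5000 dimensions

# JL bound: target dimension preserving pairwise distances within eps
k = int(johnson_lindenstrauss_min_dim(n_samples=100, eps=0.5))
proj = GaussianRandomProjection(n_components=k, random_state=0)
X_low = proj.fit_transform(X)

# Distortion of one pairwise distance after projection
orig = np.linalg.norm(X[0] - X[1])
ratio = np.linalg.norm(X_low[0] - X_low[1]) / orig
```

Note that the target dimension depends only on the number of points and the tolerance `eps`, not on the original dimension, which is the striking content of the lemma.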
Further Reading{: .label .label-yellow }
Discrete Optimization: Draft version of My Book: Meta-Heuristic Algorithms
Some Examples:
- N-Queen Problem
- Knight's Tour, my old Delphi program, previous century!
- Traveling Salesman Problem
- Packing & Cutting Problems, my old Delphi program, previous century! My MSc. project.
Random Search: Chapters 1 & 2 of My Book + My NP-Complete Paper
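A generic pure random search (not the book's Program 1.2, whose details are not reproduced here) samples uniformly from a box and keeps the best candidate:

```python
import numpy as np

def random_search(f, bounds, n_iter=10_000, rng=None):
    """Pure random search: sample uniformly in the box, keep the best."""
    rng = np.random.default_rng(rng)
    low, high = np.asarray(bounds, float).T
    best_x, best_f = None, np.inf
    for _ in range(n_iter):
        x = rng.uniform(low, high)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Minimize the sphere function on [-5, 5]^2
x_star, f_star = random_search(lambda x: np.sum(x**2),
                               [(-5, 5), (-5, 5)], rng=0)
```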
HW-xx{: .label .label-red }Python Code of Program 1.2, Page 11 of My book - RS, due: 1403/07/25
- SAT
- SA, continued: Chapter 3 of My Book
HW-xx{: .label .label-red }Python Code of Program 1.3, Page 23 of My book - SA, due: 1403/07/26
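A minimal simulated annealing sketch, in the spirit of the chapter but not a transcription of Program 1.3: Gaussian proposal moves, Metropolis acceptance, geometric cooling, tried on the multimodal Rastrigin function:

```python
import numpy as np

def simulated_annealing(f, x0, n_iter=20_000, step=0.5,
                        t0=1.0, cooling=0.999, rng=None):
    """Basic SA: Gaussian moves, Metropolis acceptance, geometric cooling."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, float)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    t = t0
    for _ in range(n_iter):
        cand = x + rng.normal(scale=step, size=x.shape)
        fc = f(cand)
        # Accept improvements always; accept uphill moves with prob e^(-df/t)
        if fc < fx or rng.random() < np.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling
    return best_x, best_f

def rastrigin(x):
    """Many local minima; global minimum 0 at the origin."""
    x = np.asarray(x)
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

x_star, f_star = simulated_annealing(rastrigin, [3.0, -3.0], rng=0)
```

The early high-temperature phase lets SA escape the local minima that trap pure greedy descent on this function.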
Further Reading{: .label .label-yellow }
PSO: Chapter 6 of My Book
HW-xx{: .label .label-red }Use one of the Python packages to find the minimum of
Some Python packages for PSO:
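Independent of any particular package, a global-best PSO with inertia weight can be sketched in plain NumPy; `w`, `c1`, `c2` are common default-style parameters, not values prescribed by the book:

```python
import numpy as np

def pso(f, bounds, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, rng=None):
    """Global-best PSO: inertia w, personal pull c1, swarm pull c2."""
    rng = np.random.default_rng(rng)
    low, high = np.asarray(bounds, float).T
    d = low.size
    x = rng.uniform(low, high, (n_particles, d))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, low, high)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# Minimize the sphere function in 3 dimensions
g_best, g_f = pso(lambda x: np.sum(x**2), [(-5, 5)] * 3, rng=0)
```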
Further Reading{: .label .label-yellow }
Paper{: .label .label-blue }A Fish School Clustering Algorithm: Applied to Student Sectioning Problem
- Chapters 11 & 12 of Pattern Recognition, Theodoridis
- Chapter 11:
- Page 602, Section 11.2 PROXIMITY MEASURES - Page 604
- Page 606, Section B. Similarity Measures: The inner product & Pearson’s correlation coefficient
- Page 607, Discrete-Valued Vectors & contingency table
- Page 616, 11.2.3 Proximity Functions between a Point and a Set
- Chapter 12:
- 12.1 INTRODUCTION
- 12.3 SEQUENTIAL CLUSTERING ALGORITHMS
- Exercises
- SAT Table & Brute Force Algorithm for Clustering
- Colab{: .label .label-green }Brute Force Alg for Clustering
HW-xx{: .label .label-red }Generate data & Clustering, due: 1403/08/04
HW-xx{: .label .label-red }BSAS Algorithm, due: 1403/08/07
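A sketch of the Basic Sequential Algorithmic Scheme from Theodoridis Ch. 12 (one possible reading of the HW, with mean-vector representatives assumed): each point joins the nearest cluster representative if it lies within a threshold `theta`, otherwise it starts a new cluster.

```python
import numpy as np

def bsas(X, theta, max_clusters=np.inf):
    """Basic Sequential Algorithmic Scheme: one pass, threshold theta."""
    reps, counts, labels = [], [], []
    for x in X:
        if reps:
            dists = [np.linalg.norm(x - r) for r in reps]
            j = int(np.argmin(dists))
        if not reps or (dists[j] > theta and len(reps) < max_clusters):
            reps.append(x.astype(float).copy())   # start a new cluster
            counts.append(1)
            labels.append(len(reps) - 1)
        else:
            counts[j] += 1
            reps[j] += (x - reps[j]) / counts[j]  # running-mean update
            labels.append(j)
    return np.array(labels), np.array(reps)

X = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5.0], [0, 0.1]])
labels, reps = bsas(X, theta=1.0)
```

Note BSAS is order-dependent: presenting the points in a different order can change both the number of clusters and the assignments, which is a classic exercise in the chapter.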
Colab{: .label .label-green }Image Segmentation 01
- Section 14.3.5 of ESL
- Page 527/764 ESL, Eq. 14.28: W(C)
- The problem with one unknown variable becomes a problem with two unknowns!
- Section 8.3 of K-means Clustering
Colab{: .label .label-green }Image Segmentation 02 - k-means clustering
- Chapter 13 of Data Mining & Analysis
- Slides (Representative-based Clustering)
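The remark above about "one unknown becoming two" is exactly what Lloyd's algorithm exploits: $W(C)$ is minimized by alternating between the two unknowns, fixing the means to update assignments and fixing assignments to update the means. A minimal NumPy sketch:

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=None):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point (means fixed)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes its cluster mean (labels fixed)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(4, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2, rng=0)
```

Each step can only decrease $W(C)$, which is why the alternation converges, though only to a local minimum.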
Further Reading{: .label .label-yellow }
- Lloyd’s, MacQueen’s and Hartigan-Wong’s k-Means
- Convergence in Hartigan-Wong k-means method and other algorithms
- sklearn.datasets.make_blobs
- Section 14.3.6 of ESL
- Page 528/764 ESL, K-means
- Section 14.3.9 of ESL
- Page 533/764 Vector Quantization
Colab{: .label .label-green }Image Segmentation 03 - k-means clustering
Colab{: .label .label-green }LVQ
HW-xx{: .label .label-red }K-means on color images, due: 1403/08/19
Further Reading{: .label .label-yellow }
Further Reading{: .label .label-yellow }
- MLU-Explain double-descent, part 2
- The Bias-Variance Tradeoff: A Newbie’s Guide, by a Newbie
- bias-variance-trade-off
Paper{: .label .label-blue }VC Theoretical Explanation of Double Descent
Paper{: .label .label-blue }Reconciling modern machine-learning practice and the classical bias–variance trade-off
Paper{: .label .label-blue }Understanding the double descent curve in Machine Learning
- Chapter 5 of VanderPlas: In Depth: k-Means Clustering
- Chapter 17 of Data Mining & Analysis
- Clustering Validation
- Matching in Bipartite Graphs
- Silhouette, Clustering Evaluation
- Clustering Evaluations
Colab{: .label .label-green }bipartite-graph-maximum-matching
Colab{: .label .label-green }Silhouette
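The two evaluation ideas above fit in a few lines: matching predicted cluster labels to true classes is a maximum-weight bipartite matching on the contingency table (solvable with the Hungarian algorithm in SciPy), and the silhouette score needs no labels at all:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import silhouette_score

true = np.array([0, 0, 0, 1, 1, 1])
pred = np.array([1, 1, 1, 0, 0, 0])      # same clustering, labels renamed

# Contingency table, then maximum matching (negate to maximize)
cont = np.zeros((2, 2), dtype=int)
for t, p in zip(true, pred):
    cont[t, p] += 1
rows, cols = linear_sum_assignment(-cont)
accuracy = cont[rows, cols].sum() / len(true)

# Silhouette: close to +1 for tight, well-separated clusters
X = np.array([[0, 0], [0, 0.1], [0.1, 0], [4, 4], [4, 4.1], [4.1, 4]])
sil = silhouette_score(X, pred)
```

The matching step shows why label permutation is harmless: the renamed clustering still achieves accuracy 1.0 after matching.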
Further Reading{: .label .label-yellow }
- MSc. Project: Graph Cut
- Wiki: Graph Matching
- Bipartite Graphs and Stable Matchings
- MIT, NOTES ON MATCHING
Paper{: .label .label-blue }Graph Matching and local search
Paper{: .label .label-blue }Graph Feature Selection for Anti-Cancer Plant Recommendation
- Principal Component Analysis explained visually
- In Depth: Principal Component Analysis, Python Data Science Handbook
- PRML-PCA Slides
- Matrix Differentiation by Randal J. Barnes
- Chapter 7 of Zaki (Slides)
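Following the covariance-matrix view in the readings above, PCA reduces to an eigendecomposition of the sample covariance; a from-scratch sketch on synthetic data stretched along one direction:

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                # columns = principal axes
    return Xc @ components, eigvals[order]

# Data stretched along a line plus small noise:
# the first PC should capture almost all the variance
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([3 * t, t]) + rng.normal(0.0, 0.1, (200, 2))
scores, variances = pca(X, n_components=2)
explained = variances[0] / variances.sum()
```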
Colab{: .label .label-green }PCA-01
Kaggle{: .label .label-green }Country Profiling Using PCA and Clustering
- An Introduction to Principal Component Analysis (PCA) with 2018 World Soccer Players Data, PDF
- Using PCA to See Which Countries have Better Players for World Cup Games, PDF
HW-xx{: .label .label-red }PCA Algorithm, due: 1403/10/02
Further Reading{: .label .label-yellow }
- A geometric interpretation of the covariance matrix
- A geometric interpretation of ... (In Persian)
- PCA in SKLearn
- PCA on IRIS
- Faces recognition example using eigenfaces and SVMs
Paper{: .label .label-blue }Eigenbackground Revisited
Colab{: .label .label-green }SVD-01
Colab{: .label .label-green }SVD for Image Compression
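The idea behind the SVD compression notebook: keep only the top-$k$ singular triplets, giving the best rank-$k$ approximation (Eckart–Young) at a fraction of the storage. A sketch on a random matrix standing in for an image:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((64, 64))                 # stand-in for a grayscale image
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 16
A_k = (U[:, :k] * s[:k]) @ Vt[:k]        # best rank-k approximation

# Error is governed by the discarded singular values
rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
# Store k*(m + n + 1) numbers instead of m*n
storage_ratio = k * (A.shape[0] + A.shape[1] + 1) / A.size
```

On a real image the singular values decay much faster than for random data, so the same $k$ gives a far smaller relative error.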
- A good image for hierarchical clustering
- Chapter 14 of Data Mining & Analysis
- Slides (Hierarchical Clustering): PDF
- sklearn.cluster.AgglomerativeClustering
Colab{: .label .label-green }Clustering of images
Further Reading{: .label .label-yellow }
- Slide: Hierarchical Clustering by Jing Gao
- Chapter 20 of Data Mining & Analysis
- Slides (Linear Discriminant Analysis): PDF
- Comparison of LDA and PCA
- HW-xx{: .label .label-red }Compare LDA and PCA first axis (classification by SVM)
Mid Term{: .label .label-purple }
Colab{: .label .label-green }Gaussian Mixture Models
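Where k-means makes hard assignments, a Gaussian mixture fitted by EM yields soft responsibilities; a minimal scikit-learn sketch on two synthetic blobs:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(5, 0.5, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = gmm.predict_proba(X)   # soft responsibilities, each row sums to 1
labels = gmm.predict(X)       # hard labels = argmax of responsibilities
```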
Further Reading{: .label .label-yellow }
- Duda
- Naïve Bayes Algorithm -Implementation from scratch in Python, Medium
- Segmentation using Bayesian Decision Theory
Paper{: .label .label-blue }BayeSeg: Bayesian modeling for medical image segmentation with interpretable generalizability
Colab{: .label .label-green }Add Pixels' coordinates for image segmentation
EXAM{: .label .label-purple }
- Website: Image Processing in Python with Scikit-image by M. Jaderian
- Website: Image Processing in Python with OpenCV by M. Kiani
- Github: Tutorial for Image Processing in Python by Shaoning Zeng
- Book: Image processing tutorials
Further Reading{: .label .label-yellow }: Some published papers
- GitHub repo: A Python library for alpha matting https://pymatting.github.io/ by Y. Gavet & J. Debayle
- Sec 5.11 of JakeVanderPlas
Paper{: .label .label-blue }Optimal fuzzy classification of students using a fuzzy function in solving the genetic weekly university course timetabling problem (in Persian)
- Bilateral K-Means for Superpixel Computation
- Balanced clustering - Wikipedia
- balanced-kmeans · PyPI
- K-means using PyTorch (github.com)
- Balanced K-Means for Clustering
- Balanced k-Means Revisited
- K-Means Clustering in Python: A Practical Guide – Real Python
- Data clustering: 50 years beyond K-means - ScienceDirect
- K-Means Factorization
- Clustering IRIS dataset with particle swarm optimization(PSO)
- Chapter 13 of Data Mining & Analysis
- HW-xx{: .label .label-red }Exercises 13.5: Q2, Q4, Q6, Q7
- Slides (Representative-based Clustering): PDF
- Slide: Introduction to Machine Learning (Clustering and EM) by Barnabás Póczos & Aarti Singh
- Tutorial: The Expectation Maximization Algorithm by Sean Borman
- Tutorial: What is Bayesian Statistics? by John W Stevens
Further Reading{: .label .label-yellow }
- Slide: Tutorial on Estimation and Multivariate Gaussians by Shubhendu Trivedi
- Slide: Mixture Model by Jing Gao
- Paper: Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D
- Paper: k-Means Requires Exponentially Many Iterations Even in the Plane by Andrea Vattani
- Book: Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David
- Chapter 2 of Zaki, page 54, eq. 2.43
- What is Mahalanobis distance?
- Mahalanobis Distance – Understanding the math with examples (python)
- Unlocking the Power of Mahalanobis Distance: Exploring Multivariate Data Analysis with Python
- Outlier detection (Faradars)
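Following Zaki eq. 2.43, the Mahalanobis distance is a Euclidean distance after whitening by the covariance, so elongated, correlated directions are not over-counted; a sketch on strongly correlated 2-D data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Strongly correlated data: main axis roughly along direction (2, 1)
X = rng.multivariate_normal([0, 0], [[4, 1.9], [1.9, 1]], size=500)

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X.T))

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# A point far along the data's main axis vs. a nearer point off-axis:
on_axis = mahalanobis(np.array([4.0, 2.0]))    # larger Euclidean norm
off_axis = mahalanobis(np.array([2.0, -2.0]))  # smaller Euclidean norm
```

The off-axis point is closer in Euclidean terms yet farther in Mahalanobis terms, which is exactly why this distance is used for outlier detection on correlated data.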
- Chapter 17 of Data Mining & Analysis
- Slides of Section 17.1 (Clustering Validation): PDF
- Slide: Clustering Analysis by Enza Messina
- Slide: Information Theory by Jossy Sayir
- Slide: Normalized Mutual Information: Estimating Clustering Quality by Bilal Ahmed
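The NMI measure from the slides above is available directly in scikit-learn; a quick check of its key property, invariance to label permutation:

```python
from sklearn.metrics import normalized_mutual_info_score

true = [0, 0, 0, 1, 1, 1]
perm = [1, 1, 1, 0, 0, 0]   # identical clustering, labels swapped
rand = [0, 1, 0, 1, 0, 1]   # clustering unrelated to the classes

nmi_perm = normalized_mutual_info_score(true, perm)   # should be 1
nmi_rand = normalized_mutual_info_score(true, rand)   # should be low
```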
Further Reading{: .label .label-yellow }
- Slide: Clustering Evaluation (II) by Andrew Rosenberg
- Slide: Evaluation (I) by Andrew Rosenberg
- Chapter 15 of Data Mining & Analysis
- Slides of Section 15.1 (Density-based Clustering): PDF
- Slide: Spatial Database Systems by Ralf Hartmut Güting
- Chapter 5 of Data Mining & Analysis
- Kernel-Kmeans Chapter 13 of Data Mining & Analysis
- HW-xx{: .label .label-red }TBA
EXAM{: .label .label-purple }
- Chapter 16 of Data Mining & Analysis
- Exercises 16.5: Q2, Q3, Q6
- Slides (Spectral and Graph Clustering): PDF
- Slide: Spectral Clustering by Andrew Rosenberg
- Slide: Introduction to Spectral Clustering by Vasileios Zografos and Klas Nordberg
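A quick demonstration of why spectral clustering is on the syllabus: two interleaving half-moons are not linearly separable, so k-means splits them wrongly, while spectral clustering on a nearest-neighbour graph recovers them (the `n_neighbors=10` setting here is an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                            n_neighbors=10,
                            random_state=0).fit_predict(X)

# Accuracy up to label permutation (binary case)
acc = max(np.mean(labels == y), np.mean(labels != y))
```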
Further Reading{: .label .label-yellow }
- Slide: Spectral Methods by Jing Gao
- Tutorial: A Tutorial on Spectral Clustering by Ulrike von Luxburg
- Tutorial: Matrix Differentiation by Randal J. Barnes
- Lecture: Spectral Methods by Sanjoy Dasgupta
- Paper: Positive Semidefinite Matrices and Variational Characterizations of Eigenvalues by Wing-Kin Ma
- Chapter 8 of Data Mining & Analysis
- Ranking Graph Vertices, Page Rank
- Linear Algebra and Technology
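PageRank itself is a short power iteration on the damped transition matrix; this sketch handles dangling nodes by spreading their mass uniformly, one common convention:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10):
    """PageRank by power iteration on the damped transition matrix."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transitions; dangling nodes jump uniformly
    M = np.where(out > 0, adj / np.where(out == 0, 1, out), 1.0 / n)
    P = M.T                                  # column-stochastic
    r = np.full(n, 1.0 / n)
    while True:
        r_new = damping * P @ r + (1 - damping) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# Tiny graph: every other node links to node 3, which links nowhere
adj = np.array([[0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)
r = pagerank(adj)
```

The damping term is what guarantees a unique stationary vector and geometric convergence of the iteration.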
Further Reading{: .label .label-yellow }
- Chapter 5 of Mining of Massive Datasets
- Slide of Sections 5.1, 5.2 (PageRank, Efficient Computation of PageRank): Analysis of Large Graphs 1
- Slide of Sections 5.3-5.5 (Topic-Sensitive PageRank, Link Spam, Hubs and Authorities): Analysis of Large Graphs 2
- Slide: The Linear Algebra Aspects of PageRank by Ilse Ipsen
- Paper: A Survey on Proximity Measures for Social Networks by Sara Cohen, Benny Kimelfeld, Georgia Koutrika
- Practical Data Science by Zico Kolter
- Course: Data Mining by U Kang
- Statistical Data Mining Tutorials by Andrew W. Moore
- Lecture: Finding Meaningful Clusters in Data by Sanjoy Dasgupta
- Paper: An Impossibility Theorem for Clustering by Jon Kleinberg