Your no-nonsense guide to the Math used in Artificial Intelligence
A person working in the field of AI who doesn't know math is like a politician who doesn't know how to persuade: both are missing a skill that is inescapable in their line of work!
A week back, I wrote an article on How to Get into Data Science in 2021 and since then I've received several emails from people all over the world asking just how much math is required in Data Science.
I won't lie: It's a lot of math.
And this is one of the reasons many beginners are put off. After much research and talks with several veterans in the field, I've compiled this no-nonsense guide that covers all of the fundamentals of the math you'll need to know. The concepts below are usually covered over several semesters in college, but I've boiled them down to the core principles you can focus on.
This guide is an absolute life-saver for beginners, letting you study the topics that matter most, and an even better resource for practitioners such as myself who need a quick refresher on these concepts.
Note: You don't need to know all of the concepts (below) in order to get your first job in Data Science. All you need is a firm grasp of the fundamentals. Focus on those and consolidate them.
Knowledge of algebra is perhaps the most fundamental part of math in general. Besides basic operations like addition, subtraction, multiplication, and division, you'll need to know the following:
Linear Algebra is the primary mathematical computation tool in Artificial Intelligence and many areas of Science and Engineering. Here, you need to understand four primary mathematical objects and their properties, along with a few key concepts built on top of them:
- Scalars - a single number (can be real or natural).
- Vectors - a list of numbers, arranged in order. Consider them as points in space with each element representing the coordinate along an axis.
- Matrices - a 2-D array of numbers where each number is identified by 2 indices.
- Tensors - an N-D array of numbers (N>2), arranged on a regular grid with N axes. Important in Machine Learning, Deep Learning, and Computer Vision.
- Eigenvectors & Eigenvalues - special vectors and their corresponding scalar quantity. Understand the significance and how to find them.
- Singular Value Decomposition - factorization of a matrix into 3 matrices. Understand the properties and applications.
- Principal Component Analysis (PCA) - understand the significance, properties, and applications.
Operations such as the dot product, vector (cross) product, and the Hadamard (element-wise) product are useful to know as well.
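To make these concepts concrete, here's a minimal sketch of how the objects and operations above map onto NumPy. The specific numbers are made up for illustration:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # a vector: an ordered list of numbers
b = np.array([4.0, 5.0, 6.0])

dot = np.dot(a, b)       # dot product -> a scalar: 1*4 + 2*5 + 3*6 = 32
hadamard = a * b         # Hadamard product -> element-wise: [4, 10, 18]
cross = np.cross(a, b)   # vector (cross) product -> another vector

M = np.array([[2.0, 0.0],        # a matrix: a 2-D array of numbers
              [0.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(M)   # eigenvalues & eigenvectors of M

U, S, Vt = np.linalg.svd(M)           # SVD: factorizes M into U, S, Vt
reconstructed = U @ np.diag(S) @ Vt   # multiplying them back recovers M
```

A 3-D tensor would simply be `np.zeros((2, 3, 4))` and so on; the same operations generalize.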
Calculus deals with changes in parameters, functions, errors and approximations. Working knowledge of multi-dimensional calculus is imperative in Data Science. The following are the most important concepts (albeit non-exhaustive) in Calculus:
- Derivatives - rules (addition, product, chain rule etc), hyperbolic derivatives (tanh, cosh etc) and partial derivatives.
- Vector/Matrix Calculus - different derivative operators (Gradient, Jacobian, Hessian and Laplacian)
- Gradient Algorithms - local/global maxima & minima, saddle points, convex functions, batches & mini-batches, stochastic gradient descent, and performance comparison.
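To see how derivatives and gradient algorithms fit together, here's a minimal gradient-descent sketch on a simple convex function, f(x) = (x - 3)^2. The learning rate and step count are arbitrary choices for illustration:

```python
def gradient_descent(lr=0.1, steps=100):
    """Minimize f(x) = (x - 3)^2 by repeatedly stepping against the gradient."""
    x = 0.0                   # arbitrary starting point
    for _ in range(steps):
        grad = 2 * (x - 3)    # derivative of (x - 3)^2, via the chain rule
        x -= lr * grad        # move opposite the gradient (downhill)
    return x

x_min = gradient_descent()    # converges toward the minimum at x = 3
```

Stochastic and mini-batch gradient descent follow the same loop, but estimate the gradient from a random subset of the training data at each step.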
- Basic Statistics - Mean, median, mode, variance, covariance etc
- Basic rules in probability - events (dependent & independent), sample spaces, conditional probability.
- Random variables - continuous & discrete, expectation, variance, distributions (joint & conditional).
- Bayes' Theorem - updates the probability of a belief as new evidence arrives. Bayesian software helps machines recognize patterns and make decisions.
- Maximum Likelihood Estimation (MLE) - parameter estimation. Requires knowledge of fundamental probability concepts (joint probability and independence of events).
- Common Distributions - binomial, Poisson, Bernoulli, Gaussian, exponential.
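As a worked example of Bayes' Theorem, consider the classic diagnostic-test setup. All of the probabilities below are made up for illustration:

```python
# Prior: 1% of the population has the disease (assumed numbers).
p_disease = 0.01
p_pos_given_disease = 0.99   # test sensitivity
p_pos_given_healthy = 0.05   # false-positive rate

# Total probability of a positive test (law of total probability).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' Theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
```

Despite the accurate test, the posterior is only about 17%, because the disease is rare: this is exactly the kind of belief-updating Bayes' Theorem formalizes.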
An important field that has made significant contributions to AI and Deep Learning, yet remains unknown to many. It can be thought of as an amalgamation of calculus, statistics, and probability.
- Entropy - also called Shannon Entropy. Used to measure the uncertainty in an experiment.
- Cross-Entropy - compares two probability distributions & tells us how similar they are.
- Kullback-Leibler Divergence - another measure of how similar two probability distributions are.
- Viterbi Algorithm - widely used in Natural Language Processing (NLP) & Speech Recognition.
- Encoder-Decoder - used in Machine Translation RNNs & other models.
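The first three quantities above are closely related, and a short sketch makes the relationship explicit. This uses natural logarithms (nats); the example distributions are made up:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum(p_i * log(p_i)), the uncertainty in p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum(p_i * log(q_i)): cost of encoding p using q's code."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = H(p, q) - H(p); zero exactly when p and q match."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.5]   # a fair coin: maximum uncertainty for two outcomes
q = [0.9, 0.1]   # a heavily biased coin
```

Cross-entropy is the usual loss for classifiers, and since H(p) is fixed for the true labels, minimizing cross-entropy also minimizes the KL divergence to the true distribution.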
Math is vital to Artificial Intelligence; without it, AI is like a human body without a soul. You can treat these mathematical concepts as pay-as-you-go: whenever an unfamiliar concept pops up, grab it and devour it! The guide above is a minimal, yet comprehensive, resource for understanding any topic or concept in AI.
Good luck!
If you think this roadmap can be improved, please do open a PR with any updates and submit any issues. We will continue to improve this, so you might want to consider watching/starring this repository to revisit it in the future.
Have a look at the contribution guide for how to update the roadmap.
- Open a pull request with improvements
- Discuss ideas in issues
- Spread the word
- Reach out with any feedback
This roadmap was created by Jason Dsouza and made publicly available under the MIT License.