Custom Matmul Kernels

This repository contains the source code for this blog post.

Dependencies
- Python 3.7.10 or higher
- CuPy 7.4.0 or higher
- PyTorch 1.8.1 or higher

Tested only with CUDA 11.2.
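Before running the kernels, it can help to confirm that the installed packages meet the minimum versions above. The following is a minimal sketch of such a check. It is not part of this repository. It assumes the import names `cupy` and `torch`, and the names `MINIMUM_VERSIONS`, `parse_version`, `installed_version` and `check_dependencies` are hypothetical. It does not check the CUDA version.

```python
import importlib
import platform
import re

# Minimum versions from the dependency list above. The keys are assumed
# import names ("cupy", "torch"); "python" means the interpreter itself.
MINIMUM_VERSIONS = {
    "python": (3, 7, 10),
    "cupy": (7, 4, 0),
    "torch": (1, 8, 1),
}


def parse_version(text):
    """Turn a version string such as '1.8.1+cu111' into a tuple (1, 8, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    # A missing patch component (e.g. '3.7') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def installed_version(name):
    """Return the installed version tuple, or None if the package is missing."""
    if name == "python":
        return parse_version(platform.python_version())
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return parse_version(module.__version__)


def check_dependencies():
    """Map each dependency to True (new enough), False (too old) or None (missing)."""
    report = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        found = installed_version(name)
        report[name] = None if found is None else found >= minimum
    return report


if __name__ == "__main__":
    labels = {True: "ok", False: "too old", None: "not installed"}
    for name, ok in check_dependencies().items():
        print("%-6s %s" % (name, labels[ok]))
```

Comparing tuples keeps the check dependency-free, which matters on Python 3.7 because `importlib.metadata` is only available from Python 3.8 onward.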