Jason-cs18/Awesome-AI-Systems

Resources for recent AI systems (deployment concerns, cost, and accessibility). -- closed

Thanks for your interest.

Because I now focus on collaborative learning across multi-camera networks and do not have time to organize recent research on AI systems, I will no longer maintain this repository. Awesome-System-for-Machine-Learning, maintained by HuaizhengZhang, is a comprehensive list of recent work on AI systems, especially distributed machine learning systems. This repository keeps my research notes and code for AI systems.

AI-Systems

As discussed in [1, 2], real-world AI applications raise three system-level concerns: deployment, cost, and accessibility.

[1] Stoica et al. A Berkeley View of Systems Challenges for AI.
[2] Ratner et al. MLSys: The New Frontier of Machine Learning Systems.

Deployment Concerns

Deployment concerns include robustness to adversarial influences or other spurious factors; safety more broadly considered; privacy and security, especially as sensitive data is increasingly used; interpretability, as is increasingly both legally and operationally required; fairness, as ML algorithms begin to have major effects on our everyday lives; and many other similar concerns.

Popular approaches (todo, summary)

Video
  1. Cryptography for Safe Machine Learning. In MLSys'20.: Shafi Goldwasser presented cryptographic techniques for safe machine learning.
Paper
  1. Telekine: Secure Computing with Cloud GPUs. In NSDI'20.: they address privacy concerns in recent GPU trusted execution environments (TEEs).
  2. Themis: Fair and Efficient GPU Cluster Scheduling. In NSDI'20.: a fair and efficient GPU cluster scheduling algorithm. further reading
  3. Federated Optimization in Heterogeneous Networks. In MLSys'20.: they proposed a framework named FedProx to tackle heterogeneity in federated networks. Traditional federated learning frameworks target privacy in machine learning but suffer from systems heterogeneity and statistical heterogeneity (non-identical data distributions). A minimal sketch of the FedProx-style local objective appears after this list.
  4. FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks. In MLSys'20.: to address the poor performance of data-sharing strategies on a heterogeneous set of DNNs, the authors introduced a flexible ensemble DNN training framework named FLEET.
  5. What is the State of Neural Network Pruning? In MLSys'20.: the authors proposed an open-source framework named ShrinkBench to evaluate pruning methods.
  6. Attention-based Learning for Missing Data Imputation in HoloClean. In MLSys'20.: they applied an attention mechanism to analyze and interpret missing-data imputation in HoloClean.
  7. A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms. In MLSys'20.: an evaluation methodology for deep learning hardware and software platforms.
  8. MLPerf Training Benchmark. In MLSys'20.: a machine learning benchmark for training tasks.
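
As referenced in item 3, the sketch below illustrates the proximal local objective used by FedProx-style methods: each client minimizes its local loss plus (mu/2)·||w − w_global||², which limits how far local updates drift from the global model under heterogeneous data. This is a minimal, hedged PyTorch-style example; the function name, hyperparameters, and structure are my own and not taken from the paper's code.

```python
# Minimal sketch of a FedProx-style local update (illustrative; not the paper's code).
# Each client adds a proximal term (mu/2) * ||w - w_global||^2 to its local loss,
# which limits client drift when data across clients is non-identically distributed.
import torch

def fedprox_local_update(model, global_model, data_loader, loss_fn,
                         mu=0.01, lr=0.1, local_epochs=1):
    """Run a few epochs of local training with the FedProx proximal term."""
    global_params = [p.detach().clone() for p in global_model.parameters()]
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(local_epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            # Proximal term keeps the local model close to the current global model.
            prox = sum((p - g).pow(2).sum()
                       for p, g in zip(model.parameters(), global_params))
            (loss + 0.5 * mu * prox).backward()
            optimizer.step()
    return model
```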

Cost

Costs of annotation, computation, latency, and power.

Popular approaches (todo, summary)

Video
  1. Theory & Systems for Weak Supervision. In MLSys'20.: Christopher Ré highlighted the importance of data in real-world deployments, because we usually cannot obtain enough high-quality labels for large training sets. To bridge this gap, he presented work ranging from theoretical analysis to real-world deployment of weakly supervised learning, which learns only from noisy, weak labels. He also introduced Snorkel, a popular weak supervision framework developed at Stanford. A toy labeling-function sketch follows below.
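
As a toy illustration of the weak supervision idea from the talk above (not Snorkel's actual API), the sketch below defines a few heuristic labeling functions over text and combines their noisy votes by simple majority; real label models such as Snorkel's instead weight each function by its estimated accuracy. All function names and heuristics here are made up for illustration.

```python
# Toy weak-supervision sketch (illustrative only; not Snorkel's API).
# Several noisy heuristics ("labeling functions") vote on each example,
# and a simple majority vote produces a training label. ABSTAIN = -1.
from collections import Counter

ABSTAIN = -1

def lf_contains_refund(text):   # heuristic 1: refund requests look negative
    return 0 if "refund" in text.lower() else ABSTAIN

def lf_contains_great(text):    # heuristic 2: "great" looks positive
    return 1 if "great" in text.lower() else ABSTAIN

def lf_has_exclamation(text):   # heuristic 3: weak positive signal
    return 1 if "!" in text else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_refund, lf_contains_great, lf_has_exclamation]

def majority_vote(text):
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

print(majority_vote("Great service, thanks!"))  # -> 1
print(majority_vote("I want a refund."))        # -> 0
```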
Paper
  1. Improving Resource Efficiency of Deep Activity Recognition via Redundancy Reduction. In HotMobile'20.: they aim to reduce the computation and memory costs of deep human activity recognition (HAR) models. Note
  2. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. In MLSys'20.: they proposed a distributed hierarchical GPU parameter server to hold terabyte-scale parameters for massive-scale deep learning ads systems.
  3. Resource Elasticity in Distributed Deep Learning. In MLSys'20.: to relax the assumption of a fixed resource allocation over the lifetime of a distributed training job, they designed and implemented the first autoscaling engine for these workloads.
  4. SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems. In MLSys'20.: they proposed the Sub-Linear Deep Learning Engine (SLIDE) for fast training on large datasets with efficient use of current hardware. The engine blends smart randomized algorithms with multicore parallelism and workload optimization.
  5. Breaking the Memory Wall with Optimal Tensor Rematerialization. In MLSys'20.: they proposed a new system that accelerates training in memory-constrained environments.
  6. SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems. In MLSys'20.: the authors proposed SkyNet, a hardware-efficient method that delivers state-of-the-art detection accuracy and speed on embedded systems.
  7. Fine-Grained GPU Sharing Primitives for Deep Learning Applications. In MLSys'20.: they identified the importance of fine-grained GPU sharing when multiple DL workloads access the same GPU, but they only evaluated simple scheduling policies (FIFO, SRTF, PACK, and FAIR). From my perspective, scheduling can be customized for specific applications, because application context helps in designing the most suitable scheduling algorithm. Note
  8. Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc. In MLSys'20.: they proposed a distributed multi-GPU framework for fast GNN training and inference on graphs.
  9. OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator. In MLSys'20.: they introduced an efficient inference accelerator for transformer network to improve resource utilization on hardware.
  10. PoET-BiN: Power Efficient Tiny Binary Neurons. In MLSys'20.: the authors proposed a look-up-table-based, power-efficient implementation for resource-constrained embedded devices.
  11. Memory-Driven Mixed Low Precision Quantization for Enabling Deep Network Inference on Microcontrollers. In MLSys'20.: they presented a novel end-to-end methodology for enabling the deployment of high-accuracy deep networks on microcontrollers through mixed low-bitwidth compression and integer-only operations.
  12. Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks. In MLSys'20.: authors presented a new efficient method for quantization.
  13. Riptide: Fast End-to-End Binarized Neural Networks. In MLSys'20.: they proposed a scheduled library for binarized linear algebra operations, based on their analysis of the underlying challenges of binarized neural networks.
  14. Searching for Winograd-aware Quantized Networks. In MLSys'20.: a search method for Winograd-aware quantized networks.
  15. Blink: Fast and Generic Collectives for Distributed ML. In MLSys'20.: authors introduced Blink, a collective communication library that dynamically generates optimal communication primitives by packing spanning trees.
  16. MotherNets: Rapid Deep Ensemble Learning. In MLSys'20.: to reduce the large resource demands of training deep ensemble networks, they proposed MotherNets, which lowers training cost.
  17. Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference. In MLSys'20.: they proposed Willump, an end-to-end optimizer for machine learning inference pipelines built on two novel optimizations: cascaded feature computation and approximate top-K queries. Note
  18. Server-Driven Video Streaming for Deep Learning Inference. In SIGCOMM'20.: they presented a new video streaming protocol that reduces the cost of current video streaming systems for deep learning inference.
  19. Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics. In SIGCOMM'20.: they proposed a novel frame-filtering approach that trades resource usage against the accuracy of real-time video analytics; a toy frame-filtering sketch follows this list.
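
As a rough illustration of on-camera frame filtering in the spirit of Reducto (item 19), the sketch below computes a cheap per-frame difference feature and forwards a frame only when its difference from the last forwarded frame exceeds a threshold. The feature and the fixed threshold are placeholders for illustration; the paper itself chooses features and adapts thresholds per video segment.

```python
# Toy on-camera frame filtering (illustrative; not Reducto's actual algorithm).
# A cheap pixel-difference feature decides whether a frame changed enough
# to be worth sending to the server-side analytics model.
import numpy as np

def frame_difference(prev_frame, frame):
    """Mean absolute pixel difference, normalized to [0, 1]."""
    return float(np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32)).mean()) / 255.0

def filter_frames(frames, threshold=0.05):
    """Yield only frames that differ enough from the last forwarded frame."""
    last_sent = None
    for frame in frames:
        if last_sent is None or frame_difference(last_sent, frame) > threshold:
            last_sent = frame
            yield frame

# Synthetic example: 10 nearly identical frames with one abrupt scene change.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
frames = [base.copy() for _ in range(10)]
frames[5] = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
print(sum(1 for _ in filter_frames(frames)))  # -> 3 of 10 frames are forwarded
```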

Accessibility

Accessibility to developers and organizations without PhD-level machine learning and systems expertise. From my perspective, most distributed training work belongs to this area because better distributed learning tools help developers deploy their machine learning algorithms quickly.

Popular approaches (todo, summary)

Paper
  1. A System for Massively Parallel Hyperparameter Tuning. In MLSys'20.: they proposed a new hyperparameter optimization algorithm named ASHA to solve large-scale hyperparameter optimization problems in distributed settings; a simplified successive-halving sketch follows this list.
  2. PLink: Discovering and Exploiting Locality for Accelerated Distributed Training on the public Cloud. In MLSys'20.: they introduced a new optimized communication library called PLink to speed up distributed training in the public cloud.
  3. BPPSA: Scaling Back-propagation by Parallel Scan Algorithm. In MLSys'20.: they reformulated the commonly used back-propagation (BP) algorithm as a scan operation to overcome BP's limitations in parallel computing environments.
  4. MNN: A Universal and Efficient Inference Engine. In MLSys'20.: they proposed Mobile Neural Network (MNN), a universal and efficient inference engine tailored to mobile applications.
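
To give a feel for the successive-halving idea that ASHA (item 1) builds on, here is a simplified synchronous sketch; the names, budgets, and the fake objective are made up, and the real ASHA promotes configurations asynchronously so workers never wait for a full rung to finish.

```python
# Simplified synchronous successive halving (illustrative; ASHA itself is asynchronous).
# Start many random configurations on a small budget, keep the best 1/eta fraction
# at each rung, and give the survivors eta times more budget.
import random

def successive_halving(evaluate, sample_config, n_configs=27, min_budget=1, eta=3, rungs=3):
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    for _ in range(rungs):
        # evaluate(config, budget) -> validation loss (lower is better).
        scored = [(evaluate(c, budget), c) for c in configs]
        scored.sort(key=lambda s: s[0])
        configs = [c for _, c in scored[: max(1, len(scored) // eta)]]
        budget *= eta
    return configs[0]

# Hypothetical usage: tune a single "learning rate" against a fake objective.
def sample_config():
    return {"lr": 10 ** random.uniform(-4, -1)}

def evaluate(config, budget):
    # Fake loss: closer to lr = 1e-2 is better; more budget means less noise.
    return abs(config["lr"] - 1e-2) + 0.01 * random.gauss(0, 1.0 / budget)

print(successive_halving(evaluate, sample_config))
```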

Useful external Resources

Books for Deep Learning (a popular learning approach in machine learning)

  1. Dive into Deep Learning
  2. Deep Learning
  3. 智能计算系统 (AI Computing Systems)
  4. Tutorial on hardware accelerators for deep neural networks (the Energy-Efficient Multimedia Systems group at MIT)

Course

  1. (UW)CSE 599W: Systems for ML: Low-level optimization in Deep Learning frameworks.
  2. (UCB)AI-Sys: Machine Learning Systems: a general course for AI systems.
  3. (UMich)EECS 598: Systems for AI (W'20): a general course for AI systems.

Conference

  1. SysML: Systems and Machine Learning
  2. SOSP: ACM Symposium on Operating Systems Principles
  3. OSDI: USENIX Symposium on Operating Systems Design and Implementation

Tools

  1. TVM: End to End Deep Learning Compiler Stack
