    Repositories list

    • konduktor

      Public
      cluster/scheduler health monitoring for GPU jobs on k8s
      Python
      Other
      14372 Updated Nov 5, 2024
    • examples

      Public
      example repos for training models on Trainy cloud
      0000 Updated Sep 6, 2024
    • torchtune

      Public
      A Native-PyTorch Library for LLM Fine-tuning
      Python
      BSD 3-Clause "New" or "Revised" License
      419000 Updated Aug 24, 2024
    • Prometheus community Helm charts
      Mustache
      Apache License 2.0
      5k000 Updated Jun 19, 2024
    • unsloth

      Public
      Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
      Python
      Apache License 2.0
      1.2k000 Updated Jun 18, 2024
    • llm-atc

      Public archive
      Fine-tuning and serving LLMs on any cloud
      Python
      Apache License 2.0
      28610 Updated Dec 2, 2023
    • training

      Public
      Reference implementations of MLPerf™ training benchmarks
      Python
      Apache License 2.0
      556001 Updated Nov 21, 2023
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.5k001 Updated Nov 15, 2023
    • airoboros

      Public
      Customizable implementation of the self-instruct paper.
      Python
      Apache License 2.0
      71001 Updated Nov 15, 2023
    • FastChat

      Public
      An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
      Python
      Apache License 2.0
      4.5k001 Updated Nov 14, 2023
    • RWKV-LM

      Public
      RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
      Python
      Apache License 2.0
      859000 Updated Nov 2, 2023
    • nodify

      Public archive
      Profiling tools for distributed training
      HTML
      Other
      43710 Updated Oct 31, 2023
    • trainy

      Public
      A simple Pure Python/PyTorch performance daemon for training workloads
      Python
      21500 Updated Aug 2, 2023
    • dynolog

      Public
      Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also integrates with pytorch and can trigger traces for distributed training applications.
      C++
      MIT License
      40100 Updated Jun 29, 2023