A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Updated Apr 16, 2026
Distributed Machine Learning Patterns from Manning Publications by Yuan Tang https://bit.ly/2RKv8Zo
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Keras implementation of the renowned publication "DeepFace: Closing the Gap to Human-Level Performance in Face Verification" by Taigman et al. Pre-trained weights on the VGGFace2 dataset.
Carefully curated list of awesome data science resources.
SOTA Google's Perceiver-AR Music Transformer Implementation and Model
A comprehensive guide designed to empower readers with advanced strategies and practical insights for developing, optimizing, and deploying scalable AI models in real-world applications.
This is the official codebase for the KDD 2021 paper Generalized Zero-Shot Extreme Multi-Label Learning
Materials for "Machine Learning on Big Data" course
Execution framework for multi-task model parallelism. Enables the training of arbitrarily large models with a single GPU, with linear speedups for multi-GPU multi-task execution.
A fully adaptive, zero-tuning parameter manager that enables efficient distributed machine learning training
[DEPRECATED] Multi-Instrumental Music Transformer trained on 12GB/400k MIDIs
[DEPRECATED] Piano Transformer model trained on 2.6GB of MIDI piano music
Keras implementation of the renowned publication "FaceNet: A Unified Embedding for Face Recognition and Clustering" by Schroff et al.
Crack SWE (ML) / DS MAANG Interviews
C++ Port of Temporal Graph Networks for Deep Learning on Dynamic Graphs
Official codebase for the ICLR 2024 paper Enhancing Tail Performance in Extreme Classifiers by Label Variance Reduction
This project develops deep neural networks and their variants from scratch. No external libraries are used except for GPU operations.
Our full-length research has finally been acknowledged through a double-blind review procedure.
Robust distributed checkpointing and job management system for multi-GPU SLURM workloads