LLMPostTraining is a framework for post-training and fine-tuning Large Language Models (LLMs): Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), distillation, quantization, and MoE/MLM training. It uses DeepSpeed ZeRO, multi-GPU parallelism, and shared config/training utilities.
- Python 3.12+
- CUDA-capable GPU (recommended)
- uv or pip for dependencies
- Clone the repository:
git clone <repository-url>
cd llm_training- Install uv (optional):
pip install uv- Install dependencies:
uv pip install -r requirements.txt --no-build-isolation --index-strategy unsafe-best-matchor with pip:
pip install -r requirements.txt- Run from repo root with
PYTHONPATHset:
export PYTHONPATH=/path/to/llm_training # or . when already in repo rootGCP VM with local SSD (optional):
sudo lsblk -o NAME,SIZE,TYPE,MOUNTPOINT | grep nvme0n1
sudo mkfs.ext4 -F /dev/nvme0n1
sudo mkdir -p /mnt/disks/local-ssd
sudo mount /dev/nvme0n1 /mnt/disks/local-ssd
sudo chmod a+w /mnt/disks/local-ssd
UUID=$(sudo blkid -s UUID -o value /dev/nvme0n1)
echo "UUID=$UUID /mnt/disks/local-ssd ext4 discard,defaults,nofail 0 2" | sudo tee -a /etc/fstabpython main.pyFrom repo root:
PYTHONPATH=. bash training/seqorth_sft/run_seqorth.sh [config] [num_gpus]Configs live under config/seqorth/ (e.g. config/seqorth/seqorth_qwen_config.json).
cd training/sft
# Configs: config/sft/*.json
python custom_module_sft.py # or use run_g3moe_config.sh with config/sft/...cd training/lightning_trainer
export CUDA_VISIBLE_DEVICES=0
export CUDA_LAUNCH_BLOCKING=1
export WANDB_API_KEY=<your_wandb_api_key>
export HF_SECRET_KEY=<your_huggingface_token>
export HF_DATASETS_CACHE=<your_cache_directory>
huggingface-cli login --token $HF_SECRET_KEY
wandb login --relogin $WANDB_API_KEY
python trainer.py fit \
--trainer.fast_dev_run false \
--trainer.max_epochs 5 \
--model.learning_rate 3e-3 \
--data.train_batch_size 4 \
--data.eval_batch_size 4cd training/rlhf
export CUDA_VISIBLE_DEVICES="0,1"
export WANDB_API_KEY=<your_wandb_api_key>
export HF_SECRET_KEY=<your_huggingface_token>
export HF_DATASETS_CACHE=<your_cache_directory>
huggingface-cli login --token $HF_SECRET_KEY
wandb login --relogin $WANDB_API_KEY
accelerate launch --config_file "accelerate_config.yaml" train.pyInstall the MoRA (peft-mora) package from the repo, then run training:
pip install -e ./models/mora
cd training/mora
# Use train.py with your config (see training/mora/README.md)- Training methods: SFT (Seqorth, G3MoE), RLHF (GRPO, TTC, DPO/SimPO), MoRA, distillation, Lightning
- Models: Seqorth MoE, G3MoE, MoRA, Qwen3 MoE fused; shared code under
core/andmodels/ - Config: Single
config/tree for seqorth, sft, rlhf, eval - Distributed: DeepSpeed ZeRO-2/3, multi-GPU, NVMe offload
- Monitoring: W&B, eval callbacks, routing benchmarks
- ZeRO-2 + ZenFlow: Recommended for performance
- ZeRO-3: Max memory efficiency; parameter/gradient/optimizer partitioning, NVMe offload
- ZenFlow: Async gradient updates, selective updates, communication overlap (configurable in DeepSpeed config)
# Disable ZenFlow if RAM OOM
export DISABLE_ZENFLOW=1- Use ZeRO-2 + ZenFlow for best throughput; ZeRO-3 for largest models
- Reduce
per_device_train_batch_sizeor enablegradient_checkpointingif OOM - Configs under
config/seqorth/,config/sft/reference DeepSpeed JSONs in the same tree
- PyTorch 2.4+ (CUDA)
- Transformers (with trust_remote_code for Qwen/Seqorth)
- DeepSpeed, Accelerate
- Lightning AI (for
training/lightning_trainer) - Hugging Face (datasets, tokenizers, etc.)
MIT License.