Pull requests: xlite-dev/Awesome-LLM-Inference
🔥[SageAttention-3] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training
#147
by DefTruth
was merged May 24, 2025
Flex Attention: a Programming Model for Generating Optimized Attention Kernels
#146
by DefTruth
was merged May 12, 2025
Add The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
#145
by PiotrNawrot
was merged May 5, 2025
🔥[BitNet v2] Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
#144
by DefTruth
was merged May 5, 2025
🔥[Triton-distributed] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives
#142
by DefTruth
was merged Apr 27, 2025
🔥[MMInference] MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention
#140
by DefTruth
was merged Apr 25, 2025
🔥[FSDP 1/2] PyTorch FSDP: Getting Started with Fully Sharded Data Parallel(FSDP)
#139
by DefTruth
was merged Apr 25, 2025
🔥🔥[SGLang] Efficiently Programming Large Language Models using SGLang
#138
by DefTruth
was merged Apr 18, 2025
🔥[KV Cache Prefetch] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching
#133
by DefTruth
was merged Apr 12, 2025
TRITONBENCH: Benchmarking Large Language Model Capabilities for Generating Triton Operator
#132
by DefTruth
was merged Apr 6, 2025
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
#131
by DefTruth
was merged Apr 6, 2025