xlite-dev / Awesome-LLM-Inference Public

Pull requests: xlite-dev/Awesome-LLM-Inference

0 Open 146 Closed

Pull requests list

Update README.md - add KVTC

#156 by CStanKonrad was merged Nov 11, 2025

Add siiRL

#155 by liao1995 was merged Jul 29, 2025

add two papers

#154 by JoursBleu was merged Jul 10, 2025

Add SDMPrune paper

#153 by sccbhxc was merged Jun 30, 2025

Add Inference-Time Hyper-Scaling

#152 by CStanKonrad was merged Jun 16, 2025

Add STAND

#151 by woominsong was merged Jun 6, 2025

Add a new paper (GuidedQuant)

#150 by jusjinuk was merged Jun 6, 2025

Update new paper (KVzip)

#149 by Janghyun1230 was merged Jun 5, 2025

Add 4 papers

#148 by woominsong was merged Jun 4, 2025

🔥[SageAttention-3] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training

#147 by DefTruth was merged May 24, 2025

Flex Attention: a Programming Model for Generating Optimized Attention Kernels

#146 by DefTruth was merged May 12, 2025

Add The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

#145 by PiotrNawrot was merged May 5, 2025

🔥[BitNet v2] Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

#144 by DefTruth was merged May 5, 2025

🔥[Triton-distributed] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives

#142 by DefTruth was merged Apr 27, 2025

Update Multi-GPUs/Multi-Nodes Parallelism

#141 by DefTruth was merged Apr 26, 2025

🔥[MMInference] MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention

#140 by DefTruth was merged Apr 25, 2025

🔥[FSDP 1/2] PyTorch FSDP: Getting Started with Fully Sharded Data Parallel(FSDP)

#139 by DefTruth was merged Apr 25, 2025

🔥🔥[SGLang] Efficiently Programming Large Language Models using SGLang

#138 by DefTruth was merged Apr 18, 2025

Add SeerAttention and SlimAttention Paper

#135 by sunshinemyson was merged Apr 16, 2025

🔥[KV Cache Prefetch] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching

#133 by DefTruth was merged Apr 12, 2025

TRITONBENCH: Benchmarking Large Language Model Capabilities for Generating Triton Operator

#132 by DefTruth was merged Apr 6, 2025

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

#131 by DefTruth was merged Apr 6, 2025

Update Mooncake-v3 paper link

#130 by DefTruth was merged Mar 30, 2025

Update README.md

#129 by DefTruth was merged Mar 30, 2025

Add download_pdfs.py

#128 by DefTruth was merged Mar 30, 2025
