# dflash

Here are 9 public repositories matching this topic...

vLLM patcher for Qwen3.6 on consumer NVIDIA — Qwen3.6-35B-A3B-FP8 (192 tok/s, +68% over stock) + Qwen3.6-27B-int4-AutoRound + 256K context. 126 patches: TurboQuant k8v4 KV, MTP/DFlash spec-decode, FULL cudagraph, hybrid GDN streaming, structured boot summary, one-command installer, 1958 tests. v7.72.2.

  • Updated May 5, 2026
  • Python
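The "k8v4 KV" scheme above refers to quantizing the attention KV cache with 8-bit keys and 4-bit values. As a minimal sketch of the idea (not TurboQuant's actual algorithm — the row-wise symmetric scheme, function names, and shapes here are illustrative assumptions), each cache row is scaled to a signed integer grid and dequantized on read:

```python
import numpy as np

def quantize(x, bits):
    # Symmetric per-row quantization to signed integers of `bits` width.
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # guard all-zero rows
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Toy KV cache: 8-bit keys, 4-bit values ("k8v4").
rng = np.random.default_rng(0)
k = rng.standard_normal((4, 64)).astype(np.float32)
v = rng.standard_normal((4, 64)).astype(np.float32)

qk, sk = quantize(k, bits=8)
qv, sv = quantize(v, bits=4)

k_err = np.abs(dequantize(qk, sk) - k).max()
v_err = np.abs(dequantize(qv, sv) - v).max()
print(k_err < v_err)   # 4-bit values lose more precision than 8-bit keys
```

Spending more bits on keys than values is one common trade-off, since key error perturbs the attention scores directly; real implementations pack the 4-bit values two-per-byte rather than storing them in `int8` as done here for clarity.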
ChaosEngineAI

Local AI workstation — discover, run, chat, benchmark, and generate images from open-weight models. DFlash/DDTree speculative decoding, five cache compression strategies (RotorQuant, TriAttention, TurboQuant, ChaosEngine), MLX + llama.cpp + vLLM backends.

  • Updated May 5, 2026
  • Python
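Both repos lean on speculative decoding (DFlash, DDTree): a cheap draft model proposes several tokens, and the large target model verifies them in a single pass, keeping the longest agreeing prefix plus one correction token. A greedy toy sketch of that loop (the function names and toy "models" below are illustrative assumptions, not either project's API):

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round of greedy speculative decoding: draft proposes k tokens,
    target verifies and keeps the longest agreeing prefix + one token."""
    # Draft phase: autoregressively propose k tokens with the cheap model.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # Verify phase: the target scores all k positions (in one batched pass
    # in practice; sequentially here for clarity) and accepts tokens until
    # the first disagreement.
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # The target always contributes one token: a correction (or bonus) token.
    accepted.append(target_next(ctx))
    return accepted

# Toy models: the target emits last token + 1 (mod 10); the draft agrees
# except it is wrong whenever the context length is a multiple of 3.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) % 3 else (ctx[-1] + 2) % 10

print(speculative_step(draft, target, prefix=[0], k=4))  # → [1, 2, 3]
```

The win comes from the verify phase: one target forward pass can confirm several draft tokens, so throughput scales with the draft's acceptance rate while the output stays identical to greedy decoding with the target alone.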
