# dflash

Here are 9 public repositories matching this topic...

vLLM patcher for Qwen3.6 on consumer NVIDIA — Qwen3.6-35B-A3B-FP8 (192 tok/s, +68% over stock) + Qwen3.6-27B-int4-AutoRound + 256K context. 126 patches: TurboQuant k8v4 KV, MTP/DFlash spec-decode, FULL cudagraph, hybrid GDN streaming, structured boot summary, one-command installer, 1958 tests. v7.72.2.

  • Updated May 5, 2026
  • Python
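The "k8v4 KV" scheme above refers to quantizing the attention KV cache with 8-bit keys and 4-bit values. As a minimal sketch of the idea (not TurboQuant's actual algorithm — the row-wise symmetric scheme, function names, and shapes here are illustrative assumptions), each cache row is scaled to a signed integer grid and dequantized on read:

```python
import numpy as np

def quantize(x, bits):
    # Symmetric per-row quantization to signed integers of `bits` width.
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # guard all-zero rows
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Toy KV cache: 8-bit keys, 4-bit values ("k8v4").
rng = np.random.default_rng(0)
k = rng.standard_normal((4, 64)).astype(np.float32)
v = rng.standard_normal((4, 64)).astype(np.float32)

qk, sk = quantize(k, bits=8)
qv, sv = quantize(v, bits=4)

k_err = np.abs(dequantize(qk, sk) - k).max()
v_err = np.abs(dequantize(qv, sv) - v).max()
print(k_err < v_err)   # 4-bit values lose more precision than 8-bit keys
```

Spending more bits on keys than values is one common trade-off, since key error perturbs the attention scores directly; real implementations pack the 4-bit values two-per-byte rather than storing them in `int8` as done here for clarity.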
ChaosEngineAI

Local AI workstation — discover, run, chat, benchmark, and generate images from open-weight models. DFlash/DDTree speculative decoding, five cache compression strategies (RotorQuant, TriAttention, TurboQuant, ChaosEngine), MLX + llama.cpp + vLLM backends.

  • Updated May 5, 2026
  • Python
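Both repos lean on speculative decoding (DFlash, DDTree): a cheap draft model proposes several tokens, and the large target model verifies them in a single pass, keeping the longest agreeing prefix plus one correction token. A greedy toy sketch of that loop (the function names and toy "models" below are illustrative assumptions, not either project's API):

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round of greedy speculative decoding: draft proposes k tokens,
    target verifies and keeps the longest agreeing prefix + one token."""
    # Draft phase: autoregressively propose k tokens with the cheap model.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # Verify phase: the target scores all k positions (in one batched pass
    # in practice; sequentially here for clarity) and accepts tokens until
    # the first disagreement.
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # The target always contributes one token: a correction (or bonus) token.
    accepted.append(target_next(ctx))
    return accepted

# Toy models: the target emits last token + 1 (mod 10); the draft agrees
# except it is wrong whenever the context length is a multiple of 3.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) % 3 else (ctx[-1] + 2) % 10

print(speculative_step(draft, target, prefix=[0], k=4))  # → [1, 2, 3]
```

The win comes from the verify phase: one target forward pass can confirm several draft tokens, so throughput scales with the draft's acceptance rate while the output stays identical to greedy decoding with the target alone.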
