Winner solution of mobile AI (CVPRW 2021).
FrostNet: Towards Quantization-Aware Network Architecture Search
Quantization Aware Training
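The core trick behind quantization-aware training is a "fake quantization" step: in the forward pass the weight is snapped to the INT8 grid, while the backward pass treats the rounding as identity (the straight-through estimator) so the underlying float weight keeps receiving gradients. A minimal pure-Python sketch of that step (illustrative only; the function names are hypothetical, and real frameworks implement this as a differentiable module such as `FakeQuantize` in `torch.ao.quantization`):

```python
def fake_quant(w, scale):
    """Quantize-dequantize: return w snapped to the INT8 grid."""
    q = max(-128, min(127, round(w / scale)))  # INT8 code in [-128, 127]
    return q * scale                           # dequantized value

def qat_step(w, grad, lr, scale):
    """One SGD step under fake quantization.

    The forward pass would use fake_quant(w); the gradient flows
    straight through to the underlying float "shadow" weight.
    """
    w_q = fake_quant(w, scale)   # value the quantized model actually uses
    return w - lr * grad, w_q    # update the float weight, not w_q

# Example: a weight of 0.103 with scale 0.01 snaps to 0.10 in the
# forward pass, while the float copy is updated normally.
w_next, w_q = qat_step(w=0.103, grad=0.5, lr=0.02, scale=0.01)
```

Because only the float shadow weight is updated, training can nudge it across quantization-grid boundaries even when individual gradient steps are smaller than the grid spacing.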
A record and summary of common problems encountered when deploying on-device models, together with their solutions, in the hope of helping others.
Translation API using Meta's NLLB-200 model with 200+ languages
Hardware-aware optimization of CNN inference using INT8 quantization in PyTorch. Includes benchmarking, profiling, and visualization of accuracy, latency, and model size for edge deployment.
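The arithmetic underlying INT8 post-training quantization is simple: pick a scale that maps the tensor's value range onto the signed 8-bit range [-128, 127], round each value to that grid, and multiply back by the scale to dequantize. A minimal pure-Python sketch of the symmetric per-tensor variant (function names are hypothetical; production toolkits such as `torch.ao.quantization` also support per-channel scales and zero-points):

```python
def int8_quantize(values):
    """Map floats to INT8 codes with a shared symmetric scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def int8_dequantize(codes, scale):
    """Recover approximate floats from INT8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.0, 1.27]
codes, scale = int8_quantize(weights)   # scale ~= 0.01
recovered = int8_dequantize(codes, scale)
```

The gap between `weights` and `recovered` is the quantization error that accuracy benchmarks for INT8 deployments measure, and why calibration (choosing a good `scale`) matters so much for edge inference.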
High-performance LLM inference platform built on vLLM continuous batching: 12.3K+ req/sec throughput at 42 ms P50 / 178 ms P99 latency, INT8/INT4 quantization (70% memory savings), tensor parallelism across 4 GPUs, and comprehensive monitoring, serving 1500+ concurrent users.