InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.
👀 Apply YOLOv8, exported with ONNX or TensorRT (FP16, INT8), to a real-time camera feed
PyTorch implementation of DreamerV2: Mastering Atari with Discrete World Models, based on the original implementation
Let's train CIFAR-10 in PyTorch with half precision!
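Half-precision training like the CIFAR-10 project above usually needs loss scaling, because small fp32 gradients underflow to zero when cast to fp16. A minimal NumPy sketch of that idea (illustrative only; real PyTorch training would use `torch.cuda.amp` rather than raw casts):

```python
import numpy as np

# A tiny fp32 gradient underflows to zero when cast to fp16:
# fp16's smallest subnormal is about 6e-8, so 1e-8 rounds to 0.
grad_fp32 = np.float32(1e-8)
assert np.float16(grad_fp32) == 0.0

# Loss scaling: multiply before the cast, divide after, in fp32.
scale = np.float32(1024.0)
scaled_fp16 = np.float16(grad_fp32 * scale)   # 1.024e-5 is representable in fp16
recovered = np.float32(scaled_fp16) / scale   # unscale back in fp32
assert recovered != 0.0
```

In PyTorch the same bookkeeping is handled automatically by `torch.cuda.amp.GradScaler`.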
A flexible utility for converting tensor precision in PyTorch models and safetensors files, enabling efficient deployment across various platforms.
Export a PyTorch model to ONNX and convert the ONNX weights from float32 to float16
apextrainer is an open-source FP16 training toolbox based on Detectron2 and Apex
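The core of such a float32-to-float16 conversion can be sketched with NumPy. This is a simplified illustration of the cast itself; real converters operate on ONNX graph initializers (e.g. via tooling such as onnxconverter-common) and guard values near the fp16 range rather than casting raw arrays:

```python
import numpy as np

# Pretend these are model weights stored as float32.
weights_fp32 = np.array([1.0, 3.14159, 65504.0, 70000.0], dtype=np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

assert weights_fp16.dtype == np.float16
assert weights_fp16.nbytes == weights_fp32.nbytes // 2  # half the storage
assert weights_fp16[2] == 65504.0   # fp16's largest finite value is exactly representable
assert np.isinf(weights_fp16[3])    # values beyond that range overflow to inf
```

The overflow case is why production converters typically clamp or rescale out-of-range values instead of casting blindly.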
ONNX-to-TensorRT optimization with FP16/INT8 quantization, multi-GPU support, and deployment tooling (claims up to 40x faster AI inference)
Transformer implementation in PyTorch, trained on an NVIDIA A100 in FP16
Comprehensive performance analysis of DeepSeek V3 quantization levels (FP16, Q8_0, Q4_0) on 16GB GPU environments.
A reproducible GPU benchmarking lab that compares FP16 vs FP32 training on MNIST using PyTorch, CuPy, and Nsight profiling tools, featuring NVTX-tagged training loops, fused CuPy kernels, and a profiler-driven README that narrates the GPU's inner workings step by step.
🎬 Explore GPU training efficiency with FP32 vs FP16 in this modular lab, utilizing Tensor Core acceleration for deep learning insights.