InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.
👀 Apply YOLOv8, exported with ONNX or TensorRT (FP16, INT8), to a real-time camera feed
PyTorch implementation of DreamerV2: Mastering Atari with Discrete World Models, based on the original implementation
Let's train CIFAR-10 in PyTorch with half precision!
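Half-precision training like the CIFAR-10 project above usually needs loss scaling, because small fp32 gradients underflow to zero when cast to fp16. A minimal NumPy sketch of that idea (illustrative only; real PyTorch training would use `torch.cuda.amp` rather than raw casts):

```python
import numpy as np

# A tiny fp32 gradient underflows to zero when cast to fp16:
# fp16's smallest subnormal is about 6e-8, so 1e-8 rounds to 0.
grad_fp32 = np.float32(1e-8)
assert np.float16(grad_fp32) == 0.0

# Loss scaling: multiply before the cast, divide after, in fp32.
scale = np.float32(1024.0)
scaled_fp16 = np.float16(grad_fp32 * scale)   # 1.024e-5 is representable in fp16
recovered = np.float32(scaled_fp16) / scale   # unscale back in fp32
assert recovered != 0.0
```

In PyTorch the same bookkeeping is handled automatically by `torch.cuda.amp.GradScaler`.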
A flexible utility for converting tensor precision in PyTorch models and safetensors files, enabling efficient deployment across various platforms.
Export a PyTorch model to ONNX and convert the ONNX weights from float32 to float16
apextrainer is an open-source FP16 training toolbox based on Detectron2 and Apex
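The core of such a float32-to-float16 conversion can be sketched with NumPy. This is a simplified illustration of the cast itself; real converters operate on ONNX graph initializers (e.g. via tooling such as onnxconverter-common) and guard values near the fp16 range rather than casting raw arrays:

```python
import numpy as np

# Pretend these are model weights stored as float32.
weights_fp32 = np.array([1.0, 3.14159, 65504.0, 70000.0], dtype=np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

assert weights_fp16.dtype == np.float16
assert weights_fp16.nbytes == weights_fp32.nbytes // 2  # half the storage
assert weights_fp16[2] == 65504.0   # fp16's largest finite value is exactly representable
assert np.isinf(weights_fp16[3])    # values beyond that range overflow to inf
```

The overflow case is why production converters typically clamp or rescale out-of-range values instead of casting blindly.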
ONNX-to-TensorRT optimization with FP16/INT8 quantization, multi-GPU support, and deployment tooling (claims up to 40x faster AI inference)
Transformer implementation in PyTorch, trained on an NVIDIA A100 in FP16
Comprehensive performance analysis of DeepSeek V3 quantization levels (FP16, Q8_0, Q4_0) on 16GB GPU environments.
A reproducible GPU benchmarking lab that compares FP16 vs FP32 training on MNIST using PyTorch, CuPy, and Nsight profiling tools, featuring NVTX-tagged training loops, fused CuPy kernels, and a profiler-driven README that narrates the GPU's inner workings step by step.
🎬 Explore GPU training efficiency with FP32 vs FP16 in this modular lab, utilizing Tensor Core acceleration for deep learning insights.