Hidden Speed in CUDA's Shared Memory
A practical guide to Quantization
How to quantize models without losing accuracy.
Down the CudaMemory lane
Data Transfers Between CPU and GPU
Quantization explained, like you are five.
Explaining the intuition behind quantization
TensorRT meets C++
TensorRT inference in C++