How Quantization Benchmark Works
Comparing float32, float16, and int8 in the browser
Quantization Benchmark runs one TensorFlow.js model at different numeric precisions. It measures average inference time, estimated weight memory footprint, and how much output confidence shifts relative to a float32 baseline.
1) One model, three precision modes
The experiment initializes a single convolutional model architecture and reuses it for each benchmark pass. For float32, weights are used as-is. For float16 and int8, each weight tensor is quantized and then loaded into an identical model clone.
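Stripped of TensorFlow.js specifics, the harness shape can be sketched as below. The `ModeResult` type and the stubbed `runModeBenchmark` body are illustrative placeholders; the real function takes the base model and input tensor and times actual inference.

```typescript
type PrecisionMode = "float32" | "float16" | "int8";

interface ModeResult {
  mode: PrecisionMode;
  avgMs: number;
}

// Stand-in for the real pass: the actual benchmark clones the model,
// loads (possibly quantized) weights, and times inference.
async function runModeBenchmark(mode: PrecisionMode): Promise<ModeResult> {
  return { mode, avgMs: 0 };
}

async function runAllModes(): Promise<ModeResult[]> {
  const orderedResults: ModeResult[] = [];
  // float32 runs first so later modes can be compared against its output.
  for (const mode of ["float32", "float16", "int8"] as const) {
    orderedResults.push(await runModeBenchmark(mode));
  }
  return orderedResults;
}
```

The benchmark's own loop, shown next, follows the same pattern.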
for (const mode of ["float32", "float16", "int8"] as const) {
const result = await runModeBenchmark(baseModel, inputTensor, mode);
orderedResults.push(result);
}
2) Float16 simulation
Float16 keeps float32's sign-exponent-mantissa layout but with a 5-bit exponent and a 10-bit mantissa, so it covers a narrower range (up to about ±65504) at coarser precision. The benchmark rounds each weight value to half-precision granularity, then stores it back as float32 for browser execution.
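Filled out as a self-contained helper, the rounding step looks like this. The function name, the clamping, and the zero/non-finite handling are my additions; subnormals are ignored for brevity.

```typescript
// Round a float32/double value to the nearest float16-representable value,
// returned as a regular number. A sketch, not a bit-exact converter.
function roundToFloat16(value: number): number {
  if (value === 0 || !Number.isFinite(value)) return value;
  const sign = value < 0 ? -1 : 1;
  // Clamp to float16's largest finite value.
  const clamped = Math.min(Math.abs(value), 65504);
  const exponent = Math.floor(Math.log2(clamped));
  const normalized = clamped / 2 ** exponent; // in [1, 2)
  // Round the fractional part to 10 mantissa bits (steps of 1/1024).
  const mantissa = Math.round((normalized - 1) * 1024) / 1024;
  return sign * (1 + mantissa) * 2 ** exponent;
}
```

The excerpt below is the core of this normalize-round-rescale step.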
const exponent = Math.floor(Math.log2(clamped));
const normalized = clamped / 2 ** exponent;
const mantissa = Math.round((normalized - 1) * 1024) / 1024;
return sign * (1 + mantissa) * 2 ** exponent;
3) Int8 affine quantization
Int8 maps continuous weight values into 256 discrete buckets. The benchmark computes a single per-tensor scale from the maximum absolute value (a symmetric scheme, i.e. affine with a zero point of 0), quantizes into [-128, 127], then dequantizes back to float values before inference.
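Applied over a whole weight tensor, the round trip can be sketched as follows. The function name `quantizeInt8RoundTrip` is mine; the scale and clamping logic mirror the lines below.

```typescript
// Quantize a weight tensor to int8 and immediately dequantize it,
// returning the float values the browser model would actually run with.
function quantizeInt8RoundTrip(weights: Float32Array): Float32Array {
  let maxAbs = 0;
  for (const w of weights) maxAbs = Math.max(maxAbs, Math.abs(w));
  // One scale per tensor, derived from the largest magnitude.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const out = new Float32Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const q = Math.max(-128, Math.min(127, Math.round(weights[i] / scale)));
    out[i] = q * scale; // dequantize back to float for inference
  }
  return out;
}
```

Note that the round-trip error per weight is bounded by about half a quantization step (scale / 2).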
const scale = maxAbs === 0 ? 1 : maxAbs / 127;
const q = Math.max(-128, Math.min(127, Math.round(value / scale)));
const dequantized = q * scale;
4) Measuring speed, memory, and quality
Speed is measured as average milliseconds across repeated inference passes. Memory is estimated from parameter count and bytes per precision (4, 2, or 1). Quality is reported as top-1 agreement and confidence drift versus float32.
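The memory and quality calculations are small enough to sketch in full. The helper names here are illustrative; `bytesForMode` encodes the 4/2/1 byte costs described above.

```typescript
type PrecisionMode = "float32" | "float16" | "int8";

// Bytes per stored weight for each precision mode.
function bytesForMode(mode: PrecisionMode): number {
  return mode === "float32" ? 4 : mode === "float16" ? 2 : 1;
}

function estimateMemoryBytes(parameterCount: number, mode: PrecisionMode): number {
  return parameterCount * bytesForMode(mode);
}

// Top-1 agreement is all-or-nothing per run; confidence drift is the
// absolute probability gap against the float32 baseline.
function qualityMetrics(
  topIdx: number,
  baselineIdx: number,
  topProb: number,
  baselineProb: number
) {
  return {
    top1Agreement: topIdx === baselineIdx ? 100 : 0,
    confidenceDrift: Math.abs(topProb - baselineProb),
  };
}
```

The benchmark's corresponding lines follow.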
const memoryBytes = parameterCount * bytesForMode(mode);
const top1Agreement = sameTop1 ? 100 : 0;
const confidenceDrift = Math.abs(topPredictionProb - baselineProb);
Mission Debrief
Lower precision reduces model memory requirements.
Speed gains depend on backend and hardware support.
Quantization introduces rounding error that may shift confidence or top labels.