Adaptive Numerical Precision Control For Energy-Minimal Inference On Constrained Edge Platforms
Keywords:
Mixed-Precision Quantization, Adaptive Precision, Edge Inference, Energy Minimization, Neural Architecture, Constrained Devices, Differentiable Quantization.
Abstract
Edge AI deployment requires neural network inference to run within tight power envelopes, often under 300 mW, whilst retaining sufficient task accuracy for the application. Uniform fixed-precision quantisation at INT8 or INT4 applies the same bit-width to every layer, which causes significant accuracy drops in the layers that need the most precision, such as the first and last convolutional blocks. In this paper, we propose AdaPrecNet, an adaptive numerical precision control framework that dynamically selects per-layer and per-activation bit-widths during inference based on input complexity signals and hardware power telemetry. AdaPrecNet uses a small precision controller (a 2-layer MLP consuming < 0.1% of compute) trained end-to-end with a differentiable quantisation surrogate and an explicit power consumption loss term. On the Raspberry Pi 4 and NVIDIA Jetson Nano, AdaPrecNet cuts peak power from 1200 mW (FP32) to 290 mW while retaining 93.7% of baseline task accuracy, compared with 480 mW (91.3%) for static INT8 and 510 mW (92.1%) for dynamic INT8.
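To make the training objective concrete, the following is a minimal sketch of how a differentiable bit-width choice and a power penalty could be combined. It is an illustration under stated assumptions, not the AdaPrecNet implementation: the candidate bit-widths, the softmax relaxation used as the surrogate, the MAC-weighted power proxy, the weight `lam`, and every function name are hypothetical. In particular, the proxy stands in for the hardware power telemetry mentioned above.

```python
import math

# Hypothetical search space; the candidate bit-widths are an assumption.
CANDIDATE_BITS = (4, 8, 16)


def fake_quantize(x, bits, x_max=1.0):
    """Symmetric uniform quantise-dequantise of one float to `bits` bits."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = x_max / levels
    clipped = max(-x_max, min(x_max, x))    # saturate to [-x_max, x_max]
    return round(clipped / scale) * scale


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_quantize(x, logits, x_max=1.0):
    """Assumed surrogate: probability-weighted mix of the quantised values.

    The mix is smooth in the controller logits, so a gradient-based
    optimiser could adjust the bit-width preference.
    """
    probs = softmax(logits)
    return sum(p * fake_quantize(x, b, x_max)
               for p, b in zip(probs, CANDIDATE_BITS))


def expected_bits(logits):
    """Expected bit-width under the controller's softmax distribution."""
    probs = softmax(logits)
    return sum(p * b for p, b in zip(probs, CANDIDATE_BITS))


def power_penalty(per_layer_logits, per_layer_macs):
    """Assumed power proxy: MAC-weighted mean expected bit-width over 32."""
    total_macs = sum(per_layer_macs)
    weighted = sum(expected_bits(lg) * m
                   for lg, m in zip(per_layer_logits, per_layer_macs))
    return weighted / (32.0 * total_macs)


def total_loss(task_loss, per_layer_logits, per_layer_macs, lam=0.1):
    """Task loss plus a weighted power term; `lam` is a hypothetical knob."""
    return task_loss + lam * power_penalty(per_layer_logits, per_layer_macs)
```

In this sketch, raising `lam` pushes the controller logits toward lower bit-widths, while the task loss pulls precision-sensitive layers back toward higher ones. A real system would replace the proxy with measured or modelled power and compute gradients with an autodiff framework.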




