Abstract
This paper presents AdaLite, a knowledge distillation framework for monocular depth estimation designed for efficient deployment on resource-limited devices without relying on quantization or pruning. While large-scale depth estimation networks achieve high accuracy, their computational and memory demands hinder real-time use. To address this, a large model is adopted as the teacher, and a compact encoder–decoder student with few trainable parameters is trained under a dual-supervision scheme that aligns its predictions with both the teacher's feature maps and the ground-truth depths. AdaLite is evaluated on the NYUv2, SUN-RGBD, and KITTI benchmarks using standard depth metrics and deployment-oriented measures, including inference latency. The distilled model is 94% smaller than the teacher, preserves 96.8% of its accuracy, and delivers more than 11× faster inference, reaching 1.02 FPS on a Raspberry Pi 2 (CPU-only, 2 GB). These results demonstrate the effectiveness of distillation-driven compression for real-time depth estimation in resource-limited environments. The code is publicly available.
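A minimal sketch of the dual-supervision objective described above, assuming a simple weighted combination of a feature-alignment term and a ground-truth depth term (the weight \(\lambda\), the symbols, and the choice of per-term losses are illustrative, not the paper's exact formulation):

\[
\mathcal{L}_{\text{student}} \;=\; \lambda \,\mathcal{L}_{\text{feat}}\big(F_S(x),\, F_T(x)\big) \;+\; (1-\lambda)\,\mathcal{L}_{\text{depth}}\big(\hat{d}_S(x),\, d^{*}\big),
\]

where \(F_S\) and \(F_T\) denote student and teacher feature maps for input image \(x\), \(\hat{d}_S(x)\) is the student's predicted depth, and \(d^{*}\) is the ground-truth depth.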