Authors: Rajat Takkar, Disha Sharma, Hridyesh Sharma
Abstract: The rapid growth of edge computing has changed how artificial intelligence is deployed on resource-constrained devices such as smartphones, embedded systems, and IoT hardware. In such environments, tight memory, power, and storage budgets make it difficult to use traditional deep learning models directly. Although modern neural networks perform well on tasks such as computer vision, they typically demand substantial computational resources, which limits their practical use on edge devices. In this work, we focus on lightweight deep learning architectures designed to operate efficiently under these constraints. Specifically, we evaluate three widely used models, MobileNetV2, SqueezeNet, and EfficientNet-B0, for real-time inference on edge devices, using the CIFAR-10 dataset as a benchmark. To improve training efficiency, we apply transfer learning by reusing features from pre-trained models. In addition, optimization techniques such as structured pruning and dynamic quantization remove redundant parameters and improve computational efficiency without significantly degrading performance; together they lower model size and speed up inference, making deployment feasible in resource-limited environments. The experimental results show clear differences across the selected models. EfficientNet-B0 achieves the highest classification accuracy at 92.06%, while SqueezeNet provides the fastest inference owing to its compact architecture and small parameter count. MobileNetV2 offers a balanced trade-off between accuracy and latency, making it well suited to practical applications. Overall, the findings underscore the importance of pairing an appropriate lightweight architecture with effective optimization strategies when deploying deep learning models on edge devices.
This work provides useful insights into balancing accuracy, model size, and inference speed, which are key factors in real-world edge computing scenarios.
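To make the pruning idea in the abstract concrete, the sketch below illustrates magnitude-based structured pruning, i.e. ranking whole filters by their L1 norm and discarding the weakest ones. It is a framework-agnostic toy in plain Python, not the paper's actual pipeline; in practice a utility such as `torch.nn.utils.prune` would be applied to the real model, and the layer shapes and pruning ratio here are illustrative assumptions.

```python
# Toy illustration of structured (filter-level) pruning by L1-norm ranking.
# All shapes and the 50% pruning ratio are illustrative assumptions,
# not values reported in the paper.

def l1_norm(filt):
    """Sum of absolute weights in one filter (its L1 norm)."""
    return sum(abs(w) for w in filt)

def prune_filters(filters, ratio):
    """Keep the (1 - ratio) fraction of filters with the largest L1 norm."""
    n_keep = max(1, int(len(filters) * (1 - ratio)))
    ranked = sorted(filters, key=l1_norm, reverse=True)
    return ranked[:n_keep]

# A toy "layer" of four filters, each flattened into a weight list.
layer = [
    [0.9, -0.8, 0.7],     # strong filter, kept
    [0.01, 0.02, -0.01],  # near-zero filter, pruned first
    [0.5, 0.4, -0.6],     # strong filter, kept
    [0.05, -0.03, 0.02],  # near-zero filter, pruned
]

pruned = prune_filters(layer, ratio=0.5)
print(len(pruned))  # 2 filters survive 50% pruning
```

Because entire filters are removed rather than individual weights, the resulting layer is genuinely smaller and faster on commodity hardware, which is why structured pruning (unlike unstructured sparsity) translates directly into the lower latency discussed above.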