

Novel Edge AI with Power-Efficient Re-configurable LP-MAC Processing Elements
Deep learning has become increasingly important in fields such as robotics, image processing, and speech recognition. However, the high computational requirements of deep learning models make them challenging to deploy on edge and embedded devices with constrained power and area budgets. This paper proposes LP-MAC (Low-Power Multiply-Accumulate), a novel low-power technique for implementing deep learning models on edge devices. LP-MAC operates on fixed-point data and exploits reuse of the input vector across MAC operations. It introduces a new hardware design for the MAC unit that relies solely on adders, shifters, and multiplexers instead of power-hungry multipliers, reducing power consumption by over 30% compared to a conventional MAC. LP-MAC also provides efficient dynamic precision control of MAC operations and low latency, making it suitable for real-time processing. These features make LP-MAC an attractive choice for implementing deep learning models on edge devices with tight power budgets, particularly Convolutional Neural Networks (CNNs), fully connected networks, and Transformer attention networks. The paper presents hardware implementations of such networks, along with a methodology for upgrading any existing network to use LP-MAC in place of a conventional multiplier-based MAC. Additionally, an AHB-slave co-processor built around an LP-MAC array has been designed for use in embedded devices.
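As a rough illustration of the multiplierless idea only (not the paper's actual RTL), the C sketch below uses hypothetical names such as lp_mac_dot and precision_bits. Each product is formed from the shared input value using only shifts, adds, and a conditional select driven by the weight bits, and the number of shift-add steps is bounded by a precision parameter to loosely model the dynamic precision control mentioned in the abstract.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of a multiplierless fixed-point MAC.
 * Each weight is applied to its input bit by bit, so the product is built
 * from shifts, adds, and a conditional select (multiplexer) only.
 * precision_bits bounds the number of shift-add steps; weight magnitudes
 * are assumed to fit within that many bits. */
int32_t lp_mac_dot(const int16_t *x, const int16_t *w, int len, int precision_bits)
{
    int32_t acc = 0;
    for (int i = 0; i < len; i++) {
        /* Separate signs and magnitudes so the bit loop stays unsigned. */
        int neg = (x[i] < 0) ^ (w[i] < 0);
        uint32_t xm = (uint32_t)(x[i] < 0 ? -x[i] : x[i]);
        uint32_t wm = (uint32_t)(w[i] < 0 ? -w[i] : w[i]);

        uint32_t prod = 0;
        for (int b = 0; b < precision_bits; b++) {
            /* Multiplexer: add the shifted input only when the weight bit is set. */
            if ((wm >> b) & 1u)
                prod += xm << b;            /* shifter + adder */
        }
        acc += neg ? -(int32_t)prod : (int32_t)prod;  /* accumulate */
    }
    return acc;
}

int main(void)
{
    int16_t x[4] = {3, -2, 5, 7};
    int16_t w[4] = {6, 4, -1, 2};
    /* Weight magnitudes fit in 4 bits, so a 4-bit pass needs fewer
     * shift-add steps yet matches the full-precision result (19). */
    printf("16-bit: %d\n", lp_mac_dot(x, w, 4, 16));
    printf(" 4-bit: %d\n", lp_mac_dot(x, w, 4, 4));
    return 0;
}
```

In hardware, each per-bit conditional add maps to a multiplexer feeding an adder, which is the class of structure the abstract describes as replacing a full multiplier.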