Mix-GEMM: An efficient HW-SW architecture for mixed-precision quantized deep neural networks inference on edge devices

Title:	Mix-GEMM: An efficient HW-SW architecture for mixed-precision quantized deep neural networks inference on edge devices
Authors:	Reggiani, Enrico; Pappalardo, Alessandro; Doblas Font, Max; Moretó Planas, Miquel; Olivieri, Mauro; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Barcelona Supercomputing Center
Contributors:	Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Publisher Information:	Institute of Electrical and Electronics Engineers (IEEE)
Publication Year:	2023
Collection:	Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Subject Terms:	Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors; Deep learning; Neural networks (Computer science); High performance computing -- Energy consumption; Performance evaluation; Training; Computer architecture; Energy efficiency; Computational efficiency; Aprenentatge profund; Xarxes neuronals (Informàtica); Càlcul intensiu (Informàtica) -- Consum d'energia
Description:	Deep Neural Network (DNN) inference based on quantized narrow-precision integer data represents a promising research direction toward efficient deep learning computations on edge and mobile devices. On the one hand, recent progress in Quantization-Aware Training (QAT) frameworks aimed at improving the accuracy of extremely quantized DNNs allows achieving results close to Floating-Point 32 (FP32), and provides high flexibility in the selection of data sizes. On the other hand, current Central Processing Unit (CPU) architectures and Instruction Set Architectures (ISAs) targeting resource-constrained devices are limited in the range of data sizes they support for computing DNN kernels. This paper presents Mix-GEMM, a hardware-software co-designed architecture capable of efficiently computing quantized DNN convolutional kernels based on byte and sub-byte data sizes. Mix-GEMM accelerates General Matrix Multiplication (GEMM), the core kernel of DNNs, supporting all data size combinations from 8- to 2-bit, including mixed-precision computations, and featuring performance that scales as the computational data sizes decrease. Our experimental evaluation, performed on representative quantized Convolutional Neural Networks (CNNs), shows that a RISC-V based edge System-on-Chip (SoC) integrating Mix-GEMM achieves up to 1.3 TOPS/W in energy efficiency, and up to 13.6 GOPS in throughput, gaining from 5.3× to 15.1× in performance over the OpenBLAS GEMM frameworks running on a commercial RISC-V based edge processor. By performing synthesis and Place and Route (PnR) of the enhanced SoC in Global Foundries 22nm FDX technology, we show that Mix-GEMM only accounts for 1% of the overall area consumption.
; This research was supported by the ERDF Operational Program of Catalonia 2014-2020, with a grant from the Spanish State Research Agency [PID2019-107255GB] and with DRAC project [001-P-001723], by the grant [PID2019-107255G-C21] funded by MCIN/AEI/10.13039/501100011033, by the Generalitat de Catalunya ...
Document Type:	conference object
File Description:	14 p.; application/pdf
Language:	English
Relation:	https://ieeexplore.ieee.org/document/10071076; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/; https://hdl.handle.net/2117/386754
DOI:	10.1109/HPCA56546.2023.10071076
Availability:	https://hdl.handle.net/2117/386754; https://doi.org/10.1109/HPCA56546.2023.10071076
Rights:	Open Access
Accession Number:	edsbas.BE98C38A
Database:	BASE
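
Illustrative note: the Description above refers to GEMM on sub-byte operands with mixed precision (for example, 4-bit activations multiplied by 2-bit weights). The following is a minimal software sketch of that idea only, not the Mix-GEMM hardware or its software library. The function names (pack, unpack, mixed_gemm), the signed two's-complement encoding, and the layout (one packed integer per row of A and per column of B) are all assumptions made for illustration; the record does not describe the paper's actual data layout or instructions.

```python
def pack(values, bits):
    """Pack signed integers of width `bits` into one Python int (hypothetical layout).

    Element i occupies bit positions [i*bits, (i+1)*bits), two's complement.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in {bits} signed bits")
        word |= (v & mask) << (i * bits)
    return word


def unpack(word, bits, count):
    """Inverse of pack: recover `count` signed integers of width `bits`."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    out = []
    for i in range(count):
        raw = (word >> (i * bits)) & mask
        out.append(raw - (1 << bits) if raw & sign else raw)
    return out


def mixed_gemm(a_rows, a_bits, b_cols, b_bits, k):
    """Compute C = A x B where A and B use different bit widths.

    a_rows: packed rows of A (one int per row, k elements each, a_bits wide)
    b_cols: packed columns of B (one int per column, k elements each, b_bits wide)
    Products are accumulated in full-width Python ints, standing in for the
    wide accumulator a real integer GEMM would use.
    """
    c = []
    for a_word in a_rows:
        a = unpack(a_word, a_bits, k)
        row = []
        for b_word in b_cols:
            b = unpack(b_word, b_bits, k)
            row.append(sum(x * y for x, y in zip(a, b)))
        c.append(row)
    return c


# 4-bit activations (range -8..7) times 2-bit weights (range -2..1).
A = [[1, -2, 3], [0, 1, -1]]        # 2 x 3
B_COLUMNS = [[1, -1, 1], [0, 1, -2]]  # columns of a 3 x 2 matrix
C = mixed_gemm([pack(r, 4) for r in A], 4,
               [pack(col, 2) for col in B_COLUMNS], 2, k=3)
print(C)  # [[6, -8], [-2, 3]]
```

In this sketch, narrower operands mean more elements fit in a fixed-width word, which is the intuition behind throughput improving as data sizes shrink; how Mix-GEMM realizes this in hardware is described in the paper itself, not in this record.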