
Title: SPARSE AUTOENCODERS MAKE AUDIO FOUNDATION MODELS MORE EXPLAINABLE
Authors: Mariotte, Théo; Lebourdais, Martin; Almudévar, Antonio; Tahon, Marie; Ortega, Alfonso; Dugué, Nicolas
Contributors: Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM); Equipe Language and Speech Technology (LST), Le Mans Université (UM); ViVoLab, Aragon Institute for Engineering Research (I3A), Universidad de Zaragoza = University of Zaragoza = Université de Saragosse. This work was funded by PULSAR regional grant 182822 and was performed using HPC resources from GENCI–IDRIS (Grant 2025-AD011016588).
Source: International Conference on Acoustics, Speech, and Signal Processing, May 2026, Barcelona, Spain; https://hal.science/hal-05520654
Publisher Information: CCSD
Publication Year: 2026
Collection: Le Mans Université: Archives Ouvertes (HAL)
Subject Terms: audio classification; explainable AI; pretrained models; Sparse Autoencoders; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing
Subject Geographic: Spain
Time: May 2026
Description: Audio pretrained models are widely employed to solve various tasks in speech processing, sound event detection, and music information retrieval. However, the representations learned by these models are unclear, and their analysis is mainly restricted to linear probing of the hidden representations. In this work, we explore the use of Sparse Autoencoders (SAEs) to analyze the hidden representations of pretrained models, focusing on a case study in singing technique classification. We first demonstrate that SAEs retain both information about the original representations and class labels, enabling their internal structure to provide insights into self-supervised learning systems. Furthermore, we show that SAEs enhance the disentanglement of vocal attributes, establishing them as an effective tool for identifying the underlying factors encoded in the representations.
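The description refers to Sparse Autoencoders trained on a pretrained model's hidden representations. As a rough illustration of that general technique (not the paper's actual architecture or hyperparameters, which are not given in this record), the following is a minimal NumPy sketch: an overcomplete ReLU autoencoder trained with an L1 sparsity penalty on stand-in activation vectors. All dimensions, learning rates, and the synthetic data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for hidden activations of a pretrained audio model
# (dimensions are illustrative, not from the paper).
d_model, d_latent, n = 16, 64, 256
X = rng.normal(size=(n, d_model))

# Overcomplete SAE parameters: encode to d_latent > d_model units.
W_enc = rng.normal(scale=0.1, size=(d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(scale=0.1, size=(d_latent, d_model))
b_dec = np.zeros(d_model)

lr, l1 = 0.01, 1e-3  # step size and L1 sparsity weight

def forward(X):
    z = np.maximum(X @ W_enc + b_enc, 0.0)  # ReLU -> nonnegative sparse codes
    X_hat = z @ W_dec + b_dec               # linear reconstruction
    return z, X_hat

# Plain gradient descent on 0.5*mean||X_hat - X||^2 + l1*mean|z|.
for _ in range(500):
    z, X_hat = forward(X)
    err = X_hat - X
    gW_dec = z.T @ err / n
    gb_dec = err.mean(axis=0)
    dz = (err @ W_dec.T + l1 * np.sign(z)) * (z > 0)  # backprop through ReLU
    gW_enc = X.T @ dz / n
    gb_enc = dz.mean(axis=0)
    W_dec -= lr * gW_dec; b_dec -= lr * gb_dec
    W_enc -= lr * gW_enc; b_enc -= lr * gb_enc

z, X_hat = forward(X)
mse = float(((X_hat - X) ** 2).mean())  # reconstruction quality
sparsity = float((z == 0).mean())       # fraction of inactive latent units
```

After training, each latent unit can be inspected (e.g. which inputs activate it) as a candidate interpretable factor; the paper's analysis of class labels and vocal attributes builds on this kind of sparse code.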
Document Type: conference object
Language: English
Availability: https://hal.science/hal-05520654; https://hal.science/hal-05520654v1/document; https://hal.science/hal-05520654v1/file/explainability_for_audio_ICASSP26-11.pdf
Rights: https://creativecommons.org/licenses/by-nc-nd/4.0/ ; info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.F62C6CC0
Database: BASE