
Detecting Synthetic Lyrics with Few-Shot Inference

Title: Detecting Synthetic Lyrics with Few-Shot Inference
Authors: Labrak, Yanis; Epure, Elena V.; Meseguer-Brocal, Gabriel
Contributors: Avignon Université (AU); Laboratoire Informatique d'Avignon (LIA); Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI; Zenidoc; Deezer Research
Source: https://hal.science/hal-04621180 ; 2024.
Publisher Information: CCSD
Publication Year: 2024
Collection: Université d'Avignon et des Pays de Vaucluse: HAL
Subject Terms: [INFO]Computer Science [cs]; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
Description: In recent years, generated content in music has gained significant popularity, with large language models being used effectively to produce human-like lyrics in various styles, themes, and linguistic structures. This technological advancement supports artists in their creative processes but also raises issues of authorship infringement, consumer satisfaction, and content spamming. To address these challenges, methods for detecting generated lyrics are necessary. However, existing work on machine-generated content detection, covering both methods and datasets, has not yet focused on this specific modality or on creative text in general. In response, we have curated the first dataset of high-quality synthetic lyrics and conducted a comprehensive quantitative evaluation of various few-shot content detection approaches, testing their generalization capabilities and complementing this with a human evaluation. Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods, which have been shown to be competitive in other domains at distinguishing human-written from machine-generated content. It also generalizes well to new artists and models, and effectively detects post-generation paraphrasing. This study emphasizes the need for further research on creative content detection, particularly in terms of generalization and scalability with larger song catalogs. All datasets, pre-processing scripts, and code are publicly available on GitHub and Hugging Face under the Apache 2.0 license.
Document Type: report
Language: English
Availability: https://hal.science/hal-04621180; https://hal.science/hal-04621180v1/document; https://hal.science/hal-04621180v1/file/_EMNLP_2024__Few_shot_AI_Generated_Lyrics_Detection-8.pdf
Rights: http://creativecommons.org/licenses/by-nc-nd/ ; info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.59B3A80E
Database: BASE