
Machines Against Rage: Generating High-quality Counterspeech with Language Models

Title: Machines Against Rage: Generating High-quality Counterspeech with Language Models
Authors: Bonaldi, Helena
Contributors: Bonaldi, Helena; Guerini, Marco
Publisher Information: TRENTO; Università degli studi di Trento
Publication Year: 2025
Collection: Università degli Studi di Trento: CINECA IRIS
Description: Online hate speech is typically tackled via blocking or deletion measures. However, these actions have limited effectiveness, and they often raise questions about the protection of users' freedom of speech. In this context, counterspeech has emerged as a promising alternative strategy as it fights online hate by providing positive and de-escalatory responses. The potential effectiveness of counterspeech has motivated an increasing interest in studying ways to partially automatise its production: the goal of this work is to investigate the extent to which Natural Language Generation can be employed to pursue this task. Specifically, we will focus on how counterspeech can be automatically produced by Language Models, which are currently the most powerful tool available for text generation. In particular, we first focus on how to effectively collect counterspeech data by combining human expertise and machine generation to obtain single and multi-turn counterspeech interactions. Secondly, we fine-tune various language models on the collected data and compare their performance in generating counterspeech using different decoding mechanisms. This allows us to identify one of the major weaknesses of language models in this task: the tendency to produce vague generations that can technically work with any input but lack specificity in their content. We address this problem in two ways. First, we intervene at training time and propose two attention-based regularisation techniques to prevent lexical overfitting. Then, we test whether there are other intervening factors outside training impacting the quality of the generation. In particular, we investigate whether safety guardrails weaken a model's argumentative strength, and we test different argumentative strategies to refute hate and compare their cogency. We conclude by discussing open challenges of counterspeech research in NLP.
Document Type: doctoral or postdoctoral thesis
Language: English
Relation: firstpage:1; lastpage:175; numberofpages:175; https://hdl.handle.net/11572/458930; http://dx.doi.org/10.15168/11572_458930
DOI: 10.15168/11572_458930
Availability: https://hdl.handle.net/11572/458930; https://doi.org/10.15168/11572_458930
Rights: info:eu-repo/semantics/openAccess; license: Creative Commons; license URI: http://creativecommons.org/licenses/by/4.0/
Accession Number: edsbas.A316B9DA
Database: BASE