| Title: |
AlignGuard: scalable safety alignment for text-to-image generation |
| Authors: |
Liu, R; Chen, IC; Gu, J; Zhang, J; Pi, R; Chen, Q; Torr, P; Khakzar, A; Pizzati, F |
| Publication Year: |
2025 |
| Collection: |
Oxford University Research Archive (ORA) |
| Description: |
Text-to-image (T2I) models are widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. Current safety measures are typically limited to text-based filtering or concept removal strategies, which can remove only a few concepts from the model’s generative capabilities. In this work, we introduce AlignGuard, a method for safety alignment of T2I models. We enable the application of Direct Preference Optimization (DPO) for safety purposes in T2I models by synthetically generating a dataset of harmful and safe image–text pairs, which we call CoProV2. Using a custom DPO strategy and this dataset, we train safety experts, in the form of low-rank adaptation (LoRA) matrices, able to guide the generation process away from specific safety-related concepts. Then, we merge the experts into a single LoRA using a novel merging strategy for optimal scaling performance. This expert-based approach enables scalability, allowing us to remove 7× more harmful concepts from T2I models compared to baselines. AlignGuard consistently outperforms the state-of-the-art on many benchmarks and establishes new practices for safety alignment in T2I networks. We will release code and models.
| Document Type: |
conference object |
| Language: |
English |
| Availability: |
https://ora.ox.ac.uk/objects/uuid:0550d283-1253-4b8a-bdd3-55fb8b1a3854 |
| Rights: |
info:eu-repo/semantics/openAccess ; CC Attribution (CC BY) |
| Accession Number: |
edsbas.32DCB319 |
| Database: |
BASE |