| Title: |
AlignGuard: scalable safety alignment for text-to-image generation |
| Authors: |
Liu, R; Chen, IC; Gu, J; Zhang, J; Pi, R; Chen, Q; Torr, P; Khakzar, A; Pizzati, F |
| Publication Year: |
2025 |
| Collection: |
Oxford University Research Archive (ORA) |
| Description: |
Text-to-image (T2I) models are widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. Current safety measures are typically limited to text-based filtering or concept removal strategies, which can remove only a few concepts from the model’s generative capabilities. In this work, we introduce AlignGuard, a method for safety alignment of T2I models. We enable the application of Direct Preference Optimization (DPO) for safety purposes in T2I models by synthetically generating a dataset of harmful and safe image–text pairs, which we call CoProV2. Using a custom DPO strategy and this dataset, we train safety experts, in the form of low-rank adaptation (LoRA) matrices, able to guide the generation process away from specific safety-related concepts. Then, we merge the experts into a single LoRA using a novel merging strategy for optimal scaling performance. This expert-based approach enables scalability, allowing us to remove 7× more harmful concepts from T2I models compared to baselines. AlignGuard consistently outperforms the state-of-the-art on many benchmarks and establishes new practices for safety alignment in T2I networks. We will release code and models.
| Document Type: |
conference object |
| Language: |
English |
| Availability: |
https://ora.ox.ac.uk/objects/uuid:0550d283-1253-4b8a-bdd3-55fb8b1a3854 |
| Rights: |
info:eu-repo/semantics/openAccess ; CC Attribution (CC BY) |
| Accession Number: |
edsbas.32DCB319 |
| Database: |
BASE |