
OmniGen2: Exploration to Advanced Multimodal Generation

Title: OmniGen2: Exploration to Advanced Multimodal Generation
Authors: Wu, Chenyuan; Zheng, Pengfei; Yan, Ruiran; Xiao, Shitao; Luo, Xin; Wang, Yueze; Li, Wanli; Jiang, Xiyan; Liu, Yexin; Zhou, Junjie; Liu, Ze; Xia, Ziyi; Li, Chaofan; Deng, Haoge; Wang, Jiahao; Luo, Kun; Zhang, Bo; Lian, Defu; Wang, Xinlong; Wang, Zhongyuan; Huang, Tiejun; Liu, Zheng
Publication Year: 2025
Collection: ArXiv.org (Cornell University Library)
Subject Terms: Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Description: In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines, encompassing image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2
Document Type: text
Language: unknown
Relation: http://arxiv.org/abs/2506.18871
Availability: http://arxiv.org/abs/2506.18871
Accession Number: edsbas.30828B39
Database: BASE