| Title: |
OmniGen2: Exploration to Advanced Multimodal Generation |
| Authors: |
Wu, Chenyuan; Zheng, Pengfei; Yan, Ruiran; Xiao, Shitao; Luo, Xin; Wang, Yueze; Li, Wanli; Jiang, Xiyan; Liu, Yexin; Zhou, Junjie; Liu, Ze; Xia, Ziyi; Li, Chaofan; Deng, Haoge; Wang, Jiahao; Luo, Kun; Zhang, Bo; Lian, Defu; Wang, Xinlong; Wang, Zhongyuan; Huang, Tiejun; Liu, Zheng |
| Publication Year: |
2025 |
| Collection: |
ArXiv.org (Cornell University Library) |
| Subject Terms: |
Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language |
| Description: |
In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines, encompassing image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. On this benchmark, OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2 |
| Document Type: |
text |
| Language: |
unknown |
| Relation: |
http://arxiv.org/abs/2506.18871 |
| Availability: |
http://arxiv.org/abs/2506.18871 |
| Accession Number: |
edsbas.30828B39 |
| Database: |
BASE |