
Multimodal Conversation Structure Understanding

Title: Multimodal Conversation Structure Understanding
Authors: Chang, Kent K.; Cramer, Mackenzie Hanh; Ho, Anna; Nguyen, Ti Ti; Yuan, Yilin; Bamman, David
Publication Year: 2025
Subject Terms: Computation and Language
Description: While multimodal large language models (LLMs) excel at dialogue, whether they can adequately parse the structure of conversation -- conversational roles and threading -- remains underexplored. In this work, we introduce a suite of tasks and release TV-MMPC, a new annotated dataset, for multimodal conversation structure understanding. Our evaluation reveals that while all multimodal LLMs outperform our heuristic baseline, even the best-performing model we consider experiences a substantial drop in performance when character identities of the conversation are anonymized. Beyond evaluation, we carry out a sociolinguistic analysis of 350,842 utterances in TVQA. We find that while female characters initiate conversations at rates in proportion to their speaking time, they are 1.2 times more likely than men to be cast as an addressee or side-participant, and the presence of side-participants shifts the conversational register from personal to social.
Comments: Accepted to EACL 2026 main conference; 22 pages, 9 figures, 10 tables
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2505.17536
Accession Number: edsarx.2505.17536
Database: arXiv