MMAR: Multilingual and multimodal anaphora resolution in instructional videos
Conference paper, 2024


Abstract

Multilingual anaphora resolution identifies referring expressions and implicit arguments in texts and links them to antecedents, across several languages. In the most challenging setting, cross-lingual anaphora resolution, training data and test data are in different languages. Since knowledge needs to be transferred across languages, the task is difficult in both the multilingual and the cross-lingual setting. We hypothesize that one way to alleviate some of this difficulty is to include multimodal information in the form of images (i.e., frames extracted from instructional videos). Such visual inputs are by nature language-agnostic, so cross- and multilingual anaphora resolution should benefit from visual information. In this paper, we provide the first multilingual and multimodal dataset annotated with anaphoric relations and present experimental results for end-to-end multimodal and multilingual anaphora resolution. Given gold mentions, multimodal features improve anaphora resolution results by ∼10% for unseen languages.
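To make the core idea concrete, the sketch below shows one way language-agnostic frame embeddings could be fused with multilingual text span embeddings for mention-pair scoring. This is a hypothetical illustration, not the paper's model: the class name, dimensions, and concatenation-based fusion are all assumptions.

```python
# Hypothetical sketch (NOT the paper's architecture): fusing visual frame
# embeddings with text span embeddings to score (mention, antecedent) pairs.
import torch
import torch.nn as nn

class MultimodalPairScorer(nn.Module):
    def __init__(self, text_dim: int = 768, vis_dim: int = 512, hidden: int = 256):
        super().__init__()
        # Project each modality into a shared space before fusion.
        self.text_proj = nn.Linear(text_dim, hidden)
        self.vis_proj = nn.Linear(vis_dim, hidden)
        # Score a fused (mention, antecedent) pair with a small MLP.
        self.scorer = nn.Sequential(
            nn.Linear(4 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def fuse(self, span_emb: torch.Tensor, frame_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate a text span with the frame aligned to its utterance.
        return torch.cat([self.text_proj(span_emb), self.vis_proj(frame_emb)], dim=-1)

    def forward(self, mention_txt, mention_img, antecedent_txt, antecedent_img):
        m = self.fuse(mention_txt, mention_img)
        a = self.fuse(antecedent_txt, antecedent_img)
        return self.scorer(torch.cat([m, a], dim=-1)).squeeze(-1)

# Usage with random stand-ins for encoder outputs (batch of 4 pairs):
scorer = MultimodalPairScorer()
txt = torch.randn(4, 768)  # e.g. spans from a multilingual text encoder
img = torch.randn(4, 512)  # e.g. frame embeddings from an image encoder
print(scorer(txt, img, txt, img).shape)  # torch.Size([4])
```

Concatenation is only the simplest fusion choice; the point of the sketch is that the frame embeddings carry no language-specific signal, so the same visual pathway applies unchanged to any input language, including languages unseen at training time.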
Main file
oguz_EMNLP24.pdf (726.7 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04733760, version 1 (13-10-2024)


Identifiers

  • HAL Id: hal-04733760, version 1

Cite

Cennet Oguz, Pascal Denis, Simon Ostermann, Natalia Skachkova, Emmanuel Vincent, et al. MMAR: Multilingual and multimodal anaphora resolution in instructional videos. Findings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024, Miami, United States. ⟨hal-04733760⟩