Optimal Word Segmentation for Neural Machine Translation into Dravidian Languages

Title: Optimal Word Segmentation for Neural Machine Translation into Dravidian Languages
Authors: Dhar, Prajit; Bisazza, Arianna; van Noord, Gertjan
Contributors: Nakazawa, Toshiaki; Nakayama, Hideki; Goto, Isao; Mino, Hideya; Ding, Chenchen; Dabre, Raj; Kunchukuttan, Anoop; Higashiyama, Shohei; Manabe, Hiroshi; Pa Pa, Win; Parida, Shantipriya; Bojar, Ondřej; Chu, Chenhui; Eriguchi, Akiko; Abe, Kaori; Oda, Yusuke; Sudoh, Katsuhito; Kurohashi, Sadao; Bhattacharyya, Pushpak
Source: Dhar, P, Bisazza, A & van Noord, G 2021, Optimal Word Segmentation for Neural Machine Translation into Dravidian Languages. in T Nakazawa, H Nakayama, I Goto, H Mino, C Ding, R Dabre, A Kunchukuttan, S Higashiyama, H Manabe, W Pa Pa, S Parida, O Bojar, C Chu, A Eriguchi, K Abe, Y Oda, K Sudoh, S Kurohashi & P Bhattacharyya (eds), Proceedings of the 8th Workshop on Asian Translation (WAT2021). Association for Computational Linguistics (ACL), pp. 181-190.
Publisher Information: Association for Computational Linguistics (ACL)
Publication Year: 2021
Collection: University of Groningen research database
Description: Dravidian languages, such as Kannada and Tamil, are notoriously difficult to translate by state-of-the-art neural models. This stems from the fact that these languages are morphologically very rich as well as low-resourced. In this paper, we focus on subword segmentation and evaluate Linguistically Motivated Vocabulary Reduction (LMVR) against the more commonly used SentencePiece (SP) for the task of translating from English into four different Dravidian languages. Additionally, we investigate the optimal subword vocabulary size for each language. We find that SP is the overall best choice for segmentation, and that larger vocabulary sizes lead to higher translation quality.
Document Type: conference object
File Description: application/pdf
Language: English
Relation: info:eu-repo/semantics/altIdentifier/hdl/https://hdl.handle.net/11370/0d4b6254-b565-4673-a48e-00a98e4adb54
Availability: https://hdl.handle.net/11370/0d4b6254-b565-4673-a48e-00a98e4adb54; https://research.rug.nl/en/publications/0d4b6254-b565-4673-a48e-00a98e4adb54; https://pure.rug.nl/ws/files/190347402/2021.wat_1.21.pdf
Rights: info:eu-repo/semantics/openAccess ; http://creativecommons.org/licenses/by/4.0/
Accession Number: edsbas.59D72B3E
Database: BASE
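The link between subword vocabulary size and segmentation granularity mentioned in the description can be illustrated with a toy byte-pair encoding (BPE) learner. This is a minimal sketch under stated assumptions. It is neither SentencePiece (whose default model type is a unigram language model) nor LMVR. The helper names `learn_bpe`, `merge_pair` and `segment` and the word-frequency table are hypothetical and are not taken from the paper. Each learned merge adds one symbol to the vocabulary. A larger merge budget therefore stands in for a larger vocabulary and yields fewer, longer subword units per word.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return a new symbol tuple with every adjacent occurrence of `pair` fused."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merge rules from a {word: frequency} dict (toy sketch)."""
    # Start from single characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


def segment(word, merges):
    """Apply the learned merges, in order, to split `word` into subword units."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus for illustration only, not data from the paper.
    toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    for n in (0, 3, 10):
        print(n, segment("lowest", learn_bpe(toy_corpus, n)))
```

Running the script prints how the unseen word "lowest" is split after 0, 3 and 10 merges. The merge lists for smaller budgets are prefixes of those for larger ones, so the number of units per word can only stay the same or shrink as the vocabulary grows.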