Linguistics Research Seminar

This seminar hosts invited speakers specializing in various areas of linguistics. Members of the Department, students, and interested members of the public are all cordially invited.

Seminar description

Title Machine Translation of Verb Tenses
Speaker Sharid Loaiciga
Date Tuesday, 4 November 2014
Time 12:15
Room L208 (Bâtiment Candolle)
Description

We present a method for verb phrase (VP) alignment in an English/French parallel corpus and its use for improving statistical machine translation (SMT) of verb tenses. The method starts from automatic word alignment performed with GIZA++ and relies on a POS tagger and a parser, in combination with several heuristics, to identify non-contiguous components of VPs and to label the aligned VPs with their tense and voice on each side. This procedure is applied to the Europarl corpus, yielding a smaller, high-precision parallel corpus of about 320,000 pairs of finite VPs, which is made publicly available. This resource is used to train a tense predictor for translation from English into French, based on a large number of surface features. Two MT systems are compared: (1) a baseline phrase-based SMT system and (2) a tense-aware SMT system that uses the above predictions within a factored translation model. For several tenses, such as the French “imparfait”, the tense-aware SMT system improves significantly over the baseline.
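
As a rough illustration of the VP alignment step, the sketch below pairs verb phrases across the two languages by counting word-alignment links. It is not the speaker's implementation: the function name `align_verb_phrases`, the voting heuristic, and the toy sentence pair are assumptions made for this example. In practice the word alignment would come from a tool such as GIZA++, and the VP token sets from a POS tagger and a parser.

```python
from collections import Counter


def align_verb_phrases(src_vps, tgt_vps, word_alignment):
    """Pair source and target VPs by majority vote over word-alignment links.

    src_vps, tgt_vps: lists of sets of token indices; a set may be
        non-contiguous (e.g. an auxiliary and a participle split by "not").
    word_alignment: list of (source_index, target_index) links.
    Returns a list of (source_vp_index, target_vp_index) pairs.
    Hypothetical heuristic, chosen only for illustration.
    """
    # Map each target token that belongs to a VP to the index of that VP.
    tgt_owner = {tok: j for j, vp in enumerate(tgt_vps) for tok in vp}
    pairs = []
    for i, vp in enumerate(src_vps):
        # Count how many links from this source VP land in each target VP.
        votes = Counter(
            tgt_owner[t] for s, t in word_alignment if s in vp and t in tgt_owner
        )
        if votes:
            best, _ = votes.most_common(1)[0]
            pairs.append((i, best))
    return pairs


# Toy example (invented): "I have not seen it" / "Je ne l' ai pas vu"
english_vps = [{1, 3}]   # "have ... seen"
french_vps = [{3, 5}]    # "ai ... vu"
links = [(0, 0), (1, 3), (2, 1), (2, 4), (3, 5), (4, 2)]
vp_pairs = align_verb_phrases(english_vps, french_vps, links)
```

Each resulting pair could then be labeled with tense and voice on both sides. A source VP with no link into any target VP is left unpaired, which favors precision over recall.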

   