WP Leader: Dublin City University
The aim of this work package is to perform and report detailed human and automatic evaluation of the first-stage and second-stage translation output. The WP is structured into three tasks, which will provide input into the process of retraining and retuning the translation engines.
WP5 is divided into three tasks, as follows:
Task 5.1: Automatic evaluation (DCU) (M3-M30)
Diagnostic evaluation based on automatic metrics (BLEU, TER, etc.) will be performed, focusing on specific linguistic phenomena and error types, and error taxonomies will be developed automatically. The results will be compared and reported across translation models, language models, languages and text types.
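As an illustration of the kind of sentence-level scores Task 5.1 relies on, the following is a minimal Python sketch, not the project's evaluation pipeline. It makes two simplifying assumptions: the TER-style score is a word-level edit distance divided by the reference length and omits the block-shift operation of full TER, and the BLEU-style score is a single clipped n-gram precision without the brevity penalty or the geometric mean over n-gram orders. The function names are hypothetical and whitespace tokenisation is assumed.

```python
from collections import Counter


def word_edit_distance(hyp, ref):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis token
                            curr[j - 1] + 1,      # insert a reference token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edits per reference word. Assumption: no block shifts, unlike full TER."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return word_edit_distance(hyp, ref) / len(ref)


def clipped_ngram_precision(hypothesis, reference, n=1):
    """Clipped n-gram precision, the building block of BLEU (no brevity penalty)."""
    hyp, ref = hypothesis.split(), reference.split()
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


if __name__ == "__main__":
    hyp = "the cat sat on mat"
    ref = "the cat sat on the mat"
    print(simplified_ter(hyp, ref))
    print(clipped_ngram_precision(hyp, ref, n=2))
```

Lower values of the TER-style score and higher values of the precision indicate output closer to the reference. For reported results, an established metric implementation would normally be used rather than a sketch of this kind.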
Task 5.2: Human evaluation (DCU, UEDIN, UBER, IURC, Deluxe Media Europe Ltd, Universiteit van Tilburg) (M16-M17)
Human evaluation of the translation output will be carried out through crowdsourcing. To ensure a thorough evaluation, input will be collected from translation and foreign-language students, end users (i.e. MOOC customers) and domain experts. An error analysis will be conducted and reported, and a comparative evaluation will be performed across translation models, language models, languages and text types.
Task 5.3: Analysis of results (DCU, UEDIN, UBER, Deluxe Media Europe Ltd, IURC) (M3-M36)
The quantitative and qualitative results of both the automatic and the human evaluation will be analysed thoroughly and used to form the feedback vector for the second translation stage. The same analysis will be repeated after the second translation stage in order to fine-tune the final system.