The project plans to create three MT prototypes. The second of these has been developed for 11 target languages, translating from English into German, Italian, Portuguese, Dutch, Bulgarian, Greek, Polish, Czech, Croatian, Russian, and Chinese. The current translation systems are based on both phrase-based statistical machine translation (SMT) and neural machine translation (NMT), with a view to moving entirely to NMT for the final prototype. Our NMT systems have achieved state-of-the-art performance in recent evaluation campaigns. These systems are trained with the Nematus toolkit, and the translation server is based on the amuNMT toolkit. To maximize translation quality for MOOC texts, the systems have been adapted to this domain by fine-tuning the model parameters on in-domain training data.
The decision to focus on NMT was based on a comparative evaluation of SMT and NMT for four language pairs using a variety of metrics. The NMT output showed improved fluency and fewer annotated errors, although NMT did not significantly improve document-level post-editing performance. A large-scale crowdsourced evaluation of our NMT systems is currently underway for all 11 TraMOOC language pairs, with human evaluation and error annotation (using a subset of the Multidimensional Quality Metrics (MQM) taxonomy) performed by crowdworkers and professional translators. A final evaluation stage will follow, using crowdsourcing to annotate entities, topics and terms in the source and target texts. This will produce a thesaurus of tag-sets that enables a novel, implicit evaluation of the machine-translated output by comparing the source and target tag-sets.
To disseminate the project's research results, TraMOOC has been present in recent months at the 27th Meeting of Computational Linguistics in The Netherlands (CLIN 27), the Content Supply Chain & Workflow Management Forum, the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), the Fifth European MOOCs Stakeholders Summit (EMOOCs 2017 Conference), and the 20th Annual Conference of the European Association for Machine Translation (EAMT 2017). These events have considerably increased the project's visibility and the interest in its work, as is also reflected in increased traffic to the project's public website.
In summary, substantial technical work was carried out during this period, and most of its objectives have been achieved. More information about the project's progress and latest results is available on the project's progress timeline. You can also follow us on our social media groups and pages (Facebook, Twitter, LinkedIn, YouTube).