WP Leader: Stichting Katholieke Universiteit (Radboud University)
This WP focuses on topic detection and sentiment analysis for indirect translation evaluation.
Topic identification for indirect translation evaluation
We will apply topic detection methods to automatically identify course-related topics in MOOC data, i.e. in presentations, lecture subtitles, and MOOC forum posts. The approach will be based on statistical topic modelling methods that automatically detect the latent topic structure of texts in a larger collection (e.g. all texts relating to a particular MOOC course). The task will initially be showcased for English, German, and Dutch, drawing on existing normalization and tokenization software for preprocessing. Topic annotations obtained from the crowdsourcing process (crowdsourcing activity 3) will be used both to supervise the topic identification process and to evaluate it. After the third crowdsourcing activity, the annotated text will help extend the approach to the remaining target languages. Topic identification may be used for indirect machine translation evaluation. Applying the methodology to a source text and to the translation produced by the translation engine will generate two topic sets, one for each language. The degree of matching between the two sets is an implicit translation evaluation metric. Analysis of the results will reveal missing or erroneous elements in the training data; these can be corrected automatically before the translation engines are re-trained, thereby improving the translation output in the second translation stage.
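As an illustration of the matching step, the sketch below compares the topic set of a source text with that of its translation. It is a minimal sketch under stated assumptions, not the project's method: the text above does not fix a matching measure, so Jaccard overlap is used here as one plausible choice, and the topics of both languages are assumed to have already been mapped to a shared label inventory. The function names and the example labels are hypothetical.

```python
def topic_set_overlap(source_topics, target_topics):
    """Jaccard overlap between two collections of topic labels.

    Assumption: both collections use the same label inventory,
    i.e. target-language topics were already mapped to source labels.
    """
    source, target = set(source_topics), set(target_topics)
    if not source and not target:
        # Two empty sets are treated as a perfect match.
        return 1.0
    return len(source & target) / len(source | target)


def missing_topics(source_topics, target_topics):
    """Topics found in the source text but not in the translation."""
    return sorted(set(source_topics) - set(target_topics))


# Hypothetical topic labels for one lecture and its translation.
source = ["neural networks", "gradient descent", "overfitting"]
translated = ["neural networks", "overfitting", "cooking"]

score = topic_set_overlap(source, translated)  # 2 shared / 4 distinct labels
lost = missing_topics(source, translated)      # topics lost in translation
```

A low overlap score, or a non-empty list of lost topics, would flag a segment for closer inspection of the training data.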
Sentiment analysis for indirect translation evaluation
Sentiment analysis aims at automatically extracting the opinion (positive/neutral/negative) expressed in a text segment. TraMOOC will apply sentiment analysis to text (i) posted on MOOC fora by MOOC users and students and (ii) posted on other social media referring to MOOC courses. The text in category (i) will be provided by the MOOC consortium partner, while category (ii) text will be collected by crawling social network sites with specific keywords and phrases that relate to MOOC content. All necessary steps will be taken to protect the privacy of the authors of the posts. After the online translation service prototype has been delivered (System prototype – v.2) by M28, MOOC users will be able to apply the service to their individual courses and comment on the service output by posting. Sentiment analysis on these posts will aim at extracting the opinion of MOOC users regarding the translated material by using opinion mining techniques, and by also taking into account direct polarity indicators, such as users’ (up)votes and (dis)like hits.
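The following sketch shows one way in which per-post polarity labels and direct polarity indicators could be summarised side by side. It is illustrative only: the polarity labels are assumed to come from an opinion mining component that is not shown, and the function names, the vote score formula, and the example values are hypothetical.

```python
from collections import Counter

POLARITIES = ("positive", "neutral", "negative")


def polarity_distribution(labels):
    """Share of posts per polarity class, given one label per post."""
    counts = Counter(labels)
    total = sum(counts[p] for p in POLARITIES)
    return {p: (counts[p] / total if total else 0.0) for p in POLARITIES}


def vote_polarity(upvotes, downvotes):
    """Direct polarity indicator in [-1, 1] computed from vote counts."""
    total = upvotes + downvotes
    return (upvotes - downvotes) / total if total else 0.0


# Hypothetical labels for four posts commenting on a translated course.
labels = ["positive", "negative", "positive", "neutral"]
distribution = polarity_distribution(labels)

# Hypothetical vote counts attached to the same discussion thread.
vote_score = vote_polarity(upvotes=3, downvotes=1)
```

Keeping the text-based distribution and the vote-based score separate makes it possible to check whether the two kinds of evidence agree before they are combined.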