Asma Alamri

University of Birmingham

Google Translate and Children's Literature.
Can Google Translate do it alone, without Post-Editing?

Since the 1980s, machine translation (MT) technology has improved significantly and has been introduced as another type of translation aid, rather than as a means of producing fully automatic high-quality translation. Since raw MT output cannot always meet the needs of end users, it has been combined with post-editing. The question here is whether machine translation combined with post-editing is good enough to rely on for the translation of children’s stories. Arnold points out that “all we want from the computer is some kind of draft quality translation: something which is more or less faithful to the original, understandable in its own right, and which is a reasonable starting point for a polished translation” (2003:113). However, MT output cannot be relied on by itself, as it varies significantly “depending on the system and language pair and still far from human translation (HT) standards” (Carl et al., 2015). Machine translation post-editing can therefore be considered a preferable step for improving MT output. It has been argued that using MT tools speeds up the process of translation without reducing its quality if it is combined with post-editing, and this is what this paper aims to prove.

This paper investigates the claim that raw machine translation output combined with post-editing can be as good as human-standard translation, taking into consideration the benefits of using machine translation: saving time and ensuring translation quality. The investigation is carried out on a children’s story of 509 words for the language pair English→Arabic. The story will be translated using Google Translate and post-edited manually. To evaluate the quality of the post-edited text, I will translate the story from scratch and compare both texts.

At first sight, children’s stories appear simple and direct, with simplified vocabulary and grammar, but translating them faithfully can be quite a challenge (Van Doorslaer, 2012:23). I chose to translate this text because children’s literature has special requirements in translation. Besides the semantic and lexical challenges that face machine translation, these texts demand cultural and ideological adjustment. In addition, differences in tone and in the stylistic and aesthetic aspects of the stories’ presentation make it difficult to produce a translation that reads like an original, non-translated text.

For translating my text, I chose Google Translate, which uses a neural machine translation (NMT) engine and is available for free. Google Translate translates whole sentences and looks at the broader context to guess the most relevant translation. It does this by going through the pre-translated database it has. It is widely used: according to statistics released in 2018, Google Translate has more than 500 million daily users worldwide and offers 103 languages (Sommerlad, 2018).

In Arabic, the vocabulary of children’s stories should be simplified so that it matches young children’s language level. Machine translation engines usually provide translation in classic Arabic, which needs to be replaced with simplified language adapted to the target audience. In addition, rhyme is essential in children’s stories: they aim to instil values in young children and are read aloud, so rhyme makes them pleasant to hear. Translating children’s literature is also difficult because it is not intended exclusively for children. It is also targeted at parents, teachers and librarians. This “dual readership” makes translation and reproduction more demanding (Alvstad 2010). For these reasons, children’s stories require high-quality translation.

By the end of this investigation, a new role for translators can be seen. Machine translation has transformed the process of translation and upgraded the role of translators to that of post-editors. This was recognised by the National Degree Committee in China, which offered Master’s programmes and training courses for translators “as demanded by the market” to meet the increasing needs of the language services industry (Jia et al., 2019). In the meantime, the role of machine translation cannot be denied or ignored. NMT’s characteristic improvements in grammar and coherence allow translators to spend less time editing and polishing the text than they would spend translating it from scratch.

Asma Alamri is a freelance translator with a Master’s degree in Translation Studies obtained from the University of Birmingham (UK). She combines a Higher Diploma in General Education with her Master’s degree in Translation Studies to translate children’s literature in the language pair English-Arabic. This article was written under the supervision of Dr. Gabriella Saldanha.

  • Bachelor’s Degree of English language (BSc.) 2013.
    King Khalid University, Saudi Arabia.
  • Completion of American Explorer Program 2016.
    ELS Language Schools, Boston, USA.
  • Higher Diploma in General Education 2017.
    Bisha University, Saudi Arabia.
  • MA in Translation Studies 2020.
    University of Birmingham, United Kingdom.

Emmanuelle Esperança-Rodier

Université de Grenoble-Alpes

What's on an Annotator's Mind? Analysis of Error Typologies to Highlight Machine Translation Quality Assessment Issues

In an era of Artificial Intelligence, billions of words are needed to train, fine-tune and test models (Linzen, 2020). Quality assessment of the models therefore requires collections of annotated corpora, most of them having to be created. The quality of those newly created annotated corpora is now questioned in several studies.

Our work addresses this issue. By analysing the way one single annotator behaves while annotating the same documents with two different typologies, we aim to show that the choice of a typology impacts the results.

To focus on quality rather than quantity, we selected a small corpus of 74 segments consisting of 1582 source words and 2151 target words.

The corpus consists, firstly, of the first 9 segments of an English patent document, each segment being translated using both the WIPO neural and statistical Machine Translation (MT) systems, resulting in 18 translation hypotheses of the source segments into French. Secondly, it consists of the first 28 segments of an environmental regulation-like document, translated using the European Commission MT@EC translation system for the statistical system and the European Commission eTranslation system for the neural system. From the 28 English source segments, we obtained 56 French translation hypotheses.

We then asked one annotator to annotate the corpus using two different typologies, namely Vilar’s typology (Vilar et al., 2006) and DQF-MQM (Lommel et al., 2018).

The annotator is a native English speaker, C2 in French, who at the time was an undergraduate student at the University of Exeter studying Literature and Modern Languages.  

The annotator used the ACCOLÉ platform (Esperança-Rodier et al., 2019), which provides a collaborative environment to annotate parallel corpora according to several typologies. We obtained 137 error annotations using Vilar’s typology and 122 using the DQF-MQM typology.

If we briefly analyse the data in Table 1 and Table 2 below, we can see an increase of about 12% in the total number of annotations when using Vilar’s typology (137 against 122).



System (label)                             Segments  Source words  Target words  Annotated errors
WIPO Translate N (Brad Pitt)                      9           272           305                11
WIPO Translate S (George Clooney)                 9           272           302                25
Canada eTranslation (Jude Law)                   28           519           768                35
Canada MT@EC (Robert)                            28           519           776                51
Total                                            74          1582          2151               122

Table 1: Number of annotations per document using the DQF-MQM annotation typology

This increase notably comes from the patent document while it is less marked for the environmental document.

System (label)                             Segments  Source words  Target words  Annotated errors
WIPO Translate N (Hawai)                          9           272           305                13
WIPO Translate S (Madagascar)                     9           272           302                29
Canada eTranslation (Nouvelle Calédonie)         28           519           768                36
Canada MT@EC (New Zealand)                       28           519           776                59
Total                                            74          1582          2151               137

Table 2: Number of annotations per document using Vilar’s annotation typology

Comparing the neural systems to the statistical systems, it appears that more errors are annotated for the statistical systems, with a very noticeable gap for the patent document. This trend is shared by both typologies.
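The comparisons drawn from Tables 1 and 2 can be recomputed with a short script. The sketch below is illustrative only: the counts are copied from the two tables, while the function and variable names are assumptions introduced here and are not part of the ACCOLÉ platform.

```python
# Annotated-error counts copied from Tables 1 and 2.
# Each row: (system, paradigm, DQF-MQM errors, Vilar errors).
ROWS = [
    ("WIPO Translate N", "neural", 11, 13),
    ("WIPO Translate S", "statistical", 25, 29),
    ("Canada eTranslation", "neural", 35, 36),
    ("Canada MT@EC", "statistical", 51, 59),
]


def totals(rows):
    """Return (total DQF-MQM errors, total Vilar errors) over the given rows."""
    dqf = sum(row[2] for row in rows)
    vilar = sum(row[3] for row in rows)
    return dqf, vilar


def relative_increase(base, other):
    """Percentage by which `other` exceeds `base`."""
    return 100.0 * (other - base) / base


def totals_by_paradigm(rows, paradigm):
    """Totals restricted to one MT paradigm ('neural' or 'statistical')."""
    return totals([row for row in rows if row[1] == paradigm])


if __name__ == "__main__":
    dqf_total, vilar_total = totals(ROWS)
    print("DQF-MQM:", dqf_total, "Vilar:", vilar_total)
    print("Relative increase (%):", round(relative_increase(dqf_total, vilar_total), 1))
    for paradigm in ("neural", "statistical"):
        print(paradigm, totals_by_paradigm(ROWS, paradigm))
```

Run on the counts above, the totals reproduce the 122 and 137 annotations reported in the tables, and the per-paradigm sums make the neural versus statistical gap visible for each typology.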

In order to find an explanation for those metrics, we studied how the annotator has annotated the data using the two different typologies.

We have found several discrepancies. To sum up, a recurring discrepancy is the difference in the number of annotations depending on the way the typology has been developed. For the example below, the source “this” being translated by “la présente” has been annotated using the error type “Accuracy>Mistranslation” from the DQF-MQM typology, counting for one single error, while when using Vilar’s typology, the annotator has divided it into two errors: “la” as an “incorrect words>Sense>Wrong lexical choice” error, and “présente” as an “incorrect words>Extra Words” error. Consequently, for that example, we have got one more error with Vilar’s typology than with the DQF-MQM one.

Example 1:

GB: Our sustainable prosperity will depend on this.

FR: Notre prospérité durable dépendra de la présente.

{ #1; Source: this; Target: la présente; Error_Type_DQF-MQM: Accuracy>Mistranslation }

{ #1; Source: this; Target: la; Error_Type_Vilar: Incorrect Words>Sense>Wrong lexical choice }

{ #2; Source: ; Target: présente; Error_Type_Vilar: Incorrect Words>Extra Words }

Another recurring discrepancy is that Vilar’s typology offers the error type “Style” which has been used by our annotator who has not found any equivalence in the DQF-MQM typology.

The results of this analysis demonstrate how the choice of a typology can have different implications. It also opens the way to further studies on the impact of evaluation criteria on quality assessment.

Emmanuelle Esperança-Rodier is a lecturer at Université Grenoble Alpes (UGA), France, Laboratoire d’Informatique de Grenoble (LIG), where she teaches English for Specific Purposes.

After defending a PhD in computational linguistics, titled “Création d’un Diagnostiqueur Générique de Langues Contrôlées, avec application particulière à l’Anglais Simplifié”, she worked as a post-editor in a translation agency. Back at University, she participated in IWSLT and WMT evaluation campaigns, as well as in several LIG projects.

She now works on the evaluation of MT systems based on competences and focused on tasks, translation error analysis and multilingualism.

Anna Iankovskaia

University of Wolverhampton

The Sources of Text Complexity for NMT

Recent years have seen the growing popularity of NMT for both the translation industry and the general audience of Internet users. Despite this trend and the constant upgrade of NMT architecture, its output is still prone to errors and requires a certain amount of post-editing. A program able to estimate the complexity of the source text for NMT could give a preliminary idea to translators and translation company managers as to what expectations they could have with regard to the quality of the MT output and the required level of post-editing.

The present research aims at achieving two principal goals: (1) to analyse the sources of lexical, syntactic, and conceptual text complexity, and (2) to build an algorithm able to assess text complexity for NMT purposes by detecting certain complexity-related patterns.

The lexical and syntactic analysis is performed in the course of a manual comparison of NMT outputs from two freely available NMT engines – DeepL and ModernMT, with their reference translation. This procedure is supposed to shed light on the lexical phenomena (e.g. particular types of multi-word expressions) and syntactic structures most likely to be mistranslated and result in adequacy and fluency issues.

The analysis of conceptual complexity partially replicates the research by Štajner and Hulpuş (2018) with respect to the complexity metrics the authors apply to measure text complexity. However, if the above-mentioned paper considers conceptual complexity for human readers, the present project investigates whether the same correlation between the metrics and the level of text complexity exists for NMT. Another difference is the knowledge graph used, as the present project exploits WordNet (Fellbaum, 1998, ed.) whereas the original research utilises DBpedia. The metrics of conceptual complexity are (1) the node degree of the concepts encountered in a text – more general concepts have this indicator higher and are easier for human understanding, and (2) the length of the shortest path – the shorter the average path between concepts in a text, the easier it is to comprehend by humans (Štajner and Hulpuş, 2018). The present research intends to clarify whether this correlation is true for NMT and if any correlation between the quality of NMT and semantic complexity can be observed at all.
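The two metrics can be illustrated on a small graph. The sketch below is a minimal illustration under stated assumptions: the toy graph, its concept names and all function names are hypothetical, and a hand-written dictionary stands in for WordNet or DBpedia. It is not the project's implementation.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy knowledge graph (undirected), standing in for WordNet.
GRAPH = {
    "entity": {"animal", "artifact"},
    "animal": {"entity", "dog", "cat"},
    "artifact": {"entity", "car"},
    "dog": {"animal"},
    "cat": {"animal"},
    "car": {"artifact"},
}


def node_degree(graph, node):
    """Number of edges attached to a concept node."""
    return len(graph[node])


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def average_degree(graph, concepts):
    """Metric (1): mean node degree of the concepts found in a text."""
    return sum(node_degree(graph, c) for c in concepts) / len(concepts)


def average_path_length(graph, concepts):
    """Metric (2): mean shortest-path length over all connected concept pairs."""
    lengths = [shortest_path_length(graph, a, b) for a, b in combinations(concepts, 2)]
    lengths = [length for length in lengths if length is not None]
    return sum(lengths) / len(lengths) if lengths else None


if __name__ == "__main__":
    concepts = ["dog", "cat", "car"]  # concepts assumed to occur in one text
    print(average_degree(GRAPH, concepts))
    print(average_path_length(GRAPH, concepts))
```

In a real setting the concept list would come from linking the words of a text to graph nodes, a step this sketch leaves out.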

The second part of the project is an algorithm programmed to identify a range of patterns harvested at the previous stage. The heart of its architecture is BERT (Devlin et al., 2019), a pre-trained transformer selected for three reasons. First, it has proved to be state-of-the-art for a range of NLP tasks. Second, BERT is able to discriminate between meanings which is important for the project as the preliminary analysis has shown that figurative components in the original almost always lead to NMT errors. Therefore the ability of BERT to ‘understand’ where semantically distant concepts are combined for stylistic creativity might be potentially useful for the present work. The third reason is the reduced amount of data – up to 1,000 examples of each pattern – required for the model fine-tuning as BERT is pre-trained.

At this stage, the final architecture of the program is not yet ready. Either a purely neural or a hybrid approach (the merge of neural and rule-based) is possible depending on the sources of complexity and the most efficient and simplest implementation of their identification. The program is also supposed to score texts for complexity differentiating between adequacy and fluency errors as the former can significantly mislead the users (especially if they do not understand the source language) and are considered to be more harmful.

Model performance will be evaluated in several steps. First, text will be translated with the NMT system and annotated by a human expert for errors. Then, the same source text will be processed with the program that will output it as a sequence labelled for potential mistakes. Finally, the F1-score will be calculated for predicted and real MT errors. Another F1-score will be obtained for assessing the capacity of the algorithm to detect the required set of patterns.
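The F1 computation in the last step could look roughly as follows. This is a sketch under an assumption the abstract does not state: errors are compared token by token as binary labels (1 for an error, 0 otherwise). The function name and the example labels are invented for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 for binary error labels (1 = error, 0 = no error).

    `gold` holds the human annotator's labels and `predicted` the program's
    labels for the same tokens, in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos = sum(1 for g, p in zip(gold, predicted) if g and p)
    false_pos = sum(1 for g, p in zip(gold, predicted) if not g and p)
    false_neg = sum(1 for g, p in zip(gold, predicted) if g and not p)

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Invented example: six tokens, three of them marked as errors by the annotator.
    gold = [0, 1, 1, 0, 1, 0]
    predicted = [0, 1, 0, 0, 1, 1]
    print(precision_recall_f1(gold, predicted))
```

The same function would serve for the second F1-score, with the labels marking the presence of a complexity pattern rather than an MT error.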

One of the shortcomings of the project is related to its applicability limited by the language pair – English/Russian, and domain – news commentary. The dataset used in the research is the News Commentary parallel corpus (Tiedemann, 2012). The corpus includes the original text and its reference translation. Twenty such pairs are taken for the analysis of NMT errors.

A subsidiary result which is to be reported with the conclusions of the main analysis is the comparison of the performance of the two popular NMT engines.

Anna Iankovskaia is a translator with a first degree in Translation and Translation Studies obtained from Smolensk State University (Russia) and a current student of the European Master’s in Technology for Translation and Interpreting, a joint master’s programme at the University of Wolverhampton (United Kingdom) and the University of Málaga (Spain). The project submitted for Translating and the Computer-42 is the author’s master’s thesis, due in 2021. The original idea of the project was proposed by Prof Dr Ruslan Mitkov, and it is written under his supervision as well as that of Dr Cristina Toledo Báez.

Marina Tonkopeeva

University of Wolverhampton

Investigating Interpreting and Translation Strategies: A Corpus-Based Approach

The present study aims at identifying strategies used by interpreters and translators when processing verbal nouns (VNs) and deverbal nouns (DVNs) in the course of English-Russian simultaneous interpreting and translation. To assist the study, it was necessary to compile a corpus covering source speeches, their interpretations and translations in the English-Russian language pair. While interpreting corpora are still a rare commodity, such resources are already emerging with the most prominent examples being corpora based on European Parliament (EP) speeches: EPIC, EPTIC, and EPICG (EPIC-Ghent). EPIC is a parallel corpus of EP source speeches in Italian, English and Spanish and their simultaneous interpretations in all the possible combinations. EPTIC is an intermodal parallel corpus comprised of the EP-delivered speeches, their interpretations and translations. At the moment EPTIC covers English, French, Italian, and Slovene. For the Russian language, there is also SIREN, a parallel aligned bidirectional corpus of original and simultaneously interpreted political speeches in Russian and English. SIREN is not publicly available.

One of the major objectives of this project is therefore the compilation of a parallel intermodal corpus modelled on EPTIC. To this end, we retrieved source speeches from the United Nations web television portal. The oral data for the present corpus were transcribed with speech recognition software (YouTube automatic captioning and the iOS speech recognition tool) and then manually post-edited. The transcripts of original speeches and their translations were accessed from open sources. A parallel intermodal corpus combining speeches delivered in English, their corresponding simultaneous interpretations into Russian, and the subsequently published Russian translations was used as the material for the study. The corpus of political discourse currently consists of approximately 32,000 words (3 hours 36 min). Transcripts include non-verbalised noises (for instance, [laughter]) and pauses. Punctuation marks are used occasionally where possible. Numerals are shown as figures. POS-tagging and lemmatisation are fully automated. The corpus is offered as part of the SketchEngine platform.

We identified the interpreting and translation strategies by comparing the datasets. Strategies used in interpreting and translation fall into many different classifications and are sometimes referred to as ‘transformations’. In the present research, we focused on the linguistic properties of communicating VNs and DVNs in different modes of translation used by interpreters and translators to overcome grammatical differences between the two languages. The analysis shows that, in order to successfully manage VNs and DVNs, both interpreters (73%) and translators (63%) mostly employ lexicosemantic transformations. However, if we look at the more granular classification, it becomes clear that interpreters mostly rely on compression strategies and omissions (53%), whereas translators mostly use decompressing techniques (35%). We also identified mode-specific transformations applicable to simultaneous interpreting only.

Intermodal corpora prove to be convenient and promising resources for investigating the processes of interpreting and translation. Nevertheless, the complexity of creating large-scale corpora still hinders their development. The study reported in this poster seeks to fill in this gap with one of its major deliverables being a parallel intermodal English-Russian corpus featuring political speeches, their respective interpretations and translations.

It is worth noting that this study is still a work in progress. In the future, we plan to expand the present corpus and the scope of material analysed and explore the employment of Natural Language Processing techniques to automate and speed up the analysis.

Marina Tonkopeeva is a first-year MA student of the European Master’s in Technology for Translation and Interpreting at the University of Wolverhampton and the University of Málaga.

Marina is a freelance conference interpreter working with international organisations and agencies in Russia and abroad.

For the past 3 years, Marina has been coordinating the translation project at the SDG Academy, a global UN initiative.

In 2017, Marina graduated from St Petersburg State University with a Master’s degree in Translation.


Yuxiang Wei

Dublin City University

Post-Editing of Structurally Ambiguous Translation: The Biasing Effect from Source Text

This is part of an ongoing project which addresses the cognitive aspects of post-editing in relation to lexical and structural ambiguity. In the present paper, interim results from analyses of the CRITT Translation Process Research Database (TPR-DB) are reported regarding the disambiguation process in the post-editor’s mind. Following a brief review of the theoretical assumptions and empirical evidence concerning the assumed mental states for lexical disambiguation in translating and post-editing (Wei, forthcoming), this paper seeks to investigate the post-editor’s cognitive processing of ambiguous structures in the machine translation output. Eye movement and keyboard behaviour are analysed in detail to examine the observable disruptions of processing in respect of structural ambiguity. Tentative results show that, on the one hand, the subjects seem to parse the ambiguous target text (TT) in favour of the interpretation which is semantically consistent with the source, and on the other, disruptions of processing pertaining to the garden-path effect tend to occur not in the latter part of the sentence (where the wrong parse is disconfirmed), but in the earlier regions where the most quickly-built analysis is semantically inconsistent with the source text (ST). This indicates that the cognitive processes of disambiguation in the TT structure receive a strong bias from the mental representation of the source sentence. It also appears that, in post-editing, structural disambiguation of the TT is largely suppressed as a result of the biasing effect from the ST; the cognitive resources allocated to this aspect of the task therefore seem fairly limited.

Yuxiang Wei is a doctoral candidate at the Centre for Translation and Textual Studies (CTTS), Dublin City University, Ireland. His educational background is cross-disciplinary, and he holds a Master of Philosophy (MPhil) from the Chinese University of Hong Kong. His MPhil research examined the lexical and syntactic aspects of Machine Translation for academic texts, involving corpus-based analyses on optics research articles. In his current research at the CTTS, he is interested in the correlation of the temporal, technical, and cognitive effort of post-editing to source- and target-text features, particularly in respect of lexical and structural ambiguity. He is funded by the School of Applied Language and Intercultural Studies, Dublin City University.