Director Kamusi.org at EPFL in Lausanne
How AI Cured Coronavirus and Delivered Universal Translation, and Other MT Myths and Magic
Artificial Intelligence, Neural Networks, and Machine Learning are often viewed as the launchpad to the future throughout the information technology industry. Research in language technology, especially Machine Translation, has steered increasingly toward these topics. The popular press magnifies the enthusiasm, leading many in the public to believe that universal translation has arrived, or is just around the corner. To what extent is this enthusiasm warranted?
AI is eminently suited for some tasks, and ill-fitted for others. Through analysis of mounds of weather data, for example, AI can detect patterns and make predictions far beyond traditional observational forecasts. On the other hand, an AI-based dating app would surely fail to connect would-be lovebirds beyond a few of the sci-fi faithful, because finding a match involves far too many variables that cannot be constituted as operable data.
Within language technology, tasks that are suitable for AI include finding multiword expression (MWE) candidates, or learning patterns that can predict grammatical transformations. Unsuitable tasks include deciding which word clusters are legitimate MWEs, or how underlying ideas could be effectively rendered in other languages without vast troves of parallel data. Yet the success, or hint thereof, of AI at some Natural Language Processing (NLP) tasks has frequently been transmogrified into claims that computer science has nearly mastered translation.
The assumptions about what AI can do for MT are built on several myths:
• We have adequate linguistic data. In fact, well-stocked parallel corpora are just part of the journey, and only exist for a smattering of language pairs at the top of the research agenda, not for the vast majority. By contrast, little or no useful digitized data exists for most of the world’s languages that are the mother tongues of most of the world’s people.
• Neural Networks have conquered the barriers faced by previous MT strategies, at least for well-provisioned languages. In fact, we can see recent qualitative improvements for certain languages in certain domains, but even the best pairs fail when pushed beyond the zones for which they have comparable data. Weather forecasts can be perfectly translated among numerous languages, for example, while seductive conversations for online dating will break MT in any language.
• Machine Learning yields ever more accurate translations. In fact, MT almost never channels its computed results through human verification, so it cannot learn whether its output is intelligible. We thought computers had learned to recognize gorillas from people tagging “gorilla” to a myriad of images, until we found machines applying that learned label to dark-skinned humans as well. Learning a language is a lot more complicated than learning to recognize a gorilla. People learn languages through years of being informed and corrected by native speakers (parents, teachers, friends), and adjusting their output when they fail to be understood. For language learning by computers, the same iterative attention is no less necessary.
• Zero-shot translation bridges languages that do not have parallel data connections. In fact, zero-shot is purely experimental and the numbers so far are rock bottom.
Although AI is still in its infancy for translation and other NLP, we know what nutrition it needs to grow toward maturity: data. This talk will conclude by describing two methods for collecting the data to power future MT. The first is already viable: crowds play games to provide terms for their languages, aligned semantically to a universal concept set. This method dispenses with computational inferences about how words map across languages, in favor of natural intelligence. Such data can provide the bedrock vocabulary, including inflections and MWEs, that hard-wired linguistic models and neural networks can then use to achieve grammatically and syntactically acceptable equivalence in other languages within the system. The second method is still in vitro. With vocabulary data in place, users can currently tag their intended sense of a word or MWE on the source side, constraining translations to terms that share the same sense. Those tags can become a rich vein of sense-disambiguated data from which machines can truly learn. AI cannot now translate among most languages, in most contexts, but, with sufficient well-specified, well-aligned data, it could.
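The idea of aligning terms to a shared concept set, and letting a source-side sense tag restrict the candidates, can be pictured as a lexicon keyed by concept rather than by word. The following Python sketch is an illustration under stated assumptions, not Kamusi's actual data model: the `ConceptLexicon` class and the concept identifiers `finance.bank` and `geo.riverbank` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class ConceptLexicon:
    """Hypothetical concept-aligned lexicon: each term is attached to a concept ID,
    and translation goes word -> concept(s) -> target-language terms."""

    def __init__(self) -> None:
        # concept_id -> {language code: term}
        self._terms: Dict[str, Dict[str, str]] = defaultdict(dict)
        # (language code, term) -> set of concept IDs the term can express
        self._senses: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, concept_id: str, lang: str, term: str) -> None:
        self._terms[concept_id][lang] = term
        self._senses[(lang, term)].add(concept_id)

    def candidates(self, term: str, src: str, tgt: str,
                   sense: Optional[str] = None) -> List[str]:
        """Target-language terms for `term`; a sense tag keeps only that concept."""
        concepts = self._senses.get((src, term), set())
        if sense is not None:
            concepts = concepts & {sense}
        return sorted(self._terms[c][tgt] for c in concepts if tgt in self._terms[c])


lexicon = ConceptLexicon()
lexicon.add("finance.bank", "en", "bank")
lexicon.add("finance.bank", "fr", "banque")
lexicon.add("geo.riverbank", "en", "bank")
lexicon.add("geo.riverbank", "fr", "rive")

print(lexicon.candidates("bank", "en", "fr"))                          # ['banque', 'rive']
print(lexicon.candidates("bank", "en", "fr", sense="geo.riverbank"))   # ['rive']
```

Without a sense tag, English "bank" maps to both French candidates; tagging the source with the riverbank concept leaves only "rive", which is the kind of sense-disambiguated signal the abstract describes.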
Martin Benjamin is the founder and director of Kamusi, an NGO dedicated to gathering linguistic data and setting that data to work within language technologies, with a major goal to include languages that are otherwise neglected in research and trade. He began Kamusi in 1994 as a sideline to a PhD in Anthropology at Yale, as a Swahili dictionary that was a very early online experiment in what would later be termed “crowdsourcing”.
In response to demands from other African languages, he developed a model for multilingual lexicography through which languages interlink at a fine-grained semantic level. These knowledge-based relations undergird underfunded translation technologies he is currently building for all 7111 ISO-coded languages.
Among his writings, he is the author of “Teach You Backwards: An In-Depth Study of Google Translate for 108 Languages”, an empirical investigation of Google’s results in the context of larger questions pertaining to the enterprise of Machine Translation. His lab is now based at the Swiss EdTech Collider at EPFL in Lausanne.
Professor of Translation
University of East Anglia
Jo’s first academic post was at the University of Reading. She then moved to the Department of French at Leeds University, where she was a founder member of the Centre for Translation Studies and ran the MA Applied Translation Studies for over a decade. Since joining UEA in 2012, Jo has taught specialist MA modules in translation technologies and professional translation, and an undergraduate module on translation and globalisation, as well as supervising PhDs in both applied and literary translation.
Jo holds an MA (Hons) in French Language and Literature and a PhD from Glasgow University, Scotland. She has also completed postgraduate training in teaching and learning in higher education. She was awarded a National Teaching Fellowship in 2008, when she also became a Fellow of the Higher Education Academy. She became an Honorary Fellow of the Chartered Institute of Linguists in 2018. Jo has previously served as a member of the AsLing TC Programme Committee and Organising Committee.
Khaled Ben Milad
University of Swansea
The TM match for each stored source-target pair is estimated by computing a similarity score between the TM source and the input segment. To filter out less similar segments, translators can set a minimum match threshold, for example 70% (Bloodgood and Strauss, 2014), so that only TM suggestions scoring at or above the threshold are shown, on the assumption that these will be the most useful. However, this raises the question of what qualities a TM source must display to be considered highly similar to the input segment. The answer is not as simple as might be supposed: useful target segments may exist in the TM but fail to be retrieved because their source segment differs from the input by a move (reordering) operation. This may be because the TM matching procedure searches for similarity using the same word order as the whole input segment. For instance, if a translator is given [abcd] as a source text but the TM source is [bacd], would the TM matching algorithm correctly compute the high similarity? And if not, why not?
Arabic word order is flexible: verbal sentences, which are the more common type, can also begin with the subject (Habash, 2010). We therefore designed an experiment to answer the question above. Our hypothesis is that current TM algorithms, which are based largely on Levenshtein edit-distance calculations (Simard and Fujita, 2012), will not provide appropriate fuzzy-match scores when a TM source differs from the input by a move (reordering) operation.
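To make the hypothesis concrete, the following Python sketch scores a TM source against an input segment using a word-level Levenshtein distance normalised by the length of the longer segment. This scoring formula is an assumption for illustration: commercial CAT tools do not publish their exact matching routines, and many compare characters rather than words. Under this metric the reordered pair [abcd] and [bacd] scores only 50%, because plain Levenshtein has no move or transposition operation and must spend two substitutions on the swap.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level edit distance: insertion, deletion and substitution each cost 1.
    There is no 'move' or 'transposition' operation."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_score(input_segment: str, tm_source: str) -> float:
    """Assumed fuzzy-match percentage: 100 * (1 - distance / longer length)."""
    a, b = input_segment.split(), tm_source.split()
    longest = max(len(a), len(b)) or 1  # avoid division by zero on empty input
    return 100 * (1 - levenshtein(a, b) / longest)


print(fuzzy_score("a b c d", "b a c d"))  # 50.0: the swap costs two edits
```

Because the distance for a single moved word stays at two edits while the denominator grows with segment length, longer reordered segments score higher under this kind of metric.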
Data and Methodology
The evaluation treated the TM as a ‘black-box’ component, and a test suite was used as an instrument to evaluate TM recall. To build the test, 85 Arabic verbal sentences ranging from three to ten words in length were extracted from the Arabic-English MeedanMemory. The subject of each test segment was a different type of sub-segment, either a single-word or a multiple-word unit. We then applied a move operation to each extracted segment: the verb (position 1) was swapped with the subject unit (position 2). As a result, the test segments were SVO while the TM sources were VSO, although the two sentences had identical meaning. The processed test segments were then submitted as a document for translation in five CAT tools, namely Déjà Vu X3, OmegaT 5.2, memoQ 9.0, Memsource Cloud and SDL Trados Studio 2019. In each tool, the document was uploaded for translation, MeedanMemory was attached as the TM, and the match threshold was set to 70%. Target segments matched at or above the threshold were presented in the proposal window, while lower matches were not.
The results show that, for segments that included a move operation, fuzzy-match scores fell as segment length decreased, although the matching metrics of the five CAT tools used different routines for handling such moves. Short segments that included a move operation received lower match scores than long ones, even though these segments were highly usable regardless of sentence length.
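The length effect can be illustrated with a simplified metric that is an assumption here, not any specific CAT tool’s routine: a word-level Levenshtein distance $d$ normalised by segment length $n$. When all words in a segment are distinct, moving one word (the verb) past the subject unit costs exactly two edits, one deletion and one insertion, however many words the subject unit contains:

```latex
S(n) = 100\left(1 - \frac{d}{n}\right), \qquad d = 2
\;\Rightarrow\;
S(3) \approx 33.3,\quad S(5) = 60,\quad S(6) \approx 66.7,\quad S(7) \approx 71.4,\quad S(10) = 80
```

Under this metric and a 70% threshold, a reordered segment would be proposed only from seven words upwards. The five tools’ actual scores differ, since each uses its own matching routine, but the direction is the same as the observed drop for short segments.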
The paper concludes that only longer sentences that include a move operation are likely to be presented as TM proposals; short sentences that include a move operation do not benefit from the TM. A possible explanation for the low match scores is that the TM systems’ matching metrics, which compare strings of surface forms, did not recognise the move operation.
The evolution of neural machine translation (NMT) has shown effectiveness and promising results for some European languages (Sennrich et al., 2016), but its success is limited to language pairs for which large amounts of parallel data are available; standard encoder-decoder models perform poorly for low-resource languages (Koehn and Knowles, 2017). For Arabic, a morphologically rich and low-resourced language, this approach has limitations that constrain translation quality. Several MT systems that have switched to NMT and offer Arabic<>English translation have been evaluated for translation quality. Abdelaal and Alazzawie (2020) investigated the quality of Google’s output when translating from Arabic to English using human ranking, and found that the output showed high semantic adequacy but contained some fluency errors. In a similar study, Al-mahasees (2020) tested two NMT systems, Google and Bing, alongside the Sakhr hybrid MT system, using adequacy and fluency ratings. The study, conducted twice (in 2016 and 2017) to track the systems’ development, found that Google outperformed the other systems in producing adequate and fluent output in both years and in both directions. The current study evaluates the translation quality of a set of free and commercial NMT systems in both directions, using human judgment and automatic metrics.
Data and Methodology
For the study, ten Arabic texts and ten English texts, each consisting of several sentences, were randomly extracted from the Arabic-English corpus LDC2004T18: test one comprises 117 Arabic sentences and test two 109 English sentences. The two test suites were translated with three free systems (Bing NMT, Google NMT and Yandex NMT) and with Lilt, a commercial interactive and adaptive system. Lilt, which only recently began supporting Arabic, was included to compare its translation quality against that of the free systems. The evaluation combined human judgment and automatic evaluation. For human judgment, a subset of four source sentences in each direction, each paired with the output of the four MT systems, was distributed in an online questionnaire in which participants rated adequacy and fluency on a four-point Likert scale according to TAUS quality criteria. For automatic evaluation, BLEU was applied.
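BLEU combines clipped n-gram precisions, through a geometric mean, with a brevity penalty. The following Python sketch computes an unsmoothed sentence-level score against a single reference and is illustrative only: reported scores are normally corpus-level, with n-gram counts aggregated over the whole test set, as standard tools such as sacreBLEU do. The `max_n` parameter is included on the assumption that the “BLEU 3” label used in the results refers to a maximum n-gram order of 3.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0:
            return 0.0  # without smoothing, one empty n-gram order makes the geometric mean zero
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: penalise hypotheses that are not longer than the reference.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision_sum / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat", max_n=3))  # 1.0
```

For example, the hypothesis "the the the" scored against the reference "the cat" with `max_n=1` gets 1/3 rather than 1, because the count of "the" is clipped to its single occurrence in the reference.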
Eleven respondents completed the adequacy and fluency questionnaire. Participants’ preferences varied from system to system. However, the overall mean scores reveal that the systems’ output conveyed meaning better than it achieved fluency, which suggests that heavier editing would be needed to make the output fluent during post-editing. A further observation is that Arabic-to-English translation received relatively higher scores than English-to-Arabic translation. A very common error in the systems’ output when Arabic was the target language was untranslated words, in particular proper nouns, which reduced translation quality. Overall, human evaluation rated Google as the most adequate and fluent MT system, followed by Bing, with the other two systems behind, in both translation directions. For automatic scoring, the BLEU 3 metric was run on the same translation pairs used in the questionnaire; Bing NMT scored best, followed by Google NMT, in both directions. Comparing the human judgments with the BLEU scores shows that, although the two procedures disagreed on which system produced the best translations, translating from a morphologically rich language was easier for the NMT systems than translating into one. We also ran the tests on ModernMT in May 2020, while it offered a free translation service in response to COVID-19. Its output received a higher BLEU score than Lilt and Yandex for Arabic-to-English translation, and the highest score for English-to-Arabic translation, where it did translate proper names.
This study investigated translation quality of a set of NMT systems in terms of Arabic<>English translation using human evaluation and automatic metrics. The results reveal that NMT systems produced more adequate translation than fluent translation in both directions. Further, producing fluent translation into Arabic was more difficult than into English. Moreover, Google NMT was rated by evaluators the most adequate and fluent system in both translation directions, while Bing NMT achieved the best BLEU score.
I am interested in investigating computer-based translation tools. I am a research student at Swansea University, UK. My PhD research evaluates the performance of translation memory systems when retrieving sentences that include Arabic linguistic features. A further strand examines the output quality of neural machine translation systems for Arabic<>English translation.
- 2016-Present: PhD student – Swansea University-UK
- Nov 2010-Jul 2013: Teacher at Almergib University, the Education College, Al-khomes, Libya
- 2007-2010: MA in Translation and interpreting. The Libya Academy, Libya.
- 1997-2001: BSc in English Language. Azzaytuna University.
Gloria Corpas Pastor
Language Technology for Interpreters: the VIP Project
Evidence of technological change, led by advances in digital technologies, is all around us, and the field of interpreting is no exception. Recent years have witnessed tremendous interest in language technologies and digital resources for interpreters (Braun, 2019; Drechsel, 2019). There is now a pressing need to develop interpreting technologies, with practitioners increasingly calling for tools tailored to their needs and their new work environments. However, technology uptake in the profession still appears rather limited and slow-paced, despite some evidence that the profession is heading towards a technological turn (Fantinuoli, 2018). While language technologies have already had a profound transformative effect on translation, they have not yet brought about a paradigm shift towards the interpreter’s “digital workplace”.
Although interpreting has not yet benefited from technology as much as its sister field, translation, interest in developing tailor-made solutions for interpreters has risen sharply in recent years. With the advent of new technology, interpreters can work remotely, deliver interpreting in different modes (consecutive, simultaneous, liaison, etc.) and contexts (conferences, courts, hospitals, etc.), on many devices (phones, tablets, laptops, etc.), and even manage bookings and invoice clients with ease. But, unlike translation, interpreting as a human activity has resisted complete automation for various reasons, such as fear, lack of awareness, communication complexities and the lack of tools tailored to interpreters’ needs (Mellinger and Hanson, 2018).
Several tools have been developed to meet interpreters’ needs, mainly computer-assisted interpreting (CAI) tools and computer-assisted interpreting training (CAIT) tools, but the support they provide is rather modest (Wang and Wang, 2019). Today, CAI tools essentially comprise terminology management tools, corpora and note-taking applications (for an overview, see Corpas Pastor, 2018; Fantinuoli, 2017; Rütten, 2017; Xu, 2018; and Braun, 2019). There are almost no terminology tools to assist interpreters during interpretation or in the follow-up to interpreting assignments, and existing tools cannot be fully integrated into the interpreter’s workflow. There is a severe lack of purpose-built tools that fulfil interpreters’ needs and requirements. State-of-the-art tools suffer from further limitations, including platform dependency, cross-platform access problems, integration and interoperability issues, low precision and recall, a low degree of automation, lack of support for multiple exchange formats, and the absence of robust cross-lingual NLP methodology and speech technology, among other problems.
This paper will present the results of an R&D project (VIP: Voice-text integrated system for interpreters) on language technologies applied to interpreting. Interpreters need to be equipped with tools that support new functionalities and can provide assistance during all phases of the interpreting process (both onsite and remote), including self-assessment and training. The VIP platform provides access to a wide range of tools and resources to assist interpreters in the preparation phase, during a given interpreting job and after the assignment (for training, life-long learning and follow-up purposes). VIP integrates terminology tools, corpus building and processing, automatic glossary building, automatic speech recognition and quality assessment applications, among others. VIP is freely accessible to researchers and practitioners.
The paper will be structured as follows. The first section will provide an overview of existing tools and resources for interpreters. The second section will describe the VIP tool, an environment designed to assist interpreters during the entire process (preparation phase, interpreting job and follow-up).
Publications Office of the European Union
Maria Recort Ruiz
International Labour Office
Terminology: Towards a Systematic Integration of Semantics and Metadata
At a time when technological advances are fostering the modernisation of public services and supporting a rapidly growing volume of information exchanges between public administrations, across borders and sectors, the need for interoperability is greater than ever. Interoperability can be subdivided into four layers (1): legal, organisational, semantic and technical. Attaining it without creating new digital barriers for administrations, businesses and citizens requires public services to share a common terminology, to define stable interfaces between them, and to be aware of existing solution building blocks already developed by others (2,3).
For citizens and for most officials and public service staff, the tip of the iceberg takes the form of bits and pieces of information disseminated through the internet and intranets, and is tied to basic expectations: that this information be available, findable, retrievable and reusable. In turn, these expectations depend on how information is created, and on its understandability, portability, management and sharing.
These characteristics lie at a crossroads for drafters, linguists, publishers and information management specialists, who are all stakeholders in the authoring-translation-publishing (ATP) chain but who do not always perceive the need for a unified workflow with transversal features such as standard formats and metadata. Aware of this and of the need for a paradigm shift, the Publications Office of the European Union and the International Labour Office began an informal collaboration in 2019, supported by corpus and terminology managers and information management specialists. Their basic objectives were to further enhance the quality and accessibility of terminology and semantic assets, to move to a systematic use of semantics, and to build on linked data. The project was divided into several phases; the results of the first phase and the objectives of the second were presented at the 41st Translating and the Computer Conference.
Phase one focused on aligning a major EU semantic asset (EuroVoc) with ILO semantic and terminological assets (a thesaurus and a taxonomy, as well as the ‘gig’ economy glossary). This multifaceted process was facilitated by semantic web technologies and standards. The results underwent an initial theoretical assessment. The following phase centred on the human assessment and validation, by specialists, of automatically generated mappings between concepts in all four vocabularies. During this process, possible quality improvements to the assets were also identified and led to concrete maintenance actions. Both organisations also contributed to the semantic and technical interoperability layers mentioned above, for example through the use of a new alignment and data visualisation tool.
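Automatically generated concept mappings of this kind are commonly expressed with the W3C SKOS mapping properties `skos:exactMatch` and `skos:closeMatch`; whether this project used exactly these properties is an assumption. The Python sketch below proposes candidate mappings by comparing normalised preferred labels with `difflib.SequenceMatcher` and marks each one as unvalidated, leaving the decision to human specialists. The URIs, labels and the 0.85 threshold are hypothetical, and real alignment tools draw on far richer evidence than a single label.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


@dataclass(frozen=True)
class Concept:
    uri: str    # hypothetical concept URI
    label: str  # preferred label in the comparison language


@dataclass(frozen=True)
class CandidateMapping:
    source: str
    target: str
    relation: str            # SKOS mapping property
    score: float
    validated: bool = False  # only a specialist reviewer should set this to True


def normalise(label: str) -> str:
    return " ".join(label.lower().split())


def propose_mappings(source: List[Concept], target: List[Concept],
                     close_threshold: float = 0.85) -> List[CandidateMapping]:
    candidates = []
    for s in source:
        for t in target:
            a, b = normalise(s.label), normalise(t.label)
            if a == b:
                candidates.append(CandidateMapping(s.uri, t.uri, "skos:exactMatch", 1.0))
                continue
            score = SequenceMatcher(None, a, b).ratio()
            if score >= close_threshold:
                candidates.append(CandidateMapping(s.uri, t.uri, "skos:closeMatch", round(score, 2)))
    return candidates


def to_turtle(mappings: List[CandidateMapping]) -> str:
    """Serialise mappings as Turtle triples."""
    lines = [SKOS_PREFIX]
    for m in mappings:
        lines.append(f"<{m.source}> {m.relation} <{m.target}> .")
    return "\n".join(lines)


eurovoc_like = [Concept("http://example.org/eurovoc/c1", "Labour market"),
                Concept("http://example.org/eurovoc/c2", "Fisheries")]
ilo_like = [Concept("http://example.org/ilo/t1", "labour market"),
            Concept("http://example.org/ilo/t2", "gig economy")]
print(to_turtle(propose_mappings(eurovoc_like, ilo_like)))
```

Here only the case-insensitive label match produces a candidate. In a workflow like the one described, such candidates would be published only after a specialist had reviewed them and set `validated` to True.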
This multi-step process brought to light both limitations (achieving full cross-understanding of specific standards between linguists and information management specialists, technical means, acceptance of the project, and implementation of deliverables) and solutions and achievements relevant to both organisations and to communities of professionals.
Although moving towards semantic web technologies proved to be a big step for producers (authors and linguists), publishers and consumers of data, it brought benefits for all stakeholders in the ATP chain.
We propose to briefly present the purposes, steps and current results of this project, along with the technologies used, as a follow-up to the previous Translating and the Computer conference.
Live Speech-to-Text and Machine Translation Tool for 24 Languages
The European Parliament works in 24 languages and is committed to ensuring the highest possible degree of resource-efficient multilingualism. In the European Parliament, all parliamentary documents are published in all of the EU’s official languages, which are considered equally important.
The right of each Member of the Parliament to read and write parliamentary documents, follow debates and speak in his or her own official language is expressly recognised in the Parliament’s Rules of Procedure.
Multilingualism also makes the European institutions more accessible and transparent for all citizens of the Union, which is essential for the success of EU democracy. Europeans are entitled to follow the Parliament’s work, ask questions and receive replies in their own language, under European legislation.
In order to produce the different language versions of its written documents and communicate with EU citizens in all the official languages, the European Parliament maintains an in-house translation service able to meet its quality requirements and work to the tight deadlines imposed by parliamentary procedures. Interpreting services are provided for multilingual meetings organised by the official bodies of the institution.
Despite these efforts, deaf and hard of hearing people cannot currently follow the European Parliament debates in real time. Subtitling by human transcribers, with the high degree of multilingualism required, is a highly resource-intensive task. Automatic live transcription of parliamentary debates would make them accessible to people with disabilities and thus improve the services the Parliament offers to citizens.
The European Parliament already uses technology as a support to the translation process. Offline automatic speech recognition is used to facilitate the production of the verbatim transcript of plenary sessions.
The European Parliament has begun to explore further the potential of online automatic speech recognition and machine translation technologies for the 24 languages, with a view to managing multilingualism efficiently and providing better, cost-efficient cross-lingual communication services for its Members and for European citizens.
We will present how the European Parliament wishes to support multilingual innovation and the digitalisation of all EU official languages through targeted developments relying heavily on the use of AI technologies.
The Parliament therefore aims to enter into an Innovation Partnership in order to acquire a licence for a tool that can automatically transcribe and translate multilingual parliamentary debates in real time.
The tool will also be able to learn from corrections and supporting data as well as from user feedback, so as to enhance quality levels over time.
The objective is to provide a useful service for Members of the European Parliament in accessing debates on screen as well as to provide accessibility for deaf and hard of hearing people who currently have no direct access to the debates of the European Parliament.
The ultimate goal is to provide an automatic transcription and machine translation service for parliamentary debates covering the 24 official languages used by the institution.
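The intended workflow can be sketched in Python: a recognised speech segment is transcribed once, then translated into each of the other official languages. This is a hypothetical illustration, not the Parliament’s specification. The speech recognition step is assumed to have already produced the `transcript`, the `translate` callable stands in for any MT engine, and the correction memory is a deliberately simple stand-in for learning from human corrections.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# The 24 official EU languages (ISO 639-1 codes).
EU_LANGUAGES = ("bg cs da de el en es et fi fr ga hr hu "
                "it lt lv mt nl pl pt ro sk sl sv").split()

# Hypothetical MT interface: (text, source_lang, target_lang) -> translation.
Translator = Callable[[str, str, str], str]


@dataclass
class LiveSubtitler:
    translate: Translator
    corrections: Dict[str, str] = field(default_factory=dict)

    def learn_correction(self, wrong: str, right: str) -> None:
        """Remember a human fix to a recognised phrase (toy stand-in for learning)."""
        self.corrections[wrong] = right

    def process_segment(self, transcript: str, source_lang: str) -> Dict[str, str]:
        """Apply known corrections, then translate the segment into every other EU language."""
        for wrong, right in self.corrections.items():
            transcript = transcript.replace(wrong, right)
        outputs = {source_lang: transcript}
        for target in EU_LANGUAGES:
            if target != source_lang:
                outputs[target] = self.translate(transcript, source_lang, target)
        return outputs


# Usage with a stub engine that only tags the text with the target language.
subtitler = LiveSubtitler(translate=lambda text, src, tgt: f"[{tgt}] {text}")
subtitler.learn_correction("Parlament", "Parliament")
out = subtitler.process_segment("The Parlament votes", "en")
print(len(out), out["en"], out["fr"])
```

With the stub engine, the English segment yields 24 outputs: the corrected transcript under "en" and one tagged stand-in translation for each of the other 23 languages.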
Head of Speech to Text Unit, DG TRAD, European Parliament, Brussels Area, Belgium
- European Parliament (2003 – 2016)
- Head of Unit / Interpretation manager
- Adam Mickiewicz University
- Assistant Professor (2001 – 2003)
- PhD student/lecturer (1997 – 2001)
- Freelance translator & interpreter (1996 – 2003)
- DomData AG sp. z o. o. In-house translator (1996 – 1999)
- Uniwersytet im. Adama Mickiewicza w Poznaniu Doctor of Philosophy (PhD), Linguistics (2001)
- Monterey Institute of International Studies Junior Fulbright Scholar, Linguistics (Computer Assisted Translation) · (1999 – 2000)
- Uniwersytet im. Adama Mickiewicza w Poznaniu Master of Arts (MA), English (1997)
- University College Galway TEMPUS grant, European life & institutions; English language & literature; Gaelic
The Role and Perspective of the Post-Editor: What Are the Training Challenges?
The present paper aims to provide an analysis of the skills and competences required to perform Post-Editing tasks, in an attempt to set the tone for an effective and specialized Post-Editing training protocol. Post-Editing of Machine Translation (MT) is an increasingly popular linguistic service, already widely adopted within the language industry as a cost-effective and time-saving solution with the potential to increase productivity. Initially conceived as a mere “human partner” to the MT engine (Vieira et al., 2019), the Post-Editor has in recent years emerged as a new role in its own right, though one characterized by certain ambiguities regarding the scope and nature of his/her work. Previous analyses have provided insight into this new role: the seminal works of Allen (2003) and Krings (2001) contributed to a theorization of the Post-Editing process, and that of Rico & Torrejon (2012) focused on skills and training guidelines.
Drawing on these important contributions, this paper will attempt to link Post-Editing-specific skills, following the typology developed by Rico & Torrejon (2012), to the concept of the effort required to carry out mental processes during Post-Editing, as described by Krings (2001). To this end, we shall correlate linguistic skills, instrumental competences and core competences (Rico & Torrejon, 2012) with four types of effort: the temporal, technical and cognitive effort identified by Krings (2001), plus psychological effort, which we add to Krings’ traditional categorization. Recent research (O’Brien & Moorkens, 2014) as well as empirical evidence suggests that Post-Editing is widely viewed as a disruptive force within the translation profession, with many linguists expressing scepticism towards Post-Editing as a service and some resisting what they perceive as a threat to their future employability. We argue that much of this negative criticism actually stems from a lack of comprehensive, widely available training that would provide guidelines and dispel ambiguities concerning speed and productivity standards, as well as compensation models. We further suggest that Post-Editing, combined with other activities, could enable linguists to expand the scope of their work and increase their overall job satisfaction. Post-Editing should not be seen as a replacement for traditional translation roles, but rather as a shift in skills and common practices enabled by language technology.
2. METHODOLOGY
The present paper will demonstrate the challenges involved in setting up a comprehensive Post-Editing training protocol and attempt to define its content, supported by empirical research results from a digital survey among language industry professionals (Gene, 2020). The survey was conducted in the context of a Post-Editing Webinar and its sample comprised representatives of Language Service Providers (LSPs) and Single Language Vendors (SLVs). The results presented here reflect the answers of 51 LSP and 12 SLV representatives.
3. FINDINGS
The paper concludes by discussing the most important challenges related to the training of Post-Editors. To do so, it presents the results of a relevant survey (Gene, 2020) conducted among 63 members of the language community (51 LSP and 12 SLV representatives). The survey indicates that the greatest concerns relate to a lack of detailed Post-Editing instructions, exaggerated speed and productivity models and ambiguous payment models.
4. CONCLUSION
With this analysis we wish to shed light on the elements that a future training scheme should take into account, in order to better prepare linguists for the increasingly common Post-Editing requests and help change attitudes towards this new service.
Viveta Gene is a Translation & Localization Industry Specialist at Intertranslations S.A. With more than 15 years of experience as a linguist and vendor manager, she recently decided to combine her expertise and know-how to become a language solutions specialist. Viveta holds an MA in Translation and New Technologies from the Department of Foreign Languages and Interpreting of the Ionian University. Her main focus is to promote new trends in the industry, where translation skills meet MT technology. MT tools and post-editing techniques are amongst her key fields of interest. This year she is about to start her PhD in the field of post-editing.
NMT plus a Bilingual Glossary: Does this Really Improve Terminology Accuracy and Consistency?
As an LSP offering global multilingual solutions, at CPSL we receive many different translation and localisation requests from our customers. Sometimes, due to budget and time constraints or flexible quality expectations, the most cost- and/or time-effective solution for their translation needs is a machine translation (MT) and post-editing (PE) workflow. When there is not enough time or data to train domain-specific systems, and especially for one-off urgent requests, we consider testing stock NMT systems (generic neural machine translation engines), as these can deliver acceptable results across a wide variety of content types.
Apart from this versatility, another advantage of such systems is that most of them are integrated into the main commercial and open-source CAT tools via plugins or connectors. In terms of quality, fluency is one of the most praised improvements of NMT over phrase-based systems. Terminological accuracy, however, remains far from solved: most of our post-editors and MT evaluators note in their reports that a given term is not translated consistently across the whole document, even when the MT system has been specifically trained with reliable translation memories and validated glossaries.
Some tech giants that offer NMT, such as Google and Amazon, now allow users to upload a bilingual glossary to their cloud servers in order to improve terminological accuracy and consistency. At CPSL we considered that this new feature clearly deserved our attention, and in order to test and use it we developed a series of scripts and programmes that integrate it into our current CAT tools and workflows.
We have been using this feature for some time now with several content types and language combinations, and collecting feedback from our post-editors and MT evaluators. In this presentation we would like to share our findings so far and try to answer questions such as: to what extent have terminology accuracy and consistency improved? Do the results depend on the language combination, or on specific domains? We would also like to use the AsLing TC42 conference as an opportunity to collect improvement suggestions and pass them on to the MT developers.
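One simple way to put a number on the consistency question is to check, for every segment whose source contains a glossary term, whether the MT output contains the approved target term. The sketch below is a hypothetical measure, not CPSL's evaluation method; the function name `term_adherence` and the EN>ES sample data are invented for illustration. Exact whole-word matching ignores inflected forms, so it will undercount hits in morphologically rich target languages.

```python
import re
from typing import Dict, List, Tuple

def term_adherence(
    segments: List[Tuple[str, str]], glossary: Dict[str, str]
) -> Dict[str, Tuple[int, int]]:
    """Return {source_term: (segments using the approved target term,
    segments whose source contains the term)}."""
    report = {}
    for src_term, tgt_term in glossary.items():
        # Whole-word, case-insensitive matching; \b is Unicode-aware for str patterns.
        src_pat = re.compile(r"\b" + re.escape(src_term) + r"\b", re.IGNORECASE)
        tgt_pat = re.compile(r"\b" + re.escape(tgt_term) + r"\b", re.IGNORECASE)
        hits = total = 0
        for source, mt_output in segments:
            if src_pat.search(source):
                total += 1
                if tgt_pat.search(mt_output):
                    hits += 1
        if total:  # only report terms that actually occur in the source
            report[src_term] = (hits, total)
    return report

# Hypothetical EN>ES sample: the second segment uses "Salvar" instead of the approved "Guardar".
segments = [
    ("Click the Save button.", "Haga clic en el botón Guardar."),
    ("The Save button is disabled.", "El botón Salvar está desactivado."),
    ("Open the settings menu.", "Abra el menú de configuración."),
]
glossary = {"Save": "Guardar", "settings": "configuración"}

for term, (hits, total) in term_adherence(segments, glossary).items():
    print(f"{term}: {hits}/{total} segments use the approved translation")
```

Running the same check on output produced with and without the uploaded glossary would give a simple per-term before/after comparison.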
Lucía Guerrero is a Machine Translation Specialist at CPSL, a language services provider based in Spain with a presence in Germany, the UK and the US. Its range of services includes translation, software and web localization, multilingual SEO, interpreting, multimedia and e-learning in all major Western and Eastern European, Scandinavian, Asian and Middle-Eastern languages.
Lucía is also part of the collaborative teaching staff at the Universitat Oberta de Catalunya. Having worked in the translation industry since 1998, she has also been a senior Translation and Localization Project Manager specialized in international institutions, has managed localization projects for Apple Computer and has translated children’s and art books.
Becoming a Machine Translation Coach
Although Artificial Intelligence development in machine translation is leading to lower prices, higher efficiency and faster translation for businesses, much remains to be done to create a robust system that covers all known languages and all specialised subject areas with the same level of quality. This is why a data quality management system is key to using the technology safely. We like the analogy of “coaching” the NMT output rather than post-editing it, because NMT learns not just from the translation that the human translator produced but also from other feedback: NMT engines learn from both bilingual and monolingual data.
The post-editor contributes to the bilingual data. After receiving the human post-edit of a given segment's translation, the goal is to learn from the “segment + its post-edit” pair and induce the model to translate the next input segment better. This means that, over time, the linguist's task will involve less fixing of grammatical errors and more checking of whether the translation is correct, making post-editing more enjoyable.
Julie holds degrees in Specialised Translation and Law. Her career has seen her manage translation and communication for BMO Bank of Montreal, as well as financial and legal translation projects at major Language Service Providers in France and the UK. These roles were a natural fit for Julie, a passionate communicator who speaks fluent French, Spanish and English.
Currently, Julie holds responsibility for Asian Absolute’s global Sales and Operations teams. She also personally led the start-up of the company’s operations in Bangkok and Panama City.
She has over 10 years of professional experience in multilingual communications and AI applications in linguistics. She is a regular guest speaker at localisation and tech/AI events, and recently spoke at the ATC Summits 2017 and 2018, EUATC 2018, Connected World Summit 2018, AI & Big Data Innovation Summit 2018 in Beijing and IP EXPO Manchester 2019.
Adam Mickiewicz University in Poznań
& XTM Group
Assessing Cross-lingual Word Similarities Using Neural Networks
One of the classic problems in natural language processing is word-level alignment. An algorithm is given a bilingual corpus, i.e. a set of sentence pairs in which source sentences are paired with their translations (the target sentences). For each of these pairs, the algorithm has to decide which words in the source and target sentences are each other’s counterparts.
The problem of automatic word-level alignment was tackled primarily by statistical algorithms (see the IBM models). This approach to word alignment was well suited to its main purpose (1), statistical machine translation (SMT). The standard workflow for SMT involved a training phase during which, among other things, the word alignments were computed. Long training times were accepted in that setting, and the statistical word alignment algorithms were correspondingly slow. However, information about the word alignments between a sentence and its translation can be used in several other applications. In our scenario, CAT (Computer-Assisted Translation) tools can use this information to perform automatic text correction.
To enable automatic correction, word alignments for a given sentence pair must be predicted on the fly, i.e. without a lengthy training phase. No automatic correction in a CAT tool may take more than a few seconds (ideally no more than one second); otherwise such mechanisms would not speed up the translation process. Standard statistical word alignment, however, must be run on the whole corpus to ensure the quality of results, and this can take hours.
Apart from speed, alignment quality is another challenge in our CAT scenario. The IBM statistical models, even when trained on very large corpora, exhibit observable shortcomings. These were tolerable in machine translation training but are not acceptable in automatic text correction, where the highest possible alignment quality is required. The third challenge is multilingualism: the word alignment mechanism must operate on over a hundred languages and every possible pair among them. Since the statistical approach requires separate training for each translation direction, supporting 100 languages would involve 100 × 99 = 9,900 separate training runs, each of which would also require its own bilingual corpus.
Our approach
As a step towards automatic prediction of word alignments, we introduced a mechanism for predicting the similarity between words in different languages. Its input is a pair of words, one in the source language and the other in the target language. Its output is a real number between 0 and 1 estimating the probability that the target word is a translation of the source word.
The technique of computing interlingual similarity scores between words relies on data provided by Facebook (2) and BabelNet (3). The data is processed using the following procedure.
First, the Facebook vectors are downloaded as text files. These files are dictionaries (one language per file) containing the top 2,000,000 words with their vector representations. These word representations (also called word embeddings) were obtained by Facebook using an auto-encoder neural network on the CommonCrawl corpus. It has been observed that examining mathematical relations between these vectors makes it possible to find words that are semantically similar (4). This, however, only applies to word similarities within one language. To examine similarities between words across languages, an additional step is needed: aligning the vectors (5) between languages by computing a transformation matrix. The matrix is trained on a bilingual dictionary, for which we used licensed resources provided by BabelNet. To handle multiple languages, we adopted the following procedure: for English, we use the vectors provided by Facebook directly; for every other language, we first compute the transformation matrix from that language into English using BabelNet dictionaries. In this way we can support every language direction for which Facebook provides source and target language vectors. Sample results of similarity calculation between Italian and English:
| English word | Italian word | Similarity |
|---|---|---|
| cat | gatta (female cat) | 0.552 |
| cat | giorno (day) | 0.164 (low, as expected) |
| day | fuoco (fire) | 0.193 (low, as expected) |
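The mapping step described above can be sketched as follows. The text only says that a transformation matrix into English is trained on a bilingual dictionary; this sketch assumes the common orthogonal Procrustes solution (an SVD of X^T Y), and it assumes that a score in [0, 1] is obtained by clipping the cosine similarity at zero. The function names are hypothetical, the vectors are synthetic rather than fastText embeddings, and the sketch does not reproduce the figures in the table.

```python
import numpy as np

def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimising ||X @ W - Y||_F.

    Row i of X is the source-language vector and row i of Y the English
    vector of the i-th bilingual dictionary pair.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def similarity(vec_a: np.ndarray, W_a: np.ndarray,
               vec_b: np.ndarray, W_b: np.ndarray) -> float:
    """Map both vectors into the English space; return their cosine clipped to [0, 1]."""
    a, b = vec_a @ W_a, vec_b @ W_b
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return min(1.0, max(0.0, cos))

# Synthetic check: "Italian" vectors are a hidden rotation of the "English" ones.
rng = np.random.default_rng(0)
dim, n_pairs = 8, 200
english = rng.normal(size=(n_pairs, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
italian = english @ rotation.T       # so italian @ rotation == english

W_it = procrustes(italian, english)  # Italian -> English
W_en = np.eye(dim)                   # English vectors are used directly
print(similarity(italian[0], W_it, english[0], W_en))  # ~1.0: a dictionary pair
print(similarity(italian[0], W_it, english[1], W_en))  # unrelated pair
```

Because every language is mapped into English, 99 matrices are enough to score all 9,900 directions among 100 languages, instead of one training run per direction.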
Conclusions
Preliminary results obtained for the English-Italian language pair are promising. With the help of the similarity computation mechanism, we will pursue the goal of developing a robust online word aligner.
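As an illustration of how a pairwise similarity score could feed an on-line aligner, the sketch below links words greedily, highest score first, keeping each word in at most one link and discarding pairs below a threshold. This is a hypothetical design, not the authors' aligner: the one-to-one restriction, the threshold of 0.3 and the toy score table standing in for the similarity mechanism are all assumptions.

```python
from typing import Callable, List, Tuple

def greedy_align(
    src: List[str],
    tgt: List[str],
    sim: Callable[[str, str], float],
    threshold: float = 0.3,
) -> List[Tuple[int, int]]:
    """Return sorted (source index, target index) links, at most one link per word."""
    scored = sorted(
        ((sim(s, t), i, j) for i, s in enumerate(src) for j, t in enumerate(tgt)),
        reverse=True,  # best-scoring pairs first
    )
    used_src, used_tgt = set(), set()
    links = []
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining pairs score lower
        if i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)

# Toy scores standing in for the cross-lingual similarity mechanism (hypothetical values).
TOY_SCORES = {("the", "il"): 0.5, ("cat", "gatto"): 0.8, ("sleeps", "dorme"): 0.7}

def toy_sim(s: str, t: str) -> float:
    return TOY_SCORES.get((s, t), 0.1)

print(greedy_align(["the", "cat", "sleeps"], ["il", "gatto", "dorme"], toy_sim))
# [(0, 0), (1, 1), (2, 2)]
```

Scoring all word pairs of a single sentence pair is cheap, which is what makes this kind of per-sentence alignment compatible with the on-the-fly requirement; no corpus-wide training is involved.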
(1) Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2): 263–311.
(4) Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems 26.
Rafał Jaworski, PhD, is an academic lecturer and scientist specializing in natural language processing techniques. His alma mater is Adam Mickiewicz University in Poznań, Poland, where he works at the Department of Artificial Intelligence.
His scientific work concentrates on developing robust AI algorithms for computer-assisted translation. These include, among others, automatic lookup of linguistic resources and computer-assisted post-editing.
Apart from the research and teaching, he works as Linguistic AI Expert at XTM International, leading a team of young and talented developers who put his visions and ideas into practice.
University of Wolverhampton
How Can Terminology Extraction and Management Technology Help Language Professionals in Broadcast Media?
Modern broadcast media is characterised by a high degree of internationalisation. Most major media outlets are either multilingual by design (e.g. BBC, France 24, VOA, RT) or function in a plurilingual environment due to the globalised nature of the current news agenda. News organisations seek to ensure their viewers can access the latest developments with minimal latency, which is sometimes achieved through live on-air interpretation of speeches, announcements, press conferences and other events.
While broadcast media interpreting is intrinsically a technology-reliant process, technological solutions that specifically facilitate the work of media interpreters still appear to be relatively scarce. This paper proposes a prototype of a digital tool that could be used by interpreters working in multilingual broadcast media. The tool could serve as an aid during multiple stages of the interpreting process, including assignment preparation, in the booth and during the post-assignment de-briefing. Its key functions would include terminology extraction, terminology management and, optionally, speech recognition in the booth. The prototype combines several existing technological solutions and could interface with available tools offering these solutions.
Since the tool is designed for multilingual media outlets, it takes into account the specific workflows in these contexts. In news media outlets, translation-related activities are often not considered “essentially dissimilar from other tasks involved in the production of news” (Bielsa, 2007, p. 143). This highlights an interesting dichotomy: while interpreting as a process does not technically overlap with journalism per se, in terms of terminology, semantics and pragmatics it is still interlaced with other activities that take place in a newsroom. From the point of view of practicality and marketability, then, a technological tool designed for the media interpreter should ideally provide transferable solutions that could be employed both by interpreters and by other agents involved in plurilingual news production, such as translators, news writers or output editors. This paper therefore focuses on the applicability of such a tool in the area where the activities of interpreters and other language specialists could overlap the most: terminology extraction.
The prototype was tested on the Russian-English language pair using Vladimir Putin’s annual, wide-ranging press conference, as complete official transcripts of all recent editions of this event are openly accessible online in both Russian and English, which facilitated the compilation of pilot corpora. A 268,000-word parallel corpus was created and then run through automatic terminology extraction tools in two configurations: as a bilingual corpus of complete unedited transcripts, and as pre-processed subject-specific bilingual subcorpora, for which the material had to be categorised by topic manually. Tools provided by Sketch Engine and Terminotix were used in the tests. In our case, the solution that appeared to yield the most comprehensive results while requiring the least post-editing was automatic bilingual term extraction from subject-specific parallel subcorpora, carried out in Synchroterm by Terminotix. Indeed, as our tests have demonstrated, a corpus of full unedited transcripts of previous press conferences might not be the best fit for automatic term extraction, as such texts contain both spontaneous speech and common language, which generate noise. Pre-processed, thematically arranged corpora appear to be a better option, since bilingual term lists automatically generated from this type of corpus require less post-editing.
As the pilot run has shown, although the proposed tool design could potentially facilitate various aspects of a broadcast media interpreter’s work, the tool’s other modules would need to be carefully tailored to the task at hand for best results. For terminology management, the tool should make it easy to run a variety of search types. In addition, results would ideally be shareable to allow for teamwork between different members of the news production team; a placeholder solution meeting these criteria was adopted for testing purposes. Although initial tests of Automatic Speech Recognition were conducted, further testing using professional hardware is necessary.
In conclusion, some components of the proposed tool could simplify the work of interpreters and other language specialists in the media. Future work will test the prototype’s actual applicability and practical benefits, interpreters’ performance with it, and their attitudes towards it, with media interpreters working from Russian into English.
Bielsa, Esperança. 2007. Translation in Global News Agencies. Target, 19(1): 135–155. doi: 10.1075/target.19.1.08bie
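The extraction tools are not described in detail here. As a rough illustration of why thematic subcorpora reduce noise, the sketch below ranks single-word candidates with the "simple maths" keyness score that Sketch Engine documents for keywords, (fpm_focus + n) / (fpm_reference + n), where fpm is frequency per million tokens and n is a smoothing constant (1 by default). It is a deliberate simplification under stated assumptions: no lemmatisation, no multiword terms, no bilingual pairing, and the toy texts are invented. It does not reproduce Synchroterm's method.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

def per_million(tokens: List[str]) -> Dict[str, float]:
    """Relative frequency of each token, scaled to occurrences per million tokens."""
    counts = Counter(tokens)
    return {w: c * 1_000_000 / len(tokens) for w, c in counts.items()}

def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())  # \w also matches Cyrillic letters

def keyness(focus: str, reference: str, n: float = 1.0, top: int = 5) -> List[Tuple[str, float]]:
    """Rank focus-corpus words by (fpm_focus + n) / (fpm_reference + n), highest first."""
    f_fpm = per_million(tokenize(focus))
    r_fpm = per_million(tokenize(reference))
    scores = {w: (f + n) / (r_fpm.get(w, 0.0) + n) for w, f in f_fpm.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

# Invented toy texts: an "energy" subcorpus versus general-language reference text.
energy = "gas pipeline sanctions and the gas pipeline transit of the gas"
general = "the weather and the news of the day and the people of the city"
for word, score in keyness(energy, general, top=4):
    print(f"{word}\t{score:.1f}")
```

Words that are just as common in the reference text ("the", "and", "of") score below 1 and drop out of the candidate list, which is the effect a thematically coherent focus corpus makes stronger.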
Daniya Khamidullina is a second-year MA student from the first cohort of the European Master’s in Technologies for Translation and Interpreting (EM TTI). After obtaining an undergraduate degree in Linguistics, Translation and Interpreting at Lomonosov Moscow State University, Daniya worked for international media outlets (BBC, RT Spanish) as an in-house translator, interpreter and localisation supervisor for several years. Her master’s dissertation research focuses on the use of translation technology in audiovisual translation workflows.
STAR Corporate Language Management: Translation Management in Times of Distancing
In times of change we look for new ways to perform our regular tasks, such as working from home without seeing our colleagues or being able to work with them face-to-face. Despite the distance, we need to cooperate with each other, discuss, guide and teach our workmates, learn from each other, make plans together and do our best to promote the team’s success.
But how can we boost our teamwork while keeping our distance? Using STAR Corporate Language Management as an example, we will show you how to work together effectively. With our management tool, remote teams and individuals can manage the translation process in a well-planned way and improve their collaboration. You will see how to use STAR Corporate Language Management and what its benefits are, how to coordinate translation within a web-based platform, and how to achieve your best results working together, completely virtually.
We will create a translation project simulating a normal workflow from beginning to completion. The audience will see how a workflow may be modified depending on the company’s structure and needs, and how all the participants cooperate with each other within the created model.
Working with STAR Corporate Language Management reduces administrative duties and, especially now that all parties involved are working from different locations, links all participants through defined processes. Be part of a fictitious team and simulate a real translation order with us. Observe how the different roles interact with each other within the platform. Team members take action only when their respective step comes and their expertise is needed. To achieve a smooth transition from one step to the next, it is important to model the right workflow, define the roles and tasks, and automate the routine. In our example we will use the model Customer-Translator-Reviewer-Customer. However, workflow customisation is the key to success here, and since processes vary from company to company, we will give you an idea of which roles and steps are primary and how they can be extended.
Another important topic in remote working is how tasks are assigned: via a task pool or by direct job assignment. Task pools have many benefits, but also disadvantages compared to direct job assignment. We will look into them and analyze the role of the project manager in both scenarios. Project managers are the team leaders when it comes to translation orders. We will give valuable hints on how to optimize their work and get the best results using web tools, so that they are well equipped to handle project scheduling, standardization, resource optimization, reporting and controlling.
We all face the problems of remote teamwork, such as transparency and project security. We will show how to solve them efficiently and how to manage company-wide processes with one tool, virtually bringing people together.
Olena Kharchenko graduated as a translator from the Karazin-University Kharkiv in 2013. In 2017 she obtained her master’s degree from the Johannes-Gutenberg-University Mainz, where she studied conference interpreting in Russian, German and English.
At the beginning of 2018 she joined the Second Level Support Team at STAR Group, focusing on support and training for STAR TermStar/Transit, WebTerm and CLM.
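The principle that each participant acts only when their step comes can be sketched as a linear workflow in which only the role owning the current step may complete it. This is a hypothetical model for illustration, not STAR CLM's API or data model; the `Workflow` class and its methods are invented, and real workflows would add branching, deadlines and notifications.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Workflow:
    steps: List[str]                      # role responsible for each step, in order
    position: int = 0                     # index of the current step
    log: List[str] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.position >= len(self.steps)

    @property
    def waiting_for(self) -> str:
        return "nobody" if self.done else self.steps[self.position]

    def complete(self, role: str) -> None:
        """Let `role` finish the current step; any other role is refused."""
        if self.done:
            raise RuntimeError("workflow already finished")
        if role != self.waiting_for:
            raise PermissionError(f"{role} must wait; current step belongs to {self.waiting_for}")
        self.log.append(role)
        self.position += 1

# The Customer-Translator-Reviewer-Customer model from the text.
order = Workflow(["Customer", "Translator", "Reviewer", "Customer"])
order.complete("Customer")        # customer submits the order
try:
    order.complete("Reviewer")    # too early: the translation is not done yet
except PermissionError as err:
    print(err)
order.complete("Translator")
print(order.waiting_for)          # Reviewer
```

In this model, extending the workflow (for example, adding a project manager step after the customer) only means changing the `steps` list.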
The PL Project
Jogging, Swimming, Interpreting At the Same Place, Same Day. Impossible?
What do these three activities have in common, you may ask? Well, you can do them all in a row if you are well organised: the first to warm you up, the second to cool you down, and the last to let you do your job relaxed and mentally well prepared.
But these activities have to take place at the same location; otherwise one day will not be enough, especially if you have to take public transport or drive to the place where interpretation is requested.
Ever heard of RSI (Remote Interpretation Services)? No? But surely you know what the coronavirus is? Well, the two are related. In the past, hundreds of years ago, interpreters had to travel to the place of interpretation (booth, exhibition, court, etc.), losing a lot of time and sometimes their nerves, spending money on accommodation and food, and dressing smartly on site. Then came a nasty virus that forced everyone to stay at home for weeks, and many events were cancelled. All of a sudden, event organisers, as well as all companies in need of communication tools, discovered that interpretation services can also be offered online with fairly simple tools over the Internet.
Actually, RSI were created well before the COVID-19 crisis, but they were not welcomed by either side: interpreters refused to use such a degrading device, complaining about the low quality of the equipment, the risk of connection interruptions and the loss of interpreting quality, and criticised every colleague who showed too much interest. No serious discussion of RSI could begin without hysterical recriminations from the “real professionals”.
As for companies, though interested in the reduced cost of this service compared to on-site interpretation, they doubted that the technique was reliable, that the interpreters were real professionals, and that the audience would be satisfied with and appreciate such a service.
Today, because nearly everyone in our Western societies has a smartphone that can run a dedicated app, possibly with earbuds, RSI is accessible almost free of charge for the audience and at a reduced price for customers.
How could such a change occur and what are the future prospects?
Digitalisation is being applied in every domain, including the translation and interpretation businesses. Solutions have existed for a long time but were not really considered in the past. The main opposition came from the professionals themselves, in both translation and interpretation. A translation agency was seen as a kind of devil, stealing the heart and essence of the profession to resell garbage and useless documents, hiring the worst translators and taking the most profitable customers away from the market.
RSI underwent the same criticism in terms of recruitment: only incompetent interpreters would work for such horrible employers, the equipment would be of poor quality, and unpredictable events could occur during transmission. So why would broadcasters show World Cup finals or any other event if the equipment were not good? Everyone should go to the stadium instead. Ah, but the match takes place on the other side of the world, so what is to be done to see it?
In early 1991, Prof. Dr. Patrick H. Lehner founded The PL Project, an office dedicated to providing professional translation and interpretation services. Over the last 30 years, Lehner has been General Manager of the Computer-Expo in Lausanne and of the Ecole de la Construction in Tolochenaz (VD). He previously served as Head of the Information Department of the International Savings Banks Institute (ISBI) in Geneva and as Manager of the French-Swiss Chamber of Commerce (CFSCI).
A Swiss and French citizen, Lehner holds a Master of Research from the University of Paris I Sorbonne, an MSc in Management from EAP-European School for Management Studies (today ESCP Europe), and a Bachelor’s Degree in Economics from the University of Paris I Sorbonne. He began, without completing, a Bachelor’s in Translation and then one in Theology at the University of Geneva. He also holds a CAS certificate as Legal Interpreter from ZHAW and is a sworn translator.
Lehner started teaching in 1992 at the Ecole de la Construction, Tolochenaz VD, while serving there as General Manager. In 2002 he became a lecturer at the University of Finance, later Business Management University. He has lectured at European University in Montreux and Geneva at BBA, MBA and DBA level. Lehner has also taught Management Techniques for PharmaSuisse, and Creating a Start-Up at Innopark in Yverdon and at Studio Renens. In 2012 he founded Universitas Cartagensis, a teaching institution in Bogota (CO), currently under construction.
Today, Lehner is a regular lecturer at Business School Lausanne/VD, American University in Switzerland/VD, and Banku Augstskola, Riga/LV. For 23 years he has been teaching and organizing the “Economic Weeks” in the French-speaking part of Switzerland, and for 14 consecutive years he taught this introductory economics seminar at the Swiss School in Bogota/CO.
He has published books on Human Resource Management and several articles on HRM and IT.
Université Paul Sabatier
Translating Writers Don’t Feel They Should Modify The Source Text
Translating writers are non-translators who are asked to produce documentation in English in parallel with writing it in their native language, often for delocalization purposes. This specific profile of translator is free to modify the target text (like regular translators) as well as the source text (unlike regular translators).
To help them produce more efficient documentation, we first worked on a new kind of CAT tool that allows the user to edit both target and source texts. However, translating writers do not “feel” that they should modify the source text, and prefer to improve the linguistic quality of the target text.
We worked on problematic bilingual documents from a company that delivers technology solutions to manage business operations. Using a typology of 151 error types normally used for teaching purposes, we processed 173,710 words (across both languages) written by 15 translating writers.
We note that the main problem for English-speaking colleagues, who need to understand the target texts and follow the requirements described in the bilingual documents, is not linguistic quality but rather the number of words (and even whole segments) missing from one language version or the other.
Claire Lemaire defended her PhD thesis in Linguistics (Université Grenoble Alpes, France) in 2017, on “how to adapt translation technology, initially designed for professional translators, to domain experts who have to translate for their company”. Since 2019 she has been an associate professor at Université Paul Sabatier and a visiting researcher at the Grenoble Informatics Laboratory (LIG).
Her research interests are mainly related to language processing and machine translation. She currently works on solutions for translating writers in the industry, and on how to improve a morphological analyzer of the German language.
Before her academic career, she worked first at a German multinational software corporation that makes enterprise software to manage business operations and customer relations, and then at several French companies that create and deliver business and technology solutions.
Post-editing Creative Texts vs Specialised Texts: Investigating Translators’ Perceptions and Performance
In the past five years, the development of Neural Machine Translation (NMT) models has led to improved MT output, especially for resource-rich language pairs (Deng and Liu, 2018) and mainly at the level of fluency (Castilho et al., 2017a; 2017b). As a result, NMT has been increasingly used in the language industry to generate raw output to be post-edited by professional translators (Lommel and DePalma, 2016), and has been associated with productivity gains (Guerberof, 2009; 2012; Plitt and Masselot, 2010; Gaspari et al., 2014; Toral et al., 2018; Moorkens et al., 2018). This scenario, however, has been predominantly reserved for specialised, repetitive texts, given that creative texts, such as literary and promotional texts, still remain a great challenge for MT (Toral and Way, 2018) and are thus considered the last bastion of human translation. Creative texts, unlike specialised texts, have a clear expressive or aesthetic function. Hence it is not sufficient merely to preserve their meaning; translators have to employ all their artistic resources and their awareness of both content and context to create translations that offer the reader a reading experience comparable to that enjoyed by the reader of the original text (Toral and Way, 2018). Thus, the translation should “undo the original” (de Man, 1986) to deal with the uniqueness of the source and target languages and the source and target cultures. This ‘undoing’ is not an easy task, as it requires not only linguistic competence but also creative competence. It is not surprising, therefore, that many translators of creative texts continue to shun MT or believe it is “inadequate for their purposes” (Cadwell et al., 2016: 237).
A further limitation to the broader use of MT, especially in the case of creative texts, is that its correction, generally termed post-editing of machine translation (PEMT) or post-editing (PE), is a task disliked by many translators, who have complained that it constrains their work, allows limited opportunities for creativity, and forces them to correct ‘stupid’ errors (Cadwell et al., 2016; 2017; Moorkens and O’Brien, 2017), which result in “admittedly imperfect translations” (Besacier and Schwartz, 2015: 121).
This paper seeks to investigate translators’ performance when post-editing a promotional text from the domain of tourism and to compare it with their performance when post-editing a specialised medical text. To that end, ten postgraduate students from the Department of Foreign Languages, Translation and Interpreting at the Ionian University are asked to post-edit the Greek NMT output of a 400-word excerpt from an online holiday brochure and a 400-word excerpt from a clinical trial intended to verify the effects of an investigational medicine. Both source texts (STs) are written in English, and the NMT system used is Google Translate. The post-edited versions are assessed for quality on the basis of adequacy and fluency ratings as well as fine-grained error annotation and error analysis. Several error taxonomies have been proposed for annotating translation errors, ranging from coarse-grained (Vilar et al., 2006) to fine-grained ones, such as the Multidimensional Quality Metrics (MQM) (Lommel et al., 2014), the TAUS Dynamic Quality Framework (DQF) Error Typology (1), the SCATE error taxonomy (Tezcan et al., 2019) and the harmonised DQF/MQM error typology, which has been widely used by industry, research and academia (Lommel 2018, 109) as it provides a functional approach to quality. In particular, it focuses on the translation product rather than the translation process: it examines whether a translation meets particular specifications and identifies specific error types in the translated text. In the present study, quality analysis is based on an adapted version of the DQF/MQM typology. Each segment is annotated by two professional Greek translators, each with over 10 years of experience. The final annotation score is the average number of errors identified by the two annotators, and inter-annotator reliability is measured with Cohen’s kappa coefficient (κ) (Cohen, 1960).
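For reference, Cohen's κ compares the observed agreement p_o with the agreement p_e expected by chance from each annotator's label distribution: κ = (p_o - p_e) / (1 - p_e). The sketch below assumes that each annotator assigns one nominal label (for example an error category, or "none") to the same sequence of items; how the study matches the two annotators' error spans is not specified here, and the labels are invented. scikit-learn's `cohen_kappa_score` computes the same statistic.

```python
from collections import Counter
from typing import Hashable, Sequence

def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items with nominal categories."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators independently pick the same label.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both used one identical label throughout; kappa is undefined, 1.0 is a convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

annotator_1 = ["accuracy", "fluency", "fluency", "none", "accuracy", "none"]
annotator_2 = ["accuracy", "fluency", "none", "none", "accuracy", "fluency"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.5
```

Here the annotators agree on 4 of 6 items (p_o = 2/3), and chance agreement is p_e = 1/3, so κ = (2/3 - 1/3) / (2/3) = 0.5.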
In addition, questionnaires and interviews are used to capture and compare the translators’ attitudes and perceptions vis-à-vis MT and PE of creative texts and medical texts.
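To make the agreement measure concrete, the following is a minimal sketch of Cohen’s kappa for two annotators who each assign one nominal category to the same set of items, together with the per-segment averaging of error counts described above. It assumes a simplified set-up in which each annotated item receives a single error-category label; the category names in the example are hypothetical, and the study’s exact unit of agreement is not specified here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items with nominal categories."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same category by both annotators
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own category frequencies
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Degenerate case: both annotators used one and the same category throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_error_count(errors_a, errors_b):
    """Average of the error counts reported by the two annotators for one segment."""
    return (errors_a + errors_b) / 2


a = ["Accuracy", "Fluency", "Fluency", "Terminology"]   # hypothetical labels, annotator 1
b = ["Accuracy", "Fluency", "Accuracy", "Terminology"]  # hypothetical labels, annotator 2
print(round(cohen_kappa(a, b), 3))  # observed 0.75, chance 0.3125, kappa = 7/11
```

Here the annotators agree on three of four items (p_o = 0.75), but chance agreement from their label distributions is 5/16, so κ = (0.75 − 0.3125) / (1 − 0.3125) = 7/11 ≈ 0.636, noticeably lower than raw agreement.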
Besacier, Laurent, and Lane Schwartz. 2015. Automated translation of a literary work: a pilot study. In Proceedings of the Fourth Workshop on Computational Linguistics for Literature, Denver, CO, 114–122.
Cadwell, Patrick, Sheila Castilho, Sharon O’Brien, and Linda Mitchell. 2016. Human factors in machine translation and post-editing among institutional translators. Translation Spaces, 5(2): 222–243.
Cadwell, Patrick, Sharon O’Brien, and Carlos S. C. Teixeira. 2017. Resistance and accommodation: factors for the (non-)adoption of machine translation among professional translators. Perspectives: Studies in Translatology, 26: 301–321.
Castilho, Sheila, Joss Moorkens, Federico Gaspari, Rico Sennrich, Vilelmini Sosoni, Panayota Georgakopoulou, Pintu Lohar, Andy Way, Antonio Valerio Miceli Barone, and Maria Gialama. 2017a. A Comparative Quality Evaluation of PBSMT and NMT using Professional Translators. MT Summit 2017, Nagoya, Japan.
Castilho, Sheila, Joss Moorkens, Federico Gaspari, Iacer Calixto, John Tinsley, and Andy Way. 2017b. Is Neural Machine Translation the New State of the Art? The Prague Bulletin of Mathematical Linguistics, 108: 109–120.
Cohen, Jacob. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1): 37–46.
Deng, Li, and Yang Liu. 2018. A Joint Introduction to Natural Language Processing and to Deep Learning. In Li Deng and Yang Liu (eds.) Deep Learning in Natural Language Processing, 1–22. Singapore: Springer.
De Man, Paul. 1986. The resistance to theory. Minneapolis: University of Minnesota Press.
Gaspari, Federico, Antonio Toral, Sudip Kumar Naskar, Declan Groves, and Andy Way. 2014. Perception vs reality: Measuring machine translation post-editing productivity. Paper presented at the 3rd Workshop on Post-Editing Technology and Practice (WPTP-3), within the 11th biennial conference of the Association for Machine Translation in the Americas (AMTA).
Guerberof, Ana. 2009. Productivity and quality in MT post-editing. In Marie-Josée Goulet, Christiane Melançon, Alain Désilets and Elliott Macklovitch (eds.) Beyond translation memories workshop. MT Summit XII, Ottawa. Association for Machine Translation in the Americas.
Guerberof, Ana. 2012. Productivity and quality in the post-editing of outputs from translation memories and machine translation. PhD Dissertation. Barcelona: Universitat Rovira i Virgili.
Lommel, Arle. 2018. Metrics for Translation Quality Assessment: A Case for Standardising Error Typologies. In Joss Moorkens, Sheila Castilho, Federico Gaspari and Steven Doherty (eds.) Translation Quality Assessment, 109–127. New York: Springer.
Lommel, Arle, and Donald A. DePalma. 2016. Europe’s leading role in Machine Translation: How Europe is driving the shift to MT. Technical report. Common Sense Advisory, Boston.
Lommel, Arle, Aljoscha Burchardt, and Hans Uszkoreit. 2014. Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics. Tradumàtica: tecnologies de la traducció, 12: 455–463.
Moorkens, Joss, Antonio Toral, Sheila Castilho, and Andy Way. 2018. Translators’ perceptions of literary post-editing using statistical and neural machine translation. Translation Spaces, 7: 240–262.
Plitt, Mirko, and François Masselot. 2010. A productivity test of statistical machine translation post-editing in a typical localisation context. The Prague Bulletin of Mathematical Linguistics, 93: 7–16.
Tezcan, Arda, Joke Daems, and Lieve Macken. 2019. When a ‘sport’ is a person and other issues for NMT of novels. In Proceedings of the Qualities of Literary Machine Translation, 40–49, Dublin, Ireland, 19 August. European Association for Machine Translation.
Toral, Antonio, and Andy Way. 2018. What level of quality can Neural Machine Translation attain on literary text? In Joss Moorkens, Sheila Castilho, Federico Gaspari and Steven Doherty (eds.) Translation Quality Assessment, 263–287. New York: Springer.
Toral, Antonio, Martijn Wieling, and Andy Way. 2018. Post-editing Effort of a Novel with Statistical and Neural Machine Translation. Frontiers in Digital Humanities, 5: 9.
Vilar, David, Jia Xu, Luis Fernando D’Haro, and Hermann Ney. 2006. Error analysis of statistical machine translation output. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Genoa, Italy. European Language Resources Association (ELRA).
Maria Stasimioti is a PhD candidate at the Department of Foreign Languages, Translation and Interpreting at the Ionian University. She holds a BA in Translation Studies and an MA in Theory and Didactics of Translation from the same university. She has been working as a freelance translator and proofreader since 2010. She has taught Computer-Assisted Translation and English for Specific Purposes (ESP) at the Ionian University and participated in the EU-funded project TraMOOC (Translation of Massive Open Online Courses, https://tramooc.eu/). Her research interests lie in the areas of Machine Translation (MT), Computer-Assisted Translation (CAT), Post-Editing (PE) and Cognitive Studies.
Vilelmini Sosoni is Assistant Professor at the Department of Foreign Languages, Translation and Interpreting at the Ionian University in Corfu, Greece, where she teaches Legal and Economic Translation, EU Texts Translation and Terminology, Translation Technology, Translation Project Management and Audiovisual Translation (AVT). In the past, she taught Specialised Translation in the UK at the University of Surrey, the University of Westminster and Roehampton University, and in Greece at the National and Kapodistrian University of Athens and the Institut Français d’Athènes. She also has extensive industrial experience, having worked as a translator, editor and subtitler. She holds a BA in English Language and Linguistics from the National and Kapodistrian University of Athens, and an MA in Translation and a PhD in Translation and Text Linguistics from the University of Surrey. Her research interests lie in the areas of Corpus Linguistics, Machine Translation (MT), Translation of Institutional Texts and AVT. She is a founding member of the Research Lab “Language and Politics” of the Ionian University and a member of the “Centre for Research in Translation and Transcultural Studies” of Roehampton University. She has participated in several EU-funded projects, notably TraMOOC, Eurolect Observatory and Training Action for Legal Practitioners: Linguistic Skills and Translation in EU Competition Law. She has also edited several volumes and books on translation and published numerous articles in international journals and collective volumes.
APE-QUEST, or How To Be Picky About Machine Translation
While machine translation (MT) quality has progressed dramatically in recent years, especially because of the shift to deep learning, the data behind an MT system remains a crucial factor. The risk of errors (e.g. terminological errors) is therefore especially high when a generic MT system is applied to domain-specific text, because the data on which the system was trained does not match the data to which it is applied. Moreover, the quality of sentence translations produced by MT systems varies, depending not only on the domain in question but also on sentence length, the distance between the source and target languages, and the presence of ambiguous words with multiple possible translations, among other factors.
Given domain specificity and MT variability, a smooth translation workflow requires selecting which MT translations should be manually post-edited and which can be left as is. There are typically two use cases in which one must decide whether a translation is acceptable: the assimilation case, which involves gisting (getting the gist of the translated document) or creating in-house translations, and the dissemination case, which involves publishing translated documents for external use. In the assimilation case, a larger share of unedited MT translations is likely to satisfy user needs than in the dissemination case.
The APE-QUEST project (Automated Postediting and Quality Estimation, 2018-2020) is a collaboration between CrossLang (coordinator), Unbabel and the University of Sheffield, funded by the Connecting Europe Facility programme of the European Commission (EC). It investigates combining an MT system with a quality estimation (QE) system in order to channel some sentence translations to a human post-editor. The project also investigates integrating an automatic post-editing (APE) system as a second means of correcting MT translations. Channelling takes place through a quality gate: a workflow that applies automatic QE to each sentence translation produced by MT and decides, based on QE thresholds, whether the translation should be kept (high QE score), post-edited automatically (moderate score), or reviewed by a human post-editor (low score).
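The three-way decision made by the quality gate can be sketched as a simple threshold router. This is an illustrative sketch, not the APE-QUEST implementation: it assumes a QE score where higher means better (an HTER-style score, where lower is better, would invert the comparisons), and the names `Route`, `GateThresholds`, `route` and the `ape_enabled` switch are hypothetical. It also assumes that, with APE switched off, moderate-scoring sentences fall through to human review.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    KEEP = "keep"            # high QE score: MT output is kept as is
    AUTO_POST_EDIT = "ape"   # moderate score: MT output goes to automatic post-editing
    HUMAN_REVIEW = "human"   # low score: MT output goes to a human post-editor


@dataclass(frozen=True)
class GateThresholds:
    keep_min: float  # scores >= keep_min are kept
    ape_min: float   # scores in [ape_min, keep_min) are sent to APE


def route(qe_score: float, thresholds: GateThresholds, ape_enabled: bool = True) -> Route:
    """Route one sentence translation; assumes a higher QE score means better quality."""
    if thresholds.ape_min > thresholds.keep_min:
        raise ValueError("ape_min must not exceed keep_min")
    if qe_score >= thresholds.keep_min:
        return Route.KEEP
    if ape_enabled and qe_score >= thresholds.ape_min:
        return Route.AUTO_POST_EDIT
    # Low scores, and moderate scores when APE is switched off, go to a human
    return Route.HUMAN_REVIEW


gate = GateThresholds(keep_min=0.8, ape_min=0.5)  # illustrative values only
print(route(0.9, gate), route(0.6, gate), route(0.6, gate, ape_enabled=False))
```

Keeping the thresholds in a separate object makes it easy to use stricter values for the dissemination case than for the assimilation case without changing the routing logic.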
QE systems of the University of Sheffield and Unbabel were tested in the project. They are trained on human post-editions, including corrections to translations of domain-specific sentences. During training, such a system learns from the post-editions, either through “classical” machine learning or through deep learning, which is the state of the art. In the project, 10,000 domain-specific sentences (from texts relating to the legal domain, procurement and online dispute resolution) were post-edited for each of the language pairs English-French, English-Dutch and English-Portuguese, and used as training data for the QE systems. The consortium has made the post-editions publicly available through the ELRC-SHARE repository.
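One common way to turn post-editions into QE training targets is an HTER-style score: the number of word edits needed to turn the MT output into its post-edition, divided by the length of the post-edition. Whether the APE-QUEST systems used exactly this label is an assumption here. The sketch below is a simplified version that counts insertions, deletions and substitutions only; real TER/HTER also allows block shifts, so this variant may overestimate the edit rate (e.g. a swapped word pair costs 2 here but only 1 shift in TER). The function names are hypothetical.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Token-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete the MT token h
                curr[j - 1] + 1,         # insert the post-edited token r
                prev[j - 1] + (h != r),  # substitute (free if tokens match)
            )
        prev = curr
    return prev[-1]


def hter_label(mt: str, post_edit: str) -> float:
    """Simplified HTER (no shifts): edits from MT to post-edit / post-edit length."""
    hyp, ref = mt.split(), post_edit.split()
    if not ref:
        raise ValueError("post-edit must contain at least one token")
    return word_edit_distance(hyp, ref) / len(ref)


print(hter_label("the contract is signed by parties",
                 "the contract is signed by the parties"))  # one insertion over 7 tokens
```

A lower label means less post-editing effort, so a QE model trained on such labels predicts how much a human would need to change each sentence.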
The post-edited sentences have also been used in the project as training data for the University of Sheffield’s APE systems. Tests showed that it is very difficult to correct MT output automatically in a way that is both precise and exhaustive. Many sentence translations produced by state-of-the-art MT are of such high quality that APE tends to overcorrect them. APE can be configured to avoid overcorrection, but its corrections may then become too limited. Note that the post-edited data can also be used as MT training data (if an in-house MT system is used) to improve the MT system and reduce the need for human post-editing. Tests are currently ongoing to measure the performance of the quality gate for the assimilation and dissemination use cases. These tests involve selecting thresholds, engaging human post-editors and raters, and using eTranslation, the MT system built by the EC. Their outcome will indicate the potential speed gain and cost reduction compared with human translation from scratch, and the potential gain in translation quality compared with a scenario in which MT output is not subject to any human review. For the reasons given above, the APE component is not activated in the quality gate for these tests.
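Threshold selection can be framed as a trade-off between how much raw MT is kept and how much of the kept output human raters find acceptable. The sketch below is one hypothetical way to do this, not the procedure used in the project: given a development set of (QE score, acceptable) pairs judged by raters, it returns the lowest threshold at which the kept translations reach a target precision. A stricter target would suit dissemination, a looser one assimilation; the data and targets in the example are invented for illustration.

```python
from typing import List, Optional, Tuple


def select_keep_threshold(dev: List[Tuple[float, bool]], target_precision: float) -> Optional[float]:
    """Return the lowest QE threshold whose kept set reaches target_precision.

    dev holds (qe_score, acceptable) pairs, where `acceptable` is a human
    rater's judgement of the unedited MT output. Returns None if no
    candidate threshold meets the target.
    """
    satisfying = []
    for t in sorted({score for score, _ in dev}):
        kept = [ok for score, ok in dev if score >= t]
        precision = sum(kept) / len(kept)  # share of kept MT judged acceptable
        if precision >= target_precision:
            satisfying.append(t)
    # The lowest qualifying threshold keeps the most raw MT, i.e. saves the most post-editing
    return min(satisfying) if satisfying else None


dev = [(0.95, True), (0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)]
print(select_keep_threshold(dev, 0.95))  # stricter, dissemination-style target
print(select_keep_threshold(dev, 0.75))  # looser, assimilation-style target
```

With this toy data the stricter target yields a threshold of 0.8 (three of six sentences kept), while the looser target yields 0.6 (five kept, four of them acceptable), illustrating why more unedited MT can be used in the assimilation case.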
Tom Vanallemeersch holds a PhD in computational linguistics from the University of Leuven. At CrossLang, he customises machine translation systems and provides consultancy and presentations for the European Commission on various language technologies. His previous activities include development work at Systran and DG Translation and project coordination at the Dutch Language Union.
Sara Szoc holds a PhD degree in Linguistics from the University of Leuven and joined the CrossLang team in Ghent in 2013. As a Language Engineer, she is mainly involved in projects focused on building and improving machine translation systems. Her primary interests in this area include engine customization, MT evaluation and Quality Estimation.
Heidi Depraetere has over 20 years’ experience in the translation and language technology industries. She is a founder and director of CrossLang, a consulting and systems integration company dedicated to translation automation technology. Use-case-centric machine translation evaluation is one of her areas of active interest.
Joachim Van den Bogaert started his career in natural language processing as a researcher for the Centre for Computational Linguistics at the KU Leuven. Since then, he has been active as a research engineer in the machine translation industry for over ten years. He currently leads the CrossLang development team and is involved in funded projects on machine translation, ontology extraction and semantic reasoning, including APE-QUEST.
Lucia Specia is Professor of Natural Language Processing at Imperial College London and the University of Sheffield. Her research focuses on various aspects of data-driven approaches to language processing, with a particular interest in multimodal and multilingual context models and work at the intersection of language and vision. Her work can be applied to tasks such as machine translation, image captioning, quality estimation and text adaptation. She is the recipient of the MultiMT ERC Starting Grant on Multimodal Machine Translation (2016-2021) and is currently involved in other funded research projects on multimodal machine learning and machine translation, including APE-QUEST. In the past, she worked as Senior Lecturer at the University of Wolverhampton (2010-2011) and as a research engineer at the Xerox Research Centre, France (now Naver Labs) (2008-2009). She received a PhD in Computer Science from the University of São Paulo, Brazil, in 2008.
Fernando Alva-Manchego is a Ph.D. Candidate and Research Assistant at the University of Sheffield and a member of the Natural Language Processing Research Group in the Department of Computer Science. His research focuses on developing resources and methods for evaluation of natural language generation models, with emphasis on text simplification and machine translation. Previously, he worked as Lecturer at the Pontifical Catholic University of Peru (2013-2016). He received his M.Sc. in Computer Science from the University of São Paulo, Brazil, in 2013. He also holds a B.Sc. in Informatics Engineering from the Pontifical Catholic University of Peru.
Flávio Azevedo is an R&D Project Manager at Unbabel. He has more than 12 years of experience working in R&D and innovation projects and activities for groups such as Thales and Timwe. He has participated in several European-funded R&D projects (EFFISEC, SECURED, EIT ICT – Connected Digital Cities) and Portuguese-funded R&D projects (SMART-er, MultiPass), acting mainly as a Technology Research Engineer in the fields of Computer Vision, Video Analytics, Pattern Recognition and Decision Support Systems. He holds a Bachelor’s and a Master’s degree in Telecommunications and Computer Science from ISCTE, where he is now also a PhD student in Complexity Science.