Miguel Ángel Candel-Mora is an associate professor in the Department of Applied Linguistics of the Universitat Politècnica de València, Spain. He holds an MA in Translation from Kent State University and a PhD in Linguistics from the Universitat de València. For the last twenty years he has also worked as a translator actively involved in the translation and localisation industry. His academic interests focus on translation-oriented bilingual terminology management, computational terminology, computer-assisted translation, information retrieval, corpus-based translation and text mining.

Miguel Ángel Candel-Mora

will present…

Evaluation of English-to-Spanish MT output of Tourism 2.0 consumer-generated reviews for postediting purposes


Abstract

Whereas Internet users were once passive observers, in Web 2.0 they have become active contributors. According to studies on consumer-generated reviews (Schemmann, 2011), seven in every ten Internet users worldwide trust consumer opinions and peer recommendations posted online. Likewise, according to the Spanish Tourist Movement Survey (Familitur, 2013) of the Spanish Institute of Tourism Studies, Internet use increased by over 29% in 2012: almost all users (99.2%) used it to search for information, 76.5% to make a reservation and 52.4% to pay for services. Despite this significant business volume, online travel review platforms usually rely solely on raw machine translation of consumer reviews. This paper therefore analyses a corpus of machine translation output of consumer-generated travel and tourism reviews in order to identify error categories and their effects on text quality for postediting purposes, and to propose improvements to the MT system.

More specifically, the objectives of this paper are twofold:
1) to define the characteristics of this new genre of consumer reviews, with the aim of carrying out a native-speaker readability evaluation; and
2) to manually process the MT output and original texts of a corpus of one hundred consumer-generated reviews in order to determine the level of text quality and acceptance, and to define, identify and classify error categories (lexical, syntactic and terminological).
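
The second objective is essentially a manual annotation task. One way such annotations might be recorded and tallied per category can be sketched as follows, assuming each error is logged with the review it occurs in, its category and the aligned English and Spanish segments. The `ErrorAnnotation` record, the helper functions and the placeholder segments are hypothetical illustrations, not the tooling or data used in this study.

```python
from collections import Counter
from dataclasses import dataclass

# The three error categories named in the abstract.
CATEGORIES = ("lexical", "syntactic", "terminological")


@dataclass(frozen=True)
class ErrorAnnotation:
    """One manually annotated MT error (hypothetical record layout)."""
    review_id: int    # index of the review within the corpus
    category: str     # must be one of CATEGORIES
    source_span: str  # English segment in the original review
    mt_span: str      # corresponding segment in the Spanish MT output


def tally_by_category(annotations):
    """Count annotated errors per category, rejecting unknown labels."""
    counts = Counter({c: 0 for c in CATEGORIES})  # keep empty categories visible
    for a in annotations:
        if a.category not in CATEGORIES:
            raise ValueError(f"unknown category: {a.category}")
        counts[a.category] += 1
    return counts


def errors_per_review(annotations, n_reviews):
    """Mean number of annotated errors per review in the corpus."""
    return len(annotations) / n_reviews if n_reviews else 0.0


if __name__ == "__main__":
    # Placeholder annotations, invented only to show the data flow.
    sample = [
        ErrorAnnotation(1, "lexical", "<source segment>", "<MT segment>"),
        ErrorAnnotation(1, "terminological", "<source segment>", "<MT segment>"),
        ErrorAnnotation(2, "syntactic", "<source segment>", "<MT segment>"),
    ]
    print(tally_by_category(sample))
    print(errors_per_review(sample, n_reviews=2))
```

Rejecting unknown labels keeps the three-way typology closed; a finer-grained scheme such as the SAE J2450 metric could be adopted simply by replacing `CATEGORIES`.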

To reach these objectives, a corpus of one hundred user reviews, originally written in English and selected from a given time period, was compiled from TripAdvisor, the leading online travel review platform in terms of use and available content, which operates in 45 countries and 28 languages. TripAdvisor currently stores more than 200 million reviews and opinions from travellers around the world on more than 4.5 million businesses and properties in over 147,000 destinations.
Consumer-generated content comes in a variety of types: (1) service evaluation, (2) feedback and interactive functions, and (3) matching and search performance functions (Schemmann, 2011). Service evaluation reviews may take the form of free-style text or structured text, and vary in style and length. The corpus in this paper is composed of free-style texts because the objective is to identify how machine translation handles this new medium.
The corpus was processed with WordSmith Tools to calculate its global statistics, extract frequency wordlists and, later, observe the items selected for this research in context and count their occurrences.
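
The kinds of statistics mentioned here (token and type counts, a frequency wordlist and concordance lines for selected items) can be approximated in a few lines of Python. This is a sketch under simple assumptions, not a reproduction of WordSmith Tools: tokens are lower-cased runs of letters with optional internal apostrophes or hyphens, the type/token ratio is the plain ratio (WordSmith also reports a standardised ratio computed over fixed-size chunks of text), and the two sample reviews are invented for illustration.

```python
import re
from collections import Counter

# Unicode-aware letter runs (so Spanish accented letters are kept), allowing
# internal apostrophes or hyphens as in "wasn't" or "check-in".
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def corpus_stats(texts):
    """Return global counts and a frequency wordlist for a list of texts."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    freq = Counter(tokens)
    n_tokens, n_types = len(tokens), len(freq)
    ttr = n_types / n_tokens if n_tokens else 0.0  # plain type/token ratio
    return {"tokens": n_tokens, "types": n_types, "ttr": ttr}, freq


def kwic(texts, node, width=4):
    """Key-word-in-context lines: `width` tokens either side of each hit."""
    node = node.lower()
    lines = []
    for text in texts:
        toks = tokenize(text)
        for i, tok in enumerate(toks):
            if tok == node:
                left = " ".join(toks[max(0, i - width):i])
                right = " ".join(toks[i + 1:i + 1 + width])
                lines.append(f"{left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    reviews = [
        "The staff were friendly and the room was clean.",
        "Great location, but the room was noisy.",
    ]
    stats, freq = corpus_stats(reviews)
    print(stats)               # {'tokens': 16, 'types': 12, 'ttr': 0.75}
    print(freq.most_common(3))  # [('the', 3), ('room', 2), ('was', 2)]
    for line in kwic(reviews, "room", width=3):
        print(line)
```

Concordance lines of this kind are what allow the items selected for analysis to be checked in context before their occurrences in the English originals and the Spanish MT output are compared.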
Online review data and textual consumer-generated content have unquestionably become a reliable measure of customer satisfaction. Research on analytical tools to process and transform these data would therefore benefit both parties, customers and hoteliers, especially by improving MT output and contributing to current studies on the characteristics of MT postediting.
Tourism 2.0 consumer-generated reviews open new lines of research for linguists, ranging from specialized terminology and new text types to the influence of the translation of Tourism 2.0 on the target language and the paradigm shift in the translation model: the active participation of the reader in the translation process.

———-

References

Allen, J. (2003). “Post-editing”. In H. Somers (ed.), Computers and Translation: A Translator’s Guide. Amsterdam/Philadelphia: John Benjamins, 297-317.
Calvi, M. V. (2001). El léxico del turismo. Culturele. Barcelona: Universitat de Barcelona. [Accessed 25/03/2015 at http://www.ub.edu/filhis/culturele/turismo.html]
Colina, S. (2008). “Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach”. The Translator 14(1), 97-134.
Cox, C. et al. (2008). Consumer-Generated Web-Based Tourism Marketing. Queensland: CRC for Sustainable Tourism Ltd.
FAMILITUR. Informe Anual 2012. Instituto de Turismo de España. [Accessed 19/04/2015 at www.iet.tourspain.es]
Gouadec, D. (2010). “Quality in Translation”. In Y. Gambier and L. Van Doorslaer (eds), Handbook of Translation Studies, Vol. 1. Amsterdam: John Benjamins, 270-275.
J2450. Quality Metric for Language Translation of Service Information. SAE International.
Lauscher, S. (2000). “Translation Quality Assessment: Where Can Theory and Practice Meet?”. The Translator 6(2), 149-168.
O’Brien, S. (2012). “Towards a Dynamic Quality Evaluation Model for Translation”. The Journal of Specialised Translation 17.
Schemmann, B. (2011). “A Classification of Presentation Forms of Travel and Tourism-Related Online Consumer Reviews”. e-Review of Tourism Research 2.
Seargeant, P. and Tagg, C. (eds) (2014). The Language of Social Media: Identity and Community on the Internet. London: Palgrave Macmillan.
Vásquez, C. (2014). The Discourse of Online Consumer Reviews. New York: Bloomsbury.