Listed in alphabetical order by surname of the first presenter
Khetam Yasseen Alsharou, PhD researcher in Translation Studies, University College London, Centre for Translation Studies (CenTraS) Khetam Al Sharou, BA, MSc, is a PhD researcher in Translation Studies at University College London, working on a thesis provisionally entitled ‘Training of Moses on the English-Arabic Combination. Features and Usability of Open Source Translation Technologies in a Master Level Translator Training Programme’. As part of her project, she offers intensive training in the use of the Unix-based machine translation engine Moses – which is well established in the DGT – within MA programmes that work from English into Arabic and train translators intensively as proficient users of translation memories, machine translation and other translation technologies.
Her long-term goal is to be actively involved in research and teaching in this area, which inspires her. Other research interests include post-editing and the evaluation of machine translation. As an undergraduate in Syria, she was appointed as a lecturer at the University of Damascus, which later fully sponsored her MSc studies at Heriot-Watt University in Edinburgh, UK, during which she acquired competence in a range of translation technologies (Trados Studio, DVX-based memories, memoQ). She is currently working with her supervisor (Dr Federico Federici) on a journal article provisionally entitled ‘Moses, Time, and Crisis Translation: an experiment of intensive training’, to be submitted to the journal Translation and Interpreting Studies in May 2016.
Translator Training: Exploration of Students’ Abilities and Needs An extensive body of literature has emphasized the need to keep translation curricula constantly up to date, pointing to the challenge of integrating more technology into translator training.
Recent research on and development of free and open source software (FOSS) statistical machine translation (SMT) systems suggest that they could become the new frontier of CAT technology; that is, they can be ergonomically integrated within the human translation workflow. The research presented here set out to test the hypothesis that the FOSS SMT system Moses can be integrated into master-level translation training programmes offered to students working from English into Arabic; this hypothesis was tested by exploring the usefulness and usability of Moses in relation to students’ learning progress in intensive modules. The present paper discusses the results of a project collecting evidence to test this hypothesis. The participants, MA students of translation at Sultan Qaboos University and the University of Jordan, agreed to take part in the data collection, which ran between November 2015 and June 2016. The participants completed a questionnaire at the start of their teaching programme and at its completion; data were also collected through interviews and focus group discussions.
Lindsay Bywood (University of Westminster) and Andrew Lambourne (Leeds Beckett University) Lindsay Bywood is Senior Lecturer in Translation Studies and teaches translation, audiovisual translation, and project management for translators at postgraduate level. She holds an MA in German and Philosophy from the University of Oxford and an MA in Translation from the University of Salford, and is currently writing up her PhD. Her research centres on diachronic variation in the subtitling of German films into English, with further interests in machine translation and post-editing and in the interface between industry and academia. Before becoming an academic she worked for many years in the audiovisual translation industry, and she is now responsible for the postgraduate professional development programme for translators and interpreters. She is deputy editor of The Journal of Specialised Translation (JoSTrans), reviews editor for Perspectives: Studies in Translatology, and a director of the European Association for Studies in Screen Translation (ESIST).
Andrew Lambourne’s career has focused on the design and development of computer-assistive tools which help busy professionals to become more productive, particularly in the areas of speech and language technology and textual information processing. Andrew’s R&D credentials were forged in the broadcast technology market, where he pioneered many of the tools and techniques which make it possible to provide live TV subtitles for deaf and hearing-impaired people, as well as cost-effective subtitles for recorded programmes. He also created market-leading systems for the production and delivery of information services, starting with broadcast teletext and leading to the digital text and interactive services ancillary to TV broadcasts. He has extensive experience of speech recognition and various transcription systems dating back to 1995, and managed the development of a patented real-time speech-to-text alignment and speech-following technology. Since live subtitling aims to achieve perfection in real-time transcription, Andrew has investigated and refined the techniques which can be used to maximise quality and accuracy, and has trained live re-speakers around the UK and Europe. With an interest in workflow support tools in general, he advises on how to set up productive partnerships between people and technology so as to get the best from the skills and capabilities which each provide.
Automated Detection and Correction of Errors in Real-time Speech-to-text: a Research Approach Intralingual subtitles provide access to TV broadcasts for viewers who are deaf or hearing impaired. For live programming, these subtitles have to be produced and transmitted in real time within just a few seconds, otherwise they lag behind the programme content and become meaningless. Real-time transcription direct from the broadcast content does not yield sufficiently high quality; therefore, trained staff listen and either re-speak subtitle content, punctuation and commands to a speaker-dependent speech-to-text system, or key it using phonetic codes into a machine-shorthand system such as Stenograph. Either way, the task is challenging and the subtitles usually contain errors.
The purpose of this research is to look in detail at the kinds of errors which occur in typical real-time subtitles and to suggest a detailed taxonomy; to understand more about the causes of such errors; and hence to investigate techniques which could be used either to prevent them (as part of the pre-production preparation) or to detect automatically that an error has occurred and to correct it, in an assistive or even automatic way, without significantly delaying the subtitle delivery and without erroneously adjusting correctly transcribed speech.
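To make the kind of error classification at issue concrete, here is a minimal, purely illustrative sketch (not the authors' system): it aligns live subtitle output against a reference transcript word by word and labels the discrepancies as substitutions, deletions or insertions; the example sentences are invented.

    # Illustrative sketch (not the authors' system): classify word-level
    # discrepancies between a reference transcript and live subtitle output
    # as substitutions, deletions or insertions using a simple alignment.
    from difflib import SequenceMatcher

    def classify_errors(reference: str, subtitle: str):
        ref, hyp = reference.lower().split(), subtitle.lower().split()
        errors = []
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, ref, hyp).get_opcodes():
            if tag == "replace":
                errors.append(("substitution", ref[i1:i2], hyp[j1:j2]))
            elif tag == "delete":
                errors.append(("deletion", ref[i1:i2], []))
            elif tag == "insert":
                errors.append(("insertion", [], hyp[j1:j2]))
        return errors

    # Invented example: a homophone-style recognition error and a dropped word.
    ref = "the council will meet again on Tuesday to discuss the budget"
    sub = "the counsel will meet again on Tuesday to discuss budget"
    for kind, expected, produced in classify_errors(ref, sub):
        print(kind, expected, produced)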
Félix do Carmo (University of Oporto) Félix do Carmo is an Invited Lecturer at the Faculty of Arts of the University of Porto and a member of CLUP – the Linguistics Research Centre of the University of Porto. He holds a Master’s in Translation Studies and is currently doing a PhD in Human Language Technologies. He is the Managing Director of TIPS, Lda., a translation company specialized in technical translation into Portuguese, where he manages translation teams and supervises professional internships of translators. He is an accredited trainer in translation and localization software and he has presented several papers on translation technologies at national and international conferences.
Co-authors who will not be attending the conference: Luís Trigo has a BSc in Economics and an MSc in Data Analysis and Decision Support Systems from the Faculty of Economics of the University of Porto. He worked for several years in publishing and marketing management, and he taught subjects focussing on technology and society. He currently works on Natural Language Processing and Business Process Consultancy for global companies, while collaborating with LIAAD (the Laboratory of Artificial Intelligence and Decision Support) at INESC TEC, doing research in the fields of Data Mining, Information Retrieval, Social Network Analysis and Visualization. He is also writing his dissertation for a PhD in Human Language Technologies at the Faculty of Arts of the University of Porto. Belinda Maia was an Associate Professor at the Faculdade de Letras da Universidade do Porto until she retired in 2015, but remains a member of the Centro de Linguística da Universidade do Porto. She continues to do research in the areas of forensic linguistics, translation, human language technologies and terminology, and she co-supervises PhD theses in these areas.
From CATs to KATs Current technologies may lead to a revolution in Computer-Aided Translation (CAT) tools. Most of these technologies, which are behind the Machine Translation (MT) comeback, come from the field of Machine Learning. When these technologies are incorporated as extra supports in the tools used by translators, this new generation of tools may be renamed Knowledge-Assisted Translation (KAT) tools.
We will offer our experience with some of the features that are already available in some implementations, but this paper will concentrate on suggesting “Recommended Specifications” for such tools, drawing on the capacity of Machine Learning methods, complemented by Artificial Intelligence and Augmented Intelligence, to deal with huge volumes of data. Our starting point is the tasks that translators perform in a world interconnected with clients and with human and machine resources. We will then present some of the Machine Learning features that may be used to support the work of translators and post-editors. From domain identification to resource management, there are several areas to study. Finally, zooming in on the simpler editing tasks, there are complex theoretical and technological issues that are worth discussing, because they are at the centre of the adaptation that these tools should undergo.
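As one hedged illustration of the kind of Machine Learning support mentioned above, the sketch below trains a simple domain classifier with scikit-learn; the domain labels and training snippets are invented, and this is not the specification proposed in the paper.

    # Minimal sketch of one KAT-style support, domain identification, using
    # scikit-learn; labels and training snippets are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_texts = [
        "the patient was administered 5 mg of the compound daily",
        "the defendant shall indemnify the plaintiff against all claims",
        "tighten the bolts to the specified torque before assembly",
    ]
    train_domains = ["medical", "legal", "engineering"]

    classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    classifier.fit(train_texts, train_domains)

    # A new source segment could then be routed to domain-specific TMs,
    # termbases or MT engines based on the predicted domain.
    print(classifier.predict(["the torque wrench must be calibrated annually"]))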
Eleanor Cornelius (University of Johannesburg) Eleanor Cornelius is a senior lecturer in the Department of Linguistics at the University of Johannesburg and has a doctoral degree in Applied Linguistics from the same institution. She teaches introductory courses in linguistics, and practical translation and psycholinguistics, at undergraduate level. She also teaches Text-editing and Psycholinguistics at honours level. Dr Cornelius has gained experience in both academic and professional contexts. She is a fully accredited simultaneous interpreter (English-Afrikaans; Afrikaans-English).
Dr Cornelius started her career as a language practitioner at the Bureau of the Woordeboek van die Afrikaanse Taal in Stellenbosch, after which she joined the Department of Afrikaans at the Port Elizabeth campus of Vista University, and subsequently the Department of Afrikaans at the University of Fort Hare. In 1998, Dr Cornelius was appointed principal language practitioner in the State Language Services of the Department of Arts and Culture, where she worked as a translator of government documents such as legislation, policy documents and speeches. Arguably the biggest challenge of her career arrived when she assumed the position of deputy director of language planning at the Pan South African Language Board (PANSALB), where she was tasked with establishing dictionary units for each of the eleven official languages. By the time she left PANSALB to join the Rand Afrikaans University in 2002, all eleven units were established and fully functional. At the former Rand Afrikaans University, Dr Cornelius was responsible for the establishment and management of a foundation programme, and later extended degree programmes, for underprepared students in the Faculty of Humanities, and she managed these programmes for six years. At the beginning of 2008, Dr Cornelius relocated to the Department of Linguistics and Literary Theory (now the Department of Linguistics), with which she has a longstanding relationship, having taught practical legal translation and interpreting there in a part-time capacity for many years. Eleanor has read papers at numerous local and international language conferences and academic development conferences. Dr Cornelius also regularly presents workshops on the topic of “Plain language” and, as a result, is sometimes referred to as “Mrs Plain Language”! She is often called upon to review papers for publication in scholarly journals and to act as external examiner for undergraduate modules and postgraduate studies at other universities. Dr Cornelius serves on the Council of the International Federation of Translators (FIT). In addition, she is the vice-chair of the South African Translators’ Institute (SATI), a SATI-accredited simultaneous interpreter in two directions, a member of the Accreditation Committee of SATI, a member of the Linguistic Society of South Africa (LSSA) and a member of the South African Applied Linguistic Association (SAALA). She is also the liaison between DFKI (the German Research Center for Artificial Intelligence, which works on MT and AI) and FIT on the QT21 project.
Potential Impact of QT21 The presentation will describe the QT21 project from the perspective of the International Federation of Translators:
Six of the ways in which humans currently relate to machine translation (MT) systems will be outlined, leading up to a seventh way that will be discussed in more detail. Huge volumes of text need to be translated in different sectors of the economy globally. A feasible approach to meeting this need is to employ both MT and human teams, including translators, in addressing the world’s translation needs. Analytic evaluation of MT quality by human translators will be introduced, focusing on the MQM framework. The seventh way humans can relate to MT systems involves annotation, by humans, of specific errors in the raw MT output using standardized error categories, rather than generating a single number indicating overall quality. The potential impact of QT21 on MT and on professional translators will be considered. Through FIT, human translators will be able to participate in the development of improved MT systems. This will help them give objective advice to clients and guide the developers of next-generation translation tools. FIT’s position is that there will be enough work for translators who see MT as an opportunity rather than a threat.
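As a sketch of what category-level error annotation yields in practice, the snippet below aggregates human annotations into per-category counts and an overall error rate; the category names echo MQM-style dimensions and all data are invented.

    # Sketch of aggregating human error annotations into per-category counts,
    # as opposed to a single overall quality score; the category names echo
    # MQM-style dimensions and the annotation data are invented.
    from collections import Counter

    annotations = [  # (segment id, error category) pairs produced by annotators
        (1, "Accuracy/Mistranslation"),
        (1, "Fluency/Grammar"),
        (2, "Terminology"),
        (4, "Accuracy/Omission"),
        (4, "Fluency/Grammar"),
    ]
    words_evaluated = 250  # total words in the evaluated MT output

    counts = Counter(category for _, category in annotations)
    for category, n in counts.most_common():
        print(f"{category}: {n}")
    print(f"errors per 100 words: {100 * sum(counts.values()) / words_evaluated:.1f}")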
Denis Dechandon (EU Translation Centre) Denis Dechandon has over 20 years’ experience in translation and linguistics, in office automation and in various management roles. After becoming acquainted with translation work and its requirements at EU level, he committed himself fully to defining and implementing various processes and workflows to provide better support to linguists and to streamline the work of support teams.
Denis is in charge of a service dedicated to the linguistic and technical support provided to translators, revisers, editors, captioners and subtitlers (including CAT, corpus management, formatting and layout, machine translation and terminology) and to the maintenance and enhancement of tools and linguistic resources at the Translation Centre for the Bodies of the European Union. Committed to further changes and developments in these fields, Denis took over the role of IATE Tool Manager in May 2015.
From IATE to IATE 2 or When Technologies are Agents of Change and Means to Improve User Satisfaction The migration to a revamped, modernised and upgraded InterActive Terminology for Europe (IATE), the EU’s inter-institutional terminology database, involves a thorough IT development process designed to produce a brand-new tool built along a few major lines.
A dedicated task force reporting to the IATE Management Group (‘IMG’) defined the improvements required; further changes were needed because some of the technologies used over the last 12 years have become obsolete, because new technologies could better serve users’ needs, and because the current tool, after years of corrective and evolutive maintenance, had accumulated an increasing number of technical limitations. It was therefore proposed, and accepted, to build a brand-new tool: the rebirth of IATE was announced. Over the last two years, interinstitutional cooperation has gained new momentum: all Task Forces of the IMG have brought ideas and expressed needs, while engaging in complementary activities such as a vast clean-up of IATE entries. ‘Making life easier for users’, ‘responsive web design’, ‘improved collaborative work’, ‘improved return on investment’ and ‘integration with CAT tools’, achieved through new technologies and automation, are the cornerstones of the IATE 2 project.
Emmanuelle Esperança-Rodier (University of Grenoble) Emmanuelle Esperança-Rodier is a lecturer at Université Grenoble Alpes (UGA), France, Laboratoire d’Informatique de Grenoble (LIG), where she teaches English for Specific Purposes. After defending a PhD in computational linguistics, titled “Création d’un Diagnostique Générique de Langues Contrôlées, avec application particulière à l’Anglais Simplifié”, she worked as a post-editor in a translation agency. Back at university, she participated in IWSLT and WMT evaluation campaigns, as well as in several LIG projects. She now works on the evaluation of MT systems based on competences and focused on tasks, translation error analysis, and multilingualism.
Co-author who will not be attending the conference: Johan Didier
Translation Quality Evaluation of MWEs from French into English Using an SMT System Nowadays, Statistical Machine Translation (SMT) is widely available. Nevertheless, using MT at its best is not an easy task. Structures that appear only sporadically trigger most of the regular mistakes of SMT systems. We work on one of those structures: Multi-Word Expressions (MWEs). Our study aims at evaluating the quality of MWE translation obtained using SMT.
Firstly, we present the process of our quality evaluation of the English translation of one French technical document, obtained via an SMT system created using the Moses toolkit [Koehn et al., 2007]. In the French document, MWEs were semi-automatically annotated according to their type [Tutin et al., 2015]. Secondly, we describe the linguistic criteria of Vilar’s classification of translation errors [Vilar et al., 2006], as well as the adaptation we had to perform in order to use Blast [Stymne, 2011]. Thirdly, we analyse the global results of our quality evaluation before focusing, in the fourth part, on Full Phraseme MWEs. We finally show that most of the French MWEs are translated into English MWEs, and that in further work we need to implement a collaborative error annotation tool.
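For readers who want to reproduce this kind of setup, the following hedged sketch shows one common way of obtaining translations from a Moses-built system, via the mosesserver XML-RPC interface; it assumes a French-English engine trained on comparable data is already running locally, and the host, port and sentences are assumptions for illustration only.

    # Hedged sketch: obtaining translations from a Moses system through the
    # mosesserver XML-RPC interface. Assumes a trained French-English engine
    # is already running locally (host, port and sentences are assumptions).
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("http://localhost:8080/RPC2")

    french_sentences = [
        "mettre en route le moteur",            # contains a verbal MWE
        "le niveau d'huile doit être vérifié",
    ]
    for sentence in french_sentences:
        result = server.translate({"text": sentence})
        print(sentence, "->", result["text"])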
Claudio Fantinuoli (Johannes Gutenberg University) Claudio Fantinuoli is Senior Lecturer at the Johannes Gutenberg University Mainz in Germersheim. His research and teaching areas include corpus-based translation and interpreting studies as well as information management for translators and interpreters.
InterpretBank. Redefining Computer-assisted Interpreting Tools This paper presents InterpretBank, a computer-assisted interpreting tool developed to support conference interpreters during all phases of the interpreting process. The overall aim of the tool is to create an interpreter’s workstation which allows conference interpreters to optimize the workflow before, during and after the event they are called upon to interpret. The tool takes into consideration the specific needs of conference interpreters, such as the way they prepare for a conference, the modality of terminology access, and so forth. It also exploits the latest advances in computational linguistics, especially in the field of information retrieval and text mining, making use of the abundance of information available on the Web to provide interpreters with specialized information which can be used to increase the quality of interpreter performance. The paper also introduces some theoretical principles of the use of terminology tools in interpretation and the results of initial empirical experiments conducted with this software.
David Filip (Trinity College Dublin) David Filip is Chair (Convener) of OASIS XLIFF OMOS TC; Secretary, Editor and Liaison Officer of OASIS XLIFF TC; a former Co-Chair and Editor for the W3C ITS 2.0 Recommendation; Advisory Editorial Board member for the Multilingual magazine; and co-moderator of the Interoperability and Standards WG at JIAMCATT.
His specialties include open standards and process metadata, workflow and meta-workflow automation. David works as a Research Fellow at the ADAPT Research Centre, Trinity College Dublin, Ireland. Before 2011, he oversaw key research and change projects for Moravia’s worldwide operations. David held research scholarships at universities in Vienna, Hamburg and Geneva, and graduated in 2004 from Brno University with a PhD in Analytic Philosophy. David also holds master’s degrees in Philosophy, Art History, Theory of Art and German Philology.
Why XLIFF and Why XLIFF 2? This talk aims to inform the business and decision-making communities among the ASLING audience about the high-level benefits of bitext and XLIFF 2. Translator and engineering communities will also benefit, as they need the high-level arguments to make the case for XLIFF 2 adoption in their organizations.
We start with a conceptual outline and simple non-technical examples of what bitext is, what different sorts of bitext exist, and how they are useful at various stages in various industry processes, such as translation, localisation, terminology management, and quality and sanity assurance projects. Examples of projects not based on bitext are given, and benefits and drawbacks are compared at the practical level of the tasks performed. Issues introduced by monolingual editing of content in multilingual content stores will be discussed. The following will be demonstrated:
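Independently of the demonstrations planned for the talk, a minimal sketch of what an XLIFF 2.0 Core bitext document looks like may help: the following Python snippet builds one with the standard library; the file id, unit id, languages and segment text are invented for the example.

    # Illustrative sketch only: building a minimal XLIFF 2.0 Core bitext
    # document with the Python standard library. The file id, unit id,
    # languages and segment text are invented for the example.
    import xml.etree.ElementTree as ET

    XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"
    ET.register_namespace("", XLIFF_NS)

    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff",
                       {"version": "2.0", "srcLang": "en", "trgLang": "de"})
    file_ = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {"id": "f1"})
    unit = ET.SubElement(file_, f"{{{XLIFF_NS}}}unit", {"id": "u1"})
    segment = ET.SubElement(unit, f"{{{XLIFF_NS}}}segment")
    ET.SubElement(segment, f"{{{XLIFF_NS}}}source").text = "Press the start button."
    ET.SubElement(segment, f"{{{XLIFF_NS}}}target").text = "Drücken Sie die Starttaste."

    print(ET.tostring(xliff, encoding="unicode"))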
Daniela Ford (University College London) Daniela Ford has an MSc in Technical Translation from the University of Hildesheim, Germany. She started her professional career in London, where she worked for five years as an in-house translator (French/English into German) before going freelance in 1999 and then forming her own limited company. Her main subject areas are technical and software localization, and she works for many international blue-chip companies.
Daniela Ford has been teaching part-time on the MSc Translation at Imperial College London since 2001 (when the course was launched), as a Teaching Fellow for both practical translation and CAT tools, and has continued to teach on the course since it was transferred to University College London in 2013. She has also been involved in teaching a module on translation memory and machine translation at the University of Westminster in London, and she currently teaches translation technologies as a visiting lecturer at the University of Westminster as well as at several other universities in and outside London, including in the Middle East. She was involved in a three-year EU-funded project on creating e-learning courses for translators and is the author and moderator of the e-learning course on Software Localization (formerly at Imperial College and now at UCL, http://www.ucl.ac.uk/centras/professional-online-courses/online-course-localisation), which has run three times a year since its launch in 2009. The course attracts participants from all over the world. Daniela Ford is an SDL-certified trainer for SDL Trados technologies, and she has given several talks at international conferences, including Aslib Translating & The Computer (London) and the ITI (Institute of Translation & Interpreting) Conference in the UK. She is also a committee member of the London Regional Group of the ITI. A keen reader and language enthusiast, she has learned around 10 languages so far and has a passion for everything related to language technologies, including software development and localization. Daniela Ford is married and lives and works in London.
Can you Trust a TM? Results of an Experiment Conducted in November 2015 at CenTraS @ UCL In November 2015, an experiment was conducted at CenTraS @ UCL, with 69 MSc Translation students, covering a total of 14 languages. Most students were new to TM tools before they came to UCL in October 2015.
Students were given a TM for their language combination, which contained several correct 100% matches as well as several deliberately incorrect 100% matches. The source text given for translation contained several sentences identical to those in the TM, several sentences very similar to the TM content, and several entirely new sentences. Students were not told that the TM was "faulty". The students translated the short text into their mother tongue and submitted their updated TM as well as their bilingual file. Evaluation of the experiment started about six months after the original experiment, with several students assisting with the evaluation. The aims of the evaluation were to find out whether students blindly trusted the content of the given TM, or whether they picked up on incorrect 100% matches as well as very small differences such as formatting. An interesting part of the study was to find out whether there were cultural differences, since data were available from students from 14 different countries.
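A hedged sketch of one way such an evaluation could be automated (not necessarily the procedure used in the study): compare each submitted target segment with the deliberately incorrect match planted in the TM and record whether it was accepted unchanged; all segments below are invented.

    # Hedged sketch (not necessarily the study's evaluation procedure): check
    # whether a student's submitted target segments reproduce the deliberately
    # wrong 100% matches planted in the TM. All data here are invented.
    planted_bad_matches = {
        "The device must be switched off.": "Das Gerät muss eingeschaltet werden.",  # says "on", not "off"
        "Store in a dry place.": "An einem trockenen Ort lagern,",                   # wrong final punctuation
    }
    student_translation = {
        "The device must be switched off.": "Das Gerät muss eingeschaltet werden.",
        "Store in a dry place.": "An einem trockenen Ort lagern.",
    }

    for source, bad_target in planted_bad_matches.items():
        submitted = student_translation.get(source, "")
        verdict = "accepted blindly" if submitted == bad_target else "corrected or changed"
        print(f"{source!r}: {verdict}")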
Xiaotian (Fred) Guo (New Vision Languages) I had been a teacher of English at Henan Normal University in China since 1984 when, in 1997, I paid a ten-month academic visit to the University of Birmingham, jointly sponsored by the British Council and the Chinese Education Commission, during which I was introduced to corpus linguistics. In 2000, after completing two years of service back at the university in China, I returned to Birmingham for a PhD on learner English under Professor Susan Hunston. After being awarded my PhD, I taught an MA course called Translation Technology to international postgraduates at SOAS (University of London) from 2009 to 2012. I was also invited by SDL to introduce the use of CAT tools and their Studio 2009 to their audience through a webinar and a software promotion conference in London. Currently I am doing freelance translation in the UK and have been invited by Henan Normal University to be an off-campus supervisor of MA students, through online teaching and intensive lecturing while visiting China.
Drawing a Route Map of Making a Small Domain-specific Parallel Corpus for Translators and Beyond After years of development of corpus technologies, it has become obvious that translators can benefit directly from the achievements of this field. However, it seems that these advances have not been taken up accordingly by translators to aid their translation. As a corpus linguist and translator myself, I believe that when corpus technologies are made attractive and simple enough, and when translators feel a strong need and burning desire to make their own corpus to assist their translation, the application of such technology will gradually become part of a translator’s life, just as other computer-assisted translation (CAT) tools have done over the past ten years or so. This paper demonstrates how easy it can be to build a do-it-yourself corpus, using the example of a small domain-specific English-Chinese parallel corpus in the field of financial services. The making of such a corpus has been summarised into three simple steps: 1) collection of raw parallel language data; 2) alignment of the parallel texts; 3) segmentation and annotation. It is hoped that other users of corpora, including translation trainers, language teachers and students, will also find this presentation informative and beneficial.
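As a hedged illustration of steps 2 and 3 of this route map, the sketch below pairs two pre-segmented plain-text files line by line and writes a tab-separated parallel corpus; the file names are placeholders, and a real workflow would normally use a dedicated sentence aligner and a Chinese word segmenter rather than this naive one-to-one pairing.

    # Minimal sketch of steps 2 and 3 for a DIY parallel corpus, assuming the
    # raw data (step 1) have already been saved as two plain-text files with
    # one sentence per line in the same order. File names are placeholders;
    # a real workflow would normally use a dedicated aligner (e.g. hunalign)
    # and a Chinese word segmenter (e.g. jieba) rather than this 1:1 pairing.
    import csv

    with open("finance_en.txt", encoding="utf-8") as en_file, \
         open("finance_zh.txt", encoding="utf-8") as zh_file, \
         open("finance_en_zh.tsv", "w", encoding="utf-8", newline="") as out_file:
        writer = csv.writer(out_file, delimiter="\t")
        writer.writerow(["en", "zh"])
        for en_line, zh_line in zip(en_file, zh_file):
            en_sent, zh_sent = en_line.strip(), zh_line.strip()
            if en_sent and zh_sent:                 # skip blank lines
                writer.writerow([en_sent, zh_sent])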
Roger Haycock (Haycock Technical Services) Roger Haycock is a chartered electrical engineer who specialised in electrical power generation and high-voltage systems. He has travelled extensively and has always had an interest in languages. He is now a part-time student of Translation Studies at Portsmouth University.
A Case Study of German into English by Machine Translation: to Evaluate Moses Using Moses for Mere Mortals This paper evaluates the usefulness of Moses, an open-source statistical machine translation (SMT) engine, for professional translators and post-editors. It takes a look behind the scenes at the workings of Moses and reports on experiments investigating how translators can contribute to advances in the use of SMT as a tool. In particular, the quality of output was compared across four SMT engines as the amount of training data was increased.
This small study works with the German-English language pair to investigate the difficulty of building a personal SMT engine on a PC with no connection to the Internet, thereby overcoming the problems of confidentiality and security that prevent the use of online tools. The paper reports on the ease of installing Moses on an Ubuntu PC using Moses for Mere Mortals. Translations were compared using the BLEU metric and human evaluation.
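For readers wishing to replicate the automatic part of such a comparison, the sketch below scores the output of several engines against a common reference with corpus-level BLEU using the sacrebleu package (assumed to be installed); the file names are placeholders, not those used in the study.

    # Hedged sketch: scoring the output of several engines against a common
    # reference with corpus-level BLEU, using the sacrebleu package (assumed
    # to be installed); the file names are placeholders.
    import sacrebleu

    with open("reference.en", encoding="utf-8") as f:
        references = [line.strip() for line in f]

    for system in ["engine_25k.en", "engine_50k.en", "engine_100k.en", "engine_200k.en"]:
        with open(system, encoding="utf-8") as f:
            hypotheses = [line.strip() for line in f]
        score = sacrebleu.corpus_bleu(hypotheses, [references])
        print(f"{system}: BLEU = {score.score:.2f}")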
Joss Moorkens (Dublin City University) Joss Moorkens is a lecturer and researcher in the ADAPT Centre, within the School of Applied Languages and Intercultural Studies at Dublin City University (DCU), with interests in human evaluation of translation technology, ethics and translation technology, and translation evaluation.
Co-authors who will not be attending the conference: Federico Gaspari teaches English linguistics and translation studies at the University for Foreigners “Dante Alighieri” of Reggio Calabria (Italy) and is a postdoctoral researcher at the ADAPT Centre in Dublin City University, where he works on EU projects focusing on machine translation evaluation. Andy Way is a Professor of Computing at Dublin City University (DCU) and Deputy Director of the ADAPT Centre. He is a former President of the European Association for Machine Translation and edits the journal Machine Translation. Sheila Castilho is a postdoctoral researcher in the ADAPT Centre in Dublin City University. Her research interests include human and usability evaluation of machine translation, translation technology and audio-visual translation. Rico Sennrich is a Research Associate at the Institute for Language, Cognition and Computation, University of Edinburgh, where he has worked since 2013. His focus is on data-driven natural language processing, in particular machine translation, syntax, and morphology. Alexandra Birch is a researcher in the machine translation group in Informatics at the University of Edinburgh. She is interested in applying semantics and deep learning to problems in machine translation. Antonio Valerio Miceli Barone is a researcher in the machine translation group in Informatics at the University of Edinburgh. His research interests are machine translation and neural networks. Valia Kordoni is an Associate Professor at the Humboldt University of Berlin and coordinator of the Horizon 2020 project TraMOOC (Translation for Massive Open Online Courses).
A Crowd-sourced Comparative Evaluation of Phrase-Based SMT and Neural Machine Translation The use of machine translation (MT) has become widespread since statistical machine translation (SMT) became the dominant paradigm. However, there is growing interest among the research community in the possibilities of neural machine translation (NMT), based largely on impressive results in automatic evaluation. There have to date been no published large-scale human evaluations of NMT output.
This paper reports on a comparative human evaluation of phrase-based SMT and NMT in four language pairs, using a crowdsourcing platform to compare output from both systems across a variety of metrics. These metrics comprise automatic evaluation, human rankings of adequacy and fluency, error-type markup, and post-editing effort (technical and temporal effort). This evaluation is part of the work of the TraMOOC project, which aims to create a replicable semi-automated methodology for high-quality MT of educational data. While the primary intention for this evaluation is to identify the best MT paradigm for our proposed methodology for TraMOOC, we believe that our evaluation results will be of interest to the wider research community and to those in the translation industry interested in the deployment of cutting-edge MT systems.
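As a small illustration of how crowd-sourced ranking data of this kind can be summarised, the sketch below tallies pairwise preferences between the two systems; the judgement data are invented and this is not the project's actual analysis pipeline.

    # Sketch of summarising crowd-sourced pairwise judgements between two MT
    # systems; the judgement data are invented for illustration.
    from collections import Counter

    # Each judgement records which system's output a worker preferred
    # for a given segment ("SMT", "NMT" or "tie").
    judgements = ["NMT", "NMT", "tie", "SMT", "NMT", "NMT", "tie", "SMT", "NMT"]

    tally = Counter(judgements)
    total = len(judgements)
    for system in ("NMT", "SMT", "tie"):
        print(f"{system}: {tally[system]} ({100 * tally[system] / total:.0f}%)")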
Jon Riding and Neil Boulton (United Bible Societies) Jon Riding leads the Glossing Technologies Project for United Bible Societies. The project develops language-independent NLP systems to assist Bible translators by automatically analysing elements of natural languages. He is a Visiting Researcher at Oxford Brookes University. In addition to his work in computational linguistics for UBS, Jon teaches Koine Greek, Classical Hebrew and Biblical Studies for various institutions in the UK, including Sarum College (where he is an associate lecturer). Jon’s research interests include the automatic analysis of complex non-concatenative structures in natural language, the development of the New Testament text and the writings of the early Church Fathers. Neil Boulton works as part of the same project for United Bible Societies. Previously, most of his working life has been spent in various IT roles for the British and Foreign Bible Society, based in Swindon, UK.
What’s in a Name? This presentation describes the development of a language-independent process for identifying proper names in a text without recourse to lexica. The process is derived from a machine originally intended to analyse non-concatenative morphologies in natural languages. The particular context for this work is the task of managing the 5,000 or so proper names found in a Bible, including the identification of close cognates and the reporting of instances where a related form does not appear to be present.
The need for such a system is explained, and the process by which the machine is able to identify names in the target text is described. The problems posed by disparate orthographies are noted, as is the machine’s ability to learn from successful parses. Results obtained from Eurasian, South American and African languages will be presented and discussed, common problems for the process identified, and its possible use in the context of technical vocabulary suggested. A further application of the same process, as a step towards automatic syntax analysis, is considered, and commonalities between the tasks of identifying morphology templates, ordered phoneme sets and syntax patterns are noted.
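The cognate-checking idea can be illustrated with a simple string-similarity sketch (the system described above works differently and does not rely on such heuristics alone): for each name expected from the source, it reports whether a sufficiently similar form occurs among the target-text tokens; names, tokens and threshold are invented.

    # Illustrative sketch of the cognate-checking idea only (the system
    # described above works differently): flag, for each name expected from
    # the source, whether a sufficiently similar form occurs in the target
    # text. Names, target tokens and threshold are invented.
    from difflib import SequenceMatcher

    expected_names = ["Yerushalayim", "Bavel", "Mitsrayim"]
    target_tokens = ["Jerusalema", "Babele", "Egepeta", "mokhua"]

    def best_match(name, tokens):
        scored = [(SequenceMatcher(None, name.lower(), t.lower()).ratio(), t) for t in tokens]
        return max(scored)

    for name in expected_names:
        score, candidate = best_match(name, target_tokens)
        status = "likely cognate" if score >= 0.6 else "no close form found"
        print(f"{name}: {candidate} ({score:.2f}) -> {status}")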
Ankit Srivastava (DFKI-German Research Center for Artificial Intelligence) Dr. Ankit Srivastava is a Researcher at the Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI) in Berlin, Germany. He has extensive experience in the development of statistical machine translation systems and the integration of such systems. Co-authors who will not be attending the conference: Prof. Dr. Felix Sasaki is a Senior Researcher at the Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI) in Berlin, Germany. He has more than 10 years’ experience in dealing with research and industry topics related to the multilingual web. He has coordinated and participated in several EU projects (MultilingualWeb.LT, LIDER, FREME) building bridges between language technology, web technology, and the linked data community. Peter Bourgonje, Julian Moreno-Schneider, Jan Nehring – Biography texts to follow. Dr. Georg Rehm is a Senior Consultant at the Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI) in Berlin, Germany. He is the Network Manager of META-NET, Manager of the German/Austrian office of W3C, and coordinator of the EU project CRACKER as well as the German Ministry-funded project Digital Curation Technologies.
How to Configure Statistical Machine Translation with Linked Open Data Resources In this presentation we outline easily implementable procedures to leverage multilingual Linked Open Data (LOD) resources such as DBpedia in open-source Statistical Machine Translation (SMT) systems such as Moses. Using open standards such as RDF (Resource Description Framework) Schema, NIF (Natural Language Processing Interchange Format) and SPARQL (SPARQL Protocol and RDF Query Language) queries, we demonstrate the efficacy of translating named entities and thereby improving the quality and consistency of SMT outputs. We also give a brief overview of two funded projects that are actively working on this topic: (1) the BMBF-funded project DKT (Digitale Kuratierungstechnologien) on digital curation technologies, and (2) the EU Horizon 2020-funded project FREME (Open Framework of e-services for Multilingual and Semantic Enrichment of Digital Content). This is a step towards designing a Semantic Web-aware Machine Translation (MT) system and keeping SMT algorithms up to date with the current stage of web development (Web 3.0).
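A hedged sketch of the kind of LOD lookup involved: retrieving the German label of a DBpedia resource over SPARQL with the SPARQLWrapper package (assumed to be installed); the entity and target language are chosen only for illustration, and wiring the result into a Moses pipeline is not shown.

    # Hedged sketch of the kind of LOD lookup described: fetch the German
    # label of a DBpedia resource over SPARQL using the SPARQLWrapper package
    # (assumed to be installed). Entity and target language are illustrative;
    # integrating the result into an SMT decoder such as Moses is not shown.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dbr:  <http://dbpedia.org/resource/>
        SELECT ?label WHERE {
            dbr:European_Commission rdfs:label ?label .
            FILTER (lang(?label) = "de")
        }
    """)
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding["label"]["value"])   # the entity's German label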