Tools like Systran (Babelfish) and Google Translate are designed for consumers of information, whereas we will mainly serve the producers of information. We want the quality to be good enough that, for instance, an e-commerce site can translate its web pages automatically without fear that the message will change. With other tools, a potential customer can, for instance, take an e-commerce page written in French and translate it into Swedish just to find out whether the shop has something of interest for her.
Of course, there is a price to pay: we will not be able to translate just anything. We can only translate things that we have customized the system to translate. This follows from a well-known trade-off in machine translation: one cannot reach full coverage and full precision at the same time. In this trade-off, Systran and Google have opted for coverage, whereas MOLTO opts for precision.
MOLTO translators are specialized to different domains, which use language in uniform and well-understood ways. In MOLTO itself, we will build systems for three such domains: mathematical exercises, biomedical patents, and museum object descriptions. But these domains are just examples, which help us to develop and evaluate the tools; we expect the tools to be applicable to new domains by other people. Examples of such domains could be e-commerce sites, Wikipedia articles, contracts, business letters, user manuals, and software localization.
No. "Newspaper text" is not a well-defined domain in MOLTO's sense, at least not in the light of the knowledge we have today. So we leave it to other tools to translate newspapers, novels, and random web pages.
This is exactly what we want to make easier. Traditionally, it has been an effort of years to build a translation system of any reasonable size. We want to bring this down to months, in some cases even to days. And we want it to be doable for persons without special training in MOLTO, in linguistics, or in programming. Read the "Technology" section to find out how we believe we can do this.
No. Firstly because we cannot translate outside well-defined domains. Secondly, and more interestingly, we will provide new working modes for human translators: instead of translating similar documents in the same domain over and over again, they will be able to work on customizing the translation systems. The systems will learn from a few well-chosen examples, translated by humans, how to translate other texts within the same domain. This will raise the translator's work to a higher level.
Human translators will always be better than MOLTO at making intelligent decisions about style, and will hence produce more elegant text. On the other hand, MOLTO will be good at terminologies and idiomatic usages in specialized domains, for which human translators might lack training.
MOLTO is committed to dealing with 15 languages: 12 official languages of the European Union (Bulgarian, Danish, Dutch, English, Finnish, French, German, Italian, Polish, Romanian, Spanish, and Swedish) and 3 other languages (Catalan, Norwegian, and Russian). During the project, other languages are likely to be added, since they are provided by other on-going projects.
The main thing we use for each language in MOLTO is a resource grammar, which is actually a software library that defines the grammatical rules of the language: its word inflection and syntactic structures. Writing a resource grammar for a new language requires an effort of 3 to 6 months from a reasonably skilled programmer with good theoretical and practical knowledge of the language.
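To give a flavour of what "word inflection and syntactic structures" can look like as a software library, here is a minimal sketch in Python. It is not the MOLTO resource grammar library or its API: every name in it (Noun, Adjective, english_noun, french_np, and so on) is hypothetical, and the inflection rules cover only a few regular cases of English and French. A real resource grammar is far larger, which is where the months of effort go.

```python
from dataclasses import dataclass


@dataclass
class Noun:
    forms: dict   # number ("sg" or "pl") -> word form
    gender: str   # "masc" or "fem" for French, "none" for English


@dataclass
class Adjective:
    forms: dict   # (gender, number) -> word form


# --- Word inflection (toy rules, regular cases only) ---

def english_noun(singular):
    """Guess the English plural from the singular form."""
    if singular.endswith(("s", "x", "ch", "sh")):
        plural = singular + "es"
    elif len(singular) > 1 and singular.endswith("y") and singular[-2] not in "aeiou":
        plural = singular[:-1] + "ies"
    else:
        plural = singular + "s"
    return Noun({"sg": singular, "pl": plural}, "none")


def english_adjective(word):
    """English adjectives do not inflect for gender or number."""
    return Adjective({("none", "sg"): word, ("none", "pl"): word})


def french_noun(singular, gender):
    """Regular French plural: add -s unless the word already ends in s, x or z."""
    plural = singular if singular.endswith(("s", "x", "z")) else singular + "s"
    return Noun({"sg": singular, "pl": plural}, gender)


def french_adjective(masc_sg):
    """Regular French adjective: feminine adds -e, plural adds -s."""
    fem_sg = masc_sg if masc_sg.endswith("e") else masc_sg + "e"

    def plural(word):
        return word if word.endswith(("s", "x")) else word + "s"

    return Adjective({
        ("masc", "sg"): masc_sg, ("masc", "pl"): plural(masc_sg),
        ("fem", "sg"): fem_sg, ("fem", "pl"): plural(fem_sg),
    })


# --- Syntactic structures: word order and agreement ---

def english_np(adjective, noun, number):
    """English: adjective before noun, no agreement."""
    return f"{adjective.forms[('none', number)]} {noun.forms[number]}"


def french_np(adjective, noun, number):
    """French (regular case): adjective after noun, agreeing in gender and number."""
    return f"{noun.forms[number]} {adjective.forms[(noun.gender, number)]}"


print(english_np(english_adjective("green"), english_noun("house"), "pl"))
print(french_np(french_adjective("vert"), french_noun("maison", "fem"), "pl"))
```

The point of the sketch is the division of labour: whoever builds a domain translator only chooses words and combines them, while the library takes care of plural forms, agreement, and word order for each language.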
There is on-going work on at least Arabic, Farsi, Hebrew, Hindi/Urdu, Icelandic, Japanese, Latvian, Maltese, Portuguese, Swahili, Tswana, and Turkish. The EU languages that still lack developers are Czech, Estonian, Greek, Hungarian, Irish, Lithuanian, Slovak, and Slovene. You are most welcome to contribute to any of these languages!
We will release the first prototype of the MOLTO web service in June 2010. This prototype will be constantly updated, and more mature tools will be released during 2011. The case studies will be finished in late 2012. But you can already get an idea of the underlying technology by trying out a fridge magnet demo or a text input demo.
We will receive feedback from users continuously, and fix all errors as soon as possible. One advantage of MOLTO technology is that it is highly programmable: we can locate errors in translations with high precision, and produce a fixed version of the system quickly without breaking anything else.