Frequently Asked Questions - Technology

Technology

The main idea is to use interlinguas based on domain semantics and equipped with reversible generation functions. Translation is thus a composition of parsing the source language and generating the target language. An implementation of this technology is provided by GF (Grammatical Framework, grammaticalframework.org). In MOLTO, GF is complemented by ontologies, such as those used in the semantic web. We will also use methods of statistical machine translation (SMT) to improve robustness and to extract grammars from data.
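The composition of parsing and generation can be illustrated with a minimal Python sketch. It is not GF code: the phrasebook meanings, the language tables, and the function names `linearize`, `parse` and `translate` are all hypothetical, and a real interlingua would use structured trees rather than flat labels.

```python
# Toy interlingua for a phrasebook domain. Every name and phrase here is
# a hypothetical illustration, not part of GF or MOLTO.
LINEARIZATIONS = {
    "Eng": {"Greeting": "hello", "Thanks": "thank you"},
    "Fre": {"Greeting": "bonjour", "Thanks": "merci"},
    "Ger": {"Greeting": "hallo", "Thanks": "danke"},
}


def linearize(meaning, lang):
    """Generate the phrase for an abstract meaning in one language."""
    return LINEARIZATIONS[lang][meaning]


def parse(phrase, lang):
    """Invert the generation table: find the meaning behind a phrase."""
    inverse = {text: meaning for meaning, text in LINEARIZATIONS[lang].items()}
    if phrase not in inverse:
        raise ValueError(f"not covered by the {lang} grammar: {phrase!r}")
    return inverse[phrase]


def translate(phrase, source, target):
    """Translation = parse the source, then generate the target."""
    return linearize(parse(phrase, source), target)


print(translate("thank you", "Eng", "Fre"))  # merci
print(translate("hallo", "Ger", "Eng"))      # hello
```

In this sketch, adding a language means adding one table rather than one translator per language pair, which is the practical appeal of going through an interlingua.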

GF is a framework for defining multilingual grammars, each based on a common abstract syntax. The abstract syntax is defined by using type theory, in the same way as in logical frameworks. The natural language generation part is called concrete syntax, which is a feature-based grammar formalism equivalent to PMCFG (Parallel Multiple Context-Free Grammars) and has polynomial parsing behaviour. GF uses PMCFG as its "machine language", which is compiled from

GF has been under development for 12 years, and multilingual GF-based translation has been tested in numerous applications, ranging from mathematics via software specifications to spoken dialogue systems (see the GF homepage). We also believe there are many interesting domain translation tasks to address, even if we cannot compete with open-domain systems like Google Translate.

Yes, it is, if we want to have a universal interlingua working for everything. This is why we don't believe we can ever translate newspapers with MOLTO techniques. However, domain-specific interlinguas have proved quite feasible. Notice that this move is similar to what has happened in ontologies: they have moved from universal ontologies to domain ontologies.

The first challenge is to scale up the size of applications. Not so much the number of languages, which we already know how to manage, but the lexicon size: from hundreds to thousands of words. We need techniques for building such translation lexica manually and for extracting them automatically. This leads to the second challenge, which is to minimize the development effort in terms of skills and time: to make GF available to people with no special training, as a part of their normal workflows.

This is perhaps the most speculative research topic in MOLTO. We will, first of all, join the growing efforts on hybrid systems, where statistics is used as a fall-back for rule-based translation; there are many technical ideas here that are yet to be explored. We will also use statistics to extract translation rules automatically and to resolve ambiguities. But we want to keep control of the quality of the translation; thus we won't blindly return uncertain fall-back translations without warning the user about the uncertainty.
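The fall-back policy described above could look roughly like the following Python sketch. Both translators are hypothetical stand-ins (small lookup tables), and the names `Translation`, `grammar_translate`, `statistical_translate` and `hybrid_translate` are assumptions made for illustration, not MOLTO APIs.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    text: str
    reliable: bool  # False when the text came from the statistical fall-back


# Hypothetical stand-in for a grammar: covers only what it knows exactly.
GRAMMAR_TABLE = {"x is even": "x est pair"}

# Hypothetical stand-in for an SMT system: word-by-word guesses.
WORD_TABLE = {"is": "est", "even": "pair", "odd": "impair"}


def grammar_translate(sentence):
    """Return a translation, or None if the sentence is outside the grammar."""
    return GRAMMAR_TABLE.get(sentence)


def statistical_translate(sentence):
    """Always produce some output, with no guarantee of correctness."""
    return " ".join(WORD_TABLE.get(word, word) for word in sentence.split())


def hybrid_translate(sentence):
    """Prefer the grammar; fall back to statistics and flag the result."""
    result = grammar_translate(sentence)
    if result is not None:
        return Translation(result, reliable=True)
    return Translation(statistical_translate(sentence), reliable=False)


for sentence in ["x is even", "y is odd"]:
    t = hybrid_translate(sentence)
    warning = "" if t.reliable else "  [warning: uncertain fall-back translation]"
    print(f"{sentence} -> {t.text}{warning}")
```

Keeping the `reliable` flag on the result, rather than hiding it inside the translator, lets a user interface decide how to present the uncertainty.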

The main generic tools are extensions of GF with new user interfaces: a grammar engineer's tool for building systems for new domains, and a translator's tool for using a given translation system. On top of these generic tools, we will build tools tailored to the domains of our case studies. Thus, while the generic translator's tool will be usable in the mathematics domain as well, the users will appreciate its integration with computer algebra systems; the museum object tools will be integrated with existing tools for browsing the

Our code will run on all major operating systems: Linux, Mac OS X, and Windows. Users can therefore download and install MOLTO tools on their own computers. But we will also make the tools available as web services. The translator's tool, in particular, should be usable within a web browser without downloading any software. Some kinds of translators, e.g. tourist phrasebooks, will also be natural to run on mobile phones, e.g. on the iPhone and Android platforms. We will provide user interfaces adapted to these platforms, for both on-line and off-line use.

Here is a concrete example of how this can work. Let's say you want to build a translator for arithmetic propositions. You first build an abstract syntax, which defines basic concepts such as the set of natural numbers, the properties "even" and "odd", and the relation "greater than"; properties and relations are functions from expressions to propositions. This is what the abstract syntax looks like in GF:



Nat : Set
Even : Exp -> Prop
Odd : Exp -> Prop
Gt : Exp -> Exp -> Prop
Sum : Exp -> Exp
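To see what these declarations constrain, here is a minimal Python sketch (not GF) that checks whether a tree is well-typed against the function signatures above. The atom `X` is a hypothetical addition that gives the sketch a leaf expression, `Nat : Set` is left out because it declares a set rather than a function, and the tuple encoding of trees and the name `category_of` are assumptions.

```python
# Function signatures copied from the abstract syntax above:
# name -> (argument categories, result category).
SIGNATURES = {
    "Even": (["Exp"], "Prop"),
    "Odd": (["Exp"], "Prop"),
    "Gt": (["Exp", "Exp"], "Prop"),
    "Sum": (["Exp"], "Exp"),
    # Hypothetical atom, not in the original declarations.
    "X": ([], "Exp"),
}


def category_of(tree):
    """Return the category of a tree written as (function_name, *subtrees)."""
    name, *args = tree
    if name not in SIGNATURES:
        raise TypeError(f"unknown function: {name}")
    arg_cats, result = SIGNATURES[name]
    if len(args) != len(arg_cats):
        raise TypeError(
            f"{name} expects {len(arg_cats)} argument(s), got {len(args)}"
        )
    for arg, expected in zip(args, arg_cats):
        actual = category_of(arg)
        if actual != expected:
            raise TypeError(f"{name} expects {expected}, got {actual}")
    return result


x = ("X",)
print(category_of(("Gt", ("Sum", x), x)))  # Prop

try:
    category_of(("Even", ("Odd", x)))  # a Prop where an Exp is required
except TypeError as err:
    print("rejected:", err)
```

Only well-typed trees would be handed to a concrete syntax for generation, so ill-formed combinations such as applying `Even` to a proposition are excluded before any language-specific rule is involved.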


You cannot ignore Google when working on machine translation: for most people, it is the state of the art for translation on the web. We see MOLTO translation as an approach diametrically opposed to Google's (precision rather than coverage) and also with a different application (a producer's rather than a consumer's tool). The underlying technology is different: Google translation is based on statistics, MOLTO on grammars. Despite all these differences, hybrid systems might well combine MOLTO with Google Translate. In hybrid systems, it is

We will collect feedback from our web-based demos. We will also use standard machine translation evaluation tools, BLEU and TAUS, and make comparisons with other translation tools. In addition to translation quality, we will measure the productivity and usability of our tools in user studies. And like many other European projects, we will have a scientific board with independent experts to monitor our progress.