The main idea is to use interlinguas that are based on domain semantics and equipped with reversible generation functions. Translation is thus a composition of parsing the source language and generating the target language. An implementation of this technology is provided by GF (Grammatical Framework, grammaticalframework.org). In MOLTO, GF is complemented by ontologies of the kind used in the semantic web. We will also use methods of statistical machine translation (SMT) to improve robustness and to extract grammars from data.
GF is a framework for defining multilingual grammars, each based on a
common abstract syntax. The abstract syntax is defined by using
type theory, in the same way as in
logical frameworks.
The natural language generation part is called concrete syntax,
which is a feature-based grammar formalism equivalent to
PMCFG (Parallel Multiple Context-Free Grammars) and
has polynomial parsing behaviour.
GF uses PMCFG as its "machine language", which is compiled from
GF has been under development for 12 years, and multilingual GF-based
translation has been tested in numerous applications, ranging
from mathematics via software specifications to spoken dialogue
systems (see the
GF homepage).
We also believe there are many interesting domain-specific translation
tasks out there, even if we cannot provide a competitor to
open-domain systems like Google Translate.
Interlingua-based translation is indeed infeasible if we want a universal interlingua that works for everything.
This is why we don't believe we can ever translate newspapers with MOLTO techniques.
However, domain-specific interlinguas have proved quite feasible. Notice that
this move is similar to what has happened in ontologies: they have moved from
universal ontologies to domain ontologies.
The first challenge is to scale up the size of applications:
not so much the number of languages, which we already know how to manage,
but the lexicon size, from hundreds to thousands of words. We need techniques
for building such translation lexica manually and extracting them automatically.
This leads to the second challenge, which is to minimize the development effort in terms of skills and time: to make GF available to people with no special training, as a part of their normal workflows.
This is perhaps the most speculative research topic in MOLTO.
First of all, we will join the growing efforts on hybrid systems, where statistics is used as a fall-back for rule-based translation; there are many technical ideas yet to be explored here. We will also use statistics to automatically
extract translation rules, and to resolve ambiguities. But we want
to maintain the control of the quality of the translation; thus we
won't blindly return uncertain fall-back translations without warning
the user about the uncertainty.
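The fall-back policy described above can be sketched as follows. This is a minimal illustration in Python rather than MOLTO code: the `Translation` record, `hybrid_translate`, and the two toy engines are hypothetical names and data invented for the example. The only point it makes is that a statistical result is never returned without a flag and a warning.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Translation:
    text: str
    reliable: bool                 # True only for grammar-based output
    warning: Optional[str] = None  # shown to the user when not reliable


def hybrid_translate(
    sentence: str,
    grammar_translate: Callable[[str], Optional[str]],
    statistical_translate: Callable[[str], str],
) -> Translation:
    """Prefer the rule-based translator; fall back to statistics with a warning."""
    exact = grammar_translate(sentence)
    if exact is not None:
        return Translation(exact, reliable=True)
    guess = statistical_translate(sentence)
    return Translation(
        guess,
        reliable=False,
        warning="statistical fall-back: translation may be inaccurate",
    )


# Toy stand-ins for the two engines (hypothetical, for illustration only).
GRAMMAR_COVERAGE = {"3 is odd": "3 es impar"}
WORD_TABLE = {"is": "es", "odd": "impar", "big": "grande"}


def toy_grammar(sentence: str) -> Optional[str]:
    # Returns None when the sentence is outside the grammar's coverage.
    return GRAMMAR_COVERAGE.get(sentence)


def toy_statistical(sentence: str) -> str:
    # Crude word-by-word substitution standing in for an SMT system.
    return " ".join(WORD_TABLE.get(word, word) for word in sentence.split())


for s in ("3 is odd", "5 is big"):
    print(hybrid_translate(s, toy_grammar, toy_statistical))
```

The first sentence is covered by the toy grammar and comes back marked reliable; the second falls through to the statistical stand-in and carries a warning that a user interface could display.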
The main generic tools are extensions of GF with new user interfaces:
a grammar engineer's tool for building systems for new domains,
and a translator's tool for using a given translation system.
On top of these generic tools, we will build tools tailored to
the domains of our case studies. Thus, while the generic translator's tool
will be usable in the mathematics domain as well, the users will
appreciate its integration with computer algebra systems; the museum
object tools will be integrated with existing tools for browsing the
Our code will run on all major operating systems: Linux, Mac OS X, and Windows.
So users can download and install MOLTO tools on their own computers. But
we will also make them available as web services. The translator's tool,
in particular, should be usable within a web browser without downloading
any software. Some kinds of translators, e.g. tourist phrasebooks,
will also be natural to run on mobile phones, e.g. on the iPhone and
Android platforms. We will provide user interfaces adapted to these platforms,
for both on-line and off-line use.
Here is a concrete example of how this can work. Say you want to build a translator for arithmetic propositions. You first build an abstract syntax, which defines basic concepts such as the set of natural numbers, the properties "even" and "odd", and the relation "greater than"; properties and relations are functions from expressions to propositions. This is what the abstract syntax looks like in GF:
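abstract Arith = {          -- module name "Arith" is illustrative
  cat Set ; Exp ; Prop ;    -- categories assumed by the functions below
  fun
    Nat  : Set ;                -- the set of natural numbers
    Even : Exp -> Prop ;        -- property "even"
    Odd  : Exp -> Prop ;        -- property "odd"
    Gt   : Exp -> Exp -> Prop ; -- relation "greater than"
    Sum  : Exp -> Exp ;
}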
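To make the idea of translation as parsing followed by generation concrete, here is a small sketch in Python rather than GF. It is not how GF is implemented: the template tables `ENGLISH` and `SPANISH`, their wording, and the helper names are assumptions made for the illustration, and only `Even`, `Odd`, and `Gt` are covered, with numerals and one-letter variables treated as atomic expressions. Each "concrete syntax" is one string template per abstract function, and the same template is used in both directions.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Tree:
    """Abstract syntax tree: a function name applied to argument trees."""
    fun: str
    args: Tuple["Tree", ...] = ()


# Assumed toy concrete syntaxes: one template per abstract function.
# The wording of each template is illustrative only.
ENGLISH = {
    "Even": "{0} is even",
    "Odd": "{0} is odd",
    "Gt": "{0} is greater than {1}",
}
SPANISH = {
    "Even": "{0} es par",
    "Odd": "{0} es impar",
    "Gt": "{0} es mayor que {1}",
}


def linearize(tree: Tree, concrete: Dict[str, str]) -> str:
    """Generate a string from an abstract tree."""
    if not tree.args:  # atomic expression such as a numeral
        return tree.fun
    parts = [linearize(arg, concrete) for arg in tree.args]
    return concrete[tree.fun].format(*parts)


def parse(text: str, concrete: Dict[str, str]) -> Tree:
    """Recover an abstract tree by running the templates in reverse."""
    text = text.strip()
    if re.fullmatch(r"[0-9]+|[a-z]", text):  # numeral or one-letter variable
        return Tree(text)
    for fun, template in concrete.items():
        # re.split keeps the captured slot numbers: literal, slot, literal, ...
        pieces = re.split(r"\{(\d+)\}", template)
        pattern = "".join(
            re.escape(p) if i % 2 == 0 else f"(?P<a{p}>.+)"
            for i, p in enumerate(pieces)
        )
        match = re.fullmatch(pattern, text)
        if match:
            # Order the captured arguments by slot number, not by position.
            slots = sorted(match.groupdict().items(), key=lambda kv: int(kv[0][1:]))
            return Tree(fun, tuple(parse(value, concrete) for _, value in slots))
    raise ValueError(f"no parse for {text!r}")


def translate(text: str, source: Dict[str, str], target: Dict[str, str]) -> str:
    """Translation as parsing composed with generation."""
    return linearize(parse(text, source), target)


print(translate("3 is greater than 2", ENGLISH, SPANISH))  # 3 es mayor que 2
```

Because the tree is language-neutral, adding a further language in this sketch means adding one more template table; no language pair needs its own rules.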
You cannot ignore Google when working on machine translation: for most
people, it is the state of the art for translation on the web. We see MOLTO
translation as an approach diametrically opposed to Google's
(precision rather than coverage), with a different application
(a producer's rather than a consumer's tool). The underlying
technology is also different: Google translation is based on statistics,
MOLTO on grammars. Despite all these differences, hybrid systems
might well combine MOLTO with Google Translate. In hybrid systems, it is
We will collect feedback from our web-based demos. We will also use standard machine translation evaluation tools, BLEU
and TAUS, and make comparisons with other translation tools.
In addition to translation quality, we will measure the productivity and usability of our tools in user studies. And like many other European projects, we will have a scientific board with independent experts to monitor our progress.