
Progress

Beyond the state of the art

From the Corpora-List "Release: 23M German-English parallel sentences from patent text"

Institut für Computerlinguistik -- Universität Heidelberg

We are happy to announce the release of a parallel corpus of patent text for the German-English language pair. The corpus has been constructed from EPO, WIPO and USPTO patent documents extracted from the MAREC collection and contains 23 million sentence pairs from all patent text sections.

All sentences are labeled with metadata: patent document id, patent family, patent classification and publication date.

The corpus is distributed under a Creative Commons License.
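As a rough illustration of what per-sentence metadata makes possible, here is a minimal sketch, assuming each sentence pair is loaded into a record carrying the four metadata fields named above. The class name `SentencePair`, its field names, and the helper functions are hypothetical; they do not describe the corpus's actual distribution format.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout: field names are illustrative assumptions,
# not the actual format in which the corpus is distributed.
@dataclass(frozen=True)
class SentencePair:
    de: str              # German sentence
    en: str              # English sentence
    doc_id: str          # patent document id
    family_id: str       # patent family
    classification: str  # patent classification code
    published: date      # publication date

def by_class_prefix(pairs, prefix):
    """Keep only pairs whose classification code starts with `prefix`."""
    return [p for p in pairs if p.classification.startswith(prefix)]

def published_between(pairs, start, end):
    """Keep only pairs published within [start, end], inclusive."""
    return [p for p in pairs if start <= p.published <= end]
```

Filtering by classification prefix is one way such labels could be used, for example to extract a domain-specific subset of the corpus for training or evaluation.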

Your Friendly Evaluation Coordinator here...

Hi everybody,

I will probably pester many of you when looking for evaluators and asking silly questions, so let me take the liberty of introducing myself:

I'm Jussi Rautio, the week-old MT Evaluation Coordinator at the University of Helsinki, tasked with evaluating the quality of MOLTO translations.

ICT and Cultural Heritage: Research, Innovation and Policy - ERCIM news

ERCIM News devoted Issue 86 (July 2011) to ICT in Cultural Heritage. The PDF, available online at http://ercim-news.ercim.eu/images/stories/EN86/EN86-web.pdf, collects articles that reflect the current state of research in this area in Europe. The issue appeared before work on Cultural Heritage in MOLTO started.

A new GF runtime is coming

A preview version of libpgf, a C-based reimplementation of the GF runtime, is now available. When finished, it should make GF technology accessible to applications that cannot use the current Haskell- and Java-based runtimes, due to either resource constraints or interoperability concerns. In particular, libpgf should be easier to access from programming languages that do not run on the JVM. Bindings for Python are already in the pipeline.

Downloads and further information are available from the libpgf home page.

Ramona Enache is visiting UPC

Ramona Enache from UGOT is on a research visit at UPC to work with the local team on hybrid methods for robust statistical translation. She is one of the expert developers of GF, so do not miss the chance to talk to her if you are interested in the current research at Chalmers on grammar-based machine translation.

ELRA - Language Resources Catalogue - Update

ELRA is happy to announce that 1 new Monolingual Lexicon, 3 new Speech Resources and 3 new Evaluation Packages are now available in its catalogue. Moreover, updated versions of the ESTER Corpus, ESTER Evaluation Package and Bulgarian WordNet have also been released.

Visit the On-line Catalogue: http://catalog.elra.info

Visit the Universal Catalogue: http://universal.elra.info

Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html

Hindi Resource Grammar Released

Hi All,

We are happy to release the first version of our Hindi GF resource grammar, and an improved version of the Urdu resource grammar.

Hindi is very similar to Urdu. Indeed they are in many ways different registers of one language: Hindustani, to use an old name. Modern Urdu and Hindi differ mostly in the advanced or recent parts of the lexicon. Urdu evolved from an old Delhi dialect with copious borrowing from Persian and some from Arabic, and is written in a Perso-Arabic alphabet. Hindi, a more recent evolution from Hindustani, has borrowed much more from Sanskrit.

Sindhi Resource Grammar Released

We are happy to release the Sindhi Resource Grammar. It is the 5th Indo-Iranian language added to the GF resource grammar library (the others are Urdu, Punjabi, Persian, and Nepali). Development took almost 6 months and was carried out as a Master's thesis project. Sindhi belongs to the Indo-Aryan branch of the Indo-Iranian family and is widely spoken in Pakistan and India. In Pakistan it is the official language of Sindh province, and in India it is one of the scheduled languages officially recognized by the federal government.

Interactive knowledge-based systems

ID: 12

Use of resources

Node Budgeted Period 1 Period 2 (est) Period 3 (est)
UGOT 4 X X
UHEL
UPC
Ontotext
BI 20 X X
UZH
Timeline: January 2012 - June 2013
Related Publication: WP12

Phrasebook and MGL are now available for Russian

Phrasebook and the Mathematical Grammar Library now support Russian. This work identified some shortcomings in the Russian resource grammar, so building the new versions requires installing the development branch of the GF resource grammars.

The known issues in these two applications include word order in questions and agreement in cardinal determiners.
