Concept and Objectives

From the Corpora-List "Release: 23M German-English parallel sentences from patent text"

Submitted by cristina.españa on 6 March, 2013 - 15:50

Institut für Computerlinguistik -- Universität Heidelberg

We are happy to announce the release of a parallel corpus of patent text for the German-English language pair. The corpus has been constructed from EPO, WIPO and USPTO patent documents extracted from the MAREC collection and contains 23 million sentence pairs from all patent text sections.

All sentences are labeled with metadata: patent document id, patent family, patent classification and publication date.

The corpus is distributed under a Creative Commons License.

Robust and statistical translation methods

Your Friendly Evaluation Coordinator here...

Submitted by Jussi.Rautio on 12 November, 2012 - 16:17

Hi everybody,

I shall probably pester a lot of you when looking for evaluators and asking silly questions, so I am taking the liberty of introducing myself:

I'm Jussi Rautio, the week-old MT Evaluation Coordinator at the University of Helsinki, tasked with evaluating the quality of MOLTO translations.

Translation quality

ICT and Cultural Heritage: Research, Innovation and Policy - ERCIM news

Submitted by olga.caprotti on 10 September, 2012 - 17:10

ERCIM News devoted Issue 86, in July 2011, to ICT in Cultural Heritage. The PDF is online at http://ercim-news.ercim.eu/images/stories/EN86/EN86-web.pdf and collects articles meant to reflect the current state of European research in this area. The issue appeared before the work on Cultural Heritage in MOLTO started.

Multilingual grammars

A new GF runtime is coming

Submitted by lauri.alanko on 8 July, 2012 - 11:00

A preview version of libpgf, a C-based reimplementation of the GF runtime, is now available. When finished, it should make GF technology accessible to applications that cannot make use of the current Haskell- and Java-based runtimes either due to resource constraints or interoperability concerns. In particular, libpgf should be easier to access from non-JVM-based programming languages. Bindings for Python are already in the pipeline.

Downloads and further information are available from the libpgf home page.

Progress

Ramona Enache is visiting UPC

Submitted by olga.caprotti on 25 June, 2012 - 14:41

Ramona Enache from UGOT is on a research visit at UPC to work with the local team on hybrid methods for robust statistical translation. She is one of the expert developers of GF, so do not miss the chance to talk to her if you are interested in the current research at Chalmers on grammar-based machine translation.

Robust and statistical translation methods

ELRA - Language Resources Catalogue - Update

Submitted by olga.caprotti on 7 May, 2012 - 11:24

ELRA is happy to announce that 1 new Monolingual Lexicon, 3 new Speech Resources and 3 new Evaluation Packages are now available in its catalogue. Moreover, updated versions of the ESTER Corpus, ESTER Evaluation Package and Bulgarian WordNet have also been released.

Visit the On-line Catalogue: http://catalog.elra.info

Visit the Universal Catalogue: http://universal.elra.info

Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html

Translator's tools

Contacting PLuTO (Patent Language Translations Online)

Submitted by cristina.españa on 26 April, 2012 - 13:24

Today I attended a talk by John Tinsley from the PLuTO project. It was quite interesting, and it was the first time I had heard about their results!

I'll summarize it, because he said some things about their system, and especially about the evaluation, that could also be useful for us.

They have huge amounts of data, so much that they cannot use all of it in their web service.

Hindi Resource Grammar Released

Submitted by Shafqat.Virk on 24 February, 2012 - 09:47

Hi All,

We are happy to release the first version of our Hindi GF resource grammar, and an improved version of the Urdu resource grammar.

Hindi is very similar to Urdu. Indeed, they are in many ways different registers of one language: Hindustani, to use an old name. Modern Urdu and Hindi differ mostly in the advanced or recent parts of the lexicon. Urdu evolved from an old Delhi dialect with copious borrowing from Persian and some from Arabic, and is written in a Perso-Arabic alphabet. Hindi, a more recent evolution from Hindustani, has borrowed much more from Sanskrit.

Grammar engineering

Sindhi Resource Grammar Released

Submitted by Shafqat.Virk on 21 February, 2012 - 12:17

We are happy to release the Sindhi Resource Grammar. It is the fifth Indo-Iranian language added to the GF resource grammar library (the others are Urdu, Punjabi, Persian, and Nepali). The development took almost six months and was carried out as a Master's thesis project. Sindhi belongs to the Indo-Aryan branch of the Indo-Iranian family. It is widely spoken in Pakistan and India. In Pakistan it is the official language of Sindh province, and in India it is one of the scheduled languages officially recognized by the federal government of India.

Grammar engineering

Program (8 March 2012)

Work-packages Day

This day is a Consortium-only event. The program consists of presentations on the details of each ongoing work-package. Since the review will take place soon, the presentations should be prepared so that they can also serve for the review.

We also have the option of attending the seminar talk, leaving at 16:30:

08.03.2012, 17:15 - Automated Reasoning for Ontology Engineering

Speaker: Prof. Dr.

Progress
