Institut für Computerlinguistik -- Universität Heidelberg
We are happy to announce the release of a parallel corpus of patent text
for the German-English language pair. The corpus has been constructed
from EPO, WIPO and USPTO patent documents extracted from the MAREC
collection and contains 23 million sentence pairs from all patent text
sections.
All sentences are labeled with metadata: patent document id, patent
family, patent classification and publication date.
The corpus is distributed under a Creative Commons License.
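For readers who want to work with the metadata, a sentence pair and its labels could be represented roughly as follows. This is only a minimal sketch: the announcement does not describe the distribution format, so the class name, the field names, the `filter_by_class` helper and the sample records are all hypothetical and not taken from the corpus.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class SentencePair:
    """One German-English sentence pair with the four metadata labels
    named in the announcement. All field names are assumptions."""
    german: str
    english: str
    document_id: str       # patent document id
    family_id: str         # patent family
    classification: str    # patent classification code
    publication_date: date


def filter_by_class(pairs: Iterable[SentencePair], prefix: str) -> List[SentencePair]:
    """Keep only the pairs whose classification code starts with `prefix`."""
    return [p for p in pairs if p.classification.startswith(prefix)]


# Made-up records for illustration only; they do not come from the corpus.
examples = [
    SentencePair("Die Erfindung betrifft ein Verfahren.",
                 "The invention relates to a method.",
                 "DOC-0001", "FAM-0001", "G06F", date(2005, 3, 1)),
    SentencePair("Die Vorrichtung umfasst ein Ventil.",
                 "The device comprises a valve.",
                 "DOC-0002", "FAM-0002", "F16K", date(2007, 9, 15)),
]

selected = filter_by_class(examples, "G06")
```

Filtering on the classification label in this way is one plausible route to a domain-specific subset, for example for training a translation system on one technical field.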
Submitted by olga.caprotti on 10 September, 2012 - 17:10
ERCIM News devoted Issue 86 (July 2011) to ICT in Cultural Heritage. The PDF is online at http://ercim-news.ercim.eu/images/stories/EN86/EN86-web.pdf and collects articles meant to reflect the current status of research in Europe in this area. The issue appeared before the work on Cultural Heritage in MOLTO started.
A preview version of libpgf, a C-based reimplementation of the GF runtime, is now available. When finished, it should make GF technology accessible to applications that cannot use the current Haskell- and Java-based runtimes because of either resource constraints or interoperability concerns. In particular, libpgf should be easier to access from non-JVM-based programming languages. Bindings for Python are already in the pipeline.
Downloads and further information are available from the libpgf home page.
Ramona Enache from UGOT is on a research visit at UPC to work with the local team on hybrid methods for robust statistical translation. She is one of the expert developers of GF, so do not miss the chance to talk to her if you are interested in the current research at Chalmers on grammar-based machine translation.
ELRA is happy to announce that 1 new Monolingual Lexicon, 3 new Speech Resources and 3 new Evaluation Packages are now available in its catalogue. Moreover, updated versions of the ESTER Corpus, ESTER Evaluation Package and Bulgarian WordNet have also been released.
Submitted by Shafqat.Virk on 24 February, 2012 - 09:47
Hi All,
We are happy to release the first version of our Hindi GF resource grammar, and an improved version of the Urdu resource grammar.
Hindi is very similar to Urdu. Indeed they are in many ways different registers of one language: Hindustani, to use an old name. Modern Urdu and Hindi differ mostly in the advanced or recent parts of the lexicon. Urdu evolved from an old Delhi dialect with copious borrowing from Persian and some from Arabic, and is written in a Perso-Arabic alphabet. Hindi, a more recent evolution from Hindustani, has borrowed much more from Sanskrit.
Submitted by Shafqat.Virk on 21 February, 2012 - 12:17
We are happy to release the Sindhi Resource Grammar. It is the fifth Indo-Iranian language added to the GF resource grammar library (the others are Urdu, Punjabi, Persian, and Nepali). Development took almost six months and was carried out as a Master's thesis project.
Sindhi belongs to the Indo-Aryan branch of the Indo-Iranian family. It is widely spoken in Pakistan and India. In Pakistan it is the official language of Sindh (a province of Pakistan), and in India it is one of the scheduled languages officially recognized by the federal government of India.
Submitted by nikita.frolov on 25 November, 2011 - 11:15
The Phrasebook and the Mathematical Grammar Library now support Russian. This work has identified some shortcomings in the Russian resource grammar, so building the new versions requires installing the development branch of the GF resource grammars.
The known issues in these two applications include word order in questions and agreement in cardinal determiners.