Institut für Computerlinguistik -- Universität Heidelberg
We are happy to announce the release of a parallel corpus of patent text
for the German-English language pair. The corpus has been constructed
from EPO, WIPO and USPTO patent documents extracted from the MAREC
collection and contains 23 million sentence pairs from all patent text
sections.
All sentences are labeled with metadata: patent document id, patent
family, patent classification and publication date.
The corpus is distributed under a Creative Commons License.
Ramona Enache from UGOT is on a research visit at UPC to work with the local team on hybrid methods for robust statistical translation. She is one of the expert developers of GF, so do not miss the chance to talk to her if you are interested in the current research at Chalmers on grammar-based machine translation.
MTSummit 2011 took place this week and included a workshop specialising in patent translation. MOLTO was presented in a talk at the workshop.
The most important patent offices gave presentations and, as expected, all of them evaluate their translations manually. It would be interesting for us to use criteria similar to theirs in our own evaluation.
Submitted by aarne.ranta on 9 September, 2011 - 14:31
We have now received the OK from the EPO to proceed with obtaining a better license for the patent corpus we need for our work in WP5 and WP7. This means that we can publish the results more freely than under the previous, personal license.