A very good idea. But why was it not proposed by the EU institutions?
Please read: Philipp Koehn, "Europarl: A Parallel Corpus for Statistical
Machine Translation", MT Summit 2005.
Please cite the paper if you use this corpus in your work. See also the
extended (but earlier) version of the report.
The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes
versions in 11 European languages: Romance (French, Italian, Spanish,
Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and
Finnish.
The goal of the extraction and processing was to
generate sentence-aligned text for statistical machine translation systems. For this purpose,
we extracted matching items and labeled them with corresponding document IDs.
Using a preprocessor, we identified sentence boundaries. We then sentence-aligned the
data using a tool based on the Gale and Church algorithm (called the Church and Gale algorithm in the original description).
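To make the alignment step more concrete, here is a minimal sketch of length-based sentence alignment in the style of Gale and Church (1993). It is not the tool used to build Europarl; the function names `match_cost` and `align` are made up for this illustration. The priors and the constants `C` and `S2` are the values given in the Gale and Church paper, and using the mean of the two span lengths in the variance term is an assumption of this sketch. It expects two lists of sentences from one matching document pair and uses only character lengths, never the words themselves.

```python
import math

# Alignment types (source sentences, target sentences) with the prior
# probabilities reported by Gale and Church (1993).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of the length difference per character


def match_cost(len_src, len_tgt):
    """Negative log-probability that spans of these character lengths match."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    # Assumption of this sketch: scale the variance by the mean span length.
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a deviation at least this large under N(0, 1).
    prob = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(prob, 1e-300))


def align(src, tgt):
    """Return a list of (source indices, target indices) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Dynamic programming over prefixes of both sentence lists.
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                if i < di or j < dj:
                    continue
                len_src = sum(len(s) for s in src[i - di:i])
                len_tgt = sum(len(t) for t in tgt[j - dj:j])
                total = (cost[i - di][j - dj]
                         + match_cost(len_src, len_tgt) - math.log(prior))
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (di, dj)
    # Walk the back-pointers from the end of both documents.
    beads, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]
```

Calling `align(source_sentences, target_sentences)` returns one entry per alignment unit, each pairing a list of source sentence indices with a list of target sentence indices. An empty list on one side marks a sentence with no counterpart. Because only lengths are compared, the sketch depends on sentence boundaries having been identified beforehand.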
http://www.statmt.org/europarl/