A Large Spanish-Catalan Parallel Corpus Release for Machine Translation
Keywords: Catalan-Spanish parallel corpus, machine translation
Abstract
We present a large Spanish-Catalan parallel corpus extracted from ten years of the print edition of a bilingual Catalan newspaper. The resulting corpus of 7.5 M parallel sentences (around 180 M words per language) is useful for many natural language processing applications. We report excellent results for a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) under catalog number ELRA-W0053.
How to Cite
Costa-Jussà, M. R., Fonollosa, J. A. R., Mariño, J. B., Poch, M., & Farrús, M. (2015). A Large Spanish-Catalan Parallel Corpus Release for Machine Translation. Computing and Informatics, 33(4), 907–920. Retrieved from http://147.213.75.17/ojs/index.php/cai/article/view/2807
Section: Articles