Towards the Primary Platform for
Language Technologies in Europe

Multilingual Resources for CEF.AT in the legal domain

Short Name: MARCELL
Name: Multilingual Resources for CEF.AT in the legal domain
Coordinator: Tamás Váradi, Research Institute for Linguistics
Consortium: Research Institute for Linguistics, Institute for Bulgarian Language “Prof. Lyubomir Andreychin”, University of Zagreb, Polish Academy of Sciences, Academia Romana, Jazykovedný ústav Ľ. Štúra Slovenskej akadémie vied, “Jožef Stefan” Institute
Project Runtime: 1 October 2018 – 31 March 2021
Funded by: European Commission
http://marcell-project.eu
The overall objective of the MARCELL Action (Multilingual Resources for CEF.AT in the legal domain) is to enable automatic translation of the body of national legislation (laws, decrees, regulations) in seven countries: Bulgaria, Croatia, Hungary, Poland, Romania, Slovakia and Slovenia. At present, national legislative texts are not automatically available to CEF.AT, and current Machine Translation (MT) systems could be improved if they had access to them.
The Action aims to process two resources available in all seven languages concerned: the multilingual ontology-based thesaurus EUROVOC, and the corpora of national legislation in the respective languages. As a result, the Action will produce the following deliverables:
1. Seven large-scale, suitably pre-processed (tokenized and morphologically tagged) monolingual corpora of national legislation documents, classified by EUROVOC topics/descriptors and annotated with the EUROVOC and IATE terms identified in them.
2. A comparable corpus of the seven languages, aligned at the level of topic domains identified by EUROVOC descriptors.
3. A Croatian-English parallel corpus consisting of ca. 1800 legislative documents.
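As an illustration of what alignment at the topic level could look like in practice, the following minimal sketch groups documents from different languages under the descriptors they share. It is not the Action's actual pipeline: the document identifiers, the descriptor codes (D001 and so on) and the function name align_by_descriptor are all hypothetical, and the sketch assumes each document has already been classified into one or more descriptors.

```python
from collections import defaultdict

# Hypothetical records: (language code, document id, assigned descriptor codes).
# The codes below are placeholders, not real EUROVOC identifiers.
DOCS = [
    ("hu", "hu-doc-1", {"D001", "D002"}),
    ("pl", "pl-doc-1", {"D001"}),
    ("ro", "ro-doc-1", {"D002", "D003"}),
    ("hr", "hr-doc-1", {"D003"}),
    ("sk", "sk-doc-1", {"D004"}),
]


def align_by_descriptor(docs, min_languages=2):
    """Group document ids by descriptor and language.

    Only descriptors covered by at least `min_languages` languages are kept,
    since a topic found in a single language has nothing to be compared with.
    """
    index = defaultdict(lambda: defaultdict(list))
    for lang, doc_id, descriptors in docs:
        for descriptor in descriptors:
            index[descriptor][lang].append(doc_id)
    return {
        descriptor: dict(by_lang)
        for descriptor, by_lang in index.items()
        if len(by_lang) >= min_languages
    }


aligned = align_by_descriptor(DOCS)
for descriptor in sorted(aligned):
    print(descriptor, aligned[descriptor])
```

In this sketch the documents under one descriptor are comparable rather than parallel: they cover the same topic but are not translations of one another, which is the distinction between deliverables 2 and 3.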
In addition to the expected overall improvement of the MT system in the seven languages concerned, the Action will have an impact on both the e-justice and the Online Dispute Resolution Digital Service Infrastructures (DSIs), as the resources focus on national legislation, which is directly relevant to both DSIs.