Towards the Primary Platform for
Language Technologies in Europe

Multilingual Anonymisation for Public Administrations

Short Name: MAPA
Name: Multilingual Anonymisation for Public Administrations
Coordinator: Manuel Herranz, Pangeanic
Consortium: Pangeanic, Tilde, ELDA, SEDIA, LIMSI, University of Malta and Vicomtech
Project Runtime: January 2020 – December 2021
Funded by: European Commission
The MAPA project is funded under the Connecting Europe Facility (CEF) programme and aims to develop an open-source de-identification toolkit for all official European Union languages. The MAPA anonymisation toolkit will rely on Named Entity Recognition and Classification (NERC) built on the latest neural network and deep learning techniques. A dedicated data collection activity will provide the training and testing data needed to develop the toolkit, and data is currently being identified and collected for the 24 official EU languages. As part of the project, a connection will be established to eTranslation⁸, the online machine translation service provided by the European Commission. This connection is meant to encourage public administrations to provide multilingual datasets, which may in turn improve the coverage and quality of machine translation systems. The toolkit will be publicly available and is targeted particularly at public administrations in the health and legal domains, reflecting the specific use cases addressed during the project.
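To illustrate how NERC output can drive anonymisation, the sketch below shows only the final replacement step. It assumes that a NERC model has already returned entity spans as character offsets with a class label. The `Entity` structure, the `anonymise` function and the labels `PERSON`, `ORGANISATION` and `DATE` are hypothetical. They are not the MAPA toolkit's actual API or label set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical NERC result: a character span plus its class label."""
    start: int  # offset of the first character of the entity
    end: int    # offset just past the last character of the entity
    label: str  # entity class, e.g. "PERSON" (illustrative label set)


def anonymise(text: str, entities: list[Entity]) -> str:
    """Replace each detected entity span with a bracketed class placeholder.

    Spans are processed left to right. A span that overlaps one already
    replaced is skipped, so the first span in start order wins.
    """
    parts = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.start < cursor:
            continue  # overlaps a span that has already been masked
        parts.append(text[cursor:ent.start])  # keep the non-entity text
        parts.append(f"[{ent.label}]")        # mask the entity itself
        cursor = ent.end
    parts.append(text[cursor:])  # trailing text after the last entity
    return "".join(parts)


def span_of(text: str, surface: str, label: str) -> Entity:
    """Build an Entity for the first occurrence of `surface` (demo helper)."""
    start = text.index(surface)
    return Entity(start, start + len(surface), label)


if __name__ == "__main__":
    sentence = "Maria Lopez was admitted to Hospital Clinic on 3 March."
    detected = [
        span_of(sentence, "Maria Lopez", "PERSON"),
        span_of(sentence, "Hospital Clinic", "ORGANISATION"),
        span_of(sentence, "3 March", "DATE"),
    ]
    print(anonymise(sentence, detected))
```

Replacing entities with their class label, instead of deleting them, keeps the sentence readable and preserves the type of information that was removed. This is one common de-identification choice among several. Others include pseudonymisation with consistent surrogate values.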