Towards the Primary Platform for
Language Technologies in Europe

Cross-lingual Embeddings for Less-Represented Languages in European News Media

Short Name: EMBEDDIA
Name: Cross-lingual Embeddings for Less-Represented Languages in European News Media
Coordinator: Senja Pollak, Jožef Stefan Institute
Consortium: Jožef Stefan Institute, Queen Mary University of London, University of Ljubljana, University of La Rochelle, University of Helsinki, University of Edinburgh, Texta OÜ, AS Ekspress Meedia, Trikoder (Styria Media Group), OY Suomen Tietotoimisto
Project Runtime: 1 January 2019 – 31 December 2021
Funded by: European Commission
As Europe becomes more multicultural, its great language diversity limits access to fundamental resources such as local news and government services. For the EU to realise truly equitable, open, multilingual online content, together with tools to support its management, new technologies allowing high-quality transformations (not translations) between languages are urgently needed. While advanced natural language processing tools and resources exist for a few dominant languages, many of Europe’s smaller language communities, and the news media industry that serves them, lack appropriate tools for multilingual internet development and for a multilingual news industry.
The EMBEDDIA project seeks to address these challenges by leveraging innovations in cross-lingual embeddings coupled with deep neural networks, so that existing monolingual resources and tools can be used across languages while retaining their high speed of operation for near real-time applications, without the need for large computational resources. Across three years, the project’s six academic and four industry partners will develop novel solutions, focusing on morphologically rich European languages, and test them in real-world news and media production contexts.