Towards the Primary Platform for
Language Technologies in Europe

Towards Digital Language Equality: META-FORUM 2022 presents project results and newest release of European Language Grid

Three years after the last in-person meeting, the annual European conference on Language Technology returns to Brussels: META-FORUM 2022 takes place as a hybrid event on 8 and 9 June and focuses on “Joining the European Language Grid – Together Towards Digital Language Equality”. The conference presents the results of the European Language Grid and European Language Equality projects, but also includes highlights such as an LT industry session and insights into the needs and future demands of the European Language Technology community. Registration for online and on-site participation is open and, as usual, free of charge.

The 11th META-FORUM will be a very special edition of the international conference series on powerful and innovative language technologies for the information society: with both projects, European Language Grid and European Language Equality, coming to an end in June, the two-day event will be divided into presentations of the final results of the two initiatives, their continuation and the next steps. META-FORUM 2022: Joining the European Language Grid – Together Towards Digital Language Equality takes place on 8 and 9 June 2022 in Brussels as well as online and is, as usual, free of charge.

The first conference day will be devoted to the European Language Grid as well as the European Language Technology (LT) industry, with highlights such as a demonstration of the newest ELG platform and its features and use cases, a panel discussion on language-centric Artificial Intelligence (AI), an LT industry session, an outlook on the future of ELG and more. On the second day, the focus lies on the results and continuation of the European Language Equality project. The project team will present how Digital Language Equality will be measured and monitored over the next years in order to achieve the goal of full Digital Language Equality in Europe by 2030. A strategic research agenda geared towards this goal will be presented at META-FORUM 2022 and illustrated with numerous project results, such as the state of national LT and AI strategies in Europe and a demonstration of the Digital Language Equality Metric and Dashboard.

Georg Rehm, coordinator of ELG and co-coordinator of ELE: “We are extremely happy to return to Brussels for META-FORUM 2022 to celebrate the successful end of our two projects. For both of them, this year’s conference is more of a milestone than a curtain call. With the newest release of the ELG platform, a wide array of findings on the state of Digital Language Equality in Europe and a range of speakers from industry and research, the 11th META-FORUM will be more insightful and – thanks to its hybrid form – more accessible to the European LT community than ever. For both projects, we will also present our plans for the next steps and continuation of the initiatives.”

The registration is, as usual, free of charge. For further information, the full programme and the opportunity to register, please visit www.meta-forum.eu. Current updates about the conference and the named initiatives are shared through the European Language Technology social media channels on Twitter and LinkedIn and featured in the ELT Newsletter.

Overview
META-FORUM 2022
Joining the European Language Grid – Towards Digital Language Equality
Hybrid conference
BluePoint Brussels (Boulevard Auguste Reyers 80, 1030 Brussels) and online
8-9 June 2022
Free of charge
Programme, info and registration:
www.meta-forum.eu

Contact
Prof. Dr. Georg Rehm
Coordinator European Language Grid, Co-Coordinator European Language Equality
German Research Center for Artificial Intelligence (DFKI)
Speech and Language Technology (SLT)
Georg.Rehm@dfki.de
+49 30 23895 1833


Become an active member of the ELG Community – 5 simple steps for your organisation to join the European Language Grid

If you are reading this tutorial, you have most likely received a link from us that leads you to an entry for your organisation in the European Language Grid (ELG). There are many good reasons to have your company, research department or academic institution listed in ELG, the non-profit platform for Language Technologies in Europe. An overview of the ideas behind ELG and its many benefits can be found here. In this short five-step tutorial, we explain how you can take over (“claim”) your organisation’s page in ELG as your own. A brief step-by-step summary can be found at the end of the tutorial.

To make things easier for you, we have taken the liberty of creating a default entry for your organisation. This means that you do not have to set up a new page but can simply claim your organisation’s page so that you can modify it. We used public information to set up your page and invite you to edit it to make it complete, individual and representative by adding your logo, keywords and contact details. An edited page could look like this:

Screenshot of the ILSP organisation

The first step is to be logged into your ELG account. If you have not registered an ELG account yet, here is a guide on how to create it. An important note: An organisation’s ELG page can only be claimed by one user, so it would be ideal if you used your professional email address and ensured that you are the right person to do this for your organisation.

Once you are logged in and have your organisation’s ELG page open, click the “Claim” button in the top right corner. This sends an automatic message to us and we will validate your request. Afterwards, you will receive an email from us confirming your request. At the same time, the organisation entry is unpublished from the ELG, which means that you can now edit it.

Screenshot of an organisation page

When you enter the “My Grid” section, which is found in the top right corner next to your name, you will find your organisation’s entry under “My items”. Here, you can edit the page, add further information and list contact details. Once you are finished, submit the entry for publication. Our ELG team will do a quick technical check and re-publish your organisation’s page to the European Language Grid.

Screenshot of the “My Grid” section with the claimed organisation

Newly claimed organisations are frequently featured in a short profile in our ELT Newsletter, which has more than 4,000 readers, and will soon also be highlighted on the frontpage of the European Language Grid. So don’t wait and join the European Language Grid with your organisation and become part of the European non-profit network for Language Technology services, resources, companies, research organisations and users.

How to take over your organisation’s page in short:

1. Log in to ELG or register an account (please use your professional email address)

2. Open your organisation’s page and click the ‘Claim’ button (top right) – the ELG team validates your claim and informs you via email

3. Open your organisation’s page under ‘My items’ in the ‘My Grid’ section

4. Edit your organisation’s page

5. Click ‘Submit for publication’ – the ELG team will then publish your page


Doubling the database: How research on Digital Language Equality led to 6,000 new resources for the European Language Grid

Over the course of a weekend in the middle of January 2022, the European Language Grid (ELG) doubled in size. More than 6,000 new data resources, tools and services for 87 different languages were added to the ELG platform, pushing the ELG much closer to one of its central objectives: developing into a joint European language technology platform in which ideally all relevant language resources and technologies are registered. With the update, we are now confident that the majority of resources available in Europe can be found in and through the ELG, whether they are corpora, tools, conceptual resources or models. How did that happen? A look at the beginnings of ELG’s sister project, European Language Equality (ELE), might help.

Prospering languages in a digital world

The ELE project’s main goal is to achieve Digital Language Equality in Europe by 2030. According to the preliminary definition, Digital Language Equality describes the state in which all languages have the technological support and situational context necessary for them to continue to exist and to prosper as living languages in the digital age. While this definition paints a clear and desirable picture for the future of multilingualism in Europe, the main work was still lying ahead: developing a strategic research, innovation and implementation agenda and roadmap that leads towards this desired state.

One of the key parts of the strategic agenda is the DLE metric, a quantified index that makes it possible to compare the levels of digital readiness of and across Europe’s languages. This metric combines several factors for each language, such as the number of its speakers and its recognition in the EU, but most importantly the level of technological support it currently receives. In order to suggest how digital language equality can become a reality for all European languages, detailed knowledge about the current state of technological support for each language is necessary. But how does one gather this amount of data for 87 different languages?

Creating the primary platform for European language technology

The task was part of the ELE investigation into the current LT support for Europe’s languages, in which 33 project partners from different countries described the status quo of their respective language, based on empirical data and findings. In addition to these national institutions with expertise in language technology, several associations such as the European Language Equality Network (ELEN) and the European Civil Society Platform for Multilingualism (ECSPM) focussed on smaller languages within the European Union. Altogether, the ELE consortium gathered metadata from around 1,000 organisations such as LT companies, universities and research institutions for a total of 87 different languages. In the process, 4,147 new data resources and 2,216 new tools and services were identified and their metadata documented.

These are new tools and resources because the approximately 6,000 resources gathered by the ELE consortium had not been available in the European Language Grid yet, which already consisted of more than 5,000 resources from the European LT landscape. Including the additional 6,000 resources collected by ELE, the ELG platform now provides information about more than 11,000 language technology resources – either as ELG-compatible services that can be downloaded and used directly through the ELG, or in the form of metadata including links to the original hosting platform.

All data leads to Athens

The import itself was handled by the Institute for Language and Speech Processing (ILSP) of the “Athena” Research Center in Greece. The team in Athens, which forms part of both projects, coordinated the metadata collection effort, ensured compatibility with the ELG platform, and homogenised and curated all the metadata records to prepare them for the import. The new import includes both public and on-demand data and services, hosted directly by their providers or through platforms such as Hugging Face or GitHub.

The ELE resource import represents a prime example of the effective collaboration between the two projects and the reason why we consider them sister projects: the development of the strategic research, innovation and implementation agenda and roadmap for full digital language equality requires a comprehensive and empirical overview of the current technological support of Europe’s languages. While the European Language Grid provides exactly this kind of service, the new data in return pushes it much closer towards one of its central objectives: becoming the primary platform for European language technology.

Join the European Language Grid




Lowering Language Barriers – How Coreon uses the ELG to provide access to multilingual resources

One of the main goals of the European Language Grid is to combat the fragmentation of the European Language Technology community. But how exactly can the ELG be used to aid communication across languages? A use case can be found in Coreon’s pilot project “Multilingual Knowledge Systems as Linguistic Linked Open Data”. Michael Wetzel, Managing Director of the Berlin-based company, explains the project in collaboration with the ELG and the act of providing access to multilingual resources. Find out how together, Coreon and the ELG help bridge the gap between languages and lower barriers of communication.

Logo of the company Coreon

Early on in its runtime, the ELG project put out a first open call for pilot projects. The idea was to help fund innovative language technology that would incorporate the ELG platform and be accessible through it. The call was opened with several intentions: to allow the ELG to grow and communicate with its user base and to let applicants have the opportunity to realize creative visions for LT through collaboration with the platform and the project. One of the projects that received funding was initiated by a German company called Coreon.

Who is Coreon?

Coreon is a Berlin-based company that produces software of the same name, a Multilingual Knowledge System. This system allows users to manage, visualise and model data in so-called concept maps that can be arranged and explored in forms such as tree diagrams. Coreon combines this system with terminology management: a browsable example on their website shows the Coreon system being used to model Eurovoc, a multilingual thesaurus run by the European Union. Here, the relations between words and their respective translations into 22 languages are visualised in a tree diagram. Mapping out knowledge and language can also be useful to software developers looking to create translation tools, chatbots or other software.

What is the Pilot Project?

The example mentioned above is accessible directly from a browser. While this works well for the exploration of a thesaurus, things become more complicated when trying to integrate Coreon’s technology into other developers’ projects. So far, this was only possible by exporting the data or via a rich yet proprietary API. The ELG’s open call for pilot projects gave Coreon the opportunity to create a simpler and faster way to access their repositories, with the ELG providing additional funding and support and thus lessening the risk.

According to Michael Wetzel, Managing Director of Coreon, the goal of the pilot project titled Multilingual Knowledge Systems (MKS) as Linguistic Linked Open Data was to create “a way easier and more straight-forward way to put our resources in other software applications”. The main development was the creation of a SPARQL endpoint, which allows for real-time and direct querying of repositories. This SPARQL endpoint can be found on the ELG, enabling users to access the Coreon repositories straight from the ELG. This service was technically not yet foreseen by the ELG, Wetzel explains: “We helped the technical folks in the grid to enhance their technical support for the kind of service which we were developing”. This led to a close cooperation between Coreon and the ELG.
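
As a rough illustration of what querying a SPARQL endpoint involves, the Python sketch below builds a request that follows the standard SPARQL 1.1 Protocol and parses a response in the standard SPARQL JSON results format. The endpoint URL, the use of SKOS labels and the sample response are assumptions made for illustration only; they are not taken from Coreon’s or the ELG’s documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://example.org/sparql"


def build_label_query(term: str, lang: str = "en") -> str:
    """Build a SPARQL query for all labels of concepts labelled `term`.

    Assumes the repository models concepts with SKOS preferred labels.
    """
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        f'  ?concept skos:prefLabel "{escaped}"@{lang} ;\n'
        "           skos:prefLabel ?label .\n"
        "}"
    )


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Prepare (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def labels_by_language(results_json: str) -> dict:
    """Map language tag -> label from a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return {b["label"].get("xml:lang", ""): b["label"]["value"] for b in bindings}


# Made-up sample response in the standard SPARQL JSON results format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["concept", "label"]},
    "results": {"bindings": [
        {"concept": {"type": "uri", "value": "https://example.org/c/1"},
         "label": {"type": "literal", "xml:lang": "en", "value": "fish"}},
        {"concept": {"type": "uri", "value": "https://example.org/c/1"},
         "label": {"type": "literal", "xml:lang": "de", "value": "Fisch"}},
    ]},
})

request = build_request(build_label_query("fish"))
print(request.full_url)
print(labels_by_language(SAMPLE_RESPONSE))
```

Against a real endpoint, the prepared request would be sent with `urllib.request.urlopen(request)` and the response body passed to `labels_by_language`; the sketch stops short of that so it runs without network access.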

How does Coreon work?

As the ELG itself aims to bridge the gap between multiple languages and make Language Technology more accessible, Coreon’s pilot project was of particular interest to the ELG: it combines Knowledge Graphs as a way to model data with terminology management, which can incorporate many languages. But what do these Knowledge Graphs look like?

Knowledge Graphs, or in this case multilingual concept maps, are systems that link concepts with their subcategories in tree diagrams. For example, typing the word fish into the search bar of their Eurovoc visualisation results in this graph.

With one look at the resulting tree diagram, it is clear that the concept fish is a subcategory of a variety of other concepts and is itself divided into sea fish and freshwater fish. The sidebar notably lists a number of translations for fish and some of its related concepts. Linking knowledge and language in this way can be useful to software developers, e.g. for training a chatbot or in other NLP applications. With the successful implementation of the collaborative pilot project between Coreon and the ELG, this is now possible in a far more convenient way.
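
To make the idea concrete, a concept map of this kind can be sketched in Python as a small tree in which each concept carries its labels in several languages and links to its narrower concepts. The class and field names below are hypothetical, and the German and French labels are supplied here for illustration rather than copied from Eurovoc or Coreon.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a multilingual concept map (hypothetical structure)."""
    labels: dict                                   # language code -> term
    narrower: list = field(default_factory=list)   # more specific concepts

    def find(self, term, lang="en"):
        """Depth-first search for the concept labelled `term` in `lang`."""
        if self.labels.get(lang) == term:
            return self
        for child in self.narrower:
            hit = child.find(term, lang)
            if hit is not None:
                return hit
        return None


# Illustrative data: "fish" with its two narrower concepts from the example.
fish = Concept(
    labels={"en": "fish", "de": "Fisch", "fr": "poisson"},
    narrower=[
        Concept(labels={"en": "sea fish", "de": "Seefisch",
                        "fr": "poisson de mer"}),
        Concept(labels={"en": "freshwater fish", "de": "Süßwasserfisch",
                        "fr": "poisson d'eau douce"}),
    ],
)

# Look a concept up by its German label and read off the English one.
match = fish.find("Seefisch", lang="de")
print(match.labels["en"])  # sea fish
```

Because the labels are attached to the concept rather than to each other, a term found in one language leads directly to its equivalents in every other language, which is the property that makes such structures useful for translation tools or chatbots.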

Summary

Because the pilot project was completed only a few months ago, its impact is still hard to predict, but Michael Wetzel is optimistic: “We are seeing that other software companies do start using these endpoints that we’ve developed. If you ask me in a year or two from now, I think we will see quite some integrations based on that technology.” His hope is that the pilot project will have a more unifying effect on the European LT community because multilingual knowledge systems are now more easily available to software developers for the creation of translation tools, chatbots and other software.

In a more general sense, this is quite similar to the goals of the ELG itself. Europe is wonderfully diverse, but the diversity also causes fragmentation, particularly in the LT community. To overcome this challenge, the ELG aims to become the central hub for European Language Technology. This would help bridge the gap between LT developers of different languages, as aspects like communication, collaboration and the availability of multilingual products like Coreon’s would be strengthened.

Wetzel is hopeful in this aspect. “The technological fragmentation, we can overcome. In Europe, we will continue to have hundreds of language or software companies, focusing on language technologies, but let’s help them so that they can more easily connect with and complement each other’s services. I think this is what the ELG really is good for.”


How to use the ELG: Video tutorial for the European Language Grid

The ELG Video Tutorial walks you through the basics of the European Language Grid: how to browse it, register, become a provider and upload, store and share resources like Language Technology tools and corpora. It also touches upon the ELG Python SDK, which is explained in detail in the ELG Documentation. For a short introduction to the functionalities and many advantages of the European Language Grid, have a look!


How does the European Language Grid strengthen linguistic diversity?

Happy faces and the ELG logo

Europe consists of more than 40 different countries and even more cultures. Everyone brings something unique to the table, languages being one of the more obvious aspects. Although it is possible to encounter five different languages within a fifteen-minute train ride, this diversity is less well represented in the digital world and especially in language technology. As was shown in the META-NET White Paper Series in 2012, tools like machine translation, text-to-speech applications and text summarisation work predominantly in English, with languages like German, French and Spanish following closely behind. Languages with weaker support include Icelandic, Latvian, Welsh and Irish.

In order to preserve and strengthen Europe’s unique linguistic diversity, languages that are less widespread need to be equally supported and represented. Welsh serves as a fitting example here: although the overall use of the language was declining, the last few decades have been marked by revitalisation efforts – governmental, scientific and social – that work towards making bilingualism more common in Wales. One of the key aspects of this is strengthening bilingual communication and representation online.

For many, English is the go-to language of the internet. Not only is it used in communication; a lot of websites also default to English even though versions in other languages are available. Looking at the big picture, this risks smaller languages falling by the wayside. On an individual level, there is another reason for this to be an issue: not everyone speaks English, and for some of those who do, it can be a chore to get through a paragraph they would much more comfortably read in their own language. Returning to Welsh, there is a tool that provides a start in overcoming this issue: the Welshify Widget. The plugin lets users know when a Welsh version of a website is available and guides them through the process of changing their browser settings to set Welsh as their preferred language.

By highlighting Welsh versions of websites, the widget fosters an online environment that is more inclusive towards Welsh native speakers. There are a variety of digital language tools that have similar effects for a wide range of European languages, by making smaller languages available in the digital world and supporting their usage. Each one of them contributes towards strengthening linguistic diversity and equality among European languages.

In an effort to reach those goals, it is necessary to know where each European language has gaps in digital support. The European Language Equality (ELE) project examines 70+ European languages individually, analysing where sufficient support exists and where more is needed. The results of this research will be presented in a strategic agenda and roadmap, detailing what needs to be done to reach digital language equality by 2030.

In order to make that equality a reality, language resources need to reach their intended user base. Potential consumers need to know what is available. The European Language Grid (ELG) aims to facilitate this, among other things. The ELG is a platform that hosts European language technologies with the goal of becoming their primary hub. Companies and research facilities can upload and link their projects on ELG. Having one centralised hub like the ELG will enable developers to get the word out about their products, while users have an easier time finding and downloading the type of tool they want.

ELG also allows developers to test their tools or services, which in turn makes them easier and faster to finalize. This is also aided by the communication that is made possible through the ELG. Language technology developers are able to learn from and collaborate with each other, which, among other things, opens the door to potential translations of existing tools into other European languages. Faster development of tools and communication within the language technology community will quickly create more available technologies and resources. The heightened number and visibility of these resources will not only boost individual languages – in doing so, the linguistic diversity that already exists in Europe will be strengthened as well.

Tools like the Welshify Widget make the online experience more inclusive for non-English speakers and help revitalize the language of a European culture. The ELG as the main hub for European language technology aims to provide the platform for projects like these to reach their full potential and work towards digital language equality.


Survey for LT developers and users: Shape the future of European Language Technology

Despite the recognizable advantages and historical and cultural worth of multilingualism, the many European languages face a striking imbalance in terms of their preservation in the digital world and their support by language technology. The European Language Equality project (ELE) addresses this risk to European identity in the digital age by preparing a Strategic Research and Innovation Agenda and Roadmap working towards digital language equality by 2030. The European Language Grid (ELG) is closely related to this project, offering LT developers, researchers and providers an inclusive platform to present, share and market their language technologies and connect within the European LT community.

As part of the two projects that are funded by the European Commission and address an appeal by the European Parliament resolution titled “Language equality in the digital age”, we are reaching out both to LT developers and LT users to participate in a large-scale, EU-wide consultation that will impact and shape the future of language technologies in the multilingual continent. The two surveys are aimed on the one hand at academic and commercial developers in the field of Language Technology (LT), Natural Language Processing (NLP) and Language-centric Artificial Intelligence (AI) and on the other at all Language Technology users and consumers.

The questionnaire takes approximately 20 minutes to fill in; your answers will help evaluate the level of LT support for European languages, indicate the challenges and highlight the needs and expectations of professionals and users in the future. Your contributions will be carefully taken into account when preparing the ELE strategic agenda and roadmap.

The European Language Equality project is a pan-European effort that will significantly impact the field and funding situation of LT in Europe for the next 10 to 15 years. Help us shape the future of multilingualism in the digital age – join in!


  • Survey for Language Technology developers

  • Survey for Language Technology users and consumers


Anonymisation Training Day for Young Researchers

    On 25 May 2021, the H2020 project LT-Bridge will organise a training day focusing on anonymisation, which is especially targeted at young researchers in the areas of computational linguistics, computer science and language technology. Supported by the Latvian Language Technology company Tilde, the free online event will take place from 11:00 to 15:30 CEST via Zoom.

    The training day will give early-stage researchers the opportunity to get together and explore the possibilities of anonymisation in the context of language data, in addition to connecting and extending their networks. Besides a general overview of anonymisation, including different strategies, levels and challenges with regard to language data, the training day will also focus more specifically on:

    – Anonymisation for monolingual text analysis
    – Named Entity Recognition for anonymisation
    – Parallel text anonymisation for machine translation
    – Anonymisation of speech data
    – Legal aspects of anonymisation
    – Available tools and applications

    To join the event, register here

    About LT-Bridge

    The LT-Bridge project is coordinated by the University of Malta’s AI Department and the Institute of Computational Linguistics. It aims to strengthen both the research and innovation capacities at the University of Malta and to create a European-level Centre of Excellence, which is capable of narrowing or even closing the technology gap for Maltese. The Horizon 2020 project started on 1 January 2021 and is based on a collaboration between the University of Malta, the German Research Center for Artificial Intelligence (DFKI), the ADAPT Centre of Dublin City University (DCU), and the IDIAP Research Institute (IDIAP) in Switzerland.


    CEF Market Study Report published


    Cover page of the market study
    Market study PDF download

    diagram of LT technology use, headline: Are you interested in or already using
    Status of use of LT in Europe
    In October 2019, the “Final study report on CEF Automated Translation value proposition in the context of the European LT market/ecosystem” was published. The study was commissioned by the European Commission and carried out by Luc Meertens (CROSSLANG), Stefania Aguzzi (IDC) as well as ELG consortium members Khalid Choukri (ELDA) and Andrejs Vasiljevs (Tilde).
    This study provides an analysis of the Language Technology market in the EU (taking into account supply and demand), of LT adoption by public services in the EU, and of the EU’s competitiveness with respect to the US and Asia in three LT areas. The analyses show that suppliers are often SMEs with local solutions and that public services have a strong interest in translation technology.
    However, the worldwide LT market, which is dominated by large players, has deficiencies regarding under-resourced languages, customisation needs, and security and privacy requirements. Based on the results of the analyses, the study develops a business model for CEF AT by defining the latter’s value proposition in the context of the market.
    Title: Report on CEF Automated Translation value proposition in the context of the European LT market/ecosystem
    Project Number: 2019.1438
    Linguistic version: EN PDF
    Media/Volume: PDF/Volume_01
    Catalogue number: KK-03-19-154-EN-N
    ISBN: 978-92-76-00783-8
    DOI: 10.2759/142151