Towards the Primary Platform for
Language Technologies in Europe

Lowering Language Barriers – How Coreon uses the ELG to provide access to multilingual resources

One of the main goals of the European Language Grid is to combat the fragmentation of the European Language Technology community. But how exactly can the ELG be used to aid communication across languages? A use case can be found in Coreon’s pilot project “Multilingual Knowledge Systems as Linguistic Linked Open Data”. Michael Wetzel, Managing Director of the Berlin-based company, explains the project, which was carried out in collaboration with the ELG, and how it provides access to multilingual resources. Find out how Coreon and the ELG together help bridge the gap between languages and lower barriers of communication.


Early on in its runtime, the ELG project put out a first open call for pilot projects. The idea was to help fund innovative language technology that would incorporate the ELG platform and be accessible through it. The call was opened with several intentions: to allow the ELG to grow and communicate with its user base and to let applicants have the opportunity to realize creative visions for LT through collaboration with the platform and the project. One of the projects that received funding was initiated by a German company called Coreon.

Who is Coreon?

Coreon is a Berlin-based company that produces software of the same name, a Multilingual Knowledge System. This system allows users to manage, visualize and model data in so-called concept maps that can be arranged and explored in forms such as tree diagrams. Coreon combines this system with terminology management: a browsable example on their website shows the Coreon system being used to model Eurovoc, a multilingual thesaurus run by the European Union. Here, the relations between words and their respective translations into 22 languages are visualised in a tree diagram. Mapping out knowledge and language can also be useful to software developers looking to create translation tools, chatbots or other software.

What is the Pilot Project?

The example mentioned above is accessible directly from a browser. While this works well for the exploration of a thesaurus, things become more complicated when trying to integrate Coreon’s technology into other developers’ projects. Until now, this had only been possible by exporting the data or via a rich yet proprietary API. The ELG’s open call for pilot projects gave Coreon the opportunity to create a simpler and faster way to access their repositories, with the ELG providing additional funding and support and thus lessening the risk.

According to Michael Wetzel, Managing Director of Coreon, the goal of the pilot project titled “Multilingual Knowledge Systems (MKS) as Linguistic Linked Open Data” was to create “a way easier and more straight-forward way to put our resources in other software applications”. The main development was the creation of a SPARQL endpoint, which allows for real-time and direct querying of repositories. This SPARQL endpoint can be found on the ELG, enabling users to access the Coreon repositories straight from the ELG. The ELG had not originally foreseen this type of service, Wetzel explains: “We helped the technical folks in the grid to enhance their technical support for the kind of service which we were developing”. This led to a close cooperation between Coreon and the ELG.
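To give a rough idea of what querying a SPARQL endpoint involves, here is a minimal Python sketch that builds a query and the matching HTTP request following the standard SPARQL 1.1 Protocol. It rests on several assumptions: the endpoint URL is a hypothetical placeholder (the real address is listed on the ELG), and the query assumes that the repository exposes its concepts with SKOS labels, which the article does not confirm. The function names are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; the real endpoint address is listed on the ELG.
ENDPOINT = "https://example.org/coreon/sparql"


def build_label_query(term: str, lang: str = "en") -> str:
    """Return a SPARQL query listing all labels of concepts labelled `term`."""
    # Escape backslashes and quotes so the term is a valid SPARQL string literal.
    safe = term.replace("\\", "\\\\").replace('"', '\\"')
    # Assumption: concepts carry SKOS preferred labels with language tags.
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        f'  ?concept skos:prefLabel "{safe}"@{lang} .\n'
        "  ?concept skos:prefLabel ?label .\n"
        "}"
    )


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, build_label_query("fish"))
```

Passing `request` to `urllib.request.urlopen` would send the query; a standards-compliant endpoint would then be expected to answer with a JSON result set containing one row per concept and label.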

How does Coreon work?

As the ELG itself aims to bridge the gap between multiple languages and make Language Technology more accessible, Coreon’s pilot project was of particular interest to the ELG: it combines Knowledge Graphs, as a way to model data, with terminology management, which can incorporate many languages. But what do these Knowledge Graphs look like?

Knowledge Graphs, or in this case multilingual concept maps, are systems that link concepts with their subcategories in tree diagrams. For example, typing the word fish into the search bar of Coreon’s Eurovoc visualisation produces a tree diagram for that concept.

With one look at the resulting tree diagram, it is clear that the concept fish is a subcategory of a variety of other concepts and is itself divided into sea fish and freshwater fish. The sidebar notably lists a number of translations for fish and some of its related concepts. Linking knowledge and language in this way can be useful to software developers, e.g. for training a chatbot or in other NLP applications. With the successful implementation of the collaborative pilot project between Coreon and the ELG, this is now possible in a far more convenient way.
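The structure described above can be sketched as a small data model in Python. This is only an illustration of the idea of a multilingual concept map, not Coreon's actual data model: the class and function names are made up for this example, and the labels are illustrative translations rather than entries copied from Eurovoc.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent concept with one label per language code."""
    labels: dict[str, str]
    narrower: list["Concept"] = field(default_factory=list)


def render(concept: Concept, lang: str, depth: int = 0) -> list[str]:
    """Return the concept's subtree as indented lines in the given language."""
    # Fall back to the English label when none exists for `lang`.
    label = concept.labels.get(lang, concept.labels["en"])
    lines = ["  " * depth + label]
    for child in concept.narrower:
        lines.extend(render(child, lang, depth + 1))
    return lines


# Illustrative labels only, not copied from Eurovoc.
fish = Concept(
    {"en": "fish", "de": "Fisch", "fr": "poisson"},
    narrower=[
        Concept({"en": "sea fish", "de": "Seefisch", "fr": "poisson de mer"}),
        Concept({"en": "freshwater fish", "de": "Süßwasserfisch"}),
    ],
)

print("\n".join(render(fish, "de")))
```

Because the hierarchy is attached to concepts rather than to words, the same tree can be rendered in any language for which labels exist, which is what makes such a structure useful for translation tools and chatbots.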

Summary

Because the pilot project was completed only a few months ago, its impact is still hard to predict, but Michael Wetzel is optimistic: “We are seeing that other software companies do start using these endpoints that we’ve developed. If you ask me in a year or two from now, I think we will see quite some integrations based on that technology.” His hope is that the pilot project will have a more unifying effect on the European LT community because multilingual knowledge systems are now more easily available to software developers for the creation of translation tools, chatbots and other software.

In a more general sense, this is quite similar to the goals of the ELG itself. Europe is wonderfully diverse, but the diversity also causes fragmentation, particularly in the LT community. To overcome this challenge, the ELG aims to become the central hub for European Language Technology. This would help bridge the gap between LT developers of different languages, as aspects like communication, collaboration and the availability of multilingual products like Coreon’s would be strengthened.

Wetzel is hopeful in this aspect. “The technological fragmentation, we can overcome. In Europe, we will continue to have hundreds of language or software companies, focusing on language technologies, but let’s help them so that they can more easily connect with and complement each other’s services. I think this is what the ELG really is good for.”


How does the European Language Grid strengthen linguistic diversity?


Europe consists of more than 40 different countries and even more cultures. Everyone brings something unique to the table, languages being one of the more obvious aspects. Although it is possible to encounter five different languages within a fifteen-minute train ride, this diversity is less well represented in the digital world, especially in language technology. As was shown in the META-NET White Paper Series in 2012, tools like machine translation, text-to-speech applications and text summarisation work predominantly in English, with languages like German, French and Spanish following closely behind. Languages with weaker support include Icelandic, Latvian, Welsh and Irish.

In order to preserve and strengthen Europe’s unique linguistic diversity, languages that are less widespread need to be equally supported and represented. Welsh serves as a fitting example here: although the overall use of the language was declining, the last few decades have been marked by revitalisation efforts (governmental, scientific and social) that work towards making bilingualism more common in Wales. One of the key aspects of this is strengthening bilingual communication and representation online.

For many, English is the go-to language of the internet. Not only is it used in communication; a lot of websites also default to English even though versions in other languages are available. Looking at the big picture, this risks smaller languages falling by the wayside. On an individual level, there is another reason for this to be an issue: not everyone speaks English, and for some of those who do, it can be a chore to get through a paragraph they would much more comfortably read in their own language. Returning to Welsh, there is a tool that provides a start in overcoming this issue: the Welshify Widget. The plugin lets users know when a Welsh version of a website is available and guides them through the process of changing their browser settings to set Welsh as their preferred language.
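The decision such a tool has to make can be illustrated with a short Python sketch. This is not the Welshify Widget's actual implementation, which the article does not describe. It assumes only that the browser's language preferences arrive as a standard HTTP `Accept-Language` header and that the site knows which language versions it offers; both function names are hypothetical.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # A tag without an explicit q-value has weight 1.
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # Treat a malformed weight as "not wanted".
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal weights keep their order from the header.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def should_suggest_welsh(header: str, available: set[str]) -> bool:
    """True if a Welsh version exists but Welsh is not the top preference."""
    prefs = parse_accept_language(header)
    top = prefs[0][0] if prefs else ""
    # "cy" is the ISO 639-1 code for Welsh; "cy-gb" is a regional variant.
    prefers_welsh = top == "cy" or top.startswith("cy-")
    return "cy" in available and not prefers_welsh


print(should_suggest_welsh("en-GB,en;q=0.9", {"en", "cy"}))
```

In this sketch, a visitor whose browser lists English first would be told about the Welsh version, while a visitor who already prefers Welsh would not be prompted.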

By highlighting Welsh versions of websites, the widget fosters an online environment that is more inclusive towards Welsh native speakers. There are a variety of digital language tools that have similar effects for a wide range of European languages, by making smaller languages available in the digital world and supporting their usage. Each one of them contributes towards strengthening linguistic diversity and equality among European languages.

In an effort to reach those goals, it is necessary to know where each European language has gaps in digital support. The European Language Equality (ELE) project examines 70+ European languages individually, analysing where sufficient support exists and where more is needed. The results of this research will be presented in a strategic agenda and roadmap, detailing what needs to be done to reach digital language equality by 2030.

In order to make that equality a reality, language resources need to reach their intended user base. Potential consumers need to know what is available. The European Language Grid (ELG) aims to facilitate this, among other things. The ELG is a platform that hosts European language technologies with the goal of becoming their primary hub. Companies and research facilities can upload and link their projects on the ELG. Having one centralised hub like the ELG will enable developers to get the word out about their products, while users will have an easier time finding and downloading the type of tool they want.

The ELG also allows developers to test their tools or services, which in turn makes them easier and faster to finalize. This is also aided by the communication that is made possible through the ELG. Language technology developers are able to learn from and collaborate with each other, which, among other things, opens the door to potential translations of existing tools into other European languages. Faster development of tools and better communication within the language technology community will quickly make more technologies and resources available. The greater number and visibility of these resources will not only boost individual languages but also strengthen the linguistic diversity that already exists in Europe.

Tools like the Welshify Widget make the online experience more inclusive for non-English speakers and help revitalize the language of a European culture. The ELG as the main hub for European language technology aims to provide the platform for projects like these to reach their full potential and work towards digital language equality.