ELG Tutorial 2020: How to ingest your Language Technology service or data set into the European Language Grid
Co-located with LREC 2020
Just like LREC 2020, the tutorial was cancelled due to the Covid-19 pandemic.
The tutorial is relevant for participants from industry or research/academia who develop Language Technology services in the form of application-ready prototypes (for written as well as spoken language). Accordingly, general familiarity with NLP, NLG or speech applications, first experience with containerisation-related technology (Docker, Kubernetes, Helm etc.) and general programming and software development experience are helpful. Participants are requested to bring their own laptops, ideally with Docker and Python installed. Participants may also, but need not, bring their own LT services, which can then be made available through the ELG platform.
This tutorial is organised under the umbrella of the EU project European Language Grid (ELG; 2019-2022). ELG develops a scalable cloud platform that provides easy-to-integrate access to hundreds of commercial and non-commercial Language Technologies (LTs) for all European languages, including running tools and services as well as datasets and resources. A first prototype version of the ELG was demonstrated at META-FORUM 2019 (Oct. 2019). Currently, about 100 functional services are up and running in the ELG; in addition, more than 200 LT resources in over 50 languages are available. By May 2020, more services, datasets and features such as user management as well as a more detailed catalogue will be available.
The ELG aims to list all relevant stakeholders, from technology developers to research centres, from SMEs to large enterprises. All stakeholders can register themselves in the ELG, the “Yellow Pages” of European Language Technology. Through the ELG, companies, academic organisations and individual researchers can gain visibility, provide services, datasets and resources, and use the services, datasets and resources that others have made available through the platform. The online catalogue is browsable and searchable: users can filter and search by domain, sector, region, country, language, service type, dataset and more.
The ELG is the first large-scale LT platform aiming at production use. It is based on containerisation technology (i.e., Docker) and the Kubernetes container orchestration platform. The ELG enables the commercial and non-commercial European LT community to upload their technologies and datasets to the platform and to deploy them through the grid. This tutorial explains how to integrate an application (i.e., an LT service) into the ELG platform and how to set up a profile page. For more details: https://www.european-language-grid.eu
Motivation and Topics of Interest
Language barriers affect cross-lingual communication and the free flow of knowledge and thought. Monolingual and multilingual Language Technologies can help overcome these omnipresent language barriers and significantly improve trade, administration, politics, culture, intercultural communication and understanding. The vision of technology-enabled multilingualism applies to the whole planet as well as to individual multilingual regions; ELG’s focus is Europe with its 24 official EU Member State languages, 60 additional unofficial or minority languages as well as languages of immigrants and trade partners.
To date, a general LT platform that attempts to bring together all offerings (in terms of functional processing services and datasets) does not exist. Instead, there are multiple activities that typically focus on a concrete domain. Language resources and datasets are distributed over different heterogeneous repositories and organisations. A similar situation exists for basic and more sophisticated processing and generation tools, including ASR and TTS, tokenisers, NER, morphological analysers, lexical resources, ontologies, sentence and word aligners, chunkers, parsers, generators, translation systems, IR and IE systems, QA systems, dialogue systems, sentiment and emotion analysers, multi-modal systems processing speech, text, images and video, etc. Hundreds, if not thousands, of functional services exist but are extremely difficult to find, which is not only highly inefficient but also frustrating for both the user and the provider of the respective service. This holds for the commercial as well as the academic area.
Most European companies that develop Language Technology are SMEs that operate in highly focused domains and/or geographic niches. ELG brings together the expertise of nine project partners from academia and industry. ELG is a project organised by the European LT community for the European LT community.
(times may vary)
09:00 Part 1 – Overview and Introduction
- Short overview: ELG – introduction, overview, history, concept
- Exploring the ELG catalogue (hands-on)
09:40 Part 2 – How to use the European Language Grid
- ELG as the “yellow pages” of the European LT community
- ELG’s metadata schema
- Using ELG’s metadata editor to create and register entities, e.g., datasets, lexical/conceptual resources, LT services, organisations, SMEs
10:30 Coffee Break
11:00 Part 3 – How to integrate an LT service/application into the ELG platform
- The ELG LT Service integration API
- Integration of a simple example service (hands-on)
- Custom helper libraries/tools provided by ELG that facilitate the process of creating an ELG-compatible LT service (e.g., a Spring Boot-based library for Java)
12:00 Part 4 – Install the ELG tooling locally for testing purposes
- ELG platform architecture
- Local installation for testing purposes
13:00 End of Tutorial