Project abbreviation: SignON
Project name: Sign Language Translation Mobile Application and Open Communications Framework
Project coordinators: Prof. Andy Way (Dublin City University, Ireland) and Dr. Dimitar Shterionov (Tilburg University, The Netherlands)
Project consortium: 17 European partners, including:
- Dublin City University
- Fincons Group
- Instituut voor de Nederlandse taal
- The University of the Basque Country
- Universitat Pompeu Fabra Barcelona
- Trinity College Dublin
- University College Dublin
- Vlaamse Gebarentaalcentrum
- Universiteit Gent
- KU Leuven
- Radboud Universiteit
- Tilburg University
- TU Dublin
Funding: This project has received funding from the European Union’s Horizon 2020 Research and
Innovation Programme under Grant Agreement No. 101017255
Project duration: 36 months
Keywords: Sign language translation, Sign to spoken language translation, Machine translation, Deep learning, Avatar, Application, Distributed software, Multilingual models, Interlingua
Background of the research topic: Access to information is a human right. In the modern, globalised world this implies access to multilingual content and cross-lingual communication with others. The World Health Organisation (WHO) reports that there are some 466 million people in the world today with (some degree of) hearing loss; it is estimated that this number will double by 2050. According to the World Federation of the Deaf (WFD), over 70 million people are deaf and communicate primarily via a sign language (SL).
Machine translation (MT) is a core technique for reducing language barriers. Since its beginnings in the 1950s it has advanced through many breakthroughs to reach, for some language pairs and domains, quality approaching that of human translators. Despite the significant advances of MT for spoken languages in recent decades, MT is still in its infancy when it comes to SLs. Current, as well as past, state-of-the-art MT methods and models rely on the alignment of words (more generally, tokens) and sentences between two languages. Sign languages, however, are visual languages: SL data is typically recorded as video, sometimes supported by textual annotations. The (machine) translation process from sign to spoken language is therefore broken down into (i) recognising sign language videos and converting them into an intermediate format and (ii) training a model to translate between this intermediate representation and text. In the reverse direction, from spoken to sign language, a common approach is to use 3D animated characters that perform the appropriate signs; here, too, the spoken language is first translated into an intermediate representation.
The complexity of the problem of automatically translating between SLs, or between an SL and a spoken language, calls for a multi-disciplinary approach: image and video processing, machine translation, natural language processing and understanding, 3D animation, and so on. Developing such a service so that it can easily be delivered to users via an application is an even greater challenge, one which requires a distributed, highly efficient software architecture.
Goal of the project: SignON aims to reduce the communication gap between the deaf, hard of hearing and hearing communities. We bring together the experience and know-how of the deaf and hard of hearing communities with multidisciplinary academic and industry expertise. This team of experts from different backgrounds will develop the SignON communication service, which focuses on translation between sign and spoken languages.
Project abstract: Any communication barrier is detrimental to society. In order to reduce such barriers between the deaf and hard-of-hearing (DHH) community and the hearing community, the SignON project is researching the use of machine translation to translate between sign and spoken languages. SignON is a Horizon 2020 project funded by the European Commission; it commenced on 01.01.2021 and runs until 31.12.2023.
Within the SignON project, we are developing a free and open-source framework for translation between sign language video input, speech input or text input and sign language avatar output, speech output or text output. The framework comprises three components: (1) input recognition components, (2) a common representation and translation component, and (3) output generation components.
1. The input can be a video containing a message in a sign language, in which case the meaning of the message in that specific sign language (Irish, British, Dutch, Flemish or Spanish Sign Language) needs to be recognised. Another input modality can be speech or text (English, Irish, Dutch or Spanish).
2. We will use a common representation to map video, audio and text into a unified space that will be used for translation into the target modality and language. This representation will serve as the input for the output generation component.
3. The output generation component is concerned with delivering the output message to the user. In the simplest case the output is plain text; it could also be speech, in which case a commercial text-to-speech (TTS) system will be used; or the requested output may need to be signed in one of the supported sign languages. In that case, the message will first be translated into a computational, formal representation of that specific sign language (Sign_A), which will then be converted into a series of Behavior Markup Language (BML) commands to steer the animation and rendering of a virtual signer (an avatar).
SignON will incorporate machine learning capabilities that will allow (i) learning new sign, written and spoken languages; (ii) style, domain and user adaptation; and (iii) automatic error correction based on user feedback.
The SignON framework will be deployed in the cloud, where the computationally intensive tasks will be executed. A lightweight mobile app will interface with the SignON framework to allow communication between signers and non-signers through common mobile devices. During the development of the SignON application, collaboration with end-user focus groups (consisting of deaf, hard of hearing and hearing people) and an iterative approach to development will ensure that the application and the service meet the expectations of the end users.
Project publications:
- Santiago Egea Gómez, Euan McGill and Horacio Saggion. “Syntax-aware Transformers for Neural Machine Translation: The Case of Text to Sign Gloss Translation”. Proceedings of the 14th Workshop on Building and Using Comparable Corpora (RANLP 2021). 2021.
- Mathieu De Coster, Mieke Van Herreweghe, and Joni Dambre. “Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
- Horacio Saggion, Dimitar Shterionov, Gorka Labaka, Tim Van de Cruys, Vincent Vandeghinste, and Josep Blat. “SignON: Bridging the gap between Sign and Spoken Languages.” XXXVII Spanish Society for Natural Language Processing conference (SEPLN2021). 2021.
- Dimitar Shterionov, Vincent Vandeghinste, Horacio Saggion, Josep Blat, Mathieu De Coster, Joni Dambre, Henk van den Heuvel, Irene Murtagh, Lorraine Leeson, Ineke Schuurman. “The SignON project: a Sign Language Translation Framework”. 31st Meeting of Computational Linguistics in the Netherlands (CLIN31). 2021.
- Dimitar Shterionov, John J O’Flaherty, Edward Keane, Connor O’Reilly, Marcello Paolo Scipioni, Marco Giovanelli and Matteo Villa. “Early-stage development of the SignON application and open framework – challenges and opportunities”. Proceedings of Machine Translation Summit XVIII: Users and Providers Track. 2021.
- Mathieu De Coster, Karel D’Oosterlinck, Marija Pizurica, Paloma Rabaey, Severine Verlinden, Mieke Van Herreweghe and Joni Dambre. “Frozen Pretrained Transformers for Neural Sign Language Translation”. Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL). 2021.
- Mirella De Sisto, Dimitar Shterionov, Irene Murtagh, Myriam Vermeerbergen and Lorraine Leeson. “Defining meaningful units. Challenges in sign segmentation and segment-meaning mapping”. Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL). 2021.
- F. Fowley and A. Ventresque. “Sign Language Fingerspelling Recognition using Synthetic Data”. AICS. 2021.
- Mathieu De Coster and Joni Dambre. “Leveraging Frozen Pretrained Written Language Models for Neural Sign Language Translation”. Information, vol. 13, no. 5. 2022.
- Vincent Vandeghinste, Bob Van Dyck, Mathieu De Coster and Maud Goddefroy. “A Corpus for Training Sign Language Recognition and Translation: The Belgian Federal COVID-19 Video Corpus”. Accepted at the 32nd Meeting of Computational Linguistics in The Netherlands (CLIN). 2022.