Towards the Primary Platform for
Language Technologies in Europe

National Competence Centre Spain

The Languages of Spain

Spanish is the second most spoken language in the world. It has around 480 million native speakers and 75 million speakers who learned it as a second language. Although Spanish developed in Spain from Vulgar Latin, most native speakers live in Latin America. Besides Spanish, the official language, Basque, Catalan and Galician are co-official languages in some regions of Spain. Basque is spoken in a small region of the Western Pyrenees. With only approximately 750,000 speakers, it is officially classified as a “vulnerable” language in the UNESCO Atlas of the World’s Languages in Danger. Catalan is spoken by approximately 10 million people in Catalonia, the Balearic Islands and Valencia. It is also the only official language of Andorra. Galician is spoken by approximately 2.4 million native speakers in Galicia, in small communities in other regions of Spain, in some European countries and in the Americas.

  • Spanish is a Romance language of the Ibero-Romance branch.
  • Spanish varies considerably between its European and Latin American varieties.
  • Features of Spanish:
    • Inflection of nouns, adjectives and determiners is limited, but verb conjugation produces more than 50 word forms per verb.
    • Spanish is an SVO language, but the word order sometimes deviates from this default.
    • Direct and indirect object pronouns are used frequently, although they are redundant in many situations.
  • Basque is the only pre-Indo-European language still spoken in Western Europe. Its only known relative is the extinct language Aquitanian.
  • Although Basque is spoken by a small community, it has many dialectal variations. The six commonly accepted dialects differ at the lexical, phonetic, morphological and prosodic levels.
  • Features of Basque:
    • Basque is an agglutinating language. Grammatical and lexical morphemes, each with a single meaning, are attached to one another to form a long prosodic unit consisting of one root and several affixes, each with a clearly identifiable function.
    • Moreover, it is an ergative-absolutive language. The ergative and absolutive cases mark the subject and the direct object: in sentences with intransitive verbs, the absolutive marks the subject, whereas in sentences with transitive verbs the absolutive marks the direct object and the ergative marks the subject.
  • Catalan belongs to the Romance language family. Closely related languages include Italian and French.
  • The regional varieties are grouped into five main dialects, which differ in vowel pronunciation, function words and some vocabulary.
  • Features of Catalan:
    • Catalan is an SVO language, but clitic elements sometimes change the word order.
    • It is also a pro-drop language. The subject pronoun can usually be omitted because the verb form carries the information about the subject.
    • The verb and the auxiliary verb cannot be separated; they must be adjacent in every sentence.
  • Galician belongs to the West Iberian branch of the Romance language family. It is closely related to Portuguese.
  • The dialects fall into three main groups: Eastern, Central and Western Galician. They differ mostly at the phonological and morphological levels.
  • Features of Galician:
    • Syllable stress is a distinctive feature of words.
    • It is also an SVO language with clitic elements that can change the sentence structure.
    • The passive voice is rarely used in everyday speech. Instead, speakers use an inverted word order, the active form with a third-person reflexive pronoun, or an impersonal construction without a subject, consisting of a verb inflected in the third person singular and the pronoun “se”.
Wikipedia contributors. (2020, July 6). Indo-European languages. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 6, 2020, from https://en.wikipedia.org/wiki/Indo-European_languages
Wikipedia contributors. (2020, July 6). Spanish language. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 6, 2020, from https://en.wikipedia.org/wiki/Spanish_language
Wikipedia contributors. (2020, July 5). Basque language. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 6, 2020, from https://en.wikipedia.org/wiki/Basque_language
Wikipedia contributors. (2020, July 5). Catalan language. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 6, 2020, from https://en.wikipedia.org/wiki/Catalan_language
Wikipedia contributors. (2020, June 24). Galician language. In Wikipedia, The Free Encyclopedia. Retrieved 10:00, July 7, 2020, from https://en.wikipedia.org/wiki/Galician_language

NCC Lead Spain

Dr. Marta Villegas has been working for more than 25 years as a researcher in the field of Natural Language Processing, first at the Fundació Bosch i Gimpera (University of Barcelona), later at the Universitat Pompeu Fabra and the Universitat Autònoma de Barcelona and, more recently, at the Centro Nacional de Investigaciones Oncológicas (CNIO) and the Barcelona Supercomputing Center. She has been involved in more than 15 EU projects, such as OpenMinTeD, CLARIN, DASISH, META-NET and Panacea, among many others.

Currently, she co-leads the Text Mining Unit at the Barcelona Supercomputing Center, which is involved in a national initiative led by the Secretary of Digitalisation and Artificial Intelligence to promote the use of AI and language technologies in Spain within the framework of Plan-TL.

She also leads the BSC participation in IctusNet (an Interreg Sudoe programme project); in the recently approved EU projects IntelCOMP (EU project 101004870) and ELE (European Language Equality); in the initiative promoted by the Catalan Government for the development of resource infrastructures for the Catalan language; and in the collaborative project with IBM.

Current National Initiatives

  • The Plan for the Promotion of Language Technologies was approved in 2015 to promote the development of NLP, machine translation and conversational systems in Spanish and the co-official languages in areas such as health, justice and technology watch. It has focused on producing resources and basic tools for Spanish and the other languages of Spain.

META-NET White Paper on Spanish, Basque, Catalan and Galician

Maite Melero, Toni Badia, and Asunción Moreno. La lengua española en la era digital – The Spanish Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).

Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Inmaculada Hernáez, Eva Navas, Igor Odriozola, Kepa Sarasola, Arantza Diaz de Ilarraza, Igor Leturia, Araceli Diaz de Lezana, Beñat Oihartzabal, and Jasone Salaberria. Euskara Aro Digitalean – The Basque Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).

Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Asunción Moreno, Núria Bel, Eva Revilla, Emília Garcia, and Sisco Vallverdú. La llengua catalana a l’era digital – The Catalan Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).

Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Carmen García-Mateo and Montserrat Arza Rodríguez. O idioma galego na era dixital – The Galician Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).

Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Availability of Tools and Resources for Spanish, Basque, Catalan and Galician (as of 2012)

The following table illustrates the support of the Spanish language through speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support

The following table illustrates the support of the Basque language through speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support

The following table illustrates the support of the Catalan language through speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support

The following table illustrates the support of the Galician language through speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support