Towards the Primary Platform for
Language Technologies in Europe

National Competence Centre United Kingdom

The Languages of the United Kingdom

English is the language with the most speakers worldwide. Its roughly 1,270 million speakers comprise 370 million native speakers and 900 million who acquired English as a second language. It is the (co-)official language of 53 countries. In the UK, an estimated 95% of the population grow up monolingual with English as their first language. Welsh, Scottish Gaelic, Irish, Scots and Ulster Scots, the official minority languages, are spoken by relatively small communities. Scots is the largest of these, spoken by 2.5% of the population. Welsh has been an official language in Wales alongside English since 2011. The number of Welsh speakers has grown rapidly in recent years: 560,000 speakers were counted in 2011, and the figure rose to 855,000 over the following nine years.

English belongs to the Anglo-Frisian branch of the Germanic languages. Its spread to other continents has produced considerable variation in the English spoken in different countries. The largest dialect groups are American and Australian English. The most significant differences between the dialects lie in pronunciation and vocabulary.

Features of English:

  • English is a language with minimal inflection. It lacks grammatical gender and, to some extent, case and adjectival agreement. As a result, it relies on a very strict word order.
  • Every learner of English has to deal with the mismatch between spelling and pronunciation. Many sounds can be spelled in several ways, and many spellings can be pronounced in several ways.
  • The vocabulary contains a large number of phrasal verbs, whose meanings are not always predictable from the meanings of their constituents.

Welsh is a Celtic language of the Brythonic branch, like Breton and Cornish. The written standard differs considerably from the spoken language, and the spoken language can be subdivided into many dialects.

Features of Welsh:

  • The alphabet contains many digraphs, each of which counts as a single letter.
  • It has a rich inflectional system.
  • Initial consonant mutation occurs in word formation and inflection. There are three types of mutation (soft, nasal and aspirate), which together affect nine initial letters.
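
To illustrate the mutation system described above, here is a minimal Python sketch that applies the three mutation types (soft, nasal and aspirate) to the first letter of a word. The mapping follows the standard textbook table of Welsh mutations. The names MUTATIONS, DIGRAPHS, initial_letter and mutate are hypothetical and chosen for this illustration only. The sketch performs only the letter change. It does not decide when the grammar calls for a mutation, which depends on the preceding word and the grammatical context.

```python
# Word-initial mutations, keyed by mutation type and radical (unmutated) letter.
# The digraphs "ll" and "rh" count as single letters of the Welsh alphabet.
MUTATIONS = {
    "soft": {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
             "g": "", "m": "f", "ll": "l", "rh": "r"},
    "nasal": {"p": "mh", "t": "nh", "c": "ngh", "b": "m", "d": "n", "g": "ng"},
    "aspirate": {"p": "ph", "t": "th", "c": "ch"},
}

# Digraph letters that can start a word; checked before single letters so
# that, for example, initial "ch" is never mistaken for the letter "c".
DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")


def initial_letter(word):
    """Return the first letter of a word, treating digraphs as one letter."""
    lower = word.lower()
    for digraph in DIGRAPHS:
        if lower.startswith(digraph):
            return digraph
    return lower[:1]


def mutate(word, kind):
    """Apply the given mutation type ("soft", "nasal" or "aspirate")."""
    table = MUTATIONS[kind]
    first = initial_letter(word)
    if first not in table:
        return word  # this initial letter is unaffected by this mutation
    result = table[first] + word[len(first):]
    # Preserve an initial capital, e.g. "Cymraeg" -> "Gymraeg".
    if word[:1].isupper():
        result = result[:1].upper() + result[1:]
    return result


if __name__ == "__main__":
    for kind in ("soft", "nasal", "aspirate"):
        print(kind, mutate("cath", kind))
```

Under these assumptions, mutate("cath", "soft") yields "gath", mutate("cath", "nasal") yields "nghath", and mutate("cath", "aspirate") yields "chath". The soft mutation table has exactly nine entries, matching the nine letters mentioned above.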

Wikipedia contributors. (2020, July 2). English Language. In Wikipedia, The Free Encyclopedia. Retrieved 13:00, July 8, 2020, from https://en.wikipedia.org/wiki/English_language.

Wikipedia contributors. (2020, July 8). Welsh Language. In Wikipedia, The Free Encyclopedia. Retrieved 13:00, July 8, 2020, from https://en.wikipedia.org/wiki/Welsh_language.

Wikipedia contributors. (2020, July 6). Indo-European languages. In Wikipedia, The Free Encyclopedia. Retrieved 13:00, July 8, 2020, from https://en.wikipedia.org/wiki/Indo-European_languages.

Wikipedia contributors. (2020, July 9). List of languages by total number of speakers. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 9, 2020, from https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers.

Wikipedia contributors. (2020, June 30). Languages of the United Kingdom. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 9, 2020, from https://en.wikipedia.org/wiki/Languages_of_the_United_Kingdom.

NCC Lead United Kingdom

Prof. Kalina Bontcheva is Professor of Text Analysis at the Department of Computer Science at the University of Sheffield and Head of the Natural Language Processing research group. Her research interests include NLP for social media, semantic search, the General Architecture for Text Engineering (GATE), crowdsourcing of NLP corpora, and collaborative text annotation. In these areas, she has taken part in many national and international projects. She led the EU project PHEME, was Principal Investigator of EU projects such as TrendMiner and DecarboNet, and was Co-Investigator of the uComp project. She was also involved in community building as Co-ordinator of the TAO consortium. She has authored or co-authored over 200 scientific articles, books, book chapters and conference proceedings.

Current National Initiatives

  • The Engineering and Physical Sciences Research Council (EPSRC) funds NLP activities, but primarily blue-skies research. Natural Language Processing is currently designated as a growth area, and efforts are being made to increase the funding allocated to NLP research. In general, however, LT and the corresponding research infrastructures are not considered a funding priority.
  • GATE and, most recently, GATE Cloud are among the most widely used and established LT tools, services, and platforms.

META-NET White Paper on English and Welsh

Sophia Ananiadou, John McNaught, and Paul Thompson. The English Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).

Jeremy Evas. Y Gymraeg yn yr Oes Ddigidol – The Welsh Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, January 2014. Georg Rehm and Hans Uszkoreit (series editors).

Availability of Tools and Resources for English and Welsh (as of 2012)

The following table illustrates the level of support for the English language in speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support

The following table illustrates the level of support for the Welsh language in speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support