Towards the Primary Platform for
Language Technologies in Europe

National Competence Centre Slovakia

The Languages of Slovakia

Slovak has approximately 5.2 million native speakers (2015). The largest minorities in Slovakia are Hungarians and Roma; nonetheless, Slovak is the only official language at the national level. There are also large Slovak-speaking communities in the United States and the Czech Republic.
Slovak belongs to the West Slavic branch of the Slavic languages, together with Polish, Czech and Sorbian.
It is divided into many regional dialects, which fall into three main groups: Western, Central and Eastern Slovak. Regional variation is greatest in the mountainous parts of Slovakia.
Features of Slovak:

  • Slovak speakers use a modified Latin alphabet with additional diacritical marks, which denote palatalisation, postalveolar sibilants and vowel length.
  • Slovak pronunciation follows a rhythmic rule: two long syllables tend not to occur next to each other.
  • The rich inflectional system has 6 cases and 4 genders, because the masculine gender is split into masculine animate and masculine inanimate.
  • The Slovak alphabet has more letters than the alphabet of any other European language.
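The role of the diacritical marks mentioned above can be illustrated with a short Python sketch. It relies only on the standard `unicodedata` module; the function names `diacritics` and `strip_diacritics` are hypothetical helpers introduced for this illustration and do not come from any Slovak NLP toolkit. The sketch decomposes each letter into its base letter and combining mark (Unicode normalisation form NFD), which shows, for example, that "š" is "s" with a caron and "ú" is "u" with an acute accent.

```python
import unicodedata


def diacritics(word: str) -> list[tuple[str, str]]:
    """Return (base letter, Unicode mark name) for each accented letter."""
    pairs = []
    for ch in word:
        # NFD splits a precomposed letter into base letter + combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        base = decomposed[0]
        for mark in decomposed[1:]:
            if unicodedata.combining(mark):
                pairs.append((base, unicodedata.name(mark)))
    return pairs


def strip_diacritics(text: str) -> str:
    """Drop all combining marks, leaving only the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for pair in diacritics("Štúr"):
        print(pair)
    print(strip_diacritics("Ľudovít Štúr"))
```

Stripping the marks loses information (vowel length and palatalisation are no longer distinguishable), so a helper like `strip_diacritics` is only suitable for tasks such as tolerant search, not for linguistic analysis.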
Wikipedia contributors. (2020, July 1). Slovak language. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 2, 2020, from https://en.wikipedia.org/wiki/Slovak_language
Wikipedia contributors. (2020, July 4). Slovakia. In Wikipedia, The Free Encyclopedia. Retrieved 12:00, July 6, 2020, from https://en.wikipedia.org/wiki/Slovakia

NCC Lead Slovakia

Dr. Radovan Garabík works at the Ľudovít Štúr Institute of Linguistics at the Slovak Academy of Sciences. He originally studied nuclear physics; his research interest later shifted to computational linguistics.
He currently participates in international projects such as Nexus Linguarum and MARCELL. At the national level, he takes part in the SNK (Slovak National Corpus) project.
In the area of computational linguistics, he has authored or co-authored more than 60 scientific papers and conference contributions.

Dr. Radovan Garabík works at the Ľudovít Štúr Institute of Linguistics, Slovak Academy of Sciences. His research interests cover corpus linguistics, natural language processing, computational linguistics, human language technology and modern digital lexicography. He is the main architect of the Slovak National Corpus project, a huge representative corpus of modern Slovak, together with relevant NLP tools for Slovak, such as Slovak morphological analysis and POS tagging. He also led the development of several other significant resources: Slovak WordNet, the Slovak Multext East morphosyntactic specification, and several parallel corpora.

He has represented the Ľ. Štúr Institute of Linguistics in the following European projects: MONDILEX – Conceptual Modelling of Networking of Centres for High-Quality Research in Slavic Lexicography and Their Digital Resources; Slovak Online; CESAR – CEntral and South-east europeAn Resources; NETWORDS – The European Network on Word Structure; EuroMatrixPlus – “Bringing Machine Translation for European Languages to the User”; www.slovake.eu – Extending the offer of the e-learning platform for the Slovak language; lingvo.info; MARCELL – Multilingual Resources for CEF.AT in the legal domain; CURLICAT – Curated Multilingual Language Resources for CEF AT; Nexus Linguarum – European network for Web-centred linguistic data science; LITHME – Language in the Human-Machine Era. He is the principal author of several specialized dictionaries of contemporary Slovak.

Current National Initiatives

  • There is no LT funding programme; some minor programmes oriented towards LT have been successful in the past, but mostly as parts of other actions or grants.
  • LT-oriented industry is rare; companies usually try to use existing technologies rather than develop new ones.
  • There is a lack of understanding of NLP within the industry.

META-NET White Paper on Slovak

Mária Šimková, Radovan Garabík, Katarína Gajdošová, Michal Laclavík, Slavomír Ondrejovič, Jozef Juhár, Ján Genči, Karol Furdík, Helena Ivoríková, and Jozef Ivanecký. Slovenský jazyk v digitálnom veku – The Slovak Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Georg Rehm and Hans Uszkoreit (series editors). Springer, Heidelberg, New York, Dordrecht, London, September 2012.

Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Availability of Tools and Resources for Slovak (as of 2012)

The following table illustrates the support of the Slovak language through speech technologies, machine translation, text analytics and language resources.

Area                  Support scale
Speech technologies   Excellent | Good | Moderate | Fragmentary | Weak/no support
Machine translation   Excellent | Good | Moderate | Fragmentary | Weak/no support
Text analytics        Excellent | Good | Moderate | Fragmentary | Weak/no support
Language resources    Excellent | Good | Moderate | Fragmentary | Weak/no support