Towards the Primary Platform for
Language Technologies in Europe

National Competence Centre Croatia

The Languages of Croatia

Croatian has around 5.6 million native speakers worldwide. It is an official minority language in Serbia, Montenegro, Austria, Hungary, and Slovakia. The Croatian language belongs to the Western South Slavic subgroup of the Slavic branch of the Indo-European language family. It comprises three dialectal groups: Čakavian, Kajkavian, and Štokavian. The dialectal groups differ on all linguistic levels (phonological, morphological, syntactic and lexical), and each level includes a number of archaisms and innovations (e.g. borrowings) specific to a particular dialectal group.

Features of Croatian:

  • Standard Croatian has a prosodic system with four distinctive accents, which differ in length (short or long) and in tone (falling or rising).
  • The rich inflectional system has seven cases and, for some declined words, also marks special categories such as animacy and definiteness.
  • Croatian inflected verbs express tense and modal meaning in a sentence, while participles are also marked for number and gender.

Wikipedia contributors. (2020, April 11). Croatian language. In Wikipedia, The Free Encyclopedia. Retrieved 09:00, April 17, 2020, from https://en.wikipedia.org/wiki/Croatian_language.

NCC Lead Croatia

Prof. Marko Tadić is a linguist and professor at the Department of Linguistics, University of Zagreb. He has been the Head of the Chair of Algebraic and Computational Linguistics since 2001 and an associate member of the Croatian Academy of Sciences and Arts since 2008. He was also a member of the Standing Committee for the Humanities of the European Science Foundation (2009-2012) and a member of the National Council for the Humanities of the National Scientific Council of the Republic of Croatia (2004-2013, 2017-present).

He is one of the authors of the largest Croatian frequency dictionary, “Hrvatski čestotni rječnik” (1999). His interests are in corpus linguistics, computational linguistics, language technologies and research infrastructures in (e-)humanities and social sciences. He is the author or co-author of important language resources for the Croatian language, such as the Croatian National Corpus, Croatian Morphological Lexicon, Croatian Dependency Treebank, Croatian WordNet and the portal Language Technologies for the Croatian Language.

He has led the Croatian teams participating in several nationally funded projects as well as in the FP7 RI project CLARIN, the FP7 project ACCURAT, the ICT-PSP projects Let’sMT! and CESAR, the FP7 project XLike, the ESF project HR4EU, the CEF projects MARCELL, CURLICAT and EU Presidency Translator, and the MSC ITN CLEOPATRA. He is the president and one of the founders of the Croatian Language Technologies Society.
Cooperation with researchers from neighboring countries (e.g. Slovenia, Hungary, Slovakia) is ongoing within various EU projects, as is deeper involvement in CLARIN (through the newly established HR-CLARIN consortium).

Current National Initiatives

  • No funding programs exist, although there is a need to build a new generation of language resources and technologies (LR/LT) for Croatian. Some are developed through the participation of Croatian institutions as partners in COST, CEF and MSC projects.
  • Closer cooperation with neighboring countries is expected, as well as deeper involvement in CLARIN (through HR-CLARIN).

META-NET White Paper on Croatian

Marko Tadić, Dunja Brozović-Rončević, and Amir Kapetanović. Hrvatski Jezik u Digitalnom Dobu – The Croatian Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Georg Rehm and Hans Uszkoreit (series editors). Springer, Heidelberg, New York, Dordrecht, London, September 2012.

Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Availability of Tools and Resources for Croatian (as of 2012)

The following table illustrates the support of the Croatian language through speech technologies, machine translation, text analytics and language resources.

Each area is rated on a five-level scale: excellent support, good support, moderate support, fragmentary support, weak/no support.

  • Speech technologies
  • Machine translation
  • Text analytics
  • Language resources

[Table: rating of each area on the scale above; the highlighted level per area is not preserved in this text version.]