Towards the Primary Platform for
Language Technologies in Europe

National Competence Centre Czech Republic

The Languages of the Czech Republic

Czech is the official language of the Czech Republic and has about ten million speakers, most of whom live in the Czech Republic. About 200 thousand speakers, mostly emigrants from the WWI and WWII periods, now live in other parts of the world.
Czech, along with Slovak, Polish, and Upper and Lower Sorbian, belongs to the West Slavonic group.
The Czech language has several varieties, especially in its spoken form. Literary (Standard) Czech is used in education, in official negotiations and in the media. In everyday life, Common Czech (based on the Central Bohemian interdialect) is preferred.

Features of Czech:

  • The Czech language is a highly inflectional language with a complex morphology and free word order.
  • Some Czech words, such as chrt (hound), krk (neck) or trh (market), contain no vowels because some consonants can act as vowels.
  • Diminutives play an essential role in forming new vocabulary. Some Czech words have several degrees of diminutives (kniha – knížka – knížečka all translate to book).

Czech in Ethnologue
Eberhard, David M., Gary F. Simons, and Charles D. Fennig (eds.). 2020. Ethnologue: Languages of the World. Twenty-third edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com.

NCC Lead Czech Republic

Prof. Dr. Jan Hajič is a Full Professor of Computational Linguistics at the Institute of Formal and Applied Linguistics at the School of Computer Science at Charles University in Prague. His interests cover the morphology of inflective languages, machine translation, deep language understanding, and the application of statistical methods in natural language processing in general. He also has extensive experience in building annotated language resources. His work experience includes both industrial research (IBM Research, Yorktown Heights, 1991–1993) and academia (Charles University, Prague and Johns Hopkins University, Baltimore, MD, USA). He has been the PI or Co-PI of several national and international grants and projects, most notably the large Czech Grant Agency grant for “Large Corpora” (2001–2005), several EU projects on Machine Translation (EuroMatrix, Faust, META-NET, CRACKER, Khresmoi, HimL, QT21, Clarin PLUS) and the large US-based ITR project “Malach” (coordinated by the Visual History Foundation, Los Angeles, CA, USA). Currently, he is the PI of the large national infrastructure project LINDAT/CLARIAH-CZ (2010–2022). He is a panel and/or scientific board member of several universities and grant agencies, including the US-based National Science Foundation. Jan Hajič is the Chair of the META-NET Executive Board.

Current National Initiatives

  • Awareness of LT has increased significantly since the META-NET White Papers were published; LT is now listed as one of the three largest areas of research in the Czech government’s National AI Strategy.
  • For LT and AI, the total amount of funding in basic and applied research areas is over 2 million EUR per year. The established research infrastructures, LINDAT/CLARIAH-CZ and the Czech National Corpus, receive around 3.3 million EUR per year.
  • Czech now has relatively good coverage in terms of linguistically annotated resources.

Events

2021
12th Regional ELG Workshop: Slovakia, Czech Republic (regional workshop), October 18

META-NET White Paper on Czech

Ondřej Bojar, Silvie Cinková, Jan Hajič, Barbora Hladká, Vladislav Kuboň, Jiří Mírovský, Jarmila Panevová, Nino Peterek, Johanka Spoustová, and Zdeněk Žabokrtský. Čeština v digitálním věku – The Czech Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Georg Rehm and Hans Uszkoreit (series editors). Springer, Heidelberg, New York, Dordrecht, London, September 2012.

Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Availability of Tools and Resources for Czech (as of 2012)

Support for the Czech language is rated in each of the following areas on a five-level scale: excellent support, good support, moderate support, fragmentary support, or weak/no support.

  • Speech technologies
  • Machine translation
  • Text analytics
  • Language resources