Towards the Primary Platform for Language Technologies in Europe

National Competence Centre Poland

The Languages of Poland

Polish is spoken by approximately 50 million people, of whom 45 million are native speakers and 5 million are non-native speakers. The largest minority languages in Poland are Kashubian, German, Belarusian and Romani. Kashubian is the only language declared as an official ethnic-minority language.
Polish belongs to the West Slavic branch of the Indo-European languages and is the most widely spoken West Slavic language in the world.

Features of Polish:

  • Polish has a relatively free word order, with subject-verb-object as the dominant order. There are no articles, and subject pronouns are often dropped.
  • The inflectional paradigm is very rich in categories and values. The exact number of possible word forms is still a matter of dispute.
  • The writing system uses many diacritics to express sounds that have no counterpart in the basic Latin alphabet.

Wikipedia contributors. (2020, June 16). Polish language. In Wikipedia, The Free Encyclopedia. Retrieved 10:00, June 30, 2020, from https://en.wikipedia.org/wiki/Polish_language.

NCC Lead Poland

Maciej Ogrodniczuk is Head of the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences.
He is a graduate of the University of Warsaw and holds a master’s degree in computer science, a PhD in linguistics and a habilitation in information and communication technology.
He takes an active part in the LT and AI community and is involved in numerous national and international projects related to Natural Language Processing. His research interests include coreference resolution, discourse processing and computational analysis of parliamentary data. He has published over 100 scientific articles.

Current National Initiatives

  • CLARIN-PL and DARIAH-PL, two research infrastructures creating language resources and tools for Polish, are included in the Polish Map of Research Infrastructure approved by the Minister of Science and Higher Education and are currently supported by the European Regional Development Fund.
  • The main challenge remains the further development of the National Corpus of Polish, a resource with enormous impact on research in linguistics, humanities and LT, which was completed in 2011 and has not been updated since.
  • LT is mentioned in the “Policy for the Development of AI in Poland for the years 2019–2027” as a key function of AI, but no specific actions are planned.

Events

2020
3rd Regional ELG Workshop: Poland (regional workshop, online), Warsaw, Poland, October 27

META-NET White Paper on Polish

Marcin Miłkowski. Język polski w erze cyfrowej – The Polish Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Georg Rehm and Hans Uszkoreit (series editors). Springer, Heidelberg, New York, Dordrecht, London, September 2012.


Availability of Tools and Resources for Polish (as of 2012)

The following table illustrates the support of the Polish language through speech technologies, machine translation, text analytics and language resources.

Each area is rated on the same five-level scale: excellent support, good support, moderate support, fragmentary support, weak/no support.

Areas rated:

  • Speech technologies
  • Machine translation
  • Text analytics
  • Language resources