Towards the Primary Platform for
Language Technologies in Europe

National Competence Centre Finland

The Languages of Finland

Finnish and Swedish are the official languages of Finland. As of 2015, approximately 90% of the population of Finland speaks Finnish as a native language, which amounts to about 4.9 million speakers. In Sweden, Finnish is recognised as an official minority language. Finnish is also spoken by communities in Estonia, Russia, the United States and Australia, which together number approximately 300,000 speakers. Historically, many other languages have been spoken in Finland, such as the Sámi languages, Romani, the Karelian languages and sign languages. Today, as a result of immigration since 1970, there are roughly 100 immigrant languages.
The dialects fall into two groups, eastern and western. They differ mainly in pronunciation, in some word forms and in vocabulary.
Finnish is part of the Finno-Ugric language group and, like Estonian, belongs to its Baltic-Finnic branch.
Features of Finnish:

  • There is no grammatical gender and there are no articles; instead, words can be inflected in 15 cases.
  • Finnish has a rich inflectional system. Because affixes mark the syntactic role of words, speakers can choose a relatively free word order.
  • Every noun can have up to 2,000 word forms and every verb up to 12,000. Word forms are built from several affixes, which can be stacked.
  • New words are built by derivation and compounding. In Finnish, basic words make up just 10-15% of the vocabulary, derivatives 20-30%, and compounds the majority at 60-70%.
  • Morphophonological features of Finnish include vowel harmony and vowel alternations between stems and endings.
Wikipedia contributors. (2020, May 15). Finnish language. In Wikipedia, The Free Encyclopedia. Retrieved 17:30, June 15, 2020, from https://en.wikipedia.org/wiki/Finnish_language.
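
The vowel harmony mentioned in the feature list can be illustrated with a small sketch. It is not taken from the page and is not a complete morphological model: the function name `inessive_suffix` and the heuristic of looking at the last harmonic vowel are assumptions made for illustration. The sketch only chooses between the two variants of the inessive case ending ("in"), -ssa and -ssä, and ignores stem changes, loanwords and other exceptions.

```python
import unicodedata

# Assumed simplification: back vowels select the back-vowel suffix variant,
# front vowels select the front-vowel variant; e and i are neutral.
BACK_VOWELS = set("aou")
FRONT_VOWELS = set("äöy")


def inessive_suffix(word: str) -> str:
    """Return 'ssa' or 'ssä' for a word, using a simple vowel harmony heuristic.

    The last harmonic (non-neutral) vowel decides the variant, which also
    roughly covers compounds, where the final component governs harmony.
    Words with only neutral vowels take the front variant.
    """
    # Normalise so that 'ä' and 'ö' are single code points.
    normalised = unicodedata.normalize("NFC", word).lower()
    for ch in reversed(normalised):
        if ch in BACK_VOWELS:
            return "ssa"
        if ch in FRONT_VOWELS:
            return "ssä"
    return "ssä"


if __name__ == "__main__":
    # These three nouns attach the ending without a stem change.
    for noun in ["talo", "kylä", "tie"]:
        print(noun + inessive_suffix(noun))
```

For the three nouns used here (talo "house", kylä "village", tie "road"), the suffix can be appended directly; many other Finnish words change their stem before the ending, which this sketch does not attempt to handle.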

NCC Lead Finland

Dr. Krister Lindén is the Research Director of Language Technology at the Department of Language Technology and the Deputy Team Supervisor of the Centre of Excellence in Ancient Near Eastern Empires (ANEE). He received his PhD in Language Technology in 2005 from the University of Helsinki. His research interests focus on language technology applications, language resources in research infrastructures, and digital humanities applied to Ancient Near Eastern empires.
Since 2015, he has been the National Coordinator of FIN-CLARIN, the Finnish part of the CLARIN initiative. In addition, he has led the research activities of the Language Bank of Finland since 2010.

Current National Initiatives

  • The government has opened resources and databases produced by government-funded activities.
  • In late 2019, the Ministry of Finance issued a “Development and implementation plan for AuroraAI 2019–2023”, which includes the goal of identifying service needs that citizens express in natural language, whether written or spoken. The reports assume that language technology (LT) is available for the languages used in Finland, so since 2019 the state-owned development company VAKE has included support for LT development in its strategy for digitalisation.

Events

2020
4th Regional ELG Workshop: Finland. Regional workshop (online), Helsinki, Finland, December 15

META-NET White Paper on Finnish

Kimmo Koskenniemi, Krister Lindén, Lauri Carlson, Martti Vainio, Antti Arppe, Mietta Lennes, Hanna Westerlund, Mirka Hyvärinen, Imre Bartis, Pirkko Nuolijärvi, and Aino Piehl. Suomen kieli digitaalisella aikakaudella – The Finnish Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Georg Rehm and Hans Uszkoreit (series editors). Springer, Heidelberg, New York, Dordrecht, London, September 2012.


Availability of Tools and Resources for Finnish (as of 2012)

The following table shows the level of support for the Finnish language in speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support