Towards the Primary Platform for Language Technologies in Europe

National Competence Centre Netherlands

The Languages of the Netherlands

Dutch is spoken by approximately 24 million people as a first language and by five million as a second language. Besides the Netherlands, it is an official language in the Flemish part of Belgium, Suriname, Aruba, Curaçao and Sint Maarten. Dutch emigrants spread the language around the world, and it is still spoken in small communities in France, Germany, Brazil, South Africa, Indonesia, Canada and the United States. Within the Netherlands, Frisian is an official minority language in the province of Friesland.
Dutch has many dialects, which differ in their syntactic constructions and in the meanings of words. The largest differences are between the dialects spoken in the Netherlands and those spoken in Flanders.
Dutch is an Indo-European language of the West Germanic family and belongs to the Low Franconian branch.
Features of Dutch:

  • Word order is relatively free. Subjects, objects and adverbials all commonly appear in sentence-initial position.
  • New words are formed by compounding, a highly productive word-formation process.
  • So-called "R-pronouns" tend to occur at a distance from the prepositions they belong to. In addition, a single R-pronoun sometimes has more than one function or belongs to more than one preposition. This makes it difficult for natural language processing systems to attach these pronouns to the correct phrases.
  • Like German, Dutch has verbs with separable prefixes, where the prefix and the verb can occur in different positions in the sentence.
Wikipedia contributors. (2020, June 28). Dutch language. In Wikipedia, The Free Encyclopedia. Retrieved 15:00, June 29, 2020, from https://en.wikipedia.org/wiki/Dutch_language.

NCC Lead Netherlands

Prof. Dr. Jan Odijk is a full professor of language and speech technology at the Department of Languages, Literature and Communication of Utrecht University.
His research interests range from theoretical syntax to language technologies, and he has taken part in many national and international projects in these areas. At the national level, he was a member of the steering committees of the Spoken Dutch Corpus (CGN) and IMIX, and of the ELRA board. At the international level, he participated in the EU project META-NET as the National Anchor Point for the Netherlands, and was a member of the programme committee of the Dutch/Flemish language. He is also actively involved in the CLARIN initiative: he was programme director of CLARIN-NL (2009-2015) and CLARIAH-SEED (2013-2014), and has been director of CLARIAH-CORE since 2015. He has published over 300 articles, conference papers and other scholarly outputs over the past 27 years.

Current National Initiatives

  • There is no dedicated programme for LT development, though several projects are ongoing.
  • Some LT development takes place within CLARIAH, especially in speech recognition, event extraction and POS tagging. Thanks to the STEVIN programme, Dutch is in no immediate danger of digital extinction. The META-NET White Papers raised awareness of the importance of LT within the Interparliamentary Committee for the Dutch Language Union.
  • In 2015, the committee invited the Dutch LT community to submit a proposal for a new LT programme, but without committing any funding. No such proposal was ever drawn up.

META-NET White Paper on Dutch

Jan Odijk. Het Nederlands in het Digitale Tijdperk – The Dutch Language in the Digital Age. META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).


Availability of Tools and Resources for Dutch (as of 2012)

The following table shows the level of support for Dutch in speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support