Towards the Primary Platform for Language Technologies in Europe

National Competence Centre Bulgaria

The Languages of Bulgaria

Bulgarian is the official language of the Republic of Bulgaria. It is spoken by approximately nine million native speakers (as of 2011), mainly in Bulgaria. Bulgarian belongs to the South Slavic language family and forms part of the Balkan linguistic union.
The regional varieties of Bulgarian are divided into Eastern and Western by the “Yat Border”, which marks the different mutations of the Old Bulgarian “yat”, the thirty-second letter of the old Cyrillic alphabet.

Features of Bulgarian:

  • Bulgarian is written in the Cyrillic alphabet. It was the first Slavic language with its own writing system, which dates back to the 9th century.
  • As a Slavic language, it has a rich inflectional and derivational morphology. However, due to the mutual influence of the Balkan languages, Bulgarian has lost its noun cases (except the vocative) and has completely lost the infinitive form.
  • Specific characteristics of Bulgarian pose a challenge for natural language processing, in particular the rather flexible word order combined with the lack of morphological case marking on nouns and with subject omission.

NCC Lead Bulgaria

Dr. Svetla Koeva is Head of the Department of Computational Linguistics at the Institute for Bulgarian Language, Bulgarian Academy of Sciences. She received her Ph.D. in Structural, Applied and Mathematical Linguistics at the Institute for Bulgarian Language (Bulgarian Academy of Sciences). She has been involved in the research and development of a variety of linguistic resources for Bulgarian, for example WordNet, FrameNet, numerous corpora, spell and grammar checkers, and the Bulgarian NLP chain. She has received three national scientific awards. Her research interests are in computational linguistics, natural language processing, and problems of formal natural language description and ontologies. Her current research priorities are machine translation, corpus studies and syntactic parsing.

Current National Initiatives

  • There is a need for a large collection of data sets and resources, services and tools for spoken language.
  • The National Scientific Fund supports language technology (LT) projects, in common with all other disciplines.
  • In 2020, the Bulgarian government adopted the “Concept for the Development of Artificial Intelligence in Bulgaria until 2030”.

Events

2021
13th National ELG Workshop: Bulgaria (national workshop, Bulgaria, October 7)

META-NET White Paper on Bulgarian

Diana Blagoeva, Svetla Koeva, and Vladko Murdarov. Българският език в дигиталната епоха – The Bulgarian Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Georg Rehm and Hans Uszkoreit (series editors). Springer, Heidelberg, New York, Dordrecht, London, September 2012.


Availability of Tools and Resources for Bulgarian (as of 2012)

The following table illustrates the support of the Bulgarian language through speech technologies, machine translation, text analytics and language resources.

Each area is rated on a five-level scale: excellent support, good support, moderate support, fragmentary support, weak/no support.

Areas rated:

  • Speech technologies
  • Machine translation
  • Text analytics
  • Language resources