Towards the Primary Platform for Language Technologies in Europe

National Competence Centre France

The Languages of France

French is the official language of France. It is spoken by approximately 300 million people, 77.2 million of them first-language speakers, which makes it the sixth most spoken language in the world. More than 80 dialects exist, and they are officially part of the country’s cultural heritage. In addition, Basque is spoken in a small region in the Pyrenees.
The “Académie Française” is tasked with maintaining the French language, including the monitoring of neologisms. Several laws also safeguard the use of French in science, the economy and the public domain.
French is a Romance language and part of the “Union Latine”.

Features of French:

  • The French sound system contains 16 distinct vowels: twelve oral vowels and four nasal vowels.
  • French is well known for the mismatch between its orthography and its pronunciation.
  • Verb conjugation and the use of the subjunctive are considered difficult for learners of French to grasp.

More detailed information about Basque can be found on the NCC page for Spain.

French in Ethnologue
Eberhard, David M., Gary F. Simons, and Charles D. Fennig (eds.). 2020. Ethnologue: Languages of the World. Twenty-third edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com.

NCC Lead France

Professor François Yvon is a CNRS Senior Member of the TLP (“Spoken Language Processing”) Group of LISN (CNRS & Univ. Paris-Saclay), where he leads the research activities in machine learning and statistical machine translation. Between 2013 and 2020, he was also the general director of LIMSI-CNRS. Over the years, François Yvon has actively contributed to the fields of speech synthesis, speech recognition, text mining, automatic spell checking, machine translation and multilingual NLP, mostly developing approaches based on statistical machine learning techniques and probabilistic modelling. He has published over 200 papers in peer-reviewed international journals and conferences on a wide array of issues related to speech and language technologies. François Yvon has been involved in more than 20 national and international projects. He is currently a member of the board of META-NET and of the European chapter of the ACL, as well as of the steering committee of the IWSLT conference series. In addition, he is an Action Editor of the Transactions of the ACL (TACL) journal and an Associate Editor of “ACM Computing Surveys”.

Current National Initiatives

  • The government has launched an ambitious plan for research in all areas of AI, which has led to the creation of four national AI institutes (located in Paris, Toulouse, Grenoble and Nice), some of which target LT in their roadmaps.
  • A national programme of approx. 40 chairs and 200 PhDs in AI was launched in 2020, several of them targeting LT. This programme also targets international cooperation, mostly with Germany, Canada, and Japan. The main funding agency, ANR, funds three to five large-scale LT projects annually.

META-NET White Paper on French

Joseph Mariani, Patrick Paroubek, Gil Francopoulo, Aurélien Max, François Yvon, and Pierre Zweigenbaum. La langue française à l’Ère du numérique – The French Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).

Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Availability of Tools and Resources for French (as of 2012)

The following table illustrates the level of support for the French language in speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support