The Impact of Language Technology on Society

Organized by Giellatekno – Centre for Saami Language Technology

10:30-11:00	Lene Antonsen Why ICALL for indigenous languages?
11:00-11:30	Ken Beesley, Sonja Bosch and Laurette Pretorius The Impact of Language Technologies on South Africa's Lesser-studied Official Languages
11:30-12:00	Mari Keränen Language technology and standardization – a sociolinguistic perspective
12:00-13:30 LUNCH
13:30-14:00	Sjur Moshagen Proofing tools, open source and language diversity
14:30-15:00	Leena Niiranen and Kaisa Maliniemi Language technology used in revitalization of Kven: Problems and Possibilities
15:00-15:30	Trond Trosterud Language technology and domain retrieval: a global perspective

Why ICALL for indigenous languages?

Lene Antonsen

Traditional CALL (Computer-Assisted Language Learning) tools offer a limited range of exercise types, many of which focus on base forms. Their possibilities for giving feedback are also limited, since the exercises are static and the answers have to be pre-stored. ICALL (Intelligent CALL) adds natural language processing (NLP) to the tools for language learning.

Many indigenous languages have the following in common:
– complex morphology
– weak norms, extensive linguistic variation
– need for distance teaching
– lack of teaching materials
– lack of text corpora

With a system based on NLP it is possible, for example, to
– generate large amounts of appropriate grammar tasks based on the learners' needs
– put the grammar tasks into a meaningful setting
– give automatic diagnosis and feedback to learners based on what they have actually produced
– enrich digital dictionaries for language learners with inflected forms in support of both perception and production
– handle linguistic variation

ICALL tools can in part be built on already existing NLP resources. For many indigenous languages there are no analysers, but ICALL tools can nevertheless be established since the NLP tools can be limited to the vocabulary and morphology addressed in the teaching materials. Making such basic analysers can also be a good start for building analysers to cover the whole language, which is required for making spell checking programs.
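
The idea of a small analyser restricted to the teaching vocabulary can be sketched as follows. This is a minimal illustration only: the stems, suffixes, tags and function names are invented for the example and do not describe any real language or any existing tool. It shows how a single paradigm table can both generate grammar tasks and diagnose what a learner actually wrote.

```python
from itertools import product

# Hypothetical toy data: invented stems and suffixes, not a real language.
LEXICON = {"tavo": "house", "miru": "river"}
SUFFIXES = {"Sg+Nom": "", "Sg+Acc": "m", "Pl+Nom": "t", "Pl+Acc": "tim"}


def generate(stem, tag):
    """Build the inflected form for a stem and a morphological tag."""
    return stem + SUFFIXES[tag]


# Reverse table: inflected form -> list of (stem, tag) readings.
ANALYSES = {}
for stem, tag in product(LEXICON, SUFFIXES):
    ANALYSES.setdefault(generate(stem, tag), []).append((stem, tag))


def analyse(form):
    """Return all readings of a form, or an empty list if it is unknown."""
    return ANALYSES.get(form.strip().lower(), [])


def make_tasks():
    """Generate one task per stem and non-base form."""
    return [
        (f"Give the {tag} form of '{stem}' ({gloss})", stem, tag)
        for stem, gloss in LEXICON.items()
        for tag in SUFFIXES
        if tag != "Sg+Nom"
    ]


def diagnose(answer, stem, tag):
    """Give feedback based on the analysis of the learner's own answer."""
    readings = analyse(answer)
    if (stem, tag) in readings:
        return "correct"
    if not readings:
        return "unknown form"
    for found_stem, found_tag in readings:
        if found_stem == stem:
            return f"right word, but this is {found_tag}, not {tag}"
    return "wrong word"
```

Because feedback is computed from the analysis of the answer rather than from a list of pre-stored answers, the same few lines cover every word added to the lexicon.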

The Impact of Language Technologies on South Africa's Lesser-studied Official Languages

Ken Beesley, Sonja Bosch and Laurette Pretorius

Under apartheid, until 1994, South Africa had only two official languages: Afrikaans and English. The new Constitution of 1996 recognizes an additional nine official languages, all from the Bantu language family, and explicitly requires that “all languages must enjoy parity of esteem and must be treated equitably.” We review the computer-based language-technology projects that contribute to the social status and practical usability of the ten official languages, excluding English, that are lesser-studied.

Language technology and standardization – a sociolinguistic perspective

Mari Keränen

How does language technology respond to the variation of a language (and vice versa), and does language technology have any influence on a language community? Norway is a textbook example of linguistic variation and diversity, regarding both majority and minority languages. The standardization of Kven – a Finnic language spoken in Northern Norway – began in 2007. The first part of the standardization will soon be finished, when the three varieties of the Kven grammar are published, and the first language technology tools are currently being developed. In my study I have described and analyzed the principles that have been used in the standardization of Kven, and here I will discuss some sociolinguistic aspects of language technology.

Proofing tools, open source and language diversity

Sjur Moshagen

One key issue in maintaining language diversity is to maintain language use and language transfer: each language community must continue to use its language so that the language can be transferred to the next generation. Even when language transfer does happen, young people often find that there is no place to use the language beyond oral communication at home. There is no writing support in popular applications or on the net, often no way to use their language in mobile texting, no suitable keyboard layouts, fonts with only partial coverage, etc.
The presentation will discuss how language technology can help lesser-resourced languages gain a foothold in new domains, especially those connected to the digital life of today. It will also discuss the importance of open source in this context. The focus will be on the work done on proofing tools, especially for the Sámi languages, and on their grammatical and language technology basis, but also more generally on the platforms and technologies used to make proofing tools available and usable in office applications.

Language technology used in revitalization of Kven: Problems and Possibilities

Leena Niiranen and Kaisa Maliniemi

We present a co-operation project between the Kven Institute, Giellatekno, and the Institute of Languages at the University of Tromsø, with the aim of creating a language analyzing and spelling program for the Kven language. The most important goal of our project is to assist in the revitalization of Kven: to strengthen the usage of Kven, to make the Kven language more visible, and to raise the status of Kven among users and in society. In our presentation we will discuss the role of language technology in reaching these goals.

The Kven language is a regional minority language in Norway, recognized by the Norwegian government in 2005 and protected by the European Charter for Regional or Minority Languages. It is estimated that there are 10 000–15 000 Kvens in Norway today and about 4 000–8 000 speakers of the language, but only a few of them are able to write Kven. One of the most important goals of the Kven Institute – launched in 2007 – is to develop a written standard for Kven. Standardization is considered necessary if the goal is language maintenance; moreover, without a written language, participation in modern society is difficult. The Kven Institute has appointed a language council that decides on the standardization of Kven, which is a difficult task, as there are few written materials in Kven.

In our project, Giellatekno offers a suitable infrastructure, and the language-specific work is based on existing resources: a Kven–Norwegian digital dictionary and the vocabulary list that is part of the Kven teaching material (Söderholm 2012). The project aims to enlarge the existing digital dictionary and equip it with inflectional forms. Another necessary basis for the analyses is Söderholm’s grammar. In addition, the decisions of the Kven language council form a basis for our work. Kven has a rich morphological system like Finnish, but its inflectional forms are close to Northern Finnish dialect forms and differ from standard Finnish. Moreover, Söderholm’s grammar is not comprehensive enough to cover the whole grammatical system, as it is meant for language learners, not for computer analysis. Other challenges are the morphophonological variation in Kven dialects and the small number of words in existing vocabularies.

The Kven analyzing and spelling program is meant to be a tool for everybody interested in Kven. We will discuss how it can help students, translators and those who know Kven only orally to read and write in the minority language. Who especially needs a dictionary including all inflectional forms? Language technology also opens up possibilities for interactive language learning and for creating easily available learning materials. Language technology offers a helping hand to minorities so that they can face the requirements and challenges of the modern world. Furthermore, Kven language technology and language work can be a source of inspiration for other minorities without a written language, but with an interest in revitalizing their language and culture.

Language technology and domain retrieval: a global perspective

Trond Trosterud

With earlier assessments (Trosterud 2004, 2012) as a starting point, the talk will investigate to what extent language technology solutions are available for languages around the world, what technologies are in use, and what possibilities there are for porting them to new languages. My earlier presentations focused on initial prerequisites such as localisation; this talk will look more at grammatical models and their use.

Literature:
Trosterud, Trond 2004: Language technology for endangered languages: Sámi as a case study. Invited talk at the conference Dialogue of Cultures, celebrating the 75th birthday of Vigdís Finnbogadóttir, April 15th 2004.

Trosterud, Trond 2012: A restricted freedom of choice: Linguistic diversity in the digital landscape. Nordlyd 39 (2), pp. 89-104.