UEF eREPOSITORY

The Nordic Tweet Stream: A dynamic real-time monitor corpus of big and rich language data

Files
Article (446.1Kb)
Self archived version
published version
Date
2018
Author(s)
Laitinen, Mikko
Lundberg, Jonas
Levin, Magnus
Martins, Rafael

Self-archived article

Citation
Laitinen, Mikko; Lundberg, Jonas; Levin, Magnus; Martins, Rafael. (2018). The Nordic Tweet Stream: A dynamic real-time monitor corpus of big and rich language data. 3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018; Helsinki; Finland; 7 March 2018 through 9 March 2018, 2084, 349-362.
Rights
© Authors
Licensed under
All rights reserved
Abstract

This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project that brings together computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: we rely not only on traditional structured corpus data but also on unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects of creating a dynamic real-time monitor corpus. A case study then illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third. These results can be used to assess how widespread English use is in the Nordic region, and they offer a big data perspective that complements previous small-scale studies. Future objectives include annotating the material, making it available to the scholarly community, and expanding the geographic scope of the data stream beyond the Nordic region.

Subjects
Twitter; corpus linguistics; language choice; English as a lingua franca
URI
https://erepo.uef.fi/handle/123456789/6697
Link to the original item
http://ceur-ws.org/Vol-2084/short10.pdf
Collections
  • Filosofinen tiedekunta (Philosophical Faculty) [248]
University of Eastern Finland
OpenAccess
eRepo
erepo@uef.fi
OpenUEF
Service provided by
the University of Eastern Finland Library
Library web pages