UEF eREPOSITORY


Utilizing multilingual language data in (nearly) real time: the case of the Nordic Tweet Stream

Files
Article (433.6Kb)
Self archived version
published version
Date
2017
Author(s)
Laitinen, Mikko
Lundberg, Jonas
Levin, Magnus
Lakaw, Alexander

Self-archived article

Citation
Laitinen, Mikko; Lundberg, Jonas; Levin, Magnus; Lakaw, Alexander (2017). Utilizing multilingual language data in (nearly) real time: the case of the Nordic Tweet Stream. Journal of Universal Computer Science, 23(11), 1038-1056.
Rights
© Journal of Universal Computer Science
Licensed under
All rights reserved
Abstract

This paper presents the Nordic Tweet Stream, a cross-disciplinary digital humanities project that downloads Twitter messages from Denmark, Finland, Iceland, Norway and Sweden. The paper first introduces some of the technical aspects of creating a real-time monitor corpus that grows every day; two case studies then illustrate how the corpus could be used as empirical evidence in studies of the global spread of English. Our approach in the case studies is sociolinguistic: we are interested in how widespread multilingualism involving English is in the region, and in what happens to ongoing grammatical change in digital environments. The results are based on 6.6 million tweets collected during the first four months of data streaming. They show that English was the most frequently used language, accounting for almost a third. This indicates that Nordic Twitter users choose English as a means of reaching wider audiences. The preference for English is strongest in Denmark and weakest in Finland. Tweeting mostly occurs late in the evening, and high-profile media events such as the Eurovision Song Contest produce considerable peaks in Twitter activity. The prevalent use of informal features such as univerbated verb forms (e.g., gotta for (HAVE) got to) supports previous findings on the speech-like nature of written Twitter data, but the results indicate that tweeters are pushing the limits even further.

Subjects
Twitter; corpus linguistics; language choice; oral discourse style
URI
https://erepo.uef.fi/handle/123456789/6322
Link to the original item
http://www.jucs.org/jucs_23_11
Publisher
Know-Center
Collections
  • Philosophical Faculty [248]
University of Eastern Finland
OpenAccess
eRepo
erepo@uef.fi
OpenUEF
Service provided by
the University of Eastern Finland Library
Library web pages
Twitter
Facebook
Youtube
Library blog
 sitemap