Thursday, March 6, 2014

Sochi #Olympics and #Crimea Tweets in R (and Justin Bieber?!)

Hello Readers,

With the 2014 Winter Olympics in Sochi over, we can reflect on the social media coverage of the Games. What were Twitter users tweeting about with #Olympics during the Games? Read on to see how Justin Bieber pops up (I was confused and surprised too).

In other news across the pond, if you have not been stuck in a cave recently, you might have heard about Russian forces 'occupying' Crimea, the peninsula in southern Ukraine. Here is a series of informative satellite images of Crimea and of military force dispositions around Ukraine. We will see what Twitter users are saying about #Crimea.

The twitteR package in R has many useful features, which we will use to query tweets from the newly modified Twitter API. Be sure to double-check your consumer key and consumer secret, though they should not have changed. Also verify that the URLs used in the OAuth setup begin with https://, not just http://.

Let us see what Twitter users have been tweeting about the Olympics and Crimea. Onward with text analysis!

Querying Tweets

In previous posts I covered retrieving tweets from the Twitter API and transforming them into documents for analysis, so here I will simply show the R code, beginning with establishing a handshake with the Twitter API (use https:// given the recent changes).

Setting Up OAuth
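The handshake code itself is not reproduced in the text, so here is a minimal sketch of what a 2014-era setup could look like. It assumes the ROAuth package's OAuthFactory and the older twitteR function registerTwitterOAuth(); newer twitteR releases replace this flow with setup_twitter_oauth(). The helper names all_https() and twitter_handshake() are hypothetical, and the handshake function is only defined, not called, because it needs your real keys and an interactive PIN step.

```r
# Minimal sketch of a 2014-era Twitter OAuth setup (assumed, not the
# original post's code). Requires ROAuth and an older twitteR release;
# newer twitteR versions use setup_twitter_oauth() instead.

# Twitter's OAuth endpoints: note https://, not http://
reqURL    <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL   <- "https://api.twitter.com/oauth/authorize"

# Hypothetical helper: TRUE only if every URL starts with https://
all_https <- function(urls) all(startsWith(urls, "https://"))

# Hypothetical helper: build the credential, run the interactive PIN
# handshake, and register it with twitteR. Defined but not called here,
# because it needs real keys and network access.
twitter_handshake <- function(consumer_key, consumer_secret) {
  stopifnot(all_https(c(reqURL, accessURL, authURL)))
  cred <- ROAuth::OAuthFactory$new(consumerKey = consumer_key,
                                   consumerSecret = consumer_secret,
                                   requestURL = reqURL,
                                   accessURL = accessURL,
                                   authURL = authURL)
  cred$handshake()  # asks you to authorize in a browser and paste back a PIN
  twitteR::registerTwitterOAuth(cred)
  invisible(cred)
}
```

Keeping the https check inside the handshake means a stray http:// endpoint fails loudly before any request is sent.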

On some of my queries, the API returned anywhere from 25 to 199 tweets, even with n set to 300. One workaround was to query several times and join the resulting tweets together. I had no such problem with the Olympics tweets, but those were queried weeks ago, during the Games.

Give Me More Tweets!
For the Olympics tweets the code would simply be:

library(twitteR)
# assumes the OAuth handshake has already been registered
olympics <- searchTwitter("#Olympics", n = 300)
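For #Crimea, where a single call sometimes came back short, the query-several-times-and-join workaround can be sketched as below. This assumes each call is converted with twitteR's twListToDF(), whose output includes an id column that is used here to drop tweets appearing in more than one batch. collect_tweets() and its arguments are hypothetical names, not the code from the original post.

```r
# Sketch of the "query several times and join" workaround (hypothetical
# helper, not the original code). `search_fn` is any function returning a
# data.frame with an `id` column, e.g. a wrapper around
# twListToDF(searchTwitter(...)).
collect_tweets <- function(search_fn, n_target = 300, max_tries = 5) {
  stopifnot(max_tries >= 1)
  tweets <- NULL
  for (i in seq_len(max_tries)) {
    # append this batch, then drop tweets already seen in earlier batches
    tweets <- rbind(tweets, search_fn())
    tweets <- tweets[!duplicated(tweets$id), , drop = FALSE]
    # stop as soon as enough unique tweets have been collected
    if (nrow(tweets) >= n_target) break
  }
  rownames(tweets) <- NULL
  tweets
}
```

For example: crimea.df <- collect_tweets(function() twListToDF(searchTwitter("#Crimea", n = 300)), n_target = 300). Passing the search in as a function keeps the merging logic separate from the network call, so it can be checked without an API connection.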

Text Transformations

Next we convert the lists to data.frames, and then to a text corpus. Once they are in a text corpus, we can transform the text so we can query the words effectively. The crimea.c transformations are similar to those for olympics.c, so they are omitted.

Raw Lists to Transformed Text Corpus
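The list-to-data.frame step uses twitteR's twListToDF(), which turns the list of status objects into a data.frame with a text column. The exact transformations from the original code are not shown, so the following is a base-R sketch of typical tweet cleanup under my own assumptions: lowercase everything, strip links, then strip punctuation and digits. Stripping punctuation turns @jeremybieber into jeremybieber, which is one way a Twitter handle can end up as a plain word. clean_tweets() is a hypothetical helper.

```r
# Base-R sketch of tweet cleanup (assumed steps; the original used tm).
clean_tweets <- function(text) {
  text <- tolower(text)
  text <- gsub("https?://\\S+", " ", text)  # remove links first
  text <- gsub("[[:punct:]]+", " ", text)   # '#' and '@' go too, so
                                            # '@jeremybieber' -> 'jeremybieber'
  text <- gsub("[[:digit:]]+", " ", text)   # remove numbers
  text <- gsub("\\s+", " ", text)           # collapse runs of whitespace
  trimws(text)
}
```

For example, olympics.df <- twListToDF(olympics) followed by clean_tweets(olympics.df$text). Links are removed before punctuation on purpose: otherwise http://t.co/abc would break into fragments such as http, t, co, and abc. Base R has no built-in stopword list, so tm's removeWords() with stopwords("english") is the usual way to drop common words.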
Now we are ready to normalize the text so that we can count word frequencies and turn the corpora into term document matrices.

Word Stemming and Term Doc Matrix
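A minimal sketch of the stemming and term document matrix step, assuming the tm and SnowballC packages are installed (tm's stemDocument() relies on SnowballC). Removing English stopwords here is my assumption rather than something stated in the post, and build_tdm() is a hypothetical wrapper.

```r
# Sketch: stem the cleaned tweets and build a term-document matrix with tm.
# Assumes tm and SnowballC are installed; stopword removal is an assumed step.
build_tdm <- function(text) {
  corpus <- tm::Corpus(tm::VectorSource(text))
  # drop common English words such as "the" and "is"
  corpus <- tm::tm_map(corpus, tm::removeWords, tm::stopwords("english"))
  # reduce inflected forms to a shared stem, e.g. "votes"/"voting" -> "vote"
  corpus <- tm::tm_map(corpus, tm::stemDocument)
  # rows = stemmed terms, columns = tweets
  tm::TermDocumentMatrix(corpus)
}
```

For instance, crimea.tdm <- build_tdm(crimea.text), where crimea.text is a hypothetical character vector of cleaned #Crimea tweets. By default TermDocumentMatrix() also lowercases terms and drops words shorter than three characters.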

From the term document matrix, we can calculate the frequency of terms across all tweets. For example, from crimea.tdm we can print the words occurring more than 20 times:

High Frequency Words in Crimea Tweets
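The rule behind this kind of listing is simple: sum each term's row of the term document matrix and keep the terms whose total reaches a threshold. tm's findFreqTerms(crimea.tdm, lowfreq = 20) returns just the matching terms; the hypothetical freq_terms() below sketches the same idea while also keeping the counts, sorted from most to least frequent. It works on a tm matrix or on a plain numeric matrix.

```r
# Sum each term's counts across all tweets and keep terms whose total is
# at least `lowfreq`, most frequent first (hypothetical helper).
freq_terms <- function(tdm, lowfreq) {
  freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
  freq[freq >= lowfreq]
}
```

Calling freq_terms(crimea.tdm, 20) would list the same terms as findFreqTerms() with lowfreq = 20, together with their totals.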

The next step would be to visualize these frequencies in a word cloud. #Olympics is up first.


In a previous post, we created a word cloud visualizing words from @nbastats. Here we shall do the same for the trend #Olympics.

Creating a Word Cloud
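Building on those term counts, a word cloud sketch using the wordcloud and RColorBrewer packages might look like this. The minimum frequency, the Dark2 palette, and the plot_word_cloud() name are my assumptions, not settings taken from the original code.

```r
# Sketch of a word cloud from a term-document matrix (assumed settings).
plot_word_cloud <- function(tdm, min_freq = 5) {
  # total count of each term across all tweets, most frequent first
  freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
  wordcloud::wordcloud(words = names(freq), freq = freq,
                       min.freq = min_freq,
                       random.order = FALSE,  # plot in decreasing frequency
                       colors = RColorBrewer::brewer.pal(8, "Dark2"))
  invisible(freq)
}
```

With random.order = FALSE the words are placed in decreasing order of frequency, so the most common terms land near the center of the cloud.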

And here it is!

#Olympics Word Cloud

We see olympics, gold, sochi, and other strange terms such as jeremybieber (what!?). Who is Jeremy Bieber? An athlete? I had no idea, so I searched Google, and he is definitely not an athlete. Apparently he is the father of 'notorious' pop star Justin Bieber. Some celebrity drama was unfolding during the Olympic Games, and people flocked to Twitter to comment. Still, it was weird (for me anyway) that music celebrities would show up in tweets tagged with #Olympics.

Upon more digging with Google, I found possible reasons: ice hockey and Canada. The Biebers are originally from Canada, and many Justin Bieber 'haters' want him to go back there. With the American and Canadian hockey teams facing off in the Olympic semi-finals, this billboard popped up in Chicago:

Loser Keeps Bieber
So there was the cause of all the hullabaloo on Twitter. Unfortunately for the 'haters', Justin stays, because the USA men's hockey team lost to Canada 1-0 in the semi-finals. I could not make this up.


For Crimea (Bieber is not Ukrainian too, is he?), the word cloud code is the same, except that it uses crimea.tdm as the term document matrix. And here it is:

#Crimea Word Cloud

We see many terms associated with Crimea, such as ukraine, russia, putin, referendum, and kiev, as well as some Ukrainian spellings: kyiv is the Ukrainian transliteration of Kiev, the capital. The majority of people in Crimea are ethnic Russians, while many in western Ukraine are ethnic Ukrainians who want to join the EU. Crimea previously held a referendum on joining Russia in 2004. Today, March 6th, 2014, the local lawmakers in Crimea voted unanimously to join Russia and scheduled a regional referendum on the question in 10 days. For a video on the history of Crimea/Ukraine/Russia, click here.

I thought the terms in #Crimea were more logical and politically relevant than terms in #Olympics, although it was amusing to see Justin and his dad mentioned.


Hopefully this post shows you how Twitter keywords or trends can be analyzed and visualized, especially around current events. Tweets are near real-time text data on what people are thinking about, and they are easy to analyze using R. Stay tuned for more R posts!

As always, thanks for reading,

