Valence Analytics: tweets

Thursday, March 6, 2014

Sochi #Olympics and #Crimea Tweets in R (and Justin Bieber?!)

Hello Readers,

With the 2014 Olympics in Sochi over, we can reflect upon the social media coverage in the Olympic Games. What were Twitter users tweeting about that involved #Olympics during the games? And read on to see how Justin Bieber pops up (I was confused/surprised too).

In other news across the pond, if you have not been stuck in a cave recently, you might have heard about Russian forces 'occupying' the southern Ukrainian island of Crimea. And here is a series of informative satellite images of Crimea and military force disposition around Ukraine. We will see what Twitter users are talking about concerning #Crimea.

The twitteR package in R has many interesting features which we will use to query tweets from the newly modified Twitter API. So be sure to double check your unique consumer and secret keys, though they should not have changed. Also verify that the URLs used in the OAuth generation includes https://, and not just http://.

Let us see what Twitter users have been tweeting about the Olympics and Crimea. Onward with text analysis!

Querying Tweets

In previous posts I covered retrieving tweets from the Twitter API, and transforming them into documents to be analyzed. So here I will simply show the R code, beginning with establishing a handshake with the Twitter API (use https:// with recent changes).

Setting Up OAuth

During some of the times I queried tweets I found that the number of tweets returned from the API varies from 25 to 199, even if n was set at 300. A way around was to query multiple times and join the resulting tweets together. I had no problem with the Olympics tweets however, but they were queried weeks ago during the games.

Give Me More Tweets!

For the Olympics tweets the code would simply be:

olympics <- searchTwitter("#Olympics", n=300)

Text Transformations

Next we would convert the lists to data.frames, and then to a text corpus. After they are in a text corpus, we can transform the text so we can query the words effectively. The crimea.c transformations similar to the olympics.c so they are omitted.

Raw Lists to Transformed Text Corpus

Now we are ready to normalize the text so that we can count their frequencies, and turn them into term document matrices.

Word Stemming and Term Doc Matrix

From the term document matrix, we can calculate the frequency of terms across all tweets. For example, from the crimea.tdm tweets, we can print the words occurring more than 20 times:

High Frequency Words in Crimea Tweets

The next step would be to visualize these frequencies in a word cloud. #Olympics is up first.

#Olympics

In a previous post, we created a word cloud visualizing words from @nbastats. Here we shall do the same for the trend #Olympics.

Creating a Word Cloud

And here it is!

#Olympics Word Cloud

We see olympics, gold, sochi, and other strange terms such as jeremybieber (what!?). Who is Jeremy Bieber? An athlete? I had no idea, so I Google searched him, and he is definitely not an athlete. Apparently he is the father of 'notorious' pop star Justin Bieber. Some celebrity drama was unfolding during the Olympic Games and people flew to Twitter to comment. But it was weird (for me anyways) that music celebrities would be included in the same tweet tagged with #Olympics.

Upon more digging with Google, I found possible reasons: ice hockey and Canada. So the Beibers are originally from Canada and many Justin Bieber 'haters' want him to go back to Canada. With the American and Canadian hockey teams facing off at the Olympic semi-finals, this billboard popped up in Chicago:

Loser Keeps Bieber

So there was the cause for all the hullabaloo on Twitter. Unfortunately, so the 'haters', Justin stays because the USA men's hockey team lost to Canada's team 1-0 in the semi-finals. I could not make this up.

#Crimea

For Crimea (Bieber is not Ukrainian too, is he?), the word cloud code is the same, except change the *.tdm to crimea.tdm. And here is is:

#Crimea Word Cloud

We see many terms which associate with Crimea, such as ukraine, russia, putin, referendum, kiev, and etc., and also some Ukrainian terms as well- kyiv (Kiev, the Ukrainian capital). The majority of people in Crimea are Russian, while many in Western Ukraine are Ukrainian, wanting to join the EU. Crimea previously held a referendum for joining Russia in 2004 and would held another one today March 6th, 2014. The local lawmakers in Crimea voted unanimously in a referendum to join Russia, and would hold a regional vote in 10 days. For a video on the history of Crimea/Ukraine/Russia, click here.

I thought the terms in #Crimea were more logical and politically relevant than terms in #Olympics, although it was amusing to see Justin and his dad mentioned.

___________________________

Hopefully this post shows you how Twitter keywords or trends can be analyzed and visualized, especially when current events are concerned. It is near real-time text data of what people thinking about, and it is easy to analyze the tweets using R. Stay tuned for more R posts!

As always, thanks for reading,

Wayne
@beyondvalence

Monday, January 27, 2014

Querying Twitter API using Python: Part 2, Tweets

Hello Readers,

Here we will continue where we left off from querying US and World Twitter trends in Python. After we obtained the trends what else can we do? Using current international and US trends, we can create sets and perform intersections of the two trending sources to see which are in common. Then we will search for tweets based on a common popular trend!

Just a quick note, remember to recheck your Twitter OAuth tokens to ensure that they still are the same. Update them if they have changed, or else your previous script will not work presently. Also, I am running the python code as script from the command prompt, which is why the results have a black background.

Let us get started.

Twitter Trends as Sets

From the last post I demonstrated how to obtain the OAuth keys from Twitter by creating an app. Using the keys, we were able to access Twitter API using the twitter module. Afterwards we queried world and US trending topics and for this post, I queried the topics again to obtain recent trends. Here are some recent (as of the writing of this post) US trends shown in json format:

US Trends- json format

Yes, the '#Grammys' were on, and 'Smokey the Bear' was a trending item too, apparently. Next we will use the world and US trends and determine if there are any similar items by using the intersection of sets.

Finding Similar Trends

Below we have the results for the US trends as well as the intersection of the US and world trends. Note that the similar trending topics were: '#BeyonceAtGrammys', '#TheBachelorWedding', and 'Bey and Jay'.

US Trends and Popular Trends

Searching Tweets

Now that we have trending topics, we can search for tweets with a specific trending topic, say '#TheBachelorWedding'. We use the twitter_api.search.tweets() to query for tweets. However, we will take this one step further and query for more results with a for loop. Inside the search_results metadata there is a heading for 'next_results', which we will keep querying 5 times and add them to the statuses (tweets). Additionally, we will using '+= ' to modify statuses in place with new search_results.

One notion we must be aware is cursoring, as Twitter does not use pagination in the search results. Instead, we are given an integer signifying our location among the search results broken into pages. There is both a next and previous cursor given as we move along in the search results. The default cursor value is -1, and when we hit 0, there are no more results pages left.

Here are the results for the first status-tweet. It is quite extensive for a 140 character tweet, with 5kb of metadata. The tweet was:

"Catherine and Sean are perf. Bye. $\square$ $\square$ #TheBachelorWedding"

Tweet Data!

It is so lengthy that I had to truncate the results to capture it! I circled the tweeter's handle: @ImSouthernClass.

More Tweet Data!

Fantastic, we were able to query tweets based on the popular trends we obtained from Twitter. Next we will analyze the metadata behind multiple tweets and trends! So stay tuned!

Thanks for reading,

Wayne
@beyondvalence

Pages