Scraping Twitter for abusive tweets and Twitter Abused, a Video Narrative

1) Scraping Twitter for abusive tweets



Following an interesting CAST London lab session on using Python and advanced Twitter API search techniques to scrape Twitter for tweets from a specific period and on a specific subject, I decided to apply these methods in a basic Python scrape related to my main research question: “On the Twitter platform, who abuses who, and how do they respond?” As a case study, I analysed the most recent football (soccer) controversy, in which England captain John Terry has been accused of racially abusing footballer Anton Ferdinand during a match.

To answer this question, I aimed to analyse a small sample of the abusive comments that have appeared on Twitter directed at both footballers involved in the John Terry and Anton Ferdinand controversy.

I was inspired by the visualisation of tweets on the Toris Eye website, and by their animation of the Firefox Tweet Machine for the Mozilla 2010 Summit, and wanted to use a similar method to visualise my results.


Using some Python scripting techniques combined with some advanced Twitter Developer API commands learnt from the CAST Lab’s Python and Twitter scraping session, I ran the following script to scrape abusive tweets aimed at John Terry and his involvement in the row over the alleged racial abuse of Queens Park Rangers footballer Anton Ferdinand:


import tweepy

# create our Twitter access object
api = tweepy.API()

# search for matching tweets (assumes tweepy's api.search call)
timeline = api.search('"john terry" + "ferdinand" + "racist" + "since:2011-11-02"')

# iterate through each of the tweets and print its contents
for result in timeline:
    print result.text
    print result.iso_language_code

Running this Python scrape gave me a list of about 50 tweets that made negative comments about the John Terry vs Anton Ferdinand incident and police investigation. I then used a combination of the Twitter Developer API’s ‘GET search’ methods, Python, and a small script using jQuery and JSON to run a scrape covering the terms “q=johnterry+racist since:2011-11-02” and pull the results into an interactive web animation. I displayed them on a web page on the Goldsmiths server, using a combination of CSS and HTML5 to make individual abusive tweets appear when a user hovers their mouse over each bird flying across a map of London (where the incident took place).
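For illustration, search terms like the ones quoted above can be turned into a URL-encoded query string as follows. This is only a minimal sketch in Python 3 using the standard library: the function name `build_search_query` is hypothetical, and no endpoint URL is included because none is stated here.

```python
from urllib.parse import urlencode


def build_search_query(terms, since):
    """Return a URL-encoded query string for the given terms and start date."""
    # Join the keywords with spaces and append the "since:" operator
    q = " ".join(terms) + " since:" + since
    # urlencode turns spaces into "+" and escapes the colon as %3A
    return urlencode({"q": q})


print(build_search_query(["johnterry", "racist"], "2011-11-02"))
```

The resulting string would be appended to whichever search endpoint is in use; encoding the colon is the main difference from the hand-written form of the query.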

Results and Discussion

See my visualisation here:

My search terms were deliberately quite restricted: I wanted to find out how many people were tweeting negatively about John Terry, so I used the deliberately negative term “racist” in my search query. I considered including re-tweets in my search, but found that this brought up a list of re-tweeted news reports from the general media. That would be useful only if I wanted to analyse the sentiment of media representation of the John Terry incident, not if I wanted to find out what the wider population were tweeting. For this reason, I decided to leave re-tweets out of my search.
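Leaving out re-tweets can also be done after the scrape, on the list of tweet texts. The sketch below (Python 3) assumes that a re-tweet can be recognised by the conventional "RT @" prefix, which is a heuristic rather than a guarantee; the helper names and the two sample tweets are made up for illustration.

```python
def is_retweet(text):
    """Heuristic: treat tweets that begin with 'RT @' as re-tweets."""
    return text.lstrip().upper().startswith("RT @")


def drop_retweets(texts):
    """Return only the tweets that do not look like re-tweets."""
    return [t for t in texts if not is_retweet(t)]


# Invented sample data, not real scrape results
sample = [
    "RT @newsdesk: Police to investigate Terry allegations",
    "Not sure what to make of the John Terry story",
]
print(drop_retweets(sample))
```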

As a result, I found on a general reading that quite a lot of tweets were abusive and made derisive jokes linking John Terry to other racially charged incidents reported in the press over the same period. I also found that some of the tweets were actually quite neutral: questioning whether John Terry really was racist, wondering what the outcome of the police investigation would be, and commenting on the way the story has been handled in the press. In terms of who was doing the abusive (or negative) tweeting, the general results showed that the majority of tweets were from people expressing their views on the whole incident.

My interactive visualisation of these results gives a somewhat broad snapshot of the general sentiment of people’s tweets about the incident. However, the mixed sentiments highlighted above made me want a more quantitative snapshot of Twitter sentiment around John Terry at this point in time. To do this, I went back to Python and used a very basic sentiment analysis script to measure the mood regarding John Terry on Twitter:


import urllib2
import simplejson

# URL of the sentiment analysis service
url = ""
sentence = "'John Terry'"

# build the request body
query = "text=%s" % sentence

request = urllib2.Request(url, query, {'Content-Type': 'application/json'})
response = urllib2.urlopen(request)
# read the JSON response and parse it
body = response.read()
sentiment = simplejson.loads(body)
print "Positive: %.2f%%" % sentiment['probability']['pos']
print " Neutral: %.2f%%" % sentiment['probability']['neutral']
print "Negative: %.2f%%" % sentiment['probability']['neg']

The raw Python results showed that Twitter sentiment regarding John Terry (as of 1st December 2011) was:

Positive: 0.46%
Neutral: 0.78%
Negative: 0.54%

This result suggests that, quantitatively, the sentiment of tweets matching the simple term “John Terry” is more neutral than negative. Note that the script prints the returned probability values directly, so the figures are on a 0 to 1 scale despite the percent sign.
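To express these figures as true percentages, each probability needs multiplying by 100. A minimal sketch in Python 3, assuming the values lie between 0 and 1 and using the three figures reported above as input; `format_sentiment` is a hypothetical helper name:

```python
def format_sentiment(probability):
    """Format 0-1 probabilities as percentage strings, one per label."""
    labels = [("Positive", "pos"), ("Neutral", "neutral"), ("Negative", "neg")]
    return [
        "%s: %.0f%%" % (name, probability[key] * 100)
        for name, key in labels
    ]


# The figures reported above, as a dictionary keyed like the script's output
reported = {"pos": 0.46, "neutral": 0.78, "neg": 0.54}
for line in format_sentiment(reported):
    print(line)
```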


Visualising my initial Python scrape seems to show a more general, negative mood towards John Terry in relation to his alleged racial abuse of Anton Ferdinand. The majority of tweets shown when hovering over the birds in my animation reveal that it is mainly real people tweeting their views on the controversy (as mentioned above, I tended to pick up more news reports from the press when I included re-tweets in my Twitter scrape).

However, the more quantitative sentiment analysis gives a different picture from the general impression of negative sentiment gained from just looking at my visualisation: although there seems to be a high rate of negative sentiment on Twitter (0.54%), there is actually a higher percentage of neutral commentary (0.78%). One reason for this could be that my simple Python sentiment scrape did not separate out the re-tweets of more general news articles and press releases, which may blur the boundaries between real human sentiment and basic reporting. This is something I would like to investigate further at a later date.
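One way to start such an investigation would be to tally sentiment separately for re-tweets and original tweets. The sketch below is in Python 3 and assumes each tweet has already been given a sentiment label by some classifier; the `label` field, the "RT @" heuristic, the function name, and the sample data are all illustrative assumptions rather than part of the scripts above.

```python
from collections import Counter


def tally_by_type(tweets):
    """Count sentiment labels separately for re-tweets and original tweets."""
    counts = {"retweet": Counter(), "original": Counter()}
    for tweet in tweets:
        # Heuristic: re-tweets conventionally start with "RT @"
        kind = "retweet" if tweet["text"].startswith("RT @") else "original"
        counts[kind][tweet["label"]] += 1
    return counts


# Invented sample data, not real scrape results
sample = [
    {"text": "RT @newsdesk: FA statement on the Terry case", "label": "neutral"},
    {"text": "RT @newsdesk: Police investigation continues", "label": "neutral"},
    {"text": "Disgraceful if the allegations are true", "label": "neg"},
]
result = tally_by_type(sample)
print(result["retweet"], result["original"])
```

Comparing the two tallies would show whether neutral labels cluster in the re-tweeted news reports, as suspected.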

2) Twitter Abused, a Video Narrative


In another session in the CAST London Sandbox Lab, I learnt how to edit in Final Cut Pro X, using a mixture of video and audio material.


I decided to use these skills to put together a combination of news clips, music, screengrabs of key articles, and a dramatisation of Twitter abuse. This narrative gives examples of the varied people who abuse others on Twitter, how they respond to abuse, and how, depending on the context, some words can be seen as more abusive than others.

Once the video was uploaded to YouTube, I used the platform’s annotation tools to add notes and speech bubbles to guide the viewer through the key questions addressed in the piece.



In relation to part of my question, “Who abuses who?”, I found in the process of collecting, editing and annotating the video footage that, at least in the way it is reported in the media, there is quite a varied spread of public figures and celebrities who have been abused via Twitter (e.g. Deputy Prime Minister Nick Clegg or TV chef Lorraine Pascale), or who have committed abuse via Twitter, whether deliberately or inadvertently (e.g. ex-MP Stuart MacLennan or Ricky Gervais), since the launch of the platform.

The Twitter platform also opens up public figures and celebrities to a more immediate and critical analysis of what they have said on their personal Twitter accounts. A good example from the video is the case of Ricky Gervais and his controversial use of the word ‘Mong’: in the space of a few tweets, he had managed to inadvertently insult a range of people who had children with, or other associations with, Down’s Syndrome. It also provoked an interesting debate over which words are deemed abusive, and whether the context and specific use of the words in comedic terms could be construed as offensive; suddenly parents, representatives of charities for disabled people, and more were pulled into the debate. As for “how people respond” in this particular case, it seemed somewhat ironic that the final statement from Gervais came on Twitter, where the whole controversy began, admitting that he may have been naive about the use of the term.

The question of “how people respond” was addressed again at a more formal level: I found articles showing that the UK police responded more proactively to reports of Twitter abuse from celebrities such as TV chef Lorraine Pascale, but less enthusiastically when the person reporting the abuse did not have as high a media profile (e.g. Nabila Ramdani).

The final dramatisation, although giving an element of light relief, also shows how potentially dangerous sending abusive messages via Twitter can be, especially if the person you are abusing has access to you in real life.
