Data is everywhere in the modern world. Almost every interaction touches a digital communications medium in some way, which means it can be tracked, saved and logged for analysis and reference. It's estimated that more than 90% of the world's data has been generated in the last two years, with an increasing share of that coming from social networks. There's so much data that no one can fully grasp what it might yield in terms of insights and analytics, or what we might be able to ascertain from such masses of information.
Reports have shown that social media data - Twitter data, specifically - can predict things like flu outbreaks, earthquakes, even stock market fluctuations. But can Twitter data be used to predict the outcome of an election? A range of academic studies have been conducted on this very topic, and while most suggest that Twitter can be used as a solid indicator of election outcomes, there are some provisos built in around the accuracy of voter swings and the relevance of sentiment.
Interestingly, we may have a ready-made test case for this, with Twitter's Canadian blog this week publishing a listing of stats ahead of the nation's upcoming federal election. The findings show that Justin Trudeau is clearly in the lead in terms of both mentions and follower growth - but will that mean Trudeau will ultimately win the election? It's an interesting one to watch - here are the findings thus far.
On the Campaign Trail
From the Twitter Canada blog post:
"With 6,000,000+ election-related Tweets sent over the past two-and-a-half months, Canadians have flocked to Twitter to discuss key issues, follow candidate debates and share opinions as the #elxn42 campaign took shape across the nation. By comparison, there were just over 4,000,000 #cdnpoli tweets during the 12 months of 2014."
That's a lot of data to work with - given this, there must be some way to glean comparative insights from tweet mentions, right? The conversation around the Canadian election has ramped up over time, with peaks at important junctures in the campaign, as highlighted in this mentions graph.
So we know when people have been more engaged, and that they have, in fact, been significantly engaged in the election via tweet - but how does that relate to the specific candidates? And what can that show us, in terms of predicting the final outcome?
As noted, Twitter data currently indicates that Justin Trudeau holds a commanding lead in mentions, with a 36% share of voice.
And Trudeau has also gained the most Twitter followers during the campaign with 94,000+ new additions between August 2 and October 14, 2015.
That would suggest that Trudeau is the one to watch - but, of course, these graphs don't measure sentiment, which is an important element of any such analysis. The first graph, looking at share of voice, could be meaningless on its own, as more mentions don't necessarily mean more popularity. But the second indicator, follower growth, suggests that Trudeau is gaining support, which contextualizes those mentions a little more. Even without sentiment, could these two charts, in conjunction, be indicative of the likely outcome on October 19th?
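For anyone wanting to reproduce these two indicators from raw numbers, here's a minimal Python sketch. Share of voice is simply each candidate's mentions as a percentage of all candidate mentions. The mention counts and the `candidate_a`-style names below are hypothetical placeholders, not Twitter Canada's figures; only the follower number and the date range come from the post quoted above.

```python
from datetime import date


def share_of_voice(mentions):
    """Return each candidate's percentage of total mentions."""
    total = sum(mentions.values())
    if total == 0:
        raise ValueError("no mentions to compare")
    return {name: 100 * count / total for name, count in mentions.items()}


# Hypothetical mention counts, for illustration only (not real data).
mentions = {
    "candidate_a": 360_000,
    "candidate_b": 330_000,
    "candidate_c": 310_000,
}
shares = share_of_voice(mentions)

# Follower growth per day, using the figures quoted above:
# 94,000+ new followers between August 2 and October 14, 2015.
days = (date(2015, 10, 14) - date(2015, 8, 2)).days
new_followers = 94_000  # a floor, since the post says "94,000+"
followers_per_day = new_followers / days

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% share of voice")
print(f"{followers_per_day:.0f} new followers per day over {days} days")
```

With these placeholder counts, `candidate_a` comes out at a 36% share of voice, and the quoted follower figures work out to at least roughly 1,288 new followers per day across the 73-day window. Neither number says anything about sentiment, which is the gap the next section looks at.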
Sentiment and Context
In an article published in The Atlantic in 2012, Alexander Furnas argued that you can't use Twitter to predict elections because, among other reasons, Twitter users are a non-representative sample of the population, and political tweeting is a niche activity even among them, which biases the results.
"It may be possible to model public opinion with Twitter, but it would require a much more sophisticated understanding than we now have about who tweets about politics and why, how their tweets relate to their offline actions, and how they differ from the general voting population. Without high-level mechanisms for accounting for these systemic biases, proper weighting and careful, context sensitive interpretation and analysis election prediction with Twitter is a case of "garbage in, garbage out" data work. "
While that article was published a few years ago, its argument is largely indicative of the skepticism around the accuracy of Twitter data and, by extension, its usefulness as a means of predicting polling outcomes. Countering this, in a 2011 study titled "On Using Twitter to Monitor Political Sentiment and Predict Election Results", researchers from Dublin City University concluded that Twitter data could be used as an accurate indicator, with tweet volume cited as the key measurable.
"...we observe that volume is the single biggest predictive variable followed by inter-party sentiment. Given sufficient data, intra-party sentiment appears to be less valuable as a predictive measure. Our speculation is that the relative success of the inter-party sentiment is due to the closed nature of the system."
That research indicated that volume was a more accurate indicator than sentiment because volume better represents the relative popularity among the population, while sentiment can be reactive and influenced by responses to a given news story or event.
Researchers from Germany came to similar conclusions in 2010:
"The mere number of tweets reflects voter preferences and comes close to traditional election polls, while the sentiment of Twitter messages closely corresponds to political programs, candidate profiles, and evidence from the media coverage of the campaign trail."
In this sense, the combination of share of voice and follower growth could well be indicative of the pending result in the Canadian election. Is that how it will play out? While it's only one example, it's an interesting case study, and one worth considering as we await the results of the Canadian election, which is being held next week.
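Once the results are in, the volume-based approach described in the studies above can be checked with a few lines of Python. This is a rough sketch under stated assumptions, not the researchers' actual method: it treats each party's share of tweet volume as its predicted vote share, then uses mean absolute error (an assumed choice of metric) to measure how close that prediction came. The party names, tweet counts and vote shares are all hypothetical placeholders.

```python
def to_shares(counts):
    """Convert raw counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tweets to compare")
    return {name: 100 * count / total for name, count in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, across shared parties."""
    parties = predicted.keys() & actual.keys()
    if not parties:
        raise ValueError("no parties in common")
    return sum(abs(predicted[p] - actual[p]) for p in parties) / len(parties)


# Hypothetical numbers, for illustration only (not real data).
tweet_counts = {"party_a": 4_000, "party_b": 3_500, "party_c": 2_500}
vote_share = {"party_a": 38.0, "party_b": 33.0, "party_c": 29.0}

volume_share = to_shares(tweet_counts)
error = mean_absolute_error(volume_share, vote_share)
predicted_winner = max(volume_share, key=volume_share.get)

print(f"Predicted winner by tweet volume: {predicted_winner}")
print(f"Mean absolute error: {error:.2f} percentage points")
```

A small error would support the volume argument, while a large one would lend weight to the "garbage in, garbage out" view. Either way, a single election is one data point, not proof.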