Measuring Twitter's reputation
Twitter has announced - to much fanfare and discussion - that it wants to build a reputation ranking system for its users in order to bring more credibility to its trend and search tools. In fact, Twitter trending has been one of the hottest topics on, of all places, Twitter. Having thought a lot about how to measure reputation and extract trends from social media, we wondered how Twitter planned to do this given its terse, unstructured activity streams.
Issues of users gaming the system aside (which we'll address in a later post), a 140-character limit, no tweet categorization other than voluntary hashtags, and no community feedback (e.g. voting) on tweets don't leave much to work with. To add a trustworthy reputation ranking system to the existing service, Twitter will likely have to introduce more structure and categorization in tweets. This will force users to work harder, and will almost certainly impact the fast, free-form ethos that now characterizes Twitter. Ironically, one of the main things that drove Twitter's phenomenal growth - input simplicity - may be the biggest impediment to making searches and trend analyses of tweets trustworthy.
What exactly do we mean by “more structure and categorization” of user input? Well, what better way to explain it than to look at how Vanno determines the reputation of Twitter the company? To be clear, we're analyzing the activity stream around a company to measure the reputation of a company, but the process is the same for analyzing the activity stream of a user to measure the reputation of the user.
The graph at the top of the post shows the trend in Twitter's overall reputation rank (out of the 6000+ companies we track) for the last 7 months, along with the submission dates of some of the stories that drove the rankings. If you look at story details for Twitter on Vanno, you'll see that a user has to do some real work to submit a story - identify specific companies to which the story refers and then explicitly decide whether the story strengthens or weakens the selected aspects of reputation (chosen from a structured list) for those companies. Then other registered users have to vote on whether they agree or disagree with the submitter's analysis. This structure, coupled with the Bayesian analysis methods we use, allows us to quickly infer a company's reputation from a relatively small sample of stories and votes.
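To make the idea concrete, here is a minimal sketch of how structured submissions and votes could feed a Bayesian estimate. It is not Vanno's actual model: it assumes a simple Beta-Bernoulli setup in which each vote counts as one pseudo-observation and the submitter counts as one vote for their own analysis. The names `Story`, `reputation_posterior`, and `posterior_mean`, and the uniform prior, are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """One structured submission about a company (hypothetical schema)."""
    strengthens: bool  # submitter's call: does the story strengthen reputation?
    agree: int         # votes agreeing with the submitter's analysis
    disagree: int      # votes disagreeing with it


def reputation_posterior(stories, prior_alpha=1.0, prior_beta=1.0):
    """Return (alpha, beta) of a Beta posterior over the probability
    that evidence about the company is positive.

    Assumption: every vote is one pseudo-observation, and the submitter
    counts as one vote in favor of their own analysis.
    """
    alpha, beta = prior_alpha, prior_beta
    for s in stories:
        support = s.agree + 1   # submitter plus agreeing voters
        oppose = s.disagree
        if s.strengthens:
            alpha += support    # support for a positive story is positive evidence
            beta += oppose
        else:
            alpha += oppose     # disagreeing with a negative story is positive evidence
            beta += support
    return alpha, beta


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Two made-up stories, purely for illustration.
stories = [
    Story(strengthens=True, agree=8, disagree=2),
    Story(strengthens=False, agree=3, disagree=1),
]
alpha, beta = reputation_posterior(stories)
score = posterior_mean(alpha, beta)
```

Because the prior already encodes a starting belief, even a handful of stories and votes yields a usable estimate, and the spread of the Beta distribution shrinks as more votes arrive. That is the sense in which structured input lets a small sample go a long way. Ranking companies would then be a matter of sorting by a score like this one, though the real system presumably does something more refined.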
Another benefit of more structured user input is the ability to see exactly how specific aspects of reputation drive the overall rank. In this plot, for example, you can see how social responsibility (which includes things like corporate governance and the avoidance of controversial business) and customer satisfaction drive Twitter's reputation.
The challenge that Twitter now faces in trying to implement a user reputation ranking system highlights a fundamental tension in social media. Low barriers to entry (e.g. simple, unstructured input) help grow a user base quickly. But a large user base generates lots of tempting data to mine and exploit, which surfaces the credibility and reliability issues that ultimately impose more constraints and requirements on users, their input, or both.