Generating quantitative observations from unstructured social media data is new. So, surprise, it's a field that doesn't have mature standards yet. Really, we don't even have accepted definitions yet, because mainstream marketers (who are still being told they need to start listening) don't know enough about social media practices to drive standardization. It's not too early for challenges to the validity of the data, however.
Social media analysis is not (usually) survey research
Because it's not widely understood, and the discussion has tended to focus on the benefits of listening, social media analysis is sometimes criticized for not following the standards of other types of research. George Silverman's comparison of online and traditional focus groups is a good example.
Justin Kirby took a different swing at social media measurement, comparing data mining to survey research:
Just look at buzz monitoring practitioners who place great stock in sentiment analysis, but have none of the usual checks and balances (such as standard deviation) that underpin data validity within traditional research. If you can't calculate any margin of error, let alone show that you're listening to a representative sample of a target market, then how can you really prove that your analysis is sorting the wheat from the chaff and contributing valuable actionable data to your client's business?

(Justin has points worth pondering in the rest of his article, so go read it. I did note that marketers became advertisers early in the article, which suggests a partial answer to his complaint.)
Traditional research is based on sampling, where tests to determine the validity of the sample data are critical (and, typically, poorly understood). Most social media analysis vendors, by contrast, use automated methods to find all of the relevant posts and comments on a topic, which then feed into their analytical processes.
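For contrast, the margin of error Justin mentions is a simple calculation when you really are working from a random sample. A minimal sketch (the counts are invented for illustration):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Margin of error for a sample proportion at ~95% confidence (z=1.96)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 600 of 1,000 randomly sampled posts rated positive.
moe = margin_of_error(0.6, 1000)
print(f"60% positive, +/- {moe:.1%}")
```

The formula only means something if the 1,000 posts are a random sample of a defined population, which is exactly the assumption a "collect everything" methodology doesn't make.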
Testing the results
I won't argue against the idea of tests to validate the data, but tests created for surveys and samples aren't necessarily relevant to new techniques. The question is, what's the right test of a "boil the ocean" methodology? Here are some of the challenges, which are different from "is the sample representative?"
- How much of the relevant content did the search collect (assuming the goal is 100%; if not, you're sampling)?
- How accurately did the system eliminate splogs and duplicate content?
- How accurately did the system identify relevant content?
- How accurate is the content scoring?
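The first and third questions are essentially the recall and precision measures from information retrieval, and one way to spot-check a vendor's collection is against a small hand-labeled set. A minimal sketch (the post IDs are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a hand-labeled relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # relevant items the system actually found
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical check: posts the crawl returned vs. posts a human judged relevant.
found = {1, 2, 3, 4, 5, 6}   # collected by the system
truth = {2, 3, 4, 5, 7, 8}   # actually relevant per human review
p, r = precision_recall(found, truth)
print(f"precision {p:.0%}, recall {r:.0%}")
```

Recall answers "how much of the relevant content did we collect?"; precision answers "how much of what we collected is relevant?" Neither is a sampling test, which is the point.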
Ideally, results would be reproducible, competing vendors would get identical results, and clients would be able to compare data between vendors. Theoretically, everyone is starting with the same data and using similar techniques. All that's left is standardizing the definitions of metrics and closing the gap between theory and practice. Easy, right?
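One way a client could probe the "competing vendors get identical results" ideal is to score inter-vendor agreement on the same set of posts, for example with Cohen's kappa (agreement beyond chance). A hedged sketch with invented sentiment labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical: two vendors' sentiment calls on the same ten posts.
vendor_a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "pos", "neu", "pos"]
vendor_b = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neu", "neg"]
kappa = cohens_kappa(vendor_a, vendor_b)
print(f"kappa = {kappa:.2f}")
```

Until definitions are standardized, low agreement wouldn't tell you which vendor is wrong, only that their metrics aren't interchangeable.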