Early last year, when the Cambridge Analytica scandal first broke, Facebook announced a full review of every app that could have been granted the same data access as Cambridge Analytica, in order to plug the loopholes and address the security flaws that had enabled CA to essentially weaponize Facebook's personal data insights for political gain.
This week, Facebook has provided an update on the state of this investigation, which again underlines the scope of the issue.
As per Facebook:
"Our App Developer Investigation is by no means finished. But there is meaningful progress to report so far. To date, this investigation has addressed millions of apps. Of those, tens of thousands have been suspended for a variety of reasons while we continue to investigate."
'Tens of thousands' is a lot, but that's also not surprising, given the way in which Facebook's developer system had been structured before The Social Network implemented security changes back in 2014.
As a refresher, Cambridge Analytica, under the guise of academic research, gained access to Facebook's massive database of user insights, which it then used to build a psychological profiling system for voters in the US and UK (and potentially other nations). With this, CA was then able to formulate political advertising tailored to each individual's psychological leanings, playing on their fears and stoking division, in order to influence the outcomes of elections.
The case revealed a massive flaw in Facebook's developer ecosystem, which had enabled many app developers to access the personal information of Facebook users. In fact, some apps that never even sought these insights were able to access them - The Atlantic reported on an app called 'Cow Clicker', in which users clicked on cows to gain 'points', whose developer suddenly found that he was able to mine Facebook data, even though he had no desire to do so:
"If you played Cow Clicker, even just once, I got enough of your personal data that, for years, I could have assembled a reasonably sophisticated profile of your interests and behavior. I might still be able to; all the data is still there, stored on my private server, where Cow Clicker is still running, allowing players to keep clicking where a cow once stood, before my caprice raptured them into the digital void."
The capacity to access user data in this way was not necessarily sought for malicious purposes - it was an oversight on Facebook's part, which then led to the Cambridge Analytica issue. Of course, no platform has ever had access to the depth of personal insight that Facebook now has - with 2.4 billion users on its main app, many sharing thousands of data points every day, Facebook has found itself at the epicenter of the 'big data' shift. But even Facebook hadn't considered what that could mean, or how it could be misused. Until Cambridge Analytica did it.
But that also points to another key issue - Facebook can take retrospective action, banning all the developers and/or tools it suspects of potential data mining; it could even ban every tool approved before its rule changes and start all over again. But even if Facebook went that far, it wouldn't fix the problem. It's already too late to stop potential misuse of the insights that have already been extracted.
Back in 2015, I interviewed Dr. Michal Kosinski, who was part of a research team that had conducted a very similar experiment to the one reportedly exploited by Cambridge Analytica. In their study, Kosinski and his team took the results of a hundred-question psychological survey, completed by more than 86,000 participants through a Facebook-linked app, and mapped the responses against each user's Facebook Likes, which were also accessible through the app's permissions.
Based on these correlating data points, the research team was able to work out a range of psychological traits from commonalities in the Likes, with their results proving more predictive than the assessments of the subjects' family, friends, and even partners.
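The general shape of this kind of analysis can be sketched in a few lines. The following is a toy, hypothetical like-weighting model in the spirit of such studies - not the Kosinski team's actual method - and every page name and data point below is invented:

```python
# Minimal sketch: estimate a per-like weight from a labelled sample,
# then score unseen users by summing the weights of their likes.
# All data here is invented for illustration.
from collections import defaultdict
from math import log

# Toy training sample: each user's set of page likes plus a binary trait
# (e.g. whether they scored above the median on some questionnaire scale).
training = [
    ({"surfing", "festivals", "stand_up_comedy"}, 1),
    ({"festivals", "team_sports"},                1),
    ({"chess", "poetry", "documentaries"},        0),
    ({"chess", "documentaries"},                  0),
]

def fit_like_weights(sample, smoothing=1.0):
    """Return a smoothed log-odds weight per like: positive means the
    like co-occurs more often with trait=1 than with trait=0."""
    pos, neg = defaultdict(float), defaultdict(float)
    n_pos = sum(label for _, label in sample)
    n_neg = len(sample) - n_pos
    for likes, label in sample:
        for like in likes:
            (pos if label else neg)[like] += 1.0
    return {
        like: log((pos[like] + smoothing) / (n_pos + 2 * smoothing))
            - log((neg[like] + smoothing) / (n_neg + 2 * smoothing))
        for like in set(pos) | set(neg)
    }

def score(weights, likes):
    """Sum the weights of a user's likes; > 0 leans toward trait=1."""
    return sum(weights.get(like, 0.0) for like in likes)

weights = fit_like_weights(training)
print(score(weights, {"festivals", "surfing"}) > 0)  # True: leans trait=1
print(score(weights, {"chess", "poetry"}) > 0)       # False: leans trait=0
```

With only a handful of likes per user, enough such weighted signals stacked together is what pushes predictive accuracy past that of friends and family.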
Using these insights, the team was able to establish baselines for a range of complex queries – here’s an example from Kosinski:
"One of our most surprising findings was that we could even predict whether your parents were divorced or not, based on your Facebook likes. Actually, when I saw those results, I started doubting my methods and I re-ran the analyses a few more times. I couldn't believe that what you like on Facebook could be affected by your parents' divorce, which could have happened many years earlier - we're talking here about people who might be 30 or 40 years old."
The insights gleaned from this research are still valid, and are just as accurate as they were back then. In this respect, it's not individual data that matters, but data at scale - once you've built a model of this type, it continues to work. You could match the Likes and engagements of someone who joined Facebook yesterday against this model, and it would still reveal their psychological leanings, based on data from 2014.
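That persistence is worth spelling out: once per-like weights have been exported, they're just data - a file that can be reloaded and applied to anyone, years later. A trivial sketch, with invented weights and likes:

```python
# Sketch: an exported like-weight model keeps working on brand-new users.
# All weights and page names here are invented for illustration.
import json

# Hypothetical per-like weights extracted years ago.
weights_2014 = {"festivals": 1.1, "surfing": 0.7, "chess": -1.1, "poetry": -0.7}

# Once exported, the model is just a blob - e.g. JSON on a private server.
blob = json.dumps(weights_2014)

# Years later, score someone who joined yesterday: sum their likes' weights.
weights = json.loads(blob)
new_user_likes = {"festivals", "surfing"}
leaning = sum(weights.get(like, 0.0) for like in new_user_likes)
print(leaning > 0)  # True: the old model still yields a prediction
```

Nothing Facebook does to its own servers changes what's in that blob.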
People's psychological traits don't change, so the predictive capacity of these insights is still accurate, and will be for a long time. Individual elements may shift, but the point is that once the data has been ripped from Facebook's servers, it's gone, and it can be used for nefarious purposes.
There's nothing Facebook can do about that. Certainly, cleaning up its systems and doing all it can to limit potential abusers is a necessary step, and shows that it's working to fix past problems. But does it matter?
The critical step lies in preventing the same issues moving forward - stopping more third-party groups from gaining such data insights - which Facebook did several years ago when it changed its rules to plug the data drain.
But then again, Facebook itself still has the full data set. Facebook - a single, commercial organization - can still create incredibly detailed, accurate models of who you are, what you like, and what will influence your vote.
And then there are the many other data sources - credit card companies know where you go, what you buy and what you spend, while supermarket loyalty schemes know your purchase habits and your travel patterns (in combination with fuel rewards). Just recently, former Trump adviser Steve Bannon explained how his team had used location data obtained from carriers to target political ads at people who had recently been to Roman Catholic churches in certain regions.
“If your phone’s ever been in a Catholic church, it’s amazing, they got this data. Literally, they can tell who’s been in a Catholic church and how frequently.”
There are many other ways to gather personal data. Facebook, through its sheer scale, remains a key concern, but the idea that this investigation will stop the use of personal data for such manipulation and targeting is massively flawed.
You can read more about Facebook's ongoing app investigation here.