A front-page story in last Monday’s Wall Street Journal declared a “privacy breach” of Facebook information based on an investigation conducted by the paper. The Journal found that third-party applications using the Facebook Platform were leaking users’ Facebook IDs to other companies, such as advertising networks.
The report generated controversy across the Web, and some reactions were strongly negative. On TechCrunch, Michael Arrington dismissed the article as alarmist and overblown. Forbes’ Kashmir Hill surveyed other responses, including a conversation on Twitter between Jeff Jarvis and Henry Blodget, and expressed skepticism over the Journal’s tone.
I’ve been a bit surprised by the degree to which some have written off the Journal’s coverage. Some may disagree with the label of “privacy breach,” but I thought the report laid out the issues well and did not paint the problem as a conspiracy on the part of Facebook or application developers. Either way, I’m glad to see that the article has sparked renewed conversation about shortcomings of web applications and databases of information about web users. Also, many may not realize that information leakage on the Facebook Platform has historically been even worse.
Information leakage via a referrer is not a new problem and can certainly affect other websites. But that doesn’t lessen the significance of the behavior observed in the WSJ investigation. Privacy policies are nearly always careful to note that a service does not transfer personally identifiable information to third parties without consent. Online advertising networks often stress the anonymity of their tracking and data collection. The behavior of Facebook applications, even if unintentional, violated the spirit of such statements and the letter of Facebook’s own policies.
Some people downplayed the repercussions of such a scenario on the basis that it did not lead to any “private” profile information being transferred to advertisers – a point Facebook was quick to stress. Yet when did that become the bar for our concept of acceptable online privacy? Should other services stop worrying about anonymizing data or identifying users, since now we should only be concerned about “private” content instead of personally identifiable information? Furthermore, keep in mind that Facebook gets to define what’s considered private information in this situation – and that definition has changed over the last few years. At one time in the not-too-distant past, even a user’s name and picture could be classified as private.
Many reactions have noted that a Facebook user’s name and picture are already considered public information, easily accessed via Facebook’s APIs. Or as a Facebook spokesmen put it, “I don’t see from a logic standpoint how information available to anyone in the world with an Internet connection can even be ‘breached.’” But this argument fails to address the real problem with leaked IDs in the referrer. The issue was not simply what data applications were leaking, but when and how that data was leaked. The problem was not that advertisers could theoretically figure out your name given an ID number – it’s that they were given a specific ID number at the moment a user accessed a particular page. Essentially, advertisers and tracking networks were able to act as if they were part of Facebook’s instant personalization program. Ads could have theoretically greeted users by name – the provider could connect a specific visit with a specific person.
Interestingly enough, many past advertisements in Facebook applications did greet users by name. Some ads also including names and pictures of friends. Facebook took steps several times to quell controversies that arose from such tactics, but I’m not sure many people understood the technical details that enabled such ads. Rather than simply leak a user’s ID, applications were actually passing a value called the session secret to scripts for third-party ad networks.
With a session secret, such networks could (and often did) make requests to the Facebook API for private profile information of both the user and their friends, or even private content, such as photos. Typically, this information was processed client-side and used to dynamically generate advertisements. But no technical limitations prevented ad networks from modifying their code to retrieve the information. In fact, a number of advertisements did send back certain details, such as age or gender.
Change to the Facebook Platform, such as the introduction of OAuth earlier this year, have led to the deprecation of session secrets and removed this particular problem. I’m not sure how much this sort of information leakage or similar security problems motivated the changes, but problems with session secrets certainly persisted quite a while prior to them. If the WSJ had conducted their study a year ago, the results could have been even more worrying.
Still, I’m glad that the Journal’s research has led many to look more closely at the issues they raised. First, the story has drawn attention to more general problems with web applications. Remember, the Web was originally designed for accessing static pages of primarily textual information, not the sort of complex programs found in browsers today. (HTML 2.0 didn’t even have a script tag.) Data leaking via referrers or a page’s scripts all having the same scope are problems that go beyond Facebook apps and will likely lead to more difficulties in the future if not addressed.
Second, people are now investigating silos of information collected about website visitors, such as RapLeaf’s extensive database. Several responses to the Journal piece noted that many such collections of data provide far more detail on web users and are worthy of greater attention. I agree that they deserve scrutiny, and now reporters at the Journal seem to be helping in that regard as well.
We’ve entered an age where we can do things never previously possible. Such opportunities can be exciting and clearly positive, but others could bring unintended consequences. I think the availability and depth of information about people now being gathered and analyzed falls into the latter category. Perhaps we will soon live in a world where hardly any bit of data is truly private, or perhaps we will reach a more open world through increased sharing of content. But I think it well worth our time to stop and think about the ramifications of technological developments before we simply forge ahead with them.
Over the last few years, I’ve tried to bring attention to some of the issues relating to the information Facebook collects and uses. They’re certainly not the only privacy issues relevant to today’s Internet users, and they may not be the most important. But I think they do matter, and as Facebook grows, their importance may increase. Similarly, I think it wrong to dismiss the Journal’s investigation as “complete rubbish,” and I look forward to the rest of the dialogue they’ve now generated.