Security Through Obscurity and Privacy in Practice

Yesterday, security researcher Ron Bowes published a 2.8GB database of information collected from public Facebook pages. These pages list all users whose privacy settings enable a public search listing for their profile. Bowes wrote a program to scan through the listings and save the first name, last name, and profile URI of each user (though only if their last name began with a Latin character). The database includes this data for about 171 million profiles.

On the one hand, I wasn’t entirely surprised by this news – it was only a matter of time before someone started building up such a dataset. I’ve previously mentioned that developer Pete Warden had planned on releasing public profile information for 210 million Facebook users until the company’s legal team stepped in. But nothing technical prevented someone else from attempting the task and posting data without notice. I imagine Facebook may not be too happy with Bowes’ data, but I’m not going to delve into the legal issues surrounding page scraping.

However, the event did remind me of a related issue I’ve pondered over the last few months: the notion of “security through obscurity” as it relates to privacy issues.

I’ve often referenced the work of danah boyd, a social media researcher whom I highly respect. In a talk earlier this year at WWW2010 entitled “Privacy and Publicity in the Context of Big Data,” she outlined several excellent considerations on handling massive collections of data about people. One in particular is worth remembering in the context of public Facebook information: “Just because data is accessible doesn’t mean that using it is ethical.” Michael Zimmer at the University of Wisconsin-Milwaukee has made similar arguments, noting that mass harvesting of Facebook data goes against the expectations of users who maintain a public profile for discovery by friends, among other issues. Knowing some of the historical issues with academic research involving human subjects, I tend to agree with these positions.

But a related point from boyd’s talk concerns me from a security perspective: “Security Through Obscurity Is a Reasonable Strategy.” As an example, she notes that people talking in public settings may still discuss personal matters, but they rely on being one conversation among hundreds to maintain privacy. If people knew other people were specifically listening to their conversation, they would adjust the topic accordingly.

In this “offline” example, taking advantage of obscurity makes sense. But boyd applies the same idea online: “You may think that they shouldn’t rely on being obscure, but asking everyone to be paranoid about everyone else in the world is a very very very unhealthy thing…. You may be able to stare at everyone who walks by but you don’t. And in doing so, you allow people to maintain obscurity. What makes the Internet so different? Why is it OK to demand the social right to stare at everyone just because you can?”

I would respond that at least three aspects make the Internet different. First, you rarely have any way of knowing if someone is “staring at you” online. Public content on Facebook gets transferred to search engines, application developers, and individual web surfers every day without any notification to the creators of that content. Proxies and anonymizers can spoof or remove information that might otherwise help identify the source of a request. And as computing power increases each day, tracking down publicly accessible resources becomes ever easier.

Second, the nature of online data means that recording, parsing, and redistributing it tends to be far simpler than in the offline world. If I want to record someone’s in-person conversations, it’s theoretically possible that I could acquire a small recording device, place it in a convenient location, save the audio from it, type up a transcript of the person’s words, then send it to another person to read. But if I want to record someone’s conversations on Twitter (as an example), I can have all of them in a format understandable to various computer-based analysis tools in just a few clicks. In fact, I could set up an automated system that monitors the person’s Twitter account and updates me whenever certain words of interest appear. Add the fact that this is true of any public Twitter account, and the capabilities for online monitoring grow enormously.
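
To give a rough sense of how little effort such a keyword watch takes, here is a minimal Python sketch. It assumes the public posts have already been collected as plain strings; the function name find_mentions, the watched keywords, and the sample posts are all hypothetical and exist only for illustration. It does not contact Twitter or any other service.

```python
import re
from typing import Iterable, Iterator, Tuple


def find_mentions(posts: Iterable[str], keywords: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (keyword, post) pairs for every post containing a watched keyword."""
    # Compile one case-insensitive, whole-word pattern per keyword.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    for post in posts:
        for kw, pattern in patterns.items():
            if pattern.search(post):
                yield kw, post


# Hypothetical sample data standing in for an account's public posts.
sample_posts = [
    "Heading to the airport, back next week!",
    "Great coffee this morning.",
    "New phone number, message me for it.",
]

alerts = list(find_mentions(sample_posts, ["airport", "phone number"]))
for keyword, post in alerts:
    print(f"[{keyword}] {post}")
```

Nothing here is specific to one person: the same few lines would work on any collection of public posts, which is exactly why this kind of monitoring scales so easily.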

Finally, while digital content is in some ways more ephemeral than other media, web data tends to persist well beyond a creator’s ability to control. Search engine caches, archival sites, and user redistribution all contribute to keeping content alive. If someone records a spoken conversation on a tape, the tape can be destroyed before copies are made. But if you (or a friend of yours) post a sentence or photo on a social networking site, you may never be able to erase it fully from the Internet. Several celebrities have learned this the hard way lately.

From a privacy perspective, I wholeheartedly agree with boyd that we can’t expect users to become paranoid sysadmins. The final point of my own guide to Facebook privacy admonished, “You Have to Live Your Life.” But from a security perspective, I know that there will always be people and automated systems which are “staring at you” on the Internet. I’ve seen time and again that if data is placed where others can access it online, someone will access it – perhaps even unintentionally (Google indexes many pages that were obviously not meant for public consumption).

In my opinion, offering any setup online that resembles the sort of “private in public” context boyd described requires some sort of walled garden, such as limiting your Facebook profile to logged-in users. That alone still doesn’t provide the same degree of privacy, since many fake profiles exist and applications may still have access to your data. But while “security through obscurity” (or perhaps more accurately, privacy through obscurity) may be a decent strategy in many “offline” social situations, it simply can’t be relied on to protect users and data online.

Facebook users are starting to discover this firsthand. I’ve seen several reactions to Bowes’ release that characterize it as a security issue or privacy issue, and people have seemed quite surprised that building such a dataset was even possible. Yet it really shouldn’t come as a surprise to someone familiar with current technology and ways of accessing Facebook data. And it won’t be the last time we see someone make use of “public” data in surprising ways. Some of these uses may be unfortunate or unethical (see above), but we’ve often seen technology steam ahead in pursuit of fortune, and the web has many users with differing ideas on ethics. Reversing the effects of such actions may prove impossible, which is why I would argue we need to prevent them by not trusting obscurity for protection. And how do we balance this perspective to avoid unhealthy paranoia? I’m honestly not sure – but if content is publicly accessible online without any technical limitations, we can hardly consider it immune to publicizing.

Spam via Facebook Events Highlights Ongoing Challenges

Earlier today, I received an invitation to a Facebook event from “Giovanna” – someone I’d never heard of and certainly never added as a friend. The invite came as a bit of a surprise, since my profile was fairly locked down. While anyone could search for it, all profile information was set to “Friends Only,” and sending messages or making friend requests was limited to “Friends of Friends.” None of my friends seem to know Giovanna, and her profile is probably fake anyway.

The event title proclaimed “iPhone Testers Needed!” and might be enticing to users who want an iPhone. While the event page included more information on the supposed testing program, the invite was followed by a message from the event creator. Once you’re on the guest list for a Facebook event, the event administrators can send out Facebook messages that you’ll receive regardless of your privacy settings. This particular message (which also arrived in my e-mail inbox due to notification settings) included a link to the iPhone opportunity, which unsurprisingly was a typical “offer” page that required me to submit personal information and try out some service before I could get my fancy new phone.

I began investigating how this all happened. When you create a Facebook event and try to invite people, you’ll only see a list of your friends to choose from. But it turns out that on the backend, nothing prevents you from submitting requests directly to Facebook with other people’s Facebook IDs. In my testing, I’ve been able to send event invitations to other users even if we’re not friends and they have tight privacy settings. I’m guessing that using this technique to invite more than a few people could raise a spam alert, but I’m not sure. Also, an event invitation does not give the event creator increased access to any profile information of guests, but as already noted, it does let event administrators send messages to people they might otherwise not be able to contact.

I’m sure Facebook will take action soon to clamp down on this particular loophole, so I think it unlikely we’ll see it exploited too widely. (The iPhone testing event currently has around 1,800 guests – significant, but tiny compared to other Facebook scams.) But it does demonstrate the sort of challenges Facebook is having to handle as its network and power expand. Several years ago, when the site was used for little besides keeping in touch with college classmates and other offline friends, Facebook was seen as mostly spam-free, in contrast to services like Myspace. Now that applications, social gaming friends, and corporate brands have all become integral parts of the Facebook experience, black hat marketers keep finding new ways to spread links among users. And worse, those tricks can often be used to spread malware as well.

I do think that Facebook wants to avoid annoying users with spam, and works to prevent your inbox on the site from becoming as flooded as a typical e-mail account. But a network of 500 million people presents a very enticing target, and we’ll keep seeing new scam ideas pop up as Facebook expands and adds features. In the meantime, continue to be wary of any links promising a glamorous reward for free.