Looking at Facebook’s Strategy and Possible New Directions

Over the last few months, Facebook has rolled out several significant new features, such as Places and the updated Groups. On Monday, Facebook is holding another event to announce what many expect to be an improved messaging feature. As I’ve watched these changes, I’ve been thinking about where Facebook might be headed.

At first, I started to think Facebook was simply looking to extend its reach by acting as an invisible layer of sorts. Anil Dash once talked about Facebook melting into the larger Web, but perhaps Facebook would end up becoming part of the underlying fabric of the Internet. In past public appearances, Facebook CEO Mark Zuckerberg seemed to be the kind of person who was content to remain in the background, and the company’s strategy seemed to reflect a similar style. I’ve mentioned before the idea of Facebook becoming an identity layer for the Internet, and innovations such as the Graph API have made it easier than ever for sites to integrate with Facebook.
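The Graph API is a concrete example of that integration path: a site can request an object's public representation as plain JSON over HTTP. Here is a minimal Python sketch; the response shown is an illustrative shape rather than captured output.

```python
import json

def graph_url(identifier):
    """URL for an object's public representation in the Graph API."""
    return f"https://graph.facebook.com/{identifier}"

# An illustrative example of the JSON a site gets back for a public profile
# (fetch with urllib.request.urlopen(graph_url(...)) in real use):
canned = '{"id": "1234567890", "name": "Example User", "link": "http://www.facebook.com/example"}'
profile = json.loads(canned)
print(graph_url(profile["id"]), "->", profile["name"])
```

The point is the low barrier: no special client library is required, which is part of why so many sites have been able to wire Facebook identity into their pages.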

But Facebook’s updated Groups feature changed my perspective, since it added functionality that would drive users back to facebook.com. Of course, the upgrade did enable e-mail as a way of interacting with groups. In some ways, Facebook’s overall strategy could be compared to Google’s. Years ago, many sites focused on “stickiness,” trying to keep users hooked. By contrast, Google drove users away by providing relevant links to other sites. But to see Google as non-sticky would be an oversimplification. In fact, the company built a successful ad network that extended its reach across the web. Also, Google has created a number of other products that many people stay logged into, such as Gmail.

And now, people are expecting Facebook to announce a web-based e-mail client that will compete with Gmail. I’m predicting that Facebook will roll out a new messaging system, but it won’t be a Gmail clone or simply another client for managing traditional POP/IMAP e-mail. That’s not to say there won’t be any e-mail gateway, but I think Facebook’s plans will go much further. I’m guessing that at least part of the new system will involve somehow extending private messaging features across Facebook-integrated websites.

In any event, I think Facebook’s announcement will include at least a few surprises for those who have been discussing the possibilities. Facebook has a history of introducing features that aren’t quite what people expected – and often end up leading to practical implementations of ideas that were previously niche experiments. Personally, I think it’s a bit short-sighted to think that Facebook would simply join the market for web-based e-mail without trying to reinvent it, especially given the service’s cautiousness about past features that allowed or potentially allowed spam-like behaviors.

Facebook has also been accused many times of somehow standing in opposition to “openness.” Personally, I think the term has become a buzzword that’s often used without much specificity. And even though I’ve often been a critic of Facebook, I do think many of the accusations aren’t entirely fair. From RSS feeds to developer APIs, Facebook has opened up data in ways that many other sites can’t claim. Today’s Facebook is certainly far more “open” than it was years ago – in fact, I would argue that the site has at times been too open lately, such as when some user data became reclassified as “publicly available” last fall. But regardless of Facebook’s degree of openness, the company has always been careful to maintain a high degree of control over information and features on the site. This can be positive, such as quickly removing malware links, or negative, such as controversial decisions to bar users or certain content.

Either way, that control has helped the site build a powerful database of profiles that generally reflects real people and real relationships. That’s part of what fascinated me about the site’s recent spat with Google over contact information. In the past, a list of e-mail addresses was about the only semi-reliable way to identify a group of people across the Internet. Now, many sites rely on Facebook’s social graph for that function. In terms of identity, the value of e-mail addresses has declined, and I don’t think exporting them from Facebook would provide as much value as Google might think. On the other hand, Google may realize this and be so concerned about the shift that they’re trying to curb Facebook’s influence. This would especially make sense if Google intends to introduce a more comprehensive social networking product that would need e-mail addresses as a starting point. Regardless, I’m sure Google feels threatened by the prospect of Facebook providing a better alternative to traditional e-mail – a change that would only bolster the value of a Facebook profile as the primary way to identify a typical Internet user.

Thoughts on the Wall Street Journal’s Facebook Investigation

A front-page story in last Monday’s Wall Street Journal declared a “privacy breach” of Facebook information based on an investigation conducted by the paper. The Journal found that third-party applications using the Facebook Platform were leaking users’ Facebook IDs to other companies, such as advertising networks.

The report generated controversy across the Web, and some reactions were strongly negative. On TechCrunch, Michael Arrington dismissed the article as alarmist and overblown. Forbes’ Kashmir Hill surveyed other responses, including a conversation on Twitter between Jeff Jarvis and Henry Blodget, and expressed skepticism over the Journal’s tone.

I’ve been a bit surprised by the degree to which some have written off the Journal’s coverage. Some may disagree with the label of “privacy breach,” but I thought the report laid out the issues well and did not paint the problem as a conspiracy on the part of Facebook or application developers. Either way, I’m glad to see that the article has sparked renewed conversation about shortcomings of web applications and databases of information about web users. Also, many may not realize that information leakage on the Facebook Platform has historically been even worse.

Information leakage via a referrer is not a new problem and can certainly affect other websites. But that doesn’t lessen the significance of the behavior observed in the WSJ investigation. Privacy policies are nearly always careful to note that a service does not transfer personally identifiable information to third parties without consent. Online advertising networks often stress the anonymity of their tracking and data collection. The behavior of Facebook applications, even if unintentional, violated the spirit of such statements and the letter of Facebook’s own policies.

Some people downplayed the repercussions of such a scenario on the basis that it did not lead to any “private” profile information being transferred to advertisers – a point Facebook was quick to stress. Yet when did that become the bar for our concept of acceptable online privacy? Should other services stop worrying about anonymizing data or identifying users, since now we should only be concerned about “private” content instead of personally identifiable information? Furthermore, keep in mind that Facebook gets to define what’s considered private information in this situation – and that definition has changed over the last few years. At one time in the not-too-distant past, even a user’s name and picture could be classified as private.

Many reactions have noted that a Facebook user’s name and picture are already considered public information, easily accessed via Facebook’s APIs. Or as a Facebook spokesman put it, “I don’t see from a logic standpoint how information available to anyone in the world with an Internet connection can even be ‘breached.’” But this argument fails to address the real problem with leaked IDs in the referrer. The issue was not simply what data applications were leaking, but when and how that data was leaked. The problem was not that advertisers could theoretically figure out your name given an ID number – it’s that they were given a specific ID number at the moment a user accessed a particular page. Essentially, advertisers and tracking networks were able to act as if they were part of Facebook’s instant personalization program. Ads could have theoretically greeted users by name – the provider could connect a specific visit with a specific person.
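To make the mechanics concrete, here is a rough Python sketch of what a third party receiving such an ad request could do server-side with the Referer header. The parameter names are illustrative, since they varied by application; the point is that the user's ID arrives attached to a specific page view.

```python
from urllib.parse import urlparse, parse_qs

def leaked_user_ids(referer):
    """Pull candidate Facebook user IDs out of a Referer header's query string."""
    qs = parse_qs(urlparse(referer).query)
    ids = []
    # Parameter names varied by application; these are illustrative examples
    for key in ("fb_sig_user", "id", "viewer"):
        ids.extend(v for v in qs.get(key, []) if v.isdigit())
    return ids

# Example: an ad request arriving from a canvas page URL that embeds the viewer's ID
print(leaked_user_ids("http://apps.facebook.com/example-app/play?fb_sig_user=1234567890"))
```

Nothing here requires cooperation from the application developer – the browser sends the Referer header automatically, which is exactly why this class of leak is so easy to overlook.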

Interestingly enough, many past advertisements in Facebook applications did greet users by name. Some ads also included names and pictures of friends. Facebook took steps several times to quell controversies that arose from such tactics, but I’m not sure many people understood the technical details that enabled such ads. Rather than simply leaking a user’s ID, applications were actually passing a value called the session secret to scripts for third-party ad networks.

With a session secret, such networks could (and often did) make requests to the Facebook API for private profile information of both the user and their friends, or even private content, such as photos. Typically, this information was processed client-side and used to dynamically generate advertisements. But no technical limitations prevented ad networks from modifying their code to retrieve the information. In fact, a number of advertisements did send back certain details, such as age or gender.
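For reference, the legacy REST API authenticated calls by concatenating the sorted request parameters, appending a secret, and taking an MD5 hash – so anyone holding a leaked session secret could produce valid signatures for their own requests. A sketch of that signing step (the key and session values below are placeholders, not real credentials):

```python
import hashlib

def sign_legacy_request(params, secret):
    # Legacy REST API signature: md5 of the sorted key=value pairs plus the secret
    base = "".join(f"{k}={params[k]}" for k in sorted(params))
    return hashlib.md5((base + secret).encode()).hexdigest()

params = {
    "method": "users.getInfo",
    "api_key": "APP_KEY",          # placeholder
    "session_key": "SESSION_KEY",  # placeholder
    "uids": "1234567890",
    "fields": "first_name,sex,birthday",
    "v": "1.0",
}
params["sig"] = sign_legacy_request(params, "LEAKED_SESSION_SECRET")
print(params["sig"])
```

With a valid `sig`, the request could be posted to the REST endpoint like any legitimate application call, which is why leaking the session secret was so much worse than leaking a bare user ID.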

Changes to the Facebook Platform, such as the introduction of OAuth earlier this year, have led to the deprecation of session secrets and eliminated this particular problem. I’m not sure how much this sort of information leakage or similar security problems motivated the changes, but problems with session secrets certainly persisted for quite a while before the changes arrived. If the WSJ had conducted their study a year ago, the results could have been even more worrying.

Still, I’m glad that the Journal’s research has led many to look more closely at the issues they raised. First, the story has drawn attention to more general problems with web applications. Remember, the Web was originally designed for accessing static pages of primarily textual information, not the sort of complex programs found in browsers today. (HTML 2.0 didn’t even have a script tag.) Data leaking via referrers or a page’s scripts all having the same scope are problems that go beyond Facebook apps and will likely lead to more difficulties in the future if not addressed.

Second, people are now investigating silos of information collected about website visitors, such as RapLeaf’s extensive database. Several responses to the Journal piece noted that many such collections of data provide far more detail on web users and are worthy of greater attention. I agree that they deserve scrutiny, and now reporters at the Journal seem to be helping in that regard as well.

We’ve entered an age where we can do things never previously possible. Some of these opportunities are exciting and clearly positive, but others could bring unintended consequences. I think the availability and depth of information about people now being gathered and analyzed falls into the latter category. Perhaps we will soon live in a world where hardly any bit of data is truly private, or perhaps we will reach a more open world through increased sharing of content. Either way, I think it is well worth our time to stop and think about the ramifications of technological developments before we simply forge ahead with them.

Over the last few years, I’ve tried to bring attention to some of the issues relating to the information Facebook collects and uses. They’re certainly not the only privacy issues relevant to today’s Internet users, and they may not be the most important. But I think they do matter, and as Facebook grows, their importance may increase. Similarly, I think it’s wrong to dismiss the Journal’s investigation as “complete rubbish,” and I look forward to the rest of the dialogue it has generated.

Two New Social Media Security White Papers Released

My employer (SecureState) has released two white papers as part of our Social Media Security Awareness Month.  You can also download some cool wallpaper for this month created by Rob, our graphic designer (see the picture on the right).  🙂

First is some research several of my colleagues and I worked on.  The paper is titled “Profiling User Passwords on Social Networks”.  It discusses the password problem that we all know and love, as well as how passwords can be guessed from what individuals post on their profiles.  We dive into tools from Robin Wood, Mark Baggett, and others that can be used to pull keywords from profiles and other sources to create wordlists.  These wordlists can then be used for brute force attacks on user accounts.  Next, we look at the password complexity requirements of several popular social networks, with some research around the brute force controls that some of the social networks have implemented, or in some cases haven’t.  Lastly, we discuss some things that users of social networks can do when choosing passwords.  You can download my paper here.
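The wordlist-building idea is straightforward to sketch. The following is a minimal, hypothetical illustration (not the tooling the paper covers) of harvesting candidate words from profile text and ranking them by frequency:

```python
import re
from collections import Counter

def build_wordlist(texts, min_len=4):
    """Rank candidate password words harvested from profile text."""
    words = Counter()
    for text in texts:
        # Keep alphabetic tokens of at least min_len characters
        for w in re.findall(r"[A-Za-z]{%d,}" % min_len, text.lower()):
            words[w] += 1
    return [w for w, _ in words.most_common()]

# Hypothetical snippets of the sort a public profile might expose
profile_posts = [
    "Go Steelers! Best game ever with Rover the dog",
    "Rover ate my Steelers jersey again",
]
print(build_wordlist(profile_posts))
```

Even this toy version surfaces the team name and pet name first – exactly the kinds of personally meaningful words people tend to reuse in passwords, which is why profile-driven wordlists outperform generic dictionaries.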

The other paper released is titled “Security Gaps in Social Media Websites for Children Open Door to Attackers Aiming To Prey On Children” by my colleague Scott White.  In his paper he looks at the security of social media websites specifically designed for children.  It’s very detailed research that sheds light on how predators are using these sites to target children, as well as on some issues unique to these types of social media websites.  You can download Scott’s paper here.

Speaking of social media…I’ll be presenting “Social Impact: Risks and Rewards of Social Media” at the Information Security Summit this Friday at 10am.  I’ll have the slide deck posted shortly after the conference.



Instant Personalization Program Gets New Partner, Security Issue

Facebook announced last week that movie information site Rotten Tomatoes would join Docs.com, Pandora, and Yelp as a partner in the social networking service’s “instant personalization” program. Rotten Tomatoes will now be able to automatically identify and access public information for visitors logged in to Facebook, unless those users have opted out of the program. This marks the first new partner since Facebook launched the feature earlier this year.

Soon after that initial roll-out, security researchers noted vulnerabilities on Yelp’s website that allowed an attacker to craft pages which would hijack Yelp’s credentials and gain the same level of access to user data. TechCrunch writer Jason Kincaid reported on the cross-site scripting (XSS) holes, and made this prediction: “I suspect we’ll see similar exploits on Facebook partner sites in the future.”

Kincaid’s suspicions have now been confirmed, as the latest site with instant personalization also had an exploitable XSS vulnerability, which has now been patched. I’ll quickly add that Flixster, the company behind Rotten Tomatoes, has always been very responsive when I’ve contacted them about security issues. They have assured me that they have done XSS testing and prevention, which is more than could be said for many web developers. In posting about this issue, I primarily want to illustrate a larger point about web security.

When I heard about the expansion of instant personalization, I took a look at Rotten Tomatoes to see if any XSS problems might arise. I found one report of an old hole, but it appeared to be patched. After browsing around for a bit, though, I discovered a way I could insert some text into certain pages. At first it appeared that the site properly escaped any characters which could lead to an exploit. But ironically enough, certain unfiltered characters affected a third-party script used by the site in such a way that one could then execute arbitrary scripts. Since I had not seen this hole documented anywhere, I reported it to Rotten Tomatoes, and they promptly worked to fix it.

I’ve long argued that as more sites integrate with Facebook in more ways, we’ll see this type of problem become more common. Vulnerable applications built on the Facebook Platform provided new avenues for accessing and hijacking user accounts; now external websites that connect to Facebook open more possible security issues. As Kincaid noted in May, “Given how common XSS vulnerabilities are, if Facebook expands the program we can likely expect similar exploits. It’s also worth pointing out that some large sites with many Facebook Connect users – like Farmville.com or CNN – could also be susceptible to similar security problems. In short, the system just isn’t very secure.”

Overcoming such weaknesses is not a trivial matter, though, especially given the current architecture of how scripts are handled in a web page. Currently, any included script has essentially the same level of access and control as any other script on the page, including malicious code injected via an XSS vulnerability. If a site uses instant personalization, injected scripts can access the data used by Facebook’s code to enable social features. That’s not Facebook’s fault, and it would be difficult to avoid in any single sign-on infrastructure.

Of course, all of this applies to scripts intentionally included in the page as well, such as ad networks. With the Rotten Tomatoes roll-out, Facebook made clear that “User data is never transferred to ad networks.” Also, “Partner sites follow clear product/security/privacy guidelines,” and I assume Facebook is monitoring their usage. I’m not disputing any of these claims – Facebook is quite correct that advertisers are not getting user data.

But that’s due to policy limitations, not technical restrictions. Rotten Tomatoes includes a number of scripts from external sources for displaying ads or providing various functions. Any of these scripts could theoretically access a Facebook user’s information, though it would almost certainly be removed in short order. I did find it interesting that an external link-sharing widget on the site builds an array of links on the page, including the link to a user’s Facebook profile. This happens client-side, though, and the data is never actually transferred to another server.

I bring up these aspects simply to note the technical challenges involved in this sort of federated system. I think it’s very possible that we will eventually see ad network code on a Facebook-integrated site that tries to load available user data. After all, I’ve observed that behavior in many Facebook applications over the last few years – even after Facebook issued explicit policies against such hijacking.

These dangers are part of the reason why JavaScript guru Douglas Crockford has declared security to be the number one problem with the World Wide Web today. Crockford has even advocated that we halt HTML5 development and focus on improving security in the browser first. While that won’t likely happen, I think Crockford’s concerns are justified and that many web developers have yet to realize how dangerous cross-site scripting can be. Perhaps these issues with instant personalization sites will help increase awareness and understanding of the threat.

Postscript: This morning, an XSS vulnerability on Twitter led to script-based worms (somewhat reminiscent of “samy is my hero”) and general havoc across the site. This particular incident was not related to any mashups, but once again emphasizes the real-world security ramifications of cross-site scripting in a world of mainstream web applications.

Update (Sep. 27): Today news broke that Scribd had also become part of Facebook’s Instant Personalization program. I took a look at the site and discovered within minutes that it has a quite trivial XSS vulnerability. This particular issue should have been obvious given even a basic understanding of application security. It also indicates that Facebook is not doing much to evaluate the security of new instant personalization partners. Update 2: Scribd patched the most obvious XSS issue right about the time I updated this post: entering HTML into the search box brought up a page that loaded it unfiltered. Another search issue remained, however: starting with a closing script tag would still affect code later in the results page. After about half an hour, that problem was also patched. I’m glad Scribd moved so quickly to fix these problems, but I still find it disconcerting they were there to start with. I’ve not done any further checking for other XSS issues.
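For readers wondering what the fix looks like: the class of bug in both Scribd issues, echoing search input into a results page unescaped, is prevented by encoding output before rendering. A minimal Python illustration of that one step:

```python
import html

def render_search_header(query):
    # Escape user input before echoing it into the results page, so injected
    # markup is displayed as text instead of executed by the browser
    return f"<h2>Results for {html.escape(query)}</h2>"

print(render_search_header('</script><script>alert(document.cookie)</script>'))
```

A real application needs context-appropriate encoding everywhere untrusted data is emitted (HTML body, attributes, scripts, URLs), but even this single habit would have blocked the obvious search-box issue.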

Hacking Your Location With Facebook Places

I just published a post over on the SecureState blog about how to hack your location using Facebook Places.  The post brings up some interesting questions about how social networks are going to have a problem with fake location check-ins. In the meantime, it’s a way to have fun with your friends…:-)



Facebook Privacy & Security Guide Updated to v2.3

Just a quick post that I have updated the Facebook Privacy & Security Guide to include information on configuring the privacy settings for Facebook Places.  You can find this on the first page under “Sharing on Facebook”.  Stay tuned for more information on Facebook Places in the next day or so!

Download the updated Facebook Privacy & Security Guide here (pdf download).

Facebook Places Brings Simple Location Sharing to the Masses

Yesterday, Facebook announced a much-anticipated feature that allows users to easily post their current location on the site. The new setup, known as Facebook Places, works much like other location-based services, such as Foursquare or Gowalla, by letting users “check in” at nearby places. Geolocation providers, such as a mobile phone’s GPS, pinpoint the user, and Localeze provides the initial database of places. Eventually, users will be able to add their own locations to the Facebook map. Inside Facebook has a run-down of the overall functionality.

Facebook also allows your friends to check you in at locations, and these check-ins are indistinguishable from ones you made for yourself. In typical opt-out fashion, you can disable these check-ins via your privacy settings, and you’ll be asked about allowing them the first time a friend checks you in somewhere.

Even if you stop friends from checking you in to places, however, they can still tag you with their check-ins, similar to how friends can tag you in photos or status updates. Such tags will appear on your wall, as tagged status updates do now. You’ll be able to remove tags after the fact, but it doesn’t seem that you’ll be able to prevent friends from tagging you altogether.

Applications have two new permissions related to places: one gives access to your check-ins, and the other gives access to your friends’ check-ins as well. Both will appear in the list of requested permissions when you authorize an application, and they are required for API access to check-ins. If your friends grant an application access to friends’ check-ins, you can prevent yours from appearing via the “Applications and Websites” privacy controls.

API access is currently read-only – authorized applications can access your check-ins, but can’t submit check-ins to Facebook. That sort of functionality is currently in closed testing, though.
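For illustration, reading a user's check-ins through the Graph API returns JSON along these lines. The payload below is a trimmed, hypothetical example based on the API's usual response conventions, not captured output:

```python
import json

# Illustrative shape of a check-ins response (hypothetical data, not captured output)
sample = json.loads("""{
  "data": [
    {"from": {"name": "Example User", "id": "1234567890"},
     "place": {"name": "Example Coffee Shop", "location": {"city": "Cleveland"}},
     "created_time": "2010-08-20T18:00:00+0000"}
  ]
}""")

for checkin in sample["data"]:
    print(checkin["from"]["name"], "checked in at", checkin["place"]["name"])
```

Note how much an authorized application learns from a single record: who, where, and exactly when – which is why the new permissions and the friends-visibility setting matter.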

ReadWriteWeb has a nice guide to applicable privacy settings. When these controls first appeared on my profile, Facebook set the visibility for all my check-ins to “Friends Only” by default and disabled API access to my check-ins via friends by default. But they also enabled by default another setting which makes individual check-ins visible to anyone nearby at the time, whether friends or not. The option for letting friends check me in was not specifically set, but apparently I would have been prompted the first time a friend checked me in.

According to Facebook, you will only be able to check in at locations near where you are, as determined by the geolocation feature of your browser (or your phone’s GPS for the iPhone app). I’m a bit skeptical about how difficult faking a check-in will be, but I don’t yet have the ability to test that out.

Facebook’s initial geolocation rollout brings a fairly modest feature set, but when integrated with Facebook Pages and made available to a network of 500 million people, the service offers great potential. As with other recent changes, adding check-ins reduces friction for users to share their location and provides Facebook with another valuable set of data about people’s daily activities. It remains to be seen whether users will react with discomfort over the potential for an entirely new meaning of “Facebook stalking” or with excitement over potential new product offerings. Either way, the amount and variety of information under Facebook’s control continues to expand rapidly.

Security Through Obscurity and Privacy in Practice

Yesterday, security researcher Ron Bowes published a 2.8GB database of information collected from public Facebook pages. These pages list all users whose privacy settings enable a public search listing for their profile. Bowes wrote a program to scan through the listings and save the first name, last name, and profile URI of each user (though only if their last name began with a Latin character). The database includes this data for about 171 million profiles.
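Mechanically, a scraper like this needs little more than fetching listing pages and pattern-matching the profile links on them. A heavily simplified sketch; the HTML structure shown is hypothetical, standing in for whatever the public listing pages actually contain:

```python
import re

# Illustrative fragment of a public search-listing page (hypothetical structure)
listing_html = """
<a href="http://www.facebook.com/people/Jane-Doe/1234567890">Jane Doe</a>
<a href="http://www.facebook.com/people/John-Roe/2345678901">John Roe</a>
"""

def extract_profiles(page):
    """Return (name, profile URI) pairs found in a listing page."""
    pattern = r'href="(http://www\.facebook\.com/people/[^"]+)">([^<]+)</a>'
    return [(name, uri) for uri, name in re.findall(pattern, page)]

for name, uri in extract_profiles(listing_html):
    print(name, uri)
```

Scaling that loop over every listing page is mostly a matter of bandwidth and patience, which is why "nothing technical prevented" this was always the operative phrase.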

On the one hand, I wasn’t entirely surprised by this news – it was only a matter of time before someone started building up such a dataset. I’ve previously mentioned that developer Pete Warden had planned on releasing public profile information for 210 million Facebook users until the company’s legal team stepped in. But nothing technical prevented someone else from attempting the task and posting data without notice. I imagine Facebook may not be too happy with Bowes’ data, but I’m not going to delve into the legal issues surrounding page scraping.

However, the event did remind me of a related issue I’ve pondered over the last few months: the notion of “security through obscurity” as it relates to privacy issues.

I’ve often referenced the work of danah boyd, a social media researcher whom I highly respect. In a talk earlier this year at WWW2010 entitled “Privacy and Publicity in the Context of Big Data,” she outlines several excellent considerations on handling massive collections of data about people. One in particular that’s worth remembering in the context of public Facebook information: “Just because data is accessible doesn’t mean that using it is ethical.” Michael Zimmer at the University of Wisconsin-Milwaukee has made similar arguments, noting that mass harvesting of Facebook data goes against the expectations of users who maintain a public profile for discovery by friends, among other issues. Knowing some of the historical issues with academic research involving human subjects, I tend to agree with these positions.

But a related point from boyd’s talk concerns me from a security perspective: “Security Through Obscurity Is a Reasonable Strategy.” As an example, she notes that people talking in public settings may still discuss personal matters, but they rely on being one conversation among hundreds to maintain privacy. If people knew other people were specifically listening to their conversation, they would adjust the topic accordingly.

In this “offline” example, taking advantage of obscurity makes sense. But boyd applies the same idea online: “You may think that they shouldn’t rely on being obscure, but asking everyone to be paranoid about everyone else in the world is a very very very unhealthy thing…. You may be able to stare at everyone who walks by but you don’t. And in doing so, you allow people to maintain obscurity. What makes the Internet so different? Why is it OK to demand the social right to stare at everyone just because you can?”

I would respond that at least three aspects make the Internet different. First, you rarely have any way of knowing if someone is “staring at you” online. Public content on Facebook gets transferred to search engines, application developers, and individual web surfers every day without any notification to the creators of that content. Proxies and anonymizers can spoof or remove information that might otherwise help identify the source of a request. And as computing power increases each day, tracking down publicly accessible resources becomes ever easier.

Second, the nature of online data means that recording, parsing, and redistributing it tends to be far simpler than in the offline world. If I want to record someone’s in-person conversations, it’s theoretically possible that I could acquire a small recording device, place it in a convenient location, save the audio from it, type up a transcript of the person’s words, then send it to another person to read. But if I want to record someone’s conversations on Twitter (as an example), I can have all of them in a format understandable to various computer-based analysis tools in just a few clicks. In fact, I could set up an automated system which monitors the person’s Twitter account and updates me whenever certain words of interest appear. Add the fact that this is true of any public Twitter account, and the capabilities for online monitoring grow enormously.
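That kind of monitor takes only a few lines. A sketch in Python, with `fetch_public_tweets` left as a placeholder for whatever feed access is available (an account's public feed, a search API, and so on):

```python
import time

WATCH = {"password", "vacation", "address"}  # hypothetical words of interest

def matches(tweets, watch=WATCH):
    """Return tweets containing any watched keyword."""
    return [t for t in tweets if watch & set(t.lower().split())]

def monitor(fetch_public_tweets, interval=300):
    # Poll the feed and alert once per new matching tweet;
    # fetch_public_tweets is a placeholder callable returning a list of strings
    seen = set()
    while True:
        for tweet in matches(fetch_public_tweets()):
            if tweet not in seen:
                seen.add(tweet)
                print("ALERT:", tweet)
        time.sleep(interval)

print(matches(["Leaving for vacation tomorrow!", "Nice weather today"]))
```

The asymmetry is the point: the person being watched expends no effort publishing, and the watcher expends almost none listening – a dynamic with no real offline equivalent.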

Finally, while digital content is in some ways more ephemeral than other media, web data tends to persist well beyond a creator’s ability to control. Search engine caches, archival sites, and user redistribution all contribute to keeping content alive. If someone records a spoken conversation on a tape, the tape can be destroyed before copies are made. But if you (or a friend of yours) post a sentence or photo on a social networking site, you may never be able to erase it fully from the Internet. Several celebrities have learned this the hard way lately.

From a privacy perspective, I wholeheartedly agree with boyd that we can’t expect users to become paranoid sysadmins. The final point of my own guide to Facebook privacy admonished, “You Have to Live Your Life.” But from a security perspective, I know that there will always be people and automated systems which are “staring at you” on the Internet. I’ve seen time and again that if data is placed where others can access it online, someone will access it – perhaps even unintentionally (Google indexes many pages that were obviously not meant for public consumption).

In my opinion, the only way to offer any setup online which resembles the sort of “private in public” context boyd described requires some sort of a walled garden, such as limiting your Facebook profile to logged in users. That alone still doesn’t provide the same degree of privacy, since many fake profiles exist and applications may still have access to your data. But while “security through obscurity” (or perhaps more accurately, privacy through obscurity) may be a decent strategy in many “offline” social situations, it simply can’t be relied on to protect users and data online.

Facebook users are starting to discover this firsthand. I’ve seen several reactions to Bowes’ release that characterize it as a security issue or privacy issue, and people have seemed quite surprised that building such a dataset was even possible. Yet it really shouldn’t come as a surprise to someone familiar with current technology and ways of accessing Facebook data. And it won’t be the last time we see someone make use of “public” data in surprising ways. Some of these uses may be unfortunate or unethical (see above), but we’ve often seen technology steam ahead in pursuit of fortune, and the web has many users with differing ideas on ethics. Reversing the effects of such actions may prove impossible, which is why I would argue we need to prevent them by not trusting obscurity for protection. And how do we balance this perspective to avoid unhealthy paranoia? I’m honestly not sure – but if content is publicly accessible online without any technical limitations, we can hardly consider it immune to publicizing.
