Access Facebook Data Without Logging in to Facebook

(N.B.: This is not an April Fool’s joke.)

Programmer Pete Warden made headlines a few months ago after creating a dataset of public profile information from 210 million Facebook users. Warden gathered his data by crawling the public search pages that some users have enabled, and he planned to release it to the public. But Facebook threatened legal action, prompting Warden to destroy the information rather than risk an expensive court battle.

While I’m sympathetic to the privacy implications that led some to criticize Warden’s planned release, I also think that exposing the data would be an effective way of awakening Facebook users to what’s possible with information now classified as public. And while Warden abided by Facebook’s demands, it’s only a matter of time before someone less compliant publishes a similar dataset. Besides, many search engines already have similar resources in their indexes.

I’ve previously demonstrated how much content is actually available for logged-in Facebook users through various techniques. But indexing all of that content would definitely violate Facebook’s terms of use. What about truly public data, though, that’s accessible even to anonymous Facebook visitors and search engines? How much information can be seen without logging in?

To answer that, I’ve created yet another bookmarklet, though this one is far more complex and will likely not yield many results for most users. This trick is more a proof of concept. If you’re trying to access private profile information, this tool will not help you.

The bookmarklet works by adding a bar of links to a public search page for a Facebook user. (Note that not all users allow a public search page to appear for their profile.) These links attempt to load public content for several of Facebook’s standard applications, including the user’s “Boxes” tab. In order to see anything, the user must at minimum (1) set the visibility of the given application to “everyone,” and (2) create content within the application marked as visible to “everyone.” Even then, you may not get any results – I’ve found that the photos application seems to only display a user’s “Profile Pictures” album if it is set to public.
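The two minimum conditions above can be expressed as a simple check. This is only a sketch: the object shapes, field names, and the "everyone" string values are assumptions made for illustration, not anything taken from Facebook's API or from the bookmarklet itself.

```javascript
// Hypothetical data shapes: the field names and values below only mirror
// the two conditions described in the text, not any real Facebook API.
function hasPublicContent(app) {
  // Condition 1: the application itself is visible to everyone.
  if (app.visibility !== "everyone") {
    return false;
  }
  // Condition 2: at least one piece of content is also marked "everyone".
  return app.items.some((item) => item.visibility === "everyone");
}

const apps = [
  { name: "photos", visibility: "everyone", items: [{ visibility: "everyone" }] },
  { name: "notes", visibility: "everyone", items: [{ visibility: "friends" }] },
  { name: "links", visibility: "friends", items: [{ visibility: "everyone" }] },
];

// Only applications passing both checks could show anything to an anonymous visitor.
const publiclyVisible = apps.filter(hasPublicContent).map((app) => app.name);
console.log(publiclyVisible);
```

Passing both checks is necessary but not sufficient; as noted, some applications may still show nothing.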

To see the trick in action, try it on the public search pages for Mark Zuckerberg, Robert Scoble, or Louis Gray.

Feedback and questions are welcome (theharmonyguy@gmail.com or comment below), but please note I publish this bookmarklet as a convenience and will likely not provide detailed technical support.

Update (April 12): A reader pointed out to me that the bookmarklet was not working on public search pages for users who do not have vanity URIs. I’ve now updated the code to work regardless of the URI format.
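For readers curious what handling both URI formats might involve, here is a minimal sketch. The two URL shapes in the comments are assumptions about how public search pages were addressed, and `profileKeyFromUrl` is a hypothetical helper, not the bookmarklet's actual code.

```javascript
// Assumed URL shapes (not taken from the bookmarklet's source):
//   vanity:     http://www.facebook.com/<username>
//   non-vanity: http://www.facebook.com/people/<Name>/<numeric id>
function profileKeyFromUrl(href) {
  const segments = new URL(href).pathname.split("/").filter(Boolean);
  if (segments[0] === "people") {
    // Non-vanity form: the identifier is the third path segment.
    return segments.length >= 3 ? { kind: "id", value: segments[2] } : null;
  }
  if (segments.length === 1) {
    // Vanity form: the single path segment is the username.
    return { kind: "vanity", value: segments[0] };
  }
  return null; // Unrecognized format: let the caller decide what to do.
}

console.log(profileKeyFromUrl("http://www.facebook.com/some.username"));
console.log(profileKeyFromUrl("http://www.facebook.com/people/Jane-Doe/123456789"));
```

Returning `null` for anything unrecognized keeps the caller from building links against a guessed identifier.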

Social Media Security Podcast 12 – New Facebook Privacy Changes, Social Gaming Threats, Social Media in the Workplace

This is the 12th episode of the Social Media Security Podcast recorded March 28, 2010.  This episode was hosted by Tom Eston and Scott Wright.  Below are the show notes, links to articles and news mentioned in the podcast:

Please send any show feedback to feedback [aT] socialmediasecurity.com or comment below.  You can also call our voice mail box at 1-613-693-0997 if you have a question for our Q&A section on the next episode.  You can also subscribe to the podcast in iTunes. Thanks for listening!

Facebook’s Proposed Privacy Changes: What You Need to Know

I won’t put together a long post about the recently proposed Facebook Privacy Policy/Statement of Rights and Responsibilities changes. There is already some very good analysis on the subject. However, below are links to some of the best blog posts and research to check out. Note that the comment period ends on April 3, 2010 at 12am PDT. Make your comments on the Facebook Site Governance document page here.

Links to the proposed changes
Facebook Privacy Policy and Statement of Rights and Responsibilities Updates

Detailed Analysis Worth Reading
Facebook Proposes Broad Updates To Governing Docs — Our Analysis (from Inside Facebook)
How Facebook is Adding an Identity Layer to the Internet (from theharmonyguy)
Yet Again, Facebook Misunderstands Privacy (from MichaelZimmer.org)
Facebook Again to Test Privacy Boundaries (from Fred Stutzman)
Is Facebook Unliking Privacy? (from the ACLU of Northern California)

Also, be sure to check out Social Media Security Podcast Episode 12 which will be released soon!  Scott Wright and I will be talking about these changes with some analysis as well.

Facebook Allowed Automatic Data Sharing Last November

Proposed changes to Facebook’s governing documents would allow the service to automatically share certain data when users visit third-party web sites, a move drawing widespread criticism and concern. However, I took another look at changes Facebook made last year, and from what I read, the sort of behavior people are worried about is already allowed. Facebook’s current privacy policy was last revised December 9, 2009, but all of the sections referenced in this post were added on November 19, 2009.

First, let’s recap what Facebook considers publicly available information:

Certain categories of information such as your name, profile photo, list of friends and pages you are a fan of, gender, geographic region, and networks you belong to are considered publicly available, and therefore do not have privacy settings. You can limit the ability of others to find this information on third party search engines through your search privacy settings.

This also applies to content marked “everyone,” though without the search engine exception:

Information set to “everyone” is publicly available information, may be accessed by everyone on the Internet (including people not logged into Facebook), is subject to indexing by third party search engines, may be associated with you outside of Facebook (such as when you visit other sites on the internet), and may be imported and exported by us and others without privacy limitations.

The policy goes on to discuss how this applies to “Facebook-enhanced” applications and websites, which are previously defined as applications using the Facebook Platform or sites using Facebook Connect (and also notes earlier that “in order to personalize the process of connecting, we may receive a limited amount of information even before you authorize the application or website”). Here’s the relevant section, with my emphasis added:

As mentioned above, we do not own or operate Facebook-enhanced applications or websites. That means that when you visit Facebook-enhanced applications and websites you are making your Facebook information available to someone other than Facebook. To help those applications and sites operate, they receive publicly available information automatically when you visit them, and additional information when you formally authorize or connect your Facebook account with them.

In other words, the current Facebook privacy policy already allows your “publicly available information,” which includes your name, gender, geographic region, friends list, fan pages, and your content marked “everyone,” to be automatically shared with external web sites when you visit them. The only thing apparently preventing this from happening right now is technology – Facebook has not yet rolled out an official means for Facebook Connect sites to automatically access such data. Apparently they soon plan on adding that technology for certain “pre-approved” sites, an update which the newer governing documents make more explicit.

How Facebook is Adding an Identity Layer to the Internet

In what may become the next major privacy controversy for Facebook, the company has announced plans to automatically share certain information when a Facebook user visits certain “pre-approved” sites. In clarifying the feature, a spokesperson told VentureBeat that people should “think about Facebook Connect, but the user gets that experience when they arrive at the site rather than after clicking Connect.”

Given the way Facebook has repeatedly described “publicly available information” (PAI) since last fall’s privacy changes, this update is actually a logical next step for the company. Under a strict interpretation of Facebook’s policies, nothing would prevent a site from making use of such information already. Only technological barriers currently block the information flow – specifically, a site doesn’t automatically know who you are on Facebook when you visit.

At least, so it would seem. Researchers have already outlined ways that sites can infer a visitor’s social networking profile from other tracking mechanisms. In some ways, the new Facebook auto-connect simply builds on cookies and inline frames, the sources of earlier online privacy controversies. Furthermore, several security researchers have demonstrated exploits that led to data leakage. Nitesh Dhanjani demonstrated earlier this year that an authentication issue could give sites automatic access to the PAI of visitors, and just this week I reported to Facebook a vulnerability in their Platform that would allow sites to silently harvest all of a user’s profile information (details pending a patch).

Given the amount of data already flowing to Facebook applications and Facebook Connect sites (as well as their advertisers), the company’s moves towards more and more public sharing, and the history of privacy/security problems on the Facebook Platform, I’ve long argued that Facebook users should treat all of their content on the site as public. But Facebook has worked hard to maintain user trust, even making some content appear to be more private than it actually is. When I first discussed accessing public but hidden photo albums last December, I commented, “Making the albums hard to find gives an illusion of privacy and only delays any rude awakenings that may come from users who have inadvertently shared private photos.”

Now it may seem that Facebook users will finally understand the ramifications of default privacy settings. But the new system will probably be fairly subtle at first. Some users will find it creepy to be greeted on other sites by name, but such information will probably appear in a distinct, Facebook-labeled box (i.e., a Facebook Widget) to let a user know where the content comes from and make it still seem somewhat separate from the rest of the site. On the backend, though, the site will have access to the user’s public data.

What users may not realize is how much data they’re already sharing. This new style of Facebook Connect actually mirrors the behavior of Facebook itself. When you visit a Facebook application for the first time, it automatically knows who you are and can access your public data. (Correction: This only occurs in certain circumstances; more information here.) When you then click “Allow” to authorize the app, you give it access to all of your private data. Currently, an external web site knows nothing about you until you click “Connect.” If you do click, it has the same access to your private data as an authorized application. Now, Facebook is letting sites initially act like new applications by giving them access to your public data prior to full authorization.
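The tiers of access described above can be summarized in a small model. This is a simplification for illustration only; the field names (`kind`, `authorized`, `preApproved`) and the three level names are assumptions, not Facebook Platform terminology.

```javascript
// Simplified model of the access tiers described in the text. The field
// names and return values are illustrative, not Facebook API terms.
function accessLevel(client) {
  if (client.authorized) {
    return "full"; // After "Allow" or "Connect": private data as well.
  }
  if (client.kind === "application") {
    // Per the correction in the text, this holds only in certain circumstances.
    return "public";
  }
  // External web site before the user clicks "Connect".
  return client.preApproved ? "public" : "none";
}

console.log(accessLevel({ kind: "application", authorized: false }));
console.log(accessLevel({ kind: "website", authorized: false, preApproved: false }));
console.log(accessLevel({ kind: "website", authorized: false, preApproved: true }));
console.log(accessLevel({ kind: "website", authorized: true }));
```

Under this model, the proposed change only alters the third case: a pre-approved site moves from "none" to "public" before the user clicks anything.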

In discussing the Facebook Platform, Anil Dash gave this analogy: “Think of the web, of the Internet itself, as water. Proprietary platforms based on the web are ice cubes. They can, for a time, suspend themselves above the web at large. But over time, they only ever melt into the water.” Depending on your perspective, either Facebook is finally melting into the water or the Web turned out to be the ice cube. With an automatic Connect system and the Open Graph API, Facebook is expanding its Platform to the rest of the Web. The only major difference between a Facebook-enabled web site and an actual Facebook application may soon be the URI.

You can start to get a sense of how this expansion may look by reading proposed changes to the service’s governing documents (see Inside Facebook’s excellent analysis):

We may also make information about the location of your computer or access device and your age available to applications and websites in order to help them implement appropriate security measures and control the distribution of age-appropriate content.

Currently, many sites hosting pornographic content will ask visitors to click a link verifying they are at least 18 or 21 before loading the material. With Facebook, the site could simply check your profile information first. Media companies worry about visitors accessing content outside of a given country; perhaps soon they can use your Facebook information to check your location.

Granted, providing fake details on your Facebook profile could easily foil some of these checks, but in many cases, that’s hardly different from lying about your age when you click or using a routing service to mask your location. Also, since you interact with friends on Facebook, you have a greater incentive to keep some information accurate. Facebook also reserves the right to terminate your account if you provide false profile information (despite also suggesting this strategy as a protection against identity theft).

My point is not to suggest that porn sites will soon be on Facebook’s “pre-approved” list or that Hulu would trust your profile over geographic IP data. I simply give these hypothetical scenarios to illustrate a larger trend: for better or for worse, your Facebook profile is becoming a virtual ID card.

Adding an identity layer to the Internet is not a new idea, but this may be the first time a system finds widespread adoption. Yet the Facebook identity model conflicts with many visions of how online identity should operate. “Open Stack” technologies, such as OpenID and OAuth, allow for federated setups. One of the first “Laws of Identity” by Kim Cameron states, “Digital identity systems must only reveal information identifying a user with the user’s consent.” Much of the consent in Facebook’s system comes from accepting the site’s terms at sign-up; many users will likely think that an opt-out Connect model violates Cameron’s principle.

And ultimately, user perception will be key to Facebook finding acceptance of its new endeavor. As social media researcher danah boyd discussed in her SXSW keynote, services with nothing technologically wrong can still disrupt social expectations (e.g. Google Buzz). (I rank the entire talk as must-read material for anyone working in the social networking space, but I’m only focusing on a few points here.) She also made a noteworthy distinction that I think will come up often as Facebook evolves:

Keep in mind that people don’t always make material publicly accessible because they want the world to see it….

Just because something is publicly accessible does not mean that people want it to be publicized. Making something that is public more public is a violation of privacy.

I think this distinction will be severely tested as the availability of Facebook data increases. I don’t dispute boyd’s evaluation, but coming from the perspective of security research, I know that when data becomes publicly available, it’s only a matter of time before it gets publicized in some way. With the wealth of information stored on Facebook’s servers, the site is becoming a favorite of both advertisers and attackers. Already we’ve seen hacks and tricks that make public Facebook data more public (see above), and each new site that integrates with Facebook is a new attack surface.

I’ve been cussed out by visitors to my site who think that by publishing weaknesses in the Facebook Platform or exposing seemingly hidden content I’m assisting those who maliciously hack people’s profiles. But much of what I post attempts to raise awareness of potential privacy and security issues before they get exploited by black hats. I can guarantee you I’m not the only one looking for Facebook weaknesses.

And that’s part of what concerns me about boyd’s distinction. The same technology that makes content “public” makes it easy to aggregate and publicize. For example, Pete Warden recently announced that he had built a dataset of 210 million Facebook profiles that he planned to publish for research purposes. Facebook eventually threatened to sue, prompting him to destroy the data, but no technology stands in the way of someone else recreating the dataset for their own purposes. In fact, with Facebook’s auto-connect system and the possibility of lighter rules for data storage, web sites may soon inadvertently recreate the dataset.

I honestly don’t think that Facebook is evil or that they care nothing about user privacy. Their new identity layer will likely bring benefits to many users and provide sites with valuable features. But just as Facebook became successful through providing users with a more private experience, the Internet became successful in large part because of its anonymity. While many users are happy with their personal Facebook account being a place “where everyone knows your name,” many users also value the rest of the Internet not knowing if they’re a dog. And as danah boyd put it so well, “No matter how many times a privileged straight white male technology executive pronounces the death of privacy, Privacy Is Not Dead.”

Security pros use layered techniques, but so do attackers

For many years security professionals have advocated using layered safeguards to reduce the risk of threats. While many organizations do employ multiple technologies like firewalls, anti-virus and intrusion detection to try to stop hackers, these guys are getting very good at navigating our layers of security. It’s like the old Mario and Donkey Kong video games where you had to jump over land mines, climb ladders, wait for doors to open and avoid swinging obstacles to reach the bonus prizes.

As an example of how many layers they are able to traverse, consider the reported attack on a financial institution’s enterprise network, which started life as a hacked Facebook account. (Click HERE for the full story.)

To make a long story short the attackers did the following:

  1. They captured the Facebook credentials of an individual who worked for a financial institution
  2. They then scanned the user’s Facebook profile to find recent social events involving co-workers on Facebook (finding a company picnic)
  3. They then sent emails to multiple Facebook friends who were co-workers saying, “Hey, have a look at the pictures I took at the company picnic!”
  4. The emails contained links to malicious web pages that attempted to launch a keylogger on the victims’ computers.
  5. They then scanned the keystrokes of an employee whose laptop had become infected with the keylogger and found the authentication credentials for the corporate VPN
  6. They infiltrated the VPN and infected a computer inside the corporate perimeter and performed vulnerability scans around the network to find servers with sensitive information on them.

The attack lasted as long as 2 weeks. If the attackers’ vulnerability scans had not been so “noisy”, they might not have been noticed, and the company could have suffered severe losses in terms of costly data breaches and corrupted databases, as well as system repairs.

So, what will happen now? Will the company add another layer of security to prevent a similar attack in the future? Probably… and these attackers will probably move on to other organizations with a bit less security. The cat and mouse game continues.

What’s interesting in this story is that the initial attack on the employees’ Facebook friends is pretty hard to defend against, since nothing seemed out of the ordinary. There really was a corporate picnic!

What would you do next if you were a security manager at this financial institution?

Social Media Security Podcast 11 – Google Buzz, Geostalking, Twitter’s Phishing Filter

This is the 11th episode of the Social Media Security Podcast recorded March 15, 2010.  Sorry for the delay on releasing this!  We should be back on our biweekly schedule soon.  This episode was hosted by Tom Eston and Scott Wright.  Below are the show notes, links to articles and news mentioned in the podcast:

Please send any show feedback to feedback [aT] socialmediasecurity.com or comment below.  You can also call our voice mail box at 1-613-693-0997 if you have a question for our Q&A section on the next episode.  You can also subscribe to the podcast in iTunes. Thanks for listening!

New Trick to View Hidden Facebook Photos and Tabs

Last December, I posted a bit of JavaScript known as a bookmarklet that allowed you to see photo albums for any Facebook user if the album privacy settings allowed it. This highlighted an example of “security through obscurity,” since the lack of links to photos on most profiles seemed to indicate no photos could be viewed. The trick worked as advertised, though it only displayed a few albums for those who had many.

The code came from my own experiments on accessing the hidden photos. It worked quite manually, retrieving data from a particular Facebook interface and stuffing it into the current page. I figured a more elegant solution could be found by re-using the code already embedded in the page, but I had not been able to sort out all of the built-in functions.

Last night and this morning, I found what I’d been missing before, and I now present a far simpler version that gives full access to all available albums of a given user. Simply bookmark this link (right-click and choose to add a bookmark) and click the bookmark when viewing someone’s profile on Facebook.

Once again, please note that this does not in any way circumvent a user’s privacy settings. If you mark your albums as visible only to your friends, this trick will not override that setting. I do not currently know of a way to access private photo albums, and if I did find one, I would report it to Facebook. My purpose in posting this code is to prove a point, not break into users’ accounts.

Here is the new source code:

javascript:(function(){CSS.removeClass(document.body, 'profile_two_columns');tab_controller.changePage("photos");})()

As I said, much simpler! I only had to find the right commands.

But the story doesn’t end there. This new method can be very easily adapted to load other information from a user’s profile, and the new possibilities raise more privacy ramifications. Once again, the trick does not actually override any settings, but it may break some user expectations and highlight the importance of overlooked or unknown settings.

The new behavior is that one can use similar code to access the canvas pages of applications the user has interacted with, as if the user had added the application as a tab on their profile. This includes the “Boxes” tab for users who have it. From what I understand, visibility of this tab page comes from the “Privacy” box under “Edit Settings” next to each application listed in a user’s Application Settings. Such controls have often been overlooked, particularly because they may not have seemed very relevant in the past. While many users stay aware of the privacy settings on their photos and wall posts, they may not think about the content they generate in the context of applications. Often, that content has few if any privacy controls applied.
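As a rough sketch of what "similar code" could look like, the helper below wraps the two calls from the source code above in a function that takes a tab key. Only "photos" is confirmed by that source; which other keys `tab_controller.changePage` accepts (for applications or the “Boxes” tab) is an assumption, and `makeTabBookmarklet` is a hypothetical name.

```javascript
// Builds a bookmarklet URL that reuses the two calls from the source code
// above. Which tab keys Facebook's tab_controller accepts besides "photos"
// is an assumption; the key passed in is treated as an opaque string.
function makeTabBookmarklet(tabKey) {
  const body =
    "CSS.removeClass(document.body, 'profile_two_columns');" +
    // JSON.stringify quotes and escapes the key safely for inlining.
    "tab_controller.changePage(" + JSON.stringify(tabKey) + ");";
  return "javascript:(function(){" + body + "})()";
}

console.log(makeTabBookmarklet("photos"));
```

Called with "photos", this reproduces the bookmarklet shown earlier character for character; the resulting string would be saved as a bookmark's address.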

Typically, any information available on an application tab is also available through the application itself, but this technique makes it far easier to find. However, it also raises some disturbing possibilities related to application data retention, an issue I’ve noted in the past but not seen discussed much elsewhere. For example, quite a while ago (as in months to years), I used the Pieces of Flair application with my personal Facebook account, arranging various buttons on my virtual corkboard. Eventually I pared down the number of applications I had authorized, and Pieces of Flair was one I uninstalled a number of months ago. Today, however, if you use the sort of bookmarklet posted above to check my Facebook profile for a Pieces of Flair tab page, you’ll see all my virtual buttons once again.

Facebook does notify applications when a user uninstalls them, but it’s up to the developer to actually do something about the data left behind. Apparently Pieces of Flair does nothing with the data, meaning a user has to manually delete their flair before removing the application if they want to truly get rid of the content they generated. Based on my experience, many applications behave in a similar fashion. Some may argue that this behavior is similar to Facebook “deactivating” an account, but at what point should the content expire, and how many applications offer a full deletion? Such issues become matters of retention policies, and based on my past studies of whether applications even had a privacy policy, I would guess that most applications do not currently have such terms.
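To make the developer's side of this concrete, here is a minimal sketch of an uninstall handler that actually purges a user's content. Everything in it is hypothetical: the in-memory store, the shape of the notification, and the `handleUninstall` name are assumptions for illustration, not the Platform's real callback format.

```javascript
// Hypothetical in-memory store keyed by user id; a real application would
// use a database. The shape of the uninstall notification is also assumed.
const contentByUser = new Map([
  ["1001", ["flair-a", "flair-b"]],
  ["1002", ["flair-c"]],
]);

// Called when the platform reports that a user removed the application.
function handleUninstall(store, notification) {
  const removed = store.get(notification.userId) || [];
  store.delete(notification.userId);
  return removed.length; // Number of items purged, useful for logging.
}

const purged = handleUninstall(contentByUser, { userId: "1001" });
console.log(purged, contentByUser.has("1001"));
```

An application with a retention policy might instead mark the content for deletion after a grace period, but the point stands that nothing is removed unless the developer writes a step like this.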

All of this once again highlights the current complexity of data and privacy on the Facebook Platform. Granted, dealing with third-party applications is not a simple problem to solve, and I’m not simply criticizing Facebook for failing to build a perfect system. But these issues can very easily lead to unpleasant surprises for end users, and at some point someone will have to sort them out.
