Facebook’s Open Graph Still Faces Semantic Web Hurdles

Geek level: Fairly technical. Aimed at web developers and security researchers.

In the wake of last week’s Facebook announcements, people have begun dissecting more of the technical details involved and adding various critiques. One point of discussion has been Facebook’s use of the buzzword “open,” with some observers feeling the description masks certain negative aspects of the new Open Graph.

But amid all the debate about openness, critics and supporters alike seem at times to inadvertently conflate three different (albeit related) technologies. First, the Open Graph Protocol defines a structure for website authors to provide certain bits of metadata (such as title, type, description, location, etc.) about their pages. Second, Facebook is expanding their “social graph” concept by building a database of connections among people, brands, groups, etc. The label “Open Graph” has been variously applied to this new map. Finally, the social networking site has introduced new methods for accessing these stored connections as part of their Graph API.

From a technical perspective, each of these offers great potential. But as they are currently being implemented, they still face difficulties that may hinder Facebook’s vision of the Semantic Web. In fact, while Facebook may have brought certain Semantic Web ideas to a more mainstream audience, they have not addressed some of the issues that have stymied advocates of similar technologies – including criticisms found in Cory Doctorow’s famous “Metacrap” essay from 2001. But first, I think it worthwhile to explore some of the details of Facebook’s three new components.

According to the spec’s website, the Open Graph Protocol is an RDFa vocabulary created by Facebook, though “inspired by” a few other related specs. Four properties are required for every OGP-enabled page, providing a title, type, image, and canonical URI. Optional fields include a description, a site name, location data, certain product codes, and contact information. Since OGP uses RDFa, each of these properties is specified via “meta” tags in the page’s “head” element.
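To make the structure concrete, here is a minimal sketch in Python of how a consumer might read OGP properties out of a page and check for the four required ones. The sample markup, the class name, and the helper functions are my own illustrations rather than anything Facebook publishes, and the parser deliberately ignores everything except “meta” tags whose property starts with “og:”.

```python
from html.parser import HTMLParser

# The four properties the spec requires on every OGP-enabled page.
REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs (illustrative)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("property") or ""
        if name.startswith("og:") and attr.get("content") is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(name, attr["content"])


def parse_open_graph(html):
    """Return a dict of the og: properties found in the markup."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


def missing_required(properties):
    """List the required properties that the page did not supply."""
    return [name for name in REQUIRED if name not in properties]


# Hypothetical sample page; note that it omits og:image on purpose.
SAMPLE_PAGE = """
<html>
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:description" content="An example description." />
</head>
</html>
"""

if __name__ == "__main__":
    props = parse_open_graph(SAMPLE_PAGE)
    print(props)
    print("Missing:", missing_required(props))
```

Nothing here verifies that the values are true; the parser simply reports whatever the page author chose to declare.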

Anyone is free to implement OGP in their pages or consume it with their services, as the technology is published under the Open Web Foundation Agreement 0.9. In that sense, the spec is certainly “open,” though some seem disappointed that the label is applied to a vocabulary apparently developed privately by one company without feedback from others. While Facebook does note already published standards they drew on for inspiration, OGP at times seems to be reinventing the wheel a bit. (Update: One reader pointed out to me that Facebook’s approach uses RDFa to specify data in a separate namespace, so my criticism may have been unjustified.) For instance, the HTML spec has always included a way to specify a page’s description via a “meta” tag – a feature many abused in the past to improve search rankings.

Facebook will not be immune to such abuse in their new namespace for metadata. Doctorow’s first problem with “meta-utopia” was that people lie. In my testing thus far, the OGP properties of title, canonical URI, and site name are essentially arbitrary. This means that not only can page authors add “like” buttons for other pages, they can add false metadata that produces deceptive feed stories. For instance, a feed story may say that a user “liked The Rock on IMDb” when the story links actually point to a malware host. If Facebook wants to build a semantic search engine, they will still have to deal with old black hat SEO tricks.
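One obvious sanity check for anyone consuming this metadata is to compare the canonical URI a page claims against the address that actually served it. The sketch below is only my own illustration of that idea in Python; I do not know what checks Facebook performs, the function names are hypothetical, and an exact host comparison is a simplification that would need tuning for sites that legitimately span several hostnames.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def claims_other_site(page_url, og_url):
    """True when the self-declared canonical URI names a different host
    than the page that actually served the metadata."""
    return host_of(page_url) != host_of(og_url)


if __name__ == "__main__":
    # A page on one host declaring itself to be a page on another.
    print(claims_other_site("http://malware.example/landing",
                            "http://www.imdb.com/title/tt0117500/"))
```

A mismatch is not proof of abuse, but it is exactly the situation in which a feed story could name one site while linking to another.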

In addition to OGP properties, Facebook checks pages for an “fb:admins” parameter that sets which Facebook users can administer analytics and information for a given website. Since the site requires no further authentication, I find it a bit disconcerting that a simple XSS hole could provide an attacker with access to so much power for a site that heavily integrates with Facebook. I was glad to see that redirection techniques or spoofed metadata did not enable cross-domain application of “fb:admins”, but I’m still unsure of how some cross-domain (or cross-subdomain) issues will factor into Facebook’s graph technologies.

Ironically enough, Facebook has yet to add OGP metadata to their own pages, and the new “like” button will not work for pages on facebook.com domains.

While the OGP can help authors describe individual pages, it does not include any way of establishing links between pages. That’s where Facebook’s ambitions become perhaps a little less “open.” The Open Graph of connections between Facebook profiles and OGP-enabled pages is housed on Facebook’s servers. The company does offer many simple ways for other applications to add or access edges of the graph, including the new Graph API. But Facebook is the gatekeeper, and some fear what that control could produce. Also, while Facebook has updated their privacy policy to reflect recent feature changes, their terms of service still include a clause about accessing data using “automated means.” Consequently, I’m still not entirely certain how much of the Open Graph can be automatically replicated.

Apart from concerns about control, however, the new Open Graph opens many possibilities by providing a set of links between pages and people with far more structure than the hyperlinks crawled by search engines today. But several factors may limit the possibilities. If sites do not implement OGP metadata in their pages (and that will include a significant percentage for the foreseeable future), Facebook has to infer data from the page. As already noted, data poisoning could become a significant factor. Maintaining a complex database will also require other types of maintenance, and currently the Open Graph can lead to issues of redundancy or caching of expired data.

If all website authors sought to protect their visitors and provide accurate, structured information on their pages, Facebook’s Open Graph would be a fairly certain success – but then again, it may not even be needed in that case. Meanwhile, since we have to take into account a range of problems and attacks when indexing online content, Facebook will still have to address basic problems encountered by past implementations of Semantic Web ideas. The company’s vision for mapping connections is ambitious, but plenty of work still remains.

Pros and Cons of Today’s Facebook Announcements

Earlier today, Facebook held a developer conference called f8 and took the opportunity to announce a number of new features that impact both developers and average users. I’ve assembled a non-exhaustive list of several important changes the company described, along with a summary of each change and a quick pro/con evaluation from my perspective. I’ll be looking at these and other new features in depth over the next several days.

The Open Graph

While Facebook has often talked about how its users’ friend relationships form a “social graph,” the company is now focused on creating a broader “open graph.” This is essentially a map of connections between people, companies, products, websites, and so on. When you list your interests and tastes on your profile, you’re helping build this structured database of links.

Pros

  • In many ways, this idea echoes the vision of a “Semantic Web” that others have outlined in the past. In fact, World Wide Web creator Tim Berners-Lee has long called for building a similar structure.
  • Facebook’s implementation includes simple ways for sites to add usable information about them, and they’ve built a simple interface for accessing data on pieces in the graph.

Cons

  • While this graph may be “open” for contribution and access, it’s definitely controlled by Facebook alone. That setup has obvious business, political, and philosophical implications, but centralized administration of such a graph has technical trade-offs as well, such as dependence on a single point of failure.
  • Facebook’s new version of the Semantic Web still carries many of the same issues as older versions, such as major privacy concerns, data poisoning, and data inconsistencies.

Universal Social Experience

In today’s keynote, Facebook CEO Mark Zuckerberg often talked about the high-level goal of enabling social experiences for users across the entire web. By combining the latest features Facebook offers, any site can bring identity and relationships into its own ecosystem.

Pros

  • Much of the information that you encounter on sites today is generic and requires that you spend time sorting or searching to make the site more relevant. With data from your part of the open graph, sites could customize and optimize in a way that’s tailor made for you, providing more relevant content right away.
  • This approach greatly reduces friction on other sites as well, since you won’t have to go through the tiresome process of setting up a new account, remembering another password, and trying to find people to connect with or useful content.

Cons

  • One person’s feature is another person’s privacy violation. However well-intentioned other sites may be, their “social experiences” can fail to recognize the value of anonymity or take into account a rightful degree of user control.
  • As others have pointed out previously, since this type of optimization often centers around your established relationships, it can create an echo chamber effect and further isolate socioeconomic or ideological groups from each other.

Instant Personalization

This is the marketing term for a feature Facebook first announced earlier this year. The company has partnered with certain “pre-approved” websites that can now automatically identify a Facebook user at their first visit. The sites can also access what Facebook classifies as publicly available information.

Pros

  • This is a more specific example of Facebook’s vision for social experiences reducing friction. The feature is aptly named “instant,” as it basically sets up a user’s account on another site without any interaction, a behavior some may find very convenient.
  • From a privacy standpoint, Facebook has included a global opt-out under users’ application privacy settings, and clearly indicates when this sort of automatic authentication takes place with a banner at the top of the site.

Cons

  • The feature still raises a number of privacy concerns, and essentially repeats several of Google’s well-documented mistakes with the launch of Buzz. And while a full opt-out does exist, users are opted in by default. This personalization will likely be the source of many surprises and violated expectations.
  • Facebook controls who has access to the setup, and currently it’s not entirely clear how sites can become pre-approved or how much the program will expand in the future. The privacy controls also lack some clarity, as the opt-out does not cover information shared by friends who use instantly personalized sites.

Social Plugins

Any web site now has access to a range of simple tools that add Facebook features, such as “liking” a page and publishing approved stories to a user’s news feed. These widgets also replace some of the options previously offered to developers under Facebook Connect.

Pros

  • Facebook has built these plugins with ease of deployment in mind, and they drastically reduce the complexity of integrating with the service. Many developers will be pleased with the simplicity of these functions.
  • From a security perspective, Facebook’s approach also sets up a barrier between the external site and Facebook content the user sees. While the like buttons and friend pictures may seem to be simply part of the page, they actually reside in a separate data space from the rest of the page’s content until you choose to authorize access for the other site. This helps protect both the developer and you as a Facebook user.

Cons

  • In practice, the deceptive appearance just described may mislead many users into thinking that Facebook is exchanging far more data with other websites than they actually are. This will likely lead to some unwarranted panic.
  • These plugins do rely in many ways on developers providing accurate data, and it’s likely we’ll see these features abused by scam artists and distributors of malware. Currently, the plugins seem to lack certain authentications that may lead to unintended consequences.

OAuth 2.0

As part of a more streamlined development experience, Facebook has launched a technology called OAuth 2.0 for authenticating applications and websites. This replaces the proprietary model the site had been using and should once again simplify building Facebook-enhanced services.

Pros

  • This is a major validation for an open standard many companies have helped put together. Many developers will be encouraged to see Facebook choosing OAuth over a proprietary system.
  • As already mentioned, this is another way that Facebook has simplified application development. OAuth should reduce confusion over how other sites can access Facebook information.

Cons

  • While perhaps not a completely fair point, I’ll note that the use of OAuth does not diminish the threat of application-based attacks through vulnerabilities known as XSS and CSRF.
  • A number of other sites, such as Twitter, have used OAuth for some time, but this is a major roll-out of a very new version. We may see new security issues related to Facebook’s implementation.

Facebook Credits

At f8, Facebook expanded on their plans to offer a virtual currency system for application payments. Several applications are already using Facebook Credits, but we’ll likely see far more implementations in the near future.

Pros

  • Yet again, this system helps reduce friction. For developers, Facebook offers a simple way to include payments without having to worry about a number of implementation details.
  • Also, for users, virtual currency can reduce the hassle of worrying about issues such as international currency conversion.

Cons

  • Since Facebook is already facing widespread criticism over privacy issues, some users may hesitate to add credit card information to their Facebook profiles, even if it can only be accessed by Facebook.
  • This service makes Facebook a middleman in potentially millions of dollars of transactions, and could raise liability issues.

Granular Data Access

Though perhaps overlooked, Facebook made good on their promise to include more granular permissions when applications request user information. This feature comes in response to concerns raised by Canada’s Privacy Commissioner last fall. With the new setup, applications will have to individually request private profile fields when a user chooses to authorize.

Pros

  • This change will immediately provide more transparency and accountability, since users will see listed out exactly what fields an application will want access to when they authorize.
  • Many users may simply click through anyway, but the new system may raise awareness for many users who did not previously understand the range of information applications could access. Seeing a greedy list of data fields may give users pause.

Cons

  • Since announcing granular access last fall, Facebook has radically changed the definition of what constitutes “private” information. Consequently, many of the fields that might have been included in this setup are now considered “public” and thus generally outside access controls.
  • While commendable, this change may not lead to any substantial changes in practice. The model relies on developers limiting their requests, and many users will probably still want access to applications that ask for all information.

Persistent Data Storage

Until this week, applications and Facebook-enabled websites could not store most information accessed via the Facebook API beyond 24 hours. Now, Facebook has removed this time limit, meaning developers can save user data for as long as they want.

Pros

  • This change will significantly reduce overhead for both developers and Facebook, since applications will no longer have to exchange data with the service each day a user connects.
  • Users will likely see some performance gains from applications, since they can cache data locally rather than constantly checking with Facebook before rendering content.

Cons

  • Facebook applications will now be far more valuable targets for attackers. If a popular application suffers a database compromise, millions of users’ private information could be put at risk. Hacking Facebook directly tends to be difficult, but many applications lack the same level of security.
  • This increases opportunities for behavioral targeting and visitor tracking, since third-party developers will now be able to maintain complete archives of profile information.

More Changes to Facebook Privacy, and More to Come

Yesterday, Facebook announced two new features: Community Pages and “connections” for certain profile information. The first combines some of the generic fan pages that have become popular over the last few months with Wikipedia articles to create a sort of social encyclopedia. I’m not entirely clear on what Facebook envisions with this feature, but it will be interesting to watch it develop.

The second feature, however, has attracted much more attention, and rightfully so. Again, I’m still sorting through details and have not yet seen the new connections in action, but certain parts are pretty clear. Facebook is replacing the manual lists in parts of the “info” tab on your profile with lists of fan pages you connect with. Along with the new setup, Facebook is changing the “Become a Fan” buttons to “Like” buttons. If you want to connect with a page for something you’re interested in, you now will simply “like” the page.

In a blog post, Facebook spun the connections as an exciting improvement: “Instead of just boring text, these connections are actually Pages, so your profile will become immediately more connected to the places, things and experiences that matter to you.” I can see three main reasons why Facebook would make this change, and none of them involve text being boring.

First, this helps software more easily process your interests. With textual lists, you may find titles such as these under a user’s favorite movies: “LOTR,” “Lord of the Rings,” “Lord.Of.The.Rings,” “***Lord of the Rings!***”, “i just LOVE lord of the rings so much,” etc. It’s obvious to a human that these all refer to the same trilogy of movies, but not to a computer. By essentially turning sections of your profile into database relationships, Facebook can take all of these disparate descriptions and replace them all with a link to an official Lord of the Rings page.
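As a rough illustration of the problem, here is a small Python sketch that maps free-text entries like the ones above onto a single canonical page. The alias table, the page identifier, and the matching rules are all hypothetical; I have no knowledge of how Facebook actually performs this mapping, and a real system would need far more than a lookup table.

```python
import re

# Hypothetical alias table: normalized free-text key -> canonical page ID.
ALIASES = {
    "lotr": "page:lord-of-the-rings",
    "lord of the rings": "page:lord-of-the-rings",
}


def normalize(text):
    """Lowercase, turn runs of punctuation into spaces, and trim the ends."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def resolve(entry):
    """Map a free-text profile entry to a canonical page ID, or None."""
    key = normalize(entry)
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to a substring match for entries padded with extra words.
    # Very short aliases are skipped to avoid accidental matches.
    for alias, page in ALIASES.items():
        if len(alias) > 4 and alias in key:
            return page
    return None


if __name__ == "__main__":
    entries = [
        "LOTR",
        "Lord of the Rings",
        "Lord.Of.The.Rings",
        "***Lord of the Rings!***",
        "i just LOVE lord of the rings so much",
    ]
    for entry in entries:
        print(repr(entry), "->", resolve(entry))
```

Once every variant resolves to the same identifier, the profile entry stops being text to interpret and becomes a row in a table of connections.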

Second, the shift to “liking” reduces friction. The semantics may be subtle, but I’m sure Facebook has done research on this. “Liking” implies a simple, casual gesture (represented by the thumbs up icon), while “becoming a fan” or “subscribing” carries more of a commitment and desire for further interaction. I’m guessing users are far more likely to say they “like” something than “become a fan” of it, and Facebook wants users to connect and share as much as possible.

Third, this increases the useful data Facebook can offer to others. It’s likely that a large majority of Facebook’s users currently have privacy settings that only allow friends to see the “boring text” in their profiles. But since last fall’s privacy changes, connections to fan pages are now considered publicly available information. By taking the simple step of “liking” a page, users will add an easily processed connection that certain sites and applications will be able to access when visited.

Since the new setup has obvious privacy implications, Facebook added privacy controls, but unfortunately, they seem to also add further confusion. As Facebook notes, the new settings relate only to profile visibility: “You can control which friends are able to see connections listed on your profile, but you may still show up on Pages you’re connected to.” This is yet another example of Facebook making information appear to be private without actually making it private. As TechCrunch writer Jason Kincaid put it well, “In short, this section is about the data on Facebook that you can’t actually control. You can make it harder to find, and even hide it from your profile, but you can’t remove it entirely.”

Facebook stands to gain enormously from users embracing these new profile connections, and fan pages within Facebook are only the beginning. Tomorrow is f8, a developer conference hosted by Facebook, and the company will likely be introducing several new features and plans, such as adding location information to wall posts. Inside Facebook has an excellent round-up of what to expect. Several of these changes will likely have a significant impact on user privacy; I expect we’ll hear more detail about pre-approved Facebook Connect sites gaining automatic access to user data. Another item of interest will be the Open Graph API, which takes the “liking” behavior described above and extends it to any website.

That means that rather than simply say you’re a fan of Social Hacking, for instance, you could potentially “like” theharmonyguy.com. In other words, you could create a connection between your profile and a given URI (website address). That opens up many new possibilities, but once again adds significant information to your public profile.

As I said, certain details are still not clear to me; for instance, Facebook seems to have backtracked on whether your list of friends is publicly available information, and says that fan page connections will not be public for minors. I’ll certainly be watching to see what Facebook announces tomorrow, and will likely have much more to say about it in the next week or so. (In fact, I’ve been holding off on a few posts until I see how the f8 announcements will impact the issues they deal with.) I should also have shorter, quicker updates throughout the day tomorrow on my Twitter feed.

Facebook Platform Vulnerability Enabled Silent Data Harvesting

A few weeks ago, I sent Facebook a demonstration of what appeared to be a previously unknown attack combining two behaviors of the Facebook Platform. The technique allowed one to create a seemingly innocent web page that would invisibly and silently steal a visitor’s private Facebook content. Facebook has now disabled the attack by modifying one of the exploited behaviors.

It’s unlikely that any real-world attacks used this particular vulnerability, and I certainly have no record of such a case. But it’s also unclear how long the problem has existed. I discovered one part of the technique, a “return_session” parameter for application authorization, while examining the behavior of the Yahoo! contact importer, which only launched a month ago. However, discussions on Facebook’s developer forum mention the parameter in the context of Facebook Connect implementations as far back as February 2009. The other main component, now modified by Facebook, may have existed since the beginning of the Platform in 2007.

In my proof-of-concept demonstration, I loaded a harmless-looking web page on a server external to Facebook. The page included code for an inline frame sized to be invisible to the user. This frame then loaded the login page for a Facebook application. If the user has already authorized an application, its login page will automatically forward to the application, and that’s exactly what I wanted to happen. I chose FarmVille for my demo, since it has a wide install base. Keep in mind that while FarmVille currently lists about 83 million monthly active users, the attack would have worked for anyone who has authorized the application, regardless of how long ago. The attack could also target multiple applications at once using multiple iframes, meaning nearly any of Facebook’s 400 million active users could have fallen prey.

But the first main component of the attack involved a slight modification to the login page URI. By adding a “next” parameter, one can specify an alternate landing page for authorized users. Not all applications take advantage of this parameter, but many do. The parameter would not work for an arbitrary site, but Facebook previously did allow any URI that began with apps.facebook.com. Thus one could craft a login page URI that checked whether the user had authorized one application and then forward the user to a second application.

The next part of the attack came from adding “return_session=1” to the login page URI. This parameter causes Facebook to append particular session variables for the authorized application onto the URI of the landing page – in our case, the second application given by the “next” parameter. That application merely has to check its address for the session data, which provides enough information to execute API requests using the credentials of the already authorized application. Since an authorized application essentially operates on behalf of a user, it has access to nearly all private profile information (essentially, everything but your e-mail address and phone number) and content (photos, links, notes, etc.) that can be loaded via the API, and hence the second application had such access as well. This entire process could be fully automated without any user interaction and did not require any authorization for the second application. Also, the attack could generally be executed quickly enough to avoid Facebook’s measures for detecting when their pages are loaded in frames.

To patch the attack, Facebook has restricted the “next” parameter; it now only forwards to addresses for the application specified on the login page, preventing any appended session data from reaching the wrong destination. Since an authorized application already has API access, using return_session with that application will not add any new privileges.
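The difference between the old and new behavior can be sketched as two redirect checks. This Python is only my approximation of the rules as I have described them, not Facebook’s code; the function names and the “exampleapp” path are hypothetical, and a production check would also need to handle details such as path traversal that I leave out here.

```python
from urllib.parse import urlsplit

CANVAS_HOST = "apps.facebook.com"


def old_check(next_url):
    """Pre-patch rule as described: any apps.facebook.com address passes."""
    return urlsplit(next_url).hostname == CANVAS_HOST


def new_check(next_url, app_path):
    """Post-patch rule as described: the address must belong to the
    application named on the login page. app_path is that application's
    canvas path (hypothetical example: 'exampleapp')."""
    parts = urlsplit(next_url)
    if parts.hostname != CANVAS_HOST:
        return False
    segments = [s for s in parts.path.split("/") if s]
    return bool(segments) and segments[0] == app_path


if __name__ == "__main__":
    other_app = "http://apps.facebook.com/otherapp/"
    print(old_check(other_app))                # accepted under the old rule
    print(new_check(other_app, "exampleapp"))  # rejected under the new rule
```

Under the first rule, session data for one application could be forwarded to any other application; under the second, it can only land where it already had access.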

I commend Facebook for responding quickly to this issue and for being open to white-hat security reports. But in my opinion, this vulnerability is simply the latest reminder that the Facebook Platform can open users to many problems quite separate from the security of Facebook itself. I personally think that aspects of the Platform’s implementation fail to match user expectations of privacy, as I’ve discussed previously. And while this particular problem may be solved, vulnerabilities in specific applications and the nature of application access continue to put private data at risk of unwanted disclosure.

Correction on Public Information Access by Facebook Applications

I don’t take my responsibility as a blogger lightly, and I realize that many readers look to this site for reliable information on privacy and security issues with social networking applications. Consequently, I strive to maintain high standards of accuracy and clarity in my posts. Over the last few years, I’ve set some personal rules for myself, such as reproducing a vulnerability before relaying it here. I would never want to mislead my readers or betray their trust.

However, I must issue an apology regarding what I view as a significant error that I discovered today while researching a new idea. In at least two recent posts, I misrepresented how much information Facebook applications are able to access without explicit authorization. My apologies to Facebook for overstating such access.

Previously, I’d stated that Facebook applications have access to your “publicly available information” and content marked accessible to “Everyone” prior to authorizing the application. In one case, I stated this could be used by a fan page tab to identify users without explicit authorization.

As it turns out, applications only have this automatic access in certain circumstances. According to Facebook’s documentation, such access only occurs when users arrive at an application page from certain Facebook channels and can be affected by strong privacy settings. I misunderstood this process and consequently applied it in situations where it would not actually come into play.

As for fan pages, a tab apparently does not have automatic means of identifying a user and would need to request authentication to access such information.

It bothers no one more than me that I misled my readers on this point, and I will certainly strive all the more to avoid such an error in the future.

Dissecting a Typical Facebook Fan Page Scam

Update: I strive to maintain accuracy on my blog and spend time verifying issues before posting them. However, further investigation has led me to question whether my understanding of applications automatically accessing “publicly available information” is actually correct. I plan on doing more thorough research this weekend on such access and will update this post accordingly.

Update 2: See my full correction.

Original Post

I’ll admit, I was intrigued. Facebook informed me that a good friend had become a fan of a page proclaiming that “94% of the people fall asleep immediately when seeing this picture”. That would be quite a picture. Who wouldn’t want to give it a shot? Over 270,000 people must have agreed, since that many people gave in to the page’s demand that you become a fan before seeing the amazing photo. In the past I’ve simply ignored such scams, but this time, I did a bit of investigation and became intrigued once more.

I’ve come across many pages and applications that promise a tempting reward if you simply complete a few steps, which usually involve authorizing the app or becoming a fan of the page (I refuse to say “fanning the page”) and then inviting all of your friends to do the same. Rewards include tracking all visitors to your profile or getting a nice gift card. I would argue that it doesn’t take much evaluation to figure out why such scams are bogus, but untold Facebook users fall prey to them daily. Next time you’re tempted by a Facebook free lunch, remember that authorizing an application grants the developer access to all of your private info. Becoming a fan isn’t quite as drastic, but as you may have discovered, that’s rarely the last step in such offers.

Let’s get back to the hypnotizing pic. When you first load the page, it opens a tab tantalizingly entitled “THE PICTURE”. Ah, but before the powerful picture loads, you have to complete “two simple steps.” First, become a fan. But you have to click the button at the top – if you click the representation of it in the instructions, a dialog pops up saying you have to use the top button “to get access to the scantron hack.” Come again? Oh and the picture in that dialog is for another fan page entitled “How to Change Your Profile Layout.”

Anyway, become a fan and you’ll see step two: “Suggest this page to your friends.” Again, clicking the instructions brings up a dialog emphasizing you must invite at least 40 friends “to bypass the human verification gateway” (sounds high-tech). The picture this time is for some fan page involving “hot” girls. If you click step 3 (see the picture!) without inviting your friends first, you instead encounter the dreaded human verification gateway.

Of course, if you did annoy 40 friends first, I’m pretty sure you’d still see the gateway, which ironically offers for you to take a survey entitled “How DUMB are YOU?” As with so many similar pages, this page is entirely fake. First clue: the page has all wall posts (Correction: wall posts are hidden by default, but not disabled), reviews, and discussions disabled, so nowhere can “fans” actually share whether the trick worked or not.

Oh wait, “THE PICTURE” tab does include a comment box with testimonials from a few fans. However, if you actually click some of the profile links, you’ll find that the names don’t always match up. If you try adding your own comment, I can assure you from scanning network traffic that your feedback is not recorded. The comment box is simply a bit of static code made to look legitimate.

In fact, I assumed “THE PICTURE” tab was using the Static FBML application to load its contents. But the tab actually loads a special application called “sleeps” (whose URI includes the string “heyhaha”). What does “sleeps” do? It displays the page you see on “THE PICTURE” tab. Why bother with a custom app simply to load static code? When you visit an application, it has access to your “publicly available information” (for new readers, that includes your name, networks, friends list, location, content marked available to “Everyone,” pages you’re a fan of, etc.) without you ever clicking a button or granting specific permission. While only Facebook could say for certain, I’m guessing that “sleeps” takes advantage of this access and takes note of everyone who stops by. (See update at the top of this post.)

Applications have to get their code from somewhere besides Facebook, though, and “sleeps” loads it from the charmingly-named web site “www.drysnuff.info”. By examining the full source code of the page, we can see exactly what happens when you click on fateful step 3. The page loads an inline frame that links to a file on drysnuff.info called cpa.php.
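
That kind of source inspection is easy to automate. Here is a minimal Python sketch (not the tooling I actually used) that scans a chunk of HTML for inline frames and reports any whose address points at a host other than the one serving the page. The sample markup, the full frame URL, and the "apps.example.com" host are invented for illustration; only the drysnuff.info host and the cpa.php filename come from the page described above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def external_iframes(html, page_host):
    """Return iframe URLs whose host differs from page_host."""
    finder = IframeFinder()
    finder.feed(html)
    external = []
    for src in finder.sources:
        host = urlparse(src).hostname
        # Relative URLs have no hostname and stay on the page's own host.
        if host and host != page_host:
            external.append(src)
    return external


# Hypothetical markup: the exact URL and surrounding HTML are assumptions.
SAMPLE = """
<div id="step3">
  <iframe src="/local/widget.html"></iframe>
  <iframe src="http://www.drysnuff.info/cpa.php" width="500"></iframe>
</div>
"""

print(external_iframes(SAMPLE, "apps.example.com"))
```

A frame that pulls content from an unrelated domain is not proof of a scam on its own, but it tells you where to look next.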

As I’ve looked at various scams and attacks over the last year or so, I’ve often encountered a particular type of trick that involves a CPAlead gateway. I have no idea what the motives are of the people behind CPAlead or how trustworthy their company is, but I can attest that CPAlead gateways are constantly exploited by untrustworthy people who are looking to make a quick buck. Our sleep-inducing fan page is no exception: that “human verification gateway” is simply another CPAlead setup.

The gateway asks you to complete a survey, which loads in a separate window. Once you’ve finished the “offer,” the gateway gets confirmation and grants you access to whatever it’s hiding. But finishing the survey will likely require you enter a mobile phone number, a very common online scam that will lead to plenty of unwanted charges on your next bill.

And I can save you the trouble – in this case, it’s not hard to discover what you would see once the gateway verified your humanity. If five racy images of “The sexiest girls from MAFIA WARS” make you fall asleep, then you’re one of the 94%. (Update: Apparently that’s another scam from the same people, and using the hypnotizing fan page may take you to a different destination – albeit still fake.)

I took the time to walk through this particular scam for two reasons. First, I find it fun to explore the code and figure out exactly what’s going on (CPAlead employs several obfuscation techniques in their JavaScript, for instance). Second, this story does have some important ramifications. At first, it may appear no different from many other online scams that pop up when a user clicks some flashy advertisement. As I said, I’ve encountered CPAlead many times before, and other sites have written at length about the dangers of offers that require your mobile phone number.

What makes this case different, however, is the Facebook integration. The scam artists behind this fan page quite literally know who their victims are. When you simply visit the page out of curiosity, the owners know you by name, along with a link to your profile and some basic information about you. This happens whether you fall for the offer scheme or not. (See update at the top of this post.)

Also, several clues in the fan page indicate that its owners run other pages with similar setups. Given the number of advertising-driven fake applications I’ve seen, it’s likely they have apps as well – and if you visit one of those apps, all of your private information can be connected to your profile. Facebook requires developers to destroy most of that data after 24 hours, but has no way of enforcing or verifying compliance with that rule. It’s entirely possible that the swindlers behind all these cons have built a sizable database of information on millions of Facebook users.

I’m not trying to simply spread FUD (fear, uncertainty, and doubt) here. I cannot definitively prove these claims, but I think they are quite realistic based on my history of investigating Facebook applications and news stories on various scams and rogue apps I’ve tracked. And even if this scenario has not happened yet, the determination of past online scammers and the ease of executing such a setup lead me to believe it’s only a matter of time.

Access Facebook Data Without Logging in to Facebook

(N.B.: This is not an April Fool’s joke.)

Programmer Pete Warden made headlines a few months ago after creating a dataset of public profile information from 210 million Facebook users. Warden gathered his data by crawling the public search pages that some users have enabled, and planned on releasing it to the public. But Facebook threatened legal action, prompting Warden to destroy the information rather than risk an expensive court battle.

While I’m sympathetic to the privacy implications that led some to criticize Warden’s planned release, I also think that exposing the data would be an effective way of awakening Facebook users to what’s possible with information now classified as public. And while Warden abided by Facebook’s demands, it’s only a matter of time before someone less compliant publishes a similar dataset. Besides, many search engines already have similar resources in their indexes.

I’ve previously demonstrated how much content is actually available for logged-in Facebook users through various techniques. But indexing all of that content would definitely violate Facebook’s terms of use. What about truly public data, though, that’s accessible even to anonymous Facebook visitors and search engines? How much information can be seen without logging in?

To answer that, I’ve created yet another bookmarklet, though this one is far more complex and will likely not yield many results for most users. This trick is more a proof of concept. If you’re trying to access private profile information, this tool will not help you.

The bookmarklet works by adding a bar of links to a public search page for a Facebook user. (Note that not all users allow a public search page to appear for their profile.) These links attempt to load public content for several of Facebook’s standard applications, including the user’s “Boxes” tab. In order to see anything, the user must at minimum (1) set the visibility of the given application to “everyone,” and (2) create content within the application marked as visible to “everyone.” Even then, you may not get any results – I’ve found that the photos application seems to only display a user’s “Profile Pictures” album if it is set to public.
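
The two conditions can be expressed as a simple check. This Python sketch is only a model of the rule as I’ve described it, not Facebook’s actual logic and not the bookmarklet’s code; the AppSettings and Item names and the "everyone" / "friends" strings are placeholders I made up.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    visibility: str  # placeholder values: "everyone" or "friends"


@dataclass
class AppSettings:
    name: str
    visibility: str
    items: list = field(default_factory=list)


def publicly_visible(app):
    """Return titles an anonymous visitor could see under both conditions."""
    # Condition 1: the application itself must be visible to everyone.
    if app.visibility != "everyone":
        return []
    # Condition 2: each piece of content must also be marked "everyone".
    return [item.title for item in app.items if item.visibility == "everyone"]


notes = AppSettings("Notes", "everyone", [
    Item("Public announcement", "everyone"),
    Item("Private draft", "friends"),
])
photos = AppSettings("Photos", "friends", [Item("Vacation", "everyone")])

print(publicly_visible(notes))   # ['Public announcement']
print(publicly_visible(photos))  # []
```

Note that this model captures only the minimum requirements; as mentioned above, some applications appear to show even less than the two conditions would allow.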

To see the trick in action, try it on the public search pages for Mark Zuckerberg, Robert Scoble, or Louis Gray.

Feedback and questions are welcome (theharmonyguy@gmail.com or comment below), but please note I publish this bookmarklet as a convenience and will likely not provide detailed technical support.

Update (April 12): A reader pointed out to me that the bookmarklet was not working on public search pages for users who do not have vanity URIs. I’ve now updated the code to work regardless of the URI format.

Facebook Allowed Automatic Data Sharing Last November

Proposed changes to Facebook’s governing documents would allow the service to automatically share certain data when users visit third-party web sites, a move drawing widespread criticism and concern. However, I took another look at changes Facebook made last year, and from what I read, the sort of behavior people are worried about is already allowed. Facebook’s current privacy policy was last revised December 9, 2009, but all of the sections referenced in this post were added on November 19, 2009.

First, let’s recap what Facebook considers publicly available information:

Certain categories of information such as your name, profile photo, list of friends and pages you are a fan of, gender, geographic region, and networks you belong to are considered publicly available, and therefore do not have privacy settings. You can limit the ability of others to find this information on third party search engines through your search privacy settings.

This also applies to content marked “everyone,” though without the search engine exception:

Information set to “everyone” is publicly available information, may be accessed by everyone on the Internet (including people not logged into Facebook), is subject to indexing by third party search engines, may be associated with you outside of Facebook (such as when you visit other sites on the internet), and may be imported and exported by us and others without privacy limitations.

The policy goes on to discuss how this applies to “Facebook-enhanced” applications and websites, which it defines earlier as applications using the Facebook Platform or sites using Facebook Connect. (It also notes earlier that “in order to personalize the process of connecting, we may receive a limited amount of information even before you authorize the application or website.”) Here’s the relevant section, with my emphasis added:

As mentioned above, we do not own or operate Facebook-enhanced applications or websites. That means that when you visit Facebook-enhanced applications and websites you are making your Facebook information available to someone other than Facebook. To help those applications and sites operate, they receive publicly available information automatically when you visit them, and additional information when you formally authorize or connect your Facebook account with them.

In other words, the current Facebook privacy policy already allows your “publicly available information,” which includes your name, gender, geographic region, friends list, fan pages, and your content marked “everyone,” to be automatically shared with external web sites when you visit them. The only thing apparently preventing this from happening right now is technology – Facebook has not yet rolled out an official means for Facebook Connect sites to automatically access such data. Apparently they soon plan on adding that technology for certain “pre-approved” sites, an update which the newer governing documents make more explicit.
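
To make the distinction concrete, here is a rough Python model of what the policy language describes. It is my own reading of the quoted text, not anything Facebook has published: the field names and the sample profile are invented, and treating authorization as unlocking everything is an assumption on my part.

```python
# Categories the policy lists as publicly available (field names are my own).
PUBLIC_FIELDS = {
    "name", "profile_photo", "friends", "fan_pages",
    "gender", "region", "networks",
}


def data_shared_on_visit(profile, authorized=False):
    """Return the part of a profile a Facebook-enhanced site would receive.

    profile maps a field name to a (value, visibility) pair.
    """
    shared = {}
    for field_name, (value, visibility) in profile.items():
        always_public = field_name in PUBLIC_FIELDS
        marked_everyone = visibility == "everyone"
        # Assumption: authorizing the site exposes every remaining field.
        if authorized or always_public or marked_everyone:
            shared[field_name] = value
    return shared


# Hypothetical profile; None means the field has no privacy setting at all.
profile = {
    "name": ("Jane Example", None),
    "gender": ("female", None),
    "status": ("Hello world", "everyone"),
    "phone": ("555-0100", "friends"),
}

print(sorted(data_shared_on_visit(profile)))
# ['gender', 'name', 'status']
print(sorted(data_shared_on_visit(profile, authorized=True)))
# ['gender', 'name', 'phone', 'status']
```

The first call is the case that matters here: a plain visit, with no button clicked, already hands over three of the four fields.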
