= What is Fingerprinting? =

[[PageOutline]]

This page describes the mechanisms WebKit offers to ports and clients interested in reducing the ability of websites to identify users and track their behaviour without obtaining consent.

Use cases for fingerprinting include:
 * Sites attempting to identify users on devices previously used for fraud
 * Sites attempting to establish a unique visitor count
 * Advertising networks attempting to establish a unique click-through count
 * Advertising networks attempting to profile users to increase ad relevance
 * Sites attempting to profile the behaviour of unregistered users
 * Sites attempting to link the visits of users when they are both registered and unregistered, and to identify the user when visiting the site without authenticating

In order to evade tracking, users will often disable or clear HTTP cookies, or change their IP address between visits. Research has demonstrated that these measures are not sufficient, and that in order to evade tracking by third parties a wide variety of technologies implemented in the browser must be considered.

The bugzilla entry for tracking progress against items in this page is:
 * https://bugs.webkit.org/show_bug.cgi?id=41801

== Entropy and Fingerprinting ==

Entropy measures the amount of uniqueness that a specific property exposed by the browser (such as the User-Agent header) introduces into a browser fingerprint. It's usually expressed in bits. For example, Peter Eckersley's [http://panopticlick.eff.org/browser-uniqueness.pdf 'Panopticlick' study for the EFF] finds that the User-Agent header provides 10.0 bits of entropy. Since 2^10^ == 1024, that means only 1 in 1024 random browsers visiting a site is expected to share the same User-Agent header.

== Why this isn't about 'Private Browsing' ==

There are a number of things WebCore could do differently to reduce the user's exposure to remote tracking mechanisms but which it is not necessarily desirable to implement as default behaviour.
At first glance, some of these appear a good fit with 'Private Browsing' but really they're addressing a different problem. Private browsing's overriding objective is to ensure no trace of a browsing session is left on your disk. Fingerprinting a user's browser and tracking that browser's visits presents a different set of challenges.

== Private Browsing in WebKit ==

WebKit defines private browsing as:
{{{
// When this option is set, WebCore will avoid storing any record of browsing activity
// that may persist on disk or remain displayed when the option is reset.
// This option does not affect the storage of such information in RAM.
// The following functions respect this setting:
//  - HTML5/DOM Storage
//  - Icon Database
//  - Console Messages
//  - Cache
//  - Application Cache
//  - Back/Forward Page History
//  - Page Search Results
}}}

Some of the items discussed in this page are already taken care of by private browsing in WebKit, so if you are trying to defend your WebKit browser against fingerprinting you will at the very least need to enable private browsing through your WebKit port's API.

== Private Browsing in WebKit and your browser ==

Be aware that not all aspects of 'private browsing' are implemented within WebKit. WebKit ports offer different APIs to clients. Some functionality relevant to mitigating browser fingerprinting is left entirely to the client. For example, QtWebKit does not implement an HTTP cookie store or page cache: the client is responsible for managing both. You need to pay close attention to the limitations and capabilities of your chosen port's WebKit API - in particular the aspects of cookie and cache management you are expected to implement yourself.

== Things that aren't 'Private Browsing' but are 'Fingerprinting' ==

=== 1. Session Isolation === #SessionIsolation

If you have implemented an anti-fingerprinting mode you don't want a website to access information from the browser's normal mode - doing so might reveal information that allows the site to fingerprint your user or browser. This is not a property of private browsing, which is only interested in preventing data from the private session leaving any traces on the user's disk.

=== 2. Session Persistence ===

An accidental property of private browsing is that HTTP cookies and page caches generated in one private browsing session are not available in any subsequent or simultaneous browsing sessions, because they are not stored to disk. This non-persistence is an important countermeasure against fingerprinting. However, where sites use another browser feature as a side channel to simulate the behaviour of HTTP cookies, you may not be able to rely on private browsing to take care of this for you.

=== 3. Long-Running Sessions ===

At the moment private browsing happens to take care of some aspects of session isolation and persistence for you. But a browser session that is never closed, on a machine that is never turned off, will become trackable over time. This means you will need to decide on a way of managing long-running browser sessions and cannot rely on any side-effect benefits from private browsing in WebKit. The obvious way of managing this problem is to implement a periodic cleardown of cookies and page cache, as well as any other side-channel cookies such as 'window.name'.

== A 'Tracking-Resistant Mode' vs 'A Tracking-Resistant Browser' ==

If you implement a tracking-resistant mode which users can switch in and out of then you need to worry about SessionIsolation. If websites can read cookies and cache objects from your browser's 'normal' mode that will undo a lot of the work you have put into managing the user's fingerprint in tracking-resistant mode.
This means you will have to ensure the browser maintains separate profiles for each mode and no information is shared between them. This is not a concern if you are implementing a browser that is always tracking-resistant, since you can purge cookies and cache objects without having to worry about the state of any other sessions maintained by the user.

== Creating a Common Fingerprint ==

This page is premised on the notion that the best way to mitigate browser tracking is to implement a fingerprint for your browser that all users of your browser will share. The larger your user base the more rewarding this policy becomes. Even with a relatively small userbase this approach is still useful as long as you can close off as many sources of client entropy as possible. A browser with a small userbase is inherently more vulnerable to tracking if it leaks entropy than a browser with a large userbase suffering from a similar problem.

== Creating a Dynamic Fingerprint ==

Not much attention has been paid to approaches that suggest creating a constantly mutating browser fingerprint. Common sense suggests it would be hard to implement and even harder to get right.

= Creating a Static Fingerprint for your WebKit Browser =

== 1. Javascript Objects ==

Javascript exposes a lot of entropy-rich information to websites through the Screen object, the Window object, the Navigator object, the Document object, and even the Date object. WebKit-based browsers need to decide on a strategy for reducing the variety of information these objects leak to websites. The most common approach is to decide on a predetermined set of values that the browser will always use in fingerprint-resistant mode - this allows users of the browser in that mode to be indistinguishable from each other based on information from the JS objects alone. If you are building a tracking-resistant mode for your WebKit-based browser you will need to consider managing at least the following values:

=== i. Document Object ===

The document.referrer property needs to be managed in the same way as [#RefererHeader Referer Header and Origin Header] below.

=== ii. Window.History Object ===

[https://developer.mozilla.org/en/DOM/window.history history.length] has the potential, in cases where it is unusually high due to prolonged use of a single browser/tab session, to assist sites in tracking the user. In the case of long-running sessions you may need to update the value periodically so that it does not become revealing. See https://bugs.webkit.org/show_bug.cgi?id=55965

=== iii. Window object ===

You have two choices here:
 - restrict the layout of the actual browser window to three or four predetermined sizes, and return those.
 - return values that do not reflect the real size of the browser window.

The properties you need to override are at least:
{{{
outerHeight()
outerWidth()
innerHeight()
innerWidth()
screenX()
screenY()
scrollX()
scrollY()
}}}

=== iv. Window.name ===

This deserves a special mention. Window.name is cross-domain and persists across page loads. Not surprisingly, many sites use it as a cookie. Since it does not persist across sessions it's not as much of a problem as an HTTP cookie but it does allow tracking within sessions - and this is a worry if the session is long-running. There is room here for WebKit to restrict cross-domain access to window.name but you are probably better off managing the value in this field yourself with the JS API. The chances are you will break the user's experience on at least some websites.

=== v. Screen object ===

[http://www.torproject.org/torbutton/index.html.en Torbutton] and [https://github.com/mwenge/torora Torora] use the following values for the Screen object's properties:
{{{
height()      = window innerHeight()
width()       = width() rounded to the nearest 50 px
colorDepth()  = 24
pixelDepth()  = 24
availLeft()   = 0
availTop()    = 0
availHeight() = window innerHeight()
availWidth()  = window innerWidth()
}}}

Entropy for the values in the Screen object can be as much as 4.83 bits.

=== vi. Navigator Object ===

You will need to look closely at all the values exposed by the Navigator object and decide on a set of values that can remain static across as many releases as possible. You will also need to ensure that the values decided upon here are also presented in the [#UserAgentHeader user-agent HTTP header] by your browser.

You will need to decide what to do with navigator.plugins. Internet Explorer does not return anything to navigator.plugins, so websites tend not to rely on it - making it relatively safe to follow IE's lead. If you do expose a list of plugins through this property you will need to ensure your decision is consistent with the behaviour you have implemented under [#InstalledPlugins Plugins] below.

=== vii. Date Object ===

==== a. Timezone ====

You need to decide what timezone your browser is in when in fingerprinting-resistance mode - a common choice is UTC. Rather than attempt to override the Date object you should set the local timezone to UTC by setting the environment variable TZ as follows:
{{{
setenv("TZ",":UTC",1);
}}}

This will ensure that WebKit always returns the local time as UTC when the Date object is queried.

==== b. Timing Users ====

Believe it or not, there is evidence that some companies have tracked users based on the typing cadence gleaned from querying the Date object: http://arstechnica.com/tech-policy/news/2010/02/firm-uses-typing-cadence-to-finger-unauthorized-users.ars

Currently, WebKit does not offer any means of countering this.

=== viii. Language Object ===

You need to decide on a consistent set of values for your browser in fingerprinting-resistance mode. For example:
{{{
charset = 'iso-8859-1,*,utf-8'
language = 'en-us, en'
locale = 'en-US'
}}}

Your decision should be consistent with your implementation of the Accept-Language header.

== 2. Overriding Javascript Objects ==

WebCore does allow WebKit ports to override the values of JS objects. Most WebKit ports expose this API to browser clients.

=== i. Overriding Javascript Objects - Qt ===

You can use the function below to override existing JS objects with a Qt object. Any functions called on the JS object will call the equivalently named function in your Qt object:
{{{
void QWebFrame::addToJavaScriptWindowObject ( const QString & name, QObject *object )
}}}

For an example of this in practice see:
 * https://github.com/mwenge/torora/commit/5191dca4d5df08514f21e68472cacb6cacd2eb06

In order to override specific properties of a JS object in Qt see:
 * https://bugs.webkit.org/show_bug.cgi?id=46566

== 3. Form Auto-Filling ==

Javascript can inspect the contents of form fields at any time so auto-completing forms with cached values should be avoided. At the very least you will want to ensure that values cached from the normal browsing mode are not used when in tracking-resistant mode. The safest bet is to disable auto-completion altogether.

== 4. CSS ==

=== i. CSS Media Queries ===

The same information that can be collected from the Screen and Window objects can also be collected via CSS Media Queries. WebKit currently does not offer a means of countering this.
 * https://bugs.webkit.org/show_bug.cgi?id=50895

=== ii. CSS Fonts ===

CSS rules may be used to inspect locally available fonts. A working example of this 'font introspection' using simple CSS rules can be found at http://flippingtypical.com. WebKit currently does not offer a means of countering this. There needs to be a mechanism for allowing WebKit clients to decide which fonts are locally available when CSS rules are evaluated. See also [#Fonts Fonts] below.

=== iii. Querying Page History with CSS ===

There is a well-known attack on the CSS ':visited' selector that allows a CSS stylesheet to determine by brute force which websites the user has visited. A good example of this in action is available at:
 * http://ha.ckers.org/weird/CSS-history.cgi

This method can inspect user history across sessions and is deployed by live websites. WebKit has mitigated this attack since the implementation of https://bugs.webkit.org/show_bug.cgi?id=24300, based on the approach outlined in http://dbaron.org/mozilla/visited-privacy.

== 5. Plugins and Java Applets ==

If you want complete control over the information your browser reveals to websites then you can't let your browser run someone else's executable code. That means plugins and Java applets. That also means a pretty unusable browser by most people's lights.

WebKit does not offer you very much in this category. There is no sandboxing of NPAPI plugins - all WebKit ports currently dlopen() the binary blob and let it run with the user's privileges. In the case of a malicious plugin this means relatively unfettered OS access. In the case of well-defined plugins such as Flash there is still scope for collecting a lot of information.

=== i. Using the List of Installed Plugins To Build Up A Fingerprint === #InstalledPlugins

As well as isolating users who have an exotic set of installed plugins, a major source of entropy is found in the micro-version information provided by navigator.plugins. There is scope here for WebKit to limit the version information at the request of the client.
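
As a rough illustration of what limiting version information could mean in practice, the sketch below trims a plugin description string down to its major version number, so that users on different micro-versions report the same value. The function name `coarsenPluginVersion` and the sample strings are hypothetical; this is not a WebKit or port API, and where a client would hook it in (for example, when overriding navigator.plugins) is left open.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of any WebKit API: keeps a plugin
// description up to and including its major version number and drops
// the minor/micro version and anything that follows it.
std::string coarsenPluginVersion(const std::string& description)
{
    const std::size_t n = description.size();
    std::size_t i = 0;
    while (i < n) {
        if (!std::isdigit(static_cast<unsigned char>(description[i]))) {
            ++i;
            continue;
        }
        // Walk to the end of this run of digits.
        std::size_t end = i;
        while (end < n && std::isdigit(static_cast<unsigned char>(description[end])))
            ++end;
        // A digit run followed by ".<digit>" is assumed to be the major version.
        const bool dotted = end + 1 < n && description[end] == '.'
            && std::isdigit(static_cast<unsigned char>(description[end + 1]));
        if (dotted)
            return description.substr(0, end);
        i = end;
    }
    // No dotted version number found: nothing to strip.
    return description;
}
```

With this sketch, a description such as "Shockwave Flash 10.1 r53" would be reported as "Shockwave Flash 10". Coarsening only helps if every user of the browser gets the same treatment, in line with the common-fingerprint approach this page is premised on.
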
Apart from querying navigator.plugins, a page can attempt to instantiate as many plugins as possible and inspect the page's layout to see what the user has installed. There are a few possible countermeasures here of varying efficacy, though none have been tried in practice:
 * Limit the number of plugins that can be loaded on a single page
 * Prevent zero-size plugins from loading
 * Disable plugins completely
 * Only load individual plugins at the user's request - e.g. 'Click to Flash'

Note that blocking plugins can actually improve the quality of the fingerprint: http://panopticlick.eff.org/browser-uniqueness.pdf finds that the uniqueness of flash-blocking browsers is over 1 in 400,000.

=== ii. Collecting System Fonts via Flash Plugins ===

The Adobe Flash API allows Flash applications to obtain a list of all the fonts installed on a system in unsorted order. This is a very rich source of entropy for anyone trying to fingerprint a user. Worse, it is completely outside the control of WebKit and the client browser. Short of running your browser in a chroot'ed jail there is nothing you can do to prevent Flash inspecting your file system for the fonts folder and supplying the list of fonts installed there to swf objects that request it.

This is a problem you will have to address at application level for now. There are no open bugs for introducing plugin sandboxing (i.e. intercepting a plugin's system calls) in WebKit at the time of writing.

=== iii. Flash Cookies (Local Shared Objects) ===

You should read (it's short): http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1446862

From the paper:

"Flash data is stored in a different folder on different computing platforms.
For instance, on an Apple, Flash local shared objects (labeled .sol) are stored at: /users/[username]/Library/Preferences/Macromedia/Flash Player/ On a Windows computer, they are stored at: \Documents and Settings\[username]\Application Data\Macromedia\Flash Player Several subdirectories may reside at that location: “#SharedObjects” contains the actual Flash cookies and subdirectories under “Macromedia.com” contains persistent global and domain-specific settings for how the Flash player operates. As such, there will be a subdirectory for each Flash-enabled domain a user visits under the “Macromedia.com” settings folder. This has privacy implications .."

So Flash LSOs are used by a number of well-known sites to regenerate or respawn HTTP cookies the user has already deleted. Fortunately, Adobe Flash has supported private browsing since Flash Player 10.1 and does not store Flash cookies when in private browsing mode.

== 6. Silverlight And ActiveX ==

TBC

== 7. Fonts == #Fonts

A site may render a page in a number of different fonts and then measure the rendered text (for example with getComputedStyle() or the element's offset dimensions) to determine which fonts were actually available. WebKit currently does not offer a means of countering this. There needs to be a mechanism for allowing WebKit clients to decide which fonts are locally available when the page is rendered.

== 8. Cookies ==

You can either (i) disable these completely, (ii) clear them every time a new browsing session starts, or (iii) clear them every N minutes/hours. Since Private Browsing already takes care of (ii), you may decide that (i) and (iii) are not worth the trouble. Whatever you decide, you need to be consistent with your implementation of the Page Cache.

== 9. Third Party Cookies ==

If you are clearing all cookies periodically then third-party cookies are not something you need to worry about any more than their first-party cousins. The default behaviour of third-party cookies is the subject of a lot of interoperability issues between browsers.
Firefox has a proposal in active development and there has been some good discussion in at least one WebKit bug. Most WebKit ports offer you the possibility of managing third-party cookies however you choose, and the default behaviour between WebKit ports often differs - Safari is the most restrictive as it does not allow third parties to set new cookies, though they can update existing ones.
 * https://bugs.webkit.org/show_bug.cgi?id=35824
 * https://wiki.mozilla.org/Thirdparty
 * https://bugzilla.mozilla.org/show_bug.cgi?id=565965

Qt does not offer a means of identifying third-party cookies yet:
 * https://bugs.webkit.org/show_bug.cgi?id=45455

== 10. Page Cache ==

The [http://samy.pl/evercookie/ evercookie] is an excellent practical demonstration of how a website can inspect the browser's cache to determine if the user has visited the site before. So if your browser has disabled cookies completely, or disables cookie storage across sessions or over long periods of time, you will need to treat the page cache in the same way.

== 11. HTTP Headers ==

You will need to decide what to do with the Referer header, the Origin header, the Accept header, and the Accept-Language header.

=== Manipulating HTTP headers in QtWebKit ===

In QtWebKit you can manipulate HTTP headers by subclassing QNetworkAccessManager and reimplementing:
{{{
QNetworkReply * QNetworkAccessManager::createRequest ( Operation op, const QNetworkRequest & req, QIODevice * outgoingData = 0 )
}}}

Inside your reimplementation you could then perform the following:
{{{
// req is const, so modify a copy and hand that to the base implementation.
QNetworkRequest request(req);
if (request.hasRawHeader("Referer"))
    request.setRawHeader("Referer", "/");
if (request.hasRawHeader("Origin"))
    request.setRawHeader("Origin", "/");
return QNetworkAccessManager::createRequest(op, request, outgoingData);
}}}

=== i. User-Agent Header === #UserAgentHeader

Whatever decision you make about the User-Agent header, be prepared to stick with the values you set initially for as long as possible.
The simple reason for this is that every change to the user-agent will divide your userbase into those who have the old header and those who have the new one - creating new, unnecessary entropy each time.

=== ii. Referer Header and Origin Header === #RefererHeader

Manipulating these headers isn't strictly a fingerprinting-resistance requirement, however they do leak information about the user's browsing history. Manipulating them can break website behaviour and may even get your browser blacklisted by certain sites. Possible countermeasures in the case of Firefox are discussed at: https://bugzilla.mozilla.org/show_bug.cgi?id=587523. The best suggestion there is to scrub the path but not the domain from the Referer header.
 * https://bugs.webkit.org/show_bug.cgi?id=51638

=== iii. Accept-Language Header ===

This should be consistent with the value you choose to return from the Javascript Language object, e.g. 'en-us'.

=== iv. Accept Header ===

The entropy provided by an Accept header will depend largely on the language and charsets your browser has decided to support or permit to the user while in fingerprinting-resistance mode.

=== v. HTTP ETags ===

[http://en.wikipedia.org/wiki/HTTP_ETag ETags] can be used as a [http://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags substitute for HTTP cookies] and this use has been [http://samy.pl/evercookie/ demonstrated in practice]. WebKit supports ETags. Implementing the recommendations in Page Cache will mitigate their use by preventing them from persisting across sessions, and even within long-running browser sessions if you're prepared to implement an aggressive cache-clearing policy.

== 12. DOM LocalStorage/DOM SessionStorage/DOM GlobalStorage ==

Private browsing in WebKit denies read and write access to DOM storage since https://bugs.webkit.org/show_bug.cgi?id=49329.

== 13. TLS/SSL Session IDs == #SessionIDs

WebKit does not implement your TLS/SSL network connections, but if you are offering a tracking-resistant mode to users you will need to ensure that you keep a separate TLS session cache for tracking-resistant mode. You need to avoid the situation where a user can go to https://gmail.com in ordinary mode, open a window in tracking-resistant mode, go to https://gmail.com and use the same TLS Session ID from ordinary mode to resume that TLS session. For more information see:
 * http://code.google.com/p/chromium/issues/detail?id=30877

== 14. TLS/SSL Client Certificates ==

As with [#SessionIDs Session IDs], WebKit is not responsible for your SSL stack. But you will need to ensure that you keep a separate certificate store for use in tracking-resistant mode. See also:
 * http://code.google.com/p/chromium/issues/detail?id=47129

== 15. GeoLocation ==

You will, um, need to ensure you disable geolocation if it is supported by your chosen WebKit port.

= Further Reading =

 * http://panopticlick.eff.org/browser-uniqueness.pdf
 * https://www.torproject.org/torbutton/design/#FirefoxBugs
 * http://browserspy.dk
 * http://blog.torproject.org/blog/firefox-private-browsing-mode-torbutton-and-fingerprinting
 * http://www.collinjackson.com/research/private-browsing.pdf
 * https://wiki.mozilla.org/Security/Anonymous_Browsing
 * https://wiki.mozilla.org/Security/Fingerprinting
 * http://samy.pl/evercookie/
 * http://dbaron.org/mozilla/visited-privacy
 * https://wiki.mozilla.org/Thirdparty
 * http://lists.macosforge.org/pipermail/webkit-dev/2009-May/007788.html
 * http://flippingtypical.com