Changes between Initial Version and Version 1 of Fingerprinting


Timestamp:
Dec 28, 2010, 2:11:50 PM (14 years ago)
Author:
robert@roberthogan.net

     1= What is Fingerprinting? =
     2[[PageOutline]]
     3
     4This page describes the mechanisms WebKit offers to ports and clients
     5interested in reducing the ability of websites to identify users and track
     6their behaviour without obtaining consent.
     7
     8Use cases for fingerprinting include:
     9  * Sites attempting to identify users on devices previously used for fraud
     10  * Sites attempting to establish a unique visitor count
     11  * Advertising networks attempting to establish a unique click-through count
     12  * Advertising networks attempting to profile users to increase ad relevance
     13  * Sites attempting to profile the behaviour of unregistered users
     14  * Sites attempting to link the visits of users when they are both registered
     15    and unregistered and identify the user when visiting the site without
     16    authenticating.
     17
     18To evade fingerprinting, users will often disable or clear HTTP cookies
     19or change their IP address between visits. Research has demonstrated that
     20these measures are not sufficient and that, in order to evade tracking by
     21third parties, a wide variety of technologies implemented in the browser must
     22be considered.
     23
     24
     25== Entropy and Fingerprinting ==
     26Entropy measures the amount of uniqueness that a specific property exposed by the
     27browser (such as the User-Agent header) introduces into a browser fingerprint.
     28It's usually expressed in bits. For example, Peter Eckersley's
     29[http://panopticlick.eff.org/browser-uniqueness.pdf 'Panopticlick' study for the EFF]
     30finds that the User-Agent
     31header provides 10.0 bits of entropy. Since 2^10^ == 1024, that means only 1 in
     321024 random browsers visiting a site is expected to share the same
     33User-Agent header.
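
The arithmetic above can be sketched in a few lines of C++. These helpers are purely illustrative and are not WebKit API; the names `oneInN` and `bitsFromShare` are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only; nothing here is WebKit API.

// Size of the crowd in which one browser is expected to share a value
// that carries `bits` bits of entropy: 2^bits.
double oneInN(double bits)
{
    return std::exp2(bits);
}

// Inverse: bits of entropy carried by a value that is seen in a fraction
// `share` of browsers (0 < share <= 1): -log2(share).
double bitsFromShare(double share)
{
    assert(share > 0.0 && share <= 1.0);
    return -std::log2(share);
}
```

Here `oneInN(10.0)` evaluates to 1024, matching the User-Agent figure quoted above.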
     34
     35== Why this isn't about 'Private Browsing' ==
     36There are a number of things WebCore could do differently to reduce the user's
     37exposure to remote tracking mechanisms but which it is not necessarily
     38desirable to implement as default behaviour. At first glance, some of these
     39appear a good fit with 'Private Browsing' but really they're addressing a
     40different problem. Private browsing's overriding objective is to ensure no
     41trace of a browsing session is left on your disk.
     42Fingerprinting a user's browser and tracking that browser's visits presents a
     43different set of challenges.
     44
     45== Private Browsing in WebKit ==
     46
     47WebKit defines private browsing as:
     48{{{
     49// When this option is set, WebCore will avoid storing any record of browsing
     50// activity
     51// that may persist on disk or remain displayed when the option is reset.
     52// This option does not affect the storage of such information in RAM.
     53// The following functions respect this setting:
     54//  - HTML5/DOM Storage
     55//  - Icon Database
     56//  - Console Messages
     57//  - Cache
     58//  - Application Cache
     59//  - Back/Forward Page History
     60//  - Page Search Results
     61}}}
     62
     63Some of the items discussed in this page are already taken care of by private
     64browsing
     65in WebKit - so if you are trying to defend your WebKit-based browser against
     66fingerprinting you will at the very least need to enable private browsing
     67through your WebKit port's API.
     68
     69== Private Browsing in WebKit and your browser ==
     70
     71Be aware that not all aspects of 'private browsing' are implemented within
     72WebKit.
     73WebKit ports offer different APIs to clients. Some functionality relevant to
     74mitigating browser fingerprinting is left entirely to the client. For example,
     75QtWebKit does not implement an HTTP cookie store or page cache: the
     76client is responsible for managing both.
     77
     78You need to pay close attention to the limitations and capabilities of your
     79chosen port's WebKit API - in particular the aspects of cookie and cache
     80management you are expected to implement yourself.
     81
     82== Things that aren't 'Private Browsing' but are 'Fingerprinting' ==
     83
     84=== 1. Session Isolation ===
     85If you have implemented an anti-fingerprinting mode you don't want a website
     86to access information from the browser's normal mode - doing so might reveal
     87information that allows the site to fingerprint your user or browser. This is
     88not a property of private browsing - which is only interested in preventing
     89data from the private session leaving any traces on the user's disk.
     90
     91=== 2. Session Persistence ===
     92An accidental property of private browsing is that HTTP Cookies and
     93page caches generated in one private browsing session are not available in any
     94subsequent or simultaneous browsing sessions because they are not stored to
     95disk. This property of non-persistence is an important counter-measure against
     96fingerprinting. However, where sites use a browser feature as a side channel
     97for simulating the behaviour of HTTP cookies you may not be able to rely on
     98private browsing to take care of this for you.
     99
     100=== 3. Long-Running Sessions ===
     101At the moment private browsing happens to take care of some aspects of session
     102isolation and persistence for you. But a browser session that is never closed,
     103on a machine that is never turned off, will become trackable over time. This
     104means you will need to decide on a way of managing long-running
     105browser sessions and cannot rely on any side-effect benefits from private
     106browsing in WebKit. The obvious way of managing this problem is to implement a
     107periodic cleardown of cookies and page cache, as well as any other
     108side-channel cookies such as 'window.name'.
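
A periodic cleardown could be driven by something like the sketch below. It is a minimal illustration and not WebKit API: the class name `CleardownSchedule` is hypothetical, and it only decides when a cleardown is due. The actual clearing of cookies, cache and window.name is left to whatever your port's API provides.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: tracks when a long-running session is due for a
// cleardown. The caller performs the clearing itself.
class CleardownSchedule {
public:
    using Clock = std::chrono::steady_clock;

    CleardownSchedule(Clock::duration interval, Clock::time_point start)
        : m_interval(interval)
        , m_lastCleardown(start)
    {
        assert(interval > Clock::duration::zero());
    }

    // Returns true (and restarts the interval) once `interval` has elapsed
    // since the last cleardown; otherwise returns false.
    bool isDue(Clock::time_point now)
    {
        if (now - m_lastCleardown < m_interval)
            return false;
        m_lastCleardown = now;
        return true;
    }

private:
    Clock::duration m_interval;
    Clock::time_point m_lastCleardown;
};
```

A client would typically call `isDue(CleardownSchedule::Clock::now())` from an existing timer or event-loop tick and clear its stores whenever it returns true. A monotonic clock is used so that changes to the system time do not postpone a cleardown.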
     109
     110== Creating a Common Fingerprint ==
     111This page is premised on the notion that the best way to mitigate
     112browser tracking is to implement a fingerprint for your browser that all users
     113of your browser will share. The larger your user base the more rewarding this
     114policy becomes. Even with a relatively small userbase this approach is still
     115useful as long as you can close off as many sources of client entropy as
     116possible. A browser with a small userbase is inherently more vulnerable to
     117tracking if it leaks entropy than a browser with a large userbase suffering
     118from a similar problem.
     119
     120== Creating a Dynamic Fingerprint ==
     121Not much attention has been paid to approaches that suggest creating a
     122constantly mutating browser fingerprint. Common sense suggests it would be
     123hard to implement and even harder to get right.
     124
     125= Creating a Static and Common Fingerprint for your WebKit Browser =
     126
     127== 1. JavaScript Objects ==
     128
     129JavaScript exposes a lot of entropy-rich information to websites through the
     130Screen object, the Window object, the Navigator object, the Document object,
     131and even the Date object.
     132
     133WebKit-based browsers need to decide on a strategy for reducing the variety of
     134information these objects leak to websites. The most common approach is to
     135decide on a pre-determined set of values that the browser will always use in
     136fingerprint-resistant mode - this allows users of the browser in that mode to
     137be indistinguishable from each other based on information from the JS objects
     138alone.
     139
     140If you are building a tracking-resistant mode for your WebKit-based browser
     141you will need to consider managing at least the following values:
     142
     143=== i. Document Object ===
     144
     145The document.referrer property needs to be managed in the same way as
     146[Referer Header and Origin Header] below.
     147
     148=== ii. History Object ===
     149
     150history.length::
     151This value has potential, in cases where it is unusually high due to prolonged
     152use of a single browser/tab session, to assist sites in tracking the
     153user. That said, such users already have a pretty revealing cache and
     154cookie data set.
     155
     156=== iii. Window object ===
     157
     158You have two choices here:
     159- restrict the layout of the actual browser window to three or four
     160predetermined sizes, and return those.
     161- return values that do not reflect the real size of the browser window.
     162
     163The properties you need to override are at least:
     164{{{
     165outerHeight()
     166outerWidth()
     167innerHeight()
     168innerWidth()
     169screenX()
     170screenY()
     171scrollX()
     172scrollY()
     173}}}
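
The first choice (a handful of predetermined window sizes) might look like the following sketch. The bucket values and the names `WindowSize`, `kBuckets` and `snapToBucket` are placeholders chosen for illustration; they are not taken from WebKit or from any existing browser.

```cpp
#include <array>

struct WindowSize {
    int width;
    int height;
};

// Hypothetical bucket list, smallest first. The values are placeholders.
static const std::array<WindowSize, 3> kBuckets = {{
    {800, 600}, {1000, 700}, {1200, 900}
}};

// Returns the largest bucket that fits inside the available area, falling
// back to the smallest bucket when none fits.
WindowSize snapToBucket(int availableWidth, int availableHeight)
{
    WindowSize chosen = kBuckets[0];
    for (const WindowSize& bucket : kBuckets) {
        if (bucket.width <= availableWidth && bucket.height <= availableHeight)
            chosen = bucket;
    }
    return chosen;
}
```

The browser would then lay out its window at the chosen size and report that size through the properties listed above, so every user falls into one of a few groups.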
     174
     175=== iv. Window.name ===
     176
     177This deserves a special mention. Window.name is cross-domain and persists
     178across page loads. Not surprisingly, many sites use it as a cookie. Since it
     179does not persist across sessions it's not as much of a problem as an HTTP
     180cookie but it does allow tracking within sessions - and this is a worry if the
     181session is long-running.
     182
     183There is room here for WebKit to restrict cross-domain access to window.name
     184but you are probably better off managing the value in this field yourself with
     185the JS API. The chances are you will break the user's experience on at least
     186some websites.
     187
     188=== v. Screen object ===
     189
     190Torbutton and Torora use the following values for the Screen object's
     191properties:
     192
     193{{{
     194height()      = window innerHeight()
     195width()       = width()  rounded to the nearest 50 px
     196colorDepth()  = 24
     197pixelDepth()  = 24
     198availLeft()   = 0
     199availTop()    = 0
     200availHeight() = window innerHeight()
     201availWidth()  = window innerWidth()
     202}}}
     203
     204Entropy for the values provided here is as much as 4.83 bits.
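
As a rough sketch, the table above could be computed from the window's inner size as follows. The struct and function names are hypothetical, and the code assumes that "width()" in the table refers to the window's inner width before rounding; check the Torbutton or Torora sources before relying on that reading.

```cpp
#include <cassert>

// Hypothetical container for the values reported by the Screen object.
struct SpoofedScreen {
    int width;
    int height;
    int colorDepth;
    int pixelDepth;
    int availLeft;
    int availTop;
    int availWidth;
    int availHeight;
};

// Rounds a non-negative pixel count to the nearest multiple of 50
// (exact ties round up).
int roundToNearest50(int px)
{
    assert(px >= 0);
    return ((px + 25) / 50) * 50;
}

SpoofedScreen spoofScreen(int innerWidth, int innerHeight)
{
    SpoofedScreen screen;
    screen.width = roundToNearest50(innerWidth); // assumption: inner width is the input
    screen.height = innerHeight;
    screen.colorDepth = 24;
    screen.pixelDepth = 24;
    screen.availLeft = 0;
    screen.availTop = 0;
    screen.availWidth = innerWidth;
    screen.availHeight = innerHeight;
    return screen;
}
```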
     205
     206===  vi. Navigator Object ===
     207You will need to look closely at all the values exposed by the Navigator
     208object and decide on a set of values that can remain static across as many
     209releases as possible. You will also need to ensure that the values
     210decided upon here are also presented in the user-agent HTTP header by your
     211browser.
     212
     213You will need to decide what to do with navigator.plugins. Internet Explorer
     214does not return anything to navigator.plugins, so websites tend not to rely on
     215it - making it relatively safe to follow IE's lead. If you do expose
     216a list of plugins through this property you will need to ensure your decision
     217is consistent with the behaviour you have implemented under Plugins below.
     218
     219=== vii. Date Object ===
     220
     221==== a. Timezone ====
     222
     223You need to decide what timezone your browser is in when in
     224fingerprinting-resistance mode - a common choice is UTC. Rather than attempt
     225to override the Date object you should set the local timezone to UTC by setting
     226the environment variable TZ as follows:
     227{{{
     228setenv("TZ", ":UTC", 1);
     229}}}
     230This will ensure that WebKit always returns the local time as UTC when the
     231Date object is queried.
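
A slightly fuller, POSIX-only sketch of the same idea is shown below. The function names are hypothetical. The explicit tzset() call is an addition that is not in the snippet above; it makes sure the C library re-reads TZ before anything caches the old timezone.

```cpp
#include <stdlib.h> // setenv (POSIX)
#include <time.h>   // tzset, localtime_r (POSIX)

// Sets TZ to UTC for this process and re-reads it. Returns true on success.
// Call this early, before other code has queried the local time.
bool forceUtcTimezone()
{
    if (setenv("TZ", ":UTC", 1) != 0)
        return false;
    tzset();
    return true;
}

// Checking helper: the local-time hour at the Unix epoch. This is 0 when
// the process's local timezone is UTC.
int localHourAtEpoch()
{
    time_t epoch = 0;
    struct tm local;
    localtime_r(&epoch, &local);
    return local.tm_hour;
}
```

Note that this changes the timezone for the whole process, including any non-WebKit code that formats local times for the user.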
     232
     233==== b. Timing Users ====
     234Believe it or not, there is evidence that some companies have tracked users
     235based on the typing cadence gleaned from querying the Date object:
     236
     237http://arstechnica.com/tech-policy/news/2010/02/firm-uses-typing-cadence-to-finger-unauthorized-users.ars
     238
     239Currently, WebKit does not offer any means of countering this.
     240
     241=== viii. Language Object ===
     242
     243You need to decide on a consistent set of values for your browser in
     244fingerprinting-resistance mode. For example:
     245
     246{{{
     247charset = 'iso-8859-1,*,utf-8'
     248language = 'en-us, en'
     249locale = 'en-US'
     250}}}
     251
     252Your decision should be consistent with your implementation of the
     253Accept-Language header.
     254
     255
     256== 2. Overriding JavaScript Objects ==
     257
     258WebCore does allow WebKit ports to override the values of JS objects. Most
     259WebKit ports expose this API to browser clients.
     260
     261=== i. Overriding JavaScript Objects - Qt ===
     262
     263You can use the function below to override existing JS objects with a Qt
     264object. Any functions called on the JS object will call the equivalently named
     265function in your Qt object:
     266{{{
     267void QWebFrame::addToJavaScriptWindowObject ( const QString & name, QObject *object )
     268}}}
     269For an example of this in practice see:
     270
     271  * https://github.com/mwenge/torora/commit/5191dca4d5df08514f21e68472cacb6cacd2eb06
     272
     273In order to override specific properties of a JS object in Qt see:
     274
     275  * https://bugs.webkit.org/show_bug.cgi?id=46566
     276
     277== 3. CSS ==
     278
     279===  i. CSS Media Queries ===
     280
     281The same information that can be collected from the Screen and Window objects
     282can also be collected via CSS Media Queries.
     283
     284WebKit currently does not offer a means of countering this.
     285
     286  * https://bugs.webkit.org/show_bug.cgi?id=50895
     287
     288=== ii. CSS Fonts ===
     289
     290CSS rules may be used to inspect locally available fonts. A working example of
     291this 'font introspection' using simple CSS rules can be found at
     292http://flippingtypical.org.
     293
     294WebKit currently does not offer a means of countering this.
     295
     296There needs to be a mechanism for allowing WebKit clients to decide which
     297fonts are locally available when CSS rules are evaluated.
     298
     299See also Fonts.
     300
     301=== iii. Querying Page History with CSS ===
     302
     303There is a well-known attack on the CSS ':visited' selector that allows a CSS
     304stylesheet to determine by brute-force which websites the user has visited.
     305A good example of this in action is available at:
     306  * http://ha.ckers.org/weird/CSS-history.cgi
     307
     308This method can inspect user history across sessions and is deployed by live
     309websites.
     310
     311WebKit has mitigated this attack since the implementation of
     312https://bugs.webkit.org/show_bug.cgi?id=24300, based on the approach outlined
     313in http://dbaron.org/mozilla/visited-privacy.
     314
     315
     316== 4. Plugins and Java Applets ==
     317
     318If you want complete control over the information your browser reveals to
     319websites then you can't let your browser run someone else's
     320executable code. That means plugins and Java applets. That also means a pretty
     321unusable browser by most people's lights.
     322
     323WebKit does not offer you very much in this category. There is no sandboxing
     324of NPAPI plugins - all WebKit ports currently dlopen() the binary blob and let
     325it run with the user's privileges. In the case of a malicious plugin this
     326means relatively unfettered OS access. In the case of well-defined plugins such
     327as Flash there is
     328still scope for collecting a lot of information.
     329
     330=== i. Using the List of Installed Plugins To Build Up A Fingerprint ===
     331
     332As well as isolating users who have an exotic set of installed plugins, a major
     333source of entropy is found in the micro-version information provided
     334by navigator.plugins. There is scope here for WebKit to limit the version
     335information at the request of the client.
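
Limiting the version information could be as simple as truncating the version string before it is exposed. The sketch below is hypothetical (WebKit offers no such hook at the time of writing) and the version string in the usage note is only an example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: keeps only the first `keep` dot-separated components
// of a version string. Strings with fewer components are returned unchanged.
std::string truncateVersion(const std::string& version, int keep)
{
    assert(keep > 0);
    std::string::size_type pos = 0;
    for (int i = 0; i < keep; ++i) {
        pos = version.find('.', pos);
        if (pos == std::string::npos)
            return version; // nothing beyond the requested components
        ++pos; // continue searching after this dot
    }
    return version.substr(0, pos - 1); // drop the dot that follows the kept part
}
```

For example, `truncateVersion("10.1.102.64", 2)` yields "10.1", so users on different micro-versions of the same plugin release become indistinguishable on this value.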
     336
     337Apart from querying navigator.plugins, a page can attempt to instantiate as
     338many plugins as possible and inspect the page's layout to see what the user
     339has installed. There are a few possible countermeasures here of varying
     340efficacy, though none have been tried in practice:
     341
     342  * Limit the number of plugins that can be loaded on a single page
     343  * Prevent zero-size plugins from loading
     344  * Disable plugins completely
     345  * Only load individual plugins at the user's request - e.g. 'Click to Flash'
     346
     347Note that blocking plugins can actually improve the quality of a fingerprint:
     348http://panopticlick.eff.org/browser-uniqueness.pdf finds that the uniqueness
     349of Flash-blocking browsers is over 1 in 400,000.
     350
     351=== ii. Collecting System Fonts via Flash Plugins ===
     352
     353The Adobe Flash API allows Flash applications to obtain a list of all the
     354fonts installed on a system in unsorted order. This is a very rich source of
     355entropy for anyone trying to fingerprint a user. Worse, it is completely
     356outside the control of WebKit and the client browser.
     357
     358Short of running your browser in a chroot'ed jail there is nothing you can do
     359to prevent Flash inspecting your file system for the fonts folder and
     360supplying the list of fonts installed there to swf objects that request
     361it.
     362
     363This is a problem you will have to address at application-level for now. There
     364are no open bugs for introducing plugin-sandboxing (i.e. intercepting a plugin's
     365system calls) in WebKit at the time of writing.
     366
     367=== iii. Flash Cookies (Local Shared Objects) ===
     368
     369You should read (it's
     370short): http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1446862
     371
     372From the paper:
     373  "Flash data is stored in a different folder on different computing platforms.
     374  For instance, on an Apple, Flash local shared objects (labeled .sol) are
     375  stored at: /users/[username]/Library/Preferences/Macromedia/Flash Player/
     376  On a Windows computer, they are stored at:
     377  \Documents and Settings\[username]\Application Data\Macromedia
     378  \Flash Player
     379  Several subdirectories may reside at that location:
     380  “#SharedObjects” contains the actual Flash cookies and subdirectories under
     381  “Macromedia.com” contains persistent global and domain-specific settings for
     382  how the Flash player operates. As such, there will be a subdirectory for each
     383  Flash-enabled domain a user visits under the “Macromedia.com” settings
     384  folder.
     385  This has privacy implications .."
     386
     387So Flash LSOs are used by a number of well-known sites to regenerate or respawn
     388HTTP cookies the user has already deleted.
     389
     390Fortunately, Adobe Flash has supported private browsing since Flash
     391Player 10.1 and does not store Flash cookies when in private browsing mode.
     392
     393
     394== 5. Silverlight and ActiveX ==
     395TBC
     396
     397== 6. Fonts ==
     398
     399A site may render text in a number of different fonts and then measure the
     400rendered dimensions (e.g. offsetWidth) to determine which fonts were available.
     401
     402WebKit currently does not offer a means of countering this.
     403
     404There needs to be a mechanism for allowing WebKit clients to decide which
     405fonts are locally available when the page is rendered.
     406
     407
     408== 7. Cookies ==
     409
     410You can (i) disable these completely, (ii) clear them every time a new
     411browsing session starts, or (iii) clear them every N minutes/hours. Since
     412Private Browsing already takes care of (ii), you may decide that (i) and (iii)
     413are not worth the trouble. Whatever you decide, you need to be consistent with
     414your implementation of the Page Cache.
     415
     416== 8. Third Party Cookies ==
     417
     418If you are clearing all cookies periodically then third-party cookies are not
     419something you need to worry about any more than their first-party cousins.
     420
     421The default behaviour of third-party cookies is the subject of a lot of
     422interoperability issues between browsers. Firefox has a proposal in active
     423development and there has been some good discussion in at least one WebKit
     424bug.
     425
     426Most WebKit ports offer you the possibility of managing third-party cookies
     427however you choose, and the default behaviour between WebKit ports often
     428differs - Safari is the most restrictive as it does not allow 3rd parties to
     429set new cookies, though they can update existing ones.
     430
     431  * https://bugs.webkit.org/show_bug.cgi?id=35824
     432  * https://wiki.mozilla.org/Thirdparty
     433  * https://bugzilla.mozilla.org/show_bug.cgi?id=565965
     434
     435Qt does not offer a means of identifying third party cookies yet:
     436  * https://bugs.webkit.org/show_bug.cgi?id=45455
     437
     438== 9. Page Cache ==
     439
     440The [http://samy.pl/evercookie/ evercookie] is an excellent practical
     441demonstration of how a website can
     442inspect the browser's cache to determine if the user has visited the site
     443before. So if your browser has disabled cookies completely or disables cookie
     444storage across sessions or over long periods of time, you will need to treat
     445the page cache in the same way.
     446
     447== 10. HTTP Headers ==
     448
     449You will need to decide what to do with the Referer header, the Origin header,
     450the Accept header, and the Accept-Language header.
     451
     452=== i. User-Agent Header ===
     453
     454Whatever decision you make about the User-Agent header, be prepared to stick
     455with the values you set initially for as long as possible.
     456
     457=== ii. Referer Header and Origin Header ===
     458
     459Manipulating these headers isn't strictly a fingerprinting-resistance
     460requirement, however they do leak information about the user's browsing
     461history. Manipulating them can break website behaviour and may even get your
     462browser blacklisted by certain sites.
     463
     464Possible countermeasures in the case of Firefox are discussed at:
     465https://bugzilla.mozilla.org/show_bug.cgi?id=587523. The best suggestion
     466there is to scrub the path but not the domain from the Referer header.
     467
     468  * https://bugs.webkit.org/show_bug.cgi?id=51638
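
Scrubbing the path while keeping the domain can be sketched as below. This is an illustrative string-level helper with a hypothetical name, not WebKit API and not a full URL parser. Where you would apply it depends on how your port lets you intercept outgoing requests.

```cpp
#include <string>

// Returns `url` with everything after the host (path, query and fragment)
// replaced by "/". Strings that do not contain "scheme://" are returned
// unchanged.
std::string scrubRefererPath(const std::string& url)
{
    const std::string::size_type schemeEnd = url.find("://");
    if (schemeEnd == std::string::npos)
        return url;
    const std::string::size_type hostStart = schemeEnd + 3;
    const std::string::size_type hostEnd = url.find_first_of("/?#", hostStart);
    return url.substr(0, hostEnd) + "/";
}
```

For example, `scrubRefererPath("https://example.com/a/b?q=1")` yields "https://example.com/". The same function could be applied to the value returned by document.referrer so that the header and the JS property stay consistent.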
     469
     470=== iii. Accept-Language Header ===
     471
     472This should be consistent with the value you choose to return from
     473the JavaScript Language object, e.g. 'en-us'.
     474
     475=== iv. Accept Header ===
     476
     477The entropy provided by an Accept header will depend largely on the language
     478and charsets your browser has decided to support or permit to the user while
     479in fingerprinting-resistance mode.
     480
     481=== v. HTTP ETags ===
     482[http://en.wikipedia.org/wiki/HTTP_ETag ETags] can be used as a [http://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags substitute for HTTP cookies] and
     483this use has been [http://samy.pl/evercookie/ demonstrated in practice].
     484
     485WebKit supports ETags. Implementing the recommendations in Page Cache will
     486mitigate against their use by preventing them from persisting across sessions
     487and even long-running browser sessions if you're prepared to implement an
     488aggressive cache-clearing policy.
     489
     490== 11. DOM LocalStorage/DOM SessionStorage/DOM GlobalStorage ==
     491Private browsing in WebKit has denied read and write access to DOM storage since
     492https://bugs.webkit.org/show_bug.cgi?id=49329.
     493
     494== 12. GeoLocation ==
     495
     496You will need to ensure you disable geolocation if it is supported by your
     497chosen WebKit port.
     498
     499= Further Reading =
     500
     501  * http://panopticlick.eff.org/browser-uniqueness.pdf
     502  * https://www.torproject.org/torbutton/design/#FirefoxBugs
     503  * http://browserspy.dk
     504  * http://blog.torproject.org/blog/firefox-private-browsing-mode-torbutton-and-fingerprinting
     506  * http://www.collinjackson.com/research/private-browsing.pdf
     507  * https://wiki.mozilla.org/Security/Anonymous_Browsing
     508  * https://wiki.mozilla.org/Security/Fingerprinting
     509  * http://samy.pl/evercookie/
     510  * http://dbaron.org/mozilla/visited-privacy
     511  * https://wiki.mozilla.org/Thirdparty
     512  * http://lists.macosforge.org/pipermail/webkit-dev/2009-May/007788.html
     513  * http://flippingtypical.org