Some notes about how different browsers implement their disk cache
Konqueror
kdelibs/kioslave/http/http_cache_cleaner.cpp
kdelibs/kioslave/http/http.cpp
The cache is stored in "~/.kde/cache-katherline/http/[0-9a-z]/" ("katherline" here is the machine's hostname).
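The top-level directories are named after the first letter of the host (judging by the example below, where www.reddit.com lands in r/, a leading "www." is skipped). This layout is a historical holdover from when processing a lot of files in one directory was harder.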
Each file is stored by itself. So the file
  http://www.reddit.com/static/aupmod.png
goes in:
  r/www.reddit.com_static_aupmod.png_2cd5ba49
File name format: "host"_"file"_"fullUrlHash"
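The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not the kdelibs code: the function names are made up, the "skip a leading www." rule and the "0" fallback are assumptions inferred from the example, and the hash is passed in as a parameter because these notes do not say how fullUrlHash is computed.

```python
from urllib.parse import urlsplit


def bucket_for_host(host: str) -> str:
    """Pick the single-character top-level directory for a host.

    Assumption: a leading "www." is skipped, inferred from the example
    where www.reddit.com is stored under r/. The "0" fallback for hosts
    with no usable character is also an assumption.
    """
    name = host.lower()
    if name.startswith("www."):
        name = name[4:]
    for ch in name:
        if ch.isascii() and ch.isalnum():
            return ch
    return "0"


def cache_entry_name(url: str, url_hash: str) -> str:
    """Build "<bucket>/<host>_<file>_<fullUrlHash>" for a URL.

    url_hash is supplied by the caller; the real hash function is not
    described in these notes, so it is not reimplemented here.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Path separators become underscores: /static/aupmod.png -> _static_aupmod.png
    flat_path = parts.path.replace("/", "_")
    return f"{bucket_for_host(host)}/{host}{flat_path}_{url_hash}"


if __name__ == "__main__":
    print(cache_entry_name("http://www.reddit.com/static/aupmod.png", "2cd5ba49"))
```

Taking the hash as an argument keeps the sketch honest about what is known: only the layout of the name is reproduced, not the hashing.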
Each file contains the following (minus the text before each ':', so the first line is just '7'):
  Version: 7
  url: http://www.reddit.com/static/aupmod.png
  Creation date: 1213180806
  Expire date: 1213765446
  ETag: 1207334405.0-334
  Last Modified: Fri, 04 Apr 2008 18:40:05 GMT
  File: <all contents>
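A reader for that layout might look like the sketch below. It assumes exactly the six header values listed above, one per line and newline-terminated, followed by the raw file contents; the real format may carry more fields, and the CacheEntry and read_cache_entry names are invented for this example.

```python
import io
from dataclasses import dataclass


@dataclass
class CacheEntry:
    version: int
    url: str
    creation_date: int   # seconds since the Unix epoch
    expire_date: int     # seconds since the Unix epoch
    etag: str
    last_modified: str   # kept as the raw HTTP date string
    body: bytes


def read_cache_entry(stream) -> CacheEntry:
    """Read one cache entry from a binary stream.

    Assumption: six header lines in the order given in the notes, then
    the cached file contents until end of stream.
    """
    def next_line() -> str:
        return stream.readline().decode("latin-1").rstrip("\r\n")

    version = int(next_line())
    url = next_line()
    creation_date = int(next_line())
    expire_date = int(next_line())
    etag = next_line()
    last_modified = next_line()
    body = stream.read()  # everything after the header is the file itself
    return CacheEntry(version, url, creation_date, expire_date,
                      etag, last_modified, body)


if __name__ == "__main__":
    sample = (
        b"7\n"
        b"http://www.reddit.com/static/aupmod.png\n"
        b"1213180806\n"
        b"1213765446\n"
        b"1207334405.0-334\n"
        b"Fri, 04 Apr 2008 18:40:05 GMT\n"
        b"<all contents>"
    )
    entry = read_cache_entry(io.BytesIO(sample))
    print(entry.url, entry.expire_date - entry.creation_date)
```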
In KDE 4 the file is gzip'd for 90% savings.
Also, r/www.reddit.com_static_aupmod.png_2cd5ba49_freq contains the number of times the URL has been requested. It is updated in a fashion that is not lock safe and has a bunch of problems. Not something I would recommend copying.
A separate application goes through the cache and cleans it, removing the oldest entries first.
A very simple system.
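An oldest-first cleaner of the kind described above could be sketched as follows. This is not the logic of http_cache_cleaner.cpp: "oldest" is taken to mean the file's modification time, and the max_bytes size limit is a hypothetical parameter chosen for the example.

```python
from pathlib import Path


def clean_cache(cache_dir: str, max_bytes: int) -> list:
    """Delete the oldest files under cache_dir until it fits in max_bytes.

    Assumptions: age is the file modification time, and the stopping
    rule is a total-size limit. Returns the removed paths in order.
    """
    entries = []
    for path in Path(cache_dir).rglob("*"):
        if path.is_file():
            st = path.stat()
            entries.append((st.st_mtime, st.st_size, path))

    entries.sort(key=lambda e: e[0])  # oldest first
    total = sum(size for _, size, _ in entries)

    removed = []
    for _, size, path in entries:
        if total <= max_bytes:
            break
        path.unlink()
        total -= size
        removed.append(path)
    return removed
```

Running this from a separate process, as Konqueror does with its cleaner, keeps eviction out of the request path, at the cost of the cache temporarily exceeding its limit between runs.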
Firefox
mozilla/netwerk/cache/src/*
A detailed article on extracting information from the Firefox cache, which also serves as a good overview: http://www.securityfocus.com/print/infocus/1832
Chrome
http://sites.google.com/a/chromium.org/dev/developers/design-documents/disk-cache