Some notes about how different browsers implement disk cache
Cache is stored in "~/.kde/cache-katherline/http/[0-9a-z]/"
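The top-level directories are named after the first letter of the host (judging by the example below, a leading "www." is skipped, so www.reddit.com lands in r/). The split is historical, from when processing a lot of files in one directory was harder.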
Each file is stored by itself, named "host"_"file"_"fullUrlHash". So the file http://www.reddit.com/static/aupmod.png goes in:
r/www.reddit.com_static_aupmod.png_2cd5ba49
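The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the notes do not say which hash produces "fullUrlHash", so CRC32 is used here purely as a stand-in, and the function name cache_file_name is made up. Skipping a leading "www." for the directory letter is a guess based on the reddit example.

```python
import zlib
from urllib.parse import urlsplit


def cache_file_name(url: str) -> str:
    """Build a relative cache path of the form <letter>/<host>_<file>_<hash>."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: a leading "www." is ignored when picking the directory letter,
    # which is what the www.reddit.com -> r/ example suggests.
    bucket_host = host[4:] if host.startswith("www.") else host
    bucket = bucket_host[:1].lower() or "0"
    # Path separators become underscores: /static/aupmod.png -> _static_aupmod.png
    path = parts.path.replace("/", "_")
    # Assumption: CRC32 is a placeholder, not the hash the real cache uses.
    url_hash = format(zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF, "08x")
    return f"{bucket}/{host}{path}_{url_hash}"


name = cache_file_name("http://www.reddit.com/static/aupmod.png")
print(name)
```

The printed name has the same shape as the example above, but the trailing hash will differ from 2cd5ba49 because the hash function is a placeholder.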
Each file contains the following (minus the text before each ':', so the first line is just '7'):
Version: 7
url: http://www.reddit.com/static/aupmod.png
Creation date: 1213180806
Expire date: 1213765446
ETag: 1207334405.0-334
Last Modified: Fri, 04 Apr 2008 18:40:05 GMT
File: <all contents>
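Reading such an entry back could look roughly like the sketch below. It assumes each of the six header fields sits on its own newline-terminated line and that the raw file contents start right after the Last Modified line; the CacheEntry and parse_cache_entry names are made up for illustration, and the KDE 4 gzip step is not handled.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    version: int
    url: str
    created: int        # Creation date, seconds since the epoch
    expires: int        # Expire date, seconds since the epoch
    etag: str
    last_modified: str
    body: bytes         # the cached file contents


def parse_cache_entry(data: bytes) -> CacheEntry:
    # Split off exactly six header lines; the remainder is the body,
    # which may itself contain newlines.
    fields = data.split(b"\n", 6)
    if len(fields) != 7:
        raise ValueError("truncated cache entry")
    version, url, created, expires, etag, last_modified, body = fields
    return CacheEntry(
        version=int(version.decode("ascii")),
        url=url.decode("utf-8"),
        created=int(created.decode("ascii")),
        expires=int(expires.decode("ascii")),
        etag=etag.decode("utf-8"),
        last_modified=last_modified.decode("utf-8"),
        body=body,
    )


sample = (
    b"7\n"
    b"http://www.reddit.com/static/aupmod.png\n"
    b"1213180806\n"
    b"1213765446\n"
    b"1207334405.0-334\n"
    b"Fri, 04 Apr 2008 18:40:05 GMT\n"
    b"\x89PNG\r\n\x1a\n"
)
entry = parse_cache_entry(sample)
print(entry.url, entry.expires - entry.created)
```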
In KDE 4 the file is gzip'd for 90% savings.
Also, r/www.reddit.com_static_aupmod.png_2cd5ba49_freq contains the number of times the URL has been requested. It is updated in a fashion that is not lock safe and has a bunch of problems. Not something I would recommend copying.
A separate application goes through the cache and cleans it, removing the oldest entries first.
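A cleaner along those lines might look like the sketch below. The notes only say "oldest first", so several things here are assumptions: age is taken from the file modification time, the stopping rule is a size budget (max_bytes), and the function name clean_cache is made up. It is not a description of the actual KDE cleaner.

```python
import os
from typing import List


def clean_cache(cache_dir: str, max_bytes: int) -> List[str]:
    """Delete the oldest files under cache_dir until the total size fits max_bytes."""
    entries = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))

    total = sum(size for _mtime, size, _path in entries)
    removed = []
    # Sorting the tuples orders them by modification time, oldest first.
    for _mtime, size, path in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Calling clean_cache on the cache directory with a budget, for example clean_cache(path, 50 * 1024 * 1024), returns the list of paths it deleted.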
A very simple system.
A detailed article on extracting information from the Firefox cache, and a good overview: http://www.securityfocus.com/print/infocus/1832