Changes between Version 5 and Version 6 of QtWebKitMirrorGuide


Timestamp:
Nov 26, 2009 4:44:12 AM (10 years ago)
Author:
zecke@selfish.org
Comment:

Mention the Math.random() problem with the test content.

  • QtWebKitMirrorGuide

    v5  v6
    12  12  Example of building and running:
    13  13  {{{
    14      $ cd mirror
        14  $ cd host-tools/mirror
    15  15  $ qmake
    16  16  $ make
     
    38  38   5. ''crawl_db.db'' now contains a copy of gmail.com. It can be served with the '''http_server'''
    39  39
        40  === Generate more stable loading times ===
        41  Some web pages use Math.random() to choose which content is displayed. On Wikipedia this is one of the various announcements; on the apple.com website it is the image shown on the front page and the advertisement query. Because of this randomisation the loading time varies from page view to page view, so the measured loading time is not stable. One way to deal with it is to replace all calls to Math.random() with a constant.
        42
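As a rough illustration of replacing Math.random() with a constant, the following Python sketch rewrites direct calls in a piece of JavaScript source. The function name `freeze_math_random` and the default constant `0.5` are assumptions made for this example; they are not part of the mirror utility. The sketch only catches literal `Math.random()` calls and will miss aliased uses such as `var r = Math.random; r();`.

```python
import re

# Matches a direct call such as "Math.random()" or "Math.random ( )".
# Hypothetical helper for illustration; not part of the mirror tools.
MATH_RANDOM_CALL = re.compile(r"\bMath\.random\s*\(\s*\)")


def freeze_math_random(source, constant="0.5"):
    """Return `source` with every direct Math.random() call replaced by `constant`."""
    return MATH_RANDOM_CALL.sub(constant, source)


if __name__ == "__main__":
    snippet = "var i = Math.floor(Math.random() * 3);"
    print(freeze_math_random(snippet))
```

With a fixed value, a page that picks a banner or image by index selects the same one on every load, which is what makes repeated timing runs comparable.
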
        43  The mirror utility contains a script called `store_all.py`, which stores all files from the DB to disk. This makes it possible to search for Math.random() and replace it with a constant. The `put_file.py` utility puts the new version of a file into the DB: remove the URL from the top of the file and pass it as an argument to `put_file.py`. Once you are done, update the headers of the table by invoking the `update_content_length.py` script.
        44
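The search and the URL handling described above could be sketched in Python as follows. This is a minimal sketch and makes two assumptions that the guide does not spell out: that `store_all.py` writes the files below a single directory, and that the URL occupies exactly the first line of each stored file. The helper names `files_using_math_random` and `split_url_header` are hypothetical and do not exist in the mirror tools.

```python
import os
import re

# Matches a direct call such as "Math.random()".
MATH_RANDOM = re.compile(r"Math\.random\s*\(\s*\)")


def files_using_math_random(dump_dir):
    """Return the sorted paths of files below `dump_dir` that call Math.random()."""
    hits = []
    for folder, _subdirs, names in os.walk(dump_dir):
        for name in names:
            path = os.path.join(folder, name)
            # errors="replace" lets binary files (images etc.) be scanned without failing.
            with open(path, encoding="utf-8", errors="replace") as handle:
                if MATH_RANDOM.search(handle.read()):
                    hits.append(path)
    return sorted(hits)


def split_url_header(text):
    """Split a stored file into (url, body), ASSUMING the URL is the first line."""
    url, _newline, body = text.partition("\n")
    return url.strip(), body
```

The URL returned by `split_url_header` is the value to pass to `put_file.py`, and the body is the content to edit. The exact command lines of `put_file.py` and `update_content_length.py` are not shown in this guide, so they are left out here.
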
        45  Another similar source of trouble is content that uses the current date to fetch resources, or that fetches resources depending on the user agent. Currently there are no hints on how to deal with that. The problem might be that on different dates or different platforms a 404 is returned instead of the real content, which makes a comparison hard.
        46
    40  47  === Step through to mirror gmail.com and everything on a screen cast ===
    41  48   * A video on mirroring gmail.com can be seen [http://blip.tv/file/2662874 here].