Changes between Version 6 and Version 7 of QtWebKitMirrorGuide


Timestamp:
Jan 16, 2010 12:51:41 AM (14 years ago)
Author:
zecke@selfish.org
Comment:

Mention the manipulate-content.py script for post-processing.

  • QtWebKitMirrorGuide

    v6 v7  
    3939
    4040=== Generate more stable loading times ===
    41 Some webpages use Math.random() to randomize which content is displayed. In the case of Wikipedia this can be one of the various announcements; in the case of the apple.com website it is the image displayed on the front page and the advertisement query. The problem with this randomisation is that the loading time may vary from page view to page view, so the resulting loading time is not stable. One way to deal with it is to replace all calls to Math.random() with a constant.
     41Some webpages use Math.random() or the current date to randomize which content is displayed. In the case of Wikipedia this can be one of the various announcements; in the case of the apple.com website it is the image displayed on the front page and the advertisement query. The problem with this randomisation is that the loading time may vary from page view to page view, so the resulting loading time is not stable. One way to deal with it is to replace all calls to Math.random() with a constant.
    4242
    43 The mirror utility contains a script called `store_all.py` which will store all files from the db to disk. This allows searching for Math.random() and replacing it with a constant. Using the `put_file.py` utility one can put a new version into the DB. You will have to remove the URL from the top of the file and use it as the argument to `put_file.py`. Once you are done you will need to update the headers of the table; this can be done by invoking the `update_content_length.py` script.
     43The `manipulate-content.py` script was added to remove these random sources and replace them with constants. Currently Math.random() and new Date().getTime() are replaced with constants in the db.
    4444
    4545Another similar source of trouble comes from pages that use the current date to fetch resources, or that fetch different resources depending on the user agent. Currently there are no hints on how to deal with that. The problem might be that on different dates or on different platforms a 404 is returned instead of the real content, which makes a comparison hard.
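
The replacement of Math.random() and new Date().getTime() with constants described above can be sketched roughly as follows. This is not the actual `manipulate-content.py` code: the function name `freeze_random_sources`, the regular expressions and both constant values are assumptions made for illustration, and the database access is left out entirely.

```python
import re

# Hypothetical constants: the guide does not say which values
# manipulate-content.py actually writes into the db.
RANDOM_CONSTANT = "0.5"
TIME_CONSTANT = "1263600000000"

# Match the two call forms named in the guide, tolerating extra whitespace.
RANDOM_CALL = re.compile(r"Math\s*\.\s*random\s*\(\s*\)")
GET_TIME_CALL = re.compile(r"new\s+Date\s*\(\s*\)\s*\.\s*getTime\s*\(\s*\)")


def freeze_random_sources(script_text):
    """Return script_text with both call forms replaced by fixed literals."""
    script_text = RANDOM_CALL.sub(RANDOM_CONSTANT, script_text)
    return GET_TIME_CALL.sub(TIME_CONSTANT, script_text)
```

A purely textual replacement like this only catches the literal call forms; other ways of reading the clock, such as `Date.now()`, would need their own patterns. After rewriting stored content, the stored content length presumably has to be updated as well, since the replaced text changes the size of the file.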