== Using the mirror application to mirror websites ==

For benchmarking we want to use real web content, but we do not want the results to depend on changing versions of a website (e.g. when pages are generated dynamically) or on varying network bandwidth and latency. The '''mirror''' application stores the downloaded content in a local SQLite3 database, and the '''http_server''' can serve this content.

It can also be used when a user sees a problem that cannot be reproduced locally. In this case the user should use the mirror application to create a copy and forward the database to the developer.

=== Building the mirror application ===

The mirror application is built with qmake and works best with Qt 4.6. Example of building and running:

{{{
$ cd host-tools/mirror
$ qmake
$ make
$ ./mirror -h
./mirror options [url]
    -c cookies.ini   Use the cookies from this file. The cookie file is compatible with Arora.
    -v               Show the WebView when running
    -k               Keep the application running.
}}}

=== Using the mirror application to mirror content ===

The mirror application is able to use the [http://www.arora-browser.org Arora] CookieJar. On a GNU/Linux system this file is normally located in '''$HOME/.local/share/data/Arora/cookies.ini'''. The benefit of using a cookie file is that one can log in to websites like ''gmail.com'' or ''facebook.com'' using Arora and then mirror pages in the logged-in state.

By default the '''mirror''' application loads the page and then exits. The '''-k''' option keeps the application running. This is useful for pages that rely heavily on AJAX and keep loading resources after the initial load has finished. This option was used on the Nokia benchmarking content for the ''gmail.com'' website.

The '''-v''' option makes the QWebView that is used to download the pages visible. This can be used to see which sites got downloaded or to crawl the web manually.
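
The options described above can also be assembled from a script when several pages have to be mirrored. The following is a minimal sketch, not part of the mirror tool: `build_mirror_command` and its parameter names are hypothetical, and only the '''-c''', '''-v''' and '''-k''' flags and the `options [url]` ordering are taken from the help output. It only builds the argument list and does not start the application.

```python
def build_mirror_command(url, cookie_file=None, show_view=False,
                         keep_running=False, binary="./mirror"):
    """Return the argument list for one mirror run.

    Hypothetical helper: only the -c, -v and -k flags come from the
    help output of the mirror application.
    """
    cmd = [binary]
    if show_view:
        cmd.append("-v")                # show the WebView while mirroring
    if keep_running:
        cmd.append("-k")                # keep running after the initial load
    if cookie_file:
        cmd.extend(["-c", cookie_file])  # reuse an Arora-compatible cookie file
    cmd.append(url)                     # the URL comes after the options
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_mirror_command(
        "http://www.gmail.com",
        cookie_file="cookies.ini",
        show_view=True,
        keep_running=True,
    )))
```

The returned list can be passed to `subprocess.run` from a directory that contains the built '''mirror''' binary.
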
=== Step through to mirror gmail.com ===

 1. Build the mirror application as shown above.
 2. Use Arora to log in to the gmail.com service. Click "stay signed in", as this stores a cookie that the '''mirror''' application can use.
 3. Use `./mirror -v -k -c $HOME/.local/share/data/Arora/cookies.ini http://www.gmail.com` to start mirroring.
 4. Wait until you are logged in and the site has loaded completely.
 5. ''crawl_db.db'' now contains a copy of gmail.com. It can be served with the '''http_server'''.

=== Generate more stable loading times ===

Some webpages use Math.random() to randomize which content is displayed. In the case of Wikipedia this can be one of the various announcements; in the case of the apple.com website it is the image displayed on the front page and the advertisement query. The problem with this randomisation is that the loading time may vary from page view to page view, so the measured loading time is not stable.

One way to deal with this is to replace all calls to Math.random() with a constant. The mirror utility contains a script called `store_all.py` which stores all files from the database to disk. This makes it possible to search for Math.random() and replace it with a constant. Using the `put_file.py` utility one can put a new version into the database. You will have to remove the URL from the top of the file and use it as an argument to `put_file.py`. Once you are done you will need to update the headers of the table; this can be done by invoking the `update_content_length.py` script.

Another similar source of trouble is content that uses the current date to fetch resources, or that fetches resources depending on the user agent. Currently there are no hints on how to deal with that. The problem might be that on different dates or different platforms a 404 is returned instead of the real content, making a comparison hard.
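
The search-and-replace step for Math.random() described above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: `freeze_math_random` is a hypothetical helper, not one of the scripts shipped with the mirror utility, and the constant `0.5` is an arbitrary choice. It only rewrites direct `Math.random()` calls in a string of JavaScript source and does not handle aliased references such as `var r = Math.random;`.

```python
import re

# Matches a direct call such as Math.random() or Math . random ( ).
# The leading \b keeps identifiers like myMath.random() untouched.
MATH_RANDOM_CALL = re.compile(r"\bMath\s*\.\s*random\s*\(\s*\)")


def freeze_math_random(source, constant="0.5"):
    """Replace every direct Math.random() call in `source` with `constant`.

    Hypothetical helper for illustration; the default constant is an
    assumption and can be any numeric literal in the range [0, 1).
    """
    # A function replacement avoids any backslash handling in `constant`.
    return MATH_RANDOM_CALL.sub(lambda _match: constant, source)


if __name__ == "__main__":
    sample = "var i = Math.floor(Math.random() * 3);"
    print(freeze_math_random(sample))
```

The rewritten text would still have to be stored with `put_file.py` and followed by `update_content_length.py`, as described above, because the replacement changes the length of the content.
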
=== Screencasts of mirroring gmail.com and running do_mirror.sh ===

 * A video on mirroring gmail.com can be seen [http://blip.tv/file/2662874 here].
 * A video on running the `do_mirror.sh` script can be seen [http://blip.tv/file/2662945 here].