wiki:QtWebKitMirrorGuide

Version 2 (modified by zecke@selfish.org, 11 years ago)


Using the mirror application to mirror websites

For benchmarking we want to use real web content. However, we don't want the results to depend on changing versions of websites (e.g. dynamically generated pages) or on varying network bandwidth and latency. The mirror application stores the downloaded content in a local SQLite3 database, and the http_server can then serve this content.
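The record-and-replay idea behind this can be sketched in a few lines: each downloaded resource is stored once under its URL, and later requests are answered from the local copy instead of the network. This is a minimal sketch only. The real layout of the database written by the mirror application is not described on this page, so the `responses` table, its columns, and the `record`/`replay` helpers are hypothetical.

```python
import sqlite3

# Hypothetical schema for illustration only; the actual database written by
# the mirror application may use different tables and columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS responses (
    url          TEXT PRIMARY KEY,
    status       INTEGER NOT NULL,
    content_type TEXT,
    body         BLOB
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local SQLite store for mirrored responses."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, url: str, status: int,
           content_type: str, body: bytes) -> None:
    """Store one downloaded resource, replacing an earlier copy of the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO responses (url, status, content_type, body) "
        "VALUES (?, ?, ?, ?)",
        (url, status, content_type, body),
    )
    conn.commit()


def replay(conn: sqlite3.Connection, url: str):
    """Return (status, content_type, body) for a mirrored URL, or None if it was never recorded."""
    return conn.execute(
        "SELECT status, content_type, body FROM responses WHERE url = ?", (url,)
    ).fetchone()
```

For example, after `record(conn, "http://www.example.com/", 200, "text/html", b"<html></html>")`, calling `replay(conn, "http://www.example.com/")` returns the stored tuple without any network access. Because every run reads the same bytes, the benchmark no longer depends on the live site or the network.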

The mirror application also helps when a user sees a problem that cannot be reproduced locally. In that case the user can use the mirror application to create a copy of the site and send the database to the developer.

Building the mirror application

The mirror application is built with qmake and works best with Qt 4.6.

$ cd mirror
$ qmake
$ make
$ ./mirror -h
./mirror options [url]
        -c cookies.ini  Use the cookies from this file.
                        The cookie file is compatible with Arora.
        -v              Show the WebView when running
        -k              Keep the application running.

Using the mirror application to mirror content

The mirror application can use Arora's cookie jar. On a GNU/Linux system this file is normally located at $HOME/.local/share/data/Arora/cookies.ini. Using a cookie file lets you log in to websites such as gmail.com or facebook.com with Arora and then mirror their pages in the logged-in state.

By default the mirror application loads the page and then exits. The -k option keeps the application running. This is useful for AJAX-heavy pages that keep loading resources after the initial load has finished. This option was used to mirror the gmail.com website for the Nokia benchmarking content.

The -v option makes the QWebView that downloads the pages visible. This lets you see which sites were downloaded, or crawl the web manually.
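The option semantics above can be summarised as a small command-line parser. -c takes a file argument, -v and -k are on/off flags, and the URL is optional. This is a hypothetical Python restatement of the documented `./mirror -h` output, not the mirror application's own (Qt/C++) option handling. The destination names `cookies`, `show_view` and `keep_running` are invented for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented options; the real application
    # parses its arguments itself and may behave differently in edge cases.
    parser = argparse.ArgumentParser(prog="mirror")
    parser.add_argument("-c", dest="cookies", metavar="cookies.ini",
                        help="use the cookies from this (Arora-compatible) file")
    parser.add_argument("-v", dest="show_view", action="store_true",
                        help="show the WebView while running")
    parser.add_argument("-k", dest="keep_running", action="store_true",
                        help="keep running after the initial page load")
    parser.add_argument("url", nargs="?", help="page to mirror")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-v", "-k", "-c", "cookies.ini", "http://www.gmail.com"])
    # Without -k the default is to exit once the page has loaded.
    print(args.show_view, args.keep_running, args.cookies, args.url)
```

With no options, both flags default to off. This matches the default behaviour described above: load the page without showing a view, then exit.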

Steps to mirror gmail.com

  1. Build the mirror application as shown above.
  2. Use Arora to log in to the gmail.com service. Click "Stay signed in"; this stores the cookie we are going to use.
  3. Run ./mirror -v -k -c $HOME/.local/share/data/Arora/cookies.ini http://www.gmail.com to start mirroring.
  4. Wait until you are logged in and the site has loaded completely.
  5. crawl_db.db now contains a copy of gmail.com. It can be served with the http_server.
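If this step is repeated often, the command from step 3 can be assembled by a script instead of being typed by hand. The following is a minimal sketch. The helper names `arora_cookie_path` and `gmail_mirror_command` are hypothetical. The sketch assumes the GNU/Linux cookie location given above and a built `./mirror` binary in the current directory. It only prints the command, and running it is left commented out.

```python
import os
import posixpath
import shlex


def arora_cookie_path(home=None):
    """Arora cookie file location on GNU/Linux, as given in this guide."""
    home = home if home is not None else os.path.expanduser("~")
    return posixpath.join(home, ".local", "share", "data", "Arora", "cookies.ini")


def gmail_mirror_command(home=None, mirror="./mirror"):
    """argv for step 3: show the view (-v), keep running (-k), use Arora cookies (-c)."""
    return [mirror, "-v", "-k", "-c", arora_cookie_path(home), "http://www.gmail.com"]


if __name__ == "__main__":
    argv = gmail_mirror_command()
    print(shlex.join(argv))  # shell-quoted form of the step 3 command
    # To actually start mirroring (requires the built mirror binary):
    # import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list rather than one string avoids quoting problems if the home directory contains spaces.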