This script crawls all available sites on http://docs.openstack.org and extracts all URLs. Based on the URLs the script generates a sitemap for search engines according to the protocol described at http://www.sitemaps.org/protocol.html.
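As a rough illustration of the sitemap protocol referenced above, the sketch below builds a minimal sitemap document with Python's standard library. The function name and the example URL list are illustrative only; the actual spider's output may include additional optional tags such as lastmod or priority.

```python
import xml.etree.ElementTree as ET

# Namespace required by http://www.sitemaps.org/protocol.html
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a minimal sitemap XML document: one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap(["http://docs.openstack.org/"])
```

Each crawled URL becomes a url/loc entry inside the urlset root element, which is the minimum a search engine needs to consume the file.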
To generate a new sitemap file, simply run the spider with the following command. It will take several minutes to crawl all available sites on http://docs.openstack.org. The result will be available in the generated sitemap file.
$ scrapy crawl sitemap
It is also possible to crawl other sites using the domain attribute. For example, to crawl http://developer.openstack.org, use the following command. The result will be available in the generated sitemap file.
$ scrapy crawl sitemap -a domain=developer.openstack.org
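Scrapy passes each -a name=value pair to the spider's constructor as a keyword argument, which is how the domain attribute above reaches the crawler. The sketch below shows that mechanism with a plain class standing in for the real spider (the attribute handling shown here is an assumption about this project's spider; in the actual code the class would subclass scrapy.Spider).

```python
# Minimal sketch: how a Scrapy spider can consume "-a domain=...".
# In the real project this class would subclass scrapy.Spider.
class SitemapSpider:
    name = "sitemap"

    def __init__(self, domain="docs.openstack.org", **kwargs):
        # Scrapy forwards "-a domain=..." here as the "domain" keyword.
        self.domain = domain
        self.allowed_domains = [domain]
        self.start_urls = ["http://%s" % domain]

spider = SitemapSpider(domain="developer.openstack.org")
```

With no -a argument the spider falls back to its default domain, which matches the default behaviour described at the top of this document.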
To write log messages into a file, append the --logfile parameter.
To install the needed modules you can use pip or the package management system included in your distribution. When using the package management system, the package names may differ. When using pip, it may be necessary to install some development packages first.
$ pip install scrapy