To install the needed modules, use pip or the package management system included in your distribution. Package names may differ when you use the package management system. Installation in a virtual environment is recommended.
$ virtualenv venv
$ . venv/bin/activate
$ pip install Scrapy
When using pip, you may also need to install some development packages. For example, on Ubuntu 16.04 install the following packages:
$ sudo apt install gcc libssl-dev python-dev python-virtualenv
To generate a new sitemap file, change into your local clone of the
openstack/openstack-doc-tools repository and run the following commands:
$ cd sitemap
$ scrapy crawl sitemap
The script takes several minutes to crawl all available sites on https://docs.openstack.org. The result is written to a sitemap XML file.
Use the domain attribute to set the domain to crawl. The default is docs.openstack.org.
For example, to crawl https://developer.openstack.org use the following command:
$ scrapy crawl sitemap -a domain=developer.openstack.org
The result is written to a sitemap XML file for that domain.
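To check a crawl result, the generated file can be loaded and its URLs listed. The following is a minimal sketch, assuming the output follows the standard sitemaps.org format with `<url>` and `<loc>` elements. The function name `list_sitemap_urls` and the inline `SAMPLE` document are hypothetical and stand in for the real generated file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be used
# by the generated file).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text):
    """Return the <loc> values found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; a real check would read the generated file instead.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://developer.openstack.org/api-guide/quick-start/</loc></url>
  <url><loc>https://developer.openstack.org/de/api-guide/quick-start/</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for url in list_sitemap_urls(SAMPLE):
        print(url)
```

To inspect a real result, read the generated file into a string and pass it to the function in place of `SAMPLE`.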
You can define a set of additional start URLs using the urls attribute. Separate multiple URLs with a comma (,). For example:
$ scrapy crawl sitemap -a domain=developer.openstack.org -a urls="https://developer.openstack.org/de/api-guide/quick-start/"
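Scrapy hands every `-a name=value` pair to the spider as a plain string, so a spider has to split a multi-URL value itself. The sketch below shows one way this could be done. The helper `build_start_urls`, the `https://` scheme for the domain root, and the comma separator are assumptions made for illustration, not the code from the repository.

```python
def build_start_urls(domain, urls=None):
    """Combine the domain root with optional comma-separated extra URLs."""
    # Assumption: the crawl starts at the HTTPS root of the domain.
    start_urls = ["https://%s/" % domain]
    if urls:
        # Strip whitespace and skip empty pieces so that a trailing
        # comma or a space after a comma does no harm.
        start_urls.extend(u.strip() for u in urls.split(",") if u.strip())
    return start_urls


if __name__ == "__main__":
    extra = "https://developer.openstack.org/de/api-guide/quick-start/"
    for url in build_start_urls("developer.openstack.org", extra):
        print(url)
```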
Use the LOG_FILE setting to write log messages to the specified file.
For example, to write to scrapy.log:
$ scrapy crawl sitemap -s LOG_FILE=scrapy.log
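After a long crawl, a quick summary of the log can show whether errors occurred. The following sketch counts messages per log level, assuming the project keeps Scrapy's default log line layout (`date time [logger] LEVEL: message`). The function `count_log_levels` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches Scrapy's default layout: "<date> <time> [<logger>] <LEVEL>: ..."
LOG_LINE = re.compile(
    r"^\S+ \S+ \[[^\]]+\] (DEBUG|INFO|WARNING|ERROR|CRITICAL): "
)


def count_log_levels(lines):
    """Count log messages per level; continuation lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; a real check would iterate over the log file.
SAMPLE_LOG = [
    "2017-01-01 12:00:00 [scrapy.core.engine] INFO: Spider opened",
    "2017-01-01 12:00:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://docs.openstack.org/>",
    "2017-01-01 12:00:02 [scrapy.core.scraper] ERROR: Spider error processing <GET https://docs.openstack.org/>",
    "Traceback (most recent call last):",
]

if __name__ == "__main__":
    for level, count in sorted(count_log_levels(SAMPLE_LOG).items()):
        print(level, count)
```

For a real log, open the file and pass the file object to `count_log_levels`, since it only needs an iterable of lines.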