openstack-doc-tools/sitemap
Percila c050836a8f doc-tools unit tests
Renamed sitemap file to avoid module name conflict
when importing at the sitemap unittest

Added py.test tox environment

Change-Id: I94480e374b29802414b62591a51c04ecd804905e
Closes-Bug: #1387716
2016-08-03 07:05:51 +00:00
generator doc-tools unit tests 2016-08-03 07:05:51 +00:00
test doc-tools unit tests 2016-08-03 07:05:51 +00:00
README.rst [sitemap] introduce attribute to define start URLs 2015-10-16 11:22:05 +02:00
__init__.py doc-tools unit tests 2016-08-03 07:05:51 +00:00
requirements.txt [sitemap] add a requirements file 2015-10-02 08:28:37 +02:00
scrapy.cfg script to generate the sitemap.xml for docs.openstack.org 2014-05-29 01:29:18 +02:00
transform-sitemap.xslt Remove /draft from sitemap 2015-04-18 09:43:10 +02:00

README.rst

Sitemap Generator

This script crawls all available sites on http://docs.openstack.org and extracts all URLs. From these URLs, it generates a sitemap for search engines that follows the protocol described at http://www.sitemaps.org/protocol.html.
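The script is a Scrapy project (it is run with scrapy crawl and ships a scrapy.cfg), and the actual script may produce its output differently. As a minimal sketch of what the sitemaps.org protocol asks for, the following Python function turns a list of crawled URLs into a sitemap with one url/loc entry per unique URL. The function name build_sitemap is hypothetical, and this is not the script's own implementation.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing each URL once.

    Hypothetical helper for illustration; not part of the sitemap script.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # a crawler often finds the same page via several links
        seen.add(url)
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        "http://docs.openstack.org/",
        "http://docs.openstack.org/install-guide/",
        "http://docs.openstack.org/",
    ]))
```

Because ElementTree escapes text content, a URL containing & is written as &amp;, which the protocol requires.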

Usage

To generate a new sitemap file, run the spider with the following command. Crawling all available sites on http://docs.openstack.org takes several minutes. The result is written to the file sitemap_docs.openstack.org.xml.

$ scrapy crawl sitemap

To crawl a different site, set the attribute domain.

For example, to crawl http://developer.openstack.org, use the following command. The result is written to the file sitemap_developer.openstack.org.xml.

$ scrapy crawl sitemap -a domain=developer.openstack.org
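The output file name follows the pattern visible in both examples: sitemap_, then the domain, then .xml. The sketch below captures that pattern together with a link filter that keeps the crawl on the chosen domain. Both helpers (output_filename, in_domain) are hypothetical, and exact host matching is an assumption about how the spider restricts links, not a documented behavior.

```python
from urllib.parse import urlparse


def output_filename(domain):
    """Name of the sitemap file for a crawled domain (hypothetical helper)."""
    return "sitemap_%s.xml" % domain


def in_domain(url, domain):
    """True if url is hosted on exactly this domain.

    Assumption: other hosts, including subdomains, are not followed.
    """
    return urlparse(url).hostname == domain


if __name__ == "__main__":
    print(output_filename("developer.openstack.org"))
    print(in_domain("http://docs.openstack.org/", "developer.openstack.org"))
```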

To write log messages to a file, append the parameter -s LOG_FILE=scrapy.log.

To define additional start URLs, use the attribute urls. Separate multiple URLs with a comma (,).

$ scrapy crawl sitemap -a domain=developer.openstack.org -a urls="http://developer.openstack.org/de/api-guide/quick-start/"
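Scrapy passes each -a name=value pair to the spider as a keyword argument, so urls arrives as a single string. A minimal sketch of turning it into a list of start URLs follows; the function start_urls is hypothetical, and it assumes the crawl also starts at the root page of domain.

```python
def start_urls(domain, urls=None):
    """Build the crawl's start URLs from the domain and urls attributes.

    Hypothetical helper for illustration; not part of the sitemap script.
    """
    # Assumption: the crawl always starts at the domain's root page.
    result = ["http://%s/" % domain]
    if urls:
        # The urls attribute is one comma-separated string; strip
        # whitespace and skip empty or duplicate entries.
        for url in urls.split(","):
            url = url.strip()
            if url and url not in result:
                result.append(url)
    return result


if __name__ == "__main__":
    print(start_urls("developer.openstack.org",
                     "http://developer.openstack.org/de/api-guide/quick-start/"))
```

Ignoring empty entries means a trailing comma in the urls value does not produce an invalid start URL.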

Dependencies

To install the required modules, use pip or the package management system of your distribution. Package names in your distribution may differ from those listed in requirements.txt. When using pip, you may need to install some development packages first.

$ pip install -r requirements.txt