
Sitemap Generator

This script crawls all available pages on https://docs.openstack.org and extracts their URLs. From these URLs, the script generates a sitemap for search engines that follows the sitemaps protocol.
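To illustrate the output format, here is a minimal sketch of how a list of URLs maps onto a sitemaps-protocol document. It is not the generator's actual code; the function name build_sitemap is hypothetical, and the sketch only emits the required urlset, url, and loc elements.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing the given URLs.

    Hypothetical helper for illustration only; duplicates are dropped
    and the remaining URLs are sorted to keep the output stable.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Optional child elements of url, such as lastmod or changefreq, are left out here to keep the sketch short.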

Installation

To install the required modules, use pip or the package manager included in your distribution. Package names may differ between distributions. Installation in a virtual environment is recommended.

$ virtualenv venv
$ . venv/bin/activate
$ pip install Scrapy

When using pip, you may also need to install some development packages. For example, on Ubuntu 16.04 install the following packages:

$ sudo apt install gcc libssl-dev python-dev python-virtualenv

Usage

To generate a new sitemap file, change into your local clone of the openstack/openstack-doc-tools repository and run the following commands:

$ cd sitemap
$ scrapy crawl sitemap

The script takes several minutes to crawl all available sites on https://docs.openstack.org. The result is available in the sitemap_docs.openstack.org.xml file.
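To sanity-check a generated file, something like the following sketch can list the URLs it contains. It assumes the file uses the standard sitemaps namespace; the function name list_sitemap_urls is hypothetical and not part of this repository.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the standard sitemaps-protocol namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(path):
    """Return the text of every <loc> element in the sitemap at path."""
    root = ET.parse(path).getroot()
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]
```

For example, len(list_sitemap_urls("sitemap_docs.openstack.org.xml")) gives the number of entries found by the crawl.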

Options

domain=URL

Sets the domain to crawl. Default is docs.openstack.org.

For example, to crawl https://developer.openstack.org use the following command:

$ scrapy crawl sitemap -a domain=developer.openstack.org

The result is available in the sitemap_developer.openstack.org.xml file.

urls=URL

You can define a set of additional start URLs using the urls attribute. Separate multiple URLs with a comma (,).

For example:

$ scrapy crawl sitemap -a domain=developer.openstack.org -a urls="https://developer.openstack.org/de/api-guide/quick-start/"

LOG_FILE=FILE

Write log messages to the specified file.

For example, to write to scrapy.log:

$ scrapy crawl sitemap -s LOG_FILE=scrapy.log
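The options above can be combined in a single run. As a rough sketch, a small wrapper could assemble the command line from Python. The helper names crawl_command, expected_output_name, and run_crawl are hypothetical, and the output file name pattern is inferred from the two examples in this document rather than taken from the generator's code.

```python
import subprocess


def crawl_command(domain=None, urls=(), log_file=None):
    """Build the argument list for one ``scrapy crawl sitemap`` run."""
    cmd = ["scrapy", "crawl", "sitemap"]
    if domain:
        cmd += ["-a", f"domain={domain}"]
    if urls:
        # Multiple start URLs are passed as one comma-separated value.
        cmd += ["-a", "urls=" + ",".join(urls)]
    if log_file:
        cmd += ["-s", f"LOG_FILE={log_file}"]
    return cmd


def expected_output_name(domain="docs.openstack.org"):
    """Guess the result file name (pattern inferred from the examples)."""
    return f"sitemap_{domain}.xml"


def run_crawl(sitemap_dir, **options):
    """Run the crawl from the sitemap directory of a local clone."""
    subprocess.run(crawl_command(**options), cwd=sitemap_dir, check=True)
```

Calling run_crawl("sitemap", domain="developer.openstack.org", log_file="scrapy.log") from the repository root would then be equivalent to changing into the sitemap directory and running the command by hand.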