Sitemap Generator
This script crawls all available sites on https://docs.openstack.org and extracts all URLs. Based on the URLs the script generates a sitemap for search engines according to the sitemaps protocol.
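The crawler lives in the generator directory and is implemented as a Scrapy spider. As a rough sketch only (the class layout, rules, and item format here are assumptions, not the actual implementation), a minimal Scrapy spider that collects every URL on one domain could look like this:

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class SitemapSpider(CrawlSpider):
    # Hypothetical sketch of the crawl; see generator/ for the real spider.
    name = 'sitemap'
    allowed_domains = ['docs.openstack.org']
    start_urls = ['https://docs.openstack.org/']

    # Follow every link inside the allowed domain and record each page.
    rules = [Rule(LinkExtractor(), callback='parse_item', follow=True)]

    def parse_item(self, response):
        # One item per crawled page; an exporter can serialize the
        # collected items into the sitemap XML file afterwards.
        yield {'loc': response.url}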
Installation
To install the needed modules you can use pip or the package management system included in your distribution. When using the package management system, the package names may differ. Installation in a virtual environment is recommended.
$ virtualenv venv
$ . venv/bin/activate
$ pip install Scrapy
When using pip, you may also need to install some development packages. For example, on Ubuntu 16.04 install the following packages:
$ sudo apt install gcc libssl-dev python-dev python-virtualenv
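After installing, you can check that Scrapy is available inside the virtual environment:
$ scrapy version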
Usage
To generate a new sitemap file, change into your local clone of the openstack/openstack-doc-tools
repository and run the following commands:
$ cd sitemap
$ scrapy crawl sitemap
The script takes several minutes to crawl all available sites on https://docs.openstack.org. The result is available in the sitemap_docs.openstack.org.xml
file.
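To inspect the generated file, you can parse it with Python's standard library. The snippet below is a small sketch that assumes the file uses the standard sitemaps.org XML namespace:

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps protocol.
NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

tree = ET.parse('sitemap_docs.openstack.org.xml')
locs = tree.getroot().findall('sm:url/sm:loc', NS)
print('%d URLs in sitemap' % len(locs))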
Options
- domain=URL
  Sets the domain to crawl. Default is docs.openstack.org.
  For example, to crawl https://developer.openstack.org use the following command:
  $ scrapy crawl sitemap -a domain=developer.openstack.org
  The result is available in the sitemap_developer.openstack.org.xml file.
- urls=URL
  You can define a set of additional start URLs using the urls attribute (like domain, it is passed with -a; see the sketch after this list). Separate multiple URLs with ,. For example:
  $ scrapy crawl sitemap -a domain=developer.openstack.org -a urls="https://developer.openstack.org/de/api-guide/quick-start/"
- LOG_FILE=FILE
  Write log messages to the specified file. For example, to write to scrapy.log:
  $ scrapy crawl sitemap -s LOG_FILE=scrapy.log
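For context, Scrapy passes each -a name=value pair to the spider's constructor as a keyword argument, while -s sets a Scrapy setting such as LOG_FILE, which is why the last option uses -s rather than -a. A simplified, hypothetical sketch of how the spider might consume the domain and urls options (the real code in generator/ may differ):

from scrapy.spiders import CrawlSpider

class SitemapSpider(CrawlSpider):
    name = 'sitemap'

    def __init__(self, domain='docs.openstack.org', urls='', *args, **kwargs):
        super().__init__(*args, **kwargs)
        # -a domain=... restricts the crawl to one domain.
        self.allowed_domains = [domain]
        self.start_urls = ['https://%s/' % domain]
        # -a urls=... appends comma-separated additional start URLs.
        if urls:
            self.start_urls.extend(urls.split(','))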