Update sitemap tests

- add py3 to tox.ini (gate already tests py3)
- move all tests to $GITROOT/test so they can all run
  through testr
- add scrapy to test-requirements.txt to support sitemap tests
- move tests from test_items.py to test_sitemap_file.py
- fix broken sitemap tests
- add newton to list of old releases in sitemap_file.py
- ignore flake8 H101 as it returns false positives for Sphinx conf.py
- use openstackdocstheme for docs
- update sitemap README
- restructure repo docs
- fix minor style issues

Change-Id: I22c018149b2eefde6ca5c38c22ac06886fe9a7a8
Brian Moss 2017-04-06 14:40:51 +10:00
parent efdab278fc
commit b41c0bdc6a
24 changed files with 156 additions and 119 deletions

.gitignore

@ -12,6 +12,7 @@ eggs
# Unit test / coverage reports

@ -8,7 +8,7 @@ Team and repository tags
.. Change things from this point on
OpenStack Doc Tools
This repository contains tools used by the OpenStack Documentation
@ -16,8 +16,12 @@ project.
For more details, see the `OpenStack Documentation Contributor Guide
* License: Apache License, Version 2.0
* Source: https://git.openstack.org/cgit/openstack/openstack-doc-tools
* Bugs: https://bugs.launchpad.net/openstack-doc-tools
You need to have Python 2.7 installed for using the tools.
@ -57,12 +61,7 @@ On Ubuntu::
$ apt-get install libxml2-dev libxslt-dev
* License: Apache License, Version 2.0
* Source: https://git.openstack.org/cgit/openstack/openstack-doc-tools
* Bugs: https://bugs.launchpad.net/openstack-doc-tools
Regenerating config option tables
See :ref:`autogenerate_config_docs`.

@ -14,6 +14,8 @@
import os
import sys
import openstackdocstheme
sys.path.insert(0, os.path.abspath('../..'))
# -- General configuration ----------------------------------------------------
@ -37,7 +39,7 @@ master_doc = 'index'
# General information about the project.
project = u'openstack-doc-tools'
copyright = u'2014, OpenStack Foundation'
copyright = u'2017, OpenStack Foundation'
# If true, '()' will be appended to :func: etc. cross-reference text.
add_function_parentheses = True
@ -51,10 +53,13 @@ pygments_style = 'sphinx'
# -- Options for HTML output --------------------------------------------------
# The theme to use for HTML and HTML Help pages. Major themes that come with
# Sphinx are currently 'default' and 'sphinxdoc'.
# html_theme_path = ["."]
# html_theme = '_theme'
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'openstackdocs'
# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = [openstackdocstheme.get_html_theme_path()]
# html_static_path = ['static']
# Output file base name for HTML help builder.

@ -1,3 +1,4 @@
Welcome to openstack-doc-tool's documentation!
@ -6,16 +7,17 @@ Contents:
.. toctree::
:maxdepth: 2
Indices and tables
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

@ -2,11 +2,15 @@
At the command line::
At the command line:
$ pip install openstack-doc-tools
.. code-block:: console
Or, if you have virtualenvwrapper installed::
$ pip install openstack-doc-tools
$ mkvirtualenv openstack-doc-tools
$ pip install openstack-doc-tools
Or, if you have virtualenvwrapper installed:
.. code-block:: console
$ mkvirtualenv openstack-doc-tools
$ pip install openstack-doc-tools

@ -114,4 +114,4 @@ Bugs
* openstack-doc-tools is hosted on Launchpad so you can view current
bugs at
`Bugs : openstack-manuals <https://bugs.launchpad.net/openstack-manuals/>`__
`Bugs : openstack-doc-tools <https://bugs.launchpad.net/openstack-doc-tools/>`__

@ -1,2 +1 @@
.. include:: ../../RELEASE_NOTES.rst

@ -0,0 +1 @@
.. include:: ../../sitemap/README.rst

@ -1,7 +1,9 @@
To use openstack-doc-tools in a project::
To use openstack-doc-tools in a project:
import os_doc_tools
.. code-block:: python
import os_doc_tools

@ -7,7 +7,7 @@ iso8601>=0.1.11 # MIT
lxml!=3.7.0,>=2.3 # BSD
oslo.config>=3.22.0 # Apache-2.0
docutils>=0.11 # OSI-Approved Open Source, Public Domain
sphinx>=1.5.1 # BSD
sphinx>=1.5.1,<1.6 # BSD
demjson # GLGPLv3+
PyYAML>=3.10.0 # MIT
cliff-tablib>=1.0 # Apache-2.0

@ -2,46 +2,80 @@
Sitemap Generator
This script crawls all available sites on http://docs.openstack.org and extracts
all URLs. Based on the URLs the script generates a sitemap for search engines
according to the protocol described at http://www.sitemaps.org/protocol.html.
This script crawls all available sites on http://docs.openstack.org and
extracts all URLs. Based on the URLs the script generates a sitemap for search
engines according to the `sitemaps protocol
To install the needed modules you can use pip or the package management system included
in your distribution. When using the package management system maybe the name of the
packages differ. Installation in a virtual environment is recommended.
To install the needed modules you can use pip or the package management system
included in your distribution. When using the package management system maybe
the name of the packages differ. Installation in a virtual environment is
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
.. code-block:: console
When using pip it's maybe necessary to install some development packages.
For example on Ubuntu 16.04 install the following packages.
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ sudo apt install gcc libssl-dev python-dev python-virtualenv
When using pip, you may also need to install some development packages. For
example, on Ubuntu 16.04 install the following packages:
.. code-block:: console
$ sudo apt install gcc libssl-dev python-dev python-virtualenv
To generate a new sitemap file simply run the spider using the
following command. It will take several minutes to crawl all available sites
on http://docs.openstack.org. The result will be available in the file
To generate a new sitemap file, change into your local clone of the
``openstack/openstack-doc-tools`` repository and run the following commands:
$ scrapy crawl sitemap
.. code-block:: console
It's also possible to crawl other sites using the attribute ``domain``.
$ cd sitemap
$ scrapy crawl sitemap
For example to crawl http://developer.openstack.org use the following command.
The result will be available in the file ``sitemap_developer.openstack.org.xml``.
The script takes several minutes to crawl all available
sites on http://docs.openstack.org. The result is available in the
``sitemap_docs.openstack.org.xml`` file.
$ scrapy crawl sitemap -a domain=developer.openstack.org
To write log messages into a file append the parameter ``-s LOG_FILE=scrapy.log``.
It is possible to define a set of additional start URLs using the attribute
``urls``. Separate multiple URLs with ``,``.
Sets the ``domain`` to crawl. Default is ``docs.openstack.org``.
$ scrapy crawl sitemap -a domain=developer.openstack.org -a urls="http://developer.openstack.org/de/api-guide/quick-start/"
For example, to crawl http://developer.openstack.org use the following
.. code-block:: console
$ scrapy crawl sitemap -a domain=developer.openstack.org
The result is available in the ``sitemap_developer.openstack.org.xml`` file.
You can define a set of additional start URLs using the ``urls`` attribute.
Separate multiple URLs with ``,``.
For example:
.. code-block:: console
$ scrapy crawl sitemap -a domain=developer.openstack.org -a urls="http://developer.openstack.org/de/api-guide/quick-start/"
Write log messages to the specified file.
For example, to write to ``scrapy.log``:
.. code-block:: console
$ scrapy crawl sitemap -s LOG_FILE=scrapy.log
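
A minimal sketch of two behaviours the README text above describes: extra start URLs passed through ``urls`` are separated with ``,``, and the result file is named after the crawled domain. The helper names ``parse_start_urls`` and ``sitemap_filename`` are hypothetical and do not appear in this change. The default start URL built from the domain is also an assumption.

```python
def parse_start_urls(domain, urls=""):
    """Return an assumed default start URL plus any comma-separated extras."""
    # Assumption: the crawl starts at the root of the chosen domain.
    start_urls = ["http://%s" % domain]
    # Split the optional ``urls`` value on ',' and drop empty entries.
    start_urls.extend(u.strip() for u in urls.split(",") if u.strip())
    return start_urls


def sitemap_filename(domain):
    """Mirror the 'sitemap_%s.xml' naming pattern shown in the pipeline."""
    return "sitemap_%s.xml" % domain
```

For example, ``sitemap_filename("developer.openstack.org")`` yields the ``sitemap_developer.openstack.org.xml`` name mentioned in the README.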

@ -69,7 +69,7 @@ class ExportSitemap(object):
def spider_opened(self, spider):
output = open(os.path.join(os.getcwd(), 'sitemap_%s.xml'
% spider.domain), 'w')
% spider.domain), 'w')
self.files[spider] = output
self.exporter = SitemapItemExporter(output, item_element='url',
@ -80,7 +80,7 @@ class ExportSitemap(object):
output = self.files.pop(spider)
tree = lxml.etree.parse(os.path.join(os.getcwd(), "sitemap_%s.xml"
% spider.domain))
% spider.domain))
with open(os.path.join(os.getcwd(), "sitemap_%s.xml" % spider.domain),
'w') as pretty:
pretty.write(lxml.etree.tostring(tree, pretty_print=True))
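
With py3 added to tox.ini, one detail of the hunk above may deserve a second look: ``lxml.etree.tostring`` returns ``bytes`` on Python 3 by default, and writing bytes to a file opened in text mode (``'w'``) raises ``TypeError``. The sketch below illustrates the point with the standard library's ``xml.etree.ElementTree`` as a stand-in for lxml, an assumption made only to keep the example free of dependencies. ``write_sitemap`` is a hypothetical helper, not code from this repository.

```python
import xml.etree.ElementTree as ET


def write_sitemap(path, locations):
    """Serialize a tiny <urlset> document and write it in binary mode."""
    urlset = ET.Element("urlset")
    for loc in locations:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
    # tostring() returns bytes on Python 3 unless encoding="unicode" is passed.
    data = ET.tostring(urlset)
    # 'wb' accepts bytes; a file opened with 'w' would raise TypeError here.
    with open(path, "wb") as out:
        out.write(data)
    return data
```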

@ -11,7 +11,10 @@
# under the License.
import time
import urlparse
try:
    import urlparse
except ImportError:
    import urllib.parse as urlparse
from scrapy import item
from scrapy.linkextractors import LinkExtractor
@ -41,7 +44,8 @@ class SitemapSpider(spiders.CrawlSpider):
rules = [
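
The import fallback in this hunk keeps a single ``urlparse`` name available on both Python 2 and Python 3. A minimal sketch of the pattern follows; the URL is illustrative and not taken from the change.

```python
try:
    import urlparse  # Python 2 module name
except ImportError:
    import urllib.parse as urlparse  # Python 3 home of the same functions

# Illustrative URL only; the spider's real inputs come from the crawl.
parts = urlparse.urlsplit("http://docs.openstack.org/newton/index.html")

# Split the path into its non-empty segments.
path_segments = [segment for segment in parts.path.split("/") if segment]
```

Code written against ``urlparse.urlsplit`` then runs unchanged under either interpreter.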

@ -1 +0,0 @@

@ -1,37 +0,0 @@
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import mock
from sitemap.generator import items
import unittest
class TestSitemapItem(unittest.TestCase):
def test_class_type(self):
self.assertTrue(type(items.SitemapItem) is items.scrapy.item.ItemMeta)
def test_class_supports_fields(self):
with mock.patch.object(items.scrapy.item, 'Field'):
a = items.SitemapItem()
supported_fields = ['loc', 'lastmod', 'priority', 'changefreq']
for field in supported_fields:
a[field] = field
not_supported_fields = ['some', 'random', 'fields']
for field in not_supported_fields:
with self.assertRaises(KeyError):
a[field] = field
if __name__ == '__main__':

@ -11,9 +11,12 @@ doc8 # Apache-2.0
pylint==1.4.5 # GPLv2
reno>=1.8.0 # Apache-2.0
oslosphinx>=4.7.0 # Apache-2.0
openstackdocstheme>=1.7.0 # Apache-2.0
testrepository>=0.0.18 # Apache-2.0/BSD
# mock object framework
mock>=2.0 # BSD
# sitemap scraping tool
scrapy>=1.0.0 # BSD

@ -78,26 +78,22 @@ class TestExportSitemap(unittest.TestCase):
def test_spider_opened_calls_open(self):
with mock.patch.object(pipelines, 'open',
return_value=None) as mocked_open:
with mock.patch.object(pipelines,
with mock.patch.object(pipelines, 'SitemapItemExporter'):
def test_spider_opened_assigns_spider(self):
prev_len = len(self.export_sitemap.files)
with mock.patch.object(pipelines, 'open',
with mock.patch.object(pipelines,
with mock.patch.object(pipelines, 'open', return_value=None):
with mock.patch.object(pipelines, 'SitemapItemExporter'):
after_len = len(self.export_sitemap.files)
self.assertTrue(after_len - prev_len, 1)
def test_spider_opened_instantiates_exporter(self):
with mock.patch.object(pipelines, 'open',
with mock.patch.object(pipelines, 'open', return_value=None):
with mock.patch.object(pipelines,
'SitemapItemExporter') as mocked_exporter:
@ -105,8 +101,7 @@ class TestExportSitemap(unittest.TestCase):
def test_spider_opened_exporter_starts_exporting(self):
with mock.patch.object(pipelines, 'open',
with mock.patch.object(pipelines, 'open', return_value=None):
with mock.patch.object(pipelines.SitemapItemExporter,
'start_exporting') as mocked_start:
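
One assertion kept as context in this file, ``self.assertTrue(after_len - prev_len, 1)``, may not check what it appears to. ``assertTrue`` treats its second argument as the failure message, so any non-zero difference passes. The sketch below contrasts it with ``assertEqual``, using a hypothetical ``LengthDeltaTest`` class and made-up lengths.

```python
import unittest


class LengthDeltaTest(unittest.TestCase):
    def test_assert_true_hides_wrong_delta(self):
        prev_len, after_len = 0, 2
        # Passes even though the delta is 2: the value 2 is truthy and
        # the literal 1 is only used as the failure message.
        self.assertTrue(after_len - prev_len, 1)

    def test_assert_equal_catches_wrong_delta(self):
        prev_len, after_len = 0, 2
        # assertEqual compares both values, so the wrong delta is detected.
        with self.assertRaises(AssertionError):
            self.assertEqual(after_len - prev_len, 1)


def run_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LengthDeltaTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If the intent of the original test is that exactly one entry is added to ``files``, ``assertEqual(after_len - prev_len, 1)`` would express it directly.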

@ -11,10 +11,30 @@
# under the License.
import mock
import scrapy
from sitemap.generator.spiders import sitemap_file
import unittest
class TestSitemapItem(unittest.TestCase):
def test_class_type(self):
self.assertTrue(type(sitemap_file.SitemapItem) is scrapy.item.ItemMeta)
def test_class_supports_fields(self):
with mock.patch.object(scrapy.item, 'Field'):
a = sitemap_file.SitemapItem()
supported_fields = ['loc', 'lastmod', 'priority', 'changefreq']
for field in supported_fields:
a[field] = field
not_supported_fields = ['some', 'random', 'fields']
for field in not_supported_fields:
with self.assertRaises(KeyError):
a[field] = field
class TestSitemapSpider(unittest.TestCase):
def setUp(self):
@ -38,16 +58,18 @@ class TestSitemapSpider(unittest.TestCase):
def test_parse_items_inits_sitemap(self):
response = mock.MagicMock()
with mock.patch.object(sitemap_file.items,
with mock.patch.object(sitemap_file,
'SitemapItem') as mocked_sitemap_item:
with mock.patch.object(sitemap_file, 'time'):
with mock.patch.object(sitemap_file.urlparse,
with mock.patch.object(sitemap_file, 'time'):
def test_parse_items_gets_path(self):
response = mock.MagicMock()
with mock.patch.object(sitemap_file.items, 'SitemapItem'):
with mock.patch.object(sitemap_file, 'SitemapItem'):
with mock.patch.object(sitemap_file.urlparse,
'urlsplit') as mocked_urlsplit:
with mock.patch.object(sitemap_file, 'time'):
@ -60,7 +82,7 @@ class TestSitemapSpider(unittest.TestCase):
path = sitemap_file.urlparse.SplitResult(
@ -77,7 +99,7 @@ class TestSitemapSpider(unittest.TestCase):
path = sitemap_file.urlparse.SplitResult(
@ -94,7 +116,7 @@ class TestSitemapSpider(unittest.TestCase):
path = sitemap_file.urlparse.SplitResult(

@ -1,6 +1,6 @@
minversion = 2.0
envlist = py27,pep8
envlist = py3,py27,pep8
skipsdist = True
@ -27,11 +27,14 @@ commands =
commands = pylint os_doc_tools cleanup
commands = pylint os_doc_tools cleanup sitemap
commands = sphinx-build -a -E -W -d releasenotes/build/doctrees -b html releasenotes/source releasenotes/build/html
# commands = functional test command goes here
commands = {posargs}
@ -44,3 +47,4 @@ builtins = _
# 28 is currently the most complex thing we have
ignore = H101