Transfer large scale scaling stories to rst documentation

Signed-off-by: Ramona Rautenberg <rautenberg@osism.tech>
Change-Id: Ie5bb8c7e09ed49f633290cc789b874107676f9e4
Ramona Rautenberg 2022-07-01 15:04:56 +02:00
parent 97066f8399
commit 6698cddaaf
7 changed files with 105 additions and 8 deletions


@@ -10,6 +10,7 @@ Contents:
readme
journey/index
contributor/index
+stories/index
Indices and tables
==================


@@ -15,6 +15,5 @@ Contents:
scale_up
scale_out
upgrade_and_maintain
-large_scale_scaling_stories
WIP: Transfer the content from https://wiki.openstack.org/wiki/Large_Scale_SIG


@@ -1,5 +0,0 @@
===========================
Large Scale Scaling Stories
===========================
# WIP


@@ -36,7 +36,7 @@ A: If you found out that your rabbitmq queue keep piling up for a certain servic
Resources
---------
-* A curated collection of :doc:`scaling stories <large_scale_scaling_stories>`, as we collect them
+* A curated collection of :doc:`scaling stories <../stories/stories>`, as we collect them
* Evaluation of internal messaging
@@ -70,4 +70,4 @@ Other SIG work on that stage
* Submit scaling stories on https://etherpad.openstack.org/p/scaling-stories
-* Curate them on :doc:`scaling stories <large_scale_scaling_stories>`
+* Curate them on :doc:`scaling stories <../stories/stories>`


@@ -0,0 +1,70 @@
===================================================
Large Scale Scaling Stories/2020-01-29-AlbertBraden
===================================================
Here are the scaling issues I've encountered recently at Synopsys, in reverse chronological order:
Thursday 12/19/2019: openstack server list --all-projects does not return all VMs.
------------------------------------------------------------------------------------
In /etc/nova/nova.conf we have the default: # max_limit = 1000
The recordset cleanup script depends on correct output from "openstack server list --all-projects".
Fix: Increased max_limit to 2000
The recordset cleanup script will run "openstack server list --all-projects | wc -l" and compare the output to max_limit, and refuse to run if max_limit is too low. If this happens, increase max_limit so that it is greater than the number of VMs in the cluster.
As time permits we need to look into paging results: https://docs.openstack.org/api-guide/compute/paginated_collections.html
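A guard of that kind might look roughly like the following sketch (the actual Synopsys cleanup script is not shown here; the awk parsing of nova.conf and the 1000 fallback are assumptions for illustration):

.. code-block:: console

   #!/bin/bash
   # Refuse to clean up recordsets if the server list may have been
   # truncated by the API page size (max_limit).
   vm_count=$(openstack server list --all-projects -f value -c ID | wc -l)
   max_limit=$(awk -F= '/^max_limit/ {gsub(/ /, "", $2); print $2}' /etc/nova/nova.conf)
   max_limit=${max_limit:-1000}
   if [ "$vm_count" -ge "$max_limit" ]; then
       echo "max_limit ($max_limit) is too low for $vm_count VMs; refusing to run" >&2
       exit 1
   fi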
Friday 12/13/2019: ARP table got full on pod2 controllers
----------------------------------------------------------
https://www.cyberciti.biz/faq/centos-redhat-debian-linux-neighbor-table-overflow/
Fix: Increase sysctl values:
.. code-block:: console
--- a/roles/openstack/controller/neutron/tasks/main.yml
+++ b/roles/openstack/controller/neutron/tasks/main.yml
@@ -243,6 +243,9 @@
with_items:
- { name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
- { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' }
+ - { name: 'net.ipv4.neigh.default.gc_thresh3', value: '4096' }
+ - { name: 'net.ipv4.neigh.default.gc_thresh2', value: '2048' }
+ - { name: 'net.ipv4.neigh.default.gc_thresh1', value: '1024' }
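If you are not applying the change through the Ansible role above, the same thresholds can be checked and set by hand; a quick sketch (persist the values under /etc/sysctl.d/ so they survive a reboot):

.. code-block:: console

   # how many neighbour (ARP) entries are currently in use
   ip neigh show | wc -l
   # raise the garbage-collection thresholds at runtime
   sysctl -w net.ipv4.neigh.default.gc_thresh1=1024
   sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
   sysctl -w net.ipv4.neigh.default.gc_thresh3=4096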
12/10/2019: RPC workers were overloaded
---------------------------------------
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011465.html
Fix: Increase the number of RPC workers. Modify /etc/neutron/neutron.conf on the controllers:
.. code-block:: console
148c148
< #rpc_workers = 1
---
> rpc_workers = 8
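One way to make the same edit non-interactively, assuming crudini is installed (rpc_workers lives in the [DEFAULT] section):

.. code-block:: console

   # set rpc_workers and restart neutron-server so the new workers spawn
   crudini --set /etc/neutron/neutron.conf DEFAULT rpc_workers 8
   systemctl restart neutron-server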
October 2019: Rootwrap
----------------------
Neutron was timing out because rootwrap was taking too long to spawn.
Fix: Run the rootwrap daemon:
Add this line to /etc/neutron/neutron.conf on the controllers:
root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"
Add this line to /etc/sudoers.d/neutron_sudoers on the controllers:
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
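Put together, the two edits look roughly like this sketch (root_helper_daemon normally sits in the [agent] section; double-check against your Neutron release):

.. code-block:: console

   # /etc/neutron/neutron.conf
   [agent]
   root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"

   # /etc/sudoers.d/neutron_sudoers
   neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

   # restart the neutron services on the controllers afterwards
   systemctl restart neutron-server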


@@ -0,0 +1,11 @@
================
Scaling Stories
================
Contents:
.. toctree::
:maxdepth: 2
stories
2020-01-29


@@ -0,0 +1,21 @@
===========================
Large Scale Scaling Stories
===========================
As part of its goal of further pushing back scaling limits within a given cluster, the Large Scale SIG collects scaling stories from OpenStack users.
There is a size/load limit for single clusters past which things in OpenStack start to break, and we need to start using multiple clusters or cells to scale out. The SIG is interested in hearing:
* what broke first for you: was it RabbitMQ or something else?
* what were the first symptoms
* at what size/load did it start to break
* what you did to fix it
This will be a great help in documenting expected limits and identifying where improvements should be focused.
You can submit your story directly here, or on this `etherpad <https://etherpad.opendev.org/p/scaling-stories>`_.
Stories
-------
* :doc:`2020-01-29-AlbertBraden <2020-01-29>`