Merge "Transfer large scale scaling stories to rst documentation"
commit 09408fdda2
@@ -10,6 +10,7 @@ Contents:

    readme
    journey/index
    contributor/index
+   stories/index

 Indices and tables
 ==================
@@ -15,6 +15,5 @@ Contents:

    scale_up
    scale_out
    upgrade_and_maintain
-   large_scale_scaling_stories

 WIP: Transfer the content from https://wiki.openstack.org/wiki/Large_Scale_SIG
@@ -1,5 +0,0 @@
-===========================
-Large Scale Scaling Stories
-===========================
-
-# WIP
@@ -36,7 +36,7 @@ A: If you found out that your rabbitmq queue keeps piling up for a certain service
 Resources
 ---------

-* A curated collection of :doc:`scaling stories <large_scale_scaling_stories>`, as we collect them
+* A curated collection of :doc:`scaling stories <../stories/stories>`, as we collect them

 * Evaluation of internal messaging
@@ -70,4 +70,4 @@ Other SIG work on that stage

 * Submit scaling stories on https://etherpad.openstack.org/p/scaling-stories

-* Curate them on :doc:`scaling stories <large_scale_scaling_stories>`
+* Curate them on :doc:`scaling stories <../stories/stories>`
@@ -0,0 +1,70 @@
===================================================
Large Scale Scaling Stories/2020-01-29-AlbertBraden
===================================================

Here are the scaling issues I've encountered recently at Synopsys, in reverse chronological order:


Thursday 12/19/2019: openstack server list --all-projects does not return all VMs
---------------------------------------------------------------------------------

In /etc/nova/nova.conf we have the default: # max_limit = 1000

The recordset cleanup script depends on correct output from "openstack server list --all-projects".

Fix: Increased max_limit to 2000

The recordset cleanup script will run "openstack server list --all-projects | wc -l" and compare the output to max_limit, and refuse to run if max_limit is too low. If this happens, increase max_limit so that it is greater than the number of VMs in the cluster.
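That guard can be sketched in a few lines of Python; the function name, the default command, and the use of `subprocess` are illustrative assumptions, not the actual cleanup script:

```python
import subprocess

def safe_to_clean(max_limit: int, list_cmd=None) -> bool:
    """Refuse to run cleanup if the server listing may have been truncated.

    list_cmd is a hypothetical hook so the check can be exercised without a
    real cloud; by default it shells out to the openstack CLI.
    """
    if list_cmd is None:
        list_cmd = ["openstack", "server", "list", "--all-projects",
                    "-f", "value", "-c", "ID"]
    out = subprocess.run(list_cmd, capture_output=True, text=True,
                         check=True).stdout
    vm_count = len(out.splitlines())
    # If the count reaches max_limit, the listing was probably capped by
    # nova's max_limit and cannot be trusted for cleanup decisions.
    return vm_count < max_limit
```

The point of returning False rather than raising is that a cron-driven cleanup can simply skip the run and alert, instead of deleting recordsets based on an incomplete VM list.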

As time permits we need to look into paging results: https://docs.openstack.org/api-guide/compute/paginated_collections.html
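The paginated-collections pattern in that guide is a marker/limit loop. Here is a minimal sketch against a fake page-fetching function (assumed names; a real client would issue `GET /servers?limit=...&marker=...` instead):

```python
def list_all(fetch_page, limit=1000):
    """Collect every item by following marker-based pagination.

    fetch_page(limit, marker) must return the next batch of dicts with an
    'id' key, mimicking the Compute API's limit/marker query parameters.
    """
    items, marker = [], None
    while True:
        page = fetch_page(limit=limit, marker=marker)
        items.extend(page)
        if len(page) < limit:    # a short page means we reached the end
            return items
        marker = page[-1]["id"]  # next request resumes after the last item

# Fake backend with 2500 servers, so limit=1000 takes three requests.
DATA = [{"id": f"vm-{i}"} for i in range(2500)]

def fake_fetch(limit, marker):
    start = 0 if marker is None else (
        next(i for i, s in enumerate(DATA) if s["id"] == marker) + 1)
    return DATA[start:start + limit]
```

With this loop in the cleanup script, max_limit no longer caps what the script sees, so the guard above becomes a belt-and-suspenders check rather than the only defense.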

Friday 12/13/2019: Arp table got full on pod2 controllers
---------------------------------------------------------

https://www.cyberciti.biz/faq/centos-redhat-debian-linux-neighbor-table-overflow/

Fix: Increase sysctl values:

.. code-block:: diff

   --- a/roles/openstack/controller/neutron/tasks/main.yml
   +++ b/roles/openstack/controller/neutron/tasks/main.yml
   @@ -243,6 +243,9 @@
      with_items:
        - { name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
        - { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' }
   +    - { name: 'net.ipv4.neigh.default.gc_thresh3', value: '4096' }
   +    - { name: 'net.ipv4.neigh.default.gc_thresh2', value: '2048' }
   +    - { name: 'net.ipv4.neigh.default.gc_thresh1', value: '1024' }
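The same thresholds can be applied and verified at runtime with sysctl before baking them into the Ansible role; the values below simply mirror the ones in the diff (requires root, and does not persist across reboots):

```shell
# Raise the neighbor (ARP) table thresholds immediately.
sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
sysctl -w net.ipv4.neigh.default.gc_thresh1=1024

# Confirm what is currently in effect.
sysctl net.ipv4.neigh.default.gc_thresh1 \
       net.ipv4.neigh.default.gc_thresh2 \
       net.ipv4.neigh.default.gc_thresh3
```

gc_thresh3 is the hard cap; once the table hits it the kernel logs "neighbor table overflow" and drops entries, which is the symptom described in the linked article.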


12/10/2019: RPC workers were overloaded
---------------------------------------

http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011465.html

Fix: increase the number of RPC workers. Modify /etc/neutron/neutron.conf on the controllers:

.. code-block:: console

   148c148
   < #rpc_workers = 1
   ---
   > rpc_workers = 8
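Since neutron.conf is a standard INI file, the change can also be made idempotently with Python's configparser instead of editing line 148 by hand; this is a sketch, not the tooling actually used at Synopsys:

```python
import configparser

def set_rpc_workers(path: str, workers: int = 8) -> str:
    """Set rpc_workers in the [DEFAULT] section of a neutron.conf-style file.

    A commented-out '#rpc_workers = 1' line is treated as a comment by
    configparser, so this simply adds (or overwrites) the real setting.
    """
    cfg = configparser.ConfigParser()
    cfg.read(path)
    cfg["DEFAULT"]["rpc_workers"] = str(workers)
    with open(path, "w") as f:
        cfg.write(f)
    return cfg["DEFAULT"]["rpc_workers"]
```

One caveat with this approach: configparser rewrites the whole file, dropping comments, so on a production controller a config-management template (as in the Ansible role above) is usually preferable.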


October 2019: Rootwrap
----------------------

Neutron was timing out because rootwrap was taking too long to spawn.

Fix: Run the rootwrap daemon.

Add this line to /etc/neutron/neutron.conf on the controllers:

.. code-block:: ini

   root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"

Add this line to /etc/sudoers.d/neutron_sudoers on the controllers:

.. code-block:: console

   neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
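A syntax error in a sudoers drop-in can break sudo for everyone on the host, so it is worth checking the new file before relying on it; visudo has a check mode for exactly this (run as root):

```shell
# Validate the drop-in's syntax without installing anything.
visudo -c -f /etc/sudoers.d/neutron_sudoers

# List the neutron user's sudo privileges to confirm the rule is picked up.
sudo -l -U neutron
```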


@ -0,0 +1,11 @@
================
Scaling Stories
================

Contents:

.. toctree::
   :maxdepth: 2

   stories
   2020-01-29
@ -0,0 +1,21 @@
|
|||
===========================
|
||||
Large Scale Scaling Stories
|
||||
===========================
|
||||
|
||||
As part of its goal of further pushing back scaling limits within a given cluster, the Large Scale SIG collects scaling stories from OpenStack users.
|
||||
|
||||
There is a size/load limit for single clusters past which things in OpenStack start to break, and we need to start using multiple clusters or cells to scale out. The SIG is interested in hearing:
|
||||
|
||||
* what broke first for you, is it RabbitMQ or something else
|
||||
* what were the first symptoms
|
||||
* what size/load did it start to break
|
||||
* things you did to fix it
|
||||
|
||||
This will be a great help to document expected limits, and identify where improvements should be focused.
|
||||
|
||||
You can submit your story directly here, or on this `etherpad <https://etherpad.opendev.org/p/scaling-stories>`_.
|
||||
|
||||
Stories
|
||||
-------
|
||||
|
||||
* :doc:`2020-01-29-AlbertBraden <2020-01-29>`
|