Transfer large scale scaling stories to rst documentation

Signed-off-by: Ramona Rautenberg <rautenberg@osism.tech>
Change-Id: Ie5bb8c7e09ed49f633290cc789b874107676f9e4
Ramona Rautenberg 2022-07-01 15:04:56 +02:00
parent 97066f8399
commit 6698cddaaf
7 changed files with 105 additions and 8 deletions


@@ -10,6 +10,7 @@ Contents:
readme
journey/index
contributor/index
+stories/index
Indices and tables
==================


@@ -15,6 +15,5 @@ Contents:
scale_up
scale_out
upgrade_and_maintain
-large_scale_scaling_stories
WIP: Transfer the content from https://wiki.openstack.org/wiki/Large_Scale_SIG


@@ -1,5 +0,0 @@
===========================
Large Scale Scaling Stories
===========================
# WIP


@@ -36,7 +36,7 @@ A: If you found out that your rabbitmq queue keep piling up for a certain servic
Resources
---------
-* A curated collection of :doc:`scaling stories <large_scale_scaling_stories>`, as we collect them
+* A curated collection of :doc:`scaling stories <../stories/stories>`, as we collect them
* Evaluation of internal messaging
@@ -70,4 +70,4 @@ Other SIG work on that stage
* Submit scaling stories on https://etherpad.openstack.org/p/scaling-stories
-* Curate them on :doc:`scaling stories <large_scale_scaling_stories>`
+* Curate them on :doc:`scaling stories <../stories/stories>`


@@ -0,0 +1,70 @@
===================================================
Large Scale Scaling Stories/2020-01-29-AlbertBraden
===================================================
Here are the scaling issues I've encountered recently at Synopsys, in reverse chronological order:
Thursday 12/19/2019: openstack server list --all-projects does not return all VMs.
------------------------------------------------------------------------------------
In /etc/nova/nova.conf we have the default: # max_limit = 1000
The recordset cleanup script depends on correct output from "openstack server list --all-projects".
Fix: Increased max_limit to 2000
The recordset cleanup script will run "openstack server list --all-projects | wc -l" and compare the output to max_limit, and refuse to run if max_limit is too low. If this happens, increase max_limit so that it is greater than the number of VMs in the cluster.
As time permits we need to look into paging results: https://docs.openstack.org/api-guide/compute/paginated_collections.html
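A guard of that kind might look roughly like the following sketch (the actual Synopsys cleanup script is not shown here; the awk parsing of nova.conf and the 1000 fallback are assumptions for illustration):

.. code-block:: console

   #!/bin/bash
   # Refuse to clean up recordsets if the server list may have been
   # truncated by the API page size (max_limit).
   vm_count=$(openstack server list --all-projects -f value -c ID | wc -l)
   max_limit=$(awk -F= '/^max_limit/ {gsub(/ /, "", $2); print $2}' /etc/nova/nova.conf)
   max_limit=${max_limit:-1000}
   if [ "$vm_count" -ge "$max_limit" ]; then
       echo "max_limit ($max_limit) is too low for $vm_count VMs; refusing to run" >&2
       exit 1
   fi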
Friday 12/13/2019: ARP table got full on pod2 controllers
----------------------------------------------------------
https://www.cyberciti.biz/faq/centos-redhat-debian-linux-neighbor-table-overflow/
Fix: Increase sysctl values:
.. code-block:: console
--- a/roles/openstack/controller/neutron/tasks/main.yml
+++ b/roles/openstack/controller/neutron/tasks/main.yml
@@ -243,6 +243,9 @@
with_items:
- { name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
- { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' }
+ - { name: 'net.ipv4.neigh.default.gc_thresh3', value: '4096' }
+ - { name: 'net.ipv4.neigh.default.gc_thresh2', value: '2048' }
+ - { name: 'net.ipv4.neigh.default.gc_thresh1', value: '1024' }
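If you are not applying the change through the Ansible role above, the same thresholds can be checked and set by hand; a quick sketch (persist the values under /etc/sysctl.d/ so they survive a reboot):

.. code-block:: console

   # how many neighbour (ARP) entries are currently in use
   ip neigh show | wc -l
   # raise the garbage-collection thresholds at runtime
   sysctl -w net.ipv4.neigh.default.gc_thresh1=1024
   sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
   sysctl -w net.ipv4.neigh.default.gc_thresh3=4096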
12/10/2019: RPC workers were overloaded
---------------------------------------
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011465.html
Fix: Increase the number of RPC workers. Modify /etc/neutron/neutron.conf on the controllers:
.. code-block:: console
148c148
< #rpc_workers = 1
---
> rpc_workers = 8
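One way to make the same edit non-interactively, assuming crudini is installed (rpc_workers lives in the [DEFAULT] section):

.. code-block:: console

   # set rpc_workers and restart neutron-server so the new workers spawn
   crudini --set /etc/neutron/neutron.conf DEFAULT rpc_workers 8
   systemctl restart neutron-server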
October 2019: Rootwrap
----------------------
Neutron was timing out because rootwrap was taking too long to spawn.
Fix: Run the rootwrap daemon:
Add this line to /etc/neutron/neutron.conf on the controllers:
root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"
Add this line to /etc/sudoers.d/neutron_sudoers on the controllers:
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
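Put together, the two edits look roughly like this sketch (root_helper_daemon normally sits in the [agent] section; double-check against your Neutron release):

.. code-block:: console

   # /etc/neutron/neutron.conf
   [agent]
   root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"

   # /etc/sudoers.d/neutron_sudoers
   neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

   # restart the neutron services on the controllers afterwards
   systemctl restart neutron-server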


@@ -0,0 +1,11 @@
================
Scaling Stories
================
Contents:
.. toctree::
:maxdepth: 2
stories
2020-01-29


@@ -0,0 +1,21 @@
===========================
Large Scale Scaling Stories
===========================
As part of its goal of further pushing back scaling limits within a given cluster, the Large Scale SIG collects scaling stories from OpenStack users.
There is a size/load limit for single clusters past which things in OpenStack start to break, and we need to start using multiple clusters or cells to scale out. The SIG is interested in hearing:
* what broke first for you: was it RabbitMQ or something else?
* what were the first symptoms
* at what size/load did it start to break
* what you did to fix it
This will be a great help in documenting expected limits and identifying where improvements should be focused.
You can submit your story directly here, or on this `etherpad <https://etherpad.opendev.org/p/scaling-stories>`_.
Stories
-------
* :doc:`2020-01-29-AlbertBraden <2020-01-29>`