Merge "Transfer large scale scaling stories to rst documentation"
commit 09408fdda2
@@ -10,6 +10,7 @@ Contents:

    readme
    journey/index
    contributor/index
+   stories/index

 Indices and tables
 ==================
@@ -15,6 +15,5 @@ Contents:

    scale_up
    scale_out
    upgrade_and_maintain
-   large_scale_scaling_stories

 WIP: Transfer the content from https://wiki.openstack.org/wiki/Large_Scale_SIG
@@ -1,5 +0,0 @@
-===========================
-Large Scale Scaling Stories
-===========================
-
-# WIP
@@ -36,7 +36,7 @@ A: If you found out that your rabbitmq queue keeps piling up for a certain service
 Resources
 ---------

-* A curated collection of :doc:`scaling stories <large_scale_scaling_stories>`, as we collect them
+* A curated collection of :doc:`scaling stories <../stories/stories>`, as we collect them

 * Evaluation of internal messaging
@@ -70,4 +70,4 @@ Other SIG work on that stage

 * Submit scaling stories on https://etherpad.openstack.org/p/scaling-stories

-* Curate them on :doc:`scaling stories <large_scale_scaling_stories>`
+* Curate them on :doc:`scaling stories <../stories/stories>`
@@ -0,0 +1,70 @@
===================================================
Large Scale Scaling Stories/2020-01-29-AlbertBraden
===================================================

Here are the scaling issues I've encountered recently at Synopsys, in reverse chronological order:


Thursday 12/19/2019: openstack server list --all-projects does not return all VMs
---------------------------------------------------------------------------------

In /etc/nova/nova.conf we have the default: # max_limit = 1000

The recordset cleanup script depends on correct output from "openstack server list --all-projects".

Fix: Increased max_limit to 2000

The recordset cleanup script will run "openstack server list --all-projects | wc -l" and compare the output to max_limit, and refuse to run if max_limit is too low. If this happens, increase max_limit so that it is greater than the number of VMs in the cluster.
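That guard can be sketched in a few lines of Python; the function name, the default command, and the use of `subprocess` are illustrative assumptions, not the actual cleanup script:

```python
import subprocess

def safe_to_clean(max_limit: int, list_cmd=None) -> bool:
    """Refuse to run cleanup if the server listing may have been truncated.

    list_cmd is a hypothetical hook so the check can be exercised without a
    real cloud; by default it shells out to the openstack CLI.
    """
    if list_cmd is None:
        list_cmd = ["openstack", "server", "list", "--all-projects",
                    "-f", "value", "-c", "ID"]
    out = subprocess.run(list_cmd, capture_output=True, text=True,
                         check=True).stdout
    vm_count = len(out.splitlines())
    # If the count reaches max_limit, the listing was probably capped by
    # nova's max_limit and cannot be trusted for cleanup decisions.
    return vm_count < max_limit
```

The point of returning False rather than raising is that a cron-driven cleanup can simply skip the run and alert, instead of deleting recordsets based on an incomplete VM list.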

As time permits we need to look into paging results: https://docs.openstack.org/api-guide/compute/paginated_collections.html
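The paginated-collections pattern in that guide is a marker/limit loop. Here is a minimal sketch against a fake page-fetching function (assumed names; a real client would issue `GET /servers?limit=...&marker=...` instead):

```python
def list_all(fetch_page, limit=1000):
    """Collect every item by following marker-based pagination.

    fetch_page(limit, marker) must return the next batch of dicts with an
    'id' key, mimicking the Compute API's limit/marker query parameters.
    """
    items, marker = [], None
    while True:
        page = fetch_page(limit=limit, marker=marker)
        items.extend(page)
        if len(page) < limit:    # a short page means we reached the end
            return items
        marker = page[-1]["id"]  # next request resumes after the last item

# Fake backend with 2500 servers, so limit=1000 takes three requests.
DATA = [{"id": f"vm-{i}"} for i in range(2500)]

def fake_fetch(limit, marker):
    start = 0 if marker is None else (
        next(i for i, s in enumerate(DATA) if s["id"] == marker) + 1)
    return DATA[start:start + limit]
```

With this loop in the cleanup script, max_limit no longer caps what the script sees, so the guard above becomes a belt-and-suspenders check rather than the only defense.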

Friday 12/13/2019: Arp table got full on pod2 controllers
---------------------------------------------------------

https://www.cyberciti.biz/faq/centos-redhat-debian-linux-neighbor-table-overflow/

Fix: Increase sysctl values:

.. code-block:: diff

   --- a/roles/openstack/controller/neutron/tasks/main.yml
   +++ b/roles/openstack/controller/neutron/tasks/main.yml
   @@ -243,6 +243,9 @@
      with_items:
        - { name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
        - { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' }
   +    - { name: 'net.ipv4.neigh.default.gc_thresh3', value: '4096' }
   +    - { name: 'net.ipv4.neigh.default.gc_thresh2', value: '2048' }
   +    - { name: 'net.ipv4.neigh.default.gc_thresh1', value: '1024' }
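The same thresholds can be applied and verified at runtime with sysctl before baking them into the Ansible role; the values below simply mirror the ones in the diff (requires root, and does not persist across reboots):

```shell
# Raise the neighbor (ARP) table thresholds immediately.
sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
sysctl -w net.ipv4.neigh.default.gc_thresh1=1024

# Confirm what is currently in effect.
sysctl net.ipv4.neigh.default.gc_thresh1 \
       net.ipv4.neigh.default.gc_thresh2 \
       net.ipv4.neigh.default.gc_thresh3
```

gc_thresh3 is the hard cap; once the table hits it the kernel logs "neighbor table overflow" and drops entries, which is the symptom described in the linked article.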


12/10/2019: RPC workers were overloaded
---------------------------------------

http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011465.html

Fix: increase the number of RPC workers. Modify /etc/neutron/neutron.conf on the controllers:

.. code-block:: console

   148c148
   < #rpc_workers = 1
   ---
   > rpc_workers = 8
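Since neutron.conf is a standard INI file, the change can also be made idempotently with Python's configparser instead of editing line 148 by hand; this is a sketch, not the tooling actually used at Synopsys:

```python
import configparser

def set_rpc_workers(path: str, workers: int = 8) -> str:
    """Set rpc_workers in the [DEFAULT] section of a neutron.conf-style file.

    A commented-out '#rpc_workers = 1' line is treated as a comment by
    configparser, so this simply adds (or overwrites) the real setting.
    """
    cfg = configparser.ConfigParser()
    cfg.read(path)
    cfg["DEFAULT"]["rpc_workers"] = str(workers)
    with open(path, "w") as f:
        cfg.write(f)
    return cfg["DEFAULT"]["rpc_workers"]
```

One caveat with this approach: configparser rewrites the whole file, dropping comments, so on a production controller a config-management template (as in the Ansible role above) is usually preferable.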


October 2019: Rootwrap
----------------------

Neutron was timing out because rootwrap was taking too long to spawn.

Fix: Run the rootwrap daemon.

Add this line to /etc/neutron/neutron.conf on the controllers:

.. code-block:: ini

   root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"

Add this line to /etc/sudoers.d/neutron_sudoers on the controllers:

.. code-block:: console

   neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
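A syntax error in a sudoers drop-in can break sudo for everyone on the host, so it is worth checking the new file before relying on it; visudo has a check mode for exactly this (run as root):

```shell
# Validate the drop-in's syntax without installing anything.
visudo -c -f /etc/sudoers.d/neutron_sudoers

# List the neutron user's sudo privileges to confirm the rule is picked up.
sudo -l -U neutron
```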


@ -0,0 +1,11 @@
================
Scaling Stories
================

Contents:

.. toctree::
   :maxdepth: 2

   stories
   2020-01-29
@ -0,0 +1,21 @@
|
|||
===========================
|
||||
Large Scale Scaling Stories
|
||||
===========================
|
||||
|
||||
As part of its goal of further pushing back scaling limits within a given cluster, the Large Scale SIG collects scaling stories from OpenStack users.
|
||||
|
||||
There is a size/load limit for single clusters past which things in OpenStack start to break, and we need to start using multiple clusters or cells to scale out. The SIG is interested in hearing:
|
||||
|
||||
* what broke first for you, is it RabbitMQ or something else
|
||||
* what were the first symptoms
|
||||
* what size/load did it start to break
|
||||
* things you did to fix it
|
||||
|
||||
This will be a great help to document expected limits, and identify where improvements should be focused.
|
||||
|
||||
You can submit your story directly here, or on this `etherpad <https://etherpad.opendev.org/p/scaling-stories>`_.
|
||||
|
||||
Stories
|
||||
-------
|
||||
|
||||
* :doc:`2020-01-29-AlbertBraden <2020-01-29>`
|