538 Commits

Author SHA1 Message Date
Zuul
809fd6003b Merge "Fix minor error in extarq job" 2020-03-10 02:53:57 +00:00
chenke
4c4b732f4e Fix minor error in extarq job
This patch sets the initial value of the variable need_job to False.
In some cases, None causes errors that are
not easy to detect.

Change-Id: Ic4eac2fbc274fc5dfe9b2f4b796888b96bd78d0c
Story: 2007352
Task: 38932
2020-03-05 14:22:08 +08:00
Zuul
6dab512af2 Merge "Revert "Solve py37 timeout"" 2020-03-05 06:07:37 +00:00
Zuul
2b5bee4afb Merge "add testcases for async job bind" 2020-03-05 06:07:36 +00:00
Zuul
ae7757c3bc Merge "change default SimpleQueue to _PySimpleQueue for queue" 2020-03-05 06:07:35 +00:00
Zuul
6890799fe4 Merge "add testcase for check_bindings_result failed" 2020-03-05 06:05:55 +00:00
Zuul
17119a07e3 Merge "UT for job manager" 2020-03-05 06:05:54 +00:00
Zuul
29e005678a Merge "move setting to devstack/settings" 2020-03-03 07:14:47 +00:00
Zuul
e9db6e8797 Merge "Remove useless interfaces in cond" 2020-03-03 02:07:19 +00:00
Zuul
7ecc9c08d9 Merge "Add obj_make_compatible()" 2020-03-02 16:35:36 +00:00
Sean Mooney
985b15287e move setting to devstack/settings
All user settable options should be stored in the
devstack/settings file so they are defined when it is
sourced early in devstack to allow values to be shared
between plugins if required.

This change moves the cyborg settings from
devstack/lib/cyborg to devstack/settings to conform to
the standard plugin interface.

The name of the folder where devstack clones the plugin
is specified in the first argument to the enable_plugin
function invocation in the local.conf.
This change updates CYBORG_DIR to respect that
and adds TODOs for other issues that will be addressed in
follow up patches.

Change-Id: I5b6879e5ddb86659b8c7eb87b8d26cee33ed4754
2020-03-02 15:02:17 +00:00
Zuul
3290719ca7 Merge "add support for multi node deployments to fake driver" 2020-03-02 10:57:33 +00:00
chenke
83936803d3 Remove useless interfaces in cond
These APIs are not used and can cause
misunderstandings.

Change-Id: I26a6debc76405ec42c3b23aae4752d450cec4eb0
Story: 38902
2020-03-02 11:10:08 +08:00
zhangbailin
9831730208 Add obj_make_compatible()
This adds a method to CyborgObject that allows it to convert itself
to an older version, within a compatibility window. So, if an object
had a revision that added or changed the formatting of an attribute,
the obj_make_compatible() method can fix up a primitive representation
before it is sent to a client expecting the older version.

Partial-Implements: blueprint add-description-field-to-device-profiles

Change-Id: I196629059bc32165f161fe9c071a339d63d71c10
2020-02-28 14:20:21 +00:00
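The downgrade described in 9831730208 above can be sketched in plain Python. This is a minimal illustration, not the Cyborg code: the class DeviceProfileSketch, the helper version_tuple, the field names, and the assumption that a description field appears in version 1.1 are all hypothetical. Only the method name obj_make_compatible() comes from the commit message.

```python
def version_tuple(version):
    """Convert a version string such as "1.1" to a comparable tuple (1, 1)."""
    return tuple(int(part) for part in version.split("."))


class DeviceProfileSketch:
    """Hypothetical object; 'description' is assumed to be new in 1.1."""

    VERSION = "1.1"

    def __init__(self, name, description):
        self.name = name
        self.description = description

    def obj_to_primitive(self, target_version=None):
        """Build a dict representation, downgraded if a target is given."""
        primitive = {"name": self.name, "description": self.description}
        if target_version is not None:
            self.obj_make_compatible(primitive, target_version)
        return {"version": target_version or self.VERSION, "data": primitive}

    def obj_make_compatible(self, primitive, target_version):
        """Fix up the primitive in place for a client on an older version."""
        if version_tuple(target_version) < (1, 1):
            # Older clients do not know this field, so drop it.
            primitive.pop("description", None)


profile = DeviceProfileSketch("fpga-profile", "example description")
new_style = profile.obj_to_primitive()
old_style = profile.obj_to_primitive(target_version="1.0")
```

Here new_style keeps the description field, while old_style omits it and reports version "1.0".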
zhangbailin
3c7e0868e6 Delete sandbox directory
This directory does not need to exist in the Cyborg project.

Change-Id: I845256168f60cd27522a7ae821bcaa50e1e218f5
2020-02-27 04:28:01 +00:00
Sean Mooney
eb12f68421 add support for multi node deployments to fake driver
This change alters the fake driver to include the hostname
in the deployable list so that each host in a multi node deployment
will have a unique placement RP name.

Change-Id: Ib0e202cac8af5ef7c5028c22dc0654911eb730f5
2020-02-24 23:16:13 +00:00
Shaohe Feng
d79ef4d616 Revert "Solve py37 timeout"
A good solution has still not been found.
This reverts commit 08af6012710fc9bbbec0f7936abd701d4eccf7db.

The history of the timeout has three stages:
1. After we enabled py37, there were some random timeouts, as follows:
https://review.opendev.org/#/c/679406/  On Sep 19 11:30 AM
https://review.opendev.org/#/c/688239/  On Oct 12 3:31 PM
https://review.opendev.org/#/c/688231/  On Oct 28 11:38 PM,
  Oct 29 9:11 AM, Oct 29 12:05 PM
https://review.opendev.org/#/c/685542/  On Nov 14 12:31 AM,
   Nov 14 7:42 AM
https://review.opendev.org/#/c/691872/  On Oct 29 11:55 PM
https://review.opendev.org/#/c/690509/  On Nov 15 4:33 PM

2. After https://review.opendev.org/#/c/688593/ the timeout disappeared
for a while. That patch was merged on Nov 19 5:05 PM.
It set a wrong Python path env.

3. After another patch, https://review.opendev.org/#/c/696397, the timeout
came back.
That patch was intended to suppress a confusing pep8 message, but
it also unintentionally fixed the Python path env.

The thread pool library loops continuously to check for new jobs.
The unit test framework behaves differently in py37 than in py36: if
a thread pool is running, the test run never returns.

You can run a simple test in this file, as follows:
def job1(t1=0):
    print(time.time())
    print("sleep %s second" % t1)
    time.sleep(t1)
    print(time.time())
    return "Hello, world"

class TestExtARQObject(base.DbTestCase):
    def test_foo(self):
        print("Test Foo")

    def test_bar(self):
        print("Test bar")

    def test_apply_patch_fpga_arq_monitor_job(self):
        works = utils.ThreadWorks()
        job = works.spawn(job1, 1)
        return job

Change-Id: I398db324563ecdb6e8fe0abb86fd02c1336b467f
2020-02-20 01:23:42 +00:00
Shaohe Feng
1345beb920 add testcases for async job bind
Change-Id: I9c7ecb1c1556751c2517a9f6e967885fd2088f87
2020-02-20 01:22:34 +00:00
Dan Smith
1e1b2693aa Fix exceptions defined with improper _msg_fmt
Many exceptions are defined in such a way that they will not render properly
when stringified. This is because instead of _msg_fmt, they used msg_fmt
or message in the class definition.

This fixes those and adds a test which I used to find all the offenders.

Closes task: #38817

Change-Id: I085ef5b0197b76b7b53639610f62b615fb538983
2020-02-19 11:54:57 -08:00
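The rendering problem fixed in 1e1b2693aa above can be sketched as follows. The base class SketchException, the two subclasses, and the helper find_offenders are hypothetical stand-ins, assuming a base class that builds its message from _msg_fmt; they are not the Cyborg exception module or the test added by the commit.

```python
class SketchException(Exception):
    """Hypothetical base class that renders its message from _msg_fmt."""

    _msg_fmt = "An unknown exception occurred."

    def __init__(self, **kwargs):
        super().__init__(self._msg_fmt % kwargs)


class ProviderNotFoundGood(SketchException):
    _msg_fmt = "Resource provider not found: %(name)s"


class ProviderNotFoundBad(SketchException):
    # Wrong attribute name: the base class never reads 'message',
    # so the generic default text is rendered instead.
    message = "Resource provider not found: %(name)s"


def find_offenders(base):
    """Return direct subclasses that define msg_fmt or message."""
    return [cls.__name__ for cls in base.__subclasses__()
            if "msg_fmt" in vars(cls) or "message" in vars(cls)]


good_text = str(ProviderNotFoundGood(name="host1"))
bad_text = str(ProviderNotFoundBad(name="host1"))
offenders = find_offenders(SketchException)
```

The misnamed attribute does not raise an error; it silently falls back to the default text, which is why a test that scans class definitions is a practical way to find such cases.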
Dan Smith
d279c22d1e Avoid creating a root provider when parent is not found
Before this change, when agent called to conductor to report_data(),
if the parent provider was not found by hostname, we would log an error,
and then continue to create the "child" provider with no parent. We should
never do this if we are supposed to have a parent. Cleanup from this
situation is also messy.

This makes us raise PlacementResourceProviderNotFound() in that case,
which aborts the report and thus does not create the provider incorrectly.
It also makes the agent catch that exception and moves the log message
to the agent where the actual problem is (i.e. likely misconfiguration).

The exception used here is actually defined incorrectly, having a message
class variable instead of _msg_fmt, which caused it to not render properly.
This fixes that along the way and adds tests for the new conductor and
agent behaviors.

Closes task: 38813
Closes task: 38814

Change-Id: Ied8ee91592eb0b4675f9c155e30a6c3a7df9b597
2020-02-19 11:40:42 -08:00
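The control flow described in d279c22d1e above can be sketched roughly as below. Everything except the exception name and report_data() is an assumption made for illustration: the providers dict, the agent_report helper, and the list used as a log are hypothetical and do not reflect the real conductor or agent interfaces.

```python
class PlacementResourceProviderNotFound(Exception):
    """Stand-in for the Cyborg exception named in the commit message."""


def report_data(providers, hostname, child_name):
    """Conductor side: create the child only when the parent exists."""
    if hostname not in providers:
        # Abort the report instead of creating a parentless (root) provider.
        raise PlacementResourceProviderNotFound(hostname)
    providers[child_name] = {"parent": hostname}


def agent_report(providers, hostname, child_name, log):
    """Agent side: catch the exception and log the likely misconfiguration."""
    try:
        report_data(providers, hostname, child_name)
    except PlacementResourceProviderNotFound:
        log.append("parent provider not found for host %s" % hostname)
        return False
    return True


providers = {"host1": {"parent": None}}
log = []
ok = agent_report(providers, "host1", "host1_fpga0", log)
failed = agent_report(providers, "typo-host", "typo-host_fpga0", log)
```

With a wrong hostname, no provider is created and the only trace is the agent-side log entry.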
Shaohe Feng
16ac71928d change default SimpleQueue to _PySimpleQueue for queue
py3.7 deadlocks with monkey patching of the stdlib thread modules + use of
ThreadPoolExecutor.

In 3.7, ThreadPoolExecutor was changed to use queue.SimpleQueue; in 3.6 it
uses queue.Queue. There is no issue in 3.6.

The default SimpleQueue in Python 3.7 is implemented in C, which
causes the main thread to hang.

Commit 2a71b8e introduced eventlet for unit tests, but I could not find
why we need it.
Other projects also use ThreadPoolExecutor, but not all
of them introduce eventlet.

Maybe GreenThreadPoolExecutor is another solution.

Ref:
ThreadPoolExecutor
https://docs.python.org/3.7/library/concurrent.futures.html#threadpoolexecutor

Same issues about ThreadPoolExecutor and eventlet
https://bugs.launchpad.net/designate/+bug/1782647
https://github.com/eventlet/eventlet/issues/508
https://github.com/gevent/gevent/pull/1253
https://github.com/ClericPy/torequests/issues/10
https://bugs.python.org/issue34173
https://github.com/gevent/gevent/issues/1248
https://github.com/gevent/gevent/issues/1251
http://lists.openstack.org/pipermail/openstack-dev/2018-July/132473.html
https://www.gitmemory.com/issue/eventlet/eventlet/508/511101004

Change-Id: I263f7222119d8d0c89cfc2a4758fe376ce1afd60
2020-02-19 16:04:55 +00:00
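The queue swap named in the title of 16ac71928d above might look roughly like the sketch below. It assumes Python 3.7 or later, where the stdlib queue module ships a private pure-Python class _PySimpleQueue next to the C-implemented SimpleQueue. How the commit itself applies the swap is not shown in this log, so the helper run_with_py_simple_queue is hypothetical. The sketch does not use eventlet, so it does not reproduce the hang; it only shows that an executor created after the swap still runs jobs.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_with_py_simple_queue(func, *args):
    """Run one job in an executor created while SimpleQueue is swapped."""
    original = queue.SimpleQueue
    # _PySimpleQueue is a private stdlib name; relying on it is an assumption.
    queue.SimpleQueue = queue._PySimpleQueue
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(func, *args).result(timeout=10)
    finally:
        # Restore the default so other code is unaffected.
        queue.SimpleQueue = original


result = run_with_py_simple_queue(pow, 2, 5)
```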
Shaohe Feng
b123b100c0 add testcase for check_bindings_result failed
Change-Id: Ib96686557c38e37c0bbeb50e141ff55284403786
2020-02-18 15:17:52 +00:00
Shaohe Feng
39b33b44ce UT for job manager
Because we are considering using multiprocessing, only part of the
UTs are submitted here.

Other UTs are TBD.

Also fix some bugs found while writing the UTs.

Also cover the following Py3 compatibility issue:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/mock/mock.py", line 1330, in patched
    return func(*args, **keywargs)
  File "/opt/stack/cyborg/cyborg/tests/unit/objects/test_ext_arq_job.py", line 293, in test_job_monitor
    objects.ext_arq.ExtARQ.job_monitor(self.context, works_generator, extarqs)
  File "/opt/stack/cyborg/cyborg/common/utils.py", line 432, in _impl
    LOG.error(msg, e.message)
AttributeError: 'AttributeError' object has no attribute 'message'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/mock/mock.py", line 1330, in patched
    return func(*args, **keywargs)
  File "/opt/stack/cyborg/cyborg/tests/unit/objects/test_ext_arq_job.py", line 293, in test_job_monitor
    objects.ext_arq.ExtARQ.job_monitor(self.context, works_generator, extarqs)
  File "/opt/stack/cyborg/cyborg/common/utils.py", line 430, in _impl
    output = method(self, *args, **kwargs)
  File "/opt/stack/cyborg/cyborg/objects/extarq/ext_arq_job.py", line 154, in job_monitor
    for _, (exc, tb), _, err in works_generator:
  File "/opt/stack/cyborg/cyborg/common/utils.py", line 333, in future_iterator
    f.exception_info(), f._result, f._state, e.message)
AttributeError: 'Future' object has no attribute 'exception_info'

Change-Id: Ibbcaac060b365cf3366e3b408a385d613b006dff
2020-02-18 15:16:06 +00:00
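The two AttributeErrors in the traceback of 39b33b44ce above come from Python 2 era idioms. A minimal sketch of Python 3 equivalents, using only the standard library: failing_job is a hypothetical job, and this is not the code the commit changed.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_job():
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing_job)
    # Future.exception() returns the raised exception instead of re-raising.
    exc = future.exception(timeout=10)

# Python 3 exceptions have no .message attribute; str(exc) gives the text.
has_message = hasattr(exc, "message")
text = str(exc)
# The stdlib Future has no exception_info(); the traceback is on the exception.
has_exception_info = hasattr(future, "exception_info")
tb = exc.__traceback__
```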
Zuul
e0ba01891f Merge "Update gpu driver" 2020-02-17 02:59:14 +00:00
Zuul
4d0a3d19f4 Merge "Solve py37 timeout" 2020-02-16 11:42:32 +00:00
chenke
08af601271 Solve py37 timeout
The py37 job has consistently reported timeout errors recently.
Please see [1] [2] [3].
At first it was suspected that the error was caused
by the patch [4].
Therefore, Feng Shaohe's patch [5] reverted that change,
and the py37 timeout then disappeared.

But in fact, the problem was only hidden.
After that setting was removed, the py37 job
actually ran in a Python 3.6 environment
(the community CI default version is 3.6); please see [6]
for detailed reasons.

This patch therefore exposes the hidden py37 timeout problem.
It also identifies the method test_apply_patch_fpga_arq_monitor_job
as the likely cause of the timeout. I found this method by
troubleshooting the tox -epy37 log.
After commenting out this method, tox -epy37 runs
normally and no longer times out.

If you want to test, please ensure that you have a local
python3.7 environment, not 3.6, and execute rm -rf .tox/.
Then execute tox -epy37.

Therefore, the best approach for now is to comment out this method and
restore the py37 job at the same time.

If someone discovers the root cause and a solution, this method
can be restored; please refer to [7].

What went wrong in this method?
Deep in its call chain, this method uses ThreadWork of
the thread pool, which under Python 3.7 blocks
the execution of unit tests. For specific reasons, please see
[8] [9].

Reference:

[1]. https://review.opendev.org/#/c/702578/
[2]. https://review.opendev.org/#/c/703049/
[3]. https://review.opendev.org/#/c/703253/
[4]. https://review.opendev.org/#/c/696397/
[5]. https://review.opendev.org/#/c/706911/
[6]. http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2020-02-12.log.html#t2020-02-12T16:46:18
[7]. deed9c822e
[8]. https://review.opendev.org/#/c/707045/5//COMMIT_MSG
[9]. c61dd8c376/cyborg/objects/extarq/ext_arq_job.py (L41)

Change-Id: I09db889fe665c6246ec9503af92c909e7d0da24f
2020-02-14 19:49:34 +08:00
Zuul
c61dd8c376 Merge "Remove useless attributes list in Deployable" 2020-02-13 07:46:46 +00:00
Zuul
ffb848cee1 Merge "Improve UT for cyborg/db deployable" 2020-02-13 07:29:00 +00:00
Zuul
2ccc478986 Merge "Remove the invalid specs from doc/source" 2020-02-13 07:05:03 +00:00
Zuul
deed9c822e Merge "Fix warning in logs that '' is not a valid UUID." 2020-02-12 14:28:31 +00:00
Zuul
b8c74d68a8 Merge "Send a separate bind event to Nova for each ARQ in an instance." 2020-02-12 14:25:59 +00:00
Zuul
990e030983 Merge "Some bug fixes in async bind path." 2020-02-12 14:12:00 +00:00
Zuul
a6b5e8f0e3 Merge "bugs fix for compatibility issues between Py2 and Py3" 2020-02-12 14:12:00 +00:00
Zuul
1718526769 Merge "Improve UT for cyborg/db attach handle" 2020-02-12 13:15:30 +00:00
Zuul
41ef7a0dd8 Merge "Use ResourceNotFound replace ControlpathIDNotFound" 2020-02-12 12:55:55 +00:00
Zuul
b6c6c15cd7 Merge "Improve UT for cyborg/db ExtArq" 2020-02-12 12:55:51 +00:00
chenke
12c448cba3 Use ResourceNotFound replace ControlpathIDNotFound
This is part of a series of exception optimizations.

In fact, the ResourceNotFound exception is
enough to cover the not-found cases.

More UTs for control path, such as
get, list, create and delete, will be added in the future.

Change-Id: I740eb28184b434583b58f10d2bf3e5e4621c43d4
Story: 2007045
Task: 38318
2020-02-12 09:38:43 +00:00
chenke
cb1b3ee651 Improve UT for cyborg/db ExtArq
This patch adds some UTs for ExtArq:
1. get
2. update
3. create
4. list
5. delete

Change-Id: I8f0d15d8c34f1eb77366d6021e465fcebd1be406
Story: 2007091
Task: 38133
2020-02-12 09:38:31 +00:00
chenke
e014259ac5 Remove useless attributes list in Deployable
Attributes and deployables are stored in separate
tables. We should remove the attributes from the
deployable object.

Change-Id: I1be185a6bce2ae90eca244b21b207a22e5a92044
Story: 2007182
Task: 38303
2020-02-12 09:38:17 +00:00
chenke
e7c6783858 Improve UT for cyborg/db deployable
This patch adds some UTs for deployable:
1. get
2. update
3. create
4. list
5. delete

Change-Id: I39b3f02e898b67e4d4eb686b5a6cf9065c6280de
Story: 2007091
Task: 38141
2020-02-12 09:38:06 +00:00
chenke
078014c053 Improve UT for cyborg/db device
This patch adds some UTs for device:
1. get
2. update
3. create
4. list
5. delete

Change-Id: Id66ba6f1442f87a0f8fb9644e45e147cc77a4f5e
Story: 2007091
Task: 38121
2020-02-12 09:37:53 +00:00
chenke
e2e1e3f156 Improve UT for cyborg/db attach handle
This patch adds some UTs for attach handle:
1. get
2. update
3. create
4. list
5. delete
6. allocate

Change-Id: I5e683c99d1e08ed6a166a110a87b665cdbc5bde3
Story: 2007091
Task: 38161
2020-02-12 09:37:40 +00:00
zhangbailin
aa2aa69e34 Remove the invalid specs from doc/source
The specs directory in Cyborg is not kept up to date, and the
Cyborg specifications are available at https://specs.openstack.org/openstack/cyborg-specs/,
so remove this directory from Cyborg to reduce maintenance costs.

Change-Id: Iebcbf2ebd6da3bc51e85c62f18c547909026c2f0
2020-02-12 15:31:04 +08:00
Sundar Nadathur
c87c232129 Fix warning in logs that '' is not a valid UUID.
Change-Id: I269554030908c1084f61a3d401524139a3735f28
2020-02-11 16:27:39 -08:00
Sundar Nadathur
d3648dccef Send a separate bind event to Nova for each ARQ in an instance.
This is based on a discussion with the Nova community. See:
https://review.opendev.org/#/c/692707/6/nova/objects/external_event.py@36

Each event has a unique tag, i.e. the ARQ UUID, and the
bind status for that ARQ.

Each ARQ has its own state. However, the bind status sent to Nova
should be 'completed' or 'failed'. The logic to do that conversion
should not be in nova_client.py, to keep it free of ARQ state details.
So it has been added in get_arq_bind_status() in ext_arq_job.py.

Change-Id: Iddbf9a77196fc42ac82ad1f6d88a4b0732852463
2020-02-11 16:22:34 -08:00
Sundar Nadathur
107cc7ea81 Some bug fixes in async bind path.
Change-Id: I5d575046e2be38f3bdd5d3f9c9495db121d8a05d
2020-02-10 23:36:11 -08:00
Shaohe Feng
e4dfc6f4bd bugs fix for compatibility issues between Py2 and Py3
Change-Id: I745eb4e28871fa0b554852831e5f4105ea677c27
2020-02-10 23:36:11 -08:00
Shaohe Feng
5b6f26abb8 Guess for the root cause of timeout
Change-Id: I877794c738f3c6ec09e9f83476b1f91096447afa
2020-02-10 23:36:10 -08:00
zhangbailin
acbc64f3be Enhance the db layer to verify filters
Currently, if we pass filters=None, as in a call to
dbapi.device_profile_list_by_filters(self.context, filters=None),
a NoneType error is raised.

Main error info:
Traceback (most recent call last):
  File "/home/my_work/code/cyborg/cyborg/db/sqlalchemy/api.py", line 558, in device_profile_list_by_filters
    filters, exact_match_filter_names)
  File "/home/my_work/code/cyborg/cyborg/db/sqlalchemy/api.py", line 223, in _exact_filter
    if key not in filters:
TypeError: argument of type 'NoneType' is not iterable

This patch will add initial validation of the filters.

Change-Id: Icf711dc3621fb8d2e5b022ab1d1ce02b0885b055
2020-02-09 03:15:44 +00:00
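The validation described in acbc64f3be above can be sketched as follows. This is a simplified stand-in that filters a list of dicts rather than a SQLAlchemy query. The function name exact_filter and its arguments loosely mirror the traceback, but the body is an assumption, not the Cyborg implementation.

```python
def exact_filter(rows, filters, exact_match_filter_names):
    """Keep rows whose values equal the filter for each exact-match name."""
    if filters is None:
        # Validate up front: treat a missing filter dict as "no filters",
        # so "key not in filters" below cannot raise TypeError.
        filters = {}
    for key in exact_match_filter_names:
        if key not in filters:
            continue
        rows = [row for row in rows if row.get(key) == filters[key]]
    return rows


rows = [{"name": "dp1"}, {"name": "dp2"}]
unfiltered = exact_filter(rows, None, ["name"])
filtered = exact_filter(rows, {"name": "dp2"}, ["name"])
```

With the guard in place, filters=None returns every row instead of raising the TypeError shown in the traceback.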
Zuul
70bc4b89a4 Merge "Improve UT for cyborg/db device profile" 2020-02-08 17:41:41 +00:00