Acceleration Management
Shaohe Feng d0578cff6e async job for bind
The current design:
  Only bind is asynchronous.
  A bind can start only when the ARQ state is in ['Initial', 'Unbound'].
  A bind in progress is not stopped by the delete or unbind API,
  but the final state will be that of the delete or unbind request.
  For multiple jobs (for example, a user requests 2 FPGAs), the ARQ state
  is "Bound" only if all jobs succeed. If any job fails, the ARQ state is
  "BindFailed".
  Rollback is not supported for update_provider.
  Multithreading in the agent is not supported.
  If the bind succeeds, the info-level log shows details as follows:
    1. All ARQs ['1d9f17a6-0b94-4838-ae55-cb49f5727a81'] async bind jobs
    has finished.
    2. Attach handle(d6442f0a-21f6-4161-bb74-45c66ba1f345) for
    ARQ(1d9f17a6-0b94-4838-ae55-cb49f5727a81) successfully.
    3. Update ARQ 1d9f17a6-0b94-4838-ae55-cb49f5727a81 state to "Bound"
    successfully.
  If the bind jobs time out, the log shows the following:
    ERROR cyborg.accelerator.common.handler [-] Traceback (most recent call last):
      File "./cyborg/common/utils.py", line 283, in future_iterator
        yield (f.result(end_time - time.time()),
      File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 464, in result
        raise TimeoutError()
    TimeoutError
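
The state rules and the shared timeout described above can be sketched as
follows. This is a minimal illustration, not Cyborg's actual code:
can_start_bind, wait_all, final_state and the two sample jobs are
hypothetical names. Only the state names and the
f.result(end_time - time.time()) pattern come from the notes and the
traceback above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# State names taken from the design notes above.
BINDABLE_STATES = ("Initial", "Unbound")


def can_start_bind(state):
    """A bind may start only from 'Initial' or 'Unbound'."""
    return state in BINDABLE_STATES


def wait_all(futures, timeout):
    """Wait for every future against one shared deadline.

    Returns True only if all jobs finished in time without raising.
    """
    end_time = time.time() + timeout
    ok = True
    for f in futures:
        try:
            # The remaining budget shrinks as earlier jobs use it up.
            f.result(max(0.0, end_time - time.time()))
        except Exception:
            # Covers job errors as well as the futures TimeoutError.
            ok = False
    return ok


def final_state(all_succeeded):
    """All jobs succeeded -> 'Bound'; any failure -> 'BindFailed'."""
    return "Bound" if all_succeeded else "BindFailed"


def program_ok():
    # Hypothetical job that finishes immediately.
    return "programmed"


def program_slow():
    # Hypothetical job that outlives the timeout used below.
    time.sleep(0.5)
    return "too late"


with ThreadPoolExecutor(max_workers=2) as pool:
    good = [pool.submit(program_ok), pool.submit(program_ok)]
    state_good = final_state(wait_all(good, timeout=5))
    mixed = [pool.submit(program_ok), pool.submit(program_slow)]
    state_mixed = final_state(wait_all(mixed, timeout=0.1))

print(state_good, state_mixed)
```

A single deadline for the whole batch means that one slow job is enough to
mark the ARQ as "BindFailed", which matches the all-or-nothing rule above.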

There are still many open issues for the async job.

Test this patch:
1. Get auth token
AUTH="X-Auth-Token: $(openstack token issue -c id -f value)"

2. Get resource provider info
RP_UUID=`openstack resource provider list -f value | \
    grep intel-fpga-dev |head -n 1 |awk '{print $1}'`
ARRAY=`OS_PLACEMENT_API_VERSION=1.18 openstack resource provider trait \
    list $RP_UUID -c name -f value`
DEVICE_TRAIT=`head -n 1 <<< $ARRAY`
FN_TRAIT=`grep FUNCTION_ID <<< $ARRAY`
FUN_ID=${FN_TRAIT##*_}
REGION_TRAIT=`grep FPGA_REGION <<< $ARRAY`
REGION_TYPE_ID=${REGION_TRAIT##*_}
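
For readers less familiar with the ${VAR##*_} expansion used above, the
same trait handling can be sketched in Python. The trait names below are
made up for illustration and are not real placement traits. pick_trait and
trait_suffix are hypothetical helpers; unlike grep, pick_trait returns only
the first match.

```python
def trait_suffix(trait):
    """Return the text after the last underscore, like ${VAR##*_}."""
    return trait.rsplit("_", 1)[-1]


def pick_trait(traits, marker):
    """Return the first trait containing marker, or None if absent."""
    return next((t for t in traits if marker in t), None)


# Hypothetical trait names, for illustration only.
traits = [
    "CUSTOM_FPGA_DEMO",
    "CUSTOM_FPGA_REGION_DEMO_1A2B3C",
    "CUSTOM_FPGA_FUNCTION_ID_DEMO_4D5E6F",
]

device_trait = traits[0]                        # head -n 1
fn_trait = pick_trait(traits, "FUNCTION_ID")    # grep FUNCTION_ID
fun_id = trait_suffix(fn_trait)                 # ${FN_TRAIT##*_}
region_trait = pick_trait(traits, "FPGA_REGION")
region_type_id = trait_suffix(region_trait)

print(fun_id, region_type_id)
```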

3. Upload an image
BIT_NAME=fpga_bts_1
touch samples.tar.gz
BIT_FILE=samples.tar.gz
openstack image create --tag FPGA \
    --property accel:region_type_id=$REGION_TYPE_ID \
    --property accel:function_id=$FUN_ID --file $BIT_FILE $BIT_NAME
BITSTREAM_ID=`openstack image list -c ID -c Name -f value | \
    grep $BIT_NAME |awk '{print $1}'`
openstack image list -c ID -f value --property accel:function_id=$FUN_ID

4. Create a device profile
CYURL=`openstack endpoint list --service cyborg \
    --interface "public" -c "URL" -f value`
CTYPE="Content-Type: application/json"
DEVPROF_NAME=afaas_example_1
RESOURCES=FPGA
BODY="[{ \"name\": \"$DEVPROF_NAME\",
  \"groups\": [
      {\"resources:$RESOURCES\": \"1\",
       \"trait:$DEVICE_TRAIT\": \"required\",
       \"trait:$FN_TRAIT\": \"required\",
       \"accel:bitstream_id\": \"$BITSTREAM_ID\"
      }
  ]
}]"
curl -s -H "$CTYPE" -H "$AUTH" -X POST -d "$BODY" $CYURL/device_profiles
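
Hand-escaped JSON in a shell string is easy to get wrong. As an
alternative, the same device profile body can be sketched with Python's
json module. device_profile_body is a hypothetical helper, and the trait
names and bitstream ID passed to it are placeholders; the key names are
the ones used in the BODY above.

```python
import json


def device_profile_body(name, resource, device_trait, fn_trait,
                        bitstream_id):
    """Build the JSON body for the device_profiles POST request."""
    group = {
        "resources:%s" % resource: "1",
        "trait:%s" % device_trait: "required",
        "trait:%s" % fn_trait: "required",
        "accel:bitstream_id": bitstream_id,
    }
    # The API call above sends a list with one profile in it.
    return json.dumps([{"name": name, "groups": [group]}])


# Placeholder values, for illustration only.
body = device_profile_body(
    name="afaas_example_1",
    resource="FPGA",
    device_trait="CUSTOM_FPGA_DEMO",
    fn_trait="CUSTOM_FPGA_FUNCTION_ID_DEMO_4D5E6F",
    bitstream_id="00000000-0000-0000-0000-000000000000",
)
print(body)
```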

5. Create an ARQ
BODY="{\"device_profile_name\": \"$DEVPROF_NAME\"}"
ARQ_UUID=`curl -s -H "$CTYPE" -H "$AUTH" $CYURL/accelerator_requests | \
    grep '"uuid": ".*?"' -oP |cut -d '"' -f 4 | tail -n 1`
ARQ_UUID=`curl -s -H "$CTYPE" -H "$AUTH" -X POST -d "$BODY" \
    $CYURL/accelerator_requests | grep '"uuid": ".*?"' -oP |cut -d '"' -f 4`
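
The grep and cut pipeline above depends on the exact spacing of the
response text. A more tolerant way to pull out the UUIDs is to parse the
JSON, sketched below. The response shape in the example is an assumption
for illustration only, and find_uuids is a hypothetical helper that
collects every "uuid" string it finds, whatever the nesting.

```python
import json


def find_uuids(node):
    """Collect all string values stored under a "uuid" key, in order."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "uuid" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_uuids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_uuids(item))
    return found


# Assumed response shape, for illustration only.
response_text = json.dumps({
    "arqs": [
        {"uuid": "1d9f17a6-0b94-4838-ae55-cb49f5727a81", "state": "Initial"},
        {"uuid": "d6442f0a-21f6-4161-bb74-45c66ba1f345", "state": "Initial"},
    ]
})

uuids = find_uuids(json.loads(response_text))
last_uuid = uuids[-1]   # same choice as `tail -n 1`
print(last_uuid)
```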

6. Bind the ARQ
NODENAME=`openstack hypervisor list -c "Hypervisor Hostname" -f value`
PR_UUID=`openstack resource provider list -f value | \
    grep intel-fpga-dev |head -n 1 |awk '{print $1}'`
INS_UUID=`uuidgen`
BODY="{\"$ARQ_UUID\": [
     {\"path\": \"/hostname\", \"op\": \"add\", \"value\": \"$NODENAME\"},
     {\"path\": \"/device_rp_uuid\", \"op\": \"add\",
            \"value\": \"$PR_UUID\"},
     {\"path\": \"/instance_uuid\", \"op\": \"add\", \"value\": \"$INS_UUID\"}
  ]
}"
curl -s -H "$CTYPE" -H "$AUTH" -X PATCH -d "$BODY" \
    -w "%{http_code}\n" $CYURL/accelerator_requests
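
The bind request body is a map from ARQ UUID to a list of patch
operations. A sketch of building it in Python follows. bind_body is a
hypothetical helper, and the host name and UUIDs are placeholders; the
paths and the "add" operation are the ones used in the BODY above.

```python
import json
import uuid


def bind_body(arq_uuid, hostname, device_rp_uuid, instance_uuid):
    """Build the JSON body for the accelerator_requests PATCH request."""
    def add(path, value):
        return {"path": path, "op": "add", "value": value}

    patch = [
        add("/hostname", hostname),
        add("/device_rp_uuid", device_rp_uuid),
        add("/instance_uuid", instance_uuid),
    ]
    return json.dumps({arq_uuid: patch})


# Placeholder values, for illustration only.
arq_uuid = "1d9f17a6-0b94-4838-ae55-cb49f5727a81"
instance_uuid = str(uuid.uuid4())   # same role as `uuidgen`
body = bind_body(arq_uuid, "compute-demo",
                 "00000000-0000-0000-0000-000000000000", instance_uuid)
print(body)
```

As in the shell steps, a freshly generated instance UUID does not refer to
a real instance.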

7. Monitor the notification
Ubuntu:
  $ apt-get install httpry
CentOS:
  $ yum install httpry
$ NOVAURL=`openstack endpoint list --service nova \
    --interface "public" -c "URL" -f value`
$ HOST=${NOVAURL##*//}
$ HOST=${HOST%%/*}
All-in-one env:
  $ sudo httpry -i lo -m POST  "tcp dst port 80 and src host $HOST"
Multi-host env:
  $ sudo httpry -m POST  "tcp dst port 80 and src host $HOST"
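
The two parameter expansions that derive HOST from NOVAURL strip the
scheme and then the path. The same step can be sketched with Python's
urllib.parse. host_of is a hypothetical helper and the URL is a
placeholder that uses a documentation address.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the host[:port] part of a URL, dropping scheme and path."""
    return urlsplit(url).netloc


# Placeholder endpoint URL, for illustration only.
nova_url = "http://192.0.2.10/compute/v2.1"
host = host_of(nova_url)
print(host)
```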

You can also check the log.
If anything goes wrong, the log shows output like the following:
2019-09-14 14:21:12.353 41349 ERROR cyborg.common.utils [-] Traceback (most recent call last):
  File "./cyborg/common/utils.py", line 387, in _impl
    output = method(self, *args, **kwargs)
  File "./cyborg/accelerator/common/handler.py", line 234, in all_arq_bind_status_sync
  File "./cyborg/accelerator/common/handler.py", line 202, in bind_notify
  File "./cyborg/common/nova_client.py", line 56, in notify_binding
    result = self._send_events(events)
  File "./cyborg/common/nova_client.py", line 50, in _send_events
    (events, response.status_code, response.text))
Exception: Failed to send events [{'status': 'completed', 'tag': u'afaas_example_1', 'name': 'accelerator-requests-bound', 'server_uuid': u'0be7e70f-710d-4ecf-b480-911c53b565cf'}]: HTTP 404: {"itemNotFound": {"message": "No instances found for any event", "code": 404}}

Change-Id: I4e7d14e271aa26f19da3605b20cb4fd72eef4312
2019-09-27 15:06:47 +00:00
api-ref/source Add deployables to api-ref 2019-04-02 01:45:04 -07:00
cyborg async job for bind 2019-09-27 15:06:47 +00:00
devstack Merge "Fix v1 API." 2019-09-26 13:51:32 +00:00
doc Fix docs gate issue 2019-08-22 19:38:10 +09:00
etc/cyborg remove rootwrap in cyborg 2019-09-24 00:30:25 -07:00
releasenotes Implement privsep boilerplate in cyborg. 2019-09-25 19:22:12 -07:00
sandbox Setup sandbox and specs folder 2017-03-14 01:16:33 +08:00
setup Cyborg deployment script 2017-08-30 09:27:56 -04:00
tools/config Fix tox -egenconfig 2018-06-10 16:37:40 +08:00
.gitignore Notify Nova when all ARQs are resolved for an instance. 2019-09-11 07:51:31 -07:00
.gitreview OpenDev Migration Patch 2019-04-19 19:39:22 +00:00
.stestr.conf Switch to stestr 2018-07-24 15:10:52 +07:00
.zuul.yaml [train][goal] Run 'cyborg-tempest-ipv6-only' job in gate 2019-09-23 11:34:38 -07:00
babel.cfg initial setup "correct tox.ini testr and test-requirement.txt requirement 2016-01-18 14:29:39 +08:00
bindep.txt Add bindep support 2019-08-05 14:31:35 +08:00
CONTRIBUTING.rst Add gpu driver 2019-03-21 18:01:29 +08:00
HACKING.rst Add doc8 to pep8 check for cyborg project 2018-05-10 09:32:01 +07:00
LICENSE initial setup "correct tox.ini testr and test-requirement.txt requirement 2016-01-18 14:29:39 +08:00
README.rst Update README.rst for cyborg 2019-09-03 14:07:28 +08:00
requirements.txt Implement privsep boilerplate in cyborg. 2019-09-25 19:22:12 -07:00
setup.cfg remove rootwrap in cyborg 2019-09-24 00:30:25 -07:00
setup.py uncap eventlet 2018-05-07 17:42:19 +03:00
test-requirements.txt Placement report 2019-09-04 08:43:28 +00:00
tox.ini P8: Fix pep8 error in cyborg/tests and add post_mortem_debug.py 2019-09-18 16:24:05 +08:00

Cyborg

OpenStack Acceleration as a Service

Cyborg provides a general management framework for accelerators such as FPGA, GPU, SoCs, NVMe SSDs, CCIX caches, DPDK/SPDK, pmem and so forth.

Features

  • REST API for basic accelerator life cycle management
  • Generic driver for common accelerator support