Acceleration Management
d0578cff6e
The current design:

- Only asynchronous bind is supported.
- A bind can start only when the ARQ state is in ['Initial','Unbound'].
- A bind that is in progress is not stopped by the delete or unbind API, but the ARQ still ends up in the deleting or unbind state.
- For multiple jobs (for example, a user requesting 2 FPGAs), the ARQ state is "Bound" only if all jobs succeed. If any job fails, the ARQ state is "BindFailed".
- Rollback is not supported for update_provider.
- Multithreading is not supported in the agent.

If the bind succeeds, the info-level log shows details as follows:

1. All ARQs ['1d9f17a6-0b94-4838-ae55-cb49f5727a81'] async bind jobs has finished.
2. Attach handle(d6442f0a-21f6-4161-bb74-45c66ba1f345) for ARQ(1d9f17a6-0b94-4838-ae55-cb49f5727a81) successfully.
3. Update ARQ 1d9f17a6-0b94-4838-ae55-cb49f5727a81 state to "Bound" successfully.

If the bind jobs time out, the log output is as follows:

    ERROR cyborg.accelerator.common.handler [-] Traceback (most recent call last):
      File "./cyborg/common/utils.py", line 283, in future_iterator
        yield (f.result(end_time - time.time()),
      File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 464, in result
        raise TimeoutError()
    TimeoutError

There are still many open issues for async jobs.

Test this patch:

1. Get auth token

    AUTH="X-Auth-Token: $(openstack token issue -c id -f value)"

2. Get resource provider info

    RP_UUID=`openstack resource provider list -f value | \
        grep intel-fpga-dev |head -n 1 |awk '{print $1}'`
    ARRAY=`OS_PLACEMENT_API_VERSION=1.18 openstack resource provider trait \
        list $RP_UUID -c name -f value`
    DEVICE_TRAIT=`head -n 1 <<< $ARRAY`
    FN_TRAIT=`grep FUNCTION_ID <<< $ARRAY`
    FUN_ID=${FN_TRAIT##*_}
    REGION_TRAIT=`grep FPGA_REGION <<< $ARRAY`
    REGION_TYPE_ID=${REGION_TRAIT##*_}

3. Upload an image

    BIT_NAME=fpga_bts_1
    touch samples.tar.gz
    BIT_FILE=samples.tar.gz
    openstack image create --tag FPGA \
        --property REGION_TYPE_ID="accel:region_type_id=${REGION_TYPE_ID}" \
        --property accel:function_id=$FUN_ID --file $BIT_FILE $BIT_NAME
    BITSTREAM_ID=`openstack image list -c ID -c Name -f value | \
        grep $BIT_NAME |awk '{print $1}'`
    openstack image list -c ID -f value --property accel:function_id=$FUN_ID

4. Create a device profile

    CYURL=`openstack endpoint list --service cyborg \
        --interface "public" -c "URL" -f value`
    CTYPE="Content-Type: application/json"
    DEVPROF_NAME=afaas_example_1
    RESOURCES=FPGA
    BODY="[{
        \"name\": \"$DEVPROF_NAME\",
        \"groups\": [
          {\"resources:$RESOURCES\": \"1\",
           \"trait:$DEVICE_TRAIT\": \"required\",
           \"trait:$FN_TRAIT\": \"required\",
           \"accel:bitstream_id\": \"$BITSTREAM_ID\"
          }
        ]
    }]"
    curl -s -H "$CTYPE" -H "$AUTH" -X POST -d "$BODY" $CYURL/device_profiles

5. Create an ARQ

    BODY="{\"device_profile_name\": \"$DEVPROF_NAME\"}"
    ARQ_UUID=`curl -s -H "$CTYPE" -H "$AUTH" $CYURL/accelerator_requests | \
        grep '"uuid": ".*?"' -oP |cut -d '"' -f 4 | tail -n 1`
    ARQ_UUID=`curl -s -H "$CTYPE" -H "$AUTH" -X POST -d "$BODY" \
        $CYURL/accelerator_requests | grep '"uuid": ".*?"' -oP |cut -d '"' -f 4`

6. Bind the ARQ

    NODENAME=`openstack hypervisor list -c "Hypervisor Hostname" -f value`
    PR_UUID=`openstack resource provider list -f value | \
        grep intel-fpga-dev |head -n 1 |awk '{print $1}'`
    INS_UUID=`uuidgen`
    BODY="{\"$ARQ_UUID\": [
        {\"path\": \"/hostname\", \"op\": \"add\", \"value\": \"$NODENAME\"},
        {\"path\": \"/device_rp_uuid\", \"op\": \"add\", \"value\": \"$PR_UUID\"},
        {\"path\": \"/instance_uuid\", \"op\": \"add\", \"value\": \"$INS_UUID\"}
      ]
    }"
    curl -s -H "$CTYPE" -H "$AUTH" -X PATCH -d "$BODY" \
        -w "%{http_code}\n" $CYURL/accelerator_requests

7. Monitor the notification

    Ubuntu:
    $ apt-get install httpry
    CentOS:
    $ yum install httpry

    $ NOVAURL=`openstack endpoint list --service nova \
        --interface "public" -c "URL" -f value`
    $ HOST=${NOVAURL##*//}
    $ HOST=${HOST%%/*}

    All-in-one environment:
    $ sudo httpry -i lo -m POST "tcp dst port 80 and src host $HOST"
    Multi-host environment:
    $ sudo httpry -m POST "tcp dst port 80 and src host $HOST"

You can also check the log. If anything goes wrong, the log output is as follows:

    2019-09-14 14:21:12.353 41349 ERROR cyborg.common.utils [-] Traceback (most recent call last):
      File "./cyborg/common/utils.py", line 387, in _impl
        output = method(self, *args, **kwargs)
      File "./cyborg/accelerator/common/handler.py", line 234, in all_arq_bind_status_sync
      File "./cyborg/accelerator/common/handler.py", line 202, in bind_notify
      File "./cyborg/common/nova_client.py", line 56, in notify_binding
        result = self._send_events(events)
      File "./cyborg/common/nova_client.py", line 50, in _send_events
        (events, response.status_code, response.text))
    Exception: Failed to send events [{'status': 'completed', 'tag': u'afaas_example_1', 'name': 'accelerator-requests-bound', 'server_uuid': u'0be7e70f-710d-4ecf-b480-911c53b565cf'}]: HTTP 404: {"itemNotFound": {"message": "No instances found for any event", "code": 404}}

Change-Id: I4e7d14e271aa26f19da3605b20cb4fd72eef4312
||
---|---|---|
api-ref/source | ||
cyborg | ||
devstack | ||
doc | ||
etc/cyborg | ||
releasenotes | ||
sandbox | ||
setup | ||
tools/config | ||
.gitignore | ||
.gitreview | ||
.stestr.conf | ||
.zuul.yaml | ||
babel.cfg | ||
bindep.txt | ||
CONTRIBUTING.rst | ||
HACKING.rst | ||
LICENSE | ||
README.rst | ||
requirements.txt | ||
setup.cfg | ||
setup.py | ||
test-requirements.txt | ||
tox.ini |
Cyborg
OpenStack Acceleration as a Service
Cyborg provides a general management framework for accelerators such as FPGA, GPU, SoCs, NVMe SSDs, CCIX caches, DPDK/SPDK, pmem and so forth.
- Free software: Apache license
- Wiki: https://wiki.openstack.org/wiki/Cyborg
- Source: https://opendev.org/openstack/cyborg
- Blueprints and Bugs: https://storyboard.openstack.org/#!/project/openstack/cyborg
- Documentation: https://docs.openstack.org/cyborg/latest/
- Release notes: https://docs.openstack.org/releasenotes/cyborg/
- Design specifications: https://specs.openstack.org/openstack/cyborg-specs/
Features
- REST API for basic accelerator life cycle management
- Generic driver for common accelerator support
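As a rough illustration of the life cycle API, the sketch below assembles the body that the latest commit's test steps send in a PATCH to `accelerator_requests` to bind an ARQ, and applies the rule from that commit message that an ARQ is "Bound" only when every bind job succeeds. The helper names `build_bind_patch` and `aggregate_arq_state` are hypothetical and are not part of Cyborg; only the paths, the `add` operation and the state names come from the commit message.

```python
import json


def build_bind_patch(arq_uuid, hostname, device_rp_uuid, instance_uuid):
    """Build the bind request body for one ARQ (hypothetical helper).

    Mirrors the shell BODY from the commit's test steps: a mapping from
    the ARQ UUID to a list of "add" operations.
    """
    fields = [
        ("/hostname", hostname),
        ("/device_rp_uuid", device_rp_uuid),
        ("/instance_uuid", instance_uuid),
    ]
    return {
        arq_uuid: [
            {"path": path, "op": "add", "value": value}
            for path, value in fields
        ]
    }


def aggregate_arq_state(job_results):
    """Return "Bound" only if all bind jobs succeeded (hypothetical helper).

    job_results is a list of booleans, one per bind job. Treating an
    empty list as a failure is an assumption made for this sketch.
    """
    if job_results and all(job_results):
        return "Bound"
    return "BindFailed"


if __name__ == "__main__":
    # Placeholder values; real ones come from the OpenStack CLI.
    body = build_bind_patch("arq-uuid", "node-1", "rp-uuid", "instance-uuid")
    print(json.dumps(body, indent=2))
    print(aggregate_arq_state([True, True]))
```

The resulting dictionary can be serialized with `json.dumps` and sent with any HTTP client, using the same `Content-Type` and `X-Auth-Token` headers as the curl commands.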