diff --git a/specs/mitaka/approved/virt-device-role-tagging.rst b/specs/mitaka/approved/virt-device-role-tagging.rst new file mode 100644 index 000000000..b54c39079 --- /dev/null +++ b/specs/mitaka/approved/virt-device-role-tagging.rst @@ -0,0 +1,684 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +================================= +Virtual guest device role tagging +================================= + +https://blueprints.launchpad.net/nova/+spec/virt-device-role-tagging + +This will provide a mechanism for the user to tag a device they +have assigned to their guest with a specific role. The tag will +be matched to the hardware address of the device and this mapping +exposed to the guest OS via metadata service/cloud-init. + +Problem description +=================== + +It is common to create virtual instances which have multiple +network devices or disk drives. The tenant user creating the +instance will often have a specific role in mind for each of +the devices. For example, a particular disk may be intended +for use as Oracle database storage, or as a Squid webcache +storage, etc. Similarly there may be specific network interfaces +intended for use by a network service application running in +the guest. + +The tenant user who is creating the instance does not have an +explicit way to communicate the intended usage of each device +to the application running inside the guest OS. + +It may appear possible to identify a device via some aspect +that the tenant user knows, and then use the cloud-init / +metadata service to provide a mapping to the guest. For +example, a MAC address could potentially be used to identify +NICs, or a disk device name string could be used to identify +disks. 
The user would then set a metadata tag, e.g.
+
+::
+
+    # nova boot \
+        --image mywebappimage \
+        --flavor m1.large \
+        --meta oracledata=vda \
+        --meta apachefrontend=02:10:22:32:33:22 \
+        mywebapp
+
+The problem is that because Nova tries to hide as much detail
+of the guest hardware setup as possible, it is not easy for
+the tenant user to know what the unique identifiers for each
+device are. For example, while with emulated NICs it is
+possible to know the MAC address before booting the instance,
+when using PCI assigned devices this is not available.
+
+Another approach might appear to be to identify devices based
+on the order in which they appear to guests, e.g. the application
+in the guest could be set to use the 3rd PCI NIC, or the 2nd
+disk on the SCSI bus. The problem with this is that neither
+Nova nor the underlying hypervisor is able to provide a strong
+guarantee around the device ordering in the guest. By good
+fortune, the order in which disks are listed on the nova boot
+command line often matches the order in which device letters
+are assigned by Linux, but nothing guarantees this to be the
+case long term.
+
+Use Cases
+----------
+
+The tenant user needs to provide information to the guest
+instance to identify which device to use for a desired
+guest application role.
+
+For example, the tenant user wishes to instruct the Oracle
+database to use a particular SCSI disk for its data storage,
+because they have configured that disk to use a particular
+cinder volume that is built for high throughput. Or they
+may wish to instruct an NFV application that it should
+process data from a particular network interface, because
+that interface is connected to an interface in a second
+guest which is sending the required network traffic.
+
+The tenant needs to be able to provide this identification
+information to the guest OS, without knowing how the
+particular hypervisor will configure the virtual hardware.
+
+
+Proposed change
+===============
+
+The proposal is to extend the REST API so that when adding
+disks or network interfaces to a guest instance, it is
+possible to pass an opaque string "tag".
+
+When booting a guest, Nova will determine what PCI, USB, or
+SCSI address corresponds to the device the user asked for,
+and create a metadata file that maps the user-provided tag
+to the hypervisor-assigned device address.
+
+This metadata file will be provided via either cloud-init
+or the metadata service.
+
+When the guest OS image boots up, it will read this metadata
+file to determine which devices need to be used for particular
+application services running in the instance. How the guest
+OS does this is outside the scope of this spec. Nova is merely
+defining a file format and a set of information it will contain,
+which the guest OS and/or applications can consume in a manner
+which they prefer. There are no current standards in this area,
+so it is a greenfield design for the file format.
+
+For example, consider that the user ran
+
+::
+
+    nova boot \
+        --image mywebappimage \
+        --flavor m1.large \
+        --nic net-id=12345,tag=nfvfunc1 \
+        --nic net-id=56789,tag=nfvfunc2 \
+        --block-device volume_id=12345,bus=scsi,tag=oracledb \
+        --block-device volume_id=56789,bus=virtio,tag=squidcache \
+        mynfvapp
+
+Nova could then auto-generate a metadata file that contained
+the following, based on information reported by the Nova libvirt
+driver for the guest instance:
+
+::
+
+    {
+      "devices": [
+        {
+          "type": "nic",
+          "bus": "pci",
+          "address": "0000:00:02.0",
+          "mac": "01:22:22:42:22:21",
+          "tags": ["nfvfunc1"]
+        },
+        {
+          "type": "nic",
+          "bus": "pci",
+          "address": "0000:00:03.0",
+          "mac": "01:22:22:42:22:22",
+          "tags": ["nfvfunc2"]
+        },
+        {
+          "type": "disk",
+          "bus": "scsi",
+          "address": "1:0:2:0",
+          "serial": "disk-vol-2352423",
+          "tags": ["oracledb"]
+        },
+        {
+          "type": "disk",
+          "bus": "pci",
+          "address": "0000:00:07.0",
+          "serial": "disk-vol-24235252",
+          "tags": ["squidcache"]
+        }
+      ]
+    }
+
+In this example, we have provided a few bits of information
+about the devices:
+
+* The type of device the info is provided for. Currently
+  this is 'nic' or 'disk'. Other types will be provided
+  in the future.
+* The bus the device is attached to. This can be
+  "pci", "scsi", "usb", "ide" and similar things. This is basically
+  saying how to interpret the device address. The bus may
+  be "none" in the case of containers, or where the device
+  is integrated into the platform board.
+* The device address. The format of the address varies
+  based on the bus, but would be the PCI address, or SCSI
+  address, or USB port, or IDE channel, etc.
+* The network device MAC address, if type==nic.
+* The disk drive serial string (if set & type==disk).
+
+* The network device name, if type==nic and the hypervisor
+  supports explicit device names (i.e. containers).
+* The disk device name, if type==disk and the hypervisor
+  supports explicit device names (i.e. containers).
+* It is possible for the same tag to appear multiple
+  times against different device types.
+* If the hypervisor provides two devices which map
+  to the same backend, it is possible for the same
+  tag to appear in both. This is the case with Xen
+  HVM guests where a single block device is exposed
+  via both Xen paravirt disk and IDE emulated disk.
+  The guest chooses which to use.
+* Although the syntax supports setting of multiple
+  tags per device, initially the implementation will only
+  allow a single tag. The syntax just allows for
+  future extension should there be a need.
+
+Note that not all architectures support PCI buses; for
+example, armv7 and s390 don't, so if a guest OS wishes to be
+portable it must not assume it will get devices of a particular
+type. As such, for device addressing only the "bus" attribute
+would be considered mandatory; the "address" attribute may be
+omitted if that data is not available. Network devices would
+always have a "mac" attribute present. Disk devices would
+have a "serial" attribute present if the disk had an associated
+unique serial set. The virt drivers in Nova would endeavour to
+make available as much information as possible.
+
+The data reported to the guest OS will be considered a stable
+API that must be maintained across future Nova releases in a
+backwards compatible manner. As such, the data will be made to
+conform to a formal JSON schema, which will be append-only
+to ensure future compatibility.
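To illustrate how a guest-side consumer might use the document described above, here is a minimal sketch in Python. The `find_devices_by_tag` helper is hypothetical (it is not part of Nova or cloud-init), and the embedded document simply reuses entries from the example earlier in this spec; a real guest would read the file from wherever its metadata transport places it.

```python
import json

def find_devices_by_tag(metadata, tag):
    """Return all device entries carrying the given role tag.

    Hypothetical helper; the "devices"/"tags" layout follows the
    example metadata document shown in this spec.
    """
    return [dev for dev in metadata.get("devices", [])
            if tag in dev.get("tags", [])]

# Sample document reusing entries from the example above.
metadata = json.loads("""
{
    "devices": [
        {"type": "disk", "bus": "scsi", "address": "1:0:2:0",
         "serial": "disk-vol-2352423", "tags": ["oracledb"]},
        {"type": "nic", "bus": "pci", "address": "0000:00:02.0",
         "mac": "01:22:22:42:22:21", "tags": ["nfvfunc1"]}
    ]
}
""")

for dev in find_devices_by_tag(metadata, "oracledb"):
    # Only "bus" is guaranteed to be present; "address" may be
    # omitted on platforms that cannot report it.
    print(dev["bus"], dev.get("address"))
```

Note the lookup is driven purely by the tag, so the guest application never needs to know in advance which bus or address the hypervisor chose.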
+
+::
+
+    {
+      "$schema": "http://json-schema.org/schema#",
+      "id": "http://openstack.org/schemas/nova/metadata/device-role-tagging/1.0",
+      "definitions": {
+        "nonedevicebus": {
+          "type": "object",
+          "properties": {
+            "bus": {
+              "type": "string",
+              "pattern": "none"
+            }
+          },
+          "required": [ "bus" ]
+        },
+        "pcidevicebus": {
+          "type": "object",
+          "properties": {
+            "bus": {
+              "type": "string",
+              "pattern": "pci"
+            },
+            "address": {
+              "type": "string",
+              "pattern": "[a-f0-9]{4}:[a-f0-9]{2}:[a-f0-9]{2}\\.[a-f0-9]"
+            }
+          },
+          "required": [ "bus" ]
+        },
+        "usbdevicebus": {
+          "type": "object",
+          "properties": {
+            "bus": {
+              "type": "string",
+              "pattern": "usb"
+            },
+            "address": {
+              "type": "string",
+              "pattern": "[a-f0-9]+:[a-f0-9]+"
+            }
+          },
+          "required": [ "bus" ]
+        },
+        "scsidevicebus": {
+          "type": "object",
+          "properties": {
+            "bus": {
+              "type": "string",
+              "pattern": "scsi"
+            },
+            "address": {
+              "type": "string",
+              "pattern": "[a-f0-9]+:[a-f0-9]+:[a-f0-9]+:[a-f0-9]+"
+            }
+          },
+          "required": [ "bus" ]
+        },
+        "idedevicebus": {
+          "type": "object",
+          "properties": {
+            "bus": {
+              "type": "string",
+              "pattern": "ide"
+            },
+            "address": {
+              "type": "string",
+              "pattern": "[0-1]:[0-1]"
+            }
+          },
+          "required": [ "bus" ]
+        },
+        "anydevicebus": {
+          "type": "object",
+          "oneOf": [
+            { "$ref": "#/definitions/pcidevicebus" },
+            { "$ref": "#/definitions/usbdevicebus" },
+            { "$ref": "#/definitions/idedevicebus" },
+            { "$ref": "#/definitions/scsidevicebus" },
+            { "$ref": "#/definitions/nonedevicebus" }
+          ]
+        },
+        "nicdevice": {
+          "type": "object",
+          "properties": {
+            "mac": {
+              "type": "string"
+            },
+            "devname": {
+              "type": "string"
+            }
+          },
+          "required": ["mac"],
+          "additionalProperties": {
+            "allOf": [
+              { "$ref": "#/definitions/anydevicebus" }
+            ]
+          }
+        },
+        "diskdevice": {
+          "type": "object",
+          "properties": {
+            "serial": {
+              "type": "string"
+            },
+            "path": {
+              "type": "string"
+            }
+          },
+          "additionalProperties": {
+            "allOf": [
+              { "$ref": "#/definitions/anydevicebus" }
+            ]
+          }
+        }
+      },
+
+
+      "type": "object",
+
+      "properties": {
+        "devices": {
+          "type": "array",
+          "items": {
+            "oneOf": [
+              { "$ref": "#/definitions/nicdevice" },
+              { "$ref": "#/definitions/diskdevice" }
+            ]
+          }
+        }
+      }
+    }
+
+The implementation will consist of several parts. There will
+be a set of python classes defined in nova/virt/metadata.py
+that are capable of representing the data described by the
+JSON schema above, and generating a compliant JSON document.
+
+The virt drivers will be extended to populate instances of
+these classes with the data associated with each instance.
+The initial implementation will be done for the Libvirt
+driver; however, other virt driver maintainers are
+encouraged to provide the same functionality.
+
+The metadata API will be extended to be capable of reporting
+this data associated with a guest instance. This has a chicken
+and egg problem for network configuration: guests relying on
+the metadata service will need to do a minimal network
+configuration to reach the metadata service and obtain the
+info from Nova. They can then re-configure networking based
+on the device tag information.
+
+The config drive generator will be extended to be capable
+of including this JSON data associated with a guest instance.
+This is the preferred method where guests need to rely on
+tags to configure networking, as it has no chicken and egg
+problem.
+
+In the future QEMU will be able to export metadata directly via
+the firmware, so it will be available from the very
+earliest stages of boot. It is expected this will be used as
+an additional optional transport in the future.
+
+Outside the scope of the Nova work, a simple tool will be
+created that can parse this metadata file and set tags against
+devices in the udev database. It is anticipated that cloud-init
+would trigger this tool. Thus (Linux) applications / OS images
+would not need to directly understand this Nova JSON format.
+Instead they could just query udev to ask for details of the
+device with a particular tag. This avoids the applications
+needing to deal with the countless different device bus types
+or addressing formats.
+
+Example for Xen HVM with dual-disk devices
+
+::
+
+    {
+      "devices": [
+        {
+          "type": "nic",
+          "bus": "xen",
+          "address": "0",
+          "mac": "01:22:22:42:22:21",
+          "tags": ["nfvfunc1"]
+        },
+        {
+          "type": "nic",
+          "bus": "xen",
+          "address": "1",
+          "mac": "01:22:22:42:22:22",
+          "tags": ["nfvfunc2"]
+        },
+        {
+          "type": "disk",
+          "bus": "ide",
+          "address": "0:0",
+          "serial": "disk-vol-123456",
+          "tags": ["oracledb"]
+        },
+        {
+          "type": "disk",
+          "bus": "xen",
+          "address": "0",
+          "path": "/dev/xvda",
+          "serial": "disk-vol-123456",
+          "tags": ["oracledb"]
+        },
+        {
+          "type": "disk",
+          "bus": "ide",
+          "address": "0:1",
+          "serial": "disk-vol-789321",
+          "tags": ["squidcache"]
+        },
+        {
+          "type": "disk",
+          "bus": "xen",
+          "address": "1",
+          "path": "/dev/xvdb",
+          "serial": "disk-vol-789321",
+          "tags": ["squidcache"]
+        }
+      ]
+    }
+
+Some things to note about this Xen example:
+
+* There are two logical disks here, which Xen has exposed
+  as *both* IDE and Xen paravirt.
+* For the Xen paravirt disks, Xen can also provide a fixed
+  guest path.
+* The address for devices on the Xen bus is just an integer
+  which maps into the XenBus namespace.
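Since the spec leaves the choice between dual exposures to the guest, a guest-side consumer needs some policy for picking one. The sketch below shows one possible approach in Python; the preference order and the `pick_device` helper are illustrative assumptions, not anything Nova or the spec mandates.

```python
# Illustrative bus preference: prefer paravirt over emulated IDE.
# This ordering is an assumption for the sketch, not part of the spec.
BUS_PREFERENCE = ["xen", "scsi", "pci", "ide", "none"]

def pick_device(devices, tag):
    """Pick one device entry for a tag, preferring earlier buses.

    Hypothetical guest-side helper for the dual-exposure case, where
    the same tag legitimately appears on several device entries.
    """
    candidates = [d for d in devices if tag in d.get("tags", [])]
    if not candidates:
        return None
    return min(candidates,
               key=lambda d: BUS_PREFERENCE.index(d["bus"])
               if d["bus"] in BUS_PREFERENCE else len(BUS_PREFERENCE))

# The two exposures of the "oracledb" disk from the Xen example.
devices = [
    {"type": "disk", "bus": "ide", "address": "0:0",
     "serial": "disk-vol-123456", "tags": ["oracledb"]},
    {"type": "disk", "bus": "xen", "address": "0", "path": "/dev/xvda",
     "serial": "disk-vol-123456", "tags": ["oracledb"]},
]

print(pick_device(devices, "oracledb")["path"])  # prints /dev/xvda
```

A guest that lacks paravirt drivers could simply reorder `BUS_PREFERENCE` to fall back to the emulated IDE exposure instead.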
+
+
+Example for LXC container
+
+::
+
+    {
+      "devices": [
+        {
+          "type": "nic",
+          "bus": "none",
+          "mac": "01:22:22:42:22:21",
+          "devname": "eth0",
+          "tags": ["nfvfunc1"]
+        },
+        {
+          "type": "nic",
+          "bus": "none",
+          "mac": "01:22:22:42:22:22",
+          "devname": "eth1",
+          "tags": ["nfvfunc2"]
+        },
+        {
+          "type": "disk",
+          "bus": "none",
+          "serial": "disk-vol-2352423",
+          "path": "/dev/sda",
+          "tags": ["oracledb"]
+        },
+        {
+          "type": "disk",
+          "bus": "none",
+          "serial": "disk-vol-24235252",
+          "path": "/dev/sdb",
+          "tags": ["squidcache"]
+        }
+      ]
+    }
+
+Some things to note about this LXC example:
+
+* Containers do not export device buses to guests, as
+  they don't emulate hardware. Thus the 'bus' is 'none'
+  and there is no corresponding 'address'.
+* Containers are able to provide fixed disk paths and
+  NIC device names.
+
+Alternatives
+------------
+
+Many users facing this problem have requested that Nova allow
+them to specify a fixed PCI address when creating disks and/or
+network interfaces. In a traditional data center virtualization
+world this would be an acceptable request, but a goal of the
+cloud is to isolate tenant users from the specifics of guest
+hardware configuration. Such configuration requires intimate
+knowledge of the underlying hypervisor, which is simply not
+available to tenant users, nor should they be expected to
+learn it. In view of this, it is considered inappropriate
+to allow tenant users to control the guest device addressing
+via the REST API.
+
+As noted in the problem description, another approach is for
+the tenant user to manually set tags via the existing mechanism
+for providing user metadata to guests. This however relies on
+the user knowing some unique identifying attribute for the
+device upfront. In some cases this is possible, but there are
+a number of cases where no such information is available.
+
+Data model impact
+-----------------
+
+The BlockDeviceMapping object (and associated table) will
+gain a freeform string attribute, named "tag".
+
+The NetworkRequest object (and associated table) will
+gain a freeform string attribute, named "tag".
+
+In future, other device types, such as PCI devices or serial
+ports, may also gain similar "tag" attributes. For the initial
+implementation only the disk and network objects are to be
+dealt with.
+
+REST API impact
+---------------
+
+The block device mapping data format will gain a new
+freeform string parameter, named "tag", which can be
+set against each disk device. This would affect the
+APIs for booting instances and hot-adding disks. In
+terms of the Nova client this would be visible as a
+new supported key against the --block-device flag, e.g.
+
+::
+
+    nova boot --block-device id=UUID,source=image,tag=database
+
+The volume attach API will similarly gain a new freeform
+string parameter in the "volumeAttachment" data dict,
+named "tag". In terms of the Nova client this would be
+visible as a new flag, e.g.
+
+::
+
+    nova volume-attach --tag=database INSTANCE-ID VOLUME-ID
+
+The server create API will gain a new freeform string parameter
+in the "network" data dict, named "tag", for each virtual
+interface. In terms of the Nova client this would be visible
+as a new supported key against the --nic flag, e.g.
+
+::
+
+    nova boot --nic net-id=UUID,port-id=UUID,tag=database
+
+The interface attach API will similarly gain a new freeform
+string parameter in the "interfaceAttachment" data dict,
+named "tag". In terms of the Nova client this would be visible
+as a new flag, e.g.
+
+::
+
+    nova interface-attach UUID --net-id UUID --port-id UUID --tag database
+
+In all cases there will need to be validation performed to
+ensure that the supplied "tag" string is unique within the
+scope of (instance, device-type), i.e. you cannot have two
+NICs on the same instance with the same "tag", but you can
+have a disk and a NIC with the same "tag".
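The uniqueness rule above can be sketched in a few lines of Python. The `validate_tags` helper and its error message are hypothetical (the real check would live in Nova's API validation layer); the sketch only demonstrates the (instance, device-type) scoping.

```python
from collections import Counter

def validate_tags(tagged_devices):
    """Reject duplicate tags within one device type of an instance.

    Hypothetical helper: tagged_devices is a list of
    (device_type, tag) pairs for a single instance; untagged
    devices pass None and are ignored.
    """
    counts = Counter((dtype, tag) for dtype, tag in tagged_devices
                     if tag is not None)
    dupes = [key for key, n in counts.items() if n > 1]
    if dupes:
        raise ValueError("duplicate tag within device type: %r" % dupes)

# Allowed: a disk and a NIC may share the "database" tag.
validate_tags([("disk", "database"), ("nic", "database")])

# Rejected: two NICs on the same instance with the same tag.
try:
    validate_tags([("nic", "database"), ("nic", "database")])
except ValueError:
    print("rejected")  # prints rejected
```

Scoping the check per device type keeps the common case convenient (one role name shared by a disk and the NIC serving it) while still guaranteeing an unambiguous lookup within each type.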
+
+If no tag is defined against a device, the corresponding
+device entry in the metadata file will not have any tags
+listed. Since this is intended as an end user feature,
+it is not considered appropriate for Nova to auto-generate
+tags itself.
+
+This will require a new API microversion.
+
+Security impact
+---------------
+
+None, this is merely providing some user metadata to the
+guest OS.
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+There will be new fields available when specifying disks
+or network interfaces for virtual instances. The metadata
+service and cloud-init will have a new data file made
+available containing the user tags and address information.
+
+Performance Impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+----------------
+
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Artom Lifshitz
+
+Other contributors:
+  Daniel Berrange
+
+Work Items
+----------
+
+* Define new attribute for BlockDeviceMapping object
+* Define new attribute for NetworkRequest object
+* Define new parameters for block device in REST API(s)
+* Define new parameters for network requests in REST API(s)
+* Define a set of classes to represent the device metadata
+* Modify the metadata API to be able to serve the new data
+  document
+* Modify the config drive generator to be able to include
+  the new data document
+* Modify the libvirt driver to populate the metadata about
+  devices that have tags present.
+* Modify the Nova client to allow the extra tag parameter
+  to be provided.
+
+Dependencies
+============
+
+An external GIT repository will be created that provides a
+tool that is capable of parsing the Nova tag metadata and
+setting udev tags. This is not strictly a dependency, but
+a highly desirable feature to facilitate the use of this
+tag information from Linux guests.
+ +Cloud-init will be enhanced to invoke this tool when it +finds the JSON tag metadata is available from Nova. + +Testing +======= + +Tempest tests will create a guest with various NICs and disks, +assign tags to them, and then check the guest facing metadata file +is present and contains sensible data. NB, the actual data it +contains will vary according to the hypervisor running the tests, +so care will need to be taken to ensure any test is portable. + +Documentation Impact +==================== + +The API documentation will need to be updated to list the new tag +parameter that is allowed against disk and network devices + +The user documentation for cloud-init will need to describe the +newly available metadata file and its semantics. + +References +========== + +History +======= + + +.. list-table:: Revisions + :header-rows: 1 + + * - Release Name + - Description + * - Liberty + - Introduced