REST API Policy Enforcement
The following describes some of the shortcomings in how policy is used and enforced in nova, along with some benefits of fixing those issues. Each issue has a section dedicated to describing the underlying cause and historical context in greater detail.
Problems with current system
The following is a list of issues with the existing policy enforcement system:
- Testing default policies
- Mismatched authorization
- Inconsistent naming
- Incorporating default roles
- Compartmentalized policy enforcement
- Refactoring hard-coded permission checks
- Granular policy checks
Addressing the list above helps operators by:
- Providing them with flexible and useful defaults
- Reducing the likelihood of writing and maintaining custom policies
- Improving interoperability between deployments
- Increasing RBAC confidence through first-class testing and verification
- Reducing complexity by using consistent policy naming conventions
- Safely exposing more functionality to end users, which makes the nova API more self-service and reduces the work operators must do on behalf of users
Additionally, the following is a list of benefits to contributors:
- Reduce developer maintenance and cost by isolating policy enforcement into a single layer
- Reduce complexity by using consistent policy naming conventions
- Increase confidence in RBAC refactoring through exhaustive testing that catches regressions before they merge
Testing default policies
Testing default policies is important in protecting against authoritative regression. Authoritative regression occurs when a change accidentally allows someone to do or see something they shouldn't, or accidentally prevents a user from doing something they were previously authorized to do. This testing is especially useful before refactoring large parts of the policy system. For example, it would be invaluable before pulling policy enforcement logic from the database layer up to the API layer.
Testing documentation exists that describes the process for developing these types of tests.
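As a rough illustration of the testing pattern described above, the sketch below spells out the expected outcome for every rule and role pair, so that a change that is either too permissive or too restrictive shows up as a regression. It is a simplified stand-in and does not use oslo.policy: the role-set representation, the role assignments, and the helper names (is_authorized, find_regressions) are assumptions made for this example.

```python
# Simplified stand-in for a policy file: rule name -> roles allowed.
# Real nova policies are oslo.policy check strings, not role sets, and
# these role assignments are illustrative only.
DEFAULT_POLICIES = {
    "compute:foobar:get": {"admin", "member", "reader"},
    "compute:foobar:delete": {"admin", "member"},
}

# Expected result for every (rule, role) pair. Listing denials as well as
# grants means both kinds of regression are caught.
EXPECTED = {
    ("compute:foobar:get", "admin"): True,
    ("compute:foobar:get", "member"): True,
    ("compute:foobar:get", "reader"): True,
    ("compute:foobar:delete", "admin"): True,
    ("compute:foobar:delete", "member"): True,
    ("compute:foobar:delete", "reader"): False,
}


def is_authorized(policies, rule, roles):
    """Return True if any of the caller's roles is allowed by the rule."""
    return bool(policies[rule] & set(roles))


def find_regressions(policies, expected):
    """Return the sorted (rule, role) pairs whose outcome is unexpected."""
    return sorted(
        (rule, role)
        for (rule, role), allowed in expected.items()
        if is_authorized(policies, rule, [role]) != allowed
    )
```

Running find_regressions against the unmodified defaults should return an empty list. A change that lets reader delete, or that stops reader from reading, is reported by rule and role.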
Mismatched authorization
The compute API is rich in functionality and has grown to manage both physical and virtual hardware. Some APIs were meant to assist operators while others were specific to end users. Historically, nova used project-scoped tokens to protect almost every API, regardless of the intended user. Using project-scoped tokens to authorize requests for system-level APIs makes for an undesirable user experience and is prone to overloading roles. For example, preventing every user from accessing hardware-level APIs that would otherwise violate tenancy requires operators to create a system-admin or super-admin role, then rewrite those system-level policies to incorporate that role. This means users with that special role on a project could access system-level resources that aren't even tracked against projects (hypervisor information is an example of system-specific information).
As of the Queens release, keystone supports a scope type dedicated to easing this problem, called system scope. Consuming system scope across the compute API results in fewer overloaded roles, less specialized authorization logic in code, and simpler policies that expose more functionality to users without violating tenancy. Please refer to keystone's authorization scopes documentation to learn more about scopes and how to use them effectively.
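The difference between project scope and system scope can be sketched with a small model. This is not keystone's or oslo.policy's API: the Token and Policy classes, the enforce function, and the policy name compute:hypervisors:list are hypothetical and exist only to show that a role on a project does not grant access to a system-level API once the policy declares which scopes it accepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical token: a set of roles plus exactly one kind of scope."""
    roles: frozenset
    system_scope: Optional[str] = None  # e.g. "all" for a system-scoped token
    project_id: Optional[str] = None

    @property
    def scope(self):
        return "system" if self.system_scope else "project"


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a required role and the scopes it applies to."""
    name: str
    required_role: str
    scope_types: tuple


def enforce(policy, token):
    """Check the token's scope first, then its roles."""
    if token.scope not in policy.scope_types:
        return False
    return policy.required_role in token.roles


# Hypothetical system-level policy protecting hypervisor information.
LIST_HYPERVISORS = Policy("compute:hypervisors:list", "admin", ("system",))

project_admin = Token(roles=frozenset({"admin"}), project_id="p1")
system_admin = Token(roles=frozenset({"admin"}), system_scope="all")
```

In this model both tokens carry the admin role, but only system_admin passes enforce for LIST_HYPERVISORS, so no special super-admin role is needed.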
Inconsistent naming
Inconsistent conventions for policy names are scattered across most OpenStack services, nova included. Recently, there was an effort that introduced a convention that factored in service names, resources, and use cases. This new convention is applicable to nova policy names. The convention is formally documented in oslo.policy and we can use policy deprecation tooling to gracefully rename policies.
Incorporating default roles
Up until the Rocky release, keystone only ensured a single role called admin was available to the deployment upon installation. In Rocky, this support was expanded to include member and reader roles as first-class citizens during keystone's installation. This allows service developers to rely on these roles and include them in their default policy definitions. Standardizing on a set of role names for default policies increases interoperability between deployments and decreases operator overhead.
You can find more information on default roles in the keystone specification or developer documentation.
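The sketch below shows one way default policies can lean on the three role names. It assumes the hierarchy in which admin implies member and member implies reader; treat that chain, the IMPLIED_ROLES table, and the effective_roles helper as assumptions of this example and not as keystone code.

```python
# Assumed role hierarchy: admin implies member, member implies reader.
IMPLIED_ROLES = {"admin": "member", "member": "reader"}


def effective_roles(assigned):
    """Expand assigned roles with every role they imply."""
    roles = set()
    for role in assigned:
        # Walk down the chain until it ends or reaches a role already seen.
        while role is not None and role not in roles:
            roles.add(role)
            role = IMPLIED_ROLES.get(role)
    return roles


def allowed(required_role, assigned):
    """A default policy only needs to name the least-privileged role."""
    return required_role in effective_roles(assigned)
```

With this expansion, a read-only default can require reader alone and still admit members and admins, which keeps default policy definitions short.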
Compartmentalized policy enforcement
Policy logic and processing are inherently sensitive and often complicated: sensitive in that coding mistakes can lead to security vulnerabilities, and complicated in the number of resources and APIs to protect and the vast number of use cases to support. These reasons make a case for isolating policy enforcement and processing into a compartmentalized space, as opposed to policy logic bleeding through to different layers of nova. Not having all policy logic in a single place makes evolving the policy enforcement system arduous and makes the policy system itself fragile.
Currently, the database and API components of nova contain policy logic. At some point, we should refactor these systems into a single component that is easier to maintain. Before we do this, we should consider approaches for bolstering test coverage so that we detect or prevent policy regressions. There are examples and documentation in API protection testing guides.
Refactoring hard-coded permission checks
The policy system in nova is designed to be configurable. Despite this design, there are some APIs that have hard-coded checks for specific roles. This makes configuration impossible, misleading, and frustrating for operators. Instead, we can remove hard-coded policies and ensure a configuration-driven approach, which reduces technical debt, increases consistency, and provides a better user experience for operators. Additionally, moving hard-coded checks into first-class policy rules lets us use existing policy tooling to deprecate, document, and evolve policies.
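A minimal before-and-after sketch of this refactoring follows. It is not nova code: the function names, the DEFAULT_RULES table, and the operator_overrides parameter are hypothetical, and the role-set rules stand in for real policy check strings.

```python
def may_delete_hardcoded(roles):
    """Before: the role is baked into the code, so operators cannot change it."""
    return "admin" in roles


# After: the same default lives in a named rule that operators can override.
DEFAULT_RULES = {"compute:foobar:delete": {"admin"}}


def may_delete(roles, operator_overrides=None):
    """Evaluate the named rule, letting operator configuration win."""
    rules = {**DEFAULT_RULES, **(operator_overrides or {})}
    return bool(rules["compute:foobar:delete"] & set(roles))
```

The default behavior is unchanged, but an operator who passes an override such as {"compute:foobar:delete": {"admin", "member"}} can delegate the operation without a code change.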
Granular policy checks
Policies should be as granular as possible to ensure consistency and reasonable defaults. Using a single policy to protect CRUD for an entire API is restrictive because it prevents us from using default roles to make delegation to that API flexible. For example, a policy for compute:foobar could be broken into compute:foobar:create, compute:foobar:update, compute:foobar:list, compute:foobar:get, and compute:foobar:delete. Breaking policies down this way allows us to set read-only policies for readable operations or use another default role for creation and management of foobar resources. The oslo.policy library has examples that show how to do this using deprecated policy rules.
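The split described above can be modeled with a small resolver that keeps honoring an operator's override of the old, coarse rule until the operator configures the new granular ones. This is a simplified model of the idea behind deprecated policy rules, not oslo.policy's actual resolution logic: the default role sets and the resolve function are assumptions made for illustration.

```python
# Illustrative defaults for the granular rules; the role sets are assumed.
GRANULAR_DEFAULTS = {
    "compute:foobar:create": {"admin", "member"},
    "compute:foobar:update": {"admin", "member"},
    "compute:foobar:list": {"admin", "member", "reader"},
    "compute:foobar:get": {"admin", "member", "reader"},
    "compute:foobar:delete": {"admin", "member"},
}

# The old coarse rule that every granular rule replaces.
DEPRECATED_NAME = "compute:foobar"


def resolve(rule, overrides):
    """Pick the role set for a granular rule.

    Order of preference: an override of the new rule, then an override of
    the deprecated rule, then the new granular default.
    """
    if rule in overrides:
        return overrides[rule]
    if DEPRECATED_NAME in overrides:
        return overrides[DEPRECATED_NAME]
    return GRANULAR_DEFAULTS[rule]
```

In this model a deployment that customized compute:foobar keeps its behavior after the split, while deployments without overrides pick up the read-only default for list and get.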