Merge "Add the 1st version of Mistral coding guidelines into the docs"
This commit is contained in:
commit
fef538cce7
544
doc/source/contributor/coding_guidelines.rst
Normal file
544
doc/source/contributor/coding_guidelines.rst
Normal file
@ -0,0 +1,544 @@
|
||||
Mistral Coding Guidelines
|
||||
=========================
|
||||
|
||||
Why learn more coding guidelines?
|
||||
---------------------------------
|
||||
|
||||
This document contains the description of the guidelines used for writing
|
||||
code on the Mistral project. Some of the guidelines follow from the
|
||||
nature of Python programming language (dynamic types, etc), some are the
|
||||
result of the consensus achieved by the project contributors during many
|
||||
contribution cycles. All contributors need to follow these guidelines
|
||||
when contributing to Mistral. The purpose of having such well described
|
||||
practices is improving communication between team members, reducing a number
|
||||
of controversial situations related to how a certain code snippet should
|
||||
be written, and letting contributors focus on unique engineering tasks
|
||||
rather than low level decisions that any engineer makes many times a day,
|
||||
like choosing a good name for a class or variable, or how to organise a loop,
|
||||
whether they need to put a blank line before "if" or "return", in what cases
|
||||
and so on. The document, when accepted and followed by team members, aims
|
||||
to improve overall development speed and quality.
|
||||
|
||||
Note that the guidelines described below almost don't conflict with the
|
||||
official PEP8 style guide (https://www.python.org/dev/peps/pep-0008/)
|
||||
describing how any Python program should be formatted. Mistral guidelines
|
||||
add more high level semantics on top of it. PEP8 still should be considered
|
||||
a necessary base for those who write Python programs. Strictly speaking,
|
||||
this guide is not exactly about style, it's about writing maintainable code.
|
||||
|
||||
Some of the concepts being discussed below may seem a bit too philosophical
|
||||
but, in fact, they reflect our real experience of solving practical tasks.
|
||||
According to it, some decisions work well and some don't given the Python
|
||||
programming nature and the nature of the project.
|
||||
|
||||
The guidelines are based on the three main values:
|
||||
|
||||
- **Communication.** When writing code we always try to create it in a way
|
||||
that it's easy to read and understand. This is important because most of
|
||||
the time developers spend on reading existing code, not writing new.
|
||||
- **Simplicity.** It makes sense to write code that use minimal means that
|
||||
solves a task at hand.
|
||||
- **Flexibility.** In most cases it's better to keep options open because
|
||||
in programming there's no such thing as "done". Pretty much all code gets
|
||||
modified during lifecycle of a program working in production. This is a
|
||||
very important difference form designing real buildings where we can't
|
||||
rebuild a basement after a building is completed and inhabited by people.
|
||||
|
||||
In the document, there is a number of code snippets, some are considered
|
||||
incorrect and some are correct, with explanations why from the perspective
|
||||
of these key values.
|
||||
|
||||
Further sections are devoted to certain aspects of writing readable code.
|
||||
They will typically have a list of guidelines upfront so it's easy to find
|
||||
them in the text and then more details on them, if needed.
|
||||
|
||||
Naming
|
||||
------
|
||||
|
||||
Naming is always important in programing. Good naming should always serve
|
||||
for better communication, i.e. a name of a code entity (module, variable,
|
||||
class etc.) should convey a clear message to a code reader about the meaning
|
||||
of the entity. For dynamic languages naming is even more important than for
|
||||
languages with static types. Below it will be shown why.
|
||||
|
||||
Using Abbreviations
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Guidelines
|
||||
''''''''''
|
||||
|
||||
- *Use well-known abbreviations to name variables, method arguments, constants*
|
||||
|
||||
Well-known abbreviations (shortcuts) used for names of constants, local
|
||||
variables and method arguments help simplify code, since it becomes less
|
||||
verbose, and simultaneously improve communication because all team members
|
||||
understand them the same way. Below is a list of abbreviations used on the
|
||||
Mistral project. The list is not final and is supposed to grow over time.
|
||||
|
||||
- **app** - application
|
||||
- **arg** - argument
|
||||
- **cfg** - configuration
|
||||
- **cls** - class
|
||||
- **cmd** - command
|
||||
- **cnt** - count
|
||||
- **ctx** - context
|
||||
- **db** - database
|
||||
- **desc** - description
|
||||
- **dest** - destination
|
||||
- **def** - definition
|
||||
- **defs** - definitions
|
||||
- **env** - environment
|
||||
- **ex** - execution
|
||||
- **execs** - executions
|
||||
- **exc** - exception
|
||||
- **log** - logger
|
||||
- **ns** - namespace
|
||||
- **info** - information
|
||||
- **obj** - object
|
||||
- **prog** - program
|
||||
- **spec** - specification
|
||||
- **sync** - synchronous
|
||||
- **wf** - workflow
|
||||
- **wfs** - workflows
|
||||
|
||||
Note that when using a well-known abbreviation to name a constant we need to
|
||||
use capital letters only (just as required by PEP8).
|
||||
|
||||
Local Variables and Method Arguments
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Guidelines
|
||||
''''''''''
|
||||
|
||||
- *The name of a method argument should clearly communicate its semantics
|
||||
and purpose*
|
||||
- *The name of a method argument or a local variable should include a type
|
||||
name if it creates an ambiguity otherwise*
|
||||
- *The name of a local variable can be shortened from a full word (up to one
|
||||
letter) if the scope of the variable is very narrow (several lines) and if
|
||||
it doesn't create ambiguities*
|
||||
|
||||
Principles of choosing names in dynamic languages like Python are especially
|
||||
important because oftentimes we don't clearly see of what type a certain
|
||||
variable is. Whereas a type itself (no matter if it's primitive or represented
|
||||
by a class) carries very important information about all objects of this type.
|
||||
For example, if we see a variable in Java or C++ code it's not a problem to
|
||||
find out of what type this variable is, we can easily find where the variable
|
||||
is defined. Any modern IDE also allows to navigate to a type declaration (e.g.
|
||||
if it's a class) just by clicking on it and see all required info about it.
|
||||
In Python, it's often much more problematic to find a variable type. Injecting
|
||||
a type name, at least a shortcut, into a variable name often helps mitigate
|
||||
this fundamental issue and improve code readability, and hence communication.
|
||||
|
||||
Below are the code snippets that help illustrate this problem. Let's assume we
|
||||
have the following Python classes:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
class TaskExecution(Execution):
|
||||
# Carries information about a running task.
|
||||
...
|
||||
|
||||
class TaskSpec(BaseSpec):
|
||||
# Defines a work logic of a particular task.
|
||||
...
|
||||
|
||||
For what we want to illustrate, it's not important if they belong to the same
|
||||
module or different.
|
||||
|
||||
Problematic Code
|
||||
''''''''''''''''
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def calculate_next_tasks(self, task, context):
|
||||
result = set()
|
||||
|
||||
for clause in task.get_on_clauses():
|
||||
task_names = self.evaluate_on_clause(clause, context)
|
||||
|
||||
tasks = [self.get_task_by_name(t_n) for t_n in task_names]
|
||||
|
||||
result.update(tasks)
|
||||
|
||||
return result
|
||||
|
||||
Is this method easy to understand? Well, if this code is part of a small
|
||||
program (e.g. a 200-300 lines script) is small then it may be ok. But when
|
||||
it's a system with dozens of thousands lines of code then it has a number of
|
||||
issues.
|
||||
|
||||
The most important issue is that we don't know the type of the "task" method
|
||||
argument. In order to find it we'll have to see where this method is called
|
||||
and what is passed as an argument. Then, if an object is not created there
|
||||
we'll have to go further and find callers of that method too, and so on until
|
||||
we find where the object is actually instantiated. The longer the method call
|
||||
chains are, the worse. So, jus by looking at this code we can't determine
|
||||
whether the argument "task" is of type TaskSpec or TaskExecution. So for
|
||||
example, we can't even say for sure if ``task.get_on_clauses()`` is a correct
|
||||
instruction. If we have hundreds of places like this, it is very challenging
|
||||
to read code and it is very easy to make a mistake when modifying it.
|
||||
|
||||
The other obvious issues are also related to naming:
|
||||
|
||||
#. It's not clear objects of what type are returned by the method
|
||||
#. It's not clear objects of what type is returned by
|
||||
``self.get_task_by_name(t_n)``
|
||||
|
||||
Better Code
|
||||
'''''''''''
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def calculate_next_task_specs(self, task_spec, context):
|
||||
result = set()
|
||||
|
||||
for clause in task_spec.get_on_clauses():
|
||||
task_names = self.evaluate_on_clause(clause, context)
|
||||
|
||||
task_specs = [self.get_task_spec_by_name(t_n) for t_n in task_names]
|
||||
|
||||
result.update(task_specs)
|
||||
|
||||
return result
|
||||
|
||||
Now we won't be confused. Of course, we still have to remember we have those
|
||||
two classes but at least we have a clear pointer to a type.
|
||||
|
||||
Functions and Methods
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Guidelines
|
||||
''''''''''
|
||||
|
||||
- *The name of a method should clearly communicate its semantics and purpose*
|
||||
- *The name of a method should include a type name of a returned value when it
|
||||
creates an ambiguity otherwise*
|
||||
|
||||
|
||||
For example, there are two classes like these again:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
class TaskExecution(Execution):
|
||||
# Carries information about a running task.
|
||||
...
|
||||
|
||||
class TaskSpec(BaseSpec):
|
||||
# Defines a work logic of a particular task.
|
||||
...
|
||||
|
||||
And we need a method that implements some logic and returns an instance of
|
||||
**TaskSpec**. How would we choose a good name for the method?
|
||||
|
||||
Problematic Code
|
||||
''''''''''''''''
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def calculate_parent_task(self, task_spec):
|
||||
...
|
||||
|
||||
Looking at this method signature it's not clear what to expect as a returned
|
||||
value since in Python method declarations don't contain a returned type value.
|
||||
Although there's a temptation to use just "task" in the method name it leads
|
||||
to the naming ambiguity: a returned value may be either of **TaskExecution**
|
||||
or **TaskSpec** type. Strictly speaking, it may be any type since it's Python
|
||||
but we can help a reader of this code a little bit and leave a hint about
|
||||
what's going to be returned from the method when it's called.
|
||||
|
||||
At first glance, it may seem a bit weird why we pay attention to such things.
|
||||
One may say that it's already totally clear that need to use full "task_spec"
|
||||
in the method name instead of "task" just according to the name of the class.
|
||||
Why anybody may want to use "task"? So, the reality is that during informal
|
||||
communication within a team people tend to simplify/shorten words. That is,
|
||||
when several team members are working on something related to task
|
||||
specifications within a certain scope (module, class, etc.) they often move
|
||||
to using a simplified terminology and instead of saying "task specification"
|
||||
they start saying just "task". And this kind of habit often also sneaks into
|
||||
code. And eventually it breaks communication between code author and code
|
||||
reader.
|
||||
|
||||
Better Code
|
||||
'''''''''''
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def calculate_parent_task_spec(self, task_spec):
|
||||
...
|
||||
|
||||
Although it's more verbose, it allows to mitigate the naming ambiguity for
|
||||
a code reader.
|
||||
|
||||
Constants
|
||||
^^^^^^^^^
|
||||
|
||||
Guidelines
|
||||
''''''''''
|
||||
|
||||
- *All hardcoded values (strings, numbers, etc.) must be defined as
|
||||
constants, i.e. global module variables*
|
||||
- *Constants must be defined in the beginning of a module*
|
||||
|
||||
Problematic Code
|
||||
''''''''''''''''
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def print_task_report_line(self, line, level=0):
|
||||
self.app.stdout.write(
|
||||
"task: %s%s\n" % (' ' * (level * 4), line)
|
||||
)
|
||||
|
||||
...
|
||||
|
||||
def print_action_report_line(self, line, level=0):
|
||||
self.app.stdout.write(
|
||||
"action: %s%s\n" % (' ' * (level * 4), line)
|
||||
)
|
||||
|
||||
The problem of this code is that it uses hard-coded integer values at
|
||||
different places. Why is it a problem?
|
||||
|
||||
- It's hard to find hard-coded values while reading code
|
||||
- It's hard to understand whether same values mean semantically the same
|
||||
thing, or different
|
||||
- It's easy to make a mistake when changing a hard-coded value because we
|
||||
can change it at one place and miss other places
|
||||
|
||||
Better Code
|
||||
'''''''''''
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
REPORT_ENTRY_INDENT = 4
|
||||
DEFAULT_ENTRY_LEVEL = 0
|
||||
|
||||
def print_task_report_line(self, line, level=DEFAULT_ENTRY_LEVEL):
|
||||
self.app.stdout.write(
|
||||
"task: %s%s\n" % (' ' * (level * REPORT_ENTRY_INDENT), line)
|
||||
)
|
||||
|
||||
...
|
||||
|
||||
def print_action_report_line(self, line, level=DEFAULT_ENTRY_LEVEL):
|
||||
self.app.stdout.write(
|
||||
"action: %s%s\n" % (' ' * (level * REPORT_ENTRY_INDENT), line)
|
||||
)
|
||||
|
||||
Now the code clearly communicates to a reader that value 4 in these two
|
||||
methods means exactly the same entity: an indent of any report entry. The
|
||||
other constant similarly adds clarity about value 0. Previously, these two
|
||||
integers were nameless. It's now also easy to change values, we just need
|
||||
to set different values to the named constants.
|
||||
|
||||
Grouping and Blank Lines
|
||||
------------------------
|
||||
|
||||
Guidelines
|
||||
^^^^^^^^^^
|
||||
- *Use blank lines to split individual steps (units) of algorithms.*
|
||||
- *Always put blank lines before and after "if", "for", "try" and "with"
|
||||
blocks if they don't contain one another w/o anything else in between.*
|
||||
- *Always put a blank line before "return" unless it's the only instruction
|
||||
in en enclosing code block like a method or an "if" block.*
|
||||
- *Use blank lines to split logically not symmetric lines, e.g. less
|
||||
abstract and more abstract lines like variable assignment and a method
|
||||
call.*
|
||||
- *Put a blank line after any call to a superclass.*
|
||||
|
||||
Although for someone it may not seem important, blank lines consciously put
|
||||
in code can improve readability. The general recommendation is to use blank
|
||||
lines to separate different logical blocks. When writing code it is useful
|
||||
to ask ourselves the question "what are the main steps of the algorithm I'm
|
||||
implementing?". When answered, it gives understanding of how code can be
|
||||
decomposed into sections. And in order to reflect that they are individual
|
||||
steps of the algorithm, the corresponding code blocks can be split by blank
|
||||
lines. Let's consider examples.
|
||||
|
||||
Problematic Code
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def update_task_state(self, state, state_info=None):
|
||||
old_task_state = self.task_ex.state
|
||||
if states.is_completed(self.task_ex.state):
|
||||
self.notify(old_task_state, self.task_ex.state)
|
||||
return
|
||||
if not states.is_valid_transition(self.task_ex.state, state):
|
||||
return
|
||||
child_states = [a_ex.state for a_ex in self.task_ex.executions]
|
||||
if state == states.RUNNING and states.PAUSED in child_states:
|
||||
return
|
||||
self.set_state(state, state_info)
|
||||
if states.is_completed(self.task_ex.state):
|
||||
self.register_workflow_completion_check()
|
||||
self.notify(old_task_state, self.task_ex.state)
|
||||
|
||||
Is this method easy to read? The method does a lot of things: it invokes
|
||||
other methods, checks conditions, calculates values and sets them to variables.
|
||||
Even more importantly, all of that are different computational steps of the
|
||||
method. However, they appear in the code one by one without any gaps. So when
|
||||
a human eye is reading this code there isn't any element in the code that would
|
||||
tell us where the next step of the algorithm starts. And a blank line can
|
||||
naturally play a role of such element.
|
||||
|
||||
Better Code
|
||||
^^^^^^^^^^^
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def update_task_state(self, state, state_info=None):
|
||||
old_task_state = self.task_ex.state
|
||||
|
||||
if states.is_completed(self.task_ex.state):
|
||||
self.notify(old_task_state, self.task_ex.state)
|
||||
|
||||
return
|
||||
|
||||
if not states.is_valid_transition(self.task_ex.state, state):
|
||||
return
|
||||
|
||||
child_states = [a_ex.state for a_ex in self.task_ex.executions]
|
||||
|
||||
if state == states.RUNNING and states.PAUSED in child_states:
|
||||
return
|
||||
|
||||
self.set_state(state, state_info)
|
||||
|
||||
if states.is_completed(self.task_ex.state):
|
||||
self.register_workflow_completion_check()
|
||||
|
||||
self.notify(old_task_state, self.task_ex.state)
|
||||
|
||||
Now when we read the method, we clearly see the individual steps:
|
||||
|
||||
- Saving an old task state for further usage
|
||||
- Check if task is already completed and if it is, notify clients about it
|
||||
and return
|
||||
- Check if the state transition we're going to make is valid
|
||||
- Calculate state of the child entities
|
||||
- Do not proceed with updating the task state if there any running or paused
|
||||
child entities
|
||||
- Actually update the task state
|
||||
- If the task is now completed, schedule the corresponding workflow completion
|
||||
check
|
||||
- Notify clients about a task transition
|
||||
|
||||
Of course, when writing code like this it may be hard to format code this way
|
||||
in the first place. But once we already have some version of code, we should
|
||||
take care of people who will surely be reading it in future. After all, the
|
||||
author may be reading it after some time. Again: programs are read much more
|
||||
often than written. So we need to make sure our code tells a good story about
|
||||
what it serves to.
|
||||
|
||||
As far as putting a blank line before "if", "try", "for" and "with", the
|
||||
reasoning is pretty straightforward: all these code flow controls already
|
||||
reflect separate computational steps because they do something that's
|
||||
different from the previous command by nature. For example, "if" may route a
|
||||
program in a different direction at runtime. So all these blocks should be
|
||||
clearly visible to a reader. "return" is also an outstanding command since it
|
||||
stops the execution of the current method and gives control back to the caller
|
||||
of the method. So it also deserves to be well visible.
|
||||
|
||||
Using blank lines consciously can also make code more symmetric. That is,
|
||||
if we don't mix up significantly different commands.
|
||||
|
||||
Problematic Code
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
var1 = "some value"
|
||||
my_obj.method1()
|
||||
my_obj.method2()
|
||||
var2 = "another value"
|
||||
|
||||
... # The rest of the code snippet
|
||||
|
||||
What's wrong with this code? The thing is that we mixed up lines where we do
|
||||
absolutely different things. Two of them just do set string values to the two
|
||||
new variables whereas the other two send messages to a different object, i.e.
|
||||
give it command to do something. In other words, the two lines here are more
|
||||
abstract than the two others since they don't run any concrete calculation,
|
||||
it is hidden by method calls against other object. So this code is not
|
||||
symmetric, it doesn't group commands of similar nature together and it doesn't
|
||||
separate them from each other.
|
||||
|
||||
Better Code
|
||||
^^^^^^^^^^^
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
var1 = "some value"
|
||||
var2 = "another value"
|
||||
|
||||
my_obj.method1()
|
||||
my_obj.method2()
|
||||
|
||||
... # The rest of the code snippet
|
||||
|
||||
This code fixes the mentioned issues and note that, again, a blank line
|
||||
clearly communicates that a more abstract block starts and that this block
|
||||
can and should be maintained separately.
|
||||
|
||||
|
||||
Multi-line Method Calls
|
||||
-----------------------
|
||||
|
||||
Guidelines
|
||||
^^^^^^^^^^
|
||||
- *Long method calls that don't fit into 80 characters must be written down
|
||||
in a way that each argument is located on an individual line, as well as
|
||||
the closing round bracket.*
|
||||
|
||||
All code lines need to be not longer than 80 characters. Once in a while it's
|
||||
required to break lines when we deal with long instructions. For example, when
|
||||
we need to write a method call with lots of arguments or the names of the
|
||||
arguments are long enough so the entire code instruction doesn't fit into
|
||||
80 characters.
|
||||
|
||||
Problematic Code
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
executor.run_action(self.action_ex.id, self.action_def.action_class,
|
||||
self.action_def.attributes or {}, self.action_ex.input,
|
||||
self.action_ex.runtime_context.get('safe_rerun', False),
|
||||
execution_context, target=target, timeout=timeout)
|
||||
|
||||
On many Python projects this way or breaking lines for long method calls
|
||||
is considered right. However, when we need to read and understand this code
|
||||
quickly we may experience the following issues:
|
||||
|
||||
- Hard to see where one argument ends and where another one starts
|
||||
- Hard to check if the order of arguments is correct
|
||||
- If such method call declaration is followed by another code line (e.g.
|
||||
another method call) then it's hard to see where the method call
|
||||
declaration ends
|
||||
|
||||
Better Code
|
||||
^^^^^^^^^^^
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
executor.run_action(
|
||||
self.action_ex.id,
|
||||
self.action_def.action_class,
|
||||
self.action_def.attributes or {},
|
||||
self.action_ex.input,
|
||||
self.action_ex.runtime_context.get('safe_rerun', False),
|
||||
execution_context,
|
||||
target=target,
|
||||
timeout=timeout
|
||||
)
|
||||
|
||||
Although the second version of the method call sacrifices conciseness to
|
||||
some extent, it eliminates the issues mentioned above. Every method argument
|
||||
is easily visible, it's easy to check the number of the arguments and their
|
||||
order (e.g. to compare with the method signature) and it's easy to see where
|
||||
the entire command ends because the ending round bracket on a separate line
|
||||
communicates it clearly.
|
||||
|
@ -4,6 +4,7 @@ Developer's Reference
|
||||
.. toctree::
|
||||
:maxdepth: 3
|
||||
|
||||
coding_guidelines
|
||||
creating_custom_action
|
||||
extending_yaql
|
||||
asynchronous_actions
|
||||
|
Loading…
Reference in New Issue
Block a user