Add the 1st version of Mistral coding guidelines into the docs

Change-Id: I07f3f7e83f7521c320f6b21b806692289c8a01ab
2019-12-12 17:50:18 +07:00 · 2019-12-12 17:50:18 +07:00 · 63ab3d4062
parent 0e0ab09d54
commit 63ab3d4062
2 changed files with 545 additions and 0 deletions
--- a/doc/source/contributor/coding_guidelines.rst
+++ b/doc/source/contributor/coding_guidelines.rst
@ -0,0 +1,544 @@
+Mistral Coding Guidelines
+=========================
+
+Why learn more coding guidelines?
+---------------------------------
+
+This document contains the description of the guidelines used for writing
+code on the Mistral project. Some of the guidelines follow from the
+nature of Python programming language (dynamic types, etc), some are the
+result of the consensus achieved by the project contributors during many
+contribution cycles. All contributors need to follow these guidelines
+when contributing to Mistral. The purpose of having such well described
+practices is improving communication between team members, reducing a number
+of controversial situations related to how a certain code snippet should
+be written, and letting contributors focus on unique engineering tasks
+rather than low level decisions that any engineer makes many times a day,
+like choosing a good name for a class or variable, or how to organise a loop,
+whether they need to put a blank line before "if" or "return", in what cases
+and so on. The document, when accepted and followed by team members, aims
+to improve overall development speed and quality.
+
+Note that the guidelines described below almost don't conflict with the
+official PEP8 style guide (https://www.python.org/dev/peps/pep-0008/)
+describing how any Python program should be formatted. Mistral guidelines
+add more high level semantics on top of it. PEP8 still should be considered
+a necessary base for those who write Python programs. Strictly speaking,
+this guide is not exactly about style, it's about writing maintainable code.
+
+Some of the concepts being discussed below may seem a bit too philosophical
+but, in fact, they reflect our real experience of solving practical tasks.
+According to it, some decisions work well and some don't given the Python
+programming nature and the nature of the project.
+
+The guidelines are based on the three main values:
+
+- **Communication.** When writing code we always try to create it in a way
+  that it's easy to read and understand. This is important because most of
+  the time developers spend on reading existing code, not writing new.
+- **Simplicity.** It makes sense to write code that use minimal means that
+  solves a task at hand.
+- **Flexibility.** In most cases it's better to keep options open because
+  in programming there's no such thing as "done". Pretty much all code gets
+  modified during lifecycle of a program working in production. This is a
+  very important difference form designing real buildings where we can't
+  rebuild a basement after a building is completed and inhabited by people.
+
+In the document, there is a number of code snippets, some are considered
+incorrect and some are correct, with explanations why from the perspective
+of these key values.
+
+Further sections are devoted to certain aspects of writing readable code.
+They will typically have a list of guidelines upfront so it's easy to find
+them in the text and then more details on them, if needed.
+
+Naming
+------
+
+Naming is always important in programing. Good naming should always serve
+for better communication, i.e. a name of a code entity (module, variable,
+class etc.) should convey a clear message to a code reader about the meaning
+of the entity. For dynamic languages naming is even more important than for
+languages with static types. Below it will be shown why.
+
+Using Abbreviations
+^^^^^^^^^^^^^^^^^^^
+
+Guidelines
+''''''''''
+
+ - *Use well-known abbreviations to name variables, method arguments, constants*
+
+Well-known abbreviations (shortcuts) used for names of constants, local
+variables and method arguments help simplify code, since it becomes less
+verbose, and simultaneously improve communication because all team members
+understand them the same way. Below is a list of abbreviations used on the
+Mistral project. The list is not final and is supposed to grow over time.
+
+ - **app** - application
+ - **arg** - argument
+ - **cfg** - configuration
+ - **cls** - class
+ - **cmd** - command
+ - **cnt** - count
+ - **ctx** - context
+ - **db** - database
+ - **desc** - description
+ - **dest** - destination
+ - **def** - definition
+ - **defs** - definitions
+ - **env** - environment
+ - **ex** - execution
+ - **execs** - executions
+ - **exc** - exception
+ - **log** - logger
+ - **ns** - namespace
+ - **info** - information
+ - **obj** - object
+ - **prog** - program
+ - **spec** - specification
+ - **sync** - synchronous
+ - **wf** - workflow
+ - **wfs** - workflows
+
+Note that when using a well-known abbreviation to name a constant we need to
+use capital letters only (just as required by PEP8).
+
+Local Variables and Method Arguments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Guidelines
+''''''''''
+
+ - *The name of a method argument should clearly communicate its semantics
+   and purpose*
+ - *The name of a method argument or a local variable should include a type
+   name if it creates an ambiguity otherwise*
+ - *The name of a local variable can be shortened from a full word (up to one
+   letter) if the scope of the variable is very narrow (several lines) and if
+   it doesn't create ambiguities*
+
+Principles of choosing names in dynamic languages like Python are especially
+important because oftentimes we don't clearly see of what type a certain
+variable is. Whereas a type itself (no matter if it's primitive or represented
+by a class) carries very important information about all objects of this type.
+For example, if we see a variable in Java or C++ code it's not a problem to
+find out of what type this variable is, we can easily find where the variable
+is defined. Any modern IDE also allows to navigate to a type declaration (e.g.
+if it's a class) just by clicking on it and see all required info about it.
+In Python, it's often much more problematic to find a variable type. Injecting
+a type name, at least a shortcut, into a variable name often helps mitigate
+this fundamental issue and improve code readability, and hence communication.
+
+Below are the code snippets that help illustrate this problem. Let's assume we
+have the following Python classes:
+
+ .. code-block:: python
+
+    class TaskExecution(Execution):
+        # Carries information about a running task.
+        ...
+
+    class TaskSpec(BaseSpec):
+        # Defines a work logic of a particular task.
+        ...
+
+For what we want to illustrate, it's not important if they belong to the same
+module or different.
+
+Problematic Code
+''''''''''''''''
+
+ .. code-block:: python
+
+    def calculate_next_tasks(self, task, context):
+        result = set()
+
+        for clause in task.get_on_clauses():
+            task_names = self.evaluate_on_clause(clause, context)
+
+            tasks = [self.get_task_by_name(t_n) for t_n in task_names]
+
+            result.update(tasks)
+
+        return result
+
+Is this method easy to understand? Well, if this code is part of a small
+program (e.g. a 200-300 lines script) is small then it may be ok. But when
+it's a system with dozens of thousands lines of code then it has a number of
+issues.
+
+The most important issue is that we don't know the type of the "task" method
+argument. In order to find it we'll have to see where this method is called
+and what is passed as an argument. Then, if an object is not created there
+we'll have to go further and find callers of that method too, and so on until
+we find where the object is actually instantiated. The longer the method call
+chains are, the worse. So, jus by looking at this code we can't determine
+whether the argument "task" is of type TaskSpec or TaskExecution. So for
+example, we can't even say for sure if ``task.get_on_clauses()`` is a correct
+instruction. If we have hundreds of places like this, it is very challenging
+to read code and it is very easy to make a mistake when modifying it.
+
+The other obvious issues are also related to naming:
+
+#. It's not clear objects of what type are returned by the method
+#. It's not clear objects of what type is returned by
+   ``self.get_task_by_name(t_n)``
+
+Better Code
+'''''''''''
+
+ .. code-block:: python
+
+    def calculate_next_task_specs(self, task_spec, context):
+        result = set()
+
+        for clause in task_spec.get_on_clauses():
+            task_names = self.evaluate_on_clause(clause, context)
+
+            task_specs = [self.get_task_spec_by_name(t_n) for t_n in task_names]
+
+            result.update(task_specs)
+
+        return result
+
+Now we won't be confused. Of course, we still have to remember we have those
+two classes but at least we have a clear pointer to a type.
+
+Functions and Methods
+^^^^^^^^^^^^^^^^^^^^^
+
+Guidelines
+''''''''''
+
+ - *The name of a method should clearly communicate its semantics and purpose*
+ - *The name of a method should include a type name of a returned value when it
+   creates an ambiguity otherwise*
+
+
+For example, there are two classes like these again:
+
+ .. code-block:: python
+
+    class TaskExecution(Execution):
+        # Carries information about a running task.
+        ...
+
+    class TaskSpec(BaseSpec):
+        # Defines a work logic of a particular task.
+        ...
+
+And we need a method that implements some logic and returns an instance of
+**TaskSpec**. How would we choose a good name for the method?
+
+Problematic Code
+''''''''''''''''
+
+.. code-block:: python
+
+    def calculate_parent_task(self, task_spec):
+        ...
+
+Looking at this method signature it's not clear what to expect as a returned
+value since in Python method declarations don't contain a returned type value.
+Although there's a temptation to use just "task" in the method name it leads
+to the naming ambiguity: a returned value may be either of **TaskExecution**
+or **TaskSpec** type. Strictly speaking, it may be any type since it's Python
+but we can help a reader of this code a little bit and leave a hint about
+what's going to be returned from the method when it's called.
+
+At first glance, it may seem a bit weird why we pay attention to such things.
+One may say that it's already totally clear that need to use full "task_spec"
+in the method name instead of "task" just according to the name of the class.
+Why anybody may want to use "task"? So, the reality is that during informal
+communication within a team people tend to simplify/shorten words. That is,
+when several team members are working on something related to task
+specifications within a certain scope (module, class, etc.) they often move
+to using a simplified terminology and instead of saying "task specification"
+they start saying just "task". And this kind of habit often also sneaks into
+code. And eventually it breaks communication between code author and code
+reader.
+
+Better Code
+'''''''''''
+
+.. code-block:: python
+
+    def calculate_parent_task_spec(self, task_spec):
+        ...
+
+Although it's more verbose, it allows to mitigate the naming ambiguity for
+a code reader.
+
+Constants
+^^^^^^^^^
+
+Guidelines
+''''''''''
+
+ - *All hardcoded values (strings, numbers, etc.) must be defined as
+   constants, i.e. global module variables*
+ - *Constants must be defined in the beginning of a module*
+
+Problematic Code
+''''''''''''''''
+
+.. code-block:: python
+
+    def print_task_report_line(self, line, level=0):
+        self.app.stdout.write(
+            "task: %s%s\n" % (' ' * (level * 4), line)
+        )
+
+    ...
+
+    def print_action_report_line(self, line, level=0):
+        self.app.stdout.write(
+            "action: %s%s\n" % (' ' * (level * 4), line)
+        )
+
+The problem of this code is that it uses hard-coded integer values at
+different places. Why is it a problem?
+
+ - It's hard to find hard-coded values while reading code
+ - It's hard to understand whether same values mean semantically the same
+   thing, or different
+ - It's easy to make a mistake when changing a hard-coded value because we
+   can change it at one place and miss other places
+
+Better Code
+'''''''''''
+
+.. code-block:: python
+
+    REPORT_ENTRY_INDENT = 4
+    DEFAULT_ENTRY_LEVEL = 0
+
+    def print_task_report_line(self, line, level=DEFAULT_ENTRY_LEVEL):
+        self.app.stdout.write(
+            "task: %s%s\n" % (' ' * (level * REPORT_ENTRY_INDENT), line)
+        )
+
+    ...
+
+    def print_action_report_line(self, line, level=DEFAULT_ENTRY_LEVEL):
+        self.app.stdout.write(
+            "action: %s%s\n" % (' ' * (level * REPORT_ENTRY_INDENT), line)
+        )
+
+Now the code clearly communicates to a reader that value 4 in these two
+methods means exactly the same entity: an indent of any report entry. The
+other constant similarly adds clarity about value 0. Previously, these two
+integers were nameless. It's now also easy to change values, we just need
+to set different values to the named constants.
+
+Grouping and Blank Lines
+------------------------
+
+Guidelines
+^^^^^^^^^^
+ - *Use blank lines to split individual steps (units) of algorithms.*
+ - *Always put blank lines before and after "if", "for", "try" and "with"
+   blocks if they don't contain one another w/o anything else in between.*
+ - *Always put a blank line before "return" unless it's the only instruction
+   in en enclosing code block like a method or an "if" block.*
+ - *Use blank lines to split logically not symmetric lines, e.g. less
+   abstract and more abstract lines like variable assignment and a method
+   call.*
+ - *Put a blank line after any call to a superclass.*
+
+Although for someone it may not seem important, blank lines consciously put
+in code can improve readability. The general recommendation is to use blank
+lines to separate different logical blocks. When writing code it is useful
+to ask ourselves the question "what are the main steps of the algorithm I'm
+implementing?". When answered, it gives understanding of how code can be
+decomposed into sections. And in order to reflect that they are individual
+steps of the algorithm, the corresponding code blocks can be split by blank
+lines. Let's consider examples.
+
+Problematic Code
+^^^^^^^^^^^^^^^^
+
+.. code-block:: python
+
+    def update_task_state(self, state, state_info=None):
+        old_task_state = self.task_ex.state
+        if states.is_completed(self.task_ex.state):
+            self.notify(old_task_state, self.task_ex.state)
+            return
+        if not states.is_valid_transition(self.task_ex.state, state):
+            return
+        child_states = [a_ex.state for a_ex in self.task_ex.executions]
+        if state == states.RUNNING and states.PAUSED in child_states:
+            return
+        self.set_state(state, state_info)
+        if states.is_completed(self.task_ex.state):
+            self.register_workflow_completion_check()
+        self.notify(old_task_state, self.task_ex.state)
+
+Is this method easy to read? The method does a lot of things: it invokes
+other methods, checks conditions, calculates values and sets them to variables.
+Even more importantly, all of that are different computational steps of the
+method. However, they appear in the code one by one without any gaps. So when
+a human eye is reading this code there isn't any element in the code that would
+tell us where the next step of the algorithm starts. And a blank line can
+naturally play a role of such element.
+
+Better Code
+^^^^^^^^^^^
+
+.. code-block:: python
+
+    def update_task_state(self, state, state_info=None):
+        old_task_state = self.task_ex.state
+
+        if states.is_completed(self.task_ex.state):
+            self.notify(old_task_state, self.task_ex.state)
+
+            return
+
+        if not states.is_valid_transition(self.task_ex.state, state):
+            return
+
+        child_states = [a_ex.state for a_ex in self.task_ex.executions]
+
+        if state == states.RUNNING and states.PAUSED in child_states:
+            return
+
+        self.set_state(state, state_info)
+
+        if states.is_completed(self.task_ex.state):
+            self.register_workflow_completion_check()
+
+        self.notify(old_task_state, self.task_ex.state)
+
+Now when we read the method, we clearly see the individual steps:
+
+ - Saving an old task state for further usage
+ - Check if task is already completed and if it is, notify clients about it
+   and return
+ - Check if the state transition we're going to make is valid
+ - Calculate state of the child entities
+ - Do not proceed with updating the task state if there any running or paused
+   child entities
+ - Actually update the task state
+ - If the task is now completed, schedule the corresponding workflow completion
+   check
+ - Notify clients about a task transition
+
+Of course, when writing code like this it may be hard to format code this way
+in the first place. But once we already have some version of code, we should
+take care of people who will surely be reading it in future. After all, the
+author may be reading it after some time. Again: programs are read much more
+often than written. So we need to make sure our code tells a good story about
+what it serves to.
+
+As far as putting a blank line before "if", "try", "for" and "with", the
+reasoning is pretty straightforward: all these code flow controls already
+reflect separate computational steps because they do something that's
+different from the previous command by nature. For example, "if" may route a
+program in a different direction at runtime. So all these blocks should be
+clearly visible to a reader. "return" is also an outstanding command since it
+stops the execution of the current method and gives control back to the caller
+of the method. So it also deserves to be well visible.
+
+Using blank lines consciously can also make code more symmetric. That is,
+if we don't mix up significantly different commands.
+
+Problematic Code
+^^^^^^^^^^^^^^^^
+
+.. code-block:: python
+
+    var1 = "some value"
+    my_obj.method1()
+    my_obj.method2()
+    var2 = "another value"
+
+    ... # The rest of the code snippet
+
+What's wrong with this code? The thing is that we mixed up lines where we do
+absolutely different things. Two of them just do set string values to the two
+new variables whereas the other two send messages to a different object, i.e.
+give it command to do something. In other words, the two lines here are more
+abstract than the two others since they don't run any concrete calculation,
+it is hidden by method calls against other object. So this code is not
+symmetric, it doesn't group commands of similar nature together and it doesn't
+separate them from each other.
+
+Better Code
+^^^^^^^^^^^
+
+.. code-block:: python
+
+    var1 = "some value"
+    var2 = "another value"
+
+    my_obj.method1()
+    my_obj.method2()
+
+    ... # The rest of the code snippet
+
+This code fixes the mentioned issues and note that, again, a blank line
+clearly communicates that a more abstract block starts and that this block
+can and should be maintained separately.
+
+
+Multi-line Method Calls
+-----------------------
+
+Guidelines
+^^^^^^^^^^
+ - *Long method calls that don't fit into 80 characters must be written down
+   in a way that each argument is located on an individual line, as well as
+   the closing round bracket.*
+
+All code lines need to be not longer than 80 characters. Once in a while it's
+required to break lines when we deal with long instructions. For example, when
+we need to write a method call with lots of arguments or the names of the
+arguments are long enough so the entire code instruction doesn't fit into
+80 characters.
+
+Problematic Code
+^^^^^^^^^^^^^^^^
+
+.. code-block:: python
+
+    executor.run_action(self.action_ex.id, self.action_def.action_class,
+        self.action_def.attributes or {}, self.action_ex.input,
+        self.action_ex.runtime_context.get('safe_rerun', False),
+        execution_context, target=target, timeout=timeout)
+
+On many Python projects this way or breaking lines for long method calls
+is considered right. However, when we need to read and understand this code
+quickly we may experience the following issues:
+
+ - Hard to see where one argument ends and where another one starts
+ - Hard to check if the order of arguments is correct
+ - If such method call declaration is followed by another code line (e.g.
+   another method call) then it's hard to see where the method call
+   declaration ends
+
+Better Code
+^^^^^^^^^^^
+
+.. code-block:: python
+
+    executor.run_action(
+        self.action_ex.id,
+        self.action_def.action_class,
+        self.action_def.attributes or {},
+        self.action_ex.input,
+        self.action_ex.runtime_context.get('safe_rerun', False),
+        execution_context,
+        target=target,
+        timeout=timeout
+    )
+
+Although the second version of the method call sacrifices conciseness to
+some extent, it eliminates the issues mentioned above. Every method argument
+is easily visible, it's easy to check the number of the arguments and their
+order (e.g. to compare with the method signature) and it's easy to see where
+the entire command ends because the ending round bracket on a separate line
+communicates it clearly.
+
--- a/doc/source/contributor/index.rst
+++ b/doc/source/contributor/index.rst
@ -4,6 +4,7 @@ Developer's Reference
 .. toctree::
   :maxdepth: 3

+   coding_guidelines
   creating_custom_action
   extending_yaql
   asynchronous_actions