WIP Add hard disk failure detection example

Change-Id: I06c128048456f8dee4245956a56b8216e7d2c56c
This commit is contained in:
Hisashi Osanai 2019-11-02 15:25:06 +00:00
parent 5dbeff5746
commit 0bcc22a3b4
3 changed files with 164 additions and 0 deletions


@@ -244,6 +244,170 @@ the source input format. So you can typically use it anywhere in existing data
flow. In this mode, only root cause alerts are preserved. A result example
under this mode is shown below.
## Disk Failure Prediction Using SMART Data
Hard disk drives (HDDs) are among the most fragile parts of a system, and the consequences of a disk failure can be difficult, or even impossible, to recover from.
To ensure the reliability and stability of systems, it is crucial to monitor the working conditions of HDDs in real time and detect soon-to-fail HDDs from sensor data.
This example shows how MoNanas ingests SMART (Self-Monitoring, Analysis and Reporting Technology) data, trains a classification algorithm (random forest), and then uses the trained algorithm to predict the likelihood of HDD failures.
To train the model, we use the publicly available [SMART dataset](https://www.backblaze.com/b2/hard-drive-test-data.html) collected in the Backblaze data center. Backblaze has run tens of thousands of hard drives since 2013, making this the largest public SMART dataset.
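For orientation, each row of the daily Backblaze CSV files describes one drive on one day: identity columns, a binary failure label, and the SMART attributes in raw and normalized form. Below is a minimal sketch of inspecting such a file with `pandas`; the file name and exact column names are assumptions, as they vary across dataset releases.
```python
import pandas as pd

# Each daily Backblaze CSV holds one row per drive: identity columns,
# a binary failure label, and the SMART attributes in raw/normalized
# form. The file name and column names here are illustrative.
df = pd.read_csv("/var/tmp/source_data/2019-01-01.csv")
print(df[["serial_number", "model", "failure", "smart_5_raw"]].head())
```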
### Running the Example
To test this example, perform the following steps.
#### Configuration File
Before running MoNanas, we need to create a configuration file describing how
we want MoNanas to orchestrate the data execution (creating a pipeline). You can find the following configuration in `$MONANAS_HOME/config/smart.json`:
```json
{
    "spark_config": {
        "appName": "testApp",
        "streaming": {
            "batch_interval": 1
        }
    },
    "server": {
        "port": 3000,
        "debug": false
    },
    "sources": {
        "src1": {
            "module": "SmartFile",
            "params": {
                "dir": "/var/tmp/source_data/"
            }
        }
    },
    "ingestors": {
        "ing1": {
            "module": "SmartIngestor"
        }
    },
    "smls": {
        "sml1": {
            "module": "RandomForestClassifier"
        }
    },
    "voters": {
        "vot1": {
            "module": "PickIndexVoter",
            "index": 0
        }
    },
    "sinks": {
        "snk1": {
            "module": "KafkaSink",
            "host": "127.0.0.1",
            "port": 9092,
            "topic": "my_topic"
        },
        "snk2": {
            "module": "StdoutSink"
        }
    },
    "ldps": {
        "ldp1": {
            "module": "Smart"
        }
    },
    "connections": {
        "src1": ["ing1", "ldp1"],
        "ing1": [],
        "sml1": ["vot1"],
        "vot1": ["ldp1"],
        "ldp1": ["snk1", "snk2"],
        "snk1": [],
        "snk2": []
    },
    "feedback": {}
}
```
The flow of data execution is defined in `connections`. In this case, data
are read from `src1`, which replays CSV-formatted SMART records from the
open Backblaze dataset. The data are then ingested by `ing1`, where
each entry is converted into a format suitable for the machine learning algorithm.
MoNanas uses `numpy.array` as a standard format. Typically, an aggregator is
responsible for aggregating data from different ingestors, but in this scenario
there is only one ingestor, hence the implicit aggregator (not defined in the
configuration) simply forwards the data to `sml1`, which uses the random forest
algorithm to learn to recognize disk failures and then passes the result to `vot1`.
The voter is configured to pick the output of the first SML function and
forward it to `ldp1`. Here, the live data processor transforms the data streamed from `src1`
using the prediction results and pushes them to standard output as well as to the specified Kafka server.
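To make the roles of the ingestor and the SML function concrete, here is a standalone sketch (outside MoNanas) of the same pipeline: SMART attributes are converted into a `numpy.array` feature matrix and a random forest is fitted on the failure labels. The file path and feature columns are illustrative assumptions; in MoNanas itself, feature extraction is handled by the `SmartIngestor` module.
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative feature set; the real ingestor selects its own columns.
FEATURES = ["smart_5_raw", "smart_187_raw", "smart_188_raw",
            "smart_197_raw", "smart_198_raw"]

df = pd.read_csv("/var/tmp/source_data/2019-01-01.csv")
X = df[FEATURES].fillna(0).to_numpy()   # numpy.array, MoNanas' standard format
y = df["failure"].to_numpy()            # 1 = drive failed on this day

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced")
clf.fit(X, y)

# Predicting on new samples: a 1 flags a likely soon-to-fail drive.
print(clf.predict(X[:5]))
```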
#### Run MoNanas
Start MoNanas as follows:
```bash
python $MONANAS_HOME/run.py -p $SPARK_HOME -c $MONANAS_HOME/config/smart.json \
    -l $MONANAS_HOME/config/logging.json
```
If you want to run your own configuration, you will need to change the value of the `-c` parameter.
#### Start Data Execution
MoNanas exposes a REST API for controlling the data execution. Use any HTTP
client to POST the following request body to the MoNanas server (running on your localhost, at port 3000 by default) to start data streaming.
```json
{
"action": "start_streaming"
}
```
e.g., using curl from a terminal to make the request, assuming the server is running
locally:
```bash
curl -H "Content-Type: application/json" -X POST \
-d '{"action": "start_streaming"}' \
http://localhost:3000/api/v1/actions
```
#### Stop Data Execution
When you want to stop the example, you can send another HTTP POST to tell MoNanas to stop streaming data. In this case, the request body should be:
```json
{
"action": "stop_streaming"
}
```
e.g., using curl from a terminal to make the request, assuming the server is running
locally:
```bash
curl -H "Content-Type: application/json" -X POST \
-d '{"action": "stop_streaming"}' \
http://localhost:3000/api/v1/actions
```
#### Results
The sinks defined in the configuration deliver the transformed (processed) alerts to standard output and to a Kafka server.
Therefore, whenever a hard drive failure is predicted, the output is displayed in the console. Alternatively, users can subscribe to the topic "my_topic" using any Kafka client.
A result example is shown below. The value of `DiskFailure` is the disk identifier specified in the input data, so you can change the value as you like, as long as it is constant and unique across all data.
```json
{
"DiskFailure": "ST3000DM001_disk001"
}
```
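For example, a minimal consumer sketch using the `kafka-python` package (an assumption; any Kafka client will do) to read these alerts:
```python
from kafka import KafkaConsumer

# Subscribe to the topic configured for the Kafka sink and print each alert.
consumer = KafkaConsumer("my_topic", bootstrap_servers="127.0.0.1:9092")
for message in consumer:
    print(message.value.decode("utf-8"))
```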
## Anomalies in Rule Firing Patterns
Some attacks can be recognized by patterns of rules being triggered in an anomalous fashion. For example, a ping flood, a denial-of-service attack that consists of sending many pings to a system, would trigger the IPTables rules handling pings much more often than when the system is not under attack.
