Rules

Basic Functionality

How processors process log messages is defined via configurable rules. Each rule contains a filter that is used to select log messages. Other parameters within the rules define how certain log messages should be transformed. Those parameters depend on the processor for which they were created.

Rule Files

Rules are defined as YAML or JSON objects. Rules can be distributed over different files, or multiple rules can reside within one file. Each file can contain multiple YAML documents or a JSON array of JSON objects. The YAML format is preferred, since it is a superset of JSON and more readable.

Depending on the filter, a rule can trigger for different types of messages.

Further details can be found in the section for processors.

Example structure of a YAML file with a rule for the labeler processor

```yaml
filter: 'command: execute'  # A comment
labeler:
  label:
    action:
    - execute
description: '...'
```
Example structure of a YAML file containing multiple rules for the labeler processor

```yaml
filter: 'command: "execute something"'
labeler:
  label:
    action:
    - execute
description: '...'
---
filter: 'command: "terminate something"'
labeler:
  label:
    action:
    - execute
description: '...'
```
Example structure of a JSON file with a rule for the labeler processor

```json
{
  "filter": "command: execute",
  "labeler": {
    "label": {
      "action": ["execute"]
    }
  },
  "description": "..."
}
```
Example structure of a JSON file containing multiple rules for the labeler processor

```json
[
  {
    "filter": "command: execute",
    "labeler": {
      "label": {
        "action": ["execute"]
      }
    },
    "description": "..."
  },
  {
    "filter": "command: execute",
    "labeler": {
      "label": {
        "action": ["execute"]
      }
    },
    "description": "..."
  }
]
```

Log message field value access

All rules reference fields or field values of log messages. This is done via the dot notation: to reference a nested field inside the log event, give the whole path from the event root to the desired field. To reference the field information in the following example, you would use the notation more.nested.information. If you want to access a specific item inside a list in the event, you can extend the dot notation with indices. Given the following example, you can access the list element lists with the notation more.nested.sometimes.1. If you want more than one element, you can slice the list with the pattern start:stop:step_size, e.g. more.nested.sometimes.0:2, which returns ["inside", "lists"]. This slicing is based on native Python list slicing.

Example Event

```json
{
  "some": "data",
  "more": {
    "nested": {
      "information": "is here",
      "sometimes": ["inside", "lists", "of", "elements"]
    }
  }
}
```
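The access pattern above can be sketched in Python. The helper name get_dotted_field and its behavior are assumptions for illustration, not the tool's actual API:

```python
# A minimal sketch of dotted field access with list indices and slices,
# assuming Python-style slicing semantics; get_dotted_field is a
# hypothetical helper, not the tool's actual API.
def get_dotted_field(event, dotted_path):
    current = event
    for part in dotted_path.split("."):
        if isinstance(current, list):
            if ":" in part:
                # slice pattern start:stop:step_size, empty parts allowed
                bounds = [int(p) if p else None for p in part.split(":")]
                current = current[slice(*bounds)]
            else:
                current = current[int(part)]  # single list index
        else:
            current = current[part]  # ordinary dict key
    return current

event = {
    "some": "data",
    "more": {
        "nested": {
            "information": "is here",
            "sometimes": ["inside", "lists", "of", "elements"],
        }
    },
}

print(get_dotted_field(event, "more.nested.information"))    # is here
print(get_dotted_field(event, "more.nested.sometimes.1"))    # lists
print(get_dotted_field(event, "more.nested.sometimes.0:2"))  # ['inside', 'lists']
```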

Warning

The dotted field notation is available in all processors. However, the use of indices to access list elements is not available in the Clusterer, the Labeler and the Pseudonymizer.

Filter

The filters are based on the Lucene query language, but contain some additional enhancements. It is possible to filter for keys and values in log messages. Dot notation is used to access subfields in log messages. A filter for {'field': {'subfield': 'value'}} can be specified as field.subfield: value.

If a key is given without a value, the filter checks for the existence of that key. The existence of a specific field can therefore be checked by a key without a value. The filter filter: field.subfield would match any value of subfield in {'field': {'subfield': 'value'}}. The special key * can be used to always match on any input. Thus, the filter filter: * would match any input document.
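The existence check and the wildcard can be sketched with a toy matcher in Python; filter_matches and its simplified parsing are illustrative assumptions, not the real Lucene-based filter:

```python
# Toy sketch of existence, wildcard and key: value matching, assuming
# the simplified semantics described above (not the tool's real parser).
def filter_matches(event, filter_expr):
    if filter_expr == "*":             # special key: always match
        return True
    if ":" in filter_expr:             # key: value comparison
        key, value = (p.strip() for p in filter_expr.split(":", 1))
    else:                              # bare key: existence check
        key, value = filter_expr.strip(), None
    current = event
    for part in key.split("."):        # dotted sub-field access
        if not isinstance(current, dict) or part not in current:
            return False
        current = current[part]
    return value is None or str(current) == value

print(filter_matches({"field": {"subfield": "value"}}, "field.subfield"))  # True
print(filter_matches({"other": 1}, "field.subfield"))                      # False
print(filter_matches({"anything": 1}, "*"))                                # True
```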

The filter in the following example would match fields ip_address with the value 192.168.0.1. Meaning all following transformations done by this rule would be applied only on log messages that match this criterion. This example is not complete, since rules are specific to processors and require additional options.

Example

```json
{ "filter": "ip_address: 192.168.0.1" }
```

It is possible to use filters with field names that contain white spaces or special symbols of the Lucene syntax. However, these characters have to be escaped. The filter filter: 'field.a subfield(test): value' must be escaped as filter: 'field.a\ subfield\(test\): value'. Other references to this field do not require such escaping; it is only necessary in the filter. If the file is in the JSON format, it is necessary to escape twice - once for the filter itself and once for JSON.

Operators

A subset of Lucene query operators is supported:

  • NOT: Condition is not true.

  • AND: Connects two conditions. Both conditions must be true.

  • OR: Connects two conditions. At least one of them must be true.

In the following example log messages are filtered for which event_id: 1 is true and ip_address: 192.168.0.1 is false. This example is not complete, since rules are specific to processors and require additional options.

Example

```json
{ "filter": "event_id: 1 AND NOT ip_address: 192.168.0.1" }
```
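A toy evaluation of such a flat AND chain with optional NOT clauses could look as follows in Python; eval_and_chain is a hypothetical helper, and real Lucene parsing is considerably more involved:

```python
# Toy evaluation of a flat AND chain with optional NOT, to illustrate
# the operator semantics; not the tool's actual filter implementation.
def eval_and_chain(event, expression):
    for clause in expression.split(" AND "):
        negate = clause.startswith("NOT ")
        if negate:
            clause = clause[len("NOT "):]
        key, value = (p.strip() for p in clause.split(":", 1))
        hit = str(event.get(key)) == value
        if hit == negate:  # clause failed, or a negated clause matched
            return False
    return True

expr = "event_id: 1 AND NOT ip_address: 192.168.0.1"
print(eval_and_chain({"event_id": 1, "ip_address": "10.0.0.5"}, expr))     # True
print(eval_and_chain({"event_id": 1, "ip_address": "192.168.0.1"}, expr))  # False
```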

RegEx-Filter

It is possible to use regular expressions to match values. To be recognized as a regular expression, the value in the filter has to be enclosed in forward slashes (/).

Example

```yaml
filter: 'ip_address: /192\.168\.0\..*/'
```
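Assuming a slash-enclosed value is matched as a full regular expression against the field value, the behavior could be sketched with Python's re module (value_matches is an illustrative helper, not part of the tool):

```python
import re

# Sketch: a value wrapped in forward slashes is treated as a regular
# expression and matched against the whole field value; this is an
# assumption about the semantics, shown with Python's re module.
def value_matches(filter_value, field_value):
    if filter_value.startswith("/") and filter_value.endswith("/"):
        return re.fullmatch(filter_value[1:-1], field_value) is not None
    return filter_value == field_value  # plain string comparison

print(value_matches(r"/192\.168\.0\..*/", "192.168.0.42"))  # True
print(value_matches(r"/192\.168\.0\..*/", "10.0.0.1"))      # False
```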

[Deprecated, but still functional] The field with the regex pattern must be added to the optional field regex_fields in the rule definition.

In the following example the field ip_address is defined as a regex field. It would filter for log messages in which the value of ip_address starts with 192.168.0.. This example is not complete, since rules are specific to processors and require additional options.

Example

```yaml
filter: 'ip_address: "192\.168\.0\..*"'
regex_fields:
- ip_address
```

RuleTree

For performance reasons, all rules of a processor are aggregated into a rule tree on startup. Instead of evaluating all rules independently for each log message, the message is checked against the rule tree. Each node in the rule tree represents a condition that has to be met, while the leaves represent changes that the processor should apply. If no condition is met, the processor just passes the log event on to the next processor.
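A minimal sketch of such a tree in Python, where each node holds one condition and the leaves carry the rule payloads; the class and function names are assumptions for illustration only:

```python
# Illustrative rule-tree sketch: a node's condition must match before
# its subtree is visited, and payloads collected at matching leaves are
# the changes to apply. Not the tool's actual data structure.
class Node:
    def __init__(self, field=None, value=None):
        self.field, self.value = field, value
        self.children = []
        self.payloads = []  # transformations attached to this node

    def matches(self, event):
        return self.field is None or event.get(self.field) == self.value

def collect_payloads(node, event, out):
    if not node.matches(event):
        return  # condition not met: prune the whole subtree
    out.extend(node.payloads)
    for child in node.children:
        collect_payloads(child, event, out)

root = Node()  # root condition matches everything
category = Node("category", "auth")
root.children.append(category)
leaf = Node("event_id", 1)
leaf.payloads.append("label: execute")
category.children.append(leaf)

out = []
collect_payloads(root, {"category": "auth", "event_id": 1}, out)
print(out)  # ['label: execute']
```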

Rule Tree Configuration

To further improve the performance, it is possible to prioritize specific nodes of the rule tree, such that broader conditions are placed higher up in the tree while more specific conditions are moved further down. The following JSON gives an example of such a rule tree configuration. This configuration leads to the prioritization of category and message in the rule tree.

```json
{
  "priority_dict": {
    "category": "01",
    "message": "02"
  },
  "tag_map": {
    "check_field_name": "check-tag"
  }
}
```
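Assuming the priority_dict assigns sort keys to field names, ordering a rule's conditions so that prioritized fields end up closer to the root could be sketched as follows (fields without an entry fall back to a low priority; this is an illustrative assumption about the mechanism):

```python
# Sketch: sort a rule's conditions by the priority_dict so that
# prioritized fields come first (i.e. sit closer to the tree root).
priority_dict = {"category": "01", "message": "02"}

conditions = ["event_id: 1", "category: auth", "message: login"]
ordered = sorted(
    conditions,
    # fields not listed in priority_dict get a low default priority
    key=lambda c: priority_dict.get(c.split(":")[0].strip(), "99"),
)
print(ordered)  # ['category: auth', 'message: login', 'event_id: 1']
```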

A path to a rule tree configuration can be set in any processor configuration under the key tree_config.