Pipeline Rules

I ran into a problem where some messages sent to our Graylog instance contained a field larger than the Elasticsearch/Lucene term limit of 32766 bytes (roughly 32 KB), so the whole message failed to index because of that one field. I wish Graylog handled this more intelligently. Plenty of people hit the same problem; search for any part of this error and you’ll find many complaints.

{"type":"illegal_argument_exception","reason":"Document contains at least one immense term in field=\"Field_name\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[...]...', original message: bytes can be at most 32766 in length; got 32773","caused_by":{"type":"max_bytes_length_exceeded_exception","reason":"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 32773"}}

I searched for a while and found no ready-to-use solution. In several forum posts, Graylog support suggested splitting the field, but without specific instructions. So I spent some time and worked out one way to do it with the substring function (described here).

The rule first checks whether the field you want to handle exists, so no CPU cycles are wasted when it is absent. It then keeps the first 32766 characters in the original field and writes anything beyond that into up to three new fields named “_continued_X”, each holding up to 32766 characters. That gives four fields and a maximum of 131064 characters (about 128 KB of single-byte text). Fields smaller than 32 KB are processed too, but their name and content effectively stay the same.
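
To sanity-check the offsets outside Graylog, here is a minimal Python sketch of the same slicing. The helper name `split_field` and the 70000-character test value are made up for illustration. The sketch skips empty pieces, which is an assumption and not necessarily what Graylog does with an empty value.

```python
CHUNK = 32766  # maximum term length in bytes, taken from the error message above


def split_field(name, value, parts=4, chunk=CHUNK):
    """Slice value into up to `parts` pieces of at most `chunk` characters each."""
    fields = {}
    for i in range(parts):
        piece = value[i * chunk:(i + 1) * chunk]
        if not piece:
            break  # nothing left, so no further "_continued_X" field is created
        key = name if i == 0 else f"{name}_continued_{i}"
        fields[key] = piece
    return fields


# Hypothetical 70000-character value: expect two full pieces and a remainder.
fields = split_field("your_field_name", "a" * 70000)
for key, piece in fields.items():
    print(key, len(piece))
```

The slices count characters, while the limit in the error is in UTF-8 bytes. For text with many multi-byte characters, a smaller chunk size is probably the safer choice.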

Before you begin, make sure Graylog parses the field you want to split correctly (i.e. the input and/or extractors are configured appropriately). Then create a new pipeline or add to an existing one.

Create a new pipeline rule based on the following code, and link it to the appropriate streams and stages.

rule "Split_a_field_larger_than_32kb"
when
  // skip messages that do not carry the field at all
  has_field("your_field_name")
then
  let any_var_name = to_string($message.your_field_name);
  // the first 32766 characters stay in the original field
  set_field("your_field_name", substring(any_var_name, 0, 32766));
  set_field("your_field_name_continued_1", substring(any_var_name, 32766, 65532));
  set_field("your_field_name_continued_2", substring(any_var_name, 65532, 98298));
  set_field("your_field_name_continued_3", substring(any_var_name, 98298, 131064));
end

Obviously, modify the rule to suit your environment, primarily the field name. The variable name can be anything.
The number of fields can also be reduced or increased as needed.
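
If you change the number of fields often, typing the offsets by hand gets error-prone. The following Python sketch generates the rule text for any number of parts. The helper `build_rule` and the generated rule name are hypothetical; the output uses only the pipeline functions already mentioned in this post, plus `has_field` for the existence check.

```python
def build_rule(field, parts=4, chunk=32766):
    """Return pipeline rule text that splits `field` into `parts` chunks."""
    lines = [
        f'rule "Split_{field}_into_{parts}_parts"',
        "when",
        f'  has_field("{field}")',
        "then",
        f"  let value = to_string($message.{field});",
    ]
    for i in range(parts):
        # the first chunk keeps the original name, the rest get a suffix
        target = field if i == 0 else f"{field}_continued_{i}"
        start, end = i * chunk, (i + 1) * chunk
        lines.append(f'  set_field("{target}", substring(value, {start}, {end}));')
    lines.append("end")
    return "\n".join(lines)


# Example: a two-field variant of the rule.
print(build_rule("your_field_name", parts=2))
```

Paste the printed text into a new pipeline rule and review it before saving.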

I hope this saves you some time.