Codecs
TL;DR: Codecs describe how to decode data from the wire and how to encode it back into wire format.
Tremor connects to external systems using connectors. Connectors use codecs to transform the data Tremor receives from connected systems into a structured value that forms the payload of each Tremor event.
Codecs are the means of turning the (mostly binary) data from the wire (e.g. from a TCP connection) into structured values for Tremor events, and of turning those values back into binary wire format.
Each connector can be configured with a `codec`.
Usage
If you expect JSON data from a TCP connection, you need to configure the `json` codec.
Example:
```troy
define connector tcp_example from tcp_server
with
  codec = "json",
  config = {
    "url": "localhost:12345"
  }
end;
```
This `tcp_example` connector is configured to expect JSON data from each accepted TCP connection. It expects one JSON document after another, with not a single byte separating them.
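To make "one JSON document after another" concrete, here is a minimal Python sketch of decoding a buffer of back-to-back JSON documents with no separator (not even whitespace) between them. It uses the standard library's `json.JSONDecoder.raw_decode`, which parses one document and reports where it ended. This only illustrates the wire format; it is not how Tremor's `json` codec is implemented, and the function name `decode_concatenated_json` is hypothetical.

```python
import json


def decode_concatenated_json(data: bytes) -> list:
    """Decode back-to-back JSON documents that have no separator between them.

    Conceptual sketch of the wire format only, not Tremor's codec.
    """
    decoder = json.JSONDecoder()
    text = data.decode("utf-8")
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode parses exactly one document from the start of the string
        # and returns the value plus the number of characters it consumed.
        value, consumed = decoder.raw_decode(text[pos:])
        values.append(value)
        pos += consumed
    return values


print(decode_concatenated_json(b'{"a":1}{"b":2}[3]'))  # [{'a': 1}, {'b': 2}, [3]]
```

Because nothing marks where one document ends, the decoder has to parse each document completely to find the start of the next one.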
Codecs and Preprocessors
If you expect line-delimited JSON instead, with one document per line, you need to add a preprocessor that splits the wire data at each newline and feeds each line to the codec.
Preprocessors perform various kinds of preprocessing on the wire data, such as splitting it at a separator or decompressing it. Multiple preprocessors can be configured to operate as a chain. The result of this chain, one or more chunks of binary data, is passed on to the codec.
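The chaining described above can be pictured as a list of functions, each turning one chunk of bytes into zero or more chunks, with every chunk produced by one step fed into the next. The Python sketch below assumes a chain of a gzip decompression step followed by a newline-splitting step similar in spirit to `separate`. The names `gunzip`, `split_lines` and `run_chain` are hypothetical and do not correspond to Tremor's implementation.

```python
import gzip
from typing import Callable, List

# Conceptual model: a preprocessor maps one chunk to zero or more chunks.
Preprocessor = Callable[[bytes], List[bytes]]


def gunzip(chunk: bytes) -> List[bytes]:
    # Decompress one complete gzip member into a single chunk.
    return [gzip.decompress(chunk)]


def split_lines(chunk: bytes) -> List[bytes]:
    # Split at newlines; this sketch drops empty pieces (e.g. after a trailing newline).
    return [line for line in chunk.split(b"\n") if line]


def run_chain(chain: List[Preprocessor], data: bytes) -> List[bytes]:
    chunks = [data]
    for preprocessor in chain:
        # Feed every chunk produced so far through the next preprocessor.
        chunks = [out for chunk in chunks for out in preprocessor(chunk)]
    return chunks


wire = gzip.compress(b'{"a":1}\n{"b":2}\n')
print(run_chain([gunzip, split_lines], wire))  # [b'{"a":1}', b'{"b":2}']
```

Each chunk that comes out of the end of the chain would then be handed to the codec individually.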
Example:
```troy
define connector line_delimited_json_via_tcp from tcp_server
with
  preprocessors = [
    {
      "name": "separate",
      "config": {
        "separator": "\n"
      }
    }
  ],
  codec = "json",
  config = {
    "url": "localhost:65535"
  }
end;
```
This `line_delimited_json_via_tcp` connector is now configured to expect one JSON document per line from each accepted TCP connection, just by adding the `separate` preprocessor.
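TCP delivers data in arbitrarily sized reads, so a single line can straddle two reads. A newline-splitting step therefore has to buffer the incomplete tail until the next newline arrives. The following Python sketch shows this idea combined with JSON decoding of each complete line. It is a conceptual illustration only, not the `separate` preprocessor's actual code, and `line_delimited_json` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def line_delimited_json(chunks: Iterable[bytes]) -> Iterator[object]:
    """Split a byte stream at newlines, then decode each complete line as JSON.

    Conceptual sketch: bytes after the last newline are buffered until
    a later read completes the line.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line:  # skip empty lines
                yield json.loads(line)


# Two documents split awkwardly across three reads.
reads = [b'{"id": 1}\n{"i', b'd": 2', b'}\n']
print(list(line_delimited_json(reads)))  # [{'id': 1}, {'id': 2}]
```

In this sketch, data left in the buffer without a trailing newline when the stream ends is never emitted; a real implementation has to decide how to handle such a final partial line.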
Codecs and Postprocessors
If we want to send out line-delimited JSON where each JSON document is base64-encoded, we need to use postprocessors. Postprocessors perform some action on the binary data a codec produces. For example, they can split or join the data, compress it, or prefix it with its length.
Example:
```troy
define connector my_tcp_client from tcp_client
with
  codec = "json",
  postprocessors = [
    "base64",
    "separate"
  ],
  config = {
    "url": "localhost:9200"
  }
end;
```
This `my_tcp_client` connector is configured to use two postprocessors in a chain. First, each event is encoded using the `json` codec. Then the encoded binary data is base64-encoded by the `base64` postprocessor. Finally, the `separate` postprocessor separates each resulting chunk of base64 data from the next by inserting a line delimiter.
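The encoding path of `my_tcp_client` can be sketched in Python as three steps applied to each event: the JSON codec, then base64, then a newline. This is a conceptual model, not Tremor's code. It assumes compact JSON output and that the newline is appended after every chunk; `encode_event` is a hypothetical name.

```python
import base64
import json


def encode_event(event) -> bytes:
    # 1. Codec: serialize the structured value as compact JSON bytes.
    data = json.dumps(event, separators=(",", ":")).encode("utf-8")
    # 2. First postprocessor: base64-encode the codec output.
    data = base64.b64encode(data)
    # 3. Second postprocessor: delimit the chunk with a newline.
    return data + b"\n"


wire = b"".join(encode_event(e) for e in [{"a": 1}, [1, 2]])
print(wire)  # b'eyJhIjoxfQ==\nWzEsMl0=\n'
```

The order matters: because base64 output never contains a newline byte, applying `separate` after `base64` guarantees that the delimiter cannot appear inside an encoded document.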
Codecs share similar concepts with extractors, but differ in their application. Codecs are applied to external data as it is ingested by or egressed from a running Tremor process. Extractors, on the other hand, are used in scripts to extract structured data from values, e.g. strings, that are already part of a Tremor event.
Data Format
Tremor's internal data representation is JSON-like. The supported value types are:
- String (UTF-8 encoded)
- Numeric (float, integer)
- Boolean
- Null
- Array
- Record (string keys)
- Binary (raw bytes)
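The value model listed above can be approximated with Python's built-in types: `str` for String, `int`/`float` for Numeric, `bool` for Boolean, `None` for Null, `list` for Array, `dict` with `str` keys for Record, and `bytes` for Binary. The recursive check below is a sketch under that mapping (the function name is hypothetical, and it is not part of Tremor). It also shows where the model goes beyond plain JSON: JSON has no Binary type.

```python
def is_tremor_like_value(value) -> bool:
    """Check a Python value against the listed value types (conceptual mapping)."""
    # Null, Boolean, String and Binary are leaves.
    if value is None or isinstance(value, (bool, str, bytes)):
        return True
    # Numeric: integers and floats.
    if isinstance(value, (int, float)):
        return True
    # Array: every element must itself be a valid value.
    if isinstance(value, list):
        return all(is_tremor_like_value(v) for v in value)
    # Record: string keys only, valid values.
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_tremor_like_value(v) for k, v in value.items())
    return False


print(is_tremor_like_value({"ok": [1, 2.5, True, None, b"\x00"]}))  # True
print(is_tremor_like_value({1: "non-string key"}))                 # False
```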
Supported Codecs
Codec Name | Description |
---|---|
binary | Raw binary data in network byte order |
binflux | An efficient binary representation of InfluxDB line protocol data |
csv | The CSV format as per RFC 4180, constrained to a single line |
dogstatsd | The DogStatsD protocol |
influx | The InfluxDB line protocol |
json | The JSON format |
json-sorted | The JSON format, with record keys sorted when encoding |
msgpack | The MessagePack binary format |
null | A drop-only codec |
statsd | The StatsD format |
string | UTF-8 string format |
syslog | The syslog format - IETF and BSD styles |
yaml | The YAML format |
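Selecting a codec by name, as the `codec = "json"` setting does, can be pictured as a lookup in a table of decode/encode pairs. The Python sketch below models three entries from the table. It assumes that `json-sorted` differs from `json` only by sorting record keys on encoding, and that `string` maps UTF-8 bytes to a string value and back. `Codec` and `CODECS` are hypothetical names, not Tremor's API.

```python
import json
from typing import Callable, Dict, NamedTuple


class Codec(NamedTuple):
    decode: Callable[[bytes], object]  # wire bytes -> structured value
    encode: Callable[[object], bytes]  # structured value -> wire bytes


def _json_encode(value, sort_keys: bool = False) -> bytes:
    # Compact JSON; key order follows insertion order unless sort_keys is set.
    return json.dumps(value, separators=(",", ":"), sort_keys=sort_keys).encode("utf-8")


CODECS: Dict[str, Codec] = {
    "json": Codec(json.loads, _json_encode),
    "json-sorted": Codec(json.loads, lambda v: _json_encode(v, sort_keys=True)),
    # This sketch's string codec only encodes str values.
    "string": Codec(lambda b: b.decode("utf-8"), lambda s: s.encode("utf-8")),
}

print(CODECS["json"].encode({"b": 1, "a": 2}))         # b'{"b":1,"a":2}'
print(CODECS["json-sorted"].encode({"b": 1, "a": 2}))  # b'{"a":2,"b":1}'
```

Sorted keys make the encoded output deterministic regardless of record insertion order, which can help when comparing or deduplicating encoded events.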