Delta Tables Schema - Text Lines
See the Delta Egress Sink page to learn more about sending data to Delta Tables in general.
When using this schema, a single table is created directly at the Directory Path
specified in the Egress Route configuration.
The incoming message's payload is expected to be in a textual row-based format, such as ndjson, jsonlines, or CSV.
Delta egress is agnostic to the format of the individual rows; it treats each line, delimited by the LF (\n) character, as a separate row.
Each row is mapped to a single Delta Table row. Therefore, a single message is mapped to multiple Delta Table rows if it contains multiple lines.
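The line-to-row mapping described above can be sketched in Python. This is only an illustration, not the platform's implementation: the helper name `split_payload` is hypothetical, and two behaviors are assumptions that this page does not state, namely that line numbers start at 1 and that blank lines produce no row. The trimming of leading and trailing whitespace follows the payload_line column description.

```python
from typing import List, Tuple


def split_payload(payload: str, first_line_number: int = 1) -> List[Tuple[int, str]]:
    """Split a text payload on LF and return (line_number, trimmed_line) pairs.

    Hypothetical helper: 1-based numbering and skipping of blank lines
    are assumptions made for this sketch.
    """
    rows = []
    for offset, raw_line in enumerate(payload.split("\n")):
        # Trim leading and trailing whitespace (this also drops a CR from CRLF input).
        line = raw_line.strip()
        if not line:
            continue  # assumption: a blank line does not become a table row
        rows.append((first_line_number + offset, line))
    return rows


# One message with two ndjson lines yields two rows.
payload = '{"property1": 42}\n{"property1": 43}\n'
for line_number, line in split_payload(payload):
    print(line_number, line)
```

Each returned pair corresponds to the payload_line_number and payload_line values of one output row.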
Schema
Output tables always contain the following columns:
Name | Type | Example | Description |
---|---|---|---|
payload_line | string | {"property1": 42} | Line from message payload. Leading and trailing whitespace is trimmed. |
payload_line_number | long | 42 | Number of the line in the message that the payload was parsed from. |
kind | string | Message | Identifier that distinguishes between different kinds of events. Currently, this is always set to Message because no other event kinds are supported. |
stream_group_name | string | group-a | Name of the stream group the message was sent into. |
stream_name | string | telemetry | Name of the stream the message was sent into. |
device_id | string | robot-125 | Id of the device that sent the message. |
batch_id | string | 2023-12-19 | Identifier of the batch. It is provided by the device or auto-filled by the platform (if configured). |
batch_slice_id | string | logs | Identifier of the batch slice (if it was provided by the device). |
message_id | string | m00767 | Identifier of the message. It is provided by the device or auto-filled by the platform (if configured). |
workspace_id | string | 69f09b3f-ec0d-4b9e-a5ec-87150b935296 | Identifier of the Workspace that the originating Device and Stream belong to. Formatted as a GUID/UUID with 32 lowercase hexadecimal digits separated by hyphens. |
ingress_enqueued_date_time | timestamp | 2023-12-19T11:25:56.1408925+01:00 | Time when the Message was ingested by the platform. ISO 8601 format. |
ingress_enqueued_date | date | 2023-12-05 | The UTC date generated from ingress_enqueued_date_time. |
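
The relationship between ingress_enqueued_date_time and ingress_enqueued_date can be illustrated with a short Python sketch. The helper name `utc_partition_date` is hypothetical, and the sketch only shows one way to derive a UTC date from an ISO 8601 timestamp with an offset; it is not the platform's own code. Fractional seconds are cut to six digits because Python's `datetime` stores microseconds only.

```python
import re
from datetime import date, datetime, timezone


def utc_partition_date(enqueued: str) -> date:
    """Return the UTC calendar date of an ISO 8601 timestamp with an offset.

    Hypothetical helper for illustration only.
    """
    # Keep at most 6 fractional digits, since datetime has microsecond precision.
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", enqueued)
    # Parse the offset-aware timestamp, convert it to UTC, and take the date part.
    return datetime.fromisoformat(trimmed).astimezone(timezone.utc).date()


print(utc_partition_date("2023-12-19T11:25:56.1408925+01:00"))  # 2023-12-19
# Shortly after local midnight at +01:00, the UTC date is still the previous day.
print(utc_partition_date("2023-12-19T00:30:00+01:00"))  # 2023-12-18
```

The second call shows why the date column can differ from the local date that appears in the timestamp string.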
Spark SQL - interpretable schema
payload_line STRING,
payload_line_number LONG,
kind STRING,
stream_group_name STRING,
stream_name STRING,
site_id STRING,
device_id STRING,
batch_id STRING,
batch_slice_id STRING,
message_id STRING,
workspace_id STRING,
ingress_enqueued_date_time TIMESTAMP,
ingress_enqueued_date DATE
Partition key columns
Tables are partitioned by the ingress_enqueued_date column.
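
Because the table is partitioned by date, queries that filter on ingress_enqueued_date generally let the query engine skip partitions outside the requested range. The following Python sketch builds such a filter as a Spark SQL predicate string. The helper name `partition_predicate` is hypothetical, and the inclusive date range is just one possible query shape.

```python
from datetime import date

# Partition key column named in this section.
PARTITION_COLUMN = "ingress_enqueued_date"


def partition_predicate(start: date, end: date) -> str:
    """Build a Spark SQL predicate for an inclusive range of UTC dates.

    Hypothetical helper for illustration only.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return (
        f"{PARTITION_COLUMN} BETWEEN DATE '{start.isoformat()}' "
        f"AND DATE '{end.isoformat()}'"
    )


predicate = partition_predicate(date(2023, 12, 1), date(2023, 12, 19))
print(predicate)
# ingress_enqueued_date BETWEEN DATE '2023-12-01' AND DATE '2023-12-19'
```

With PySpark, a string like this could be passed to `DataFrame.where(...)` on a DataFrame loaded with `spark.read.format("delta").load(...)` from the Directory Path configured in the Egress Route.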