Consume Data in Your Systems
Data sent from devices to the platform is available in two forms:
Stream Storage - built-in storage where data from all streams is stored automatically. It accepts data of any shape and size and keeps it available for further processing at any time. No configuration is needed.
Egress Sinks - each stream can also be routed to one or more Egress Sinks, representing various external locations such as Azure Service Bus, AWS S3, or an OpenTelemetry backend. Each egress sink can be configured based on specific user needs.
See Send Data From Devices for more information on sending data.
When to use Stream Storage?
Stream Storage is a simple store with only a few basic features. However, it is very reliable, scales seamlessly to petabytes of data, and offers a bandwidth of several hundred Gbps. It is agnostic to the format of the incoming data, which is stored as binary objects under a predefined location derived from the stream group name, stream name, device ID, and other metadata.
Stream storage can store raw messages as they arrive, and it can also batch them into larger objects, which might greatly improve the performance of downstream processing.
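To illustrate why batching helps downstream processing, here is a minimal sketch of grouping small messages into larger newline-delimited objects. It is not the platform's implementation: the function name `batch_messages`, the `max_batch_bytes` parameter, and the newline-delimited layout are all assumptions made for this example.

```python
from typing import Iterable, Iterator, List


def batch_messages(messages: Iterable[bytes], max_batch_bytes: int) -> Iterator[bytes]:
    """Group raw messages into newline-delimited objects of bounded size.

    Hypothetical helper: the real batching rules and object layout of the
    platform are not described in this document.
    """
    current: List[bytes] = []
    current_size = 0
    for message in messages:
        # +1 accounts for the newline appended after each message.
        needed = len(message) + 1
        # Flush the current batch if adding this message would exceed the limit.
        if current and current_size + needed > max_batch_bytes:
            yield b"\n".join(current) + b"\n"
            current, current_size = [], 0
        current.append(message)
        current_size += needed
    # Emit whatever is left as the final, possibly smaller, batch.
    if current:
        yield b"\n".join(current) + b"\n"


raw = [b'{"t":1}', b'{"t":2}', b'{"t":3}']
batches = list(batch_messages(raw, max_batch_bytes=16))
```

With a 16-byte limit, the three 7-byte messages end up in two objects instead of three, so a downstream job has fewer objects to list and open.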
Whether batched or not, messages in the stream storage from a single device and stream are always deduplicated based on the batch ID and message ID.
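The deduplication rule can be pictured with a short sketch. It assumes the messages passed in already belong to a single device and stream, and it keeps the first message seen for each (batch ID, message ID) pair. The `Message` type and the `deduplicate` function are hypothetical names used only for illustration; the platform's actual mechanism is not described here.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Message(NamedTuple):
    # Hypothetical message shape for a single device and stream.
    batch_id: str
    message_id: str
    payload: bytes


def deduplicate(messages: Iterable[Message]) -> List[Message]:
    """Keep the first message seen for each (batch_id, message_id) pair."""
    seen: Dict[Tuple[str, str], Message] = {}
    for message in messages:
        key = (message.batch_id, message.message_id)
        # setdefault leaves an existing entry untouched, so repeats are dropped.
        seen.setdefault(key, message)
    # Dicts preserve insertion order, so arrival order is kept.
    return list(seen.values())


received = [
    Message("batch-1", "msg-1", b"21.5"),
    Message("batch-1", "msg-1", b"21.5"),  # retransmission of the same message
    Message("batch-1", "msg-2", b"21.7"),
]
unique = deduplicate(received)
```

In this sketch a device that resends a message after a network error does not produce a second copy, because the repeated pair of IDs is recognized.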
Many use cases can be covered by stream storage alone. The following are a particularly good fit:
- Batch processing (e.g. calculating daily statistics, machine learning).
- Data analytics.
- Backend for data-intensive apps.
- Hosting of large files.
- Long-term data backup.
When to use Egress Sinks?
Unlike stream storage, egress sinks do not represent one specific storage or technology; they are abstractions over many external storages, message brokers, APIs, and more. The currently supported egress sink kinds are:
- Azure Event Hub
- Azure Service Bus
- SQL Database
- OpenTelemetry Endpoint
- Databricks Delta Table
  - Includes first-class support for OpenTelemetry data.
- Amazon S3
- Grafana
  - The platform also provides hosted Grafana to visualise your time-series data.
  - It is a special Egress Sink that is available for every Workspace out of the box.
In contrast with stream storage, egress sinks:
- Do not provide any long-term data retention directly within the platform. The platform just routes the data to the desired location as soon as possible.
- Are more flexible. Each stream can be routed to one or more egress sinks, and the routing can be customized based on the options provided by the specific egress sink kind.
- Do not deduplicate the data by default. However, some egress sinks can be configured to deduplicate the data (e.g. using primary keys in SQL Database).
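As an illustration of deduplication through primary keys, the sketch below uses an in-memory SQLite database as a stand-in for the target SQL Database. The table name, the column names, and the `store` helper are assumptions for this example, and `INSERT OR IGNORE` is SQLite syntax; other database engines use different constructs for the same idea. How the egress sink itself is configured is not shown.

```python
import sqlite3

# In-memory SQLite database standing in for the external SQL Database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurements (
        device_id  TEXT NOT NULL,
        message_id TEXT NOT NULL,
        payload    TEXT NOT NULL,
        PRIMARY KEY (device_id, message_id)
    )
    """
)


def store(device_id: str, message_id: str, payload: str) -> None:
    """Insert a row; a repeated primary key is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO measurements (device_id, message_id, payload) "
        "VALUES (?, ?, ?)",
        (device_id, message_id, payload),
    )


store("device-1", "msg-1", '{"temperature": 21.5}')
store("device-1", "msg-1", '{"temperature": 21.5}')  # delivered twice
store("device-1", "msg-2", '{"temperature": 21.7}')

row_count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
```

Because the primary key covers the device ID and message ID, the second delivery of `msg-1` does not create a duplicate row, and `row_count` is 2.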
The following use cases are a good fit for egress sinks:
- Stream processing (e.g. real-time anomaly detection).
- Exporting data coming from devices into systems outside of the platform.
- Real-time data visualization.
- Event-based apps.