Metrics

Comprehensive description of Spotflow metrics including collection model, configuration, and dashboard usage.

Spotflow provides an end-to-end metrics solution designed specifically for embedded devices.

This page walks you through key aspects of Spotflow metrics, from enabling collection on devices to understanding how metrics are aggregated and visualized in dashboards.

Getting Started with Device Integration

Guide: Gathering Metrics with Zephyr or Nordic nRF Connect SDK

Spotflow offers native metrics integration for devices running Zephyr and Nordic nRF Connect SDK through a lightweight software module. The module can collect system metrics automatically and send them to Spotflow without additional instrumentation.

Transport Protocol

Spotflow metrics are transported over MQTT secured with TLS, running on top of TCP. Payloads are serialized in one of two formats: CBOR or JSON.

MQTT over TLS

All metric transmission uses MQTT over TLS (MQTTS) for secure, reliable communication:

- TLS Version: 1.2 or higher
- Certificate: Let's Encrypt ISRG Root X1 (pre-installed on many systems or included in Spotflow SDKs)
- Authentication: Devices authenticate using ingest keys as MQTT passwords and their unique device IDs as MQTT usernames.
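
The connection requirements above can be sketched in Python using only the standard library. This is a minimal illustration, not the Spotflow SDK: the helper names `make_tls_context` and `mqtt_credentials` are hypothetical, the device ID and ingest key are placeholders, and the broker host and port are deliberately left out because this page does not state them.

```python
import ssl

def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    # The default context verifies the server certificate against the
    # system trust store, which commonly includes ISRG Root X1.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

def mqtt_credentials(device_id: str, ingest_key: str) -> dict:
    """Map the device identity onto the MQTT username/password fields."""
    if not device_id or not ingest_key:
        raise ValueError("device_id and ingest_key must be non-empty")
    return {"username": device_id, "password": ingest_key}

ctx = make_tls_context()
# Placeholder values, not real credentials.
creds = mqtt_credentials("my-device-001", "example-ingest-key")
```

The resulting context and credentials would then be handed to whichever MQTT client library the integration uses.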

Serialization format

There are two protocol flavors for metric ingestion:

- CBOR-based: recommended for memory and bandwidth efficiency. Used by Spotflow device modules.
- JSON-based: recommended for simplicity and interoperability, especially for custom integrations over MQTT.

When building a custom MQTT integration, JSON metric payloads can be used even if your device-side SDK uses CBOR by default.

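Publish JSON metric messages to the ingest-json topic. The annotated example below shows the message fields; the // comments are explanatory only and must be omitted from real payloads, because JSON has no comment syntax: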

{
    "messageType": "METRIC",
    "metricName": "cpu_utilization_percent",

    // Optional; one of PT0S, PT1M, PT1H, P1D. Present for aggregated metrics
    "aggregationInterval": "PT1M",

    // Optional labels for dimensional metrics
    "labels": {
        "interface": "wlan0"
    },

    // Device uptime when metric sample/window was produced
    "deviceUptimeMs": 123456,

    // Sequence number within a metric stream
    "sequenceNumber": 42,

    // For aggregated metrics: sum over the window
    // For PT0S/no aggregation: raw sample value
    "sum": 318.7,

    // Optional overflow marker for integer sums
    "sumTruncated": false,

    // Present for aggregated metrics
    "count": 30,
    "min": 5.2,
    "max": 18.1
}
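
As a rough illustration of how a custom integration might assemble such a message, the sketch below builds the payload in Python. The function name `build_metric_json` and its parameters are hypothetical; only the field names and interval values come from the example above. Publishing the string to the ingest-json topic is left to the MQTT client in use.

```python
import json

ALLOWED_INTERVALS = {"PT0S", "PT1M", "PT1H", "P1D"}

def build_metric_json(name, total, uptime_ms, seq, interval=None,
                      labels=None, count=None, minimum=None, maximum=None):
    """Assemble a metric message using the field names shown above."""
    if interval is not None and interval not in ALLOWED_INTERVALS:
        raise ValueError(f"unsupported aggregation interval: {interval}")
    msg = {
        "messageType": "METRIC",
        "metricName": name,
        "deviceUptimeMs": uptime_ms,
        "sequenceNumber": seq,
        "sum": total,  # raw sample value when there is no aggregation
    }
    if interval is not None:
        msg["aggregationInterval"] = interval
    if labels:
        msg["labels"] = labels
    # count/min/max are only added for aggregated windows
    if count is not None:
        msg.update({"count": count, "min": minimum, "max": maximum})
    return json.dumps(msg)

payload = build_metric_json(
    "cpu_utilization_percent", 318.7, 123456, 42,
    interval="PT1M", labels={"interface": "wlan0"},
    count=30, minimum=5.2, maximum=18.1,
)
raw_sample = build_metric_json("heap_free_bytes", 2048, 1000, 1)
```

Because the output comes from `json.dumps`, it contains no comments and is valid JSON.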

Publish metric messages to the ingest-cbor topic using the following CDDL schema:

metric-message = {
    0 => 5,                      ; messageType: metric
    21 => tstr,                  ; metricName
    ? 22 => agg-interval,        ; aggregationInterval (0/1/3/4)
    ? 5 => labels,               ; labels map
    ? 6 => int,                  ; deviceUptimeMs
    ? 13 => uint,                ; sequenceNumber
    24 => metric-value,          ; sum (or raw value for no aggregation)
    ? 25 => bool,                ; sumTruncated (optional)
    ? 26 => uint,                ; count (for aggregated metrics)
    ? 27 => metric-value,        ; min (for aggregated metrics)
    ? 28 => metric-value         ; max (for aggregated metrics)
}

labels = {* (tstr => tstr)}
metric-value = int / float

; Aggregation interval enum values used by device module
agg-none = 0                    ; PT0S
agg-1min = 1                    ; PT1M
agg-1hour = 3                   ; PT1H
agg-1day = 4                    ; P1D
agg-interval = (agg-none / agg-1min / agg-1hour / agg-1day)
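
To make the integer-keyed layout concrete, here is a minimal Python sketch that encodes a metric message as CBOR using only the standard library. It is an illustration under assumptions, not the device module's encoder: `encode_cbor` and `build_metric_cbor` are hypothetical names, only the handful of CBOR types needed by the schema are handled, and floats are always written as 64-bit values. A real integration would more likely rely on an established CBOR library.

```python
import struct

# Integer keys and enum values taken from the CDDL above.
AGG_INTERVAL = {"PT0S": 0, "PT1M": 1, "PT1H": 3, "P1D": 4}

def _head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus its unsigned argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for CBOR")

def encode_cbor(value) -> bytes:
    """Encode only the types the metric message needs."""
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # 64-bit float
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, dict):
        body = b"".join(encode_cbor(k) + encode_cbor(v) for k, v in value.items())
        return _head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")

def build_metric_cbor(name, total, interval="PT0S",
                      count=None, minimum=None, maximum=None):
    """Build the integer-keyed map from the CDDL and encode it."""
    msg = {0: 5, 21: name, 22: AGG_INTERVAL[interval], 24: total}
    if count is not None:
        msg.update({26: count, 27: minimum, 28: maximum})
    return encode_cbor(msg)

frame = build_metric_cbor("cpu_utilization_percent", 318.7, "PT1M", 30, 5.2, 18.1)
```

The integer keys are what make this flavor compact: each field name costs one or two bytes instead of a full string.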

Depending on the chosen aggregation interval, metrics can be aggregated on the device before upload. This reduces message frequency, while each message still carries the sum, count, minimum, and maximum for its time window.

Custom metrics

Custom application metrics support is coming soon.

Aggregation Model

Metrics can be aggregated before transmission to limit bandwidth usage.

In practice, aggregation is useful when devices report frequently but you want stable operational trends instead of every raw sample. For example, one-hour aggregation is often enough for fleet health monitoring, while longer windows can be useful for low-bandwidth deployments. No aggregation is typically best for event-like metrics or short-lived diagnostics where every sample matters.

Supported aggregation windows are:
- PT0S: no aggregation
- PT1M: 1 minute
- PT1H: 1 hour
- P1D: 1 day

In practice, choose the aggregation window by weighing metric granularity against uplink volume: shorter windows preserve more detail, while longer windows send fewer messages.
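
The aggregated fields used in the message formats (sum, count, min, max) can be derived from raw samples as in the following Python sketch. The function name `aggregate_window` is hypothetical, and the device module's actual implementation may differ.

```python
def aggregate_window(samples):
    """Collapse the raw samples of one window into sum/count/min/max."""
    if not samples:
        raise ValueError("window contains no samples")
    return {
        "sum": sum(samples),
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
    }

window = aggregate_window([5.0, 10.0, 15.0])
# The average is not transmitted but can be recovered from sum and count.
mean = window["sum"] / window["count"]
```

Sending the sum and count rather than a precomputed average also lets several windows be merged later without losing accuracy.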

System Metrics

System metrics are primarily useful for:

- Operational health monitoring: track CPU, heap, and stack pressure before failures happen.
- Connectivity diagnostics: detect unstable network behavior from TX/RX trends and connection state transitions.
- Stability and reliability analysis: correlate resets with firmware versions, deployment waves, or environmental conditions.
- Capacity planning: understand how close devices run to resource limits over time.

Together, these metrics provide a baseline observability layer even when application-specific metrics are not yet instrumented.

Spotflow supports the following system metrics:

- Heap Free Bytes (heap_free_bytes)
- Heap Allocated Bytes (heap_allocated_bytes)
- CPU Utilization Percent (cpu_utilization_percent)
- Thread Stack Free Bytes (thread_stack_free_bytes)
- Thread Stack Used Percent (thread_stack_used_percent)
- Network TX Bytes (network_tx_bytes)
- Network RX Bytes (network_rx_bytes)
- MQTT Connection State (connection_mqtt_connected)
- Boot Reset Cause (boot_reset)

Connection state and reset cause are event-based metrics: they are reported when the event occurs (a connection state change or a boot), not in periodic sampling windows.
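
The difference between event-based and periodic reporting can be illustrated with a small Python sketch that emits a sample only when the observed state changes. The class `StateChangeReporter` is hypothetical, and encoding the connection state as 1 or 0 is an assumption made for this example, not a documented value format.

```python
class StateChangeReporter:
    """Emit a sample only when the observed state differs from the last one."""

    def __init__(self, metric_name):
        self.metric_name = metric_name
        self._last = None

    def observe(self, connected: bool):
        if connected == self._last:
            return None  # no change: nothing to report
        self._last = connected
        # Encoding the state as 1/0 is an assumption for this sketch.
        return {"metricName": self.metric_name, "sum": 1 if connected else 0}

reporter = StateChangeReporter("connection_mqtt_connected")
observations = [True, True, False, False, True]
events = [e for e in map(reporter.observe, observations) if e is not None]
```

Five observations produce only three messages here, one per transition, which is why event-based metrics do not need an aggregation window.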

Uptime Metric (Heartbeat)

In fleet views, uptime trends can help distinguish isolated device instability from broader rollout or infrastructure issues.

Uptime is reported as a dedicated heartbeat metric:

Uptime Milliseconds (uptime_ms)

This metric is used in the Device Uptime section of Device Vitals.

Heartbeat is useful as a lightweight liveness signal:

- It confirms that the device is still active and reporting.
- It helps detect silent outages where no logs or other telemetry are produced.
- It provides context for resets by showing uptime progression between reboot events.
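
On the consuming side, these uses of the heartbeat can be sketched in Python as follows. Both helpers are hypothetical illustrations rather than Spotflow features: `find_resets` treats any decrease in consecutive uptime_ms values as a reboot, and `is_silent` flags a device whose heartbeat is overdue, using an assumed reporting period and tolerance.

```python
def find_resets(uptimes_ms):
    """Return indices where uptime went backwards, which implies a reboot."""
    return [i for i in range(1, len(uptimes_ms))
            if uptimes_ms[i] < uptimes_ms[i - 1]]

def is_silent(last_seen_s, now_s, expected_period_s, tolerance=2.0):
    """Flag a device whose heartbeat is overdue by more than `tolerance` periods."""
    return (now_s - last_seen_s) > expected_period_s * tolerance

# Uptime grows steadily, then drops at index 3: the device rebooted there.
resets = find_resets([60_000, 120_000, 180_000, 15_000, 75_000])
overdue = is_silent(last_seen_s=0, now_s=130, expected_period_s=60)
```

The tolerance factor avoids flagging a device as silent after a single delayed heartbeat.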

Dashboards

Spotflow dashboards are the primary place to analyze metric trends across the fleet and on individual devices.

Figure: The Connectivity & Traffic section of the Device Dashboard.

Device and fleet charts are computed from ingested metrics, including automatically collected system metrics from the device module.


Learn more

Guide: Metrics with Zephyr

Guide: Metrics with MQTT

Fundamentals: Dashboards
