Crash reports with Zephyr

Send crash reports and collect core dumps from Zephyr devices in the field and analyze them with Spotflow.

This guide explains how to send crash reports and core dumps from devices running Zephyr RTOS using the Spotflow device module, and how to analyze them in the Spotflow web application.

Install Spotflow Device Module

As the very first step, check one of the following basic integration guides:

Enable Core Dump collection via Kconfig

Enable Spotflow device module to collect core dumps and configure Zephyr's core dump subsystem to include all available crash information or only selected parts:

prj.conf
# Enable Spotflow coredump collection
CONFIG_SPOTFLOW=y
CONFIG_SPOTFLOW_COREDUMPS=y

# Configure what should be included in the coredump
CONFIG_DEBUG_COREDUMP_THREADS_METADATA=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y

The Spotflow-specific options are described in the Kconfig options section later in this guide. For details about the Zephyr options, such as DEBUG_COREDUMP_MEMORY_DUMP_* or DEBUG_COREDUMP_THREAD*, see the Zephyr documentation.

Define core dump flash partition

When a crash occurs, the core dump file must be stored somewhere so that it can be uploaded to Spotflow after the reboot. The Spotflow device module relies on the ability of Zephyr's core dump subsystem to store the file in a flash partition.

boards/nrf7002dk_nrf5340_cpuapp_ns.overlay
&flash0 {
  /* Although partitions are defined using Partition Manager, Zephyr core dumps require
   * the core dump partition to be defined here as well.
   */
  partitions {
    /* Reserve last 64 KiB of flash for core dumps */
    coredump_partition: partition@f0000 {
      label = "coredump-partition";
      reg = <0x000f0000 DT_SIZE_K(64)>;
    };
  };
};
This step might vary across target boards. See our fully working samples on GitHub.
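As a sanity check on the overlay values, the following host-side C sketch verifies that the partition ends exactly at the end of flash. The 1 MiB flash size is an assumption inferred from the "last 64 KiB" comment in the overlay, so adjust it for your board. SIZE_K and partition_is_flash_tail are illustrative names and not part of Zephyr or Spotflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as Zephyr's DT_SIZE_K(x): x KiB expressed in bytes. */
#define SIZE_K(x) ((uint32_t)(x) * 1024u)

/* Values copied from the overlay above. */
#define COREDUMP_OFFSET 0x000f0000u
#define COREDUMP_SIZE   SIZE_K(64)

/* Assumption: the target has 1 MiB of flash. Adjust for your board. */
#define FLASH_SIZE      SIZE_K(1024)

/* Compile-time check: the partition must end where the flash ends. */
static_assert(COREDUMP_OFFSET + COREDUMP_SIZE == FLASH_SIZE,
              "core dump partition does not end at the end of flash");

/* Run-time variant: true when [offset, offset + size) is the tail of flash. */
static bool partition_is_flash_tail(uint32_t offset, uint32_t size,
                                    uint32_t flash_size)
{
    return size != 0u && offset < flash_size && flash_size - offset == size;
}
```

With these values, 0xf0000 + 0x10000 equals 0x100000, which is exactly 1 MiB. If the sum differs on your board, either the offset or the size in the overlay needs to change.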

To unlock advanced crash analysis features, you can upload the ELF file containing debug symbols to Spotflow.

Spotflow automatically links each core dump to the matching ELF file based on the build ID, which Spotflow embeds automatically into both the ELF file and the core dump file.

The symbols will be used to decode stack traces and variable names in the Spotflow web application.

You can upload symbol files to firmwares to enable automatic analysis.

See the dedicated Firmware Management page for information about managing firmwares and symbol files and about obtaining additional information about your fleet.

Wait for (or simulate) a crash

Zephyr automatically creates the core dump file when a fatal error occurs. By default, the Spotflow device module then reboots the device and, immediately after the reboot, sends the core dump file to the cloud for analysis.

To simulate a crash, you can use, for example, Zephyr's k_oops function, which fatally terminates the current thread:

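#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>

/* Assumes a log module is registered in this file (LOG_MODULE_REGISTER). */
static void simulate_crash(void)
{
  LOG_INF("Simulating crash. Going to oops.");
  k_oops(); /* raises a fatal error in the current thread */
}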

Analyze Crash Reports in the Web Application

Once your device is integrated, you can analyze crash reports and core dumps in the web application.

Each sent core dump automatically generates a crash report.

The main entry point is the Events page, which gives you a comprehensive view of all events collected from your devices. There, you can filter crash reports by their content, device ID and other metadata. Diagnose the crash by analyzing:

  • Stack traces of one or more threads.
  • Register values and local variables for individual stack frames. If a register value is available, it can also be cast to several types.
  • State of global variables at the time of the crash.

How the Device Module Works

The Spotflow device module integrates with Zephyr's native core dump subsystem to provide a familiar development experience:

  • When the Spotflow device module is configured to collect core dumps, storing core dump files in the flash partition is automatically enabled via Zephyr's DEBUG_COREDUMP_BACKEND_FLASH_PARTITION Kconfig option.
  • The core dump file is automatically created on the flash partition when a fatal error occurs.
  • After each reboot, the Spotflow device module checks whether a core dump file is present on the flash partition. If it finds one, it uploads it to Spotflow.
  • After successful upload, the core dump file is erased from the flash partition.
  • Only Zephyr-native APIs are used.
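The store, upload, and erase cycle described above can be modeled on a host machine. The sketch below is not the Spotflow implementation: dump_store, upload_fn, store_dump, process_dump_on_boot, and the two upload stubs are hypothetical names, and an in-memory buffer stands in for the flash partition. Keeping the dump after a failed upload is an assumption based on the statement that the file is erased after a successful upload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the core dump flash partition. */
struct dump_store {
    unsigned char data[256];
    size_t len; /* 0 means that no core dump is stored */
};

/* Hypothetical upload callback; returns true when the upload succeeded. */
typedef bool (*upload_fn)(const unsigned char *data, size_t len);

enum boot_result { BOOT_NO_DUMP, BOOT_UPLOADED, BOOT_UPLOAD_FAILED };

/* On a fatal error: persist the dump so that it survives the reboot. */
static void store_dump(struct dump_store *s, const unsigned char *dump,
                       size_t len)
{
    assert(len <= sizeof(s->data));
    memcpy(s->data, dump, len);
    s->len = len;
}

/* After each reboot: upload a stored dump and erase it only on success. */
static enum boot_result process_dump_on_boot(struct dump_store *s,
                                             upload_fn upload)
{
    if (s->len == 0) {
        return BOOT_NO_DUMP;
    }
    if (!upload(s->data, s->len)) {
        return BOOT_UPLOAD_FAILED; /* assumption: keep the dump for a retry */
    }
    memset(s->data, 0xff, sizeof(s->data)); /* erased flash reads as 0xff */
    s->len = 0;
    return BOOT_UPLOADED;
}

/* Upload stubs that simulate a working and a broken connection. */
static bool upload_ok(const unsigned char *data, size_t len)
{
    (void)data;
    (void)len;
    return true;
}

static bool upload_fail(const unsigned char *data, size_t len)
{
    (void)data;
    (void)len;
    return false;
}
```

Calling process_dump_on_boot with upload_fail leaves the stored dump untouched, while a later call with upload_ok erases it, so a subsequent boot reports BOOT_NO_DUMP.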

Automatic restart

By default, Zephyr RTOS halts the system unconditionally after a fatal error (e.g. by entering an infinite loop, depending on the architecture).

This behavior is not suitable for production use and can be overridden by providing a custom implementation of the fatal error policy handler (the k_sys_fatal_error_handler function).

By default, the Spotflow device module overrides Zephyr's standard k_sys_fatal_error_handler implementation so that the device automatically reboots after a fatal error, which allows the core dump file to be uploaded.

To customize this behavior, disable Spotflow's handler via the SPOTFLOW_USE_DEFAULT_REBOOT_HANDLER Kconfig option and provide your own implementation of the k_sys_fatal_error_handler function.


ELF files, debugging symbols and build ID

To provide the best possible crash analysis experience, debugging symbols associated with the code running on devices are needed. Spotflow allows users to upload and manage Zephyr ELF files containing the debugging symbols (see Firmware Management for details).

To link an ELF file containing debugging symbols with an uploaded core dump, Spotflow uses the build ID. The build ID is a GNU Build ID-style identifier that uniquely identifies an ELF file by its relevant parts: the ELF sections that influence runtime behavior, such as code or global data, are hashed with SHA-1, while sections such as debugging symbols are excluded.

The build ID is embedded into the Zephyr ELF file automatically by the Spotflow device module during build. More specifically, it is stored in the ELF file as a Zephyr custom Binary Descriptor with ID 0x5f0 and length of 20 bytes. For Zephyr targets that do not support Binary Descriptors, the build ID is stored as a bindesc_entry_spotflow_build_id symbol in the ELF file.

If the build ID is not available, ELF files can still be linked manually to specific core dumps.
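To illustrate what a 20-byte build ID looks like and how two IDs can be compared, here is a minimal host-side C sketch. It is not Spotflow's matching code: build_id_to_hex and build_id_equal are hypothetical helpers, and the only facts taken from this guide are the 20-byte length and the SHA-1 origin of the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUILD_ID_LEN 20 /* SHA-1 digest size in bytes */

/* Renders a build ID as 40 lowercase hex characters plus a terminating NUL. */
static void build_id_to_hex(const uint8_t id[BUILD_ID_LEN],
                            char out[2 * BUILD_ID_LEN + 1])
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < BUILD_ID_LEN; i++) {
        out[2 * i] = digits[id[i] >> 4];
        out[2 * i + 1] = digits[id[i] & 0x0f];
    }
    out[2 * BUILD_ID_LEN] = '\0';
}

/* Two artifacts belong to the same build when their IDs are byte-identical. */
static bool build_id_equal(const uint8_t a[BUILD_ID_LEN],
                           const uint8_t b[BUILD_ID_LEN])
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, BUILD_ID_LEN) == 0;
}
```

Because the ID is a hash over the sections that influence runtime behavior, a single differing byte means that the ELF file and the core dump come from different builds.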

Core dumps vs. logging

Spotflow core dumps and logging are complementary features that work side by side. After a reboot, the core dump file upload takes priority over the logs.

Kconfig options

prj.conf
CONFIG_SPOTFLOW_COREDUMPS=n

Set the SPOTFLOW_COREDUMPS option to y to enable the Spotflow device module to collect core dumps. This also enables automatic restarts (via the SPOTFLOW_USE_DEFAULT_REBOOT_HANDLER option).

prj.conf
CONFIG_SPOTFLOW_COREDUMPS_CHUNK_SIZE=1024

Use the SPOTFLOW_COREDUMPS_CHUNK_SIZE option to set the size (in bytes) of the core dump chunks that are uploaded to Spotflow one at a time. Larger chunks are more efficient but require a more stable connection for the upload to succeed.
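To get a feel for the trade-off, the following host-side C sketch computes how many chunks a dump of a given size produces. It assumes that the dump is split into consecutive fixed-size chunks with a possibly shorter last chunk; this guide does not describe the exact splitting scheme, and chunk_count and chunk_len are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed for dump_size bytes (ceiling division). */
static size_t chunk_count(size_t dump_size, size_t chunk_size)
{
    assert(chunk_size > 0);
    return (dump_size + chunk_size - 1) / chunk_size;
}

/* Length of the chunk at `index`; only the last chunk may be shorter. */
static size_t chunk_len(size_t dump_size, size_t chunk_size, size_t index)
{
    size_t offset = index * chunk_size;

    assert(offset < dump_size);

    size_t rest = dump_size - offset;

    return rest < chunk_size ? rest : chunk_size;
}
```

Under this assumption, a core dump that fills the whole 64 KiB partition is sent as 64 chunks of 1024 bytes, and doubling the chunk size halves the number of uploads.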

prj.conf
CONFIG_SPOTFLOW_USE_DEFAULT_REBOOT_HANDLER=y

Automatic restarts are enabled by default. To disable them, set the SPOTFLOW_USE_DEFAULT_REBOOT_HANDLER option to n. You can then customize the behavior by providing your own implementation of the k_sys_fatal_error_handler function.

prj.conf
CONFIG_SPOTFLOW_COREDUMPS_PROCESSING_LOG_LEVEL=3

Use the SPOTFLOW_COREDUMPS_PROCESSING_LOG_LEVEL option to set the log level of the Spotflow processing backend code for core dumps.

prj.conf
CONFIG_SPOTFLOW_COREDUMPS_BACKEND_QUEUE_SIZE=16

The SPOTFLOW_COREDUMPS_BACKEND_QUEUE_SIZE option defines the size of the queue that the Spotflow core dump backend uses to store core dump chunks before sending them to the cloud.

See Kconfig for core dumps for details about the options above.

See Advanced configuration options for Zephyr for additional options not directly related to core dumps.
