Cumulocity IoT DataHub Explained - Dremio

Today, digital transformation is a keystone of enterprise success. The term is broad, but at its core it is a mandate to rethink old processes and become more agile in proactively addressing service quality and customer satisfaction.

Enter IoT. It covers a wide variety of communications between devices, systems, and humans, and its fundamental principle is to allow enterprises to make swift, data-driven decisions. To that end, enterprises collect data from multiple IoT devices and leverage data lakes to store all of it in a common format, with the ultimate goal of analyzing it and gaining insights from it.

The process sounds simple, but setting up a fast and reliable data pipeline that lets enterprises obtain value from their IoT data is far more challenging. First, enterprises need data engineers to manipulate unstructured data in different formats and enrich it with data from other systems; then they need data scientists to analyze it and gain insights from it. This is the challenge that Software AG’s Cumulocity IoT DataHub solves.

What is Cumulocity IoT DataHub?

The Cumulocity IoT DataHub platform allows you to manage and monitor a variety of IoT devices. Using Dremio’s Data Lake Engine, Cumulocity IoT DataHub stores the data emitted from these devices in a highly efficient format suitable for analytical queries.

Cumulocity IoT DataHub was created to let users run ad hoc queries as well as more sophisticated analytical queries against IoT data stored in the data lake. In addition, you can connect the BI and data science tools you are already familiar with to Cumulocity IoT DataHub through JDBC, ODBC, and the REST API.


Cumulocity IoT DataHub enriches the value of your IoT data, enhances the performance of your IoT data pipeline, and gives you a bird’s-eye view of your IoT infrastructure through several key attributes:

Full Control of Your Assets

With Cumulocity IoT DataHub you can have a live view of the status of your devices. It also lets you manage your devices remotely: you can select any available data point, use that data to trigger events, and send commands from the user interface back to the device. Examples of data points that can be collected include location, acceleration, temperature, and pressure.

Data Visualization and Exploration

Using the Cumulocity IoT DataHub console, you can create your own dashboards to visualize any of the available data points from your devices and monitor their operations as they happen. Additionally, you can preview any data point and define thresholds that automatically trigger other events when their limits are reached.
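
To make the threshold idea concrete, here is a minimal Python sketch of a rule that produces an event when a data point reaches its limit. It is only an illustration: the `Threshold` class, the `check` function, and the sample readings are hypothetical and are not part of the Cumulocity IoT DataHub API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Hypothetical rule: raise an event when a data point reaches a limit."""
    data_point: str
    upper_limit: float


def check(threshold: Threshold, data_point: str, value: float) -> Optional[str]:
    """Return an event message if the value reaches the limit, else None."""
    if data_point == threshold.data_point and value >= threshold.upper_limit:
        return f"{data_point} reached {value} (limit {threshold.upper_limit})"
    return None


# Made-up sample readings as (data point, value) pairs.
rule = Threshold(data_point="temperature", upper_limit=80.0)
readings = [("temperature", 72.5), ("pressure", 95.0), ("temperature", 81.2)]

# Keep only the readings that actually triggered the rule.
events = [e for dp, v in readings if (e := check(rule, dp, v)) is not None]
for event in events:
    print(event)
```

In a real deployment the event would feed whatever follow-up action is configured rather than being printed.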

Multi-Protocol Support

Cumulocity IoT DataHub supports several industry protocols, which lets you configure different devices accordingly. These protocols are:

  • Modbus
  • CAN Bus
  • CANopen
  • Profibus

Each of these protocols exposes different data points, which allows you to easily configure your Cumulocity IoT DataHub dashboards.

Dremio and the Cumulocity IoT DataHub Pipeline

Cumulocity IoT DataHub lets you use scalable and inexpensive storage by providing an easy-to-use data pipeline that extracts data from Cumulocity’s Operational Store into a data lake for long-term archival and efficient analytical querying.

Using Dremio as its central component, Cumulocity IoT DataHub offers a SQL-based query interface for querying the data lake, and lets customers connect any application that supports ODBC, JDBC, or REST.

Another key component of the Cumulocity IoT DataHub engine is the ETL pipeline, which:

  • Periodically extracts data from Cumulocity’s Operational Store.
  • Transforms the data into a relational format.
  • Persists the data as Apache Parquet files in the data lake.

When users submit queries through any BI or data science tool, the queries are executed against the data in the data lake rather than the operational store, which improves query performance and can provide sub-second response times.
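
As a rough illustration of the kind of analytical SQL that becomes possible once the data is in relational form, the sketch below runs an aggregate query over a small table. SQLite is used purely as a local stand-in so the example is self-contained; in DataHub the query would go through Dremio over JDBC, ODBC, or REST. The table name `measurements` and its columns are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database, standing in for the SQL interface to the data lake.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (device_id TEXT, ts TEXT, temperature REAL)"
)

# Made-up rows resembling flattened device measurements.
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("pump-1", "2020-01-01T00:00:00Z", 20.0),
        ("pump-1", "2020-01-01T01:00:00Z", 22.0),
        ("pump-2", "2020-01-01T00:00:00Z", 30.0),
    ],
)

# Average temperature and number of readings per device.
rows = conn.execute(
    """
    SELECT device_id, AVG(temperature) AS avg_temp, COUNT(*) AS readings
    FROM measurements
    GROUP BY device_id
    ORDER BY device_id
    """
).fetchall()

for device_id, avg_temp, readings in rows:
    print(device_id, avg_temp, readings)
```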

The Offloading Pipeline

To turn data into a flat, condensed format that can be leveraged for efficient SQL querying, Cumulocity IoT DataHub moves data from its Operational Store to a data lake, a process it refers to as “offloading”.

Offloading allows Cumulocity IoT DataHub users to build a low-cost, long-term archive of device data and to decouple analytical workloads from operational workloads. In the UI, users can select where to offload the data and how often to run the Cumulocity IoT DataHub jobs.

When an offloading job is triggered, two actions take place: 1) Document-based entities of Cumulocity’s Operational Store are transformed into a relational format. 2) The flattened data is stored in Parquet files in the data lake.
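
The first step, turning a nested document into a flat row, can be sketched in a few lines of Python. This is an assumption-laden illustration rather than DataHub’s actual transformation: the `flatten` helper, the underscore-joined column names, and the shape of the sample document are all hypothetical. Writing the resulting rows to Parquet would need a library such as pyarrow and is left out here.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively turn a nested document into one flat row.

    Nested keys are joined with `sep`, so {"a": {"b": 1}} becomes {"a_b": 1}.
    """
    row: Dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, column, sep))
        else:
            row[column] = value
    return row


# Hypothetical measurement document; the field names are made up for this sketch.
document = {
    "id": "4711",
    "source": {"id": "device-42"},
    "time": "2020-01-01T12:00:00Z",
    "temperature": {"T": {"value": 21.5, "unit": "C"}},
}

row = flatten(document)
print(row)
```

Each key of the flattened row maps naturally to a column, which is what makes the data convenient to query with SQL.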

Once the offloading is complete, data consumers can focus their efforts on analyzing and gaining insights from their data at interactive speed using the BI tools and data science tools that they are already familiar with.

Cumulocity IoT DataHub Use Cases

These are some of the scenarios in which Cumulocity IoT DataHub plays a key role:

Crash detection: By gathering acceleration and movement data points from any device (e.g., vehicles, motorcycles, bicycles), you can use Cumulocity IoT DataHub to set alerts that are triggered the moment the device reaches a configured limit on a data point. Additionally, that data can be used to trigger another immediate event, such as an automatic notification to the driver’s emergency contact.

Asset tracking: Through its geofencing feature, Cumulocity IoT DataHub allows users to set virtual geographic perimeters to keep track of where their assets are located. Users can simply define an area on a map in the Cumulocity IoT DataHub UI and configure which events will be triggered once the device leaves or enters the area.
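
The logic behind a geofence can be sketched as follows. This minimal Python example assumes a circular fence given by a center and a radius, and uses the haversine formula for distance; the function names and coordinates are hypothetical and do not reflect how Cumulocity IoT DataHub implements geofencing internally.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geofence_events(
    center: Tuple[float, float], radius_m: float, positions: List[Tuple[float, float]]
) -> List[str]:
    """Emit "enter" or "leave" each time the device crosses the fence boundary."""
    events: List[str] = []
    inside = None  # unknown until the first position arrives
    for lat, lon in positions:
        now_inside = haversine_m(center[0], center[1], lat, lon) <= radius_m
        if inside is not None and now_inside != inside:
            events.append("enter" if now_inside else "leave")
        inside = now_inside
    return events


# Made-up track: starts inside a 500 m fence, moves about 1.1 km north, then returns.
center = (52.5200, 13.4050)
track = [(52.5201, 13.4051), (52.5300, 13.4050), (52.5202, 13.4049)]
events = geofence_events(center, 500.0, track)
print(events)
```

A fence drawn as an arbitrary polygon would need a point-in-polygon test instead of the distance check, but the enter and leave bookkeeping stays the same.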

Operations monitoring: Thanks to its multi-protocol support, Cumulocity IoT DataHub allows users to capture data from a wide range of devices, including heavy machinery, manufacturing machines, and wind turbines, and to measure variables such as uptime, downtime, pressure levels, and energy consumption. The captured data not only lets users reduce the reaction time to incidents but also provides great value when defining prediction models to enhance preventive maintenance and safety measures.

Cumulocity IoT DataHub allows you to connect devices, consume their live data, and use that data to trigger operations. It also lets you analyze and visualize your data to gain valuable insights, turn them into action, or integrate further with existing enterprise applications to enhance the quality of your operations.