Relational Databases & High Speed Data

Relational databases are ideal for summary data, such as…

  • Daily Totals,
  • Shift Summaries,
  • KPIs, and
  • Events

But when being used for high-speed sensor data, they very quickly start to suffer from degraded performance. They become quite slow both to read and write to, and this is for several reasons.

Low Ingestion Rate

While fine for corporate data, there's a lot of behind-the-scenes upkeep when writing to relational databases, thanks to their complex and variable structure. This means that they have a relatively low number of writes-per-second they can achieve.

A large site with tens of thousands of data points can quickly overwhelm a relational database. Data is lost when the server simply can't keep up with the demand.

Dedicated time-series databases are designed specifically to have high ingestion rates, which they can do thanks to their simple and consistent structure and dedicated caching mechanisms

Continuous Indexes

The indexes of a relational database perform best when items are added randomly. Unfortunately, timestamps are not random - they're sequential. This means that over time your index become extremely bloated and inefficient.

Dedicated time-series databases are optimised to have time-based keys, and split your data into 'chunks' to prevent your indexes from degrading, making writing and querying significantly faster.

Inter-sample Sampling

Relational databases don't work with time in a way that is ideally suited to sensor data.

In many real-world data settings, you may have thousands of data points, but many will be at different sampling rates, or be recorded at slightly different times. They're not synchronised, so there isn't a single moment with a complete set of data.

Some time-series databases are smart enough to provide query functions that allow you to view the data between your samples.

For example, you might have a sample taken at 10:00, and another at 11:00. With most relational (and some time-series databases), asking for the time between 10:20 and 10:40 will return nothing.

The best time-series databases are smart enough to either provide the last known value, or interpolate the two samples.

ARDIs relational database queries are often quite slow, as it needs to perform a range of queries to replicate this behaviour (and each database engine needs to be separately tuned - the solution that works best on Microsoft SQL Server is quite different from the one in MySQL/MariaDB).

NOTE: Not all time-series databases have this capability. It's not present in Influx or Timescale. It is however available in Prometheus and most industrial historians.

Alternatives

For moderate amounts of lower-speed data (1 sample per second or slower), Prometheus is an extremely effective open-source time-series database that integrates extremely well with ARDI.

For faster data on a budget, you might want to explore Timescale or InfluxDB - however, these don't offer inter-sample queries. This can make them slow to query when you need large numbers of points.

For general industrial use, Aveva PI is a good - if expensive - option.

For high-speed (ie. sub-second) data, iba can handle phenomenal data rates.