Creating Time-Series AI Training Data

Our Goal

In Creating a Classifier AI, we built up some training data that looked for the instantaneous status of a machine.

It looked for a specific combination of sensor values that indicated that a machine was in a particular state.

Not everything is so neat. In some systems, you need to look at events as they happen over time instead of instantly. This is common when you're trying to raise alerts before incidents happen, or when you're trying to identify something with a small amount of data.

In this case, we can use the samples layer to capture the values of one or more sensor/table columns over time.

In this example, our staff member has access to a number of air-powered tools. The compressor has a flow-meter that tells us how much air it's using. We want to be able to view the air usage and figure out what the operator was actually doing with their air tools (i.e. which tools, and possibly which job) during the day.

Plan

To build the training data, we need to go through a few steps.


Step 1: Build a Spreadsheet

Watch the operator during the day and record what they were using and what job they were performing every time they use an air tool.

Record each activity in an Excel spreadsheet or CSV file, using one row per activity with a date/time and a description.

Time, Activity
2025-02-01 10:00:00, Cleaning
2025-02-03 02:34:33, Removing Bolts
2025-02-05 09:00:00, Fastening Bolts
2025-02-05 09:56:00, Hammering
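
Before uploading, it can be worth checking that every row of the log parses cleanly. The following is a minimal Python sketch for doing that outside of Capture. The function name read_activity_log is our own, and the timestamp format is assumed from the sample rows above.

```python
import csv
import io
from datetime import datetime

# Sample log in the same layout as the spreadsheet above.
CSV_TEXT = """Time, Activity
2025-02-01 10:00:00, Cleaning
2025-02-03 02:34:33, Removing Bolts
2025-02-05 09:00:00, Fastening Bolts
2025-02-05 09:56:00, Hammering
"""


def read_activity_log(text):
    """Parse the log into (timestamp, activity) tuples, failing on bad rows."""
    # skipinitialspace drops the space that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        try:
            when = datetime.strptime(row["Time"].strip(), "%Y-%m-%d %H:%M:%S")
        except ValueError as exc:
            raise ValueError(f"line {line_no}: bad timestamp {row['Time']!r}") from exc
        rows.append((when, row["Activity"].strip()))
    return rows


activity_log = read_activity_log(CSV_TEXT)
```

A row with a malformed timestamp raises an error that names the offending line, which is easier to fix in the spreadsheet than after the data has been captured.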

Step 2: Add an Upload Layer

To accept this CSV file for use in Capture, we add a get_upload layer to our Capture configuration.

In this case, each row of the file only has a single timestamp rather than a distinct start and stop time. The offset parameter adds a fixed number of seconds to that time to produce the start of the frame, and the length parameter extends the end time out from the start.

By using an offset of -15 seconds and a length of 30, we create a 30-second frame centered on the time you've provided in the CSV file.

{
    "type": "get_upload",
    "offset": -15,
    "length": 30
}
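
The arithmetic behind offset and length can be sketched in a few lines of Python. This is only an illustration of the description above, not Capture's own code, and the function name capture_window is hypothetical.

```python
from datetime import datetime, timedelta


def capture_window(event_time, offset=-15, length=30):
    """Return the (start, end) of the frame for one CSV row.

    offset shifts the recorded time to give the start of the frame;
    length extends the end of the frame out from that start.
    """
    start = event_time + timedelta(seconds=offset)
    end = start + timedelta(seconds=length)
    return start, end


# The "Hammering" row from the example spreadsheet.
start, end = capture_window(datetime(2025, 2, 5, 9, 56, 0))
print(start, end)
```

For the 09:56:00 row this gives a frame running from 09:55:45 to 09:56:15, so the recorded time sits in the middle of the frame.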

Step 3: Capture the Machine State

Next, we get the air usage information from our sensors.

{
    "type": "get_query",
    "query": "'Air Flow Meter.Flow Rate' SELECTOR",
    "samples": 30,
    "comment": "Get machine status details"
}

Step 4: Capture Samples

Next, we'll split this into distinct, evenly-timed samples.

{
    "type": "samples",
    "max": 10,
    "seconds": 1
}

This creates one capture for each row in the CSV file, each containing 10 samples of 1-second air-flow data.
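
To make the idea of evenly-timed samples concrete, here is a rough Python sketch. It assumes readings arrive as (elapsed seconds, value) pairs and that readings falling in the same time slot are averaged. Both of those are our assumptions for illustration; Capture's samples layer may aggregate readings or fill empty slots differently.

```python
def split_into_samples(readings, seconds=1, max_samples=10):
    """Bucket (elapsed_seconds, value) readings into evenly-timed samples.

    Each sample covers `seconds` of time. Readings that land in the same
    slot are averaged, and a slot with no readings is left as None.
    """
    buckets = [[] for _ in range(max_samples)]
    for elapsed, value in readings:
        index = int(elapsed // seconds)
        # Ignore readings before the frame starts or past the last slot.
        if 0 <= index < max_samples:
            buckets[index].append(value)
    return [sum(b) / len(b) if b else None for b in buckets]


# Unevenly-timed flow readings, measured in seconds from the frame start.
readings = [(0.2, 10.0), (0.7, 12.0), (1.5, 20.0), (3.1, 5.0), (12.0, 99.0)]
samples = split_into_samples(readings, seconds=1, max_samples=4)
print(samples)
```

The result always has the same number of entries in the same order, which is what lets a model compare one capture against another.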

This is perfect for building a model that can classify what tasks your operator is performing.

Altering Timing

It's quite common to realise half-way through an AI training project that you've picked the wrong timing: either you need finer resolution, or you need to work with a longer time-frame.

One of the benefits of Capture is that you can update your logic and immediately re-build your data, going back in time to re-acquire information at your new, updated time-scale.

Full Example

{
    "steps": [
        {
            "type": "get_upload",
            "comment": "Get records",
            "format": "csv",
            "offset": -15,
            "length": 30
        },
        {
            "type": "get_query",
            "comment": "Get machine data",
            "query": "'Air Flow Meter.Flow Rate' SELECTOR",
            "samples": 30
        },
        {
            "type": "samples",
            "comment": "Split into even times",
            "max": 10,
            "seconds": 1
        }
    ],
    "uniqueid": "{LocalStartTime}",
    "keys": [
        "StartTime"
    ],
    "name": "Compressed Air Events",
    "strings": [
        "Activity"
    ]
}

Mixing Time Series and Summary Data

There may be times when you want to mix some time-series data with summary data.

In the example above, you may have a case where the air usage looks very different depending on your pressure. But since pressure doesn't change much and the sensor updates slowly, there's no point capturing 10 distinct pressure values - you only need one.

The easiest way to do this is by limiting the columns that you use in the samples step, and following it up with a 'flatten'. For example…

{
    "type": "get_query",
    "query": "('Air Flow Meter','Air Pressure Sensor') ASSET ('Pressure','Flow Rate') PROPERTY VALUES",
    "samples": 30,
    "comment": "Get machine status details"
},
{
    "type": "samples",
    "max": 10,
    "seconds": 1,
    "keep": true,
    "columns": ["Air Flow Meter.Flow Rate"]
},
{
    "type": "flatten",
    "method": "average"
}

The final results in this example will have 10 different values for flow rate, but only a single value for air pressure.
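
The shape of that result can be illustrated with a short Python sketch. The output column names ("Flow Rate 1" and so on), the build_record function and the sample values are all hypothetical; the sketch only shows the idea of keeping each flow sample while averaging pressure down to one value, and does not reflect how Capture names or stores its columns.

```python
from statistics import mean


def build_record(rows, max_samples=10):
    """Keep each flow-rate sample but reduce pressure to a single average."""
    flow = [row["Flow Rate"] for row in rows[:max_samples]]
    record = {f"Flow Rate {i + 1}": value for i, value in enumerate(flow)}
    # Pressure changes slowly, so one averaged value is enough.
    record["Pressure"] = mean(row["Pressure"] for row in rows)
    return record


# Ten made-up one-second rows: flow varies, pressure barely moves.
rows = [{"Flow Rate": float(i), "Pressure": 100.0 + (i % 2)} for i in range(10)]
record = build_record(rows)
print(record)
```

The record ends up with 11 values: ten for flow rate and one for pressure.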