Creating Classifier AI Training Data

Our Goal

We'd like to create a machine learning model that can tell us when a particular type of asset is in or approaching some specific states that we'd like to avoid.

To do this, we'll need training data, and we can use Capture to make collecting this data much simpler.

In this case we have several similar machines that we will use as sources of our training data. These are called 'Machine #1', 'Machine #2' and so on. There are 50 in all.

NOTE: This example only covers using Capture to create the training data, which is one of the more time-consuming parts of the process.

Plan

To build the training data, we need to go through a few steps.

1) Find examples of when machines have been in the state(s) we're looking for, and record the machine number, the date and time, and the name of the state the machine was in (see the example in Step 1a).

2) For each of those times, capture as much information as possible about the machine's sensors and condition.

Step 1a: Build a Spreadsheet

Collect the minimum data you need and record it in a spreadsheet. In this case, that is the time, the machine number and the issue type.

When you've collected a number of samples in various conditions, you can then export that information as a CSV file.

Ensure that your CSV file has a header row and that one column is called either 'Time' or 'Date', as in this example:

Time, Machine, Condition
2025-02-01 10:00:00, 25, Stopped
2025-02-03 02:34:33, 7, Misaligned
2025-02-05 09:00:00, 43, Normal
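
Before uploading, it can help to check the file against the two requirements above. The following is a minimal Python sketch, not part of Capture: the function name check_upload_csv is hypothetical, and it assumes the column name must match 'Time' or 'Date' exactly (how strictly Capture matches the name is not stated here).

```python
import csv
import io

def check_upload_csv(text):
    """Return the header names if the CSV looks usable, else raise ValueError."""
    # skipinitialspace drops the space that follows each comma in the sample.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader, None)
    if header is None:
        raise ValueError("CSV is empty")
    names = [name.strip() for name in header]
    # Assumption: an exact, case-sensitive match on 'Time' or 'Date'.
    if not any(name in ("Time", "Date") for name in names):
        raise ValueError("CSV needs a column called 'Time' or 'Date'")
    # Every non-blank data row should have one value per header column.
    for line_no, row in enumerate(reader, start=2):
        if row and len(row) != len(names):
            raise ValueError(
                f"line {line_no}: expected {len(names)} values, got {len(row)}"
            )
    return names

sample = """Time, Machine, Condition
2025-02-01 10:00:00, 25, Stopped
2025-02-03 02:34:33, 7, Misaligned
2025-02-05 09:00:00, 43, Normal
"""
print(check_upload_csv(sample))
```

A file with a missing header row or a short data row raises a ValueError naming the problem, which is easier to act on than a failed upload.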

Step 1b: Add an Upload Layer

To accept this CSV file for use in Capture, we add a get_upload layer to our Capture configuration.

{
	"type": "get_upload"
}

Step 2a: Capture the Machine State

Next, we get all of the context information we can about our machine.

{
	"type": "get_query",
	"query": "'Machine #{Machine}' ASSET AIPOINTS",
	"samples": 30,
	"comment": "Get machine status details"
}
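
The {Machine} placeholder in the query appears to be filled in from the 'Machine' column of each uploaded row. The Python sketch below illustrates that idea only; it assumes simple name-for-value substitution, uses the query template from the full example, and the helper fill_query is hypothetical rather than anything Capture exposes.

```python
def fill_query(template, row):
    """Substitute {Column} placeholders with values from one uploaded row."""
    # Assumption: placeholders are replaced by the matching column value as-is.
    return template.format(**row)

# One row from the sample CSV, keyed by the header names.
row = {"Time": "2025-02-01 10:00:00", "Machine": "25", "Condition": "Stopped"}
template = "'Machine #{Machine}' ASSET ALLPOINTS BOUND"

print(fill_query(template, row))
# prints: 'Machine #25' ASSET ALLPOINTS BOUND
```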

Step 2b: Capture the Average Value

Next, we'll reduce the samples we picked up to a single average value for each attribute.

{
	"type": "flatten",
	"method": "avg",
	"comment": "Capture averages"
}
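
To make the effect of averaging concrete, here is a small Python sketch of the same idea. It is an illustration, not Capture's implementation: the function flatten_avg, the data layout and the sample values are all made up for the example.

```python
from statistics import mean

def flatten_avg(samples):
    """Collapse {attribute: [values...]} into {attribute: average}."""
    return {name: mean(values) for name, values in samples.items()}

# Illustrative values only; a real capture would hold 30 samples per attribute.
samples = {
    "Machine #22 Temperature": [70.0, 72.0, 74.0],
    "Machine #22 Pressure": [1.0, 1.5, 2.0],
}

print(flatten_avg(samples))
# prints: {'Machine #22 Temperature': 72.0, 'Machine #22 Pressure': 1.5}
```

Each capture then holds one number per attribute, which is the shape most classifiers expect for a training row.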

Resolve Names

One issue we have is that our attribute names (the names of the values in our Capture) contain the name of the machine.

For example, for an issue on Machine #22, we have the properties 'Machine #22 Temperature', 'Machine #22 Pressure' and so on.

If we want to make it easy to compare our captures against each other, we should make our attribute names consistent across our captures.

We can do that with the rename layer, which lets us rename our attributes using regular expressions.

{
	"type": "rename",
	"from": "Machine #\\d*.(.*)",
	"to": "$1",
	"comment": "Make names consistent"
}
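
The pattern can be tried outside Capture to see what it does to a name. The Python sketch below is an approximation under stated assumptions: Capture's regular expression engine may differ from Python's, and the helper rename is hypothetical. It also shows why the backslash is doubled in the JSON: "\\d" in the file decodes to the regular expression \d.

```python
import json
import re

# The layer as written in a JSON config; json.loads turns \\d into \d.
layer = json.loads(r'''
{
    "type": "rename",
    "from": "Machine #\\d*.(.*)",
    "to": "$1"
}
''')

def rename(name, pattern, replacement):
    """Apply a '$1'-style replacement using Python's re module."""
    # Python writes group references as \1 rather than $1, so convert them.
    py_replacement = re.sub(
        r"\$(\d+)", lambda m: "\\" + m.group(1), replacement
    )
    return re.sub(pattern, py_replacement, name)

for name in ("Machine #22 Temperature", "Machine #7 Pressure"):
    print(rename(name, layer["from"], layer["to"]))
# prints: Temperature
# prints: Pressure
```

The single '.' after the digits consumes the space that follows the machine number, so the captured group starts at the attribute name itself.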

Results

This lets you build an extensive database of machine status at the times when assets were in particular conditions. With that data, it becomes much easier to build tools that learn not only to detect but also to predict when assets are starting to enter these states (i.e. when they are about to fail, become unbalanced or produce low-quality product).

Full Example

{
    "steps": [
        {
            "type": "get_upload",
            "comment": "Get records",
            "format": "csv"
        },
        {
            "type": "get_query",
            "comment": "Get machine data",
            "query": "'Machine #{Machine}' ASSET ALLPOINTS BOUND",
            "samples": "25"
        },
        {
            "type": "flatten",
            "comment": "Capture average values",
            "method": "avg"
        },
        {
            "type": "rename",
            "comment": "Remove machine name from attributes",
            "from": "Machine #\\d*.(.*)",
            "to": "$1"
        }
    ],
    "uniqueid": "{LocalStartTime}",
    "keys": [
        "Condition",
        "Machine"
    ],
    "name": "Machine Issues",
    "strings": [
        "Condition"
    ]
}
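
A configuration like this is easy to break with a missing comma or an unescaped backslash, so a quick syntax check before loading it can save time. The Python sketch below only tests what is visible in the example above (valid JSON, a non-empty 'steps' list, a 'type' on every step); the function check_capture_config is hypothetical and does not reflect Capture's own validation rules.

```python
import json

def check_capture_config(text):
    """Return a list of problems found in the config text; empty if none."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    steps = config.get("steps")
    # Assumption: a usable config has at least one step, as in the example.
    if not isinstance(steps, list) or not steps:
        return ["'steps' should be a non-empty list"]
    problems = []
    for index, step in enumerate(steps):
        if not isinstance(step, dict) or "type" not in step:
            problems.append(f"step {index} has no 'type'")
    return problems

good = '{"steps": [{"type": "get_upload", "format": "csv"}], "name": "Machine Issues"}'
# A single backslash before d is not a valid JSON escape.
bad = r'{"steps": [{"type": "rename", "from": "Machine #\d*.(.*)", "to": "$1"}]}'

print(check_capture_config(good))
print(check_capture_config(bad))
```

The first call prints an empty list; the second reports a JSON error pointing at the unescaped backslash.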
