====Creating Classifier AI Training Data====

===Our Goal===

We'd like to create a **machine learning model** that can tell us when a particular type of asset is in, or approaching, specific states that we'd like to avoid.

To do this, we'll need //training data//, and we can use **Capture** to make collecting this data much simpler.

In this case we have **several similar machines** that we will use as sources of our training data. These are called 'Machine #1', 'Machine #2' and so on. There are 50 in all.

//NOTE: This example only shows how to use Capture to create the **training data**, which is one of the more time-consuming parts of the process.//

===Plan===

To build the training data, we need to go through a few steps.

----

1) Find examples of when machines have been in the state(s) we're looking for, and record the machine number, the date and time, and the name of the state the machine was in (see the example in Step 1a). \\ \\
2) For each of those times, capture as much information as possible about the machine's sensors and condition. \\

===Step 1a: Build a Spreadsheet===

Collect the minimum data you need (in this case **time**, **machine number** and **issue type**) and record it in a spreadsheet.

When you've collected a number of samples in various conditions, export that information as a CSV file. Ensure that your CSV file has a header row and that one column is called either 'Time' or 'Date'.

  Time, Machine, Condition
  2025-02-01 10:00:00, 25, Stopped
  2025-02-03 02:34:33, 7, Misaligned
  2025-02-05 09:00:00, 43, Normal

===Step 1b: Add an Upload Layer===

To accept this CSV file for use in Capture, we add a [[get_upload|get_upload]] layer to our Capture configuration.

  {
    "type": "get_upload"
  }

===Step 2a: Capture the Machine State===

Next, we get all of the context information we can about our machine.
  {
    "type": "get_query",
    "query": "'Machine #{Machine}' ASSET AIPOINTS",
    "samples": 30,
    "comment": "Get machine status details"
  }

===Step 2b: Capture the Average Value===

Next, we average the samples we collected, so that each value is reduced to a single number.

  {
    "type": "flatten",
    "method": "avg",
    "comment": "Capture averages"
  }

===Resolve Names===

One issue we have is that our attribute names (the names of the values in our Capture) contain the name of the machine. For example, for an issue on Machine #22, we have the properties **Machine #22 Temperature**, **Machine #22 Pressure** etc.

If we want to make it easy to compare our captures against each other, we should make our attribute names consistent across our captures.

We can do that with the [[rename|rename]] layer, which lets us rename our attributes using //regular expressions//. Note that the backslash in the pattern must be doubled inside a JSON string.

  {
    "type": "rename",
    "from": "Machine #\\d*.(.*)",
    "to": "$1",
    "comment": "Make names consistent"
  }

===Results===

This allows you to easily create an extensive database that captures machine status at the times when assets were in particular conditions. That makes it much easier to build tools that can learn not only to //detect// but also to //predict// when assets are starting to enter these states (i.e. when they are going to fail, be unbalanced or produce low-quality product).

===Full Example===

  {
    "steps": [
      {
        "type": "get_upload",
        "comment": "Get records",
        "format": "csv"
      },
      {
        "type": "get_query",
        "comment": "Get machine data",
        "query": "'Machine #{Machine}' ASSET ALLPOINTS BOUND",
        "samples": "25"
      },
      {
        "type": "flatten",
        "comment": "Capture average values",
        "method": "avg"
      },
      {
        "type": "rename",
        "comment": "Remove machine name from attributes",
        "from": "Machine #\\d*.(.*)",
        "to": "$1"
      }
    ],
    "uniqueid": "{LocalStartTime}",
    "keys": [
      "Condition",
      "Machine"
    ],
    "name": "Machine Issues",
    "strings": [
      "Condition"
    ]
  }
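
===Optional: Checking the Inputs Locally===

Before uploading, it can help to check the CSV header and to see what the rename pattern does to an attribute name. The Python sketch below is only an illustration and is not part of Capture: the function names (''read_records'', ''flatten_avg'', ''rename_attributes'') and the sensor readings are made up, and Python's ''re'' module stands in for whatever regular-expression engine Capture uses, so edge cases may behave differently there.

```python
import csv
import io
import re
from statistics import mean

# Sample rows copied from Step 1a.
SAMPLE_CSV = """Time, Machine, Condition
2025-02-01 10:00:00, 25, Stopped
2025-02-03 02:34:33, 7, Misaligned
2025-02-05 09:00:00, 43, Normal
"""

# Same pattern as the rename layer. Python writes the "$1" replacement as r"\1".
RENAME_FROM = re.compile(r"Machine #\d*.(.*)")


def read_records(csv_text):
    """Parse the spreadsheet export and check for a 'Time' or 'Date' column."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    header = [name.strip() for name in (reader.fieldnames or [])]
    if not {"Time", "Date"} & set(header):
        raise ValueError("CSV needs a header row with a 'Time' or 'Date' column")
    return [
        {key.strip(): (value or "").strip()
         for key, value in row.items() if key is not None}
        for row in reader
    ]


def flatten_avg(samples):
    """Average each attribute's samples (a local stand-in for flatten/avg)."""
    return {name: mean(values) for name, values in samples.items()}


def rename_attributes(values):
    """Strip the machine prefix so captures from different machines line up."""
    return {RENAME_FROM.sub(r"\1", name): value for name, value in values.items()}


if __name__ == "__main__":
    print(read_records(SAMPLE_CSV)[0])
    # Hypothetical readings, for illustration only.
    readings = {
        "Machine #22 Temperature": [70.0, 72.0, 74.0],
        "Machine #22 Pressure": [1.0, 1.5],
    }
    print(rename_attributes(flatten_avg(readings)))
```

After renaming, a capture from Machine #22 and one from Machine #7 share the same attribute names (for example **Temperature** and **Pressure**), which is what makes them directly comparable as training rows.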