Fuzzy Matching

Fuzzy Matching is used when you want to search your captures for the closest match across several different properties.

Because perfect matches are unlikely in real-world scenarios, the fuzzy matching system instead produces a score for each possible match and selects the capture with the lowest overall score.

See AI vs Fuzzy Matching for a discussion of the key benefits and downsides of fuzzy matching compared with using AI.

To get useful results, you'll often need to adjust the weighting factors that control matching, because some of the attributes you're searching on will usually matter more than others.

Example

For example, let's look at a system that is producing cupcakes.

We're capturing how our system is set up (including line speed, oven temperature, colour tolerance etc.) for every different batch of cupcakes we produce, so we can automatically set up our system correctly for new product runs in the future.

First, we identify which attributes we are going to know (or be able to easily get) before our production run begins.

These include…

  • Product Weight
  • Product Type (e.g. Blueberry, Chocolate, Banana)
  • Product Style (e.g. Plain, Deluxe, Fudgy, Gluten Free)
  • Ambient Temperature
  • Ambient Humidity

Setting Priorities

If we want to search for the closest match (rather than using AI), we need to set some priorities in the configuration file.

We might decide that weight is the most critical factor, followed by type and style. Temperature and humidity are useful, but less important than the others.

For each attribute we're going to match, we can set up a multiplier and a type or closeness threshold.

Multiplier

The multiplier is applied to the difference between the value you searched for and the value in each capture.

For example, we want a 10g difference in weight to be penalised much more than a 10% difference in humidity.

So we can give weight differences a large multiplier and humidity differences a small one.

    "matching": {
        "Weight": {
            "mult": 10,
            "close": 0.002
        },
        "Type": {
            "mult": 1,
            "type": "equalonly"
        },
        "Style": {
            "mult": 1,
            "type": "equalonly"
        },
        "Temperature": {
            "mult": 0.5,
            "close": 2
        },
        "Humidity": {
            "mult": 0.1,
            "close": 10
        }
    },

Closeness

The 'close' value defines how much difference is considered 'good enough' to be a match. If the difference between the search value and the captured value is less than this amount, it will be considered an exact match.
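As a rough illustration of how the multiplier and closeness threshold could work together, here is a minimal Python sketch. It is not the product's implementation: the function names are hypothetical, and it assumes that the per-attribute penalties are simply added together and that a difference at or beyond 'close' is penalised in full.

```python
def numeric_penalty(search, captured, mult=1.0, close=0.0):
    """Penalty for one numeric attribute (hypothetical helper)."""
    diff = abs(search - captured)
    # Differences smaller than 'close' are treated as an exact match.
    if diff < close:
        return 0.0
    # Assumption: the whole difference is scaled by the multiplier.
    return mult * diff


def total_score(search, capture, rules):
    """Sum the penalties for every configured attribute (assumed rule)."""
    return sum(
        numeric_penalty(search[name], capture[name], **rule)
        for name, rule in rules.items()
    )


def best_match(search, captures, rules):
    """Return the capture with the lowest overall score."""
    return min(captures, key=lambda c: total_score(search, c, rules))


rules = {
    "Temperature": {"mult": 0.5, "close": 2},
    "Humidity": {"mult": 0.1, "close": 10},
}
search = {"Temperature": 20, "Humidity": 50}
captures = [
    # Temperature is within 2, humidity is 20 away: score about 2.0
    {"name": "batch A", "Temperature": 21, "Humidity": 70},
    # Temperature is 6 away, humidity is within 10: score 3.0
    {"name": "batch B", "Temperature": 26, "Humidity": 55},
]
print(best_match(search, captures, rules)["name"])  # batch A
```

Even though batch B is much closer on humidity, its temperature difference carries a larger multiplier, so batch A wins.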

Match Types

The 'type' value defines how the comparison should be performed.

equalonly

When matching Equal Only, the condition becomes a simple yes/no rather than a multiplier - the score will be 0 for an exact match, or the 'mult' value if it's anything else.

This is the ideal method when comparing strings or discrete values.
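In Python terms, the rule described above might look like the sketch below. The function name is hypothetical, and the comparison is assumed to be a plain, case-sensitive equality check.

```python
def equal_only_penalty(search, captured, mult=1.0):
    """Return 0 for an exact match, otherwise the full 'mult' value."""
    # Assumption: values are compared with ordinary equality.
    return 0.0 if search == captured else float(mult)


print(equal_only_penalty("Chocolate", "Chocolate", mult=1))  # 0.0
print(equal_only_penalty("Chocolate", "Blueberry", mult=1))  # 1.0
```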

preference

This indicates that you'd prefer values close to a particular target. Think of it as being able to specify a default search value for when users forget to do so.

For example, you might want to always give priority to the fastest result - if the search finds several very similar matches, we want to use the one that was quickest.

The following rule will do that…

   "Paint Line.Speed - Actual": {
      "type": "preference",
      "mult": 0.2,
      "target": 300 
   }	

The target attribute specifies what value you're hoping to achieve. In this case, a small amount will be added to the score whenever the speed is not 300.

It's often a good idea to give these very small 'mult' values - otherwise an unexpected or noisy signal might cause issues.
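To see why a small 'mult' works well as a tie-breaker, here is a minimal Python sketch. The function name and the capture data are made up for illustration, and it assumes the penalty grows in proportion to the distance from the target (the exact formula isn't stated here).

```python
def preference_penalty(captured, target, mult):
    """Penalty for a 'preference' rule (hypothetical helper)."""
    # Assumption: the penalty is proportional to the distance from target.
    return mult * abs(captured - target)


# Two captures that scored the same on every other attribute.
captures = [
    {"name": "slow run", "other_score": 12.0, "speed": 250},
    {"name": "fast run", "other_score": 12.0, "speed": 295},
]

for capture in captures:
    capture["score"] = capture["other_score"] + preference_penalty(
        capture["speed"], target=300, mult=0.2
    )

winner = min(captures, key=lambda c: c["score"])
print(winner["name"])  # fast run
```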

range

Range comparisons are used when you want to target a value within a given range (for example, when your data includes minimum and maximum values rather than a specific target).

Specify the names of the attributes that hold the minimum and maximum of the range, and a value for perc between 0 and 1 to determine where along that range you'd like to target.

For example, with the following configuration…

   "Paint Line.Speed - Actual": {
      "type": "range",
      "min": "Paint Line.Speed - Minimum",
      "max": "Paint Line.Speed - Maximum",
      "perc": 0.5 
   }	

…if Paint Line.Speed - Minimum was 25 and Paint Line.Speed - Maximum was 100, the system would try to find captures where the speed was closest to 62.5 (which is 50% of the way between 25 and 100).
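The target is a straightforward interpolation between the two ends of the range. A minimal Python sketch (the function names are hypothetical, and the penalty is assumed to be the distance from the interpolated target):

```python
def range_target(minimum, maximum, perc):
    """Point 'perc' of the way from minimum (0) to maximum (1)."""
    return minimum + perc * (maximum - minimum)


def range_penalty(captured, minimum, maximum, perc, mult=1.0):
    """Assumed penalty: distance from the interpolated target."""
    return mult * abs(captured - range_target(minimum, maximum, perc))


print(range_target(25, 100, 0.5))       # 62.5
print(range_penalty(60, 25, 100, 0.5))  # 2.5
```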

Missing/Error

If one of the captures is missing a value or a comparison otherwise can't be performed, you can set a default penalty.

    "matching": {
        "Weight": {
            "error": 200
        }
    },

This sets a default penalty (200) that is used when the capture doesn't have a value for Weight or the weight values can't be compared (e.g. one of them has bad/invalid data).
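One way to picture this fallback is the Python sketch below. The function name is hypothetical, and what counts as "bad" data here (a missing value, or text that can't be read as a number) is an assumption.

```python
def penalty_with_fallback(search, captured, mult=1.0, error=0.0):
    """Normal penalty, or the 'error' value if comparison fails."""
    try:
        return mult * abs(float(search) - float(captured))
    except (TypeError, ValueError):
        # TypeError: the value is missing (None).
        # ValueError: the value can't be parsed as a number.
        return float(error)


capture = {"Type": "Chocolate"}  # no Weight recorded
print(penalty_with_fallback(0.25, capture.get("Weight"), mult=10, error=200))  # 200.0
print(penalty_with_fallback(0.25, "n/a", mult=10, error=200))                  # 200.0
print(penalty_with_fallback(0.25, 0.5, mult=10, error=200))                    # 2.5
```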

Exponential Penalties

Setting exp to any value makes the penalty for a difference exponential rather than linear.

This is primarily used when you have multiple similar conditions and you'd like to tune your matching so that many small differences are less significant than a single large difference.

Before the difference is squared, 1 is added to it (so that differences smaller than 1 don't shrink when squared), and 1 is subtracted again afterwards. This operation is performed before the multiplier is applied.
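The effect of adding and subtracting 1 is easiest to see with numbers. The Python sketch below follows the description above; the function name is hypothetical, and taking the absolute value of the difference first is an assumption.

```python
def exp_penalty(diff, mult=1.0):
    """Square the difference with the +1 / -1 adjustment, then multiply."""
    shifted = abs(diff) + 1            # add 1 before squaring
    return mult * (shifted ** 2 - 1)   # subtract 1, then apply 'mult'


# Plain squaring would shrink a small difference: 0.1 becomes about 0.01.
print(round(0.1 ** 2, 4))           # 0.01
# With the adjustment, 0.1 becomes about 0.21, so it still counts.
print(round(exp_penalty(0.1), 4))   # 0.21
# A difference of zero still produces no penalty.
print(exp_penalty(0))               # 0.0
```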

For example,

You are tracking the colour of the cupcakes your machine is producing. There are two sensors, one on each side of the cupcake.

You want to make sure that both sides being slightly off the target is treated as much more acceptable than one side being completely off target. By making both scores exponential, you make sure that a big difference is penalised more than several small differences.

    "matching": {
        "Colour Left": {
            "type": "range",
            "min": "Min Darkness",
            "max": "Max Darkness",
            "perc": 0.5,
            "exp": true
        },
        "Colour Right": {
            "type": "range",
            "min": "Min Darkness",
            "max": "Max Darkness",
            "perc": 0.5,
            "exp": true
        }
    }

In the example above, we are looking for a value exactly half way (0.5) between the 'Min Darkness' and 'Max Darkness' values we read from our production system.

Let's compare linear and exponential penalties in the following scenarios…

Left Side    Right Side    Linear Penalty    Exponential Penalty
0.1          0.1           0.2               0.42
0.2          0             0.2               0.44
0.3          -0.1          0.4               0.9

Remembering that lower scores are better, the exponential penalty ranks two small differences (0.42) ahead of a single larger one (0.44), even though their linear penalties are identical (0.2).
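If you'd like to check these figures yourself, the short Python sketch below recomputes them. It assumes a multiplier of 1, that the two sensor penalties are added together, and that the sign of a difference is ignored; the function names are hypothetical.

```python
def linear(diff):
    return abs(diff)


def exponential(diff):
    # Add 1, square, then subtract 1 again.
    return (abs(diff) + 1) ** 2 - 1


scenarios = [(0.1, 0.1), (0.2, 0.0), (0.3, -0.1)]

rows = []
for left, right in scenarios:
    lin = round(linear(left) + linear(right), 2)
    exp = round(exponential(left) + exponential(right), 2)
    rows.append((left, right, lin, exp))
    print(f"{left:>5} {right:>5} {lin:>6.2f} {exp:>6.2f}")
```

Each printed row shows the left difference, the right difference, the linear total and the exponential total.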