Fuzzy Matching

Fuzzy Matching is used when you want to search your captures for the closest match across several different properties.

Because perfect matches are unlikely in real-world scenarios, the fuzzy matching system instead produces a score for each possible match and selects the capture with the lowest overall score.
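As a rough sketch of that idea (not the product's actual code), the selection step could look like the following. The capture records, the `score_capture` helper and the plain sum of absolute differences are all assumptions made for illustration.

```python
def score_capture(search, capture):
    """Hypothetical scoring: sum of absolute differences over the searched attributes."""
    return sum(abs(capture[name] - wanted) for name, wanted in search.items())


def best_match(search, captures):
    """Return the capture with the lowest overall score (lower means closer)."""
    return min(captures, key=lambda capture: score_capture(search, capture))


# Made-up captures, for illustration only.
captures = [
    {"id": "batch-1", "Weight": 80, "Temperature": 21},
    {"id": "batch-2", "Weight": 95, "Temperature": 24},
]
search = {"Weight": 90, "Temperature": 23}

# batch-1 scores 10 + 2 = 12, batch-2 scores 5 + 1 = 6, so batch-2 is selected.
print(best_match(search, captures)["id"])
```

Treating every attribute equally like this is rarely what you want, which is why the weighting factors matter.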

See AI vs Fuzzy Matching for a discussion of the key benefits and downsides of fuzzy matching compared with using AI.

To get good results, you'll often need to adjust the weighting factors that control matching, because in most cases some of the properties you're searching on will matter more than others.

Example

Let's look at a system that produces cupcakes.

We're capturing how our system is set up (including line speed, oven temperature, colour tolerance etc.) for every different batch of cupcakes we produce, so we can automatically set up our system correctly for new product runs in the future.

First, we identify which attributes we are going to know (or be able to easily get) before our production run begins.

These include…

  • Product Weight
  • Product Type (e.g. Blueberry, Chocolate, Banana)
  • Product Style (e.g. Plain, Deluxe, Fudgy, Gluten Free)
  • Ambient Temperature
  • Ambient Humidity

Setting Priorities

If we want to search for the closest match (rather than using AI), we need to set some priorities in the configuration file.

We might decide that weight is the most critical factor, followed by type and style. Temperature and humidity are useful, but less important than the others.

For each attribute we're going to match, we can set a multiplier, plus either a match type or a closeness threshold.

Multiplier

The multiplier is applied to the difference between the value you searched for and the value in each capture.

For example, we want a 10g difference in weight to be penalised much more than a 10% difference in humidity.

So we can give weight differences a large multiplier and humidity differences a small one.

    "matching": {
        "Weight": {
            "mult": 10,
            "close": 0.002
        },
        "Type": {
            "mult": 1,
            "type": "equalonly"
        },
        "Style": {
            "mult": 1,
            "type": "equalonly"
        },
        "Temperature": {
            "mult": 0.5,
            "close": 2
        },
        "Humidity": {
            "mult": 0.1,
            "close": 10
        }
    },

Closeness

The 'close' value defines how much difference is considered 'good enough' to be a match. If the difference between the search value and the captured value is less than this amount, it will be considered an exact match.
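To make the interaction between 'mult' and 'close' concrete, here is a minimal sketch of how a single numeric attribute might be scored. The function name and the exact formula are assumptions: in particular, we assume that a difference outside the threshold is scaled in full, rather than having the 'close' amount subtracted first.

```python
def numeric_penalty(searched, captured, mult, close=0.0):
    """Penalty for one numeric attribute (assumed formula, not the product's code)."""
    difference = abs(searched - captured)
    if difference < close:
        return 0.0  # inside the 'close' threshold: treated as an exact match
    return mult * difference  # otherwise the difference is scaled by 'mult'


# Values taken from the "Temperature" rule: mult 0.5, close 2.
print(numeric_penalty(20, 21, mult=0.5, close=2))  # prints 0.0 (difference of 1 is within 2)
print(numeric_penalty(20, 26, mult=0.5, close=2))  # prints 3.0 (difference of 6, times 0.5)
```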

Match Types

The 'type' value defines how the comparison should be performed.

equalonly

With the equalonly type, the comparison becomes a simple yes/no test rather than a scaled difference: the score is 0 for an exact match, or the full 'mult' value for anything else.

This is the ideal method when comparing strings or discrete values.
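In code, that rule might be sketched like this. The function name is hypothetical, and we assume a plain equality test; whether the real comparison ignores case or whitespace is not stated here.

```python
def equalonly_penalty(searched, captured, mult):
    """Return 0 for an exact match, otherwise the full 'mult' value."""
    return 0 if searched == captured else mult


print(equalonly_penalty("Chocolate", "Chocolate", mult=1))  # prints 0
print(equalonly_penalty("Chocolate", "Blueberry", mult=1))  # prints 1
```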

preference

This indicates that you'd prefer values close to a particular target. Think of it as being able to specify a default search value for when users forget to do so.

For example, you might always want to give priority to the fastest result: if the search finds several very similar matches, it should pick the one that was quickest.

The following rule will do that…

   "Paint Line.Speed - Actual": {
      "type": "preference",
      "mult": 0.2,
      "target": 300 
   }	

The target attribute specifies what value you're hoping to achieve. In this case, a small amount will be added to the score whenever the speed is not 300.

It's often a good idea to give these very small 'mult' values - otherwise an unexpected or noisy signal might cause issues.
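The sketch below shows how a preference rule can act as a tie-breaker. It assumes the penalty grows in proportion to the distance from the target; the function name, the base score and the speeds are made up for illustration.

```python
def preference_penalty(captured, target, mult):
    """Penalty that grows with distance from the preferred target (assumed formula)."""
    return mult * abs(captured - target)


# Two otherwise similar captures that scored the same on every other rule.
base_score = 4.0  # hypothetical score from the other matching rules
slow = base_score + preference_penalty(280, target=300, mult=0.2)
fast = base_score + preference_penalty(300, target=300, mult=0.2)

# The capture that ran at the target speed ends up with the lower score.
print(fast < slow)  # prints True
```

With a larger 'mult', the same 20-unit difference could outweigh the attributes you actually searched for, which is why small values are the safer choice.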