====Fuzzy Matching====

Fuzzy Matching is used when you want to search your captures for the //closest match// across several different properties.

Because perfect matches are unlikely in real-world scenarios, the fuzzy matching system instead produces a //score// for each possible match and selects the capture with the lowest overall score.

See [[AI vs Fuzzy Matching|AI vs Fuzzy Matching]] for a discussion on some key benefits and down-sides of fuzzy matching vs using AI.

To do this, you'll often need to adjust the **weighting factors** that control matching, because in most cases there will be vital priorities to which of the various things you're searching for are the most important.

===Example===

For example, let's look at a system that is producing cupcakes. 

We're capturing how our system is set up (including line speed, oven temperature, colour tolerance etc.) for every different batch of cupcakes we produce, so we can automatically set up our system correctly for new product runs in the future.

First, we identify **which attributes** we are going to know (or be able to easily get) before our production run begins.

These include...

  * Product Weight,
  * Product Type (ie. Blueberry, Chocolate, Bananna).
  * Product Style (ie, Plain, Deluxe, Fudgy, Gluten Free),
  * Ambient Temperature
  * Ambient Humidity

===Setting Priorities===

If we want to search for the closest match (rather than using [[aimodels|AI]]), we need to set some //priorities// in the [[structure|configuration file]].

We might decide that **size** is the most critical factor, followed by **type** and **style**. Temperature and humidity are useful, but not as important as any of the others.

For each attribute we're going to match, we can set up a //multiplier// and a //type// or //closeness threshold//.

==Multiplier==

The multiplier is applied to the **difference** between the attributes when comparing the value you searched for against 

For example, we want a 10g difference in weight to be penalised //much// more than a 10% difference in humidity. 

So we can give weight differences a //large// multiplier, or humidity differences a //small// multiplier.

<code javascript>
    "matching": {
		"Weight": {
			"mult": 10,
			"close": 0.002
		},
		"Type": {
			"mult": 1,
			"type": "equalonly"
		},
		"Style": {
			"mult": 1,
			"type": "equalonly"
		},
		"Temperature": {			
			"mult": 0.5,
                        "close": 2
		},
		"Humidity": {			
			"mult": 0.1,
                        "close": 10
		}
	},
</code>

==Closeness==

The 'close' value defines how much difference is considered 'good enough' to be a match. If the difference between the search value and the captured value is less than this amount, it will be considered an exact match.

===Match Types===

The 'type' value defines how the comparison should be performed. 

==equalonly==

When matching //Equal Only//, the condition becomes a simple //yes/no// rather than a multiplier - the score will be 0 for an exact match, or the 'mult' value if it's anything else.

This is the ideal method when comparing strings or discrete values.

==preference==

This indicates that you'd **prefer** values close to a particular target. Think of it as being able to specify a default search value for when users forget to do so.

For example, you might want to always give priority to the //fastest// result - if the search finds several very similar matches, we want to use the one that was quickest.

The following rule will do that...

<code javascript>
   "Paint Line.Speed - Actual": {
      "type": "preference",
      "mult": 0.2,
      "target": 300 
   }	
</code>

The **target** attribute specifies what value you're hoping to achieve. In this case, a small amount will be added to the score whenever the speed is not 300.

//It's often a good idea to give these very small 'mult' values - otherwise an unexpected or noisy signal might cause issues.//

==range==

//Range// comparisons are used when you want to target a value within a given range (for example, your data includes values for **minimum** and **maximum** rather than a specific target.

Choose the name of the minimum and maximum ranges, and a value for **perc** between 0 and 1 to determine where along that range you'd like to target.

For example, with the following configuration...

<code javascript>
   "Paint Line.Speed - Actual": {
      "type": "range",
      "min": "Paint Line.Speed - Minimum",
      "max": "Paint Line.Speed - Maximum",
      "perc": 0.5 
   }	
</code>

...if **Paint Line.Speed - Minimum** was 25 and **Paint Line.Speed - Maximum** was 100, the system would try to find captures where the speed was closest to **60** (which is 50% of the way between 25 and 100).

===Missing/Error===

If one of the captures is //missing// a value or a comparison otherwise //can't be performed//, you can set a default penalty.

<code javascript>
    "matching": {
		"Weight": {
			"error": 200
		}
	},
</code>

This sets a default penalty (200) that is used when the capture doesn't have a value for **Weight** or the weight values can't be compared (ie. one of them has bad/invalid data).

===Exponential Penalties===

Setting **exp** to any value will make penalty for values being different **exponential** rather than linear.

This is primarily used when you have multiple similar conditions, and you'd like to tune your matching so that many **small** differences is less significant than a single **large** difference.

A '1' is added to the value (to prevent very small amounts from becoming //smaller//) and is subtracted after the difference has been squared. This operation is performed //before multiplication//.

For example, 

You are tracking the colour of the cupcakes your machine is producing. There are two sensors, one on each side of the cupcake.

You want to make sure that both sides being //slightly// off the target is treated as much more acceptable than one side being //completely// off target. By making both scores **exponential**, you make sure that a big difference is penalised more than several small differences.

<code javascript>
    "matching": {
		"Colour Left": {
                        "type": "range",
                        "min": "Min Darkness",
                        "max": "Max Darkness",
                        "perc": 0.5,
			"exp": true },
		"Colour Right": {
                        "type": "range",
                        "min": "Min Darkness",
                        "max": "Max Darkness",
                        "perc": 0.5,
			"exp": true }
		}
</code>

In the example above, we are looking for a value exactly half way (0.5) between the 'Min Darkness' and 'Max Darkness' values we read from our production system. 

Using linear penalties, let's look at the following scenarios....

^Left Side^Right Side^Linear Penalty^Expontential Penalty^
|0.1|0.1|0.2|0.42|
|0.2|0|0.2|0.44|
|0.3|-0.1|0.4|0.9|

Remembering that //lower scores are better//, this shows that you get results that prefer **multiple small differences** over **single large differences**.

===Testing / Debugging===

Sometimes you'll find that your algorithm needs to be //tuned// - you have matches that don't seem quite right.

You can discover //why// matches are being scored the way they are using the **Explain** button. See [[debugging scores|validating match scores]] for more information.