Numeric Measures for Association Rules
In today's post, we look at Association Rules for Market Basket Analysis and discuss three numeric measures that should be considered before acting on, or making a business decision based on, associations that have been observed in the data: (1) Support, (2) Confidence and (3) Lift.
Association rules are typically written in the format:
Left hand side Implies Right hand side
The left hand side is referred to as the Antecedent and the right hand side is the Consequent. The Antecedent means a thing that logically precedes another while a Consequent means a thing that follows as a result. For example, in the association rule:
{Butter, Eggs} Implies {Bread}
Butter and eggs are the Antecedent while Bread is the Consequent. What this rule means is that if you were to pick a shopping cart at random and find butter and eggs in it, you are also likely to find bread.
The numeric measures mentioned above (Support, Confidence and Lift) are used to measure the chance that this rule holds true. So let's get straight into understanding these measures in detail. In order to do this, let's take the following sample of market baskets:
Let us now investigate the following rules:
A implies B
A implies C
C implies D
B & C imply E
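To make the four rules concrete, here is a minimal Python sketch. The basket table is not reproduced in this text, so the five baskets below are hypothetical: they were chosen only so that they match the counts quoted in this post. The name rule_holds is also just an illustrative helper.

```python
# Hypothetical baskets, chosen only to match the counts quoted in this post.
baskets = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"A", "D"},
    {"B", "C", "D"},
    {"B", "C", "E"},
]

# Each rule is an (antecedent, consequent) pair of item sets.
rules = [
    ({"A"}, {"B"}),
    ({"A"}, {"C"}),
    ({"C"}, {"D"}),
    ({"B", "C"}, {"E"}),
]


def rule_holds(basket, antecedent, consequent):
    """True when the basket contains every item on both sides of the rule."""
    return (antecedent | consequent) <= basket


for antecedent, consequent in rules:
    matches = sum(rule_holds(b, antecedent, consequent) for b in baskets)
    print(sorted(antecedent), "implies", sorted(consequent),
          "holds in", matches, "basket(s)")
```

Storing both sides as sets keeps the check to a single subset test, which also works for rules with more than one product on either side.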
Support
Support refers to the proportion of baskets where the rule was true, i.e. where both the left side and the right side products were present.
Let us review our market baskets and look for support for the rules that we want to investigate:
A implies B: Support = 1 / 5 = 0.2
A implies C: Support = 2 / 5 = 0.4
C implies D: Support = 2 / 5 = 0.4
B & C imply E: Support = 1 / 5 = 0.2
Since we have a total of five market baskets, the denominator always equals 5.
In the first rule, A implies B, we have only one basket where A and B are both present. Therefore, the support for this rule is 1 / 5.
Similarly for A implies C, since there are 2 baskets that contain A and C, the support for this rule is 2 / 5.
For C implies D, there are 2 baskets where C and D are both present so support for this rule is also 2 / 5.
And finally, there is only 1 basket that contains B, C and E so support for the rule B & C imply E is 1 / 5.
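The support calculation above can be sketched in a few lines of Python. The baskets are a hypothetical set (the original basket table is not available in this text) picked so that the counts agree with the ones quoted above; the function name support is an assumption for illustration.

```python
# Hypothetical baskets, chosen only to match the counts quoted in this post.
baskets = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"A", "D"},
    {"B", "C", "D"},
    {"B", "C", "E"},
]


def support(itemset, baskets):
    """Proportion of baskets that contain every item in itemset."""
    hits = sum(1 for b in baskets if itemset <= b)
    return hits / len(baskets)


# The support of a rule is the support of both sides taken together.
print("A implies B:", support({"A", "B"}, baskets))
print("A implies C:", support({"A", "C"}, baskets))
print("C implies D:", support({"C", "D"}, baskets))
print("B & C imply E:", support({"B", "C", "E"}, baskets))
```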
Confidence
Confidence measures what proportion of the baskets that contain the product on the left hand side also contain the product on the right hand side.
Let us review our market baskets and look for confidence in the rules that we want to investigate:
A implies B: Confidence = 1 / 3 = 0.33
A implies C: Confidence = 2 / 3 = 0.67
C implies D: Confidence = 2 / 4 = 0.5
B & C imply E: Confidence = 1 / 3 = 0.33
In our first rule, we have 3 baskets that contain A. Of these 3 baskets, only 1 basket also contains B. Therefore, the confidence in this rule is 1 / 3.
In our second rule, we have 3 baskets that contain A. Of these 3 baskets, 2 baskets also contain C. Therefore, the confidence in this rule is 2 / 3.
In our third rule, we have 4 baskets that contain C. Of these 4 baskets, 2 baskets also contain D. Therefore, the confidence in this rule is 2 / 4.
In our fourth rule, we have 3 baskets that contain B & C. Of these 3 baskets, only 1 basket also contains E. Therefore, the confidence in this rule is 1 / 3.
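As a rough sketch, confidence is the number of baskets that contain both sides divided by the number of baskets that contain the left hand side. The baskets below are hypothetical (the original table is not available in this text) and were chosen to agree with the counts above; count and confidence are illustrative names.

```python
from fractions import Fraction

# Hypothetical baskets, chosen only to match the counts quoted in this post.
baskets = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"A", "D"},
    {"B", "C", "D"},
    {"B", "C", "E"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)


def confidence(antecedent, consequent, baskets):
    """Baskets with both sides divided by baskets with the antecedent."""
    return Fraction(count(antecedent | consequent, baskets),
                    count(antecedent, baskets))


for lhs, rhs in [({"A"}, {"B"}), ({"A"}, {"C"}),
                 ({"C"}, {"D"}), ({"B", "C"}, {"E"})]:
    value = confidence(lhs, rhs, baskets)
    print(sorted(lhs), "implies", sorted(rhs), "confidence =", round(float(value), 2))
```

Fraction is used here only to keep the ratios exact; plain division would work just as well for reporting.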
Lift
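Lift, as used in this post, measures how much more frequently the product on the left hand side is found with the product on the right hand side than without it. Note that this is a simplified ratio; the more common definition of lift divides the rule's confidence by the support of the right hand side, so that a value above 1 means the two sides occur together more often than would be expected if they were independent.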
Let us review our market baskets and look for lift in the rules that we want to investigate:
A implies B: Lift = 1 / 2 = 0.5
A implies C: Lift = 2 / 1 = 2
C implies D: Lift = 2 / 2 = 1
B & C imply E: Lift = 1 / 2 = 0.5
In our first rule, we have 1 basket that contains A and B. We have 2 baskets that contain A but do not contain B. Therefore, the lift from this rule is 1 / 2.
In our second rule, we have 2 baskets that contain A and C. We also have 1 basket that contains A but does not contain C. Therefore, the lift from this rule is 2 / 1.
In our third rule, we have 2 baskets that contain C and D. We also have 2 baskets that contain C but not D. Therefore, the lift from this rule is 2 / 2.
In our fourth rule, we have 1 basket that contains B, C and E. We also have 2 baskets that contain B & C but not E. Therefore, the lift from this rule is 1 / 2.
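The sketch below computes the ratio described above and, for comparison, lift as it is more commonly defined (confidence divided by the support of the right hand side). The baskets are hypothetical, chosen only to agree with the counts in this post, so the conventional lift values it prints depend on that assumption and are not figures from the original data. The names post_ratio and standard_lift are illustrative.

```python
from fractions import Fraction

# Hypothetical baskets, chosen only to match the counts quoted in this post.
baskets = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"A", "D"},
    {"B", "C", "D"},
    {"B", "C", "E"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)


def post_ratio(antecedent, consequent, baskets):
    """Ratio used in this post: baskets with both sides divided by baskets
    with the antecedent but not the consequent. None if the divisor is 0."""
    both = count(antecedent | consequent, baskets)
    without = count(antecedent, baskets) - both
    return Fraction(both, without) if without else None


def standard_lift(antecedent, consequent, baskets):
    """Conventional lift: confidence divided by support of the consequent."""
    conf = Fraction(count(antecedent | consequent, baskets),
                    count(antecedent, baskets))
    consequent_support = Fraction(count(consequent, baskets), len(baskets))
    return conf / consequent_support


for lhs, rhs in [({"A"}, {"B"}), ({"A"}, {"C"}),
                 ({"C"}, {"D"}), ({"B", "C"}, {"E"})]:
    print(sorted(lhs), "implies", sorted(rhs),
          "post ratio =", post_ratio(lhs, rhs, baskets),
          "conventional lift =", standard_lift(lhs, rhs, baskets))
```

The two numbers answer different questions: the first only looks at baskets that contain the left hand side, while the second also accounts for how common the right hand side is overall.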
Other numeric measures
Other numeric measures that are used to measure the strength of association rules include:
* All confidence
* Collective strength
* Conviction
* Leverage
A detailed discussion of these and other measures can be found here.
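For a flavour of two of these, here is a small sketch of leverage and conviction using their usual textbook definitions: leverage is the support of both sides minus the product of the two individual supports, and conviction is (1 - support of the right hand side) divided by (1 - confidence). The baskets are hypothetical and the function names are illustrative; all confidence and collective strength are left out to keep the sketch short.

```python
from fractions import Fraction

# Hypothetical baskets, used only for illustration.
baskets = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"A", "D"},
    {"B", "C", "D"},
    {"B", "C", "E"},
]


def support(itemset, baskets):
    """Exact proportion of baskets that contain every item in itemset."""
    hits = sum(1 for b in baskets if itemset <= b)
    return Fraction(hits, len(baskets))


def leverage(antecedent, consequent, baskets):
    """Observed joint support minus the support expected under independence."""
    return (support(antecedent | consequent, baskets)
            - support(antecedent, baskets) * support(consequent, baskets))


def conviction(antecedent, consequent, baskets):
    """(1 - support of consequent) / (1 - confidence); None when confidence is 1."""
    conf = support(antecedent | consequent, baskets) / support(antecedent, baskets)
    if conf == 1:
        return None  # the rule never fails, so conviction is unbounded
    return (1 - support(consequent, baskets)) / (1 - conf)


print("A implies C leverage:", leverage({"A"}, {"C"}, baskets))
print("A implies C conviction:", conviction({"A"}, {"C"}, baskets))
```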
User defined significance levels
To be used, association rules need to satisfy user defined significance levels, i.e. minimum thresholds for the measures above. There are no standard thresholds that need to be met; all thresholds are user defined. Rules are usually formed when:
1) User defined significance level for support is met; and
2) User defined significance level for confidence is met.
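The two conditions above can be sketched as a simple filter. This is a brute-force check over a fixed list of candidate rules, not the Apriori algorithm; the baskets, the candidate rules and the threshold values MIN_SUPPORT and MIN_CONFIDENCE are all hypothetical and chosen only for illustration.

```python
# Hypothetical baskets, chosen only to match the counts quoted in this post.
baskets = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"A", "D"},
    {"B", "C", "D"},
    {"B", "C", "E"},
]

candidate_rules = [
    ({"A"}, {"B"}),
    ({"A"}, {"C"}),
    ({"C"}, {"D"}),
    ({"B", "C"}, {"E"}),
]

# Hypothetical user defined thresholds.
MIN_SUPPORT = 0.3
MIN_CONFIDENCE = 0.6


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)


def passes(antecedent, consequent, baskets, min_support, min_confidence):
    """True when the rule meets both the support and confidence thresholds."""
    both = count(antecedent | consequent, baskets)
    lhs = count(antecedent, baskets)
    if lhs == 0:
        return False  # confidence is undefined when the antecedent never occurs
    return (both / len(baskets) >= min_support
            and both / lhs >= min_confidence)


kept = [(a, c) for a, c in candidate_rules
        if passes(a, c, baskets, MIN_SUPPORT, MIN_CONFIDENCE)]

for antecedent, consequent in kept:
    print(sorted(antecedent), "implies", sorted(consequent))
```

Raising or lowering either threshold changes which rules survive, which is why the choice is left to the analyst.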
The Apriori algorithm is particularly useful for finding the rules that meet these thresholds; an example is provided below:
The web graph node in SPSS is very useful in getting a visual representation of the relationships; an example is provided below: