Association analysis, known in the context of consumer product affinity as Market Basket Analysis, is an unsupervised algorithm. It is easier to comprehend if you first understand the supervised model. Suppose you are given the temperature and humidity at a particular location and want to predict the quantity of rain on that day. The inputs are standardized, and you record the inputs (temperature and humidity) and, of course, the rainfall. Given this data, called training data, you try to derive a formula like (.007 * temp + .03 * humidity) = rain in mm. The converse, where you just have a mass of data with no outcome to predict and you are trying to figure out its structure, is called unsupervised learning.
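To make the supervised case concrete, here is a minimal sketch that recovers the two coefficients from training data with a least-squares fit. The observations are hypothetical, and the rainfall values are generated from the illustrative formula above rather than measured, so this only shows the mechanics of "learning a formula from labeled data".

```python
# Hypothetical (temperature, humidity) observations; not real weather data.
observations = [(20.0, 60.0), (25.0, 80.0), (30.0, 40.0), (15.0, 90.0)]

# Synthetic labels produced by the article's illustrative formula.
training = [(t, h, 0.007 * t + 0.03 * h) for t, h in observations]


def fit_two_feature_model(rows):
    """Least-squares fit of rain = a * temp + b * humidity (no intercept)."""
    s_tt = sum(t * t for t, h, r in rows)
    s_hh = sum(h * h for t, h, r in rows)
    s_th = sum(t * h for t, h, r in rows)
    s_tr = sum(t * r for t, h, r in rows)
    s_hr = sum(h * r for t, h, r in rows)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_tt * s_hh - s_th * s_th
    a = (s_tr * s_hh - s_hr * s_th) / det
    b = (s_hr * s_tt - s_tr * s_th) / det
    return a, b


a, b = fit_two_feature_model(training)
print(f"rain_mm ~ {a:.3f} * temp + {b:.3f} * humidity")
```

In the unsupervised setting there is no rainfall column to fit against; the algorithm only has the raw records and has to find structure in them.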
Association is one such algorithm: it identifies patterns, or data items, that frequently occur together. The famous story around the beer → diaper correlation, i.e. “Friday afternoons, young American males who buy diapers (nappies) also have a predisposition to buy beer”, is a classic example of the output of an association analysis. Of all the algorithms we have put up so far, this is conceptually the easiest to understand. Consider a set of consumer purchase transactions. For simplicity, let’s assume that a customer never purchases more than 3 items in one shot.
A single glance tells you that 3 out of 5 transactions have BEEF & CHEESE occurring together. You don’t really need a complex mining algorithm to do this: a simple cross tab / Cartesian query will give you results like the table below, where each cell shows the number of times two products occur together divided by the total number of transactions.
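The cross tab described above can be sketched in a few lines. The baskets here are hypothetical stand-ins, chosen only to match what the text states (5 transactions, at most 3 items each, BEEF & CHEESE together in 3 of them); MILK and the exact composition of each basket are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, not the article's actual transaction table.
transactions = [
    ["BEEF", "CHEESE", "BAKED BREAD"],
    ["BEEF", "CHEESE"],
    ["BEEF", "CHEESE", "MILK"],
    ["BAKED BREAD", "CHEESE"],
    ["BAKED BREAD", "MILK"],
]

# Count every unordered pair of distinct products within each basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# Cell value of the cross tab: co-occurrences / total number of transactions.
total = len(transactions)
pair_share = {pair: count / total for pair, count in pair_counts.items()}

for pair, share in sorted(pair_share.items()):
    print(pair, pair_counts[pair], share)
```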
But Association analysis does much more. It digs deep across all this data and gives you the following metrics for all combinations of products. Let’s take the example of beef and cheese occurring together.
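The metrics usually reported for a rule such as BEEF → CHEESE are support, confidence, and lift. The sketch below computes them using their standard definitions; the baskets are hypothetical (MILK and the basket contents are assumptions), so the numbers illustrate the arithmetic rather than reproduce the article's results.

```python
# Hypothetical baskets: 5 transactions, BEEF & CHEESE together in 3 of them.
transactions = [
    {"BEEF", "CHEESE", "BAKED BREAD"},
    {"BEEF", "CHEESE"},
    {"BEEF", "CHEESE", "MILK"},
    {"BAKED BREAD", "CHEESE"},
    {"BAKED BREAD", "MILK"},
]


def support(items, baskets):
    """Share of all baskets that contain every product in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Of the baskets containing `lhs`, the share that also contain `rhs`."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


def lift(lhs, rhs, baskets):
    """Confidence relative to how often `rhs` is bought anyway.

    A value above 1 means the two sides co-occur more often than they
    would if they were purchased independently.
    """
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)


beef, cheese = {"BEEF"}, {"CHEESE"}
print("support   ", support(beef | cheese, transactions))
print("confidence", confidence(beef, cheese, transactions))
print("lift      ", lift(beef, cheese, transactions))
```

Note that confidence is directional (BEEF → CHEESE differs from CHEESE → BEEF), while support and lift are the same in both directions.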
Confidence, if high enough, can be used for placement strategies, since it indicates that people who buy Beef tend to buy both together rather than just Beef. Use this intelligence to show these products together or, if you are a brick-and-mortar establishment, physically co-locate them so that they are in the customer’s eye range in the aisle. Lift indicates the strength of the rule: the greater the value, the stronger the rule, and a value above 1 means the products occur together more often than they would by chance. Well, enough math for the time being, I guess.
We will take off from the previous story of a focused, narrow, data-driven consumer persona. “Highly Engaged, Valuable Middle Female Customer from Tennessee” is what we arrived at in our previous adventures. Now we have Jane, who fits these exact criteria but has only just signed up, and Elena, who has been around for some time. We do not have enough information about Jane’s behavioral patterns, but we do know something about Elena. Several use cases can be of interest to the marketer if (a big if, mind you) you know how she could behave at a given point in time. For instance, some nice questions that we can ask are:
- What is the product that Jane could most probably buy? What is the first campaign I can send to her that will be relevant to her?
- Elena has been around for some time and has been unresponsive to campaigns. Is there a risk that she will move away?
- Do consumers like Elena & Jane show patterns of attrition, e.g. typically 3 months after hitting a net value of, say, $10,000?
Well, we didn’t go through the whole bit for nothing. Let’s put the algorithm to work and see what it spits out. The size of each circle indicates confidence, and the color intensity indicates lift (the more intense, the higher).
If you analyze the data (read it as LHS (vertical) → RHS (horizontal)), the first entry is the propensity of buying BAKED BREAD & CHEESE together. If you look at the big, intense circles, which is really where your sweet spot is, BEEF & CHEESE wins! So you have your first campaign for Jane (for BEEF) and a cross-sell option after that as well: CHEESE.
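Picking the "big intense circle" amounts to ranking every LHS → RHS rule by lift and confidence. Here is a minimal sketch of that ranking for single-product rules. The baskets are hypothetical (the same made-up five transactions, with MILK as an assumed extra product), and the tie-breaking order, lift first and then confidence, is an assumption rather than something the plot prescribes.

```python
from collections import Counter
from itertools import permutations

# Hypothetical baskets, not the data behind the article's plot.
transactions = [
    {"BEEF", "CHEESE", "BAKED BREAD"},
    {"BEEF", "CHEESE"},
    {"BEEF", "CHEESE", "MILK"},
    {"BAKED BREAD", "CHEESE"},
    {"BAKED BREAD", "MILK"},
]

n = len(transactions)
# How many baskets contain each product.
item_counts = Counter(item for basket in transactions for item in basket)
# How many baskets contain both sides of each ordered (LHS, RHS) pair.
pair_counts = Counter(
    pair for basket in transactions for pair in permutations(sorted(basket), 2)
)

rules = []
for (lhs, rhs), both in pair_counts.items():
    confidence = both / item_counts[lhs]
    # lift = P(lhs and rhs) / (P(lhs) * P(rhs)), written with raw counts.
    lift = (both * n) / (item_counts[lhs] * item_counts[rhs])
    rules.append((lhs, rhs, confidence, lift))

# Strongest rules first: highest lift, then highest confidence.
rules.sort(key=lambda rule: (rule[3], rule[2]), reverse=True)

best_lhs, best_rhs, best_conf, best_lift = rules[0]
print(f"Lead with {best_lhs}, then cross-sell {best_rhs}")
```

With these made-up baskets the top-ranked rule is BEEF → CHEESE, which mirrors the campaign-then-cross-sell reading of the plot.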
Cheers and happy Mining!!!