Greetings! Welcome to Venky Rao's blog on Predictive Analytics, Geospatial Analytics and Visualization. This blog aims to present interesting analysis of geospatial data and to de-mystify predictive analytics for the layman. My blog is featured on: http://www.kdnuggets.com/ - Analytics and Data Mining Resources AND http://www.datasciencecentral.com/ - The Online Resource for Big Data Practitioners.
Customer Profiling (guest author: Paul Cook)
Profiling is a data mining technique used
to find patterns and trends in customer data. In today's post we will explore
the application of this technique.
As we will demonstrate in this article,
profiling is both simple and powerful. Its great strength is its simplicity. It
is ideal for communicating large amounts of information in a user-friendly way.
Profiles are clear, comprehensive, and easy to read, which makes them ideal for
communicating with a business or non-technical audience.
Profiling describes a group of people by
summarizing information about them. Profiles are typically used to answer
questions such as:
What do my customers look like?
Which prospects are most likely to buy?
What drives customer churn?
Often, a profile is all you need to answer
these questions. Other times, you may choose to use a profile for exploratory
data analysis before multivariate modelling.
Profiling is often used to find hot
prospects for marketing campaigns. By comparing past purchasers and
non-purchasers, you can see what's different about the purchasers. These
differences can be combined into a statistical
model, and the effectiveness of the model measured using gain
and lift charts. When building a model, a profile is used to show the
effect of the variables included in the analysis.
Today's tutorial was created using a profiling add-on for IBM
SPSS Statistics, though you can also build profiles in Excel if you prefer.
In today's tutorial, we will start with the most basic type of profile, and
build up to the more sophisticated applications.
The simplest form of profile describes one
group of customers, like the example below.
This is, of course, a greatly simplified
example. Real profiles may contain dozens or even hundreds of variables, and thousands
to millions of customers. Nonetheless, from this example, it is immediately
apparent that this business's customer base is skewed towards older, affluent, male
customers.
The above example contains just counts of
customers. Often, you would want to examine other statistics too, like
averages, maximums, standard deviations, and so on. The next example presents
the same customers, but this time showing both sample sizes (N) and average
profit per customer (MEAN).
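A profile of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not the SPSS add-on's method: the customer records, the field names (`age_band`, `gender`, `profit`) and the `profile` helper are all hypothetical and the figures are made up.

```python
from collections import defaultdict

# Hypothetical customer records; the values are invented for illustration.
customers = [
    {"age_band": "55+", "gender": "M", "profit": 40.0},
    {"age_band": "55+", "gender": "M", "profit": 60.0},
    {"age_band": "55+", "gender": "F", "profit": 90.0},
    {"age_band": "35-54", "gender": "M", "profit": 50.0},
    {"age_band": "35-54", "gender": "F", "profit": 110.0},
    {"age_band": "18-34", "gender": "M", "profit": 30.0},
]


def profile(records, variables, measure):
    """Return {variable: {category: (N, MEAN)}} for each profiling variable."""
    result = {}
    for var in variables:
        # Collect the measure values that fall into each category.
        groups = defaultdict(list)
        for rec in records:
            groups[rec[var]].append(rec[measure])
        # Summarize each category as a sample size and an average.
        result[var] = {
            cat: (len(vals), sum(vals) / len(vals))
            for cat, vals in sorted(groups.items())
        }
    return result


table = profile(customers, ["age_band", "gender"], "profit")
for var, cats in table.items():
    print(var)
    for cat, (n, mean) in cats.items():
        print(f"  {cat:<8} N={n:>3}  MEAN={mean:8.2f}")
```

Other statistics (maximums, standard deviations, and so on) could be added to the tuple in the same way.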
We saw before that this customer group
tends to be older, affluent, and male. But when we look at the profitability
figures, we see that the most profitable customers tend to be less affluent and
female. This raises questions for the business's marketing strategy. Could the
business increase profitability by targeting women and the less affluent market?
A common use of profiling is to compare and
contrast one customer group against another, often comparing a customer segment
to the entire customer base, to see what's different and special about the
segment.
In the profile above, the customer base has
been split into three equal-sized groups of 10,000 each, and a customer segment
is compared using an index, which is also graphed as a bar chart. This shows
that the segment being investigated is skewed towards new customers who have
purchased recently and made multiple purchases.
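One common way to compute such an index, assumed here because the post does not define it, is 100 times the segment's share in a category divided by the customer base's share in the same category. An index of 100 then means the segment looks like the base, and values above 100 mean the category is over-represented. The sketch below uses the three equal-sized base groups of 10,000 from the text; the group labels, the segment counts and the `index_profile` helper are hypothetical.

```python
def index_profile(base_counts, segment_counts):
    """Index each category: 100 * (segment share) / (base share)."""
    base_total = sum(base_counts.values())
    seg_total = sum(segment_counts.values())
    out = {}
    for cat, base_n in base_counts.items():
        base_share = base_n / base_total
        seg_share = segment_counts.get(cat, 0) / seg_total
        out[cat] = round(100 * seg_share / base_share)
    return out


# Base split into three equal-sized groups of 10,000 (from the text).
base = {"low": 10_000, "medium": 10_000, "high": 10_000}
# Hypothetical segment counts, invented for illustration.
segment = {"low": 500, "medium": 1_000, "high": 1_500}

indices = index_profile(base, segment)
for cat, idx in indices.items():
    # Crude text bar chart: one '#' per 10 index points.
    print(f"{cat:<7} {idx:>4}  {'#' * (idx // 10)}")
```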
Comparisons like these are perfect for
answering ad hoc questions like "what's different about high-value
customers?" or "how do recent recruits differ from existing
customers?" They are also perfect when using clustering to create new segments.
Marketers are always searching for market
segments that are highly responsive to their offers and promotions. To find
these responsive segments, profiles of response rates are used, as shown below.
In this example, z-scores have been
calculated to highlight particularly high or low response rates. The profile
above shows that the number of cars a person owns is strongly associated with
response: the more cars a person owns, the more likely they are to respond.
Using exactly the same technique, you can
profile response rates, cross-sell rates, attrition rates, and so on.
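The post does not say how its z-scores are calculated. A minimal sketch, assuming a one-sample z-score that compares each category's response rate with the overall rate, might look like the following. The formula choice, the `response_profile` helper and the car-ownership counts are all assumptions made for illustration.

```python
import math


def response_profile(groups):
    """groups maps category -> (mailed, responded).

    Returns the overall response rate and, per category, (rate, z), where
    z = (rate - overall) / sqrt(overall * (1 - overall) / mailed).
    This z-score definition is an assumption, not taken from the post.
    """
    total_mailed = sum(mailed for mailed, _ in groups.values())
    total_responded = sum(responded for _, responded in groups.values())
    overall = total_responded / total_mailed
    rows = {}
    for cat, (mailed, responded) in groups.items():
        rate = responded / mailed
        std_err = math.sqrt(overall * (1 - overall) / mailed)
        rows[cat] = (rate, (rate - overall) / std_err)
    return overall, rows


# Hypothetical counts by number of cars owned: (mailed, responded).
cars = {"0 cars": (4000, 200), "1 car": (4000, 400), "2+ cars": (2000, 400)}

overall, rows = response_profile(cars)
print(f"overall response rate: {overall:.1%}")
for cat, (rate, z) in rows.items():
    print(f"{cat:<8} rate={rate:6.1%}  z={z:+7.2f}")
```

Large positive or negative z values flag categories whose response rate differs from the overall rate by more than sampling noise would suggest.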
As we have seen, profiling is both
beautifully simple and extremely powerful. And best of all, it can now be done
in seconds using automated profiling tools.
Lift and Gain Charts are a useful way of visualizing how good a predictive model is. In SPSS, a typical gain chart appears as follows:
In today's post, we will attempt to understand the logic behind generating a gain chart and then discuss how gain and lift charts are interpreted.
To do this, we will use the example of a direct mailing company. Let us assume that based on experience, the company knows that the average response rate on its direct mail campaigns is 10%. Let us further make the following assumptions:
* Cost per ad mailed = $1
* Return per response = $50
Additionally, let us assume that the company mails out ads in lots of 10,000. Based on these assumptions, if the company mails out 100,000 ads, a table summarizing the results it would obtain from this campaign is provided below:
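The arithmetic behind such a table can be sketched as follows, using only the assumptions stated above (10% response rate, $1 per ad, $50 per response, lots of 10,000). The cumulative column layout and the `baseline_table` helper are assumptions made for illustration and may not match the original table exactly.

```python
COST_PER_AD = 1            # dollars, from the assumptions above
RETURN_PER_RESPONSE = 50   # dollars, from the assumptions above
RESPONSE_RATE_PCT = 10     # average response rate, in percent
LOT_SIZE = 10_000
LOTS = 10                  # 10 lots of 10,000 = 100,000 ads


def baseline_table():
    """Cumulative results after each lot when no predictive model is used."""
    rows = []
    for lot in range(1, LOTS + 1):
        mailed = lot * LOT_SIZE
        responses = mailed * RESPONSE_RATE_PCT // 100
        cost = mailed * COST_PER_AD
        revenue = responses * RETURN_PER_RESPONSE
        rows.append({
            "mailed": mailed,
            "responses": responses,
            "cost": cost,
            "revenue": revenue,
            "profit": revenue - cost,
        })
    return rows


rows = baseline_table()
print(f"{'mailed':>8} {'responses':>10} {'cost':>9} {'revenue':>9} {'profit':>9}")
for r in rows:
    print(f"{r['mailed']:>8,} {r['responses']:>10,} {r['cost']:>9,} "
          f"{r['revenue']:>9,} {r['profit']:>9,}")
```

Without a model, every lot responds at the same 10% rate, so each additional 10,000 ads adds the same 1,000 responses and the same $40,000 of profit.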
Now let us assume that the company uses SPSS Modeler to develop a predictive model using data from previous campaigns. "Response / No Response" is identified as the "target" fie…
In today's post, we discuss how to create a time series forecast using IBM SPSS Modeler. For the purposes of our exercise, we will use historical sales data at a SKU (stock keeping unit) level. This data is provided in a Microsoft Excel .xlsx file and must be in the following format:
In the image above, AAAAA through EEEEE are SKU numbers with the relevant monthly sales data provided in the respective columns. There is also a column that indicates the grand total of all SKUs sold in a month (AAAAA +...+ EEEEE + other SKUs not shown in the image above). The last column in the image above reflects the months for which historical sales data are provided.
As with any modeling exercise, we first insert a source node into the modeling canvas. Since our data is in the Microsoft Excel .xlsx file format, we insert an Excel source node as follows:
On exporting to a Table node, we see the output display as follows:
We then add a Filter node to select the five SKUs that we will be creati…
Decision Trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (or dependent) variable based on the values of several input (or independent) variables. In today's post, we discuss the CART decision tree methodology. The CART or Classification & Regression Trees methodology was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term to refer to the following types of decision trees:
* Classification Trees: where the target variable is categorical and the tree is used to identify the "class" into which a target variable would likely fall.
* Regression Trees: where the target variable is continuous and the tree is used to predict its value.
The CART algorithm is structured as a sequence of questions, the answers to which determine what the next question, if any, should be. The result of these questions is a tree-like structure where the ends are terminal node…