Greetings! Welcome to Venky Rao's blog on Predictive Analytics, Geospatial Analytics and Visualization. This blog aims to present interesting analysis of geospatial data and to de-mystify predictive analytics for the layman. My blog is featured on: http://www.kdnuggets.com/ - Analytics and Data Mining Resources.
US City-wise Population Growth and Unemployment
I conducted some basic analysis of US population growth in the 50 most populated US cities since 2010 and compared that with the unemployment rate in these cities. Not surprisingly, there were some clear correlations: cities with high unemployment rates saw low population growth and vice versa. Some highlights from the analysis:
The four cities with the most population growth since 2010 are Austin, Denver, New Orleans and Charlotte.
The two cities with negative population growth since 2010 are Detroit and Cleveland.
Lift and Gain Charts are a useful way of visualizing how good a predictive model is. In SPSS, a typical gain chart appears as follows:
In today's post, we will attempt to understand the logic behind generating a gain chart and then discuss how gain and lift charts are interpreted.
To do this, we will use the example of a direct mailing company. Let us assume that based on experience, the company knows that the average response rate on its direct mail campaigns is 10%. Let us further make the following assumptions:
* Cost per ad mailed = $1
* Return per response = $50
Additionally, let us assume that the company mails out ads in lots of 10,000. Based on these assumptions, if the company mails out 100,000 ads, a table summarizing the results it would obtain from this campaign is provided below:
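As a rough illustration of the arithmetic behind such a table, the sketch below computes the cumulative results lot by lot using only the assumptions stated above (10% response rate, $1 per ad, $50 per response, lots of 10,000). The function name and the column names are my own and are not taken from SPSS; the profit column is simply revenue minus mailing cost.

```python
# Assumptions from the text: 10% average response rate, $1 cost per ad,
# $50 return per response, ads mailed in lots of 10,000, 100,000 ads in total.
RESPONSE_RATE_PCT = 10
COST_PER_AD = 1
RETURN_PER_RESPONSE = 50
LOT_SIZE = 10_000
TOTAL_ADS = 100_000


def baseline_campaign():
    """Return cumulative results after each lot, with no predictive model."""
    rows = []
    for mailed in range(LOT_SIZE, TOTAL_ADS + 1, LOT_SIZE):
        responses = mailed * RESPONSE_RATE_PCT // 100  # expected responses so far
        cost = mailed * COST_PER_AD
        revenue = responses * RETURN_PER_RESPONSE
        rows.append({
            "mailed": mailed,
            "responses": responses,
            "cost": cost,
            "revenue": revenue,
            "profit": revenue - cost,
        })
    return rows


if __name__ == "__main__":
    for row in baseline_campaign():
        print(row)
```

Without a model, every lot is expected to yield the same number of responses, so the cumulative response count grows in a straight line as more ads are mailed.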
Now let us assume that the company uses SPSS Modeler to develop a predictive model using data from previous campaigns. "Response / No Response" is identified as the "target" fie…
In today's post, we discuss how to create a time series forecast using IBM SPSS Modeler. For the purposes of our exercise, we will use historical sales data at a SKU (stock keeping unit) level. This data is provided in a Microsoft Excel .xlsx file and must be in the following format:
In the image above, AAAAA through EEEEE are SKU numbers with the relevant monthly sales data provided in the respective columns. There is also a column that indicates the grand total of all SKUs sold in a month (AAAAA +...+ EEEEE + other SKUs not shown in the image above). The last column in the image above reflects the months for which historical sales data are provided.
As with any modeling exercise, we first insert a source node into the modeling canvas. Since our data is in the Microsoft Excel .xlsx file format, we insert an Excel source node as follows:
On exporting to a Table node, we see the output display as follows:
We then add a Filter node to select the five SKUs that we will be creati…
Decision Trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (or dependent) variable based on the values of several input (or independent) variables. In today's post, we discuss the CART decision tree methodology. The CART or Classification & Regression Trees methodology was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term to refer to the following types of decision trees:
Classification Trees: where the target variable is categorical and the tree is used to identify the "class" into which the target variable would likely fall.
Regression Trees: where the target variable is continuous and the tree is used to predict its value.
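To make the distinction concrete, here is a minimal sketch of how a single terminal node might produce its prediction in each case. It assumes the common convention that a classification leaf predicts the majority class of the training records that reach it, while a regression leaf predicts their mean. The function names and the sample values are hypothetical and are not taken from SPSS Modeler.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most common class among the records in a leaf (assumed convention)."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the mean target value of the records in a leaf (assumed convention)."""
    return mean(values)


if __name__ == "__main__":
    # Hypothetical records that ended up in the same terminal node.
    print(classification_leaf(["no response", "response", "no response"]))
    print(regression_leaf([10.0, 20.0, 30.0]))
```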
The CART algorithm is structured as a sequence of questions, the answers to which determine what the next question, if any, should be. The result of these questions is a tree-like structure where the ends are terminal node…