Greetings! Welcome to Venky Rao's blog on Predictive Analytics, Geospatial Analytics and Visualization. This blog aims to present interesting analysis of geospatial data and to de-mystify predictive analytics for the layman. My blog is featured on: http://www.kdnuggets.com/ - Analytics and Data Mining Resources.
I created a web app the mapped recent earthquakes in the world based on data from the US Geological Survey (http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.csv). To understand the potential impact of these earthquakes, I added a layer that showed the population of the world's major cities (Esri's World's Cities layer). Some countries with large populations (China, India, Indonesia) lie in areas that are prone to earthquakes. Some less populated countries (eg New Zealand) are also in earthquake prone areas. The other observation is that there is significant and regular seismic activity in the Indian and Pacific Oceans especially when compared to the Atlantic Ocean. Here is the web map that I created:
I also created an interactive web app based on this data. To view the web app in all its glory, go here: http://arcg.is/2llj9Qd
Lift and Gain Charts are a useful way of visualizing how good a predictive model is. In SPSS, a typical gain chart appears as follows:
In today's post, we will attempt to understand the logic behind generating a gain chart and then discuss how gain and lift charts are interpreted.
To do this, we will use the example of a direct mailing company. Let us assume that based on experience, the company knows that the average response rate on its direct mail campaigns is 10%. Let us further make the following assumptions:
* Cost per ad mailed = $1 * Return per response = $50
Additionally, let us assume that the company mails out ads in lots of 10,000. Based on these assumptions, if the company mails out 100,000 ads, a table summarizing the results it would obtain from this campaign is provided below:
Now let us assume that the company uses SPSS Modeler to develop a predictive model using data from previous campaigns. "Response / No Response" is identified as the "target" fie…
In today's post, we discuss how to create a time series forecast using IBM SPSS Modeler. For the purposes of our exercise, we will use historical sales data at a SKU (stock keeping unit) level. This data is provided in a MicroSoft Excel .xlsx file and must be in the following format:
In the image above, AAAAA through EEEEE are SKU numbers with the relevant monthly sales data provided in the respective columns. There is also a column that indicates the grand total of all SKUs sold in a month (AAAAA +...+ EEEEE + other SKUs not shown in the image above). The last column in the image above reflects the months for which historical sales data are provided.
As with any modeling exercise, we first insert a source node into the modeling canvas. Since our data is in the MicroSoft Excel .xlsx file format, we insert an Excel source node as follows:
On exporting to a Table node, we see the output display as follows:
We then add a Filter node to select the five SKUs that we will be creati…
Decision Trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (or dependent variable) based on the values of several input (or independent variables). In today's post, we discuss the CART decision tree methodology. The CART or Classification & Regression Trees methodology was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term to refer to the following types of decision trees:
Classification Trees: where the target variable is categorical and the tree is used to identify the "class" within which a target variable would likely fall into. Regression Trees: where the target variable is continuous and tree is used to predict it's value.
The CART algorithm is structured as a sequence of questions, the answers to which determine what the next question, if any should be. The result of these questions is a tree like structure where the ends are terminal node…