Greetings! Welcome to Venky Rao's blog on Predictive Analytics, Geospatial Analytics and Visualization. This blog aims to present interesting analysis of geospatial data and to de-mystify predictive analytics for the layman. My blog is featured on: http://www.kdnuggets.com/ - Analytics and Data Mining Resources.
US City-wise Population Growth and Unemployment
I conducted some basic analysis of US population growth in the 50 most populated US cities since 2010 and compared that with the unemployment rate in these cities. Not surprisingly, there were some clear correlations: cities with high unemployment rates saw low population growth and vice versa. Some highlights from the analysis:
The four cities with the most population growth since 2010 are Austin, Denver, New Orleans and Charlotte.
The two cities with negative population growth since 2010 are Detroit and Cleveland.
Lift and Gain Charts are a useful way of visualizing how good a predictive model is. In SPSS, a typical gain chart appears as follows:
In today's post, we will attempt to understand the logic behind generating a gain chart and then discuss how gain and lift charts are interpreted.
To do this, we will use the example of a direct mailing company. Let us assume that based on experience, the company knows that the average response rate on its direct mail campaigns is 10%. Let us further make the following assumptions:
* Cost per ad mailed = $1
* Return per response = $50
Additionally, let us assume that the company mails out ads in lots of 10,000. Based on these assumptions, if the company mails out 100,000 ads, a table summarizing the results it would obtain from this campaign is provided below:
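As a rough sketch, the arithmetic behind such a table can be reproduced in a few lines of Python. Only the 10% response rate, the $1 cost, the $50 return and the lot sizes come from the assumptions above; the column names, the `baseline_table` helper and the idea that responses arrive evenly across lots when no model is used are my own assumptions for illustration.

```python
# Baseline (no predictive model) campaign economics.
# Figures taken from the stated assumptions in the post:
RESPONSE_RATE = 0.10       # average response rate
COST_PER_AD = 1            # dollars per ad mailed
RETURN_PER_RESPONSE = 50   # dollars per response
LOT_SIZE = 10_000          # ads mailed per lot
TOTAL_ADS = 100_000        # ads mailed in the whole campaign


def baseline_table(total_ads=TOTAL_ADS, lot_size=LOT_SIZE,
                   response_rate=RESPONSE_RATE,
                   cost_per_ad=COST_PER_AD,
                   return_per_response=RETURN_PER_RESPONSE):
    """Return cumulative results after each lot.

    Assumption (not from the post): without a model, every lot
    responds at the same average rate.
    """
    rows = []
    for mailed in range(lot_size, total_ads + 1, lot_size):
        responses = round(mailed * response_rate)
        cost = mailed * cost_per_ad
        revenue = responses * return_per_response
        rows.append({
            "ads_mailed": mailed,
            "responses": responses,
            "cost": cost,
            "revenue": revenue,
            "profit": revenue - cost,
        })
    return rows


for row in baseline_table():
    print(row)
```

Under these assumptions the cumulative response count grows in a straight line with the number of ads mailed, which is the diagonal baseline that a gain chart compares a model against.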
Now let us assume that the company uses SPSS Modeler to develop a predictive model using data from previous campaigns. "Response / No Response" is identified as the "target" fie…
In today's post, we discuss how to create a time series forecast using IBM SPSS Modeler. For the purposes of our exercise, we will use historical sales data at a SKU (stock keeping unit) level. This data is provided in a Microsoft Excel .xlsx file and must be in the following format:
In the image above, AAAAA through EEEEE are SKU numbers with the relevant monthly sales data provided in the respective columns. There is also a column that indicates the grand total of all SKUs sold in a month (AAAAA +...+ EEEEE + other SKUs not shown in the image above). The last column in the image above reflects the months for which historical sales data are provided.
As with any modeling exercise, we first insert a source node into the modeling canvas. Since our data is in the Microsoft Excel .xlsx file format, we insert an Excel source node as follows:
On exporting to a Table node, we see the output display as follows:
We then add a Filter node to select the five SKUs that we will be creati…
Using ArcGIS Online and some simple instructions from Dr. Pinde Fu of Esri, I re-created a simple web app for selecting restaurant locations in the USA. This web app lets users choose between two competing locations for opening a full-service restaurant, using analytics capabilities such as drive time based on historical traffic patterns and the latest demographic information for each location, including population, disposable income, etc.
Here is a screenshot of the service area analysis for one of the locations, based on a 15-minute drive time under typical traffic at 6 pm on a Friday: