Scoring a model in SPSS

In the previous post, "predicting payment days for accounts receivables at hospitals", we created several model nuggets that predict payment days from inputs such as age, income, employment, and marital status, using the observed number of days that were taken for payment.  In today's post, we will use one of those model nuggets to predict the number of days for a new set of data.  This process, called "scoring the model" in SPSS, is really the main objective of the entire modeling exercise.

In order to do this, we first need a new set of data that contains the same fields as the data that was used to develop the model.  An easy way to test this is to reuse the file that was used to train the model, with the observed values removed.  From the image below, you will see that the field "payment days" is missing:


We then run this data through the Auto Data Prep node.  This is important because we used the Auto Data Prep node while creating the model nugget, so the nugget will look for all the same prepared fields when it scores the new data.  We also run the data through the Anomaly node for the same reason.
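To make the idea of "same fields, minus the observed value" concrete, here is a minimal sketch in plain Python rather than in SPSS Modeler itself.  The records, the field values and the helper names (make_scoring_records, missing_fields) are all made up for illustration; only the field name "payment days" comes from the stream described in this post.

```python
# Hypothetical training records; the values are invented for illustration.
training = [
    {"age": 45, "income": 52000, "marital status": "married", "payment days": 30},
    {"age": 31, "income": 38000, "marital status": "single", "payment days": 47},
]

TARGET = "payment days"


def make_scoring_records(records, target):
    """Return copies of the records with the observed target field removed."""
    return [{k: v for k, v in row.items() if k != target} for row in records]


def missing_fields(records, required):
    """List the required fields that are absent from at least one record."""
    return sorted(f for f in required if any(f not in row for row in records))


scoring = make_scoring_records(training, TARGET)

# Assumed list of inputs the model was trained on.
model_inputs = ["age", "income", "marital status"]

print(scoring[0])
print(missing_fields(scoring, model_inputs))
```

An empty list from missing_fields means the scoring data still carries every input the model expects, which is the same check the model nugget effectively performs when it looks for its fields.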

We then attach a filter node to the anomaly node and remove the following input fields since they are not needed for the prediction:


We then attach a select node to remove all the records that were flagged as anomalies.  Having removed the anomalies, we attach a type node where all the fields are treated as inputs; none of the fields is designated as the target.  We are now ready to run this data through the model nugget that we had created using the observed values of payment days.
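The filter, select and type steps above can be sketched in plain Python as follows.  This is only an analogy to what the nodes do, not SPSS Modeler code.  The flag field name "$O-Anomaly", the dropped field "account id" and the sample values are assumptions made for the example.

```python
# Hypothetical records as they might look after the Anomaly node.
# The flag field name "$O-Anomaly" and the "account id" field are assumptions.
records = [
    {"account id": 1, "age": 45, "income": 52000, "$O-Anomaly": "F"},
    {"account id": 2, "age": 99, "income": 900000, "$O-Anomaly": "T"},
    {"account id": 3, "age": 31, "income": 38000, "$O-Anomaly": "F"},
]

DROP_FIELDS = {"account id"}  # fields not needed for the prediction (assumed)
ANOMALY_FLAG = "$O-Anomaly"


def filter_fields(rows, drop):
    """Filter step: remove the unwanted fields from every record."""
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def select_normal(rows, flag):
    """Select step: keep only the records that are not flagged as anomalies."""
    return [row for row in rows if row[flag] != "T"]


filtered = filter_fields(records, DROP_FIELDS)
clean = select_normal(filtered, ANOMALY_FLAG)

# Type step: every remaining field is an input and nothing is the target.
roles = {field: "input" for field in clean[0]}

print(len(clean))
print(roles)
```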

In order to do that, we copy the model nugget from the original stream and paste it on our current modeling canvas.  We then attach a table node to the model nugget and run the table node.  The results are as follows:


As can be seen from the image above, the right-most column of the table contains a field called $R-payment days.  This field is generated by the model nugget and contains the model's predictions for the number of payment days.
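What the model nugget does to each record can be pictured with a small Python sketch: every input record passes through unchanged and one prediction field is appended at the end.  The stand_in_model function below is a made-up rule used purely as a placeholder, not the model from the previous post, and the input values are invented.

```python
def stand_in_model(row):
    """Hypothetical stand-in for the trained model; not the real nugget."""
    return 20 + row["age"] // 2


def score(rows, predict, target="payment days"):
    """Return copies of the rows with the prediction appended as '$R-<target>'."""
    out_field = "$R-" + target
    return [{**row, out_field: predict(row)} for row in rows]


# Hypothetical new records with no observed payment days.
new_data = [
    {"age": 45, "income": 52000},
    {"age": 30, "income": 38000},
]

scored = score(new_data, stand_in_model)
for row in scored:
    print(row)
```

Because the prediction is added last, it shows up as the right-most column when the scored records are laid out as a table, just like in the table node output.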
