Predicting payment days for accounts receivables at hospitals - part II
In part I of this post, we were able to improve the accuracy of the predictive model by running the data through the auto data prep node. As a result, we were able to obtain a linear correlation of 0.872 between "payment days" and "$XR-payment days". In this post, we will try to further improve the accuracy of the model.
We first attach an Anomaly Detection node to the type node. Anomaly detection models are used to identify outliers, or unusual cases, in the data. Unlike other modeling methods that store rules about unusual cases, anomaly detection models store information on what normal behavior looks like. This makes it possible to identify outliers even if they do not conform to any known pattern, and it can be particularly useful in applications, such as fraud detection, where new patterns may constantly be emerging. Anomaly detection is an unsupervised method, which means that it does not require a training dataset containing known cases of fraud to use as a starting point (Source: IBM SPSS Help).
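To make the idea concrete outside of SPSS Modeler, here is a minimal Python sketch of flagging and discarding unusual records. It is not the algorithm the Anomaly Detection node uses; it swaps in a much simpler average z-score as the anomaly index. The function names, the threshold, and the sample records are all hypothetical.

```python
from statistics import mean, pstdev


def anomaly_index(rows):
    """Score each row by its mean absolute z-score across all columns.

    "Normal" behavior is summarized by each column's mean and standard
    deviation; no labeled examples of anomalies are needed.
    """
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return [
        mean(abs(v - m) / s for v, m, s in zip(row, mus, sigmas))
        for row in rows
    ]


def drop_anomalies(rows, threshold=1.5):
    """Keep only rows whose anomaly index is at or below the threshold."""
    scores = anomaly_index(rows)
    return [row for row, score in zip(rows, scores) if score <= threshold]


# Hypothetical records: (invoice amount, payment days).
records = [
    (1200, 30), (1100, 28), (1300, 33),
    (1250, 31), (1150, 29), (9000, 180),
]
clean = drop_anomalies(records)
print(clean)
```

The threshold plays the same role as the cutoff you would set on the anomaly index in a real tool: lower values discard more records.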
After running the Anomaly Detection node and identifying the records that it flags as anomalies, we then attach a Select node to discard those records as follows:
We then attach an Auto Numeric node and re-run the model. The results are as follows:
The linear correlation between "payment days" and "$XR-payment days" is now 0.896, which is better than the 0.872 observed previously.
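For readers who want to check this kind of number by hand, the linear correlation reported here is, as far as I understand it, the Pearson correlation between the actual and predicted values. Below is a small Python sketch of that calculation; the `pearson` function and the sample values are hypothetical and are not taken from the hospital data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical actual vs. predicted payment days.
actual = [30, 45, 60, 20, 90]
predicted = [32, 40, 65, 25, 80]
print(round(pearson(actual, predicted), 3))
```

A value close to 1 means the predictions rise and fall with the actual payment days; it does not by itself say how large the prediction errors are.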
What else can we do? One thing that comes to mind is to replace the Auto Numeric node with specific modeling nodes to see if a more accurate model can be created. To do this, we attach the following modeling nodes: Neural Net, Linear Model, C&RT, CHAID and Generalized Linear. We obtain the following results:
1) Neural Net
2) Linear Model
3) C&RT
4) CHAID
5) Generalized Linear
From the above results, the model with the highest accuracy is the Neural Net model, where the linear correlation between "payment days" and "$XR-payment days" is now 0.909, which is better than the 0.896 observed previously.
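The comparison above amounts to scoring each model's predictions against the actual payment days and picking the highest correlation. A rough Python sketch of that ranking step follows. The `rank_models` helper is hypothetical, and the predictions are made-up toy numbers used only to show the mechanics; they are not the results from the five models in this post.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )


def rank_models(actual, predictions_by_model):
    """Return (model name, correlation) pairs, best correlation first."""
    scored = [
        (name, pearson(actual, preds))
        for name, preds in predictions_by_model.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data only: hypothetical actual values and per-model predictions.
actual = [30, 45, 60, 20, 90]
toy_predictions = {
    "model_a": [31, 44, 62, 22, 88],
    "model_b": [40, 42, 50, 35, 60],
    "model_c": [25, 60, 40, 30, 70],
}
for name, corr in rank_models(actual, toy_predictions):
    print(f"{name}: {corr:.3f}")
```

Ranking on a single correlation figure is a quick screen. Comparing models on held-out data would give a fairer picture of which one generalizes best.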
Hi, nice post. Meanwhile, do you have any idea on reading web feeds via SPSS?
Hi S@j, thanks for your comment. What is the context of your question around web feeds?