Posts

Showing posts from January, 2013

Introduction to Classification & Regression Trees (CART)

Decision trees are commonly used in data mining to build a model that predicts the value of a target (or dependent) variable from the values of several input (or independent) variables. In today's post, we discuss the CART decision tree methodology. CART, or Classification & Regression Trees, was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term for the following types of decision trees: Classification Trees, where the target variable is categorical and the tree is used to identify the "class" into which the target variable would likely fall; and Regression Trees, where the target variable is continuous and the tree is used to predict its value. The CART algorithm is structured as a sequence of questions, the answers to which determine what the next question, if any, should be. The result of these questions is a tree-like structure where the...
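To make the "sequence of questions" idea concrete, here is a minimal sketch in Python. It is not taken from the post: the `Node` class, the `predict` helper, and the feature names, thresholds and leaf values are all hypothetical, and the trees are written by hand rather than learned from data. It only illustrates how each answer decides the next question, for both a classification tree and a regression tree.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    # Internal nodes ask: "is row[feature] <= threshold?"
    feature: Optional[str] = None
    threshold: Optional[float] = None
    yes: Optional["Node"] = None   # branch followed when the answer is yes
    no: Optional["Node"] = None    # branch followed when the answer is no
    # Leaves carry a prediction: a class label or a numeric value.
    prediction: Union[str, float, None] = None


def predict(node: Node, row: dict) -> Union[str, float]:
    """Walk from the root, letting each answer pick the next question."""
    while node.prediction is None:
        node = node.yes if row[node.feature] <= node.threshold else node.no
    return node.prediction


# Hypothetical classification tree: the target is a category.
class_tree = Node(
    "age", 30,
    yes=Node(prediction="low risk"),
    no=Node(
        "income", 50000,
        yes=Node(prediction="high risk"),
        no=Node(prediction="low risk"),
    ),
)

# Hypothetical regression tree: the target is a continuous value.
reg_tree = Node(
    "size_sqft", 1000,
    yes=Node(prediction=150000.0),
    no=Node(prediction=320000.0),
)

print(predict(class_tree, {"age": 40, "income": 40000}))
print(predict(reg_tree, {"size_sqft": 1500}))
```

In an actual CART model the questions and thresholds would be chosen from training data rather than written by hand; this sketch covers only the prediction step.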

Data Mining and Airline Safety

In today's post, we examine the use of data mining to improve airline safety. Over the past several decades, air travel has become, statistically, one of the safest modes of transportation. The following chart shows a substantial decline in the fatal accident rate from 1950 through about 1980, even though the number of departures increased significantly: [Source: Handbook of Statistical Analysis and Data Mining; Nisbet, Elder, Miner, p. 378] Since 1980, however, the decline has leveled off, which probably indicates that new thinking and new safety approaches are needed to push the fatality rate down further. One such approach could be the use of data mining to determine the causes of fatalities so that preventative action may be taken. In this post, we will use publicly available data on airline safety to identify the main causes of accidents and thereafter identify which the main predicto...