Data understanding, part 1
In today's post we will focus on "data understanding", a crucial aspect of every data mining project. Data understanding comes immediately after business understanding in the CRISP-DM methodology. Per IBM SPSS Modeler Help, the data understanding phase of CRISP-DM involves taking a closer look at the data available for mining: accessing the data and exploring it using tables and graphics. This enables you to determine the quality of the data and to describe the results of these steps in the project documentation.

To get started, I used a CSV file that was sent to me recently. I dragged a Var. File node onto the modeling canvas, attached the CSV file to that node, and then output the results to a Table node. On reviewing the results, it was clear that the CSV format was not working as desired: the data was not landing in the same columns as in the source file. I then saved the source file as an Excel Workbook (2007, 2010) and repeat