Comparison of Data Mining and Statistical Techniques for Classification Model

Author: Lahiri, Rochana

Advisor:

Educational level: Master of Science (M.S.)

Discipline: Information Systems & Decision Sciences (Business Administration)

University: Louisiana State University

Abstract: The purpose of this study is to observe the performance of three statistical and data mining classification models, namely logistic regression, decision tree, and neural network models, for different sample sizes and sampling methods on three sets of data. It is a 3 by 2 by 3 by 8 study in which each statistical or data mining method is used to build a model for each of eight sample sizes and two sampling methods. The effect of sample size on the overall performance of each model against two sets of test data is observed and compared. For a given dataset, none of the three methods was found to outperform the others; their performances were comparable. This is in contrast to many of the existing studies cited in the literature review chapter of this thesis. The absolute value of prediction accuracy, however, varied among the three datasets, indicating that the data distribution and data characteristics play a role in the actual prediction accuracy, especially the ratio of the binary values of the dependent variable in the training dataset and in the population. The models built for each combination of sample size, sampling method, and classification method were run on two sets of test data to check whether the prediction accuracy was replicated. In every case, the prediction accuracy was replicated across the test datasets.
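
To make the "3 by 2 by 3 by 8" design concrete, the following minimal Python sketch enumerates the experimental cells implied by the abstract. The abstract does not name the two sampling methods, the three datasets, or the eight sample sizes, so the labels below (`sampling_A`, `dataset_1`, `size_1`, and so on) are hypothetical placeholders, not values from the thesis.

```python
from itertools import product

# The three classification methods named in the abstract.
METHODS = ["logistic_regression", "decision_tree", "neural_network"]

# Placeholder labels: the abstract gives only the counts (2, 3, 8),
# not the actual sampling methods, datasets, or sample sizes.
SAMPLING_METHODS = ["sampling_A", "sampling_B"]
DATASETS = ["dataset_1", "dataset_2", "dataset_3"]
SAMPLE_SIZE_LEVELS = [f"size_{i}" for i in range(1, 9)]

# The abstract states that each model is run on two sets of test data.
TEST_SETS = ["test_1", "test_2"]


def design_cells():
    """Return every (method, sampling, dataset, size) combination."""
    return list(product(METHODS, SAMPLING_METHODS, DATASETS, SAMPLE_SIZE_LEVELS))


cells = design_cells()
evaluations = [(cell, test_set) for cell in cells for test_set in TEST_SETS]

print(len(cells))        # number of models built: 3 * 2 * 3 * 8
print(len(evaluations))  # each model scored on both test sets
```

Under these assumptions the design yields 144 models and 288 model-by-test-set evaluations; comparing the paired accuracies within each cell is one way to check the replication the abstract reports.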


Keywords: sample size, classification, data mining







More from Louisiana State University

'Baleful Weeds and Precious-Juiced Flowers': Romeo and Juliet and Renaissance Medical Discourse

Abstract: This thesis claims that Shakespeare exaggerated the characterization of two figures in Romeo and Juliet, Friar Laurence and the apothecary, to make a statement about the conditions of medical treatment in sixteenth-century London. These two figures represent two very different approaches to healing, one that is informed with...

"A Kind Providence" and "The Right to Self Preservation": How Andrew Jackson, Emersonian Whiggery, and Frontier Calvinism Shaped the Course of American Political Culture

Abstract: Andrew Jackson has inspired numerous biographies and works of historical scholarship, but his religious views have attracted very little attention. Jackson may have been a giant on the political landscape, but he was also a human being, an ordinary American who experienced the same difficulties and challenges as other...

"Above the Noise and the Glory:" Tiers of Propaganda in Great War Literature

Abstract: "Above the Noise and the Glory:" Tiers of Propaganda in Great War Literature illuminates the literary responses of Rupert Brooke, Mary Borden, Alice Dunbar-Nelson, and Willa Cather to the manner in which the threat to one's cultural community, as well as personal and physical landscape, transforms a nation's, and...

More related to Information Systems & Decision Sciences (Business Administration)

An Integrative Model of Clients' Decision to Adopt an Application Service Provider

Abstract: Application Service Providers (ASPs) exploit the economics of delivering commercial off-the-shelf software over the Internet to many dispersed users, but the decision-making process to adopt the ASP business model can be complex, requiring a comprehensive consideration of various factors. As a new form of outsourcing, the ASP business model differs...

An Investigation of the Factors That Influence Electronic Information Sharing between State and Local Agencies

Abstract: This study investigates the factors that influence local government participation in electronic information sharing with state agencies. Although electronic information sharing has the potential to help government agencies to increase productivity and performance, improve policy-making and provide better public services to the citizens, there is still little information available...
