Showing posts with label C50. Show all posts

Sunday, September 29, 2013

Breaking down the AP Calc dataset by Demographic Data

I wanted to look at which demographic variables classify a county as having AP Calculus in all of the years 2008-2013.  I decided to use the C50 package in R, which builds C5.0 decision trees, since I was having little luck with logistic regression.  Logistic regression output can be hard to interpret, whereas a decision tree is very easy to read: the algorithm slices and dices the data and comes up with simple rules that sort the counties into groups.  Part of the output from this algorithm follows:

pop > 44542: TRUE (20)
pop <= 44542:
:...pop <= 16132: FALSE (36/3)
    pop > 16132:
    :...pHouse > 2.576285: FALSE (5)
        pHouse <= 2.576285:
        :...pctFreeRed <= 28.7193: TRUE (8)
            pctFreeRed > 28.7193:
            :...PCIncome > 30964: TRUE (3)
                PCIncome <= 30964:
                :...pctNoFather <= 17.9: FALSE (10)
                    pctNoFather > 17.9:
                    :...pSQM <= 18.2: FALSE (2)
                        pSQM > 18.2: TRUE (3)
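
Read as code, the tree is just a chain of nested threshold tests. The following is a minimal Python sketch of that reading, not the C50 package itself: the function name `has_ap_calculus` and the `LEAVES` list are hypothetical names introduced here, while the variable names, thresholds, and leaf counts are copied from the output above. It assumes the tree shown is the complete tree.

```python
def has_ap_calculus(pop, pHouse, pctFreeRed, PCIncome, pctNoFather, pSQM):
    """Walk the printed tree from root to leaf and return the leaf's label."""
    if pop > 44542:
        return True           # TRUE (20)
    if pop <= 16132:
        return False          # FALSE (36/3)
    # From here on, 16132 < pop <= 44542.
    if pHouse > 2.576285:
        return False          # FALSE (5)
    if pctFreeRed <= 28.7193:
        return True           # TRUE (8)
    if PCIncome > 30964:
        return True           # TRUE (3)
    if pctNoFather <= 17.9:
        return False          # FALSE (10)
    # Last split: FALSE (2) when pSQM <= 18.2, TRUE (3) otherwise.
    return pSQM > 18.2


# (label, cases at the leaf, cases misclassified), one entry per leaf,
# in the order the leaves appear in the printed tree.
LEAVES = [
    (True, 20, 0),
    (False, 36, 3),
    (False, 5, 0),
    (True, 8, 0),
    (True, 3, 0),
    (False, 10, 0),
    (False, 2, 0),
    (True, 3, 0),
]

total_cases = sum(n for _, n, _ in LEAVES)
total_errors = sum(e for _, _, e in LEAVES)
training_error = total_errors / total_cases
```

Summing the leaf counts this way gives 87 cases with 3 misclassified, so the tree as printed gets about 3.4% of the data it was built on wrong. That is an error rate on the training data only and says nothing about how the rules would do on new counties.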

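A TRUE at the end of a rule means that rule classifies the county as having AP Calculus; a FALSE means it does not.  The numbers in parentheses give how many counties the rule covers and, after the slash, how many of those were misclassified, so (36/3) means 36 counties were covered and 3 of them were misclassified.  Maps and commentary are after the jump.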