Sunday, September 29, 2013

Breaking down the AP Calc dataset by Demographic Data

I wanted to see which demographic variables classify a county as having AP Calculus in all of the years 2008-2013.  I decided to use the C50 package in R to analyze the data since I was having little luck with logistic regression.  Logistic regression output is hard to interpret, but decision trees are very easy to interpret.  The C5.0 algorithm slices and dices the data and comes up with simple rules that sort the counties into groups.  Part of the output from this algorithm follows:

pop > 44542: TRUE (20)
pop <= 44542:
:...pop <= 16132: FALSE (36/3)
    pop > 16132:
    :...pHouse > 2.576285: FALSE (5)
        pHouse <= 2.576285:
        :...pctFreeRed <= 28.7193: TRUE (8)
            pctFreeRed > 28.7193:
            :...PCIncome > 30964: TRUE (3)
                PCIncome <= 30964:
                :...pctNoFather <= 17.9: FALSE (10)
                    pctNoFather > 17.9:
                    :...pSQM <= 18.2: FALSE (2)
                        pSQM > 18.2: TRUE (3)

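A TRUE means the rule classifies the county as having AP Calculus, and a FALSE means it does not.  The numbers in parentheses give how many counties landed in that leaf and, after the slash, how many of those were misclassified, so (36/3) means 36 counties with 3 of them misclassified.  Maps and commentary are after the jump.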
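To make the rules concrete, here is a minimal sketch that writes the printed tree out as a plain Python function. The thresholds and variable names are copied from the output above; the function name, the leaf list, and the example county values are made up for illustration and are not part of the original R analysis.

```python
def has_ap_calc(pop, pHouse, pctFreeRed, PCIncome, pctNoFather, pSQM):
    """Follow the printed C5.0 tree from the root down to a leaf."""
    if pop > 44542:
        return True
    if pop <= 16132:
        return False
    # From here on, 16132 < pop <= 44542
    if pHouse > 2.576285:
        return False
    if pctFreeRed <= 28.7193:
        return True
    if PCIncome > 30964:
        return True
    if pctNoFather <= 17.9:
        return False
    return pSQM > 18.2


# (cases, misclassified) for each leaf, in the order printed above
leaves = [(20, 0), (36, 3), (5, 0), (8, 0), (3, 0), (10, 0), (2, 0), (3, 0)]
total_cases = sum(n for n, _ in leaves)
total_errors = sum(e for _, e in leaves)

# Hypothetical county: mid-sized population, low free/reduced lunch percentage
example = has_ap_calc(pop=30000, pHouse=2.4, pctFreeRed=20.0,
                      PCIncome=28000, pctNoFather=15.0, pSQM=25.0)
print(total_cases, total_errors, example)
```

Adding up the leaf counts is a quick sanity check that every county in the training data ends up in exactly one leaf.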

Saturday, September 14, 2013

Visualizing AP Calculus by county in Minnesota

When I was first thinking about this data, I expected that each year at least one county would add AP Calculus and no county would lose it.  That turns out not to be the case.  I'm guessing that state budget problems and district-level property tax levies play a bigger role than anything else.

2009: Add Carlton, Redwood, Traverse.  Drop Isanti.
2010: Add Itasca, Nicollet, Nobles, Swift.  Drop Sibley, Mower, Watonwan, Waseca, Rock, Lincoln, Mille Lacs.
2011: Add Sibley, Mower, Waseca, Mille Lacs.  Drop Meeker, Swift, Beltrami, Cass, Polk.
2012: Add Rock, Swift, Kanabec, Beltrami, Polk.  Drop Itasca, Freeborn, Mille Lacs, Koochiching.
2013: Add Cook, Itasca, Mille Lacs, Roseau.  Drop Sibley, Rock, Pipestone, Chippewa, Renville, Traverse.
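The add and drop lists are easier to compare once they are tallied. The sketch below is only an illustration: the county names are typed in by hand from the list above, and the dictionary layout and variable names are my own choices, not part of the original analysis. It computes the net change per year and finds the counties that appear on both an add list and a drop list at some point.

```python
# year -> (counties added, counties dropped), copied from the list above
changes = {
    2009: (["Carlton", "Redwood", "Traverse"], ["Isanti"]),
    2010: (["Itasca", "Nicollet", "Nobles", "Swift"],
           ["Sibley", "Mower", "Watonwan", "Waseca", "Rock", "Lincoln", "Mille Lacs"]),
    2011: (["Sibley", "Mower", "Waseca", "Mille Lacs"],
           ["Meeker", "Swift", "Beltrami", "Cass", "Polk"]),
    2012: (["Rock", "Swift", "Kanabec", "Beltrami", "Polk"],
           ["Itasca", "Freeborn", "Mille Lacs", "Koochiching"]),
    2013: (["Cook", "Itasca", "Mille Lacs", "Roseau"],
           ["Sibley", "Rock", "Pipestone", "Chippewa", "Renville", "Traverse"]),
}

# Net change in the number of counties with AP Calculus, per year
net = {year: len(added) - len(dropped) for year, (added, dropped) in changes.items()}

# Counties that were both added and dropped somewhere in 2009-2013
ever_added = {county for added, _ in changes.values() for county in added}
ever_dropped = {county for _, dropped in changes.values() for county in dropped}
churned = sorted(ever_added & ever_dropped)

for year, delta in net.items():
    print(year, f"{delta:+d}")
print("Added and dropped at some point:", ", ".join(churned))
```

Three of the five years come out with a net loss, which fits the point that coverage does not simply grow year over year.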


Tuesday, September 10, 2013

Polynomial curve fitting

I am working through Pattern Recognition and Machine Learning by Christopher Bishop and made these graphs of the least-squares solutions to various polynomial fitting problems.
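For readers who want to try a fit like this themselves, here is a minimal NumPy sketch of least-squares polynomial fitting. It assumes a setup along the lines of the book's introductory example (a handful of noisy samples of sin(2πx)); the noise level, the random seed, the degrees tried, and the function names are my own choices for illustration and are not the code behind the graphs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: 10 noisy samples of sin(2*pi*x) on [0, 1]
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)


def fit_polynomial(x, t, degree):
    """Return weights w minimizing sum((A @ w - t)**2), lowest power first."""
    A = np.vander(x, degree + 1, increasing=True)  # columns x^0 ... x^degree
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w


def rms_error(w, x, t):
    """Root-mean-square error of the fitted polynomial on (x, t)."""
    A = np.vander(x, len(w), increasing=True)
    return float(np.sqrt(np.mean((A @ w - t) ** 2)))


# Training error for a few polynomial degrees
train_rms = {d: rms_error(fit_polynomial(x, t, d), x, t) for d in (0, 1, 3, 9)}
for degree, err in train_rms.items():
    print(f"degree {degree}: training RMS error {err:.4f}")
```

With 10 points, the degree 9 polynomial passes through every sample, so its training error is essentially zero even though the curve itself swings wildly between the points.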




Classification of whether a county has AP Calculus (Part 1)

I used Python's NumPy to get some information about how employment, creative jobs, arts jobs, and the ratios of creative jobs and arts jobs to employment influence whether a county has AP Calculus offered within its borders.

The dataset is listed as Creative Class County Codes on Data.gov.  I used the same zip codes from the earlier project and converted them to counties using a spreadsheet downloaded from UnitedStatesZipCodes.org.  The code follows: