I wanted to see which demographic variables classify a county as having offered AP Calculus in every year from 2008 to 2013. I decided to use the C50 package in R to analyze the data since I was having little luck with logistic regression. I find logistic regression output hard to interpret, while decision trees are very easy to read: the algorithm slices and dices the data and comes up with simple rules that sort the counties into groups. Part of the output from this algorithm follows:
pop > 44542: TRUE (20)
pop <= 44542:
:...pop <= 16132: FALSE (36/3)
    pop > 16132:
    :...pHouse > 2.576285: FALSE (5)
        pHouse <= 2.576285:
        :...pctFreeRed <= 28.7193: TRUE (8)
            pctFreeRed > 28.7193:
            :...PCIncome > 30964: TRUE (3)
                PCIncome <= 30964:
                :...pctNoFather <= 17.9: FALSE (10)
                    pctNoFather > 17.9:
                    :...pSQM <= 18.2: FALSE (2)
                        pSQM > 18.2: TRUE (3)
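A TRUE means the rule classifies the county as having AP Calculus. The numbers in parentheses give how many counties the rule covers and, after the slash, how many of those were misclassified; (36/3), for example, means 36 counties covered with 3 misclassified. Maps and commentary are after the jump.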
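Read top to bottom, the printed tree is just a chain of if/else tests. Here is a minimal Python sketch of that reading. The function name is made up for illustration, the argument names are copied from the output above, and the sample values in the usage lines are invented rather than taken from any real county.

```python
def has_ap_calculus(pop, pHouse, pctFreeRed, PCIncome, pctNoFather, pSQM):
    """Follow the printed C5.0 rules from the top; return True or False."""
    if pop > 44542:
        return True
    if pop <= 16132:
        return False
    # From here on, 16132 < pop <= 44542.
    if pHouse > 2.576285:
        return False
    if pctFreeRed <= 28.7193:
        return True
    if PCIncome > 30964:
        return True
    if pctNoFather <= 17.9:
        return False
    # Last split in the printed output.
    return pSQM > 18.2


# Invented example values, only to show how a county falls through the rules.
print(has_ap_calculus(pop=50000, pHouse=3.0, pctFreeRed=50.0,
                      PCIncome=20000, pctNoFather=10.0, pSQM=5.0))
print(has_ap_calculus(pop=30000, pHouse=2.4, pctFreeRed=35.0,
                      PCIncome=25000, pctNoFather=20.0, pSQM=18.0))
```

The first call stops at the very first rule because the population is above 44542; the second walks all the way down to the pSQM split.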
Sunday, September 29, 2013
Saturday, September 14, 2013
Visualizing AP Calculus by county in Minnesota
When I first started thinking about this data, I expected that each year at least one county would add AP Calculus and no county would lose it. That turns out not to be the case. I'm guessing that state budget problems and district-level property tax levies play a bigger role than anything else.
2009: Add Carlton, Redwood, Traverse. Drop Isanti.
2010: Add Itasca, Nicollet, Nobles, Swift. Drop Sibley, Mower, Watonwan, Waseca, Rock, Lincoln, Mille Lacs.
2011: Add Sibley, Mower, Waseca, Mille Lacs. Drop Meeker, Swift, Beltrami, Cass, Polk.
2012: Add Rock, Swift, Kanabec, Beltrami, Polk. Drop Itasca, Freeborn, Mille Lacs, Koochiching.
2013: Add Cook, Itasca, Mille Lacs, Roseau. Drop Sibley, Rock, Pipestone, Chippewa, Renville, Traverse.
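To put a number on the back and forth, here is a small Python sketch that tallies the net change per year from the lists above. The dictionary layout and variable names are my own choice for illustration; the county names are copied from the lists.

```python
# Adds and drops per year, copied from the lists above.
changes = {
    2009: {"add": ["Carlton", "Redwood", "Traverse"],
           "drop": ["Isanti"]},
    2010: {"add": ["Itasca", "Nicollet", "Nobles", "Swift"],
           "drop": ["Sibley", "Mower", "Watonwan", "Waseca", "Rock",
                    "Lincoln", "Mille Lacs"]},
    2011: {"add": ["Sibley", "Mower", "Waseca", "Mille Lacs"],
           "drop": ["Meeker", "Swift", "Beltrami", "Cass", "Polk"]},
    2012: {"add": ["Rock", "Swift", "Kanabec", "Beltrami", "Polk"],
           "drop": ["Itasca", "Freeborn", "Mille Lacs", "Koochiching"]},
    2013: {"add": ["Cook", "Itasca", "Mille Lacs", "Roseau"],
           "drop": ["Sibley", "Rock", "Pipestone", "Chippewa", "Renville",
                    "Traverse"]},
}

# Net change = counties added minus counties dropped in that year.
net = {year: len(c["add"]) - len(c["drop"]) for year, c in changes.items()}

for year, delta in net.items():
    print(year, f"{delta:+d}")
print("total", f"{sum(net.values()):+d}")
```

Three of the five years come out negative, which fits the point that counties lose AP Calculus about as readily as they gain it.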
Tuesday, September 10, 2013
Polynomial curve fitting
I am working through Pattern Recognition and Machine Learning by Christopher Bishop and made these graphs of the least-squares solutions to various polynomial fitting problems.
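For anyone who wants to try the same kind of fit, here is a minimal NumPy sketch of least-squares polynomial fitting. It is not the code behind the graphs: the noisy sine data, the random seed, the polynomial degrees, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data (an assumption): a noisy sine curve sampled at 10 points.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)


def fit_polynomial(x, t, degree):
    """Return least-squares coefficients w, with w[0] the constant term."""
    # Design matrix with columns x**0, x**1, ..., x**degree.
    A = np.vander(x, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w


def rms_error(x, t, w):
    """Root-mean-square error of the polynomial w on the points (x, t)."""
    A = np.vander(x, len(w), increasing=True)
    return np.sqrt(np.mean((A @ w - t) ** 2))


for degree in (0, 1, 3, 9):
    w = fit_polynomial(x, t, degree)
    print(degree, rms_error(x, t, w))
```

The training error shrinks as the degree grows, and a degree-9 polynomial passes through all 10 points, which is the overfitting that the higher-order curves tend to show.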
Classification of whether a county has AP Calculus (Part 1)
I used Python's NumPy to look at how employment, creative jobs, arts jobs, and the ratios of creative jobs and arts jobs to employment influence whether a county has AP Calculus offered within its borders.
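The labels on this post mention a logistic cost function, so as a rough illustration here is a minimal NumPy sketch of one. It is not the post's own code: the function names, the single made-up feature (a ratio of creative jobs to employment), and the four data rows are all assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def logistic_cost(theta, X, y):
    """Mean cross-entropy cost of logistic regression with weights theta."""
    p = sigmoid(X @ theta)
    # Clip so that log() never sees exactly 0 or 1.
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


# Made-up data: a column of ones for the intercept plus one ratio feature.
X = np.array([[1.0, 0.10],
              [1.0, 0.15],
              [1.0, 0.25],
              [1.0, 0.30]])
y = np.array([0.0, 0.0, 1.0, 1.0])  # 1 = county has AP Calculus (invented)

print(logistic_cost(np.zeros(2), X, y))            # all-zero weights
print(logistic_cost(np.array([-4.0, 20.0]), X, y)) # weights that separate the rows
```

With all-zero weights every prediction is 0.5, so the cost equals ln 2; weights that push the two groups apart give a lower cost, and that is the quantity a fitting routine would minimize.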
The data set is called Creative Class County Codes and comes from Data.gov. I used the same ZIP codes as in the earlier project and converted them to counties using a spreadsheet downloaded from UnitedStatesZipCodes.org. The code follows:
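As a stand-in sketch of the ZIP-to-county step (not the original code), the lookup can be done with Python's csv module. The column names "zip" and "county", the three sample rows, and the list of school ZIP codes are assumptions made for illustration; a real run would read the downloaded spreadsheet instead of the inline sample.

```python
import csv
import io

# Tiny inline stand-in for the downloaded spreadsheet (assumed column names).
SAMPLE_CSV = """zip,county
55401,Hennepin County
55101,Ramsey County
55802,St. Louis County
"""


def load_zip_to_county(csv_file):
    """Build a {zip: county} lookup from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {row["zip"]: row["county"] for row in reader}


zip_to_county = load_zip_to_county(io.StringIO(SAMPLE_CSV))

# Hypothetical ZIP codes of schools offering AP Calculus.
school_zips = ["55401", "55802", "55401"]

# A county counts as having AP Calculus if any of its ZIP codes appears.
counties_with_ap = {zip_to_county[z] for z in school_zips if z in zip_to_county}
print(sorted(counties_with_ap))
```

ZIP codes are kept as strings so that leading zeros survive, and ZIP codes missing from the lookup are skipped rather than raising an error.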
Labels: AP Calculus, logistic cost function, Python, Richard Florida