Friday, November 1, 2013

Socioeconomic model of whether Minnesota counties had AP Calculus 2008-2013

This is a first attempt at building a logistic regression model from socioeconomic variables. I used the variables that appeared in the decision tree: population (pop), people per household (pHouse), percent free and reduced lunch (pctFreeRed), per capita income (PCIncome), percent with no father on the birth certificate (pctNoFather), and population per square mile (pSQM).
Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)        1.837e+01  1.176e+01   1.561   0.1185
data1$pop          6.863e-05  3.416e-05   2.009   0.0445 *
data1$pHouse      -7.694e+00  3.795e+00  -2.027   0.0426 *
data1$pctFreeRed  -8.018e-02  5.991e-02  -1.338   0.1808
data1$PCIncome    -4.612e-05  1.430e-04  -0.323   0.7470
data1$pctNoFather  7.184e-02  6.560e-02   1.095   0.2735
data1$pSQM         3.925e-02  2.581e-02   1.521   0.1282
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
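The columns of the table are tied together by simple arithmetic, which this minimal Python sketch recomputes (it is not the R code that produced the table). It assumes the model is a logistic regression, so each estimate is on the log-odds scale: the z value is the estimate divided by its standard error, Pr(>|z|) is the two-sided standard-normal tail probability, and exp(estimate × change) is the factor by which the odds move. The helper names are hypothetical.

```python
import math

# (estimate, standard error) pairs transcribed from the coefficient table above.
COEFS = {
    "(Intercept)": (1.837e+01, 1.176e+01),
    "pop": (6.863e-05, 3.416e-05),
    "pHouse": (-7.694e+00, 3.795e+00),
    "pctFreeRed": (-8.018e-02, 5.991e-02),
    "PCIncome": (-4.612e-05, 1.430e-04),
    "pctNoFather": (7.184e-02, 6.560e-02),
    "pSQM": (3.925e-02, 2.581e-02),
}


def wald_z_and_p(estimate, std_error):
    """Wald z statistic and its two-sided normal p-value, 2 * (1 - Phi(|z|))."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))  # erfc(x / sqrt(2)) == 2 * (1 - Phi(x))
    return z, p


def odds_multiplier(estimate, delta):
    """Factor by which the odds change when a predictor rises by `delta` (logit link assumed)."""
    return math.exp(estimate * delta)


for name, (est, se) in COEFS.items():
    z, p = wald_z_and_p(est, se)
    print(f"{name:12s} z = {z:7.3f}  p = {p:.4f}")
print("pop, +10,000 people:", round(odds_multiplier(COEFS["pop"][0], 10_000), 2))
print("pHouse, +0.1 person:", round(odds_multiplier(COEFS["pHouse"][0], 0.1), 2))
```

Because the printed estimates are rounded, the recomputed z and p values match the table only up to the last digit or so. Under the logit assumption, 10,000 more people roughly doubles the modeled odds of having AP Calculus (exp(0.686) ≈ 1.99), while 0.1 more people per household multiplies them by about 0.46, holding the other variables fixed.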

To see where the model could be improved, I made two graphs of its errors.
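One common way to measure a logistic model's error county by county is the response residual: the observed outcome (1 or 0) minus the predicted probability. This is a minimal sketch that computes it from the point estimates in the table, assuming a logit link; the function names and the example county's values are hypothetical, and it is not necessarily how the errors in the graphs were computed.

```python
import math

# Point estimates (log-odds scale) from the coefficient table, assuming a logit link.
INTERCEPT = 18.37
WEIGHTS = {
    "pop": 6.863e-05,
    "pHouse": -7.694,
    "pctFreeRed": -8.018e-02,
    "PCIncome": -4.612e-05,
    "pctNoFather": 7.184e-02,
    "pSQM": 3.925e-02,
}


def predicted_probability(county):
    """P(has AP Calculus) = 1 / (1 + exp(-eta)), with eta = intercept + sum(weight * value)."""
    eta = INTERCEPT + sum(w * county[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-eta))


def response_residual(county, has_ap):
    """Observed outcome (1 if the county has AP Calculus, else 0) minus predicted probability."""
    return (1.0 if has_ap else 0.0) - predicted_probability(county)


def misclassified(county, has_ap, threshold=0.5):
    """True when thresholding the predicted probability gives the wrong label."""
    return (predicted_probability(county) >= threshold) != has_ap


# Hypothetical county, for illustration only (not a real Minnesota county's data).
example = {"pop": 20000, "pHouse": 2.5, "pctFreeRed": 40.0,
           "PCIncome": 25000, "pctNoFather": 20.0, "pSQM": 20.0}
print(predicted_probability(example), response_residual(example, True))
```

Plotting the residuals against each predictor, or counting the misclassified counties, are two simple ways to see where a model like this goes wrong.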



Sunday, September 29, 2013

Breaking down the AP Calc dataset by demographic data

I wanted to see which demographic variables classify a county as having AP Calculus in all of the years 2008-2013. I decided to use the C50 package in R since I was having little luck with logistic regression. Logistic regression coefficients are hard to interpret, but decision trees are very easy to interpret: a tree splits the data on one variable at a time and ends up with simple rules that sort the counties into groups. Part of the output from the algorithm follows:

pop > 44542: TRUE (20)
pop <= 44542:
:...pop <= 16132: FALSE (36/3)
    pop > 16132:
    :...pHouse > 2.576285: FALSE (5)
        pHouse <= 2.576285:
        :...pctFreeRed <= 28.7193: TRUE (8)
            pctFreeRed > 28.7193:
            :...PCIncome > 30964: TRUE (3)
                PCIncome <= 30964:
                :...pctNoFather <= 17.9: FALSE (10)
                    pctNoFather > 17.9:
                    :...pSQM <= 18.2: FALSE (2)
                        pSQM > 18.2: TRUE (3)

TRUE means the rule classifies a county as having AP Calculus. The number in parentheses is how many counties the rule covers; a second number after a slash is how many of those were misclassified. For example, (36/3) means the rule labels 36 counties FALSE and 3 of them actually had AP Calculus.
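The printed tree reads as nested if/else rules. Here is a hand transcription into Python, following the printed `<=` and `>` splits; it is a sketch, and the function name and the leaf tally are my own additions rather than C5.0 output. Tallying the leaves also gives the tree's error on the counties it was built from.

```python
def has_ap_calc(pop, pHouse, pctFreeRed, PCIncome, pctNoFather, pSQM):
    """Hand transcription of the C5.0 tree above; returns the predicted label."""
    if pop > 44542:
        return True
    if pop <= 16132:
        return False
    # From here on, 16132 < pop <= 44542.
    if pHouse > 2.576285:
        return False
    if pctFreeRed <= 28.7193:
        return True
    if PCIncome > 30964:
        return True
    if pctNoFather <= 17.9:
        return False
    return pSQM > 18.2


# (predicted label, counties reaching the leaf, misclassified) for each leaf, top to bottom.
LEAVES = [
    (True, 20, 0), (False, 36, 3), (False, 5, 0), (True, 8, 0),
    (True, 3, 0), (False, 10, 0), (False, 2, 0), (True, 3, 0),
]

total = sum(n for _, n, _ in LEAVES)
errors = sum(e for _, _, e in LEAVES)
print(f"{total} counties, {errors} misclassified, training accuracy {1 - errors / total:.1%}")
```

The leaf counts add up to 87, the number of counties in Minnesota, and only the (36/3) leaf contains errors, so the tree gets 84 of 87 counties right. That is accuracy on the same data the tree was grown from, so it likely overstates how well the rules would do on new data.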

Saturday, September 14, 2013

Visualizing AP Calculus by county in Minnesota

When I first started thinking about this data, I expected that each year at least one county would add AP Calculus and no county would lose it. That turns out not to be the case: in every year from 2009 to 2013, some counties dropped it. My guess is that state budget problems and district-level property tax levies play a bigger role in whether a county offers AP Calculus than anything else.

2009: added Carlton, Redwood, Traverse; dropped Isanti
2010: added Itasca, Nicollet, Nobles, Swift; dropped Sibley, Mower, Watonwan, Waseca, Rock, Lincoln, Mille Lacs
2011: added Sibley, Mower, Waseca, Mille Lacs; dropped Meeker, Swift, Beltrami, Cass, Polk
2012: added Rock, Swift, Kanabec, Beltrami, Polk; dropped Itasca, Freeborn, Mille Lacs, Koochiching
2013: added Cook, Itasca, Mille Lacs, Roseau; dropped Sibley, Rock, Pipestone, Chippewa, Renville, Traverse
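The year-by-year list is easy to tally. This sketch copies the county lists from above (the function names are hypothetical), computes each year's net change, and checks that every county's adds and drops alternate, which they should if the list is complete.

```python
from collections import defaultdict

# (added, dropped) per year, copied from the list above.
CHANGES = {
    2009: (["Carlton", "Redwood", "Traverse"], ["Isanti"]),
    2010: (["Itasca", "Nicollet", "Nobles", "Swift"],
           ["Sibley", "Mower", "Watonwan", "Waseca", "Rock", "Lincoln", "Mille Lacs"]),
    2011: (["Sibley", "Mower", "Waseca", "Mille Lacs"],
           ["Meeker", "Swift", "Beltrami", "Cass", "Polk"]),
    2012: (["Rock", "Swift", "Kanabec", "Beltrami", "Polk"],
           ["Itasca", "Freeborn", "Mille Lacs", "Koochiching"]),
    2013: (["Cook", "Itasca", "Mille Lacs", "Roseau"],
           ["Sibley", "Rock", "Pipestone", "Chippewa", "Renville", "Traverse"]),
}


def net_change(changes):
    """Counties gained minus counties lost, per year."""
    return {year: len(added) - len(dropped) for year, (added, dropped) in sorted(changes.items())}


def histories(changes):
    """Each county's chronological list of (year, "add" or "drop") events."""
    events = defaultdict(list)
    for year in sorted(changes):
        added, dropped = changes[year]
        for county in added:
            events[county].append((year, "add"))
        for county in dropped:
            events[county].append((year, "drop"))
    return dict(events)


def non_alternating(events):
    """Counties with two adds or two drops in a row, which would signal a transcription error."""
    return sorted(
        county for county, evs in events.items()
        if any(a[1] == b[1] for a, b in zip(evs, evs[1:]))
    )


events = histories(CHANGES)
print(net_change(CHANGES))
print("changed more than once:", sorted(c for c, evs in events.items() if len(evs) > 1))
print("inconsistent:", non_alternating(events))
```

Across 2009-2013 there are 20 additions and 23 drops, a net loss of 3 counties, and ten counties change status more than once (Mille Lacs flips every year from 2010 on).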


Tuesday, September 10, 2013

Polynomial curve fitting

I am working through Pattern Recognition and Machine Learning by Christopher Bishop and made these graphs of the least-squares solutions to several polynomial curve-fitting problems.
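In Bishop's Chapter 1 example, the least-squares fit chooses the weights w of y(x, w) = w_0 + w_1 x + ... + w_M x^M to minimize E(w) = (1/2) Σ_n (y(x_n, w) - t_n)^2, and the fit is summarized by E_RMS = sqrt(2 E(w*) / N). This is a minimal NumPy sketch of that setup on the book's toy problem of noisy samples of sin(2πx); the noise level, random seed, and function names are my assumptions, not values from this post.

```python
import numpy as np


def design_matrix(x, M):
    """N x (M + 1) matrix whose columns are x**0, x**1, ..., x**M."""
    return np.vander(x, M + 1, increasing=True)


def fit_polynomial(x, t, M):
    """Weights w minimizing the sum of squared errors between the polynomial and t."""
    w, *_ = np.linalg.lstsq(design_matrix(x, M), t, rcond=None)
    return w


def rms_error(w, x, t):
    """E_RMS = sqrt(2 E(w) / N), which equals the root-mean-square residual."""
    y = design_matrix(x, len(w) - 1) @ w
    return float(np.sqrt(np.mean((y - t) ** 2)))


rng = np.random.default_rng(0)  # assumed seed
N = 10
x_train = np.linspace(0.0, 1.0, N)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=N)  # assumed noise level

for M in (0, 1, 3, 9):
    w = fit_polynomial(x_train, t_train, M)
    print(f"M = {M}: training E_RMS = {rms_error(w, x_train, t_train):.3f}")
```

With M = 9 and 10 points the polynomial can pass through every training point, so the training error drops to essentially zero even though the curve oscillates wildly between the points; that is the overfitting Bishop illustrates, and measuring E_RMS on separate test points exposes it.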




Classification of whether a county has AP Calculus (Part 1)

I used Python's NumPy to explore how employment, creative jobs, arts jobs, and the ratios of creative jobs and of arts jobs to employment relate to whether a county has AP Calculus offered within its borders.
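The ratios described above are a single element-wise division in NumPy. This is a minimal sketch, assuming one array entry per county and a boolean array marking the counties with AP Calculus; the function names, the made-up numbers, and the choice to compare group means are illustrative assumptions rather than the analysis in this post.

```python
import numpy as np


def job_shares(employment, creative_jobs, arts_jobs):
    """Per-county share of total employment held by creative jobs and by arts jobs."""
    employment = np.asarray(employment, dtype=float)
    creative_share = np.asarray(creative_jobs, dtype=float) / employment
    arts_share = np.asarray(arts_jobs, dtype=float) / employment
    return creative_share, arts_share


def mean_by_ap_status(values, has_ap):
    """(mean for counties with AP Calculus, mean for counties without)."""
    values = np.asarray(values, dtype=float)
    has_ap = np.asarray(has_ap, dtype=bool)
    return float(values[has_ap].mean()), float(values[~has_ap].mean())


# Made-up numbers for three hypothetical counties, only to show the shapes involved.
employment = [1000, 2000, 4000]
creative = [100, 200, 800]
arts = [10, 40, 40]
has_ap = [True, False, True]

creative_share, arts_share = job_shares(employment, creative, arts)
print(mean_by_ap_status(creative_share, has_ap), mean_by_ap_status(arts_share, has_ap))
```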

The data set is Creative Class County Codes from Data.gov. I reused the zip codes from the earlier project and converted them to counties with a spreadsheet downloaded from UnitedStatesZipCodes.org. The code follows:

Friday, July 19, 2013

How Kafkaesque was the Supreme Court during their 2012 term?

Update (7/20/2013): I added tables showing the relative frequencies of the words that Metamorphosis and the 2012 Supreme Court opinions have in common.
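A word's relative frequency is its count divided by the total number of words in that text. This minimal sketch builds such a table for the words two texts share. It assumes the tokenization described in the word-frequency post (runs of letters and apostrophes) plus lowercasing, and ranking by combined frequency is my own choice, not necessarily the one used for the tables; the function names are hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z']+")  # runs of letters and apostrophes


def relative_frequencies(text):
    """Map each lowercased word to its count divided by the total word count."""
    counts = Counter(word.lower() for word in TOKEN.findall(text))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def shared_words(text_a, text_b, top=20):
    """Words found in both texts with their relative frequency in each,
    ordered by combined frequency (a simple, arbitrary ranking)."""
    freq_a, freq_b = relative_frequencies(text_a), relative_frequencies(text_b)
    common = freq_a.keys() & freq_b.keys()
    ranked = sorted(common, key=lambda w: freq_a[w] + freq_b[w], reverse=True)
    return [(w, freq_a[w], freq_b[w]) for w in ranked[:top]]


# Usage, assuming the two texts have already been read into strings:
# for word, f_kafka, f_court in shared_words(metamorphosis_text, opinions_text):
#     print(f"{word:12s} {f_kafka:.4f} {f_court:.4f}")
```

Relative rather than raw counts matter here because the 2012 opinions and Metamorphosis differ greatly in length.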

Thursday, July 18, 2013

Supreme Court 2012 Term Opinion Word Frequency Analysis

The first thing that stood out was 'u', which might become more common in a future century when Supreme Court decisions are texted out. Here it appears because I split the text into words on any character that wasn't a letter (lowercase or uppercase) or an apostrophe. So 'u', 's', and 'v' come from 'U. S. v.' strings that were split apart at the spaces and periods.
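The splitting rule is easy to reproduce with a regular expression. This small sketch uses a made-up citation-style string (not quoted from an opinion) to show how 'U. S. v.' turns into the separate words 'u', 's', and 'v'; the lowercasing step and the function name are assumptions.

```python
import re
from collections import Counter


def words(text):
    """Split on every run of characters that is not a letter or an apostrophe,
    drop the empty strings re.split leaves at the ends, and lowercase."""
    return [w.lower() for w in re.split(r"[^A-Za-z']+", text) if w]


sample = "Smith v. Jones, 500 U. S. 1, and the Court's opinion."  # made-up text
tokens = words(sample)
print(tokens)
print(Counter(tokens).most_common(3))
```

Keeping the apostrophe in the character class is what keeps possessives such as "court's" together as one word.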