Showing posts with label Choropleth. Show all posts

Friday, November 1, 2013

Socioeconomic model of whether Minnesota counties had AP Calculus 2008-2013

This is a first attempt at a model built from socioeconomic variables. I used the variables from the decision tree: Population, People per Household, pct Free and Reduced Lunch, Per Capita Income, pct No Father on Birth Certificate, and Population per Square Mile.
Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)        1.837e+01  1.176e+01   1.561   0.1185
data1$pop          6.863e-05  3.416e-05   2.009   0.0445 *
data1$pHouse      -7.694e+00  3.795e+00  -2.027   0.0426 *
data1$pctFreeRed  -8.018e-02  5.991e-02  -1.338   0.1808
data1$PCIncome    -4.612e-05  1.430e-04  -0.323   0.7470
data1$pctNoFather  7.184e-02  6.560e-02   1.095   0.2735
data1$pSQM         3.925e-02  2.581e-02   1.521   0.1282
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
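
To make the coefficients a bit more concrete, here is a minimal Python sketch (not the R code behind this post) that turns them into a predicted probability. It assumes the table comes from a logistic regression with a logit link, which the z values suggest but the post does not state. The example county is hypothetical, with values made up purely for illustration.

```python
import math

# Coefficient estimates copied from the table above.
COEF = {
    "pop": 6.863e-05,
    "pHouse": -7.694e+00,
    "pctFreeRed": -8.018e-02,
    "PCIncome": -4.612e-05,
    "pctNoFather": 7.184e-02,
    "pSQM": 3.925e-02,
}
INTERCEPT = 1.837e+01


def predicted_probability(county):
    """Inverse logit of the linear predictor (ASSUMES a binomial logit model)."""
    eta = INTERCEPT + sum(COEF[name] * county[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical county: these numbers are invented, not taken from the data set.
example_county = {
    "pop": 30000,
    "pHouse": 2.5,
    "pctFreeRed": 30.0,
    "PCIncome": 25000,
    "pctNoFather": 15.0,
    "pSQM": 40.0,
}

p = predicted_probability(example_county)
print(f"Predicted probability of having AP Calculus: {p:.2f}")
```

Because only pop and pHouse are significant at the 0.05 level, a probability computed this way should be read as a rough illustration of how the terms combine, not as a reliable forecast.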

As a way to improve on my model, I've made two graphs to show error in the model.



Sunday, September 29, 2013

Breaking down the AP Calc dataset by Demographic Data

I wanted to see which demographic variables classify a county as having AP Calculus in every year from 2008 to 2013.  I decided to use the C50 package in R, since I was having little luck with logistic regression.  Logistic regression output is hard to interpret, while decision trees are very easy to read.  The algorithm slices and dices the data and comes up with simple rules that sort the counties into groups.  Part of the output follows:

pop > 44542: TRUE (20)
pop <= 44542:
:...pop <= 16132: FALSE (36/3)
    pop > 16132:
    :...pHouse > 2.576285: FALSE (5)
        pHouse <= 2.576285:
        :...pctFreeRed <= 28.7193: TRUE (8)
            pctFreeRed > 28.7193:
            :...PCIncome > 30964: TRUE (3)
                PCIncome <= 30964:
                :...pctNoFather <= 17.9: FALSE (10)
                    pctNoFather > 17.9:
                    :...pSQM <= 18.2: FALSE (2)
                        pSQM > 18.2: TRUE (3)

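A TRUE means the rule classifies the county as having AP Calculus.  The numbers in parentheses give how many counties the rule covers and, after the slash, how many of those were misclassified.  For example, FALSE (36/3) covers 36 counties, 3 of them misclassified.  Maps and commentary are after the jump.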
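
One way to see how readable the tree is: the printed branches translate almost line for line into nested conditions. The following is a minimal Python sketch of that translation, not output from C50. It covers only the part of the tree shown above, and the function name is made up for illustration.

```python
def has_ap_calc(pop, pHouse, pctFreeRed, PCIncome, pctNoFather, pSQM):
    """Walk the printed branches from top to bottom; thresholds are from the output above."""
    if pop > 44542:
        return True
    if pop <= 16132:
        return False
    # From here on: 16132 < pop <= 44542
    if pHouse > 2.576285:
        return False
    if pctFreeRed <= 28.7193:
        return True
    if PCIncome > 30964:
        return True
    if pctNoFather <= 17.9:
        return False
    # Last split: population per square mile
    return pSQM > 18.2


# Hypothetical county values, invented to exercise one branch.
print(has_ap_calc(pop=20000, pHouse=2.5, pctFreeRed=20.0,
                  PCIncome=25000, pctNoFather=10.0, pSQM=30.0))
```

Each early return corresponds to one leaf in the output, so the order of the checks matters: a county only reaches the pSQM test after falling through every rule above it.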

Saturday, September 14, 2013

Visualizing AP Calculus by county in Minnesota

When I first thought about this data, I expected that each year at least one county would add AP Calculus and none would lose it.  That turns out not to be the case.  I'm guessing that state budget problems and district-level property tax levies play a bigger role than anything else.

2009: Add Carlton, Redwood, Traverse; Drop Isanti
2010: Add Itasca, Nicollet, Nobles, Swift; Drop Sibley, Mower, Watonwan, Waseca, Rock, Lincoln, Mille Lacs
2011: Add Sibley, Mower, Waseca, Mille Lacs; Drop Meeker, Swift, Beltrami, Cass, Polk
2012: Add Rock, Swift, Kanabec, Beltrami, Polk; Drop Itasca, Freeborn, Mille Lacs, Koochiching
2013: Add Cook, Itasca, Mille Lacs, Roseau; Drop Sibley, Rock, Pipestone, Chippewa, Renville, Traverse
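
A year-over-year list like the one above can be produced by comparing the set of counties in consecutive years. Here is a minimal Python sketch of that comparison, not the code used for this post. The two sets are deliberately tiny, partial examples built from county names in the list, not the full data.

```python
def adds_and_drops(previous, current):
    """Return (added, dropped) county names between two years, each sorted."""
    added = sorted(current - previous)    # in this year, not in the last
    dropped = sorted(previous - current)  # in the last year, not in this one
    return added, dropped


# Partial, illustrative sets only; the real data covers many more counties.
counties_2008 = {"Isanti", "Sibley", "Mower"}
counties_2009 = {"Sibley", "Mower", "Carlton", "Redwood", "Traverse"}

added, dropped = adds_and_drops(counties_2008, counties_2009)
print("2009 Add", ", ".join(added), " Drop", ", ".join(dropped))
```

Set difference keeps the logic symmetric: the same function finds both the adds and the drops, just with the operands swapped.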


Tuesday, June 11, 2013

Monday, June 10, 2013

First time using ggplot2 in R

I used the packages XML, rgdal, maptools, and ggplot2. From XML I used readHTMLTable to pull a bunch of tables from the Star Tribune 100 website. I got the cities from those tables, then took the 2010 Census cities shapefile and subset it heavily, keeping only cities that were in Minnesota and had the same name as a city from the Star Tribune 100.

I now had the points of the cities and the outline of the state of Minnesota.  Next came the hard part. I don't know a ton about Geographic Information Systems or much about geography, so I couldn't tell what coordinate system my two layers were in.  After much stumbling around I realized that the cities were in longitude and latitude and that the outline of the state of Minnesota was in something called UTM. So I used spTransform from rgdal to convert the outline from UTM zone 15 to longitude/latitude. After that I plotted both layers using ggplot2 and it seems to work out.
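
A quick sanity check can help with the "which coordinate system is this?" question before any transformation. The Python sketch below is only a heuristic, not a replacement for reading the layer's projection metadata: it tests whether a coordinate pair falls inside the valid longitude/latitude range. The sample coordinates are made-up values of roughly the right magnitude for Minnesota, and the function names are hypothetical.

```python
def looks_like_lonlat(x, y):
    """Heuristic: degrees fit in [-180, 180] x [-90, 90]; UTM metres usually do not."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


def utm_central_meridian(zone):
    """Central meridian in degrees of longitude for a UTM zone (1 to 60)."""
    return zone * 6 - 183


# Invented sample points, for illustration only.
city_point = (-94.0, 46.0)               # degrees: longitude, latitude
outline_point = (450000.0, 5100000.0)    # metres: easting, northing

print(looks_like_lonlat(*city_point))     # True
print(looks_like_lonlat(*outline_point))  # False
print(utm_central_meridian(15))           # -93
```

Values in the hundreds of thousands or millions are a strong hint that a layer is projected in metres, which is the situation described above for the state outline.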

The three dots that seem like they could be artifacts or something are probably Montevideo, Fergus Falls, and Thief River Falls.

Saturday, May 25, 2013

Web Scraping for Education Data

I spent some time today and yesterday doing some data wrangling. I wondered where AP Calculus AB is offered throughout the state of Minnesota. I first went to the College Board site and found this database. There were 9 pages of results that I downloaded and then used readLines and regex to extract the zip codes for each school. There were 164 schools and only 142 unique zip codes. I used the maptools package to create the map and colored it based on the data.
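
The zip code extraction step can be sketched in a few lines. This is a Python illustration, not the readLines code used for this post. It assumes each address ends with the state abbreviation followed by the zip code, which may not match the real page layout, and the sample lines are invented.

```python
import re

# ASSUMED format: "... MN 55401" or "... MN 55401-1234".
# Anchoring on "MN" avoids picking up five-digit street numbers.
ZIP_AFTER_STATE = re.compile(r"\bMN\s+(\d{5})(?:-\d{4})?\b")


def extract_zips(lines):
    """Collect the five-digit zip code from every line that contains one."""
    zips = []
    for line in lines:
        zips.extend(ZIP_AFTER_STATE.findall(line))
    return zips


# Invented sample lines; school names and addresses are placeholders.
sample_lines = [
    "Example High School, 100 Main St, Minneapolis, MN 55401",
    "Sample Senior High, 200 Oak Ave, Minneapolis, MN 55401",
    "Placeholder Academy, 12345 County Rd 5, Duluth, MN 55802-1234",
    "<td>no address on this line</td>",
]

zips = extract_zips(sample_lines)
unique_zips = sorted(set(zips))
print(len(zips), "schools,", len(unique_zips), "unique zip codes")
```

Two of the sample schools share a zip code, which is the same effect that makes 164 schools collapse to 142 unique zip codes.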