I used the packages XML, rgdal, maptools, and ggplot2. With readHTMLTable from XML I pulled a bunch of tables from the Star Tribune 100 website and got the cities out of them. Then I took the 2010 census cities shapefile and subset it heavily, keeping only the cities that were in Minnesota and had the same name as a city from the Star Tribune 100.
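The name-matching step can be sketched roughly like this in base R. The two objects and their column names (`NAME`, `STATE`) are made up for illustration; the real shapefile attribute table and the scraped tables would have their own fields.

```r
# Hypothetical stand-in for the city names pulled out of the scraped tables
strib_cities <- c("Montevideo", "Fergus Falls", "Thief River Falls")

# Hypothetical stand-in for the attribute table of the census cities shapefile
places <- data.frame(
  NAME  = c("Montevideo", "Fergus Falls", "Thief River Falls", "Fergus Falls", "Duluth"),
  STATE = c("MN", "MN", "MN", "XX", "MN"),
  stringsAsFactors = FALSE
)

# Keep only Minnesota places whose name also appears in the scraped list
mn_matches <- places[places$STATE == "MN" & places$NAME %in% strib_cities, ]
print(mn_matches$NAME)
```

Matching on the name alone would also keep same-named cities in other states, which is why the state filter is part of the condition.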
I now had the points for the cities and the outline of the state of Minnesota. Next came the hard part. I don't know a ton about geographic information systems, or much about geography, so I couldn't tell what coordinate system my two layers were in. After much stumbling around I realized that the cities were in longitude and latitude and that the outline of Minnesota was in something called UTM. So I used spTransform from rgdal to convert the outline from UTM zone 15 to longitude/latitude. After that I plotted everything using ggplot2, and it seems to have worked out.
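A minimal sketch of that kind of reprojection, on a single made-up point rather than the real state outline. It calls spTransform through the sp package instead of rgdal, since rgdal has since been retired from CRAN; this assumes a recent sp with sf installed behind it. The NAD83 and WGS84 datums in the projection strings are assumptions too, since the post only mentions UTM zone 15 and longitude/latitude.

```r
# Sketch only: runs if sp and sf are available, otherwise does nothing
if (requireNamespace("sp", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  # One made-up point on the central meridian of UTM zone 15 (easting 500000 m)
  utm_pt <- sp::SpatialPoints(
    cbind(x = 500000, y = 4980000),
    proj4string = sp::CRS("+proj=utm +zone=15 +datum=NAD83")  # datum is assumed
  )

  # Reproject to longitude/latitude; the WGS84 datum is also an assumption
  lonlat <- sp::spTransform(utm_pt, sp::CRS("+proj=longlat +datum=WGS84"))
  print(sp::coordinates(lonlat))
}
```

The central meridian of zone 15 is 93 degrees west, so a point with an easting of 500000 m should come back with a longitude very close to -93, which makes for a quick sanity check that the two layers now line up.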
The three dots that seem like they could be artifacts or something are probably Montevideo, Fergus Falls, and Thief River Falls.
Showing posts with label Web Scraping. Show all posts
Monday, June 10, 2013
First time using ggplot2 in R
Labels: Choropleth, R, Star Tribune, Star Tribune 100, Web Scraping
Saturday, May 25, 2013
Web Scraping for Education Data
I spent some time today and yesterday doing some data wrangling. I wondered where AP Calculus AB is offered throughout the state of Minnesota. I first went to the College Board site and found this database. There were 9 pages of results, which I downloaded, and then I used readLines and regex to extract the zip code for each school. There were 164 schools but only 142 unique zip codes, since some schools share a zip code. I used the maptools package to create the map and colored it based on the data.
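The zip code extraction can be sketched in base R along these lines. The lines below are invented stand-ins for what readLines might return from a saved results page, and the pattern assumes each address ends with "MN" followed by the zip; the real pages would likely need a different pattern.

```r
# Made-up lines standing in for the output of readLines() on a results page
page_lines <- c(
  "<td>Example High School</td>",
  "<td>123 Main St, Anytown, MN 55101</td>",
  "<td>Another High School</td>",
  "<td>456 Oak Ave, Anytown, MN 55101-1234</td>",
  "<td>789 Elm St, Othertown, MN 56501</td>"
)

# Match the state abbreviation followed by a five-digit zip code
pattern <- "MN\\s+([0-9]{5})"
hits <- regmatches(page_lines, regexpr(pattern, page_lines))

# Strip the "MN " prefix so only the five digits remain
zips <- sub(pattern, "\\1", hits)

length(zips)          # one zip per school address found
length(unique(zips))  # distinct zip codes, fewer when schools share one
```

Comparing the two counts at the end is the same check that gives 164 schools against 142 unique zip codes in the real data.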