Greg Klein's Blog

Wikipedia, Python, and Maps

Wikipedia is very cool. As of this post (March 19th), the English-language Wikipedia has about 4.2 million articles written by volunteers. Of these, a little under 1.5 million have geographical coordinates associated with them. For example, if you look in the upper right-hand corner of the article about UCSC, you’ll see a small bit of text displaying its coordinates. There’s a whole group of people on Wikipedia dedicated to adding this information to articles, and they’ve done a great job.

I happened to stumble upon a very useful data dump created by someone with the account name “dispenser” on the Wikimedia Toolserver. It is a .sql file of all of the geotagged articles in the English-language Wikipedia (find it here).

So I loaded this file into a MySQL database on my computer, and sure enough, I’m able to run queries and get geographical information out!


mysql> select gc_lat, gc_lon, gc_name from coord_enwiki where gc_name = "University of California, Santa Cruz";
+-------------+---------------+--------------------------------------+
| gc_lat      | gc_lon        | gc_name                              |
+-------------+---------------+--------------------------------------+
| 37.00000000 | -122.06000000 | University of California, Santa Cruz |
+-------------+---------------+--------------------------------------+
1 row in set (0.43 sec)

The next thing I thought would be neat was to display all of the geotagged Wikipedia articles near where I live (Santa Cruz, California). I chose Python because it’s well suited to simple hacks like this, and I can always use the practice. Interfacing Python with MySQL is easy enough using the MySQLdb package, but that leaves another problem: how to create maps easily in Python. After a very short amount of time searching, I found the answer: basemap.
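For the curious, here is a rough sketch of how the "articles near me" query could be run from Python. It is not necessarily the code behind the map; it just picks a rectangular bounding box around a point and asks MySQL for everything inside it. The table and column names (coord_enwiki, gc_lat, gc_lon, gc_name) come from the dump, while the function names, the 25 km radius, and the connection details are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) roughly enclosing a circle.

    Approximation only: it ignores the poles and the 180-degree meridian.
    """
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # Lines of longitude get closer together away from the equator.
    dlon = math.degrees(radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def fetch_nearby(conn, lat, lon, radius_km):
    """Fetch (gc_lat, gc_lon, gc_name) rows that fall inside the bounding box.

    `conn` is any DB-API connection that uses the %s parameter style,
    as MySQLdb does.
    """
    sql = ("SELECT gc_lat, gc_lon, gc_name FROM coord_enwiki "
           "WHERE gc_lat BETWEEN %s AND %s AND gc_lon BETWEEN %s AND %s")
    cur = conn.cursor()
    cur.execute(sql, bounding_box(lat, lon, radius_km))
    rows = cur.fetchall()
    cur.close()
    return rows


def main():
    # Hypothetical credentials and database name; substitute your own.
    import MySQLdb  # third-party package, imported lazily
    conn = MySQLdb.connect(host="localhost", user="wiki", passwd="secret", db="wikigeo")
    for lat, lon, name in fetch_nearby(conn, 37.0, -122.06, 25):
        print(name, lat, lon)
    conn.close()
```

Calling main() needs a running MySQL server with the dump loaded. The box is only an approximation of "nearby", but for a small area around Santa Cruz that should be close enough.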

Basemap makes drawing all sorts of maps in all sorts of projections super simple. So with these tools in hand, I was able to quickly create the following map:

All geotagged articles near Santa Cruz, CA.
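A map like this one can be drawn with only a few basemap calls. The snippet below is a minimal sketch rather than the exact script used here: it assumes the coordinates are already in a list of (lat, lon) floats, and the helper names, padding, projection, and output filename are all illustrative choices.

```python
def map_extent(points, pad=0.05):
    """Return Basemap corner keyword arguments covering all (lat, lon) points."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return dict(llcrnrlat=min(lats) - pad, urcrnrlat=max(lats) + pad,
                llcrnrlon=min(lons) - pad, urcrnrlon=max(lons) + pad)


def plot_points(points, filename="santa_cruz.png"):
    """Draw each (lat, lon) pair as a small black dot on a Mercator map."""
    # Imported here so map_extent() works without the plotting stack installed.
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no window needed
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap

    # "i" = intermediate-resolution coastlines; depending on the basemap
    # version this may need the optional high-resolution data installed.
    m = Basemap(projection="merc", resolution="i", **map_extent(points))
    m.drawcoastlines()
    # Basemap converts (lon, lat) in degrees to map x/y coordinates.
    x, y = m([p[1] for p in points], [p[0] for p in points])
    m.plot(x, y, "k.", markersize=2)
    plt.savefig(filename, dpi=150)
```

Values coming out of MySQL may arrive as Decimal objects, so converting them with float() before plotting is probably a good idea.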

Neat! The next step for me was to try to produce a global map of Wikipedia articles. After some experimentation, I found that a heatmap works much better than plotting individual points: with points, the denser parts of the map just end up solid black, which isn’t very useful.
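The core of a heatmap is just counting how many articles fall into each cell of a latitude/longitude grid. Here is one way that could look, using only the standard library. The one-degree cell size and the log scaling are assumptions made for this sketch (article counts per cell vary by orders of magnitude, so a log scale keeps sparse regions visible), and the function names are hypothetical.

```python
import math


def heat_grid(points, cell_deg=1.0):
    """Count (lat, lon) points per grid cell; row 0 is the southernmost band."""
    n_rows = int(math.ceil(180.0 / cell_deg))
    n_cols = int(math.ceil(360.0 / cell_deg))
    grid = [[0] * n_cols for _ in range(n_rows)]
    for lat, lon in points:
        # Shift to non-negative ranges, then clamp so that lat == 90 or
        # lon == 180 land in the last cell instead of falling off the grid.
        row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
        col = min(int((lon + 180.0) // cell_deg), n_cols - 1)
        grid[row][col] += 1
    return grid


def log_scale(grid):
    """Compress the counts with log(1 + n) so dense cells do not swamp the rest."""
    return [[math.log1p(n) for n in row] for row in grid]
```

The resulting grid is a plain list of lists, so it could be handed to something like matplotlib's imshow (with origin="lower" so south ends up at the bottom) or drawn over a basemap. With numpy available, numpy.histogram2d would do the counting in one call.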

So after a bit more work I’m left with this:

Global view of English Wikipedia articles.

I’ve posted all of the source code used to make the heatmap above; you can find it here.

Written by gregklein

March 19, 2013 at 4:06 pm

Posted in python
