Monday, September 21, 2015

Mapping all the Books

Earlier this month the GDELT Project made available data from 3.5 million digitized books on Google BigQuery. The data is available in two separate BigQuery datasets:

Internet Archive Book Collection in Google BigQuery (includes fulltext for 1800-1922 books)
HathiTrust Book Collection in Google BigQuery

There is obviously a lot of mapping potential in those 3.5 million books. For example, you could map the number of books published by location and by year. This GDELT Project map uses CartoDB's Torque library to do just that.
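To make the idea concrete, here is a minimal Python sketch of the counting step behind a map like that: grouping records by year and place so that each pair gets a total. The records, the place names and the function name count_by_location_and_year are all invented for illustration. They are not taken from the GDELT tables, whose actual field names are not described here.

```python
from collections import Counter

# Hypothetical (year, place) records, invented for illustration only.
# They do not come from the GDELT BigQuery datasets.
records = [
    (1850, "London"),
    (1850, "London"),
    (1850, "Boston"),
    (1851, "Boston"),
]


def count_by_location_and_year(rows):
    """Return a Counter keyed by (year, place) with the number of rows per pair."""
    return Counter((year, place) for year, place in rows)


counts = count_by_location_and_year(records)

# Print one line per (year, place) pair, ordered by year and then place.
for (year, place), n in sorted(counts.items()):
    print(year, place, n)
```

A time-animated map needs roughly this shape of data: one row per place per time step, with a count to drive the size or intensity of each point.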

The map shows all the books from the HathiTrust collection, which contains millions of titles digitized from libraries around the world. It plots every location mentioned in the collection from 1800 to 2011. One obvious pattern is the growth in the number of North American locations mentioned in the books as the years pass.

If you want to start using the two BigQuery datasets yourself, then the GDELT Project's introduction, 'Google BigQuery + 3.5M Books: Sample Queries', should prove useful. The article includes a number of sample queries which you can run on either of the two BigQuery datasets. It also includes a couple of maps made from the data returned by those sample queries.

One of the example maps shows the locations mentioned in Civil War related books (screenshot above). The other example maps all books published from 1900 to 1920 with the subject tag 'World War'.
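The selection behind the second map (a date range plus a subject tag) can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not one of the GDELT sample queries: the book records, the "year" and "subjects" keys, and the select_books function are all hypothetical.

```python
def select_books(books, start_year, end_year, subject):
    """Keep books published in [start_year, end_year] with a subject containing `subject`."""
    needle = subject.lower()
    return [
        book
        for book in books
        if start_year <= book["year"] <= end_year
        and any(needle in tag.lower() for tag in book["subjects"])
    ]


# Hypothetical records, invented for illustration only.
books = [
    {"title": "Example A", "year": 1916, "subjects": ["World War, 1914-1918"]},
    {"title": "Example B", "year": 1925, "subjects": ["World War, 1914-1918"]},
    {"title": "Example C", "year": 1910, "subjects": ["Botany"]},
]

selected = select_books(books, 1900, 1920, "World War")
print([book["title"] for book in selected])
```

Matching on a case-insensitive substring is a design choice made here so that a tag such as 'World War, 1914-1918' still counts as a 'World War' book. An exact-match rule would be stricter.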
