Friday, October 12, 2018

A Map of all the Books


The HathiTrust Digital Map is an interactive map which allows you to browse and explore the 14 million volumes in HathiTrust's repository of digitized texts. The map not only provides a visual interface for navigating the books in the HathiTrust Digital Library, it also includes a fascinating discussion of how the texts are organized on the map. That discussion explores how organizing digital texts may require a whole new system of library classification.

The Library of Congress Classification system categorizes books into broad subjects and then into sub-classes within each of these subjects. The HathiTrust Digital Map uses an entirely different method of classification. On this interactive map, texts are organized by the similarity of their vocabularies, so texts which use similar words appear close together.
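To make the idea of 'vocabulary similarity' a little more concrete, here is a minimal sketch in Python. It is not the method used by the HathiTrust Digital Map, whose details aren't covered here. It simply assumes one common approach: counting the words in each text and comparing the counts with cosine similarity. The three tiny 'books' and the function names are made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    # Only words that appear in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical miniature library (assumption: real volumes are far longer).
texts = {
    "sailing_manual": "the ship sails with the wind and the tide",
    "sea_novel": "the ship met the wind and the storm at sea",
    "cookbook": "whisk the eggs and fold in the flour",
}
counts = {name: word_counts(text) for name, text in texts.items()}

sim_sea = cosine_similarity(counts["sailing_manual"], counts["sea_novel"])
sim_cook = cosine_similarity(counts["sailing_manual"], counts["cookbook"])

# The sailing manual shares more vocabulary with the sea novel than
# with the cookbook, so it would be placed nearer to the novel.
print(sim_sea > sim_cook)
```

Under a subject based system the sailing manual and the sea novel would be shelved far apart. A vocabulary based system like this one would place them close together.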

The interactive map has two distinct modes: 'Read' and 'Interact'. If you select 'Interact' you can zoom in and pan around the map. If you then select an individual dot on the map you can open the selected text on the HathiTrust Digital Library website. If you select 'Read' instead, you can learn more about the vocabulary similarity classification system used by the digital map.

This 'Read' section takes you on a story map tour of some of the interesting patterns that emerge when you organize the HathiTrust Digital Library by vocabulary similarity. The story map shows you how this classification system diverges from or resembles subject based classification systems, such as the Library of Congress Classification system. It also explores some of the new 'clusters' of books that emerge when you classify by vocabulary similarity. These are clusters of texts which have similar vocabularies but which would be classified far apart under a subject based classification system.

This story map tour also provides a great illustration of how a digital map of a library can use a number of different library classification systems at the same time. On the HathiTrust Digital Map the texts are organized spatially by their similarity in vocabulary. However, as you progress through the story map the texts are also organized by language and then by subject matter, by applying different colors to the markers of books in different categories. In this way the map is able to pick out three interesting kinds of pattern: clusters of texts which have similar vocabularies within a subject class, texts which have widely different vocabularies but are still in the same subject class, and texts which have similar vocabularies but are in different subject classes.
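The last of those patterns, texts which sit close together on the map but belong to different subject classes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the titles, map coordinates, subject labels and the `cross_subject_neighbors` helper are all invented, and the distance threshold is arbitrary.

```python
import math
from itertools import combinations

# Hypothetical records: (title, map x, map y, subject class).
books = [
    ("Navigation Tables", 0.10, 0.20, "Naval Science"),
    ("A Voyage Round the World", 0.12, 0.22, "Geography"),
    ("Principles of Seamanship", 0.11, 0.18, "Naval Science"),
    ("Household Cookery", 0.80, 0.75, "Technology"),
]


def cross_subject_neighbors(records, max_distance):
    """Return pairs of titles that are close on the map but differ in subject."""
    pairs = []
    for (t1, x1, y1, s1), (t2, x2, y2, s2) in combinations(records, 2):
        close = math.hypot(x1 - x2, y1 - y2) <= max_distance
        if close and s1 != s2:
            pairs.append((t1, t2))
    return pairs


neighbors = cross_subject_neighbors(books, max_distance=0.05)
for first, second in neighbors:
    print(first, "<->", second)
```

Here position stands in for vocabulary similarity and the subject label stands in for marker color, which is the same combination of two classification systems that the story map shows visually.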
