Wednesday, August 21, 2024

One Million Screenshots. One Map!

Over the years a number of people have used the popular Leaflet.js mapping library to map image datasets. For example, Nathan Rooy's Visual Book Recommender uses Leaflet to map the images of 51,847 book covers. The Pudding has also mapped the images of 5,000 book covers in its 11 Years of Top-Selling Book Covers, Arranged by Visual Similarity.

Mapping libraries have also been used in the past to visually map the internet. For example, the Internet Map (which now appears to be dead) used the Google Maps API to visualize the 350,000 largest websites in the world. On this map, circles of different sizes represented individual websites. The size of each circle was determined by the amount of traffic to the website: the more traffic, the bigger the circle. The location of each website on the map was determined by the active hyperlinks between the sites.

Now Urlbox has created a map which visualizes screenshots of the top one million websites in the world. One Million Screenshots is an interactive Leaflet map which allows you to pan and zoom around the screenshots of these sites. To make the map, Urlbox took screenshots of the 1,048,576 top-ranked websites in the Common Crawl Web Graph.
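The figure of 1,048,576 is exactly 1024 x 1024 (or 2 to the power of 20), so the screenshots would fill a square grid with nothing left over. I don't know how Urlbox actually arranges the images, so the following is only a minimal sketch of one possible layout. The row-major placement and the function names are my own assumptions, not anything taken from the One Million Screenshots site.

```python
import math
from typing import Tuple

TOTAL_SITES = 1_048_576  # the figure quoted above; equals 2**20


def grid_side(total: int) -> int:
    """Return the side length of a square grid that holds exactly `total` cells."""
    side = math.isqrt(total)
    if side * side != total:
        raise ValueError(f"{total} is not a perfect square")
    return side


def rank_to_cell(rank: int, side: int) -> Tuple[int, int]:
    """Hypothetical row-major placement: rank 0 sits in the top-left cell.

    Returns (row, column), both counted from zero.
    """
    if not 0 <= rank < side * side:
        raise ValueError("rank is outside the grid")
    return divmod(rank, side)


side = grid_side(TOTAL_SITES)
print(side)                      # side length of the square grid
print(rank_to_cell(0, side))     # the first site
print(rank_to_cell(1025, side))  # one row down, one column across
```

A layout like this would place sites purely by rank, which would say nothing about how similar neighbouring sites are.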

Usually on maps of large image datasets some attempt is made to position the images so that related images appear near each other. Whatever criterion One Million Screenshots has used to position its images doesn't appear to have worked very well. For example, the screenshot of the New York Times on the map is flanked on one side by the website of the Municipality of Pictou County and on the other by Websunday (a Japanese manga magazine). I struggle to understand how these three website screenshots came to be mapped so closely together.

It isn't even as if Urlbox hasn't spent time trying to categorize similar sites. If you click on a screenshot on the map and go to a website's dedicated page you can view a number of different categories of 'similar sites'. If you select the 'similar description' option here you can find a lot of websites which do appear to be closely related. The 'similar description' metric looks to me to be the one that Urlbox should have used to determine the position of each website screenshot on the One Million Screenshots map.
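To illustrate what I mean, here is a rough sketch of how a description-based similarity score could be used to decide which screenshots sit next to each other. This is not how Urlbox computes its 'similar description' lists, which I haven't seen. The word-overlap (Jaccard) score, the greedy ordering, and the site names and descriptions below are all made up for the example.

```python
from typing import Dict, List, Set


def tokens(description: str) -> Set[str]:
    """Split a description into a set of lowercase words."""
    return set(description.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words two descriptions have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def order_by_similarity(sites: Dict[str, str]) -> List[str]:
    """Greedy chain: start with the first site, then keep appending
    the unplaced site whose description is most similar to the last one."""
    names = list(sites)
    if not names:
        return []
    bags = {name: tokens(text) for name, text in sites.items()}
    order = [names[0]]
    remaining = names[1:]
    while remaining:
        last = bags[order[-1]]
        best = max(remaining, key=lambda name: jaccard(last, bags[name]))
        order.append(best)
        remaining.remove(best)
    return order


# Invented sample data, purely for illustration.
sites = {
    "news-a": "daily news politics world headlines",
    "manga-a": "weekly manga comics magazine",
    "news-b": "breaking news world headlines business",
    "council-a": "municipal council services county",
    "manga-b": "manga comics new chapters weekly",
}

print(order_by_similarity(sites))
```

With this sample data the two news sites end up side by side, as do the two manga sites, rather than being interleaved with a municipal website. A real map would need a two-dimensional layout and a far better similarity measure, but the principle is the same.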

Via: Data Vis Dispatch
