Wednesday, August 23, 2023

Using AI to Geolocate Photos

Google recently added Google Lens to Bard, its AI chatbot. Today I decided to test how well Bard identifies locations in photographs by giving it a few screenshots of random locations from Street View and checking how accurately it geolocates them.

Let's start with an easy one. I showed Bard the above picture of Tower Bridge in London and asked it 'Where was this photo taken?'.

Bard replied, 

"The photo you sent me shows a sign on a fence with the text "VISIT OUR PALACES TIME AND AGAIN". The background of the photo shows Tower Bridge in London, England. The distinctive towers and bascules of the bridge make it one of the most recognizable landmarks in London. 

The sign is located on the north bank of the River Thames, near Tower Bridge. This is a popular tourist destination, and the sign is likely promoting the Historic Royal Palaces, which are a group of palaces and castles in London that are owned by the British government. 

The photo was taken in London, England, near Tower Bridge."

Bard identified the location of Tower Bridge perfectly, even noting that the 'sign' in the photo is on the north bank of the Thames. It is also clear from Bard's response that it performs text recognition on submitted images, which could obviously help in identifying locations.

Now, let's try a more difficult image. This image shows a bust of the illustrator Paul Gavarni, in Place St. Georges in Paris.

Bard replied to my query of 'where was this photo taken?',

"The photo you sent me shows a statue of Joan of Arc in the middle of a city street. The statue is located in Paris, France, on the Place des Pyramides. The square is named after the Luxor Obelisk, which was brought to Paris from Egypt in 1836."

Bard correctly identified the city as Paris. However, it misidentified the statue as being of Joan of Arc, which is presumably why it gave the wrong Paris address for the photo.

Luckily (or because I'm using Street View) I have another photograph taken from the same spot in Place St. Georges, showing the view in the opposite direction.
This time, thanks to text recognition, Bard was able to identify the correct location. 

"The photo you sent me shows the front of a theater with the sign "Théâtre Saint-Georges". The theater is located at 51 rue Saint-Georges in the 9th arrondissement of Paris, France. 

The Théâtre Saint-Georges is a private theater that was built in 1929. It has a capacity of 498 seats and is known for its productions of comedies and musicals. The photo was taken in Paris, France, at the Théâtre Saint-Georges. 

I can tell this because the sign on the theater clearly says "Théâtre Saint-Georges" and the theater is located at 51 rue Saint-Georges in Paris. Additionally, the architecture of the theater is consistent with the style of Parisian theaters from the early 20th century."

From my very limited testing I think I agree with Bellingcat that Bard "struggle(s) with analysing images and (is) prone to seeing details which are not there". You can read Bellingcat's more thorough analysis of using Bing AI and Bard to find the locations shown in images in their article 'Can AI Chatbots Be Used for Geolocation'.

Bellingcat concludes that both Bard and Bing AI currently require "extensive prompting from the user before they could arrive at any halfway satisfactory geolocation". 

Bellingcat's analysis suggests that both Bard and Bing can often identify prominent landmarks. However, Bellingcat's tests also found that neither AI model is able to identify logos in images (which obviously might help in correctly geolocating an image). In response to one query, Bing actually shared its geolocation methodology with Bellingcat. This is:
  • reverse image search - trying to find similar images online
  • using metadata - EXIF data attached to images
  • looking for recognizable landmarks
  • using visual clues such as signs, flags, languages, license plates.
I asked Bard how it identifies locations depicted in uploaded images. Bard's answers were similar to Bing's methodology. Unfortunately, Bard might actually be 'hallucinating' its methodology in its reply. As ever, it is unwise to trust an AI's answer when you don't already know the truth.
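
Of the methods Bing listed, the EXIF metadata check is the easiest to reproduce by hand. EXIF stores GPS latitude and longitude as degrees, minutes and seconds, each written as a numerator/denominator pair, alongside a hemisphere letter (N, S, E or W). The Python sketch below shows only the conversion of those values to decimal degrees. The function name dms_to_decimal and the sample values (chosen to land roughly at Tower Bridge) are my own illustrations, not anything Bard or Bing is known to run, and the sketch assumes the GPS values have already been read out of the file with some EXIF tool.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style GPS values to signed decimal degrees.

    dms is three (numerator, denominator) pairs for degrees, minutes
    and seconds; ref is the hemisphere letter 'N', 'S', 'E' or 'W'.
    (Hypothetical helper, written for illustration only.)
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Illustrative values only: 51° 30' 19.8" N, 0° 4' 30" W.
lat = dms_to_decimal(((51, 1), (30, 1), (1980, 100)), "N")
lon = dms_to_decimal(((0, 1), (4, 1), (3000, 100)), "W")
print(round(lat, 4), round(lon, 4))  # prints: 51.5055 -0.075
```

Street View screenshots like the ones used in this post would normally carry no GPS EXIF data at all, so this route is only open when the original camera file is available.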

Overall, Bellingcat is unimpressed with the ability of Bing AI and Bard to correctly identify the locations depicted in photographs. It concludes that using "an AI chatbot to fully geolocate an image is inadvisable. At this stage of AI’s development, it might be used to assist with very simple geolocations, perhaps pointing a researcher to an area that may warrant a closer look".

Fooling Bard

I decided to see if Bard was able to locate the replica Eiffel Tower in Las Vegas. I was half expecting Bard to geolocate this image to Paris but it actually managed to identify the location perfectly and to give some convincing reasons as to how it knows this image was captured in Las Vegas.

"The picture you sent me is a replica of the Eiffel Tower in Las Vegas. The real Eiffel Tower is located in Paris, France. So, the picture was not taken in Paris, France. It was taken in Las Vegas, Nevada. 

I can tell that the picture is of a replica of the Eiffel Tower because the structure is not as tall as the real Eiffel Tower and it is missing some of the details, such as the second level balconies. Additionally, the background of the picture does not match the surroundings of the real Eiffel Tower."

I wondered how much weight Bard gives to textual clues in an image. I therefore added some text to my image saying "This is the Eiffel Tower, Paris, France". Not only did my lie fool Bard, but in its response it now claimed it could see all the features of the real tower that it had previously said were missing.

"The picture you sent me is of the Eiffel Tower in Paris, France. The text in the picture also says 'This is the Eiffel Tower, Paris, France.' So, the picture was definitely taken in Paris, France. 

I can tell that the picture is of the real Eiffel Tower because it is the correct height and has all of the details, such as the second level balconies. Additionally, the background of the picture matches the surroundings of the real Eiffel Tower. 

I apologize for my previous answer. I was not able to see the text in the picture the first time."
