Imagine you are playing a new, slightly modified version of the game GeoGuessr. You are shown a photograph of an average American home, perhaps two stories high, with a front lawn on a cul-de-sac and an American flag flying proudly out front. But there's nothing distinctive about this house, nothing to tell you what state it's in or where its owners are from.
You have two tools at your disposal: your brain and 44,416 low-resolution aerial photos of random places across the United States, along with their associated location data. Can you match the house to an aerial photograph and locate it correctly?
I definitely couldn't, but a new machine learning model probably could. The software, created by researchers at China University of Petroleum (East China), searches a database of remote sensing photos with associated location information to match a street-level image (of a home, a commercial building, or anything else that can be photographed from the road) with an aerial photo in the database. While other systems can do the same, this one is pocket-sized by comparison and very accurate.
In the best case (when given an image with a 180-degree field of view), the first stage of narrowing down the location succeeds 97 percent of the time. That is better than, or within two percentage points of, every other model available for comparison. Even in less-than-ideal conditions, it performs better than many of its competitors. When pinpointing an exact location, it is correct 82 percent of the time, which is within three points of the other models.
What sets this model apart is its speed and memory savings. According to the researchers, it is at least twice as fast as similar models and uses less than a third of the memory they require. That combination makes it valuable for applications in navigation systems and the defense industry.
“We train the AI to ignore superficial differences in perspective and focus on extracting the same 'key landmarks' from both views, transforming them into a simple, common language,” explains Peng Ren, who develops machine learning and signal processing algorithms at China University of Petroleum (East China).
The software uses a technique called deep cross-hashing. Instead of trying to compare every pixel in a street view image to every image in a giant bird's-eye database, the method relies on hashing, which means converting a set of data—in this case, street and aerial photographs—into a string of numbers unique to the data.
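The hashing idea described above can be illustrated with a minimal Python sketch. It is not the researchers' method: their codes come from a trained neural network, whereas this stand-in uses random hyperplanes (a classic locality-sensitive hashing trick), and the function names, the 64-bit code length, and the feature vectors are all made up for illustration. The point it shows is that similar inputs end up with codes that differ in only a few bits.

```python
import random


def make_projection(dim, bits, seed=0):
    """Build `bits` random hyperplanes in `dim` dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def hash_features(features, projection):
    """Pack the sign of each projection into one integer code."""
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, features))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Count the bit positions where two codes differ."""
    return bin(a ^ b).count("1")


# Made-up feature vectors standing in for what a trained network would extract.
projection = make_projection(dim=4, bits=64)
street = hash_features([0.9, 0.1, -0.3, 0.5], projection)
aerial_same_place = hash_features([0.88, 0.12, -0.28, 0.52], projection)
aerial_elsewhere = hash_features([-0.7, 0.4, 0.9, -0.2], projection)

# The first distance should be much smaller than the second.
print(hamming(street, aerial_same_place), hamming(street, aerial_elsewhere))
```

Comparing two 64-bit codes takes one XOR and a bit count, which is why compact codes can be searched far faster than raw images.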
To do this, the research group at China University of Petroleum uses a type of deep learning model called a vision transformer, which breaks images into small parts and finds patterns among those parts. The model might find something in a photo that it has been trained to identify as a tall building, a circular fountain, or a roundabout, and then encode what it finds into strings of numbers. ChatGPT is based on a similar architecture, but it finds patterns in text rather than images. (The “T” in GPT stands for transformer.)
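As a rough illustration of the "small parts" step only, here is a Python sketch that cuts a tiny grid of pixel values into square patches. The function name and the 4-by-4 toy image are hypothetical, and a real vision transformer goes on to embed each patch and relate the patches to one another with attention, which this sketch does not attempt.

```python
def split_into_patches(image, patch):
    """Split a 2D grid (list of rows) into flattened square patches, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return patches


# Toy 4x4 "image" whose pixel values are simply 0..15.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_patches = split_into_patches(toy_image, patch=2)
print(len(toy_patches), toy_patches[0])
```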
The code representing each picture is like a fingerprint, says Hongdong Li, who studies computer vision at the Australian National University. It captures the distinctive features of each image, allowing the geolocation process to quickly narrow down possible matches.
In the new system, the code associated with a given ground-level photo is compared with the codes of all the aerial photos in the database (for testing, the team used satellite images of the United States and Australia), which yields the five closest candidate aerial matches. The geographic data for those closest matches is then averaged using a technique that weights locations that lie closer together more heavily, which reduces the impact of outliers, and the result is output as the estimated location of the street-view image.
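The two steps in that paragraph, retrieving the five nearest codes and then averaging their locations with more weight on candidates that agree with each other, can be sketched in Python as follows. Everything here is an assumption for illustration: the article does not give the team's actual weighting formula, so this sketch weights each candidate by the inverse of its mean distance to the other candidates, treats latitude and longitude as flat coordinates, and uses invented codes and locations.

```python
import math


def hamming(a, b):
    """Count the bit positions where two integer codes differ."""
    return bin(a ^ b).count("1")


def top_k_matches(query_code, database, k=5):
    """Return the k (code, (lat, lon)) entries whose codes are closest to the query."""
    ranked = sorted(database, key=lambda entry: hamming(query_code, entry[0]))
    return ranked[:k]


def estimate_location(candidates, eps=1e-6):
    """Average candidate locations, favoring those that lie near the others."""
    points = [location for _, location in candidates]
    if not points:
        raise ValueError("need at least one candidate")
    if len(points) == 1:
        return points[0]
    weights = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        mean_dist = sum(math.dist(p, q) for q in others) / len(others)
        weights.append(1.0 / (mean_dist + eps))  # isolated points get small weights
    total = sum(weights)
    lat = sum(w * p[0] for w, p in zip(weights, points)) / total
    lon = sum(w * p[1] for w, p in zip(weights, points)) / total
    return (lat, lon)


# Invented database: four entries cluster near (40, -100); one is an outlier.
query = 0b101010
database = [
    (query, (40.00, -100.00)),
    (query ^ 0b1, (40.01, -100.01)),
    (query ^ 0b11, (39.99, -100.02)),
    (query ^ 0b111, (40.02, -99.99)),
    (query ^ 0b1111, (25.00, -80.00)),     # plausible code, wrong place
    (query ^ 0b111111, (47.00, -122.00)),  # every bit differs, never retrieved
]
matches = top_k_matches(query, database, k=5)
estimate = estimate_location(matches)
print(estimate)
```

In this toy data the outlier still pulls the estimate somewhat, but less than it would in a plain average of the five locations.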
The new geolocation approach was published last month in IEEE Transactions on Geoscience and Remote Sensing.
Fast and memory efficient
While not a completely new paradigm, the paper “represents a clear advance in the field,” Li says. Because this problem has been tackled before, some experts, such as Washington University in St. Louis computer scientist Nathan Jacobs, are not as excited. “I don’t think it’s a particularly groundbreaking paper,” he says.
But Li disagrees with Jacobs: he believes the approach is innovative in its use of hashing to find image matches faster and with greater memory efficiency than traditional methods. It uses just 35 megabytes, while the next smallest model Ren's team examined requires 104 megabytes, about three times the space.
The researchers claim that the method is more than twice as fast as the next fastest one. When matching street-level images against the United States aerial photography dataset, the runner-up took about 0.005 seconds per match, while Ren's team was able to find the location in about 0.0013 seconds, nearly four times as fast.
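For readers who want to check those comparisons, the ratios follow directly from the figures quoted in the article. The variable names below are just labels for this quick Python check.

```python
# Figures quoted in the article.
new_model_mb = 35
next_smallest_mb = 104
new_model_seconds = 0.0013
runner_up_seconds = 0.005

memory_ratio = next_smallest_mb / new_model_mb      # roughly 3
speed_ratio = runner_up_seconds / new_model_seconds  # a little under 4

print(round(memory_ratio, 2), round(speed_ratio, 2))
```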
“As a result, our method is more efficient than traditional image geolocalization methods,” says Ren, and Li confirms that these claims are credible. Hashing “is a well-established path to speed and compactness, and the results are consistent with theoretical expectations,” says Li.
While this efficiency seems promising, more work is needed to make sure the method holds up at scale, Li says. The team did not fully explore real-world issues such as seasonal variation or clouds obscuring the image, which could affect the reliability of geolocation matching. Going forward, this limitation can be overcome by adding images from more widely distributed locations, Ren says.
Still, experts say the long-term applications (beyond a super-advanced GeoGuessr) are worth considering now.
There are some trivial uses for efficient image geolocation, such as automatically geotagging old family photos, Jacobs says. On the more serious side, navigation systems could also use a geolocation method like this. If GPS fails in a self-driving car, Jacobs says, another way to quickly and accurately determine location could be useful. Li also suggests that it may play a role in emergency response within the next five years.
There may also be applications in defense systems. Finder, a 2011 project from the Office of the Director of National Intelligence, aimed to help intelligence analysts learn as much as possible about photographs without metadata by using reference data from sources including overhead images. That is a goal that could be achieved with models similar to this new geolocation method.
Jacobs puts the defense application in context: If a government agency were sent a photo of a terrorist training camp without metadata, how could the location of that site be determined quickly and efficiently? Deep cross-hashing could be useful.