Abby Stylianou created an app that asks users to upload photos of the hotel rooms they stay in while traveling. It may seem like a simple act, but the resulting database of hotel room images helps Stylianou and her colleagues assist victims of human trafficking.
Traffickers often post photographs of their victims in hotel rooms as online advertisements, evidence that can be used to find victims and bring the perpetrators of these crimes to justice. But to use this evidence, analysts must be able to determine where the photographs were taken. That's where TraffickCam comes in. The app's submitted images are used to train an image search system, currently used by the US-based National Center for Missing and Exploited Children (NCMEC), that helps analysts geolocate posted images, a deceptively difficult task.
Stylianou, a professor at Saint Louis University, currently works with Nathan Jacobs's team at Washington University in St. Louis to take the model even further, developing multimodal search capabilities that enable video and text queries.
Stylianou spoke about her work:
What came first: your interest in computers or your desire to help bring justice to victims of violence, and how did they come together?
Abby Stylianou: This is a crazy story.
Let me go back to my bachelor's degree. I didn't really know what I wanted to do, but I took a remote sensing class in my second semester of graduate school that I just really enjoyed. When I graduated, [George Washington University professor (then at Washington University in St. Louis)] Robert Pless hired me to work on a program called Finder.
The purpose of Finder was to say: if you have a photo and nothing else, how can you figure out where that photo was taken? My family knew about the work I did, and [in 2013] my uncle shared an article with me from the St. Louis Post-Dispatch about a young murder victim from the 1980s whose case had gone cold. [The St. Louis Police Department] still didn't know who she was.
They had photographs of the 1983 burial. They wanted to exhume her remains so they could conduct modern forensic testing to find out what part of the country she came from. But when they exhumed the remains under her gravestone in the cemetery, it wasn't her.
And they [dug up the wrong remains] two more times, after which the St. Louis medical examiner said, "You can't continue digging until you have proof of where the remains actually are." My uncle sent me this and said, "Hey, could you figure out where this photo was taken?"
We ended up working with the St. Louis Police Department, applying the geolocation tool we were building to see if we could find the location of the lost grave. We provided the St. Louis medical examiner with a report that said, "In our opinion, this is where the remains are located."
And we were right. They were able to exhume her remains and conduct advanced forensic testing, which showed she was from the Southeast. We still haven't figured out her identity, but we have much better genetic information at this point.
For me, that moment was, "This is what I want to do with my life: use computer vision to do something good." It was a turning point for me.
How does your algorithm work? Can you walk me through how a user-uploaded photo becomes useful data for law enforcement agencies?
Stylianou: Today, when we think about artificial intelligence systems, there are two really key pieces. One is the data, and the other is the model you use to work with it. For us, both are equally important.
First, there's the data. We are very lucky that there are tons of hotel images on the Internet, so we can collect publicly available data in large quantities. We have millions of these images available online. However, the problem with many of these images is that they are advertising images. They are perfect images of the nicest hotel rooms, really clean, while the images of victims look anything but.
The victim's image is often a selfie that the victim took herself. They are in a messy room. The lighting is imperfect. This is a problem for machine learning algorithms; we call it a domain gap. When there is a gap between the data you trained your model on and the data you work with during inference, your model will not perform very well.
The idea behind the TraffickCam mobile app was largely to augment internet data with data that actually looked more like images of the victim. We created this app so that people traveling can send photos of their hotel rooms specifically for this purpose. We use these images, combined with images available on the Internet, to train our model.
What then?
Stylianou: Once we have a large pile of data, we train neural networks to learn how to embed it. If you take an image and run it through a neural network, what comes out at the other end is not a direct prediction of which hotel the image came from. Rather, it is a numerical representation [of image features].
We have a neural network that takes in images and outputs vectors—small numeric representations of those images—where images taken from the same location hopefully have similar representations. This is what we then use in the investigative platform that we have deployed at [NCMEC].
We have a search interface built on this deep learning model: an analyst can insert an image, run it through the model, and get back a set of results showing other images that are visually similar, which they can then use to infer a location.
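The embed-then-compare pipeline described above can be sketched in a few lines. This is only an illustration, not TraffickCam's actual code: the 3-D toy vectors stand in for the high-dimensional embeddings a real network would produce, and cosine similarity is the standard comparison for this kind of retrieval.

```python
import numpy as np

def top_k_similar(query_vec, db_vecs, k=5):
    """Return indices of the k database embeddings most similar to the
    query embedding, ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q                     # cosine similarity to each database image
    return np.argsort(-sims)[:k]      # best matches first

# Toy example: four "hotel images" as 3-D embeddings, plus a query
# whose embedding sits near images 1 and 2.
db = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.1, 0.9, 0.1],
               [0.0, 0.0, 1.0]])
query = np.array([0.05, 1.0, 0.05])
matches = top_k_similar(query, db, k=2)
print(matches)  # [1 2]: the two most visually similar database images
```

In a deployed system the database side would be precomputed and indexed (for example with an approximate nearest-neighbor library) so that a query returns results in milliseconds even over millions of images.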
Identifying hotel rooms with computer vision
Many of your papers mention that matching images of hotel rooms can actually be harder than matching photographs of other types of locations. Why is that, and how do you deal with these problems?
Stylianou: There are several features of hotels that are truly unique compared to other domains. Two different hotels can look very similar: every renovated Motel 6 in the country looks virtually identical. This is a real challenge for models that try to produce different representations for different hotels.
On the other hand, two rooms in the same hotel may look completely different. You have a penthouse suite and an entry-level room. Or one floor has been renovated and another hasn't. It's a really difficult task when two such images need to have the same representation.
Our queries are also unique because we usually need to erase a very, very large portion of the image first. We're talking about child sexual abuse imagery; it must be removed before the image is sent to our system.
We trained the first version by inserting person-shaped blobs, to try to get the network to ignore the erased part. But [Temple University professor and close collaborator Richard Souvenir's team] showed that if you actually use AI to inpaint, filling that region with natural-looking texture, you have much more success in the search than if you leave the erased blob there.
So when our analysts conduct a search, the first thing they do is erase part of the image. The next thing we do is use an AI model to fill it back in.
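The erase-then-fill step can be sketched roughly as follows. This is a crude stand-in: it fills the erased rectangle with the mean of the remaining pixels, whereas the real system uses a learned generative inpainting model to synthesize plausible texture.

```python
import numpy as np

def erase_and_fill(img, box):
    """Erase a rectangular region of a grayscale image and fill it with a
    crude texture estimate. A stand-in for the learned inpainting model;
    real systems predict realistic texture with a generative network."""
    y0, y1, x0, x1 = box
    out = img.copy().astype(float)
    out[y0:y1, x0:x1] = np.nan           # step 1: erase the sensitive region
    fill = np.nanmean(out)               # step 2: estimate a fill value from
    out[y0:y1, x0:x1] = fill             #         the surviving pixels
    return out

# Toy grayscale "image": a bright room with a dark blob in the middle.
img = np.full((8, 8), 200.0)
img[3:5, 3:5] = 10.0
filled = erase_and_fill(img, (3, 5, 3, 5))
```

The point of the real inpainting step is that the search model then sees a natural-looking room rather than a conspicuous blank blob, which would otherwise dominate the embedding.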
Some of your work involves object recognition rather than whole-image recognition. Why?
Stylianou: [NCMEC] analysts using our tool have told us that they often see just one item in the background of a query image and want to run a search on that one object. But the models we train typically operate on the full image, so that's a problem.
And a hotel has some things that are unique and some that aren't. For example, a white hotel bed is not discriminative; most hotels have white beds. But a truly unique piece of art on the wall, even if it's small, can go a long way toward identifying a location.
[NCMEC analysts] may sometimes see only one object, or know that one object is important. Simply applying the models we already use to that one object doesn't work. How could we support this better? We do things like train object-specific models: you can have a sofa model, a lamp model, and a rug model.
How do you measure the success of an algorithm?
Stylianou: I have two versions of this answer. First, there is no real ground-truth data set we can use to measure this, so we create proxy data sets. We take subsets of the data we have collected through the TraffickCam app, put large blobs over them, erase those regions, and measure the fraction of the time that we correctly predict which hotel they are from.
So these images look as similar to the victim images as we can make them. However, they still don't necessarily look exactly like the victim images, right? This is the best quantitative indicator we can come up with.
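That proxy metric, the fraction of masked-and-refilled query images whose correct hotel appears among the top retrieval results, might be computed like this (the hotel IDs are hypothetical):

```python
def top_k_hotel_accuracy(ranked_results, true_hotels, k=1):
    """Fraction of query images whose true hotel appears among the
    hotels of the top-k retrieved images -- the kind of proxy metric
    described above."""
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(ranked_results, true_hotels))
    return hits / len(true_hotels)

# Hypothetical ranked retrieval results for three masked query images.
ranked = [["hotel_a", "hotel_b"],   # correct hotel ranked first
          ["hotel_c", "hotel_b"],   # correct hotel ranked second
          ["hotel_a", "hotel_d"]]   # correct hotel not retrieved
truth = ["hotel_a", "hotel_b", "hotel_c"]
acc_1 = top_k_hotel_accuracy(ranked, truth, k=1)  # 1 of 3 queries
acc_2 = top_k_hotel_accuracy(ranked, truth, k=2)  # 2 of 3 queries
```

Reporting accuracy at several values of k is common in retrieval evaluation, since an analyst scanning a results page can tolerate the correct hotel appearing a few positions down.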
And then we work a lot with [NCMEC] to understand how the system works for them. We learn about instances where they use our tool successfully and where they don't. Honestly, some of the most helpful feedback we get from them is when they tell us, "I tried searching, but it didn't work."
Has a positive hotel-image match actually been used to help victims of human trafficking?
Stylianou: It's always difficult for me to talk about these things, partly because I have small children. It's fraught, and I don't want to take the worst thing that has ever happened to someone and tell it as our success story.
However, there are cases that we know about. I recently heard something from NCMEC analysts that really encouraged me about why I do what I do.
There was a case involving a live stream of a small child being abused in a hotel. NCMEC was alerted to it. Analysts trained to use TraffickCam took a screenshot, ran it through our system, got a result identifying the hotel, sent law enforcement, and the child was rescued.
I'm very, very lucky that I'm working on something that makes a real difference and that we can make a difference.