By Jackie Swift
When you think about computers, you probably don’t consider the phenomenon of human perception. After all, what do our sensory abilities and brain processes have to do with the logic of a computer? Quite a lot, says Kavita Bala, professor of computer science and dean of the Cornell University Ann S. Bowers College of Computing and Information Science. For her, the mysteries of human perception have inspired more than a decade of groundbreaking work in computer graphics and computer vision.
Bala began her research by focusing on physics-based modeling and rendering. Her work helped to pioneer a new approach to computer graphics that takes its cue from the way humans visually perceive the world. “In the real world we have photons bouncing everywhere, but we don’t perceive every single photon in its full glory,” she says. “We get a sort of gestalt of the scene. What we perceive is quite a reduced form of the actual complexity.”
Structure: The Key to Accurate Rendering
Bala worked with collaborators to research and create models that accurately render fabric in computer-generated imagery. When the researchers began their work, rendering algorithms were able to produce only approximations of what a material might look like in a particular scene — for instance, the appearance of a silk dress or a velvet shirt, Bala explains. “They never quite looked the way they were supposed to look,” she says.
Bala and her colleagues asked how we know silk from velvet. “Silk is shiny, and velvet is fuzzy with a characteristic way that it reflects light,” Bala says. “That’s how we tell them apart. And the reason they look that way is because of the structure of the materials.”
Armed with that idea, the researchers took micro-computed tomography (micro-CT) scans of the materials to get micron-resolution detail of their structure. “Structure is geometric information, not optical information,” Bala says. “It’s not only about reflection; it’s about light interacting with the shape of the material. That was our key contribution, the understanding that if you capture the structure well you can create algorithms that produce realistic material appearance automatically. Over a series of years, we came up with better and better algorithms to render the materials until finally we have gorgeous models of these materials looking like they do in the real world.”
GrokStyle: Getting the Picture
Based on her interest in perception, Bala also began to explore computer vision — the ability of computer algorithms to know what they’re looking at. Once again she turned to human perception as the basis for her research, asking how we recognize what is in an image, as well as how we use that recognition to understand the world.
Working with Sean Bell, PhD ’16 Computer Science, now at Meta (formerly Facebook Inc.), Bala looked at the way furniture was presented online through photographs posted to sites like Flickr and online design sites. The researchers soon identified an unfulfilled need on these sites: Users would ask what kinds of furniture were shown in the photos. They wanted to know where they could buy the pieces themselves, but that information was not readily available.
“Someone needed to go beyond saying ‘That’s a chair,’” Bala says. “The real expertise is to say, ‘That’s an Eames chair. That’s an IKEA chair.’ And that’s exactly where we felt artificial intelligence could play a positive role.”
Bala and Bell developed neural networks, algorithms that are inspired by networks of neurons in the brain. They took tens of thousands of online images — everything from product catalog photos to images shared on public social networking sites — and showed them to the neural networks they were developing.
“If you have enough images of a thing, you can recognize what it is,” Bala says. “We trained these networks to do fine-grained recognition, where they could accurately identify the type and brand of furniture in an image.”
Eventually Bala and Bell broadened their network’s expertise into fashion. The ultimate result was an artificial intelligence (AI) made up of a set of algorithms that performed better than the state-of-the-art AIs of the time. “We stayed at least twice as accurate as the next best,” Bala says.
Bala took a leave of absence from Cornell in 2016 and cofounded a company with Bell to market an AI-recognition product for furniture and fashion called GrokStyle (from “grok,” a term coined by Robert Heinlein in his 1961 novel Stranger in a Strange Land). Their success led to GrokStyle’s acquisition by Meta. Today, a new-generation AI based on GrokStyle — called GrokNet — runs visual recognition for Facebook’s e-commerce feature, Facebook Marketplace.
Pinpointing Patterns in Fashion
Although AI visual recognition still has some way to go before it is accurate under all conditions, Bala and her group are currently working on another project that assumes such recognition is foolproof. They are exploring the ramifications of the premise that a good recognition algorithm, fed all the images of the world, will be able to pinpoint patterns in the data. As a first test, they created an algorithm they called StreetStyle that could comb through images and identify unique aspects of fashion specific to certain places or times of year.
“Even if you’ve never visited a certain part of the world, by analyzing photos from there, you can get a sense of how people dress in that place,” Bala explains. “We ran our recognition algorithm looking for signature clothing from different parts of the world, and we found all kinds of matches. In Cairo the hijab popped out. In Lagos, it was the gele, a traditional headgear the women wear that’s very distinctive.”
The researchers went on to publish a paper on a more advanced form of the algorithm called GeoStyle. In it, they also looked for spikes, sudden appearances of particular articles or colors of clothing. “Out popped March 17,” Bala says. “Many people in the United States wear green on that day because it’s St. Patrick’s Day. We also saw cultural or sporting events: the Stanley Cup, the World Cup. These were times when fans were either posting pictures of their sports heroes or dressing like them. We were essentially mining all the world’s images to understand cultural phenomenon, and it was incredibly exciting.”
Detecting Global Trends in Climate and Agriculture
Working with researchers at the Cornell Institute of Digital Agriculture, Bala is now applying the latest iteration of the algorithm to satellite images, in an attempt to detect things like global climate changes and trends in crop health.
“Prediction is one of our goals,” she says. “If we see trends that are time-shifted in one place and we start seeing the beginning of that same trend in another location, we may be able to predict where the phenomenon is going to go. There’s an extraordinary amount of visual data about our planet and our lives posted online. We can learn so much about who we are and the state of our planet by analyzing it.”