This Machine Can Learn About the World Just by Watching It
It’s a possible step on the road to a total surveillance AI like the one in Person of Interest. Researchers from the University of Washington have developed a computer program that teaches itself everything there is to know about any visual concept — without any human supervision.
Called LEVAN (Learning EVerything About ANything), it’s best described as a hopped-up search engine, but one imbued with powerful algorithms and artificial intelligence. The system, which was developed by computer scientists from the University of Washington and the Allen Institute for Artificial Intelligence in Seattle, searches millions of books and images on the Web to learn all possible variations of a visual concept. It then displays the results to the user as a comprehensive, browsable list of images, allowing them to explore and understand topics quickly and in great detail.
The developers want to make LEVAN an open-source program. It will be available both as an educational tool and an information bank for researchers in the computer vision community. Eventually, the team hopes to offer a smartphone app that can parse out and automatically categorize photos.
But let’s not kid ourselves — a program that can supposedly “teach itself everything about anything” holds serious implications for AI. LEVAN brings to mind the Pentagon’s effort to develop computers that can teach themselves. Revealingly, the research for LEVAN was funded by the U.S. Office of Naval Research, as well as the National Science Foundation.
But how is it possible for a machine to learn every visual aspect of any concept? The researchers proposed two main approaches, or axes. The “everything” axis corresponds to all possible appearance variations of a concept, while the “anything” axis corresponds to the span of different concepts for which visual models are to be learned.
The program discovers associations between textual and visual data, learning to match rich sets of phrases with pixels in an image. This way, it can recognize instances of specific concepts when it sees them.
LEVAN learns which terms are relevant by analyzing the content of the images found on the Web and identifying characteristic patterns across them using recognition algorithms. Currently, users can browse a library of about 175 concepts, such as “airline,” “window,” “beautiful,” “breakfast,” “shiny,” “cancer,” “innovation,” “skateboarding,” “robot,” and “horse.”
If the concept is not in the library, users can submit a new query. But be prepared to wait: LEVAN is currently limited in how quickly it can learn a concept on account of the tremendous computational power required to crunch each query, which can take as much as 12 hours. The researchers are looking at ways to increase LEVAN’s processing speed and capabilities.
After submitting the new query, the program automatically generates an exhaustive list of subcategory images that relate to that concept. For example, a search for “dog” brings up the obvious collection of subcategories: photos of “Chihuahua dog,” “black dog,” “swimming dog,” “scruffy dog,” “greyhound dog” — but also “dog nose,” “dog bowl,” “sad dog,” “ugliest dog,” “hot dog” and even “down dog,” as in the yoga pose.
The system works by searching the text from literally millions of books written in English and available on Google Books. It relentlessly scours every occurrence of the concept in the entire digital library. An algorithm then filters out words that aren’t visual (e.g. “jumping horse” would be included, but “my horse” would not).
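The filtering step described above can be sketched as a toy script. This is only a minimal illustration under assumed heuristics: it collects bigrams that end in the concept word and drops modifiers from a small hand-made stoplist of non-visual words. The actual system prunes non-visual phrases with far more sophisticated, image-driven checks; the stoplist and corpus here are invented for demonstration.

```python
import re
from collections import Counter

# Hypothetical stoplist: modifiers unlikely to describe appearance.
# (The real system does not use a hand-made list; this is a stand-in.)
NON_VISUAL_MODIFIERS = {"my", "your", "his", "her", "their", "our",
                        "the", "a", "an", "this", "that", "some", "any"}

def mine_phrases(text, concept):
    """Collect bigrams ending in `concept`, keeping likely-visual modifiers."""
    words = re.findall(r"[a-z]+", text.lower())
    bigrams = Counter(
        (w1, w2) for w1, w2 in zip(words, words[1:]) if w2 == concept
    )
    return [f"{w1} {concept}"
            for (w1, _), _count in bigrams.most_common()
            if w1 not in NON_VISUAL_MODIFIERS]

corpus = ("the jumping horse cleared the fence "
          "while my horse watched a jumping horse")
print(mine_phrases(corpus, "horse"))  # ['jumping horse']
```

Note how “jumping horse” survives while “my horse” is discarded, mirroring the example in the text.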
After acquiring the relevant phrases, LEVAN performs an image search on the Web, looking for uniformity in appearance among the photos it gathers. Once trained on those examples, it can then recognize other images associated with a given phrase.
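The “uniformity in appearance” check can be illustrated with a deliberately simplified sketch: a phrase is kept as a sub-concept only if the images retrieved for it look alike. Here each “image” is reduced to a single made-up feature number and consistency is just low variance across the set — the paper’s actual models are learned visual detectors, so every value and threshold below is an assumption for illustration only.

```python
from statistics import pvariance

def is_consistent(features, threshold=1.0):
    """Keep a phrase only if its images cluster tightly (low variance)."""
    return pvariance(features) < threshold

# Fake 1-D features for the images returned by two candidate phrases.
phrase_features = {
    "jumping horse": [2.0, 2.1, 1.9, 2.05],  # tight cluster -> keep
    "horse thing":   [0.1, 5.0, 9.3, 2.2],   # scattered -> discard
}

kept = [p for p, f in phrase_features.items() if is_consistent(f)]
print(kept)  # ['jumping horse']
```

The design point is the same as in the article: visually incoherent phrases are filtered out, so only sub-concepts with a recognizable appearance make it into the browsable library.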
“Major information resources such as dictionaries and encyclopedias are moving toward the direction of showing users visual information because it is easier to comprehend and much faster to browse through concepts,” noted lead researcher Santosh Divvala in a release. “However, they have limited coverage as they are often manually curated. The new program needs no human supervision, and thus can automatically learn the visual knowledge for any concept.”
To date, LEVAN has tagged more than 13 million images with 65,000 different phrases.
Given the pace of these developments, it’s reasonable to assume that future systems will not only be capable of learning and teaching themselves, but also be programmed to act on that acquired information or new skills. This could include the design of new products or medicines — or even the refinement of its own programming, leading to the dreaded concept known as recursively improving artificial intelligence.
The researchers will present the project later this month at the Computer Vision and Pattern Recognition annual conference in Columbus, Ohio. Here’s a link to the paper: “Learning Everything About Anything: Webly-Supervised Visual Concept Learning.” (pdf) Supplemental information via University of Washington.