IMAGE INTELLIGENCE: MAKING VISUAL CONTENT PREDICTIVE
Including 30 use cases for image intelligence in the enterprise

By Susan Etlinger, Analyst
Altimeter, a Prophet Company
July 18, 2016
EXECUTIVE SUMMARY
People no longer communicate online simply via written content, such as posts and comments; they upload and share billions of photos every day. This can be both exciting and terrifying from a brand perspective, because approximately 80% of images that include one or more logos do not directly refer to the brand with associated text. As a result, organizations are missing the content and meaning of images and are unable to act on the opportunities or risks they present.
Companies ranging from technology startups to industry Goliaths, such as Facebook and Google, are developing technologies that use artificial intelligence to analyze the content of images. Increasingly, they’re applying analytics to images to better understand their impact on the business. But the opportunity for organizations to make sense of images isn’t just about recognition and analysis; it’s about image intelligence: the ability to detect and analyze images, incorporate them with other data sources, and develop predictive models to forecast and act on emerging trends.
This report lays out the current market opportunities, challenges, and use cases for image intelligence and offers recommendations for organizations that wish to unlock the predictive potential of visual content.
TABLE OF CONTENTS
Executive Summary
The Rise of Visual Media
How Do Computers See?
From Computer Vision to Image Intelligence
The Business Value of Image Intelligence
Privacy, Trust and Customer Experience
Challenges of Image Intelligence
A Look at the Future
Recommendations
Endnotes
Methodology
 Brands, Researchers, Agencies and Industry Experts (10)
 Technology Vendors (17)
 Social & Digital Media Technology Platforms (3)
Acknowledgments
About Us
www.altimetergroup.com | @setlinger | susan@altimetergroup.com
THE RISE OF VISUAL MEDIA
“I see more and more people sharing images and getting away from text; look at the explosion of memes and emoji. It’s becoming a more and more complex environment, how people are communicating over social media.”
— Glen Szczypka, Deputy Director, Health Media Collaboratory, National Opinion Research Center at the University of Chicago
The ubiquity of smartphone cameras, combined with increasing use of social networks, has led to an explosion in picture taking and photo sharing. According to Mary Meeker’s 2016 Internet Trends report, people share and upload over 3 billion images every day on Facebook properties (Facebook, Messenger, WhatsApp and Instagram) and Snapchat alone (see Figure 1).
FIGURE 1: IMAGE GROWTH REMAINS STRONG, SAYS MARY MEEKER’S INTERNET TRENDS REPORT
Source: Snapchat, Company disclosed information, KPCB estimates. Note: Snapchat data includes images and video. Snapchat stories are a compilation of images and video. WhatsApp data estimated based on average of photos shared disclosed in Q1:15 and Q1:16. Instagram data per Instagram press release. Messenger data per Facebook (~9.5B photos per month). Facebook shares ~2B photos per day across Facebook, Instagram, Messenger, and WhatsApp (2015).
In addition to sparking trends and conversations, photo sharing is driving technology innovation. Markets and Markets, a research firm, expects the image-recognition market to reach nearly $30 billion by 2020, driven in large part by sharing via social media.[1]
Image recognition — what Gartner defines as “technologies [that] strive to identify objects, people, buildings, places, logos, and anything else that has value to consumers and enterprises” — is just the first step in deriving insight from and acting on images, however. The next step is to analyze them to better understand their context and impact.
The photo on the following page provides a good example (see Figure 2).
A human can easily interpret this photo as a woman playing tennis at the U.S. Open. If she is a tennis fan, she may even recognize Ana Ivanovic. But a computer simply “sees” a collection of pixels that it must then classify into objects (a woman, a tennis racket, some logos, and so on). It then must interpret those objects to infer meaning: a woman playing in an event in the US Open Series, sponsored by Sony Ericsson and Olympus.
FIGURE 2: MEASURING THE VALUE OF IMAGES
The value of this photo to a brand such as Sony Ericsson or Olympus is its effectiveness at reaching as broad an audience as possible. When this photo is shared in social or digital channels, however, it is unlikely to include any explicit brand mention such as a hashtag or caption. But for brands that sponsor sporting events, the ability of computers to detect these types of brand mentions can be an extremely valuable tool for measuring sharing behavior, reach and, ultimately, sponsorship ROI.
Computer vision enables images to become a source of actionable insight for brands.
Images have the power to capture emotion and call people to action in a way that words often cannot. They can ignite product sales, as when a young mother named Candace Payne recorded a video of herself laughing hysterically in a Chewbacca mask in her car in a Kohl’s parking lot.[2] The video immediately went viral and became, with over 137 million views, the most-watched Facebook Live video ever, causing the mask to sell out across multiple channels. In addition, although photos or videos may carry diverse cultural connotations, they are also far more universal than language — a significant asset for global brands.
“The growth of media inside the Twitter stream has been enormous,” says Chris Moody, Vice President Data Strategy at Twitter. “Just as people originally said they wanted to listen to social content, they are increasingly asking to see images — pictures of their products and pictures of their logos. This is repeating with images what we saw with text analytics: PR and crisis management, photos of products and competitors’ products. They want to understand what those images are.”
The opportunity for organizations to make sense of images isn’t just about recognition and analysis, however; it’s about image intelligence — the ability to detect and analyze images, develop predictive models based upon them, and use these models in context with other data sources to forecast and act on emerging trends, develop business cases, detect and mitigate crises, and a host of other uses.
Before we discuss the use cases for image intelligence, it’s important to understand a bit about how it works.
HOW DO COMPUTERS SEE?
The first thing to understand about image recognition technology is that it is something of a paradox.[3] For many people, there is nothing more natural than to look at an object and perceive that it is a daisy, or a group of people at the NBA finals, or a woman riding a bicycle. From an engineering perspective, however, the process of perception is highly complex.
DEEP LEARNING TEACHES COMPUTERS ABOUT IMAGES
Enabling a computer to recognize and classify an image (that is, to “see”) requires a technique known as deep learning. Deep learning consists of running data through an “artificial neural network”: basically, a software program that roughly simulates the behavior of neurons in the brain. Deep learning translates things people can easily perceive into something computers can recognize and interpret.
The goal of deep learning is to train the software to classify future data; for example, to distinguish a cat from a dog, a beagle from an Irish Setter, and so on.[4] A recent article in MIT Technology Review puts it this way: “Deep learning is why you can search images stored in Google Photos using keywords and why Facebook recognizes your friends in photos before you’ve tagged them.”[5]
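The mechanics can be sketched in a few lines of code. The toy network below passes “pixel” values through a weighted hidden layer and converts the resulting scores into class probabilities; the weights here are invented for illustration (a real network would learn them from millions of labeled images), and the two class labels are hypothetical.

```python
import math

def relu(values):
    # Standard activation: negative signals are zeroed out.
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # Each output neuron is a weighted sum of all inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# A toy "image": four pixel intensities. The weights below are
# illustrative stand-ins for what training would normally produce.
pixels = [0.9, 0.1, 0.8, 0.2]
hidden = relu(dense(pixels,
                    [[0.5, -0.2, 0.3, 0.1],
                     [-0.4, 0.6, 0.2, -0.1],
                     [0.1, 0.1, -0.5, 0.7]],
                    [0.0, 0.1, -0.1]))
scores = dense(hidden, [[0.6, -0.3, 0.2],
                        [-0.5, 0.4, 0.1]], [0.0, 0.0])
probs = softmax(scores)  # e.g. [P("cat"), P("dog")]
print(probs)
```

A production system does exactly this, only with millions of learned weights, many more layers, and convolution operations suited to 2-D images.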
Deep learning and neural networks aren’t new; variations of this technique have been used since the 1950s.[6] But the rise in photo sharing and the desire to identify, interpret and act on visual content have provided a new application for neural networks: understanding not only the objects contained in an image, but their context and meaning. Figure 3 shows two sets of images: backpacks and beagles. Below each image is a number that denotes the estimated accuracy (the “confidence level”) of the classification. The higher the number, the higher the probability that the classification is accurate. You can begin to understand some of the complexity involved in deep learning. We know, without having to be told, that color is not a meaningful attribute for a backpack; it can be red or blue or plaid or camouflage, and it’s still a backpack.
FIGURE 3: HOW NEURAL NETWORKS CLASSIFY OBJECTS
Source: Ditto
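In practice, confidence levels like those shown under each image in Figure 3 are used to decide whether a classification should be trusted at all. A minimal sketch (the labels, scores, and 90% threshold are hypothetical, chosen only to illustrate the idea):

```python
def classify(confidences, threshold=0.90):
    """Return the top label only if its confidence clears the threshold."""
    label, score = max(confidences.items(), key=lambda kv: kv[1])
    return (label, score) if score >= threshold else ("uncertain", score)

# A clear-cut case versus a close call.
print(classify({"beagle": 0.99, "basset hound": 0.01}))  # ('beagle', 0.99)
print(classify({"beagle": 0.55, "bagel": 0.45}))         # ('uncertain', 0.55)
```

Where to set the threshold is a business decision: a brand-safety filter might demand very high confidence before acting, while a trend-spotting dashboard can tolerate more noise.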
But color is an important attribute for classifying beagles. Yet even the human eye isn’t infallible; a meme that circulated on Reddit in early 2016 — “puppy or bagel?” — showed just how easy it is to fool the eye (and potentially a computer) under the right circumstances (see Figure 4).[7]
FIGURE 4: PUPPY OR BAGEL? FOOLING COMPUTER AND HUMAN VISION
Source: Imgur
While this is a humorous example, incorrect classification — what data scientists call “false positives” and “false negatives” — can have damaging consequences. Last year, Flickr’s auto-tagging feature mistakenly identified a photograph of an iron gate in front of a field as a “jungle gym.”[8] In fact, it was a concentration camp. So, even with deep learning, some images are harder to interpret than others.
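False positives and false negatives are usually summarized with two standard metrics, precision and recall. The tallies below are invented for illustration; real evaluations come from a labeled test set.

```python
def precision_recall(tp, fp, fn):
    # Precision: of everything the model labeled positive, how much really was?
    # Recall: of all the real positives, how many did the model find?
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical tally for a logo detector:
# 90 correct detections, 10 false positives, 30 missed logos.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)  # 0.9 0.75
```

Which error matters more depends on the use case: a crisis-detection system should favor recall (miss nothing), while an automated tagger shown to end users should favor precision.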
Most of the photos we see on a daily basis don’t contain just a single object: That would be too easy. We usually see objects grouped together in scenes: the photo of Ana Ivanovic at the U.S. Open, a group of people playing beach volleyball, and so on. Figure 5 provides a framework for understanding the kinds of things computers can identify using neural networks: objects, scenes, attributes and even emotion. This is just a sampling for illustrative purposes; there are many more examples, some of which are easier to recognize than others.
FIGURE 5: HOW NEURAL NETWORKS CLASSIFY IMAGES
Example annotations: Objects = binoculars; Scene = outdoors; Attributes = green, yellow, navy; Emotion = happy

OBJECTS
· Logos
· Faces and other parts of the body
· Trees, Lakes, Oceans
· People
· Transportation
· Animals and Breeds
· Bridges, Monuments
· Sports fields and facilities
· Hotels, hospitals, office buildings

SCENE
· Sporting Events (Basketball Game, Football Game, Golf Tournament, Tennis Match, Marathon)
· Stores
· Parks
· Concerts
· Weddings
· Parties
· Protests
· Beach
· Mall

ATTRIBUTES
· Quantity, Color, Size, Shape, Gender
· Image Prominence
· Weather (Sunny, Rainy, Snowing)
· Season or Time of Day (Afternoon, Sunset, Winter, Summer)
· Style (Luxury, Cozy, Fashionable, Bridal)
· Sexual, Violent

EMOTION
· Happiness
· Sadness
· Excitement
· Anger
THE IMPORTANCE OF TRAINING DATA
At its most fundamental, a set of images is a collection of data. To interpret the data, you need to run it through a neural network. But, like people, neural networks can’t learn in a vacuum; they have to be trained with examples — the more, the better — to help them distinguish things from each other. This is called machine learning, and it is how, in Figure 3, the Ditto network was able to identify a set of 10 beagles with at least 98% probability. But consider this: If the network had never “seen” other dog breeds, how would it know that not every dog is a beagle? There is an old saying: “If all you have is a hammer, everything is a nail.” With deep learning, if all you have is a hammer, everything is a hammer.
To teach a neural network the difference between hammers and other types of tools, you need to show it lots of data, known as training data. This means showing it at least hundreds and ideally thousands of different images of tools and objects other than hammers: wrenches, screwdrivers and so on. We know this intuitively when we see small children start to use language. At first, every animal is a “doggie”; over time, children learn to distinguish different types of animals from each other.
Human beings do this unconsciously, but with computers, we need to use different training data for different purposes. Training data with thousands of pictures of dogs won’t help a network learn to recognize shoes, much less distinguish a sneaker from a stiletto. As a result, there is no dataset that is optimized for every possible use. Michael Jones, Director of Product Management and Data Science at Salesforce, says, “There is no generic data set for training data. Don’t assume it’s omnipotent.”
Why is this so important?
Because, when evaluating image-recognition technology, how well the system learns from data — not what it “knows” at the outset — is where the real value lies. In humans, we think of this as an aspect of intelligence; in computers, we call it artificial intelligence (AI). AI-based systems require an agile and collaborative mindset and the willingness to evaluate the technology to see how well it suits a particular use case before making a decision.
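The dependence on training data can be made concrete with the simplest possible learner, a perceptron. Everything below is a toy: the two features, the class labels, and the handful of examples are invented for illustration. The point is that the learned weights reflect only the examples the model was shown; swap in different training data and it learns something entirely different.

```python
def train(samples, labels, epochs=20, lr=0.1):
    # Classic perceptron rule: nudge the weights whenever a prediction is wrong.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Pretend feature pairs (say, two shape measurements): class 1 = "beagle".
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w, b = train(X, y)
print(predict(w, b, [0.85, 0.9]))  # 1: resembles the class-1 examples
print(predict(w, b, [0.1, 0.1]))   # 0: resembles the class-0 examples
```

A model trained this way would still confidently call anything with high feature values a “beagle,” because nothing in its training data ever taught it otherwise — the hammer problem in miniature.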