Opinion Mining & Summarization - Sentiment Analysis
Tutorial given at WWW-2008, April 21, 2008 in Beijing
Bing Liu
Department of Computer Science
University of Illinois at Chicago
liub@cs.uic.edu
http://www.cs.uic.edu/~liub
Introduction – facts and opinions
Two main types of textual information: facts and opinions.
Most current information processing techniques (e.g., search engines) work with facts (assume they are true).
  Facts can be expressed with topic keywords.
E.g., search engines do not search for opinions.
  Opinions are hard to express with a few keywords.
    How do people think of Motorola cell phones?
  Current search ranking strategy is not appropriate for opinion retrieval/search.
Bing Liu, UIC WWW-2008 Tutorial
Introduction – user generated content
Word-of-mouth on the Web
  One can express personal experiences and opinions on almost anything, at review sites, forums, discussion groups, blogs ... (called the user generated content).
  They contain valuable information.
  Web/global scale: no longer limited to one's circle of friends.
Our interest: to mine opinions expressed in the user-generated content.
  An intellectually very challenging problem.
  Practically very useful.
Introduction – Applications
Businesses and organizations: product and service benchmarking.
  Businesses spend a huge amount of money to find consumer sentiments and opinions.
Individuals: interested in others' opinions when
  Purchasing a product or using a service,
  Finding opinions on political topics.
Ads placements: placing ads in the user-generated content.
  Place an ad from a competitor if one criticizes a product.
Opinion retrieval/search: providing general search for opinions.
Two types of evaluation
Direct opinions: sentiment expressions on some objects, e.g., products, events, topics, persons.
  Subjective.
Comparisons: similarities or differences of more than one object, usually expressing an ordering.
  E.g., “car x is cheaper than car y.”
  Objective or subjective.
Opinion search (Liu, Web Data Mining book, 2007)
Can you search for opinions as conveniently as general Web search?
Whenever you need to make a decision, you may want some opinions from others.
  Wouldn't it be nice if you could find them on a search system instantly, by issuing queries such as
    Opinions: “Motorola cell phones”
    Comparisons: “Motorola vs. Nokia”
Cannot be done yet! Very hard!
Typical opinion search queries
Find the opinion of a person or organization (opinion holder) on a particular object or a feature of the object.
  E.g., what is Bill Clinton's opinion on abortion?
Find positive and/or negative opinions on a particular object (or some features of the object), e.g., public opinions on a political topic.
Find how opinions on an object change over time.
How does object A compare with object B?
  E.g., Gmail vs. Hotmail
Find the opinion of a person on X
A general search engine can handle it, i.e., using suitable keywords.
Reason:
  A person or organization usually has only one opinion on a particular topic.
  The opinion is likely contained in a single document.
  Thus, a good keyword query may be sufficient.
Find opinions on an object
We use product reviews as an example:
Searching for opinions in product reviews is different from general Web search.
  E.g., search for opinions on “Motorola RAZR V3”
General Web search (for a fact): rank pages according to some authority and relevance scores.
  The user views the first page (if the search is perfect).
  One fact = multiple facts
Opinion search: ranking is desirable, however
  reading only the review ranked at the top is not appropriate because it is only the opinion of one person.
Search opinions (contd)
Ranking:
  Produce two rankings
    Positive opinions and negative opinions
    Some kind of summary of both, e.g., # of each
  Or, one ranking but
    The top (say 30) reviews should reflect the natural distribution of the reviews, i.e., the right balance of positive and negative reviews.
Questions:
  Should the user read all the top reviews? OR
  Should the system prepare a summary of the reviews?
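The "one ranking" option can be illustrated with a small sketch. It assumes each review already carries a relevance score and a polarity label; both inputs, and the function name `top_reviews_balanced`, are hypothetical and not part of the tutorial. The idea is only to show what "reflecting the natural distribution" could mean in practice: the top k keep roughly the same positive/negative proportions as the whole collection.

```python
from collections import Counter


def top_reviews_balanced(reviews, k=30):
    """Pick up to k reviews whose polarity mix mirrors the full collection.

    `reviews` is a list of (score, polarity) pairs, already sorted by
    relevance score, best first. Both fields are illustrative assumptions.
    """
    totals = Counter(polarity for _, polarity in reviews)
    n = len(reviews)
    # Quota per polarity, proportional to its share of all reviews.
    # Rounding means the quotas may not sum to exactly k.
    quota = {p: round(k * c / n) for p, c in totals.items()}
    picked = []
    for review in reviews:
        polarity = review[1]
        if quota[polarity] > 0:
            picked.append(review)
            quota[polarity] -= 1
    return picked


# 100 reviews, 70% positive and 30% negative, sorted by descending score.
reviews = [(100 - i, "pos" if i % 10 < 7 else "neg") for i in range(100)]
top = top_reviews_balanced(reviews, k=10)
mix = Counter(polarity for _, polarity in top)
```

Here `mix` holds the polarity counts of the selected reviews, so a 70/30 collection yields a 7/3 top ten. Whether the user should read these reviews or a summary of them is the question the slide leaves open.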
Reviews are similar to surveys
Reviews can be regarded as traditional surveys.
  In a traditional survey, returned survey forms are treated as raw data.
  Analysis is performed to summarize the survey results.
    E.g., % against or for a particular issue, etc.
Can a summary be produced?
Roadmap
Opinion mining – the abstraction
Document level sentiment classification
Feature-based opinion mining and summarization
Comparative sentence and relation extraction
Opinion spam
Opinion mining – the abstraction (Hu and Liu, KDD-04; Liu, Web Data Mining book 2007)
Opinion holder: the person or organization that holds a specific opinion on a particular object.
Object: on which an opinion is expressed.
Opinion: a view, attitude, or appraisal on an object from an opinion holder.
Objectives of opinion mining: many ...
Let us abstract the problem
  put existing research into a common framework
We use consumer reviews of products to develop the ideas.
Object/entity
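Definition (object): An object O is an entity, which can be a product, person, event, organization, or topic. O is represented as
  a hierarchy of components, sub-components, and so on.
  Each node represents a component and is associated with a set of attributes of the component.
  O is the root node (which also has a set of attributes).
An opinion can be expressed on any node or attribute of the node.
To simplify our discussion, we use “features” to represent both components and attributes.
  The term “feature” should be understood in a broad sense:
    Product feature, topic or sub-topic, event or sub-event, etc.
Note: the object O itself is also a feature.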
Model of a review
An object O is represented with a finite set of features, F = {f1, f2, …, fn}.
  Each feature fi in F can be expressed with a finite set of words or phrases Wi, which are synonyms.
  That is, we have a set of corresponding synonym sets W = {W1, W2, …, Wn} for the features.
Model of a review: an opinion holder j comments on a subset of the features Sj ⊆ F of object O.
  For each feature fk ∈ Sj that j comments on, he/she
    chooses a word or phrase from Wk to describe the feature, and
    expresses a positive, negative or neutral opinion on fk.
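The model above maps naturally onto two small data structures. The following is a minimal sketch, not the tutorial's own code: the class names `ObjectModel` and `Review`, the method names, and the camera example are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModel:
    """An object O with its feature set F and synonym sets W."""
    name: str
    # Maps each feature f_i in F to its synonym set W_i.
    synonyms: dict = field(default_factory=dict)

    @property
    def features(self):
        """F: the finite set of features of the object."""
        return set(self.synonyms)

    def feature_of(self, phrase):
        """Return the feature whose synonym set contains `phrase`, else None."""
        for feature, words in self.synonyms.items():
            if phrase in words:
                return feature
        return None


@dataclass
class Review:
    """A review by opinion holder j: one (phrase, polarity) pair per comment."""
    holder: str
    comments: list = field(default_factory=list)

    def commented_features(self, obj):
        """S_j: the subset of F that this holder comments on."""
        return {obj.feature_of(phrase) for phrase, _ in self.comments} - {None}


camera = ObjectModel("camera", {
    "picture quality": {"picture quality", "image quality", "photo"},
    "battery": {"battery", "battery life"},
})
review = Review("j", [("photo", "positive"), ("battery life", "negative")])
s_j = review.commented_features(camera)
```

In this example `s_j` is the set of features the reviewer touched, and it is always a subset of `camera.features`, matching Sj ⊆ F.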
Opinion mining tasks
At the document (or review) level:
  Task: sentiment classification of documents
  Classes: positive, negative, and neutral
  Assumption: each document (or review) focuses on a single opinion from a single opinion holder.
At the sentence level:
  Task 1: identifying subjective/opinionated sentences
    Classes: objective and subjective (opinionated)
  Task 2: sentiment classification of sentences
    Classes: positive, negative and neutral.
    Assumption: a sentence contains only one opinion, which is not true in many cases.
    Then we can also consider clauses or phrases.
Opinion mining tasks (contd)
At the feature level:
  Task 1: Identify and extract object features that have been commented on by an opinion holder (e.g., a reviewer).
  Task 2: Determine whether the opinions on the features are positive, negative or neutral.
  Task 3: Group feature synonyms.
  Produce a feature-based opinion summary of multiple reviews (more on this later).
Opinion holders: identifying holders is also useful, e.g., in news articles, etc., but they are usually known in the user generated content, i.e., the authors of the posts.
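To make the end product of the three tasks concrete, here is a minimal sketch of a feature-based summary. It assumes Tasks 1 and 2 have already produced (feature phrase, polarity) pairs and that the synonym groups for Task 3 are given; the function name `summarize` and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize(opinions, synonym_groups):
    """Count positive/negative/neutral opinions per canonical feature.

    opinions: iterable of (feature_phrase, polarity) pairs.
    synonym_groups: dict mapping a canonical feature to its set of synonyms.
    """
    # Invert the groups: synonym -> canonical feature (Task 3).
    canonical = {}
    for feature, words in synonym_groups.items():
        for word in words:
            canonical[word] = feature

    summary = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for phrase, polarity in opinions:
        # Phrases with no known group are kept as their own feature.
        feature = canonical.get(phrase, phrase)
        summary[feature][polarity] += 1
    return dict(summary)


groups = {"picture quality": {"photo", "picture"}, "battery": {"battery"}}
opinions = [
    ("photo", "positive"),
    ("picture", "positive"),
    ("battery", "negative"),
    ("photo", "negative"),
]
result = summarize(opinions, groups)
```

The result is one row of counts per feature, which is the raw material for the summary of multiple reviews mentioned in the slide.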
More at the feature level
Problem 1: Both F and W are unknown.
  We need to perform all three tasks.
Problem 2: F is known but W is unknown.
  All three tasks are still needed. Task 3 is easier: it becomes the problem of matching the discovered features with the given features F.
Problem 3: W is known (F is known too).
  Only Task 2 is needed.
F: the set of features
W: the synonym sets of the features
Roadmap
Opinion mining – the abstraction
Document level sentiment classification
Feature-based opinion mining and summarization
Comparative sentence and relation extraction
Opinion spam
Sentiment classification
Classify documents (e.g., reviews) based on the overall sentiment expressed by their opinion holders (authors),
  Positive, negative, and (possibly) neutral
  Since in our model an object O itself is also a feature, sentiment classification essentially determines the opinion expressed on O in each document (e.g., review).
Similar to but different from topic-based text classification.
  In topic-based text classification, topic words are important.
  In sentiment classification, sentiment words are more important.
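Unsupervised review classification (Turney, ACL-02)
Data: reviews of automobiles, banks, movies, and travel destinations.
The approach: three steps
Step 1:
  Part-of-speech tagging
  Extract two consecutive words (two-word phrases) from reviews if their tags conform to some given patterns, e.g., (1) JJ, (2) NN.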
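The phrase-extraction step can be sketched in a few lines. This is an illustration under stated assumptions, not the original method's full pattern set: it takes text that has already been tagged by any part-of-speech tagger, and it uses only the single pattern named above (first word JJ, second word NN). The function name `extract_phrases` and the sample sentence are hypothetical.

```python
def extract_phrases(tagged, patterns=(("JJ", "NN"),)):
    """Return two-word phrases whose tag pair matches one of `patterns`.

    tagged: list of (word, tag) pairs produced by any POS tagger.
    patterns: allowed (first_tag, second_tag) pairs; only the JJ + NN
    pattern from the slide is included by default.
    """
    phrases = []
    # Walk over every pair of consecutive tokens.
    for (word1, tag1), (word2, tag2) in zip(tagged, tagged[1:]):
        if (tag1, tag2) in patterns:
            phrases.append(f"{word1} {word2}")
    return phrases


tagged = [
    ("this", "DT"), ("camera", "NN"), ("takes", "VBZ"),
    ("great", "JJ"), ("pictures", "NNS"), ("with", "IN"),
    ("low", "JJ"), ("noise", "NN"),
]
phrases = extract_phrases(tagged)
```

With only the JJ + NN pattern, "low noise" is extracted but "great pictures" is not, because "pictures" is tagged NNS. Further patterns can be passed through the `patterns` argument.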
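Step 2: Estimate the semantic orientation (SO) of the extracted phrases
  Use pointwise mutual information (PMI):

$$\mathrm{PMI}(word_1, word_2) = \log_2\left(\frac{P(word_1 \wedge word_2)}{P(word_1)\,P(word_2)}\right)$$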
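As a worked illustration, pointwise mutual information in its standard form, log2 of P(word1 ∧ word2) divided by P(word1) P(word2), can be estimated from co-occurrence counts. The sketch below assumes the probabilities are estimated as simple relative frequencies over `total` observations; the function name `pmi`, its parameters, and the sample counts are hypothetical.

```python
import math


def pmi(count_both, count_1, count_2, total):
    """Pointwise mutual information (base 2) from raw counts.

    count_both: observations containing both words.
    count_1, count_2: observations containing each word.
    total: total number of observations.

    Equivalent to log2(P(w1 and w2) / (P(w1) * P(w2))) with each
    probability estimated as count / total. Raises ValueError when
    count_both is 0, since log2(0) is undefined.
    """
    return math.log2((count_both * total) / (count_1 * count_2))


# The words co-occur 4 times more often than independence would predict.
associated = pmi(40, 100, 100, 1000)
# The words co-occur exactly as often as independence would predict.
independent = pmi(10, 100, 100, 1000)
```

A positive value indicates that the two words appear together more often than chance, a value near zero indicates independence, and a negative value indicates that they tend to avoid each other.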