

Please use this identifier to cite or link to this item: http://hdl.handle.net/10651/19364

Title: Opinion Mining in Web 2.0
Author(s): Pérez Gallego, Pablo José
Advisor: Díez Peláez, Jorge; Luaces Rodríguez, Óscar
Keywords: Sentiment analysis; Opinion mining; Machine Learning; Feature Selection
Issue date: Jul-2012
Series/Report no.: Máster Universitario en Soft Computing y Análisis Inteligente de Datos (University Master's Degree in Soft Computing and Intelligent Data Analysis)
Abstract: In recent years the Web has undergone an intense transformation. It is no longer a mere static information repository but a dynamic system in which users have become the main content contributors. They actively share their opinions, thoughts and views about products, events and almost anything else in social networks, forums, blogs, etc. With the latest advances in mobile technologies, users can interact anytime and from anywhere, so real-time information has become a reality. This mixture of social networks, discussion groups, forums and blogs is collectively called user-generated content. It has many practical applications and is potentially of great value from both the user and the business points of view. On the one hand, knowing other users' opinions is useful when making decisions in daily life. On the other hand, it is an invaluable source of information about user preferences and tastes. Because opinion sources are so numerous and diverse, there is a need for systems able to automatically discover, analyze and summarize the sentiment expressed in user-generated content. Sentiment analysis grows out of this need: it focuses on the computational study of people's opinions, appraisals and emotions toward entities, events and their properties. In the first three chapters of this document we introduce the problem of sentiment analysis, describing its main characteristics and difficulties, briefly present the main theoretical background of this work, and provide an exhaustive review of the related literature. We then address a sentiment classification problem: learning to classify a set of movie reviews as positive or negative according to the sentiment expressed by the author.
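The positive/negative review classification task described above can be illustrated with a minimal sketch. The abstract does not say which classifier or features the thesis uses at this point; this sketch shows just one common choice, a multinomial Naive Bayes classifier over binary unigram presence features with add-one smoothing. The tokenizer, the function names `train_nb` and `predict_nb`, and the toy reviews are all hypothetical and are not taken from the thesis or its dataset.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace only.
    return text.lower().split()

def train_nb(docs, labels):
    """Train Naive Bayes on binary unigram presence (each word counted once per document)."""
    vocab = set()
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in zip(docs, labels):
        words = set(tokenize(text))  # presence, not frequency
        vocab |= words
        word_counts[label].update(words)
    priors = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    totals = {c: sum(wc.values()) for c, wc in word_counts.items()}
    return vocab, priors, word_counts, totals

def predict_nb(model, text):
    """Return the class with the highest log-posterior score."""
    vocab, priors, word_counts, totals = model
    v = len(vocab)
    scores = {}
    for c in priors:
        score = priors[c]
        for w in set(tokenize(text)):
            if w in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed log-likelihood of the word given the class.
                score += math.log((word_counts[c][w] + 1) / (totals[c] + v))
        scores[c] = score
    return max(scores, key=scores.get)

# Toy, made-up training reviews.
train_docs = ["a brilliant moving film", "great acting and a brilliant script",
              "a dull boring mess", "terrible script and boring acting"]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, "brilliant acting"))  # → pos
```

Using word presence rather than raw counts is a design choice here; a real experiment would also need proper tokenization and cross-validated evaluation on a labelled review corpus.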
In Chapter 4 we present the dataset and its main properties, together with all the preprocessing steps we applied to the text of the movie reviews in order to obtain useful representations. We also present the methodology used to run the experiments and to estimate the performance of the proposed approaches. In Chapter 5 we describe our solutions to the problem, give the details of all the experiments performed, and evaluate and discuss the results. As a baseline, we reproduced an extensive part of the experiments presented in [Pang et al., 2002]. We then propose a series of feature reduction approaches, with the objective of selecting a reduced, representative vocabulary for the movie review domain. Finally, we propose a novel method based on measuring word co-occurrence information in order to obtain a "meaning" representation of the text documents.
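The abstract does not describe how the proposed co-occurrence method works, so the following is only a generic illustration of what "measuring word co-occurrence information" can mean, not the thesis's novel method. It counts how often two words appear within a fixed window of each other and then represents a document as the sum of the co-occurrence (context) vectors of its words. The window size, the function names `cooccurrence_counts` and `document_vector`, and the toy corpus are hypothetical assumptions.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count symmetric co-occurrences of words within `window` positions of each other."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, w in enumerate(tokens):
            # Look only forward, then record both directions, so each pair is counted once.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                counts[w][tokens[j]] += 1
                counts[tokens[j]][w] += 1
    return counts

def document_vector(text, counts):
    """Represent a document as the sum of its words' co-occurrence (context) vectors."""
    vec = Counter()
    for w in text.lower().split():
        # .get avoids inserting unseen words into the defaultdict.
        vec.update(counts.get(w, Counter()))
    return vec

# Toy, made-up corpus.
corpus = ["the acting was superb", "superb acting and a superb script", "the plot was dull"]
counts = cooccurrence_counts(corpus, window=2)
print(counts["superb"]["acting"])  # → 2
```

Representing a document through the contexts of its words, rather than through the words alone, is one way such a "meaning" representation could let documents that share no exact vocabulary still receive similar vectors.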
Appears in Collections: Trabajos Fin de Máster (Master's Theses)

Files in This Item:

File: TFM_Pablo_Perez_TFM.pdf
Size: 2.18 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License


