One Size Fits All? A Simple Technique to Perform Several NLP Tasks

dc.contributor.author: Gayo Avello, Daniel
dc.contributor.author: Álvarez Gutiérrez, Dario
dc.contributor.author: Gayo Avello, José
dc.identifier.citation: Lecture Notes in Computer Science, 3230, p. 267-279 (2004); doi:10.1007/978-3-540-30228-5_24
dc.description: EsTAL - España for NATURAL LANGUAGE PROCESSING, 2004
dc.description.abstract: Word fragments or n-grams have been widely used to perform different Natural Language Processing tasks such as information retrieval [1] [2], document categorization [3], automatic summarization [4] or even genetic classification of languages [5]. All these techniques share some common aspects: (1) documents are mapped to a vector space where n-grams are used as coordinates and their relative frequencies as vector weights, (2) many of them compute a context which plays a role similar to stop-word lists, and (3) cosine distance is commonly used for document-to-document and query-to-document comparisons. blindLight is a new approach related to these classical n-gram techniques, although it introduces two major differences: (1) relative frequencies are no longer used as vector weights but are replaced by n-gram significances, and (2) cosine distance is abandoned in favor of a new metric inspired by sequence alignment techniques, although not as computationally expensive. This new approach can be used simultaneously to perform document categorization and clustering, information retrieval, and text summarization. In this paper we describe the foundations of this technique and its application to both a particular categorization problem (i.e., language identification) and information retrieval.
dc.format.extent: p. 267-279
dc.relation.ispartof: Lecture Notes in Computer Science
dc.rights: (c) Springer
dc.title: One Size Fits All? A Simple Technique to Perform Several NLP Tasks
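
The classical n-gram scheme that the abstract contrasts blindLight with (n-grams as coordinates, relative frequencies as weights, cosine comparison) can be illustrated with a minimal Python sketch. This is only the baseline, not blindLight itself: the record does not define the n-gram significances or the alignment-inspired metric, so neither is attempted here. The function names, the choice of character trigrams, and the lowercasing step are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_vector(text, n=3):
    """Map a document to a sparse vector: n-gram -> relative frequency."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    if total == 0:
        return {}  # text shorter than n: no n-grams at all
    return {gram: count / total for gram, count in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here for empty documents
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    doc_a = ngram_vector("natural language processing")
    doc_b = ngram_vector("processing natural languages")
    print(cosine_similarity(doc_a, doc_b))
```

Under these assumptions, two documents that share many trigrams score close to 1 and documents with no trigram in common score 0. The corresponding cosine distance is one minus this value.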
