RUO Principal

Repositorio Institucional de la Universidad de Oviedo

Ver ítem 
  •   RUO Principal
  • Producción Bibliográfica de UniOvi: RECOPILA
  • Artículos
  • Ver ítem
  •   RUO Principal
  • Producción Bibliográfica de UniOvi: RECOPILA
  • Artículos
  • Ver ítem
    • español
    • English
JavaScript is disabled for your browser. Some features of this site may not work without it.

Listar

Todo RUOComunidades y ColeccionesPor fecha de publicaciónAutoresTítulosMateriasxmlui.ArtifactBrowser.Navigation.browse_issnPerfil de autorEsta colecciónPor fecha de publicaciónAutoresTítulosMateriasxmlui.ArtifactBrowser.Navigation.browse_issn

Mi cuenta

AccederRegistro

Estadísticas

Ver Estadísticas de uso

AÑADIDO RECIENTEMENTE

Novedades
Repositorio
Cómo publicar
Recursos
FAQs

One Size Fits All? A Simple Technique to Perform Several NLP Tasks

Autor(es) y otros:
Gayo Avello, DanielAutoridad Uniovi; Álvarez Gutiérrez, DarioAutoridad Uniovi; Gayo Avello, José
Fecha de publicación:
2004
Editorial:

Springer

Versión del editor:
http://dx.doi.org/10.1007/978-3-540-30228-5_24
Citación:
Lecture Notes in Computer Science, 3230, p. 267-279 (2004); doi:10.1007/978-3-540-30228-5_24
Descripción física:
p. 267-279
Resumen:

Word fragments or n-grams have been widely used to perform different Natural Language Processing tasks such as information retrieval [1] [2], document categorization [3], automatic summarization [4] or, even, genetic classification of languages [5]. All these techniques share some common aspects such as: (1) documents are mapped to a vector space where n-grams are used as coordinates and their relative frequencies as vector weights, (2) many of them compute a context which plays a role similar to stop-word lists, and (3) cosine distance is commonly used for document-to-document and query-to-document comparisons. blindLight is a new approach related to these classical n-gram techniques although it introduces two major differences: (1) Relative frequencies are no more used as vector weights but replaced by n-gram significances, and (2) cosine distance is abandoned in favor of a new metric inspired by sequence alignment techniques although not so computationally expensive. This new approach can be simultaneously used to perform document categorization and clustering, information retrieval, and text summarization. In this paper we will describe the foundations of such a technique and its application to both a particular categorization problem (i.e., language identification) and information retrieval tasks.

Word fragments or n-grams have been widely used to perform different Natural Language Processing tasks such as information retrieval [1] [2], document categorization [3], automatic summarization [4] or, even, genetic classification of languages [5]. All these techniques share some common aspects such as: (1) documents are mapped to a vector space where n-grams are used as coordinates and their relative frequencies as vector weights, (2) many of them compute a context which plays a role similar to stop-word lists, and (3) cosine distance is commonly used for document-to-document and query-to-document comparisons. blindLight is a new approach related to these classical n-gram techniques although it introduces two major differences: (1) Relative frequencies are no more used as vector weights but replaced by n-gram significances, and (2) cosine distance is abandoned in favor of a new metric inspired by sequence alignment techniques although not so computationally expensive. This new approach can be simultaneously used to perform document categorization and clustering, information retrieval, and text summarization. In this paper we will describe the foundations of such a technique and its application to both a particular categorization problem (i.e., language identification) and information retrieval tasks.

Descripción:

EsTAL - España for NATURAL LANGUAGE PROCESSING, 2004

URI:
http://hdl.handle.net/10651/10565
ISSN:
0302-9743
DOI:
10.1007/978-3-540-30228-5_24
Colecciones
  • Artículos [37549]
Ficheros en el ítem
Métricas
Compartir
Exportar a Mendeley
Estadísticas de uso
Estadísticas de uso
Metadatos
Mostrar el registro completo del ítem
Página principal Uniovi

Biblioteca

Contacto

Facebook Universidad de OviedoTwitter Universidad de Oviedo
El contenido del Repositorio, a menos que se indique lo contrario, está protegido con una licencia Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 Internacional
Creative Commons Image