
Please use this identifier to cite or link to this item: http://hdl.handle.net/10651/8140

Title: Application of Variable Length N-gram Vectors to Monolingual and Bilingual Information Retrieval
Author(s): Gayo Avello, Daniel
Álvarez Gutiérrez, Dario
Gayo Avello, José
Issue date: 2005
Publisher: Springer
Publisher version: http://dx.doi.org/10.1007/11519645_7
Citation: Lecture Notes in Computer Science, 3491, p. 73-82 (2005); doi:10.1007/11519645_7
Format extent: p. 73-82
Abstract: Our group in the Department of Informatics at the University of Oviedo has participated, for the first time, in two CLEF tasks: monolingual (Russian) and bilingual (Spanish-to-English) information retrieval. Our main goal was to test the application to IR of a modified version of the n-gram vector space model (codenamed blindLight). This new approach has been successfully applied to other NLP tasks, such as language identification and text summarization, and the results achieved at CLEF'04, although not exceptional, are encouraging. The blindLight approach differs from classical techniques in two major ways: (1) relative frequencies are no longer used as vector weights but are replaced by n-gram significances, and (2) cosine distance is abandoned in favor of a new metric inspired by sequence alignment techniques, although it is less computationally expensive. To perform cross-language IR, we have developed a naive n-gram pseudo-translator similar to those described by McNamee and Mayfield or Pirkola et al.
URI: http://www.di.uniovi.es/~dani/downloads/dgayo-clef04.pdf
ISSN: 0302-9743
Appears in Collections: Artículos

Files in This Item:

There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

