Data Documentation for "Discovering related scientific literature beyond semantic similarity: a new co-citation approach"

General Information:

This data set contains the source code and dependencies of the proposed system, the newly created corpus "corpus_ACL_rel100", all the results obtained by applying the proposed system to the ACL corpus, the ACL corpus itself, and the two samples extracted from the DBLP corpus, as used in the research article "O. Rodriguez-Prieto, L. Araujo, J. Martinez-Romo. Discovering related scientific literature beyond semantic similarity: a new co-citation approach. Scientometrics(120) issue 1, pp. 105-127, 2019. https://doi.org/10.1007/s11192-019-03125-9"

Name of dataset: Data from the article "O. Rodriguez-Prieto, L. Araujo, J. Martinez-Romo. Discovering related scientific literature beyond semantic similarity: a new co-citation approach. Scientometrics(120) issue 1, pp. 105-127, 2019. https://doi.org/10.1007/s11192-019-03125-9"

Name of data files in the data set:
    source.zip
    corpus_ACL_rel100.zip
    all_pvalues_acl.zip
    ACL.zip
    DBLP-sample1.zip
    DBLP-sample2.zip

Dataset language: English

Date the data set was last modified: 18 December 2018

Funder: The work of Oscar Rodriguez-Prieto was supported by the Spanish Ministry of Education, Culture and Sport under an FPU grant (FPU15/05261).

How to cite data: Data from the article "O. Rodriguez-Prieto, L. Araujo, J. Martinez-Romo. Discovering related scientific literature beyond semantic similarity: a new co-citation approach. Scientometrics(120) issue 1, pp. 105-127, 2019. https://doi.org/10.1007/s11192-019-03125-9"

Methodology for data collection: Detailed in "O. Rodriguez-Prieto, L. Araujo, J. Martinez-Romo. Discovering related scientific literature beyond semantic similarity: a new co-citation approach. Scientometrics(120) issue 1, pp. 105-127, 2019. https://doi.org/10.1007/s11192-019-03125-9"

Data collector(s):
    Óscar Rodríguez Prieto, rodriguezoscar@uniovi.es
    Juan Martínez Romo, juaner@lsi.uned.es
    M. Lourdes Araujo Serna, lurdes@lsi.uned.es

Date of data collection: 22 September 2017

Person to contact with questions: Óscar Rodríguez Prieto, rodriguezoscar@uniovi.es

Data entry: 27 December 2024

Software (including version #) used to prepare data set: Detailed in "O. Rodriguez-Prieto, L. Araujo, J. Martinez-Romo. Discovering related scientific literature beyond semantic similarity: a new co-citation approach. Scientometrics(120) issue 1, pp. 105-127, 2019. https://doi.org/10.1007/s11192-019-03125-9"

Data processing that was performed: Detailed in "O. Rodriguez-Prieto, L. Araujo, J. Martinez-Romo. Discovering related scientific literature beyond semantic similarity: a new co-citation approach. Scientometrics(120) issue 1, pp. 105-127, 2019. https://doi.org/10.1007/s11192-019-03125-9"

Variables: Detailed in "O. Rodriguez-Prieto, L. Araujo, J. Martinez-Romo. Discovering related scientific literature beyond semantic similarity: a new co-citation approach. Scientometrics(120) issue 1, pp. 105-127, 2019. https://doi.org/10.1007/s11192-019-03125-9"

File Overview:

source.zip: Source code and dependencies of the system proposed in the article.

corpus_ACL_rel100.zip: The new corpus "corpus_ACL_rel100", created after a manual evaluation of 100 pairs of articles from the ACL corpus.

all_pvalues_acl.zip: All the results obtained by applying the proposed system to the ACL corpus.

ACL.zip: The ACL corpus used in the article, with the citations and metadata of all the articles, and the extracted abstracts.

DBLP-sample1.zip: The first sample of abstracts extracted from the DBLP corpus.

DBLP-sample2.zip: The second sample of abstracts extracted from the DBLP corpus.