Data from "Heterogeneous tree structure classification to label Java programmers according to their expertise level"
dc.contributor.author | Ortín Soler, Francisco | |
dc.contributor.author | Rodríguez Prieto, Óscar | |
dc.contributor.author | Pascual, Nicolás | |
dc.contributor.author | García Rodríguez, Miguel | |
dc.date.accessioned | 2024-01-16T07:46:28Z | |
dc.date.available | 2024-01-16T07:46:28Z | |
dc.date.issued | 2019-06-13 | |
dc.identifier.uri | https://hdl.handle.net/10651/70832 | |
dc.description | Data from the article "F. Ortin, O. Rodriguez-Prieto, N. Pascual, M. Garcia. Heterogeneous tree structure classification to label Java programmers according to their expertise level. Future Generation Computer Systems (105), pp. 380-394, 2020. https://doi.org/10.1016/j.future.2019.12.016" | spa |
dc.description.abstract | Open-source code repositories are a valuable asset for creating different kinds of tools and services, utilizing machine learning and probabilistic reasoning. Syntactic models process Abstract Syntax Trees (AST) of source code to build systems capable of predicting different software properties. The main difficulty of building such models comes from the heterogeneous and compound structures of ASTs, and from the fact that traditional machine learning algorithms require instances to be represented as n-dimensional vectors rather than trees. In this article, we propose a new approach to classify ASTs using traditional supervised-learning algorithms, where a feature learning process selects the most representative syntax patterns for the child subtrees of different syntax constructs. Those syntax patterns are used to enrich the context information of each AST, allowing the classification of compound heterogeneous tree structures. The proposed approach is applied to the problem of labeling the expertise level of Java programmers. The system is able to label expert and novice programs with an average accuracy of 99.6%. Moreover, other code fragments such as types, fields, methods, statements and expressions could also be classified, with average accuracies of 99.5%, 91.4%, 95.2%, 88.3% and 78.1%, respectively. | spa |
dc.description.sponsorship | This work has been partially funded by the Spanish Department of Science, Innovation and Universities: project RTI2018-099235-B-I00. The authors have also received funds from the University of Oviedo through its support of official research groups (GR-2011-0040). | spa |
dc.language.iso | eng | spa |
dc.relation.isreferencedby | F. Ortin, O. Rodriguez-Prieto, N. Pascual, M. Garcia. Heterogeneous tree structure classification to label Java programmers according to their expertise level. Future Generation Computer Systems (105), pp. 380-394, 2020. https://doi.org/10.1016/j.future.2019.12.016 | spa |
dc.rights | Open Data Commons Attribution License (ODC-By) | spa |
dc.rights.uri | https://opendatacommons.org/licenses/by/ | |
dc.subject | Big code | spa |
dc.subject | Machine learning | spa |
dc.subject | Syntax patterns | spa |
dc.subject | Abstract syntax trees | spa |
dc.subject | Programmer expertise | spa |
dc.subject | Decision trees | spa |
dc.subject | Big data | spa |
dc.title | Data from "Heterogeneous tree structure classification to label Java programmers according to their expertise level" | spa |
dc.type | dataset | spa |
dc.identifier.doi | 10.17811/ruo_datasets.70832 | |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-099235-B-I00/ES/MODELADO DE USUARIO PARA PERSONALIZACION DE INTERFAZ GUIADO POR ANALISIS AUTOMATICO DE PATRONES DE COMPORTAMIENTO/ | spa |
dc.relation.projectID | info:eu-repo/grantAgreement/University of Oviedo/Plan Propio 2019 - Grants for the maintenance of research activities/GR-2011-0040/ES/Computational Reflection Research Group/ | spa |
dc.rights.accessRights | open access | spa |
dc.relation.ispartofURI | http://hdl.handle.net/10651/54618 | |
dc.publication.year | 2019 | |
This item appears in the following collection(s)
- Datos de investigación [79]: This collection contains the primary data collected or generated in the course of a research project.
- Informática [873]
- Investigaciones y Documentos OpenAIRE [8377]: Publications resulting from publicly funded projects.