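Data from "Heterogeneous tree structure classification to label Java programmers according to their expertise level"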

Open-source code repositories are a valuable asset for creating different kinds of tools and services that use machine learning and probabilistic reasoning. Syntactic models process the Abstract Syntax Trees (ASTs) of source code to build systems capable of predicting different software properties. The main difficulty in building such models comes from the heterogeneous and compound structure of ASTs, and from the fact that traditional machine learning algorithms require instances to be represented as n-dimensional vectors rather than trees. In this article, we propose a new approach to classifying ASTs with traditional supervised-learning algorithms, in which a feature learning process selects the most representative syntax patterns for the child subtrees of different syntax constructs. Those syntax patterns are used to enrich the context information of each AST, allowing the classification of compound heterogeneous tree structures. The proposed approach is applied to the problem of labeling the expertise level of Java programmers. The system labels expert and novice programs with an average accuracy of 99.6%. Moreover, other code fragments such as types, fields, methods, statements and expressions can also be classified, with average accuracies of 99.5%, 91.4%, 95.2%, 88.3% and 78.1%, respectively.
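
The abstract notes that traditional learning algorithms expect n-dimensional vectors rather than trees. The following is a minimal sketch of that general idea only, not the feature learning process of the article: it counts parent/child syntax patterns in a toy tree and projects them onto a fixed vocabulary. The `Node` class, the node kinds and the vocabulary are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy AST node (hypothetical): a syntax-construct name plus child subtrees."""
    kind: str
    children: list["Node"] = field(default_factory=list)


def parent_child_patterns(root: Node) -> Counter:
    """Count (parent kind, child kind) pairs over the whole tree."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            counts[(node.kind, child.kind)] += 1
            stack.append(child)
    return counts


def to_vector(root: Node, vocabulary: list[tuple[str, str]]) -> list[int]:
    """Project a tree onto a fixed pattern vocabulary, giving an n-dimensional vector."""
    counts = parent_child_patterns(root)
    return [counts[pattern] for pattern in vocabulary]


# Assumed toy tree: a method containing an if statement and a return.
tree = Node("Method", [
    Node("If", [Node("BinaryExpr"), Node("Return")]),
    Node("Return"),
])

# Assumed vocabulary; in practice the patterns would be selected from data.
vocabulary = [("Method", "If"), ("Method", "Return"), ("If", "Return"), ("If", "While")]
vector = to_vector(tree, vocabulary)
```

A vector built this way can be passed to any standard supervised classifier. How the most representative patterns are actually selected is described in the article, not here.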

Description:

Data from the article "F. Ortin, O. Rodriguez-Prieto, N. Pascual, M. Garcia. Heterogeneous tree structure classification to label Java programmers according to their expertise level. Future Generation Computer Systems (105), pp. 380-394, 2020. https://doi.org/10.1016/j.future.2019.12.016"

Sponsored by:

This work has been partially funded by the Spanish Department of Science, Innovation and Universities: project RTI2018-099235-B-I00. The authors have also received funds from the University of Oviedo through its support of official research groups (GR-2011-0040).

Collections

Research data [84]
Computer Science [875]
OpenAIRE Research and Documents [8420]

Files in this item

Dataset (187.3Mb)

Institutional Repository of the University of Oviedo
