Repositorio Institucional de la Universidad de Oviedo (Institutional Repository of the University of Oviedo, RUO)
Data from "Heterogeneous tree structure classification to label Java programmers according to their expertise level"

Authors:
Ortín Soler, Francisco; Rodríguez Prieto, Óscar; Pascual, Nicolás; García Rodríguez, Miguel
Subject:

Big code

Machine learning

Syntax patterns

Abstract syntax trees

Programmer expertise

Decision trees

Big data

Publication date:
2019-06-13
Abstract:

Open-source code repositories are a valuable asset for creating different kinds of tools and services that use machine learning and probabilistic reasoning. Syntactic models process the Abstract Syntax Trees (ASTs) of source code to build systems capable of predicting different software properties. The main difficulty in building such models stems from the heterogeneous and compound structure of ASTs, and from the fact that traditional machine learning algorithms require instances to be represented as n-dimensional vectors rather than trees. In this article, we propose a new approach to classify ASTs using traditional supervised-learning algorithms, where a feature learning process selects the most representative syntax patterns for the child subtrees of different syntax constructs. Those syntax patterns are used to enrich the context information of each AST, allowing the classification of compound heterogeneous tree structures. The proposed approach is applied to the problem of labeling the expertise level of Java programmers. The system labels expert and novice programs with an average accuracy of 99.6%. Moreover, other code fragments, such as types, fields, methods, statements, and expressions, can also be classified, with average accuracies of 99.5%, 91.4%, 95.2%, 88.3%, and 78.1%, respectively.
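The core idea in the abstract is that tree-shaped ASTs must be mapped to fixed-length vectors before a traditional classifier can use them. The Python sketch below illustrates that idea on toy trees; it is not the authors' implementation. Everything in it is a labeled assumption: the hypothetical `Node` type and node kinds (`For`, `Lambda`, ...), the definition of a syntax pattern as a node's kind plus the kinds of its direct children, frequency-based selection of the top `k` child patterns per construct (standing in for the paper's feature-learning process), and a nearest-centroid classifier (standing in for supervised learners such as the decision trees listed under Subject).

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified AST node: a syntax-construct kind plus ordered children."""
    kind: str
    children: list[Node] = field(default_factory=list)


def patterns(node: Node):
    """Yield one (kind, child-kinds) syntax pattern per node, in pre-order."""
    yield node.kind, tuple(child.kind for child in node.children)
    for child in node.children:
        yield from patterns(child)


def select_patterns(trees: list[Node], k: int) -> list[tuple]:
    """Keep, for every construct kind, its k most frequent child patterns.

    Frequency is only an assumed stand-in for the paper's feature-learning criterion.
    """
    by_kind: dict[str, Counter] = defaultdict(Counter)
    for tree in trees:
        for kind, child_kinds in patterns(tree):
            by_kind[kind][child_kinds] += 1
    return [(kind, kids) for kind, counter in by_kind.items()
            for kids, _ in counter.most_common(k)]


def vectorize(tree: Node, vocab: list[tuple]) -> list[int]:
    """Turn a tree of any shape into a fixed-length vector of pattern counts."""
    counts = Counter(patterns(tree))
    return [counts[p] for p in vocab]


def train_centroids(vectors: list[list[int]], labels: list[str]) -> dict[str, list[float]]:
    """Nearest-centroid 'model': the mean feature vector of each label."""
    sums: dict[str, list[float]] = {}
    totals = Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {label: [x / totals[label] for x in acc] for label, acc in sums.items()}


def predict(centroids: dict[str, list[float]], vec: list[int]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])))


def counting_loop() -> Node:
    """Toy subtree for an index-based `for (i = 0; i < n; i++) { ... }` loop."""
    return Node("For", [Node("Assign"), Node("Compare"), Node("Increment"), Node("Block")])


# Toy training set: index-based loops labeled "novice", lambda-based calls "expert".
train = [
    (Node("Method", [counting_loop()]), "novice"),
    (Node("Method", [counting_loop(), Node("Block")]), "novice"),
    (Node("Method", [Node("Return", [Node("Call", [Node("Call"), Node("Lambda")])])]), "expert"),
    (Node("Method", [Node("Return", [Node("Call", [Node("Lambda")])])]), "expert"),
]
vocab = select_patterns([tree for tree, _ in train], k=2)
centroids = train_centroids([vectorize(t, vocab) for t, _ in train], [y for _, y in train])

unseen = Node("Method", [Node("Block"), counting_loop()])
print(predict(centroids, vectorize(unseen, vocab)))  # novice
```

Because every tree is reduced to counts over the same selected pattern vocabulary, trees of different shapes and sizes yield vectors of equal length, which is the property that lets standard supervised learners consume them.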

Description:

Data from the article "F. Ortin, O. Rodriguez-Prieto, N. Pascual, M. Garcia. Heterogeneous tree structure classification to label Java programmers according to their expertise level. Future Generation Computer Systems (105), pp. 380-394, 2020. https://doi.org/10.1016/j.future.2019.12.016"

URI:
https://hdl.handle.net/10651/70832
DOI:
10.17811/ruo_datasets.70832
Related resource:
http://hdl.handle.net/10651/54618
Funded by:

This work has been partially funded by the Spanish Department of Science, Innovation and Universities: project RTI2018-099235-B-I00. The authors have also received funds from the University of Oviedo through its support of official research groups (GR-2011-0040).

Collections
  • Research data (Datos de investigación) [79]
  • Computer Science (Informática) [873]
  • OpenAIRE Research and Documents (Investigaciones y Documentos OpenAIRE) [8377]
Files in this item
  • Dataset (187.3Mb)
  • Readme (3.473Kb)
The content of the Repository, unless otherwise specified, is protected by a Creative Commons license: Attribution-NonCommercial-NoDerivatives 4.0 International.