Data Documentation for "Heterogeneous tree structure classification to label Java programmers according to their expertise level"

General Information:

This data set contains all the datasets, source code, selected features, model hyperparameters, syntax patterns found, and evaluation data from the research article "F. Ortin, O. Rodriguez-Prieto, N. Pascual, M. Garcia. Heterogeneous tree structure classification to label Java programmers according to their expertise level. Future Generation Computer Systems (105), pp. 380-394, 2020. https://doi.org/10.1016/j.future.2019.12.016"

Name of dataset: Data from the article "F. Ortin, O. Rodriguez-Prieto, N. Pascual, M. Garcia. Heterogeneous tree structure classification to label Java programmers according to their expertise level. Future Generation Computer Systems (105), pp. 380-394, 2020. https://doi.org/10.1016/j.future.2019.12.016"

Name of data files in the data set:
    patterns.zip
    datasets.bak
    features.zip
    hyper-params.zip
    output.zip
    source.zip

Dataset language: English

Date the data set was last modified: 13 June 2019

Funder: This work has been partially funded by the Spanish Department of Science, Innovation and Universities: project RTI2018-099235-B-I00. The authors have also received funds from the University of Oviedo through its support of official research groups (GR-2011-0040).

How to cite data: Data from the article "F. Ortin, O. Rodriguez-Prieto, N. Pascual, M. Garcia. Heterogeneous tree structure classification to label Java programmers according to their expertise level. Future Generation Computer Systems (105), pp. 380-394, 2020. https://doi.org/10.1016/j.future.2019.12.016"

Methodology for data collection: Detailed in the article cited above.

Data collector(s): Francisco Ortin Soler, ortin@uniovi.es

Date of data collection: 15 April 2019

Person to contact with questions: Francisco Ortin Soler, ortin@uniovi.es, https://reflection.uniovi.es

Data entry: 15 January 2024

Software (including version #) used to prepare data set: Detailed in the article cited above.

Data processing that was performed: Detailed in the article cited above.

Variables: Detailed in the article cited above.

File Overview:

    patterns.zip: Syntax patterns found.
    datasets.bak: PostgreSQL datasets used to create the models.
    features.zip: Features selected for all the models.
    hyper-params.zip: Hyper-parameters selected to create the different models.
    output.zip: Output data generated by running the experiments.
    source.zip: Source code of all the experiments and processes described in the article.