Coz Velasco, Juan José del; Luaces Rodríguez, Óscar
Publication date:
Multilabel classification is a task required in many fields today. It extends conventional classification by allowing each instance to be associated with more than one label. Applications include media content, functional genomics and directed marketing. There are different kinds of methods for multilabel classification tasks: some transform the problem, while others extend specific learning algorithms to handle multilabel data. A particular subset of the former are decomposition methods, which split a multilabel classification task into simpler ones. This project focuses on these methods. Specifically, it studies the Binary Relevance (BR) method in relation to other decomposition methods. BR is a very simple and common approach that learns a binary classifier for each label of the original problem. It has some advantages, such as complexity that is linear in the number of labels, but it has the disadvantage of not considering dependence among labels. Nevertheless, as this work shows, the performance of this algorithm is not as poor as might be expected when compared with other methods. Its performance is closely related to the evaluation metric and to the target loss function optimized by the base learner. Additionally, other decomposition methods (CC, DBR, NS and STA) were studied to determine whether it is better to use actual labels or predictions in the training phase, and whether better performance is obtained by using only the previous labels in a chain structure or all of them. The conclusion is that, in general, it is better to use actual labels, but this again depends on the evaluation metric applied.
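As a rough illustration of the BR idea described above (one independent binary classifier per label), the following minimal Python sketch may help. It is not the implementation used in this work: the base learner `CentroidClassifier`, the class name `BinaryRelevance`, and the toy data are all assumptions chosen only to keep the example self-contained.

```python
from typing import List, Sequence

Vector = Sequence[float]


class CentroidClassifier:
    """Toy binary base learner (hypothetical, for illustration only):
    predicts the class whose training centroid is closest."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, t in zip(X, y) if t == cls]
            if rows:  # a class with no examples gets no centroid
                self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x: Vector) -> int:
        def sq_dist(c: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda cls: sq_dist(self.centroids[cls]))


class BinaryRelevance:
    """Trains one independent binary model per label column.
    Label dependence is ignored, which is the known limitation of BR."""

    def __init__(self, base_factory=CentroidClassifier):
        self.base_factory = base_factory
        self.models = []

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "BinaryRelevance":
        n_labels = len(Y[0])
        # One pass over the labels: cost grows linearly with their number.
        self.models = [
            self.base_factory().fit(X, [row[j] for row in Y])
            for j in range(n_labels)
        ]
        return self

    def predict(self, X: List[Vector]) -> List[List[int]]:
        return [[m.predict_one(x) for m in self.models] for x in X]


# Toy data (assumed): label 0 follows feature 0, label 1 follows feature 1.
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y_train = [[0, 0], [0, 1], [1, 0], [1, 1]]

br = BinaryRelevance().fit(X_train, Y_train)
predictions = br.predict([[0.9, 0.1], [0.2, 0.8]])
print(predictions)
```

Because each label gets its own model, any binary learner can be passed as `base_factory`. That is also why, as noted above, the loss function optimized by the base learner affects how BR scores under a given evaluation metric.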