Multilabel Classification with Label Structures
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Multilabel Classification with Label Structures"

By

Miss Wei BI

Abstract

Many real-world applications involve multilabel classification, in which multiple labels can be associated with each sample. In many of these applications, structures exist among the labels. A popular structure is the label hierarchy, which can be obtained with the help of domain experts or created automatically from the data using procedures such as hierarchical clustering or Bayesian network structure learning. This label hierarchy may then be arranged as a tree, as in text categorization, or more generally as a directed acyclic graph (DAG), as in the Gene Ontology. Existing efforts typically ignore such label structure or can only exploit the dependencies in a label tree.

Instead of an explicit label hierarchy, some implicit structures may exist among labels. For instance, some labels are strongly correlated with each other. In text categorization, an article on "sports" may also be labeled "entertainment"; in image classification, an image annotated with "jungle" may also be tagged with "bushes". Besides the presence of label correlations, we may not have access to all the true labels of each training sample in such applications. For example, many image annotation tasks use crowdsourcing platforms to collect labels. For each image, the workers may only provide a small, incomplete set of answers to the queried labels. Existing algorithms are often incapable of handling both label correlations and missing labels.

In this thesis, we introduce various methods that exploit the label structure for multilabel classification. We first explore the use of a label hierarchy. Specifically, we propose three works that address different aspects of the problem.
In the first work, we propose novel multilabel algorithms for the mandatory leaf node prediction problem, in which the prediction paths of a given test example are required to end at leaf nodes of the label hierarchy. This problem setting is particularly useful when the leaf nodes have much stronger semantic meaning than the internal nodes. In the second work, we discuss proper loss functions for multilabel problems in which label hierarchies exist, and derive their corresponding Bayes-optimal classifiers. In the third work, we present a probabilistic framework that incorporates hierarchical label constraints via posterior regularization, such that the hierarchical constraints hold in expectation for the output labels during training.

For the second kind of label structure, we consider the case in which certain correlations exist between labels. We propose a probabilistic model that can simultaneously capture label correlations and handle missing labels.

Date: Friday, 26 June 2015
Time: 10:00am - 12:00 noon
Venue: Room 2132C, Lift 10

Chairman: Prof. Jianzhen Yu (CHEM)

Committee Members:
Prof. James Kwok (Supervisor)
Prof. Brian Mak
Prof. Qiang Yang
Prof. Shaojie Shen (ECE)
Prof. Dacheng Tao (Univ. of Tech., Australia)