Learning to Classify Fine-Grained Categories with Privileged Visual-Semantic Misalignment
Abstract: Image categorisation, the task of classifying images according to their semantic content, is an active yet challenging research topic in computer vision. Recently, fine-grained object categorisation has attracted wide attention and remains difficult due to feature inconsistency caused by smaller inter-class and larger intra-class variation, as well as widely varying poses. Most existing frameworks focus on exploiting a more discriminative image representation or developing a more robust classification framework to mitigate this difficulty. Attention has recently turned to discovering the dependency across fine-grained class labels based on Convolutional Neural Networks. Encouraged by the success of semantic label embedding in discovering the correlation among fine-grained class labels, this paper exploits the misalignment between the visual feature space and the semantic label embedding space and incorporates it as privileged information into a cost-sensitive learning framework. Because it captures both the variation of the image feature representation and the label correlation in the semantic label embedding space, such visual-semantic misalignment can be employed to reflect the importance of instances, which is more informative than conventional cost-sensitivities. Experimental results on public fine-grained benchmarks demonstrate the effectiveness of the proposed framework, which achieves performance superior to state-of-the-art methods.
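To make the idea of misalignment-driven instance weighting concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not specify. It assumes a hypothetical misalignment score defined as one minus the cosine similarity between an instance's visual similarity profile over the classes and the semantic similarity profile of its label, and it assumes that score is turned into a per-instance cost by simply adding it to a base weight of 1. All function names and the toy numbers are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def misalignment(visual_sims, semantic_sims):
    """Hypothetical misalignment score (an assumption, not the paper's).

    visual_sims:   similarity of the instance's visual feature to each class.
    semantic_sims: similarity of the true label's embedding to each class's
                   label embedding.
    Returns 1 - cosine, so identical profiles give a score close to 0.
    """
    return 1.0 - cosine(visual_sims, semantic_sims)


def weighted_cross_entropy(logits, label, weight):
    """Cross-entropy for one instance, scaled by a per-instance cost."""
    probs = softmax(logits)
    return -weight * math.log(probs[label])


# Toy instance with three classes; the true label is class 0.
visual_sims = [0.6, 0.7, 0.1]    # visually closer to class 1 than to class 0
semantic_sims = [1.0, 0.4, 0.1]  # label 0 is semantically closest to itself
logits = [1.2, 1.5, -0.3]

# Assumed mapping from misalignment to cost: base weight 1 plus the score.
cost = 1.0 + misalignment(visual_sims, semantic_sims)
loss = weighted_cross_entropy(logits, label=0, weight=cost)
```

Under these assumptions, an instance whose visual neighbourhood disagrees with the semantic neighbourhood of its label receives a larger cost and therefore contributes more to the training loss, which is one plausible reading of using misalignment to reflect instance importance.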