Hybrid CNN and Dictionary-Based Models for Scene Recognition and Domain Adaptation


Abstract: Convolutional neural networks (CNNs) have achieved state-of-the-art performance in many different visual tasks. Learned from large-scale training data sets, CNN features are much more discriminative and accurate than handcrafted features. Moreover, CNN features are also transferable among different domains. On the other hand, traditional dictionary-based features (such as BoW and spatial pyramid matching) contain much more local discriminative and structural information, which is implicitly embedded in the images. To further improve performance, in this paper we propose to combine CNNs with dictionary-based models for scene recognition and visual domain adaptation (DA). Specifically, based on well-tuned CNN models (e.g., AlexNet and VGG Net), two dictionary-based representations are further constructed, namely, the mid-level local representation (MLR) and the convolutional Fisher vector (CFV) representation. In MLR, an efficient two-stage clustering method, i.e., weighted spatial and feature space spectral clustering on the parts of a single image followed by clustering all representative parts of all images, is used to generate a class-mixture or a class-specific part dictionary. After that, the part dictionary is used to operate with the multiscale image inputs for generating the mid-level representation. In CFV, a multiscale and scale-proportional Gaussian mixture model training strategy is utilized to generate Fisher vectors based on the last convolutional layer
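As an illustration of the Fisher vector idea mentioned in the abstract, the following is a minimal sketch of the commonly used mean and variance gradient encoding under a diagonal-covariance Gaussian mixture model. It is not the paper's CFV pipeline: the function name fisher_vector and its arguments are hypothetical, the GMM parameters are assumed to be already trained, the multiscale and scale-proportional training strategy is not modeled, and the power and L2 normalization steps often applied afterwards are omitted. In a CFV-like setting, each descriptor would be the activation vector at one spatial location of the last convolutional layer.

```python
import math


def fisher_vector(descriptors, weights, means, variances):
    """Encode local descriptors as a Fisher vector under a diagonal GMM.

    descriptors: list of N vectors (each a list of D floats)
    weights, means, variances: parameters of a K-component GMM (assumed given)
    Returns a list of length 2 * K * D: mean gradients, then variance gradients.
    """
    n = len(descriptors)
    k = len(weights)
    d = len(means[0])
    grad_mu = [[0.0] * d for _ in range(k)]
    grad_var = [[0.0] * d for _ in range(k)]

    for x in descriptors:
        # Log of (mixture weight * Gaussian density) for each component.
        log_p = []
        for j in range(k):
            s = math.log(weights[j])
            for i in range(d):
                s -= 0.5 * (math.log(2.0 * math.pi * variances[j][i])
                            + (x[i] - means[j][i]) ** 2 / variances[j][i])
            log_p.append(s)

        # Posterior responsibilities via a numerically stable softmax.
        m = max(log_p)
        exp_p = [math.exp(v - m) for v in log_p]
        total = sum(exp_p)

        for j in range(k):
            gamma = exp_p[j] / total
            for i in range(d):
                # Whitened deviation of the descriptor from component j.
                z = (x[i] - means[j][i]) / math.sqrt(variances[j][i])
                grad_mu[j][i] += gamma * z
                grad_var[j][i] += gamma * (z * z - 1.0)

    # Normalize by the number of descriptors and the mixture weights.
    fv = []
    for j in range(k):
        fv.extend(g / (n * math.sqrt(weights[j])) for g in grad_mu[j])
    for j in range(k):
        fv.extend(g / (n * math.sqrt(2.0 * weights[j])) for g in grad_var[j])
    return fv
```

The output length depends only on the number of mixture components and the descriptor dimension, not on how many descriptors an image contributes, which is why descriptors pooled from several image scales can be encoded into a single fixed-length vector.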

