This project developed two-stage generative adversarial networks to improve the generalizability of the semantic dictionary through low-rank embedding for zero-shot learning.
Zero-shot learning for visual recognition, which identifies unseen categories through a shared visual-semantic function learned on seen categories and expected to adapt well to unseen ones, has received considerable research attention recently. However, the semantic gap between discriminative visual features and their underlying semantics remains the biggest obstacle, because there is usually domain disparity between the seen and unseen classes. To address this issue, the project formulated a novel framework that simultaneously seeks a two-stage generative model and a semantic dictionary connecting visual features with their semantics under a low-rank embedding. The first-stage generative model augments additional semantic features for the unseen classes, which are then used in the second stage to generate more discriminative visual features that expand the seen visual feature space; from the augmented semantic and visual data, the project can then seek a better semantic dictionary to constitute the latent basis for the unseen classes. In this way, the approach captures a variety of visual characteristics from seen classes that are “ready-to-use” for new classes. Extensive experiments on four zero-shot benchmarks demonstrate that the proposed algorithm outperforms state-of-the-art zero-shot algorithms. (publisher abstract modified)
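The two-stage pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the adversarial training losses are omitted, the two "generators" are stand-in random linear maps rather than trained networks, and all dimensions, class counts, and variable names (`gen_semantic`, `gen_visual`, etc.) are hypothetical. It shows only the data flow: class attributes are first expanded into semantic features for unseen classes, those are mapped to synthetic visual features that augment the seen visual space, and a low-rank dictionary (here obtained by truncated SVD) is then fit on the pooled data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper)
n_unseen = 3                     # number of unseen classes
d_attr, d_sem, d_vis = 4, 6, 8   # attribute / semantic / visual feature dims
rank = 3                         # target rank of the low-rank embedding

# Class-level attributes for the unseen classes (stand-ins for real data)
A_unseen = rng.normal(size=(n_unseen, d_attr))

# Stage 1: map (attribute, noise) -> semantic feature.
# A trained GAN generator would be nonlinear; a random linear map stands in.
W1 = rng.normal(size=(2 * d_attr, d_sem))

def gen_semantic(attrs, n_per_class=10):
    reps = np.repeat(attrs, n_per_class, axis=0)
    noise = rng.normal(size=(reps.shape[0], d_attr))
    return np.concatenate([reps, noise], axis=1) @ W1

# Stage 2: map semantic features -> visual features, expanding the
# visual feature space with synthetic unseen-class samples.
W2 = rng.normal(size=(d_sem, d_vis))

def gen_visual(sem):
    return np.tanh(sem @ W2)

S_unseen = gen_semantic(A_unseen)       # augmented semantic features (30 x 6)
V_synth = gen_visual(S_unseen)          # synthetic visual features (30 x 8)
V_seen = rng.normal(size=(50, d_vis))   # stand-in for real seen-class features
V_all = np.vstack([V_seen, V_synth])    # expanded visual space (80 x 8)

# Low-rank "semantic dictionary": a truncated SVD of the pooled visual
# features yields a rank-r basis shared by seen and unseen classes.
U, s, Vt = np.linalg.svd(V_all, full_matrices=False)
D = Vt[:rank]                           # rank-r dictionary (basis rows)
codes = V_all @ D.T                     # low-rank codes for every sample
```

In the sketch, replacing the SVD with the paper's jointly learned dictionary, and the linear maps with trained adversarial generators, recovers the intended structure: augmenting unseen-class data before fitting the dictionary is what lets the low-rank basis cover both domains.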