An official website of the United States government, Department of Justice.

Exploring CLIP for Real World, Text-based Image Retrieval

NCJ Number: 309744
Date Published: September 2023
Length: 6 pages
Annotation

In this paper, researchers explore using CLIP for image retrieval.

Abstract

In this paper, researchers consider the ability of CLIP features to support text-driven image retrieval. They find that there is a sweet spot of detail in the text that gives the best results, and that words describing the "tone" of a scene (such as messy or dingy) are quite important in maximizing text-image similarity. Traditional image-based queries sometimes misalign with user intentions because they focus on irrelevant image components. To overcome this, the researchers explore the potential of text-based image retrieval, specifically using Contrastive Language-Image Pretraining (CLIP) models. CLIP models, trained on large datasets of image-caption pairs, offer a promising approach by allowing natural language descriptions for more targeted queries. The authors explore the effectiveness of text-driven image retrieval based on CLIP features by evaluating the image similarity for progressively more detailed queries. (Published Abstract Provided)
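
The evaluation described in the abstract, scoring one image against progressively more detailed text queries, can be illustrated with a minimal sketch. It is not the authors' code and does not load a CLIP model. The vocabulary, the `toy_encode_text` function, and the hand-written image vectors are hypothetical stand-ins for CLIP's text and image encoders. The only part that carries over to a real system is the ranking step: compare a text embedding with image embeddings in a shared space using cosine similarity.

```python
import math

# Hypothetical shared embedding axes. A real CLIP model learns a
# high-dimensional space; these four named axes are only for illustration.
VOCAB = ["bedroom", "messy", "dingy", "kitchen"]

# Hypothetical image embeddings, written by hand in place of the output
# of a CLIP image encoder.
IMAGES = {
    "messy_bedroom": [1.0, 1.0, 1.0, 0.0],
    "tidy_bedroom": [1.0, 0.0, 0.0, 0.0],
    "tidy_kitchen": [0.0, 0.0, 0.0, 1.0],
}

# Progressively more detailed queries; the added words describe "tone".
QUERIES = ["bedroom", "messy bedroom", "messy dingy bedroom"]


def toy_encode_text(text):
    """Stand-in for a CLIP text encoder: count vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_vecs):
    """Return (image name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for query in QUERIES:
        ranking = rank_images(toy_encode_text(query), IMAGES)
        print(query, "->", [(name, round(score, 3)) for name, score in ranking])
```

In this toy setup, adding the tone words "messy" and "dingy" raises the similarity between the query and the messy bedroom image and moves that image to the top of the ranking. The sketch does not reproduce the "sweet spot" finding, which depends on how an actual CLIP model responds to longer descriptions.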

Date Published: September 1, 2023