New AI Framework Cuts 'Hallucinations' in Image Search by 7.37%


Researchers have developed a new framework to address a persistent issue in image search systems. The problem, known as 'hallucinations' in Diffusion-Interactive Text-to-Image Retrieval (DAI-TIR), often leads to inaccurate image search results. Their solution, called Diffusion-aware Multi-view Contrastive Learning (DMCL), aims to filter out misleading cues and improve performance across multiple benchmarks.

The team began by identifying how hallucinated cues degrade DAI-TIR performance. These false signals mislead image search systems, reducing accuracy in multi-round searches. To address this, they introduced DMCL, a method that optimizes both query intent and target image representations.

DMCL works by aligning textual and diffusion-based query views while suppressing spurious signals. It uses two key training objectives: Multi-View Query-Target Alignment and Text-Diffusion Consistency. These objectives help refine the system's ability to match queries with the correct images.
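The two objectives described above can be illustrated as contrastive (InfoNCE-style) losses over embedding batches. This is a minimal numpy sketch under that assumption, not the paper's implementation; the names `info_nce`, `text_q`, `diff_q`, and the additive fusion of the two query views are all illustrative:

```python
import numpy as np

def info_nce(queries, targets, temperature=0.07):
    """Contrastive loss: matched pairs share an index in the batch."""
    # Normalize so dot products are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = q @ t.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Matched pairs sit on the diagonal; the loss pulls them together
    # while pushing apart the mismatched off-diagonal pairs.
    return -np.log(np.diag(probs)).mean()

rng = np.random.default_rng(0)
text_q = rng.normal(size=(4, 8))   # textual query view
diff_q = rng.normal(size=(4, 8))   # diffusion-based query view
images = rng.normal(size=(4, 8))   # target image embeddings

# Two terms analogous in spirit to the objectives named above:
alignment_loss = info_nce(text_q + diff_q, images)  # fused query vs. targets
consistency_loss = info_nce(text_q, diff_q)         # keep the two views agreeing
total = alignment_loss + consistency_loss
```

Minimizing the consistency term encourages the textual and diffusion views of the same query to agree, which is one way a spurious (hallucinated) cue present in only one view can be suppressed.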

Testing showed significant improvements. DMCL achieved accuracy gains of up to 7.37% over existing methods across five benchmarks. The framework also employs attention visualization and geometric embedding-space analyses to validate its filtering behavior and strengthen cross-modal alignment.

The researchers also released a large-scale DAI-TIR training dataset to support further work in this field. While the current approach uses a simple additive fusion scheme for query integration, they plan to explore more advanced fusion techniques in future studies.
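The additive fusion scheme mentioned above amounts to an element-wise sum of the two query views. A toy sketch (all names hypothetical), with a weighted combination shown only as one illustrative direction for the more advanced fusion the authors plan to explore:

```python
import numpy as np

def additive_fusion(text_emb, diffusion_emb):
    # The simple scheme described: element-wise sum of the two query views.
    return text_emb + diffusion_emb

def weighted_fusion(text_emb, diffusion_emb, alpha=0.5):
    # An illustrative alternative (not from the paper): a tuned or learned
    # convex combination of the two views.
    return alpha * text_emb + (1 - alpha) * diffusion_emb

t = np.ones((2, 4))           # toy text-view embeddings
d = np.full((2, 4), 3.0)      # toy diffusion-view embeddings
fused = additive_fusion(t, d) # each entry is 1 + 3 = 4
```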

The DMCL framework directly targets hallucinations in DAI-TIR systems, offering measurable gains in image search accuracy. By optimizing query intent and image representations, it provides a clearer path for future improvements in interactive text-to-image search technology. The release of a dedicated training dataset further enables ongoing research in this area.
