Unsupervised Content Mining in CBIR: Harnessing Latent Diffusion for Complex Text-Based Query Interpretation
Abstract
1. Introduction
- Problem formulation: We formulate the problem of CBIR with complex text-based queries as input to the retrieval system.
- Complex queries: We illustrate several complex queries and the multi-dimensional nature that a retrieval system can leverage.
- Diffusion models for query interpretation: We explain diffusion models, focusing on latent diffusion models and their effectiveness in interpreting complex text queries.
- Retrieval model: We propose a custom triplet-network model to retrieve the relevant images from the database.
- Experimental results: We present experimental results illustrating the effectiveness of the model and discuss how the proposed methodology can be further enhanced in future work.
2. Related Work
2.1. Content-Based Image Retrieval
2.2. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR)
2.3. Diffusion Models
3. Methodology
3.1. Problem Formulation
- Let $Q = \{q_1, q_2, \ldots, q_n\}$ denote the set of all complex queries, where each $q_i$ represents a textual description encompassing abstract concepts, detailed descriptions, emotions, and contextual information.
- Let $D = \{d_1, d_2, \ldots, d_m\}$ represent the set of database images, where $d_j$ denotes the $j$th image. This image database is the repository from which relevant images are retrieved.
- The objective is to design a function $f : Q \rightarrow 2^{D}$, where $f(q_i) \subseteq D$ is the set of images retrieved for a given query $q_i$. This function is referred to as the retrieval function, and the purpose is to learn to map each query to the subset of images in $D$ that are relevant to the query intent. This mapping is restated compactly below.
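Compactly, using shorthand introduced only for this sketch ($g$ for the LDM text-to-image generator of Section 3.5, $\phi$ for the triplet-network embedding of Section 3.7, $\operatorname{sim}$ for cosine similarity, and $k$ for the number of returned images):

```latex
% Retrieval as a set-valued mapping: render the query to an image with g,
% embed it with \phi, and return the k most similar database images.
f : Q \rightarrow 2^{D}, \qquad
f(q_i) = \big\{\, d_j \in D \;:\;
  \operatorname{sim}\!\big(\phi(g(q_i)),\, \phi(d_j)\big)
  \text{ is among the } k \text{ largest} \,\big\}
```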
3.2. Complex Queries
- A high-resolution photograph of a Siberian Husky in a snowy landscape showcasing its thick fur and striking blue eyes.
- An artistic rendering of a Corgi dressed as a medieval knight in a whimsical, storybook illustration style.
- A detailed, close-up portrait of a German Shepherd with a focused expression in a police K-9 vest, set against an urban backdrop.
- A vintage black and white photograph of a group of Beagles participating in a traditional fox hunt, capturing the movement and excitement.
- A digital painting of a fantasy scene featuring a mythical dog breed with wings and glowing eyes, set in an enchanted forest at twilight.
- A hyperrealistic oil painting of a Labrador Retriever lying on a sunny beach, showing fine details of its wet fur and sand.
- An abstract, cubist interpretation of a Poodle, focusing on geometric shapes and bold colors, reminiscent of Picasso’s style.
- A watercolor scene of a Dachshund in a cozy, home setting, curled up by a fireplace, with soft lighting and warm tones.
- A dynamic, action shot of a Border Collie herding sheep, capturing the motion and energy in a rural, pastoral setting.
3.3. Diffusion Models
3.4. Latent Diffusion Models
3.4.1. Autoencoding Phase
3.4.2. Diffusion Process in Latent Space
3.4.3. Generating New Samples
3.5. Transforming Complex Queries into Images Using LDMs
- Autoencoder (VAE): The VAE architecture [47] consists of two main components: an encoder and a decoder. The encoder compresses the image into a condensed latent-space representation, which is then fed into the U-Net model; the decoder expands this latent representation back into its original image form. In the training phase of latent diffusion, the encoder extracts latent representations (latents) from images to initiate the forward diffusion process, which progressively adds noise at each step. Conversely, during inference, the denoised latents produced by the reverse diffusion are reconstructed into images by the VAE decoder. Note that, of the VAE's two components, only the decoder is required during inference.
- U-Net: The U-Net architecture [48] is structured into two sections, an encoder and a decoder, both built from ResNet blocks. The encoder downscales its input to a reduced-resolution representation, while the decoder reverses this process, restoring the representation to its original, higher resolution, ideally with reduced noise. Specifically, the U-Net's output predicts the noise residual, which is used to obtain a refined, denoised image representation. Shortcut connections link the downsampling ResNet blocks in the encoder directly to the upsampling ResNet blocks in the decoder, ensuring critical information is not lost during downsampling. Moreover, the latent diffusion version of the U-Net can condition its output on text embeddings through cross-attention layers, which are placed within both the encoder and decoder, typically between the ResNet blocks.
- Text-encoder: The text encoder converts the input prompt into an embedding space that the U-Net can comprehend. Typically, a simple transformer-based encoder maps a sequence of input tokens to a sequence of latent text embeddings. A pre-trained CLIP model [49], known as CLIPTextModel, is used as the text encoder. A minimal usage sketch of the resulting text-to-image pipeline follows.
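As a concrete illustration, these three components come bundled in off-the-shelf latent diffusion pipelines. The sketch below assumes the Hugging Face diffusers library and an illustrative Stable Diffusion checkpoint; the paper does not specify which pre-trained LDM it used.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pre-trained latent diffusion pipeline; it bundles the VAE,
# the cross-attention U-Net, and the CLIP text encoder described above.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint
    torch_dtype=torch.float16,
).to("cuda")

query = ("A high-resolution photograph of a Siberian Husky in a snowy "
         "landscape showcasing its thick fur and striking blue eyes.")
image = pipe(query).images[0]  # PIL image: the query's visual representation
image.save("query_representation.png")
```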
3.6. Domain-Gap Problem
3.7. Image Retrieval Using Triplet Networks
- Anchor: An anchor is a reference sample against which other data points in the database are compared.
- Positive: A positive is a data sample that is similar to the anchor.
- Negative: A negative is a data point or image dissimilar to the anchor.
3.7.1. Triplet Loss Function
The triplet loss is defined as

$$L(a, p, n) = \max\big(d(a, p) - d(a, n) + \mathrm{margin},\; 0\big)$$

where:
- $L(a, p, n)$ denotes the loss associated with the triplet.
- $d(a, p)$ denotes the Euclidean distance between the anchor ($a$) and the positive ($p$) in the feature space.
- $d(a, n)$ is the distance of the negative ($n$) from the anchor ($a$).
- margin delineates the minimum separation between the positive and negative pairs, adding to the robustness of the embedding; it is a hyperparameter. A minimal implementation sketch is given below.
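A minimal TensorFlow sketch of this loss, assuming the standard hinge formulation above; the margin default of 0.2 and the stabilizing epsilon are illustrative choices, not values from the paper:

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Euclidean distances in the embedding space; the small epsilon
    # guards the sqrt gradient at zero.
    d_ap = tf.sqrt(tf.reduce_sum(tf.square(anchor - positive), axis=-1) + 1e-12)
    d_an = tf.sqrt(tf.reduce_sum(tf.square(anchor - negative), axis=-1) + 1e-12)
    # Hinge: zero loss once the negative is farther from the anchor
    # than the positive by at least the margin.
    return tf.reduce_mean(tf.maximum(d_ap - d_an + margin, 0.0))
```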
3.7.2. Training of Triplet Network
- Begin with the random selection of two distinct classes from the total available classes. Assign one as the positive class and the other as the negative class.
- From the collection of images, randomly select two of them from the positive class. These are designated as the anchor and positive instances for the triplet.
- Subsequently, choose a single image from the negative class at random. This serves as the negative instance, completing the triplet for the training dataset (a sampling sketch follows this list).
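A hedged Python sketch of this sampling procedure; the `images_by_class` mapping (class label to list of images) is a hypothetical convenience structure, not from the paper:

```python
import random

def sample_triplet(images_by_class):
    """Draw one (anchor, positive, negative) training triplet."""
    # Two distinct classes: one positive, one negative.
    pos_class, neg_class = random.sample(list(images_by_class), 2)
    # Two images from the positive class: anchor and positive.
    anchor, positive = random.sample(images_by_class[pos_class], 2)
    # One image from the negative class completes the triplet.
    negative = random.choice(images_by_class[neg_class])
    return anchor, positive, negative
```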
- Optimizer: Adam optimizer
- Learning rate:
- Epsilon:
- Number of epochs: 30
- Steps per epoch: 1000
- Validation steps: 200
- Batch size: 64
- Regularization: Dropout
- Dropout rate: 0.25
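For concreteness, this configuration might be set up in Keras as below; the learning-rate and epsilon values are illustrative placeholders, since the paper's exact values are not shown here:

```python
import tensorflow as tf

# Illustrative values: the paper specifies Adam, but its exact
# learning rate and epsilon are not reproduced here.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, epsilon=1e-7)

EPOCHS = 30              # number of epochs
STEPS_PER_EPOCH = 1000   # steps per epoch
VALIDATION_STEPS = 200   # validation steps
BATCH_SIZE = 64          # batch size
```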
3.7.3. Prediction and Image Retrieval
3.7.4. Steps in Image Retrieval
- Pass the complex text query to the diffusion model to create an equivalent image representation.
- Resize the image to match the dimensions of the database images.
- Pass the database images through the trained triplet network model to create the corresponding feature representations.
- Pass the query image representation through the triplet network model to create its feature representation.
- Evaluate the cosine similarity between the query feature representation and those of the database.
- Rank the images by similarity and return the top relevant images (a minimal ranking sketch follows this list).
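A minimal NumPy sketch of the similarity-and-ranking step; the function and variable names are illustrative, and k = 25 mirrors the mAP@25 evaluation rather than a value fixed by the method:

```python
import numpy as np

def retrieve_top_k(query_feat, db_feats, k=25):
    """Rank database images by cosine similarity to the query embedding.

    query_feat: (d,) embedding of the diffusion-generated query image.
    db_feats:   (N, d) matrix of database embeddings.
    """
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity per database image
    return np.argsort(-sims)[:k]     # indices of the top-k matches
```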
3.8. Network Architecture of Triplet Network
4. Experimental Results
4.1. Datasets
4.2. Evaluation Metric
4.3. Performance Comparison
4.4. Ablation Studies
4.4.1. Retrieval Method Selection
4.4.2. Loss Function Selection
Limitations and Areas for Improvement
- The current implementation uses a pre-trained model to generate images from complex text queries, without retraining or adapting it to the database's domain. Although this approach is feasible, its effectiveness could likely be improved, since the database's domain is sometimes known in advance. A possible future direction, therefore, is to optimize the generative model for the particular database it is used with.
- Another important constraint of our method concerns the ethical problems linked to the use of generative AI models. Although these models can generate highly realistic images from text descriptions, they can unintentionally perpetuate or amplify biases present in their training data. This can result in stereotyped or culturally inappropriate images, distorting the intended message and reinforcing damaging stereotypes. Moreover, the capacity of these models to produce authentic-looking images raises concerns about potential misuse, such as the creation of misleading or deceptive visuals.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Müller, H.; Müller, W.; Squire, D.M.; Marchand-Maillet, S.; Pun, T. Performance Evaluation in Content-Based Image Retrieval: Overview and Proposals. Pattern Recognit. Lett. 2001, 22, 593–601.
2. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695.
3. Fjeld, J.; Frank, M.R. Art and the Science of Generative AI. Science 2023, 380, 1110–1111.
4. Hoffer, E.; Ailon, N. Deep Metric Learning Using Triplet Network. In Proceedings of the Third International Workshop on Similarity-Based Pattern Recognition, SIMBAD 2015, Copenhagen, Denmark, 12–14 October 2015; Springer International Publishing: Cham, Switzerland, 2015; Volume 3, pp. 84–92.
5. Hu, R.; Barnard, M.; Collomosse, J. Gradient Field Descriptor for Sketch Based Retrieval and Localization. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 1025–1028.
6. Cao, Y.; Wang, C.; Zhang, L.; Zhang, L. Edgel Index for Large-Scale Sketch-Based Image Search. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 761–768.
7. Kobayashi, K.; Gu, L.; Hataya, R.; Mizuno, T.; Miyake, M.; Watanabe, H.; Takahashi, M.; Takamizawa, Y.; Yoshida, Y.; Nakamura, S.; et al. Sketch-Based Semantic Retrieval of Medical Images. Med. Image Anal. 2024, 92, 103060.
8. Jain, A.K.; Klare, B.; Park, U. Face Matching and Retrieval in Forensics Applications. IEEE Multimed. 2012, 19, 20.
9. Bagwari, A.; Sinha, A.; Singh, N.K.; Garg, N.; Kanti, J. CBIR-DSS: Business Decision Oriented Content-Based Recommendation Model for E-commerce. Information 2022, 13, 479.
10. Lim, J.-H.; Kim, S. A Study on Markerless AR-Based Infant Education System Using CBIR. In Proceedings of the International Conference on Security-Enriched Urban Computing and Smart Grid, Daejeon, Republic of Korea, 15–17 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 52–58.
11. Zou, Y.L.; Li, C.; Boukhers, Z.; Shirahama, K.; Jiang, T.; Grzegorzek, M. Environmental Microbiological Content-Based Image Retrieval System Using Internal Structure Histogram. In Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015, Wroclaw, Poland, 25–27 May 2015; Springer International Publishing: Cham, Switzerland, 2016; pp. 543–552.
12. Muneesawang, P.; Guan, L. Automatic Machine Interactions for Content-Based Image Retrieval Using a Self-Organizing Tree Map Architecture. IEEE Trans. Neural Netw. 2002, 13, 821–834.
13. Deselaers, T.; Keysers, D.; Ney, H. Features for Image Retrieval: An Experimental Comparison. Inf. Retr. 2008, 11, 77–107.
14. Kunal; Singh, B.; Kaur, E.K.; Choudhary, C. A Machine Learning Model for Content-Based Image Retrieval. In Proceedings of the 2023 2nd International Conference for Innovation in Technology (INOCON), Bangalore, India, 3–5 March 2023; pp. 1–6.
15. Dubey, S.R.; Singh, S.K.; Singh, R.K. Rotation and Illumination Invariant Interleaved Intensity Order-Based Local Descriptor. IEEE Trans. Image Process. 2014, 23, 5323–5333.
16. Madhavi, D.; Mohammed, K.M.C.; Jyothi, N.; Patnaik, M.R. A Hybrid Content-Based Image Retrieval System Using Log-Gabor Filter Banks. Int. J. Electr. Comput. Eng. (IJECE) 2019, 9, 237–244.
17. Madhavi, D.; Patnaik, M.R. Genetic Algorithm-Based Optimized Gabor Filters for Content-Based Image Retrieval. In Intelligent Communication, Control and Devices: Proceedings of ICICCD 2017; Springer: Singapore, 2018; pp. 157–164.
18. Madhavi, D.; Patnaik, M.R. Image Retrieval Based on Tuned Color Gabor Filter Using Genetic Algorithm. Int. J. Appl. Eng. Res. 2017, 12, 5031–5039.
19. Yuan, Z.; Zhang, W.; Fu, K.; Li, X.; Deng, C.; Wang, H.; Sun, X. Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval. arXiv 2022, arXiv:2204.09868.
20. Yuan, Z.; Zhang, W.; Tian, C.; Mao, Y.; Zhou, R.; Wang, H.; Fu, K.; Sun, X. MCRN: A Multi-Source Cross-Modal Retrieval Network for Remote Sensing. Int. J. Appl. Earth Obs. Geoinf. 2022, 115, 103071.
21. Abdel-Nabi, H.; Al-Naymat, G.; Awajan, A. Content-Based Image Retrieval Approach Using Deep Learning. In Proceedings of the 2019 2nd International Conference on New Trends in Computing Sciences (ICTCS), Amman, Jordan, 9–11 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8.
22. Camlica, Z.; Tizhoosh, H.R.; Khalvati, F. Autoencoding the Retrieval Relevance of Medical Images. In Proceedings of the 2015 International Conference on Image Processing Theory, Tools and Applications (IPTA), Orleans, France, 10–13 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 550–555.
23. Shakarami, A.; Tarrah, H. An Efficient Image Descriptor for Image Classification and CBIR. Optik 2020, 214, 164833.
24. Kumar, G.V.R.M.; Madhavi, D. Stacked Siamese Neural Network (SSiNN) on Neural Codes for Content-Based Image Retrieval. IEEE Access 2023, 11, 77452–77463.
25. Yuan, X.; Liu, Q.; Long, J.; Hu, L.; Wang, Y. Deep Image Similarity Measurement Based on the Improved Triplet Network with Spatial Pyramid Pooling. Information 2019, 10, 129.
26. Cai, Y.; Li, Y.; Qiu, C.; Ma, J.; Gao, X. Medical Image Retrieval Based on Convolutional Neural Network and Supervised Hashing. IEEE Access 2019, 7, 51877–51885.
27. Öztürk, Ş. Stacked Auto-Encoder Based Tagging with Deep Features for Content-Based Medical Image Retrieval. Expert Syst. Appl. 2020, 161, 113693.
28. Gupta, S.; Chaudhuri, U.; Banerjee, B.; Kumar, S. Zero-Shot Sketch Based Image Retrieval Using Graph Transformer. In Proceedings of the 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 1685–1691.
29. Ren, H.; Zheng, Z.; Wu, Y.; Lu, H.; Yang, Y.; Shan, Y.; Yeung, S.-K. ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 5022–5035.
30. Gopu, V.R.; Muni Kumar, M.; Dunna, M. Zero-Shot Sketch-Based Image Retrieval Using StyleGen and Stacked Siamese Neural Networks. J. Imaging 2024, 10, 79.
31. Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning Using Nonequilibrium Thermodynamics. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 2256–2265.
32. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the NeurIPS, Virtual, 6–12 December 2020; Volume 33, pp. 6840–6851.
33. Song, Y.; Ermon, S. Generative Modeling by Estimating Gradients of the Data Distribution. Adv. Neural Inf. Process. Syst. 2019, 32.
34. Dhariwal, P.; Nichol, A. Diffusion Models Beat GANs on Image Synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794.
35. Nichol, A.Q.; Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8162–8171.
36. Rombach, R.; Blattmann, A.; Ommer, B. Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models. arXiv 2022, arXiv:2207.13038.
37. Batzolis, G.; Stanczuk, J.; Schönlieb, C.-B.; Etmann, C. Conditional Image Generation with Score-Based Diffusion Models. arXiv 2021, arXiv:2111.13606.
38. Daniels, M.; Maunu, T.; Hand, P. Score-Based Generative Neural Networks for Large-Scale Optimal Transport. Adv. Neural Inf. Process. Syst. 2021, 34, 12955–12965.
39. Chung, H.; Sim, B.; Ye, J.C. Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12413–12422.
40. Kawar, B.; Elad, M.; Ermon, S.; Song, J. Denoising Diffusion Restoration Models. Adv. Neural Inf. Process. Syst. 2022, 35, 23593–23606.
41. Esser, P.; Rombach, R.; Blattmann, A.; Ommer, B. ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 3518–3532.
42. Meng, C.; He, Y.; Song, Y.; Song, J.; Wu, J.; Zhu, J.-Y.; Ermon, S. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. arXiv 2021, arXiv:2108.01073.
43. Dankar, F.K.; Ibrahim, M. Fake It Till You Make It: Guidelines for Effective Synthetic Data Generation. Appl. Sci. 2021, 11, 2158.
44. Yuan, Z.; Hao, C.; Zhou, R.; Chen, J.; Yu, M.; Zhang, W.; Wang, H.; Sun, X. Efficient and Controllable Remote Sensing Fake Sample Generation Based on Diffusion Model. IEEE Trans. Geosci. Remote Sens. 2023, 61.
45. Andriyanov, N.A.; Vasiliev, K.K.; Dementiev, V.E.; Belyanchikov, A.V. Restoration of Spatially Inhomogeneous Images Based on a Doubly Stochastic Model. Optoelectron. Instrum. Data Process. 2022, 58, 465–471.
46. Krasheninnikov, V.; Malenova, O.; Subbotin, A. The Identification of Doubly Stochastic Circular Image Model. Procedia Comput. Sci. 2020, 176, 1839–1847.
47. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114.
48. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III; Springer International Publishing: Cham, Switzerland, 2015; Volume 18, pp. 234–241.
49. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models from Natural Language Supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8748–8763.
50. Schuhmann, C.; Beaumont, R.; Vencu, R.; Gordon, C.; Wightman, R.; Cherti, M.; Coombes, T.; Katta, A.; Mullis, C.; Wortsman, M.; et al. Laion-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models. Adv. Neural Inf. Process. Syst. 2022, 35, 25278–25294.
51. Jing, F.; Li, M.; Zhang, H.-J.; Zhang, B. A Unified Framework for Image Retrieval Using Keyword and Visual Features. IEEE Trans. Image Process. 2005, 14, 979–989.
52. Dash, A.; Gamboa, J.C.B.; Ahmed, S.; Liwicki, M.; Afzal, M.Z. TAC-GAN: Text Conditioned Auxiliary Classifier Generative Adversarial Network. arXiv 2017, arXiv:1703.06412.
53. Kumar, P.M.A.; Rao, T.S.M.; Raj, L.A.; Pugazhendi, E. An Efficient Text-Based Image Retrieval Using Natural Language Processing (NLP) Techniques. In Intelligent System Design: Proceedings of Intelligent System Design: INDIA 2019; Springer: Singapore, 2021; pp. 505–519.
Layer | Configuration | Output Shape | Parameters
---|---|---|---
Convolution 2D | Filters: 32, Kernel = 3 × 3, padding = same | | 896
Activation | ReLU | | 0
Convolution 2D | Filters: 32, Kernel = 3 × 3 | | 9,248
Activation | ReLU | | 0
Max Pooling 2D | pool_size: 2 × 2 | | 0
Dropout | 0.25 | | 0
Convolution 2D | Filters: 64, Kernel = 3 × 3, padding = same | | 18,496
Activation | ReLU | | 0
Convolution 2D | Filters: 64, Kernel = 3 × 3, padding = same | | 36,928
Activation | ReLU | | 0
Max Pooling 2D | pool_size: 2 × 2 | | 0
Dropout | 0.25 | | 0
Flatten | | 2304 | 0
Dense | Units: 512 | 512 | 1,180,160
Activation | ReLU | 512 | 0
Dense | Units: 32 | 32 | 16,416
Layer | Output Shape | Parameters
---|---|---
Input Layer 1 (anchor) | | 0
Input Layer 2 (positive) | | 0
Input Layer 3 (negative) | | 0
Sequential (shared embedding network) | 32 | 1,262,144
Vectors (Concatenate) | 96 | 0
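A hedged Keras sketch reconstructing the embedding network from the first table: the 3 × 3 kernels follow from the listed parameter counts, while the 28 × 28 × 3 input size is an assumption chosen so that the Flatten layer yields the listed 2304 units (the paper's exact input resolution is not shown here).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_embedding_net(input_shape=(28, 28, 3)):  # input size is an assumption
    return models.Sequential([
        layers.Conv2D(32, (3, 3), padding="same", input_shape=input_shape),
        layers.Activation("relu"),
        layers.Conv2D(32, (3, 3)),                 # valid padding, per the table
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), padding="same"),
        layers.Activation("relu"),
        layers.Conv2D(64, (3, 3), padding="same"),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),                          # 2304 units with a 28x28 input
        layers.Dense(512),
        layers.Activation("relu"),
        layers.Dense(32),                          # 32-dimensional embedding
    ])

# build_embedding_net().summary() reports 1,262,144 parameters,
# matching the Sequential row in the second table.
```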
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Aeroplane | 0.56 | 0.58 | 0.57 | 1000 |
Automobile | 0.59 | 0.59 | 0.59 | 1000 |
Bird | 0.49 | 0.47 | 0.48 | 1000 |
Cat | 0.52 | 0.51 | 0.51 | 1000 |
Deer | 0.38 | 0.37 | 0.37 | 1000 |
Dog | 0.46 | 0.47 | 0.46 | 1000 |
Frog | 0.49 | 0.48 | 0.48 | 1000 |
Horse | 0.60 | 0.58 | 0.59 | 1000 |
Ship | 0.59 | 0.66 | 0.62 | 1000 |
Truck | 0.57 | 0.61 | 0.59 | 1000 |
Approach | mAP@25 | Retrieval Time Complexity |
---|---|---|
Keyword-based retrieval | 5.7 | |
TAC-GAN | 19.6 | |
NLP model approach | 29.3 | |
Proposed methodology | 45.2 |
Approach | mAP@25 |
---|---|
Autoencoder | 22.4 |
Triplet Network | 45.2 |
Approach | mAP@25 |
---|---|
Contrastive Loss | 39.5 |
Triplet Loss | 45.2 |