Reference-Free Texture Image Retrieval Based on User-Adaptive Psychophysical Perception Modeling

Xu, Shaojun; Chen, Yulong; Zhang, Yichi; Zheng, Yao

doi:10.3390/electronics15030710


College of Information Science and Technology, Shihezi University, Eighth Division of Xinjiang Production and Construction Corps, Shihezi 832061, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(3), 710; https://doi.org/10.3390/electronics15030710

Submission received: 7 January 2026 / Revised: 28 January 2026 / Accepted: 2 February 2026 / Published: 6 February 2026

(This article belongs to the Section Computer Science & Engineering)


Abstract

Texture image retrieval based on subjective visual descriptions remains a significant challenge due to the “semantic gap”, where conventional Content-Based Image Retrieval (CBIR) methods rely on low-level features or reference images that often diverge from human perception. To bridge this gap, this paper proposes a reference-free, perception-driven retrieval framework that enables users to query textures directly via abstract perceptual attributes. First, we constructed a human-centric perceptual feature space through controlled psychophysical experiments, quantifying 12 explicit texture attributes (e.g., granularity, directionality) using a 9-point Likert scale. Second, addressing the variability in visual sensitivity across user demographics, we developed a user-adaptive mechanism incorporating dual perceptual libraries tailored for art-major and non-art-major groups. Retrieval is formulated as a perception-aligned similarity optimization problem within this normalized space. Experimental evaluations on the Describable Textures Dataset (DTD) demonstrate that our method achieves superior perceptual consistency compared to both handcrafted descriptors (GLCM, LBP, HOG) and deep learning baselines (VGG16, ResNet50). Notably, the framework attained high PAP@3 performance across both user groups, validating its effectiveness in decoding fuzzy human intent without the need for query images. This work provides a robust solution for semantic-based texture retrieval in human–computer interaction scenarios.

Keywords: texture image retrieval; subjective perception matching; reference-free retrieval; psychophysical experiments; perceptual feature space; human-computer interaction

1. Introduction

With the rapid development of digital media and image acquisition technologies, content-based image retrieval (CBIR) has become an essential tool in various application domains [1,2,3]. A typical CBIR system consists of two core modules: feature extraction and similarity measurement [4]. The feature extraction module converts image content into discriminative feature vectors, while the similarity measurement module computes feature distances between a query image and images in a database to retrieve the most relevant results. However, features extracted by conventional CBIR methods are mainly low-level visual descriptors, and similarity measurements based on such features often deviate significantly from human subjective perception [5]. In texture-related scenarios, humans tend to describe and distinguish textures using perceptual attributes such as roughness, directionality, and contrast. Classic perceptual studies have shown that texture perception can be characterized along a set of perceptually meaningful dimensions [6]. Textures that are perceptually similar usually share consistent high-level perceptual characteristics, even when low-level image statistics differ. Therefore, modeling texture similarity directly from perceptual features derived from human visual experience is regarded as a promising direction for bridging the gap between computational retrieval results and subjective perception.

From the perspective of system design, a texture retrieval framework generally involves two key components: texture feature extraction and retrieval method formulation. Feature extraction aims to encode texture appearance into quantifiable representations that serve as the basis for retrieval accuracy, while retrieval methods determine how efficiently and effectively target textures are matched within a database. Early texture retrieval approaches relied primarily on traditional image analysis techniques and handcrafted feature descriptors [7]. With the advancement of deep learning, neural network–based methods have been increasingly applied to texture retrieval, enabling the extraction of high-order texture representations through deep architectures and improving retrieval robustness and accuracy [8]. Deep convolutional architectures have shown strong representation capacity across a range of image modeling tasks, which further motivates deep feature learning for texture-related analysis [9]. Nevertheless, due to the intrinsic limitations of conventional image-driven features in representing natural texture perception, researchers have gradually explored texture retrieval frameworks inspired by human visual perception [10]. By constructing perceptual texture spaces that better conform to human similarity judgments [11], these approaches aim to enhance perceptual consistency in retrieval results. For instance, psychophysics-based studies have analyzed perceptual similarity among natural textures and constructed texture similarity matrices to support perceptually aligned retrieval [12]. Despite these efforts, most existing texture retrieval frameworks remain fundamentally image-driven and assume the availability of reference images, which restricts their applicability in perception-driven scenarios where retrieval intent is expressed only through subjective descriptions.

From the viewpoint of feature representation, texture features can be broadly categorized into handcrafted features and deep learning–based features. Handcrafted features are designed according to prior knowledge of visual mechanisms and encode low-level texture characteristics into explicit descriptors. Such features are commonly divided into statistical and structural categories [13]. Statistical features, such as gray-level co-occurrence matrices (GLCM), model texture by capturing gray-level relationships and have been widely applied in medical image analysis, although they are sensitive to scale and orientation variations and may incur high computational costs in high-dimensional settings [14]. Local binary patterns (LBP) describe local intensity differences and are effective for capturing fine texture details but are sensitive to noise and lack global structural information [15]. Other handcrafted descriptors, including Histogram of Oriented Gradients (HOG) [16] and Gabor filters [17], characterize texture through gradient orientation distributions and frequency-domain analysis, respectively. More recently, advanced local pattern descriptors combined with dimensionality reduction techniques have been proposed for texture-based image retrieval. For example, a PCA-based advanced local octa-directional pattern descriptor was introduced to encode local intensity variations along multiple directions while reducing feature dimensionality [18], achieving competitive retrieval performance on large-scale image datasets. While handcrafted features offer clear physical interpretability and low computational complexity, they essentially capture pixel-level statistics and structures and cannot directly correspond to high-level perceptual attributes such as roughness or granularity. Beyond conventional pixel-level descriptors, alternative structural modeling approaches have also been explored for texture analysis. 
Graph-based representations, such as natural and horizontal visibility graphs, have been employed to capture texture structures using graph-theoretic features and conventional machine learning classifiers, demonstrating that non-CNN structural modeling can achieve competitive texture classification performance [19].

With the rapid development of deep learning, deep features have become the mainstream choice for complex texture analysis due to their hierarchical and data-driven representation learning capability. Recent studies have further explored deep learning–based image retrieval frameworks by integrating multiple feature modalities. For instance, a retrieval model for lunar complex craters combined deep representations extracted by transformer-based architectures with handcrafted texture and shape features, and optimized similarity learning through metric learning strategies, achieving improved retrieval accuracy under complex visual conditions [20]. Early convolutional neural network (CNN)–based approaches extracted texture features from intermediate or fully connected layers of pre-trained models such as VGG-16 and ResNet, enabling abstraction from low-level edges to high-level semantic patterns [21]. Specialized architectures, such as ArtFusionNet, further enhanced texture representation by incorporating dilated convolutions and multi-scale feature fusion, demonstrating strong performance in artistic texture recognition tasks [22]. More recently, Transformer-based models, including Vision Transformer (ViT) and Swin Transformer, have introduced self-attention mechanisms to overcome the local receptive field limitations of CNNs, effectively capturing global dependencies in non-periodic textures and improving discrimination in complex scenes [23]. Despite their strong performance, deep features generally suffer from limited interpretability, as their learned representations lack explicit correspondence with fine-grained human perceptual attributes [24]. Moreover, despite their effectiveness, these image-driven methods generally assume the availability of a reference image or embedding as the query, which limits their applicability in reference-free retrieval scenarios based on subjective perceptual descriptions.

In terms of retrieval strategy design, existing image retrieval methods can be broadly divided into distance-based and classification-based paradigms. Distance-based retrieval methods formulate retrieval as a similarity measurement problem between feature vectors and rank results according to predefined distance metrics. Commonly used measures include Euclidean distance [25], which is intuitive and suitable for low- to medium-dimensional continuous features; Manhattan distance [26], which offers lower computational complexity for high-dimensional data; and cosine similarity [27], which evaluates directional consistency and is effective for sparse representations. However, the efficiency of distance-based retrieval is highly dependent on feature dimensionality and distance computation cost. Classification-based retrieval methods, in contrast, map images into predefined categories using trained classifiers and perform retrieval within localized category subsets. Traditional classifiers such as support vector machines (SVM) [28] and random forests [29] rely on handcrafted features and incur relatively low training costs, but their performance is constrained by feature representation capability. Deep learning–based classification approaches significantly improve classification accuracy by leveraging large-scale annotated datasets and can reduce retrieval search space in large databases [30]. Nevertheless, classification-based paradigms depend heavily on predefined category systems and annotated data, making them unsuitable for scenarios in which retrieval intent is expressed through fuzzy or subjective perceptual descriptions.

Therefore, when users are unable to provide reference images and retrieval intent is primarily driven by subjective perceptual attributes, conventional distance-based and classification-based retrieval paradigms exhibit inherent limitations. This observation motivates the development of reference-free texture retrieval frameworks that operate directly within a perceptual feature space derived from human visual perception, enabling retrieval results that are more consistent with human subjective judgments.

The main contributions of this work are summarized as follows:

A human-centered perceptual feature space for texture representation is constructed through systematic psychophysical experiments, where texture attributes are explicitly defined and quantified based on human subjective evaluations, enabling perceptually meaningful texture modeling.
A perception-aligned, reference-free texture image retrieval framework is proposed, which formulates texture retrieval directly in the constructed perceptual space and allows users to retrieve textures using subjective perceptual descriptions rather than reference images.
A user-adaptive retrieval mechanism is introduced by constructing dual perceptual feature libraries for different user groups, accounting for perceptual sensitivity differences between art-major and non-art-major observers and improving the robustness and interpretability of retrieval results.

2. Materials and Methods

This section describes the engineering pipeline of the proposed perception-aligned texture retrieval framework, whose objective is to construct a computable perceptual representation for reference-free retrieval based on subjective descriptions. Controlled psychophysical experiments are employed as an engineering tool to collect and aggregate human ratings of predefined perceptual attributes, forming a low-dimensional perceptual feature space for texture representation, rather than as an independent study of visual perception mechanisms. In this work, the term “user-adaptive” refers to the selection of a corresponding perceptual feature library according to the user group, rather than an online learning or iterative optimization process. All perceptual feature construction and model training are performed offline, and retrieval is conducted on fixed representations without parameter updating. The overall framework of the proposed perception-aligned texture retrieval system is illustrated in Figure 1. The framework consists of two main stages: perceptual feature construction and perception-aligned retrieval. Texture images are first evaluated through psychophysical experiments to obtain aggregated perceptual feature vectors, and separate perceptual feature libraries are constructed for art-major and non-art-major observer groups. Given a user-defined perceptual description as the query, retrieval is performed directly in the perceptual feature space by measuring perceptual distances, and the Top-K most perceptually similar textures are returned.
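The group-based selection described above amounts to a lookup rather than a learning step. The following Python sketch illustrates this under stated assumptions: the dictionary layout, group keys, texture identifiers, and vector values are hypothetical placeholders, not data or code from the study.

```python
# Hypothetical offline-built libraries: texture id -> normalized 12-D perceptual vector.
# All identifiers and values are placeholders for illustration only.
LIBRARIES = {
    "art": {"tex_001": [0.2] * 12, "tex_002": [0.8] * 12},
    "non_art": {"tex_001": [0.3] * 12, "tex_002": [0.7] * 12},
}


def select_library(user_group):
    """Return the fixed perceptual library for the given user group.

    No parameters are updated here: "user-adaptive" means choosing
    which precomputed library to search, not online learning.
    """
    if user_group not in LIBRARIES:
        raise ValueError(f"unknown user group: {user_group!r}")
    return LIBRARIES[user_group]


library = select_library("art")
```

Retrieval then runs against the selected library only, so the same query can produce different rankings for the two groups.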

2.1. Dataset and Texture Selection

The experiments in this study are conducted using texture images selected from the Describable Textures Dataset (DTD), which contains a diverse collection of real-world texture images covering a wide range of material categories and visual appearances. A total of 300 texture images spanning multiple material categories were selected for perceptual evaluation. The selection targeted perceptual diversity and experimental reliability, covering variations in structural regularity, granularity, contrast, and directionality, rather than category balance. All selected texture images were resized to a unified resolution and converted to grayscale when necessary to reduce potential color-related interference in perceptual evaluation. This design choice was made to isolate texture-related perceptual attributes and avoid confounding effects introduced by color variation, rather than to model color perception. Representative examples of the preprocessed grayscale texture images used for perceptual evaluation are shown in Figure 2.

2.2. Perceptual Evaluation Protocol

To construct perceptual feature representations for engineering-oriented texture retrieval, a controlled perceptual evaluation protocol was designed to collect structured subjective ratings under standardized conditions. The protocol emphasizes measurement consistency and reproducibility, ensuring that perceptual ratings can be reliably aggregated and utilized for subsequent computational modeling, rather than aiming to investigate psychological mechanisms or cognitive processes of human vision. Specifically, the experimental design focuses on minimizing external visual interference and inter-subject ambiguity through unified display settings, predefined perceptual attributes, and consistent rating criteria. By treating perceptual evaluation as a data acquisition procedure for feature construction, the protocol provides a stable perceptual basis for engineering-oriented texture retrieval. Figure 3 illustrates the overall procedure of the psychophysical experiment designed for perceptual attribute evaluation. Texture images are presented to observers under controlled laboratory conditions, where participants from art-major and non-art-major groups perform attribute-wise evaluation using a predefined set of perceptual attributes. The collected subjective ratings are aggregated across observers to form perceptual feature representations, which serve as the basis for constructing the perceptual feature library.

Texture samples used in the experiment are selected to cover a wide range of visual appearances, including variations in granularity, regularity, contrast, and directional patterns. Prior to the formal evaluation, all texture images are preprocessed to ensure consistency in resolution and display conditions. Images are presented on a calibrated display under controlled ambient lighting to minimize external visual interference.

2.2.1. Participants

A total of 62 participants were recruited for the perceptual evaluation. All participants were university students aged between 18 and 26 years, including 29 males and 33 females, forming a controlled observer group under standardized laboratory conditions. Their majors included fields such as Computer Science, Law, and Arts. Participants were divided into two groups according to their educational background: an art-major group and a non-art-major group. This grouping strategy is motivated by previous findings that visual training and artistic experience may influence sensitivity to texture-related perceptual attributes. All participants reported normal or corrected-to-normal vision and had no known visual or neurological disorders.

Before the experiment, participants receive standardized instructions explaining the evaluation task, perceptual attributes, and rating criteria. A brief familiarization session is conducted to ensure that participants understand the semantic meaning of each perceptual attribute and the usage of the rating scale, thereby reducing inter-subject ambiguity.

2.2.2. Texture Attributes and Rating Scale

Based on a review of relevant literature and expert consultation, a set of perceptual attributes is defined to characterize texture appearance using commonly adopted descriptors in texture perception studies. These attributes are selected to reflect both local and global perceptual properties of textures, including but not limited to contrast, repetitiveness, granularity, and feature density. Each attribute is treated as an independent perceptual dimension to facilitate explicit perceptual representation.

To facilitate a consistent understanding of the perceptual attributes and their semantic ranges, representative visual examples are provided for each attribute. Figure 4 illustrates the perceptual definitions of four representative attributes by showing texture samples with gradually increasing perceptual strength from low to high.

Together with the numerical scale definitions summarized in Table 1, these visual examples were presented to participants during the instruction phase to reduce semantic ambiguity and improve inter-observer consistency in perceptual evaluation. By combining textual definitions with representative visual references, participants were provided with a concrete and unified understanding of each perceptual attribute prior to the formal evaluation task.

Texture perception is quantified using a 9-point Likert scale, where higher values indicate a stronger presence of the corresponding perceptual attribute. The use of a discrete ordinal scale allows participants to express fine-grained perceptual differences while maintaining evaluation consistency across samples and reducing subjective rating instability. In addition, the bounded numerical range facilitates reliable aggregation of ratings across observers and supports subsequent normalization and distance-based similarity computation in the perceptual feature space. The complete list of perceptual attributes and the perceptual meanings of the scale endpoints are summarized in Table 1, which provides a structured, standardized, and reproducible definition of the 12 perceptual dimensions employed in the experiment.

To ensure a shared understanding of the perceptual attributes and rating criteria across observers with different perceptual backgrounds, detailed attribute definitions and representative visual examples were provided during the instruction phase prior to the formal experiment. This preparation step is intended not only to reduce semantic ambiguity, but also to establish consistent internal reference standards for perceptual judgment, thereby improving inter-observer consistency in perceptual evaluation.

2.2.3. Experimental Procedure

The psychophysical experiments were conducted following a standardized and controlled protocol to ensure consistency and reliability of perceptual evaluations across observers. All experimental sessions were carried out in the same university computer laboratory to avoid potential variations in perceptual judgment caused by environmental changes. At the beginning of each session, the experimenter operated the instructor’s master control computer and used the Red Spider Multimedia Network Classroom Software (developed by Guangzhou Chuangxun Software Co., Ltd., China; official website: http://www.3000soft.net/index.php, accessed on 3 February 2026) to simultaneously distribute texture images to all participant workstations. All computers in the laboratory were equipped with identical hardware configurations, including processor models and display specifications, ensuring device-level uniformity across observers. Each texture image was assigned a unique sample identifier and displayed at a fixed resolution on all student computers. Together with the fixed seating arrangement and uniform ambient lighting conditions in the laboratory, this setup ensured that all observers viewed texture samples from consistent perspectives, under identical illumination directions, and with the same display quality, thereby minimizing potential interference from device-related or environmental factors on perceptual judgments.

Given the relatively large number of texture samples involved in the experiment, specific measures were adopted to reduce visual fatigue and maintain evaluation reliability. All texture images were divided into groups of 20 samples, and each observer randomly selected three groups for perceptual evaluation. This grouping strategy limited the duration of continuous visual exposure while preserving sufficient perceptual data coverage across the dataset. Random group selection further helped mitigate potential ordering effects and learning biases during the evaluation process. Because groups were selected at random, each texture image was scored by multiple observers, and the ratings were averaged to obtain the final value. In this study, the minimum, median, and maximum numbers of ratings per image were 4, 5, and 6, respectively, with comparable rating coverage across the art-major and non-art-major groups.

To assess the reliability of the collected subjective ratings, inter-observer agreement was evaluated using Kendall’s coefficient of concordance (W) on representative perceptual attributes. The results indicate a moderate to high level of agreement among observers across both art-major and non-art-major groups, suggesting that aggregating ratings from multiple participants provides a stable perceptual feature representation suitable for subsequent modeling and retrieval.
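For readers unfamiliar with the statistic, the following Python sketch shows one standard way to compute Kendall's W with tie correction. It assumes a complete block in which every observer rated every item in the set; the function names and the toy ratings are illustrative and are not the study's data or analysis code.

```python
def average_ranks(scores):
    """Rank scores from 1..n, assigning tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 1-based mean position of the tied run
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for a complete raters x items score matrix, with tie correction.

    Undefined (division by zero) if every rater gives all items the same score.
    """
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[j] for row in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over tied groups of size t, per rater.
    ties = 0
    for row in ratings:
        counts = {}
        for x in row:
            counts[x] = counts.get(x, 0) + 1
        ties += sum(t ** 3 - t for t in counts.values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Toy example: 3 observers rating 5 textures on one attribute (9-point scale).
toy_ratings = [
    [2, 4, 6, 8, 5],
    [1, 5, 6, 9, 5],
    [3, 4, 7, 8, 6],
]
w = kendalls_w(toy_ratings)
```

W ranges from 0 (no agreement) to 1 (identical rankings). Because Likert ratings produce many ties, the tie-corrected form is used here; the source does not state which variant was applied.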

To ensure consistency in the assessment of individual perceptual attributes, observers were instructed to evaluate one perceptual attribute at a time across all texture samples within a selected group. For each attribute, observers completed the rating task for all samples before proceeding to the next attribute. Short breaks were arranged between consecutive attribute evaluations to alleviate visual and cognitive fatigue and to maintain stable internal evaluation criteria for each perceptual dimension. During the experiment, observers were required to view each texture sample in its entirety and provide ratings for all 12 predefined perceptual attributes according to the experimental instructions. All perceptual rating data were recorded electronically and used for subsequent perceptual feature construction and retrieval modeling.

2.3. Construction of Perceptual Feature Space

Based on the psychophysical experiments described in Section 2.2, a perceptual feature space is constructed to represent texture images using aggregated human perceptual evaluations. The resulting perceptual feature space is designed for computational retrieval and similarity modeling, rather than for psychological interpretation of human perception. Let the perceptual attribute set be defined as

A = \{a_{1}, a_{2}, \dots, a_{M}\},

(1)

where M denotes the number of perceptual attributes considered in this study.

For a given perceptual query specified by the user, the desired perceptual description is represented as an M-dimensional perceptual query vector

q = [q_{1}, q_{2}, \dots, q_{M}]^{T},

(2)

where each element q_{m} corresponds to the target strength of the m-th perceptual attribute. This perceptual query encoding process is illustrated in Figure 5.

Similarly, each texture image in the database is represented by a perceptual feature vector derived from human ratings. Suppose that N observers provide rating scores for the m-th perceptual attribute of a texture image. The aggregated perceptual value for this attribute is computed as the mean of the corresponding ratings:

p_{m} = \frac{1}{N} \sum_{n = 1}^{N} r_{m}^{(n)},

(3)

where r_{m}^{(n)} denotes the rating score given by the n-th observer for the m-th perceptual attribute.

By aggregating the perceptual values of all attributes, the perceptual feature vector of a texture image is defined as

p = [p_{1}, p_{2}, \dots, p_{M}]^{T}.

(4)
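The aggregation in Equations (3) and (4) can be sketched in a few lines of Python. The function name and the toy ratings below are illustrative assumptions; the example uses M = 3 attributes for brevity, whereas the study uses 12.

```python
from statistics import mean


def aggregate_ratings(ratings_by_observer):
    """Average per-attribute ratings over observers (Equation (3)).

    ratings_by_observer: list of N rating vectors, each of length M,
    holding one observer's 1-9 scores for a single texture image.
    Returns the M-dimensional perceptual feature vector p (Equation (4)).
    """
    n_attributes = len(ratings_by_observer[0])
    return [mean(obs[m] for obs in ratings_by_observer) for m in range(n_attributes)]


# Toy example: N = 4 observers, M = 3 attributes.
ratings = [
    [7, 2, 5],
    [8, 3, 5],
    [6, 2, 4],
    [7, 1, 6],
]
p = aggregate_ratings(ratings)  # per-attribute means: 7, 2, 5
```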

To ensure balanced contributions of different perceptual attributes during similarity computation, min–max normalization is applied independently to each perceptual dimension:

\hat{p}_{m} = \frac{p_{m} - \min (p_{m})}{\max (p_{m}) - \min (p_{m})},

(5)

where \hat{p}_{m} denotes the normalized perceptual feature value of the m-th attribute, and the minimum and maximum values are computed across all texture images in the dataset for each perceptual attribute independently. After normalization, all texture images are embedded into a unified perceptual feature space with comparable attribute scales.
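As a minimal sketch of Equation (5), the Python function below rescales each attribute column across a small set of feature vectors. The function name and toy values are illustrative, and the handling of a constant attribute (zero range) is an assumption, since the source does not describe that case.

```python
def min_max_normalize(vectors):
    """Rescale each attribute (column) to [0, 1] across all images (Equation (5))."""
    n_attributes = len(vectors[0])
    lows = [min(v[m] for v in vectors) for m in range(n_attributes)]
    highs = [max(v[m] for v in vectors) for m in range(n_attributes)]
    normalized = []
    for v in vectors:
        row = []
        for m in range(n_attributes):
            span = highs[m] - lows[m]
            # Assumption: a constant attribute (span == 0) maps to 0.0;
            # the source does not say how this degenerate case is handled.
            row.append((v[m] - lows[m]) / span if span else 0.0)
        normalized.append(row)
    return normalized


# Toy example: three images, two attributes; the second attribute is constant.
features = [[2.0, 5.0], [4.0, 5.0], [6.0, 5.0]]
normalized = min_max_normalize(features)
```

Because the minimum and maximum are taken over the whole library, adding or removing images changes the normalized values, so the library should be renormalized whenever its contents change.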

Figure 6 provides an intuitive visualization of the distribution of texture images in the constructed perceptual feature space after perceptual feature normalization.

2.4. Perception-Aligned Texture Retrieval Method

Given a normalized perceptual query vector \hat{q} and a database of normalized perceptual feature vectors \hat{p}_{i}, perception-aligned texture retrieval is formulated as a similarity measurement problem in the perceptual feature space. The perceptual distance between the query and each texture image is computed using the Euclidean distance metric:

d(\hat{q}, \hat{p}_{i}) = \sqrt{\sum_{m = 1}^{M} (\hat{q}_{m} - \hat{p}_{i, m})^{2}},

(6)

where \hat{p}_{i, m} denotes the normalized perceptual value of the m-th attribute for the i-th texture image.

Because the square root is monotonic, ranking by the squared Euclidean distance yields the same order, and the squared form can be written as

d^{2}(\hat{q}, \hat{p}_{i}) = \sum_{m = 1}^{M} (\hat{q}_{m} - \hat{p}_{i, m})^{2}.

(7)

Based on the computed perceptual distances, all texture images in the database are ranked in ascending order of distance. The Top-K textures with the smallest perceptual distances are selected as the retrieval result set:

R_{K} = \{\hat{p}_{(1)}, \hat{p}_{(2)}, \dots, \hat{p}_{(K)}\},

(8)

where \hat{p}_{(i)} denotes the texture ranked at the i-th position according to ascending perceptual distance.
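The ranking step in Equations (6) and (8) can be sketched as follows. This is an illustrative Python example under stated assumptions: the library is a dictionary from hypothetical texture identifiers to normalized vectors, and the toy vectors have two dimensions rather than the 12 used in the study.

```python
import math


def euclidean(q, p):
    """Euclidean distance between two equal-length vectors (Equation (6))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, p)))


def retrieve_top_k(query, library, k):
    """Return the k (texture_id, distance) pairs closest to the query (Equation (8))."""
    scored = [(tex_id, euclidean(query, vec)) for tex_id, vec in library.items()]
    scored.sort(key=lambda item: item[1])  # ascending perceptual distance
    return scored[:k]


# Hypothetical normalized library and query.
library = {"tex_a": [0.1, 0.9], "tex_b": [0.5, 0.5], "tex_c": [0.9, 0.2]}
query = [0.2, 0.8]
top2 = retrieve_top_k(query, library, k=2)
top2_ids = [tex_id for tex_id, _ in top2]  # closest first: tex_a, then tex_b
```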

To quantitatively evaluate the perceptual consistency between the retrieval results and the user-specified perceptual description, the Perception-Aligned Precision at K (PAP@K) metric is employed. PAP@K is defined as

\mathrm{PAP}@K = \frac{1}{K} \sum_{i = 1}^{K} \delta_{i},

(9)

where \delta_{i} = 1 if the i-th retrieved texture is judged to satisfy the perceptual requirements of the query by multiple human observers following the same perceptual attribute definitions and rating criteria as those used in the psychophysical experiments, and \delta_{i} = 0 otherwise. The evaluation focuses on whether retrieved textures fall within the perceptually acceptable range defined by the query attributes, rather than requiring strict numerical matching of attribute values.
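A minimal Python sketch of Equation (9) follows. It takes the binary judgments as given; how the judgments of multiple observers are combined into a single value per result is not specified in the text, so that step is deliberately left out. The function name and toy judgments are illustrative.

```python
def pap_at_k(deltas, k):
    """Fraction of the top-k results judged perceptually acceptable (Equation (9)).

    deltas: binary judgments in rank order (1 = acceptable, 0 = not).
    """
    if k <= 0 or k > len(deltas):
        raise ValueError("k must be between 1 and the number of judged results")
    return sum(deltas[:k]) / k


# Toy judgments for the five top-ranked textures of one query.
judgments = [1, 1, 0, 1, 0]
pap3 = pap_at_k(judgments, 3)  # 2 of the first 3 accepted
pap5 = pap_at_k(judgments, 5)  # 3 of the first 5 accepted
```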

From a computational perspective, the proposed retrieval framework is lightweight and scalable. Perceptual feature construction is performed offline, and each texture image is represented by a low-dimensional perceptual vector (12 dimensions in this study). During retrieval, similarity computation requires only one Euclidean distance calculation per database image, so the time complexity is linear in the database size. This design enables efficient retrieval even for large-scale texture databases and is suitable for interactive retrieval scenarios.

2.5. Baseline Methods and Implementation Details

To provide a comprehensive comparison with conventional image-driven texture retrieval methods, several representative handcrafted and deep learning–based feature representations were implemented as baselines under a unified experimental setting.

For handcrafted feature baselines, texture images were represented using Gray-Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP), and Histogram of Oriented Gradients (HOG) descriptors. For all handcrafted features, feature extraction followed standard implementations with default parameter settings, and retrieval was performed by computing Euclidean distances between feature vectors.

For deep learning baselines, VGG16 and ResNet50 networks pre-trained on ImageNet were adopted and used as fixed feature extractors without fine-tuning. Specifically, for ResNet50, features were extracted from the Global Average Pooling layer, resulting in 2048-dimensional feature vectors. For VGG16, features were extracted from the first fully connected layer (FC1), producing 4096-dimensional representations.

Perceptual similarity for the deep learning baselines was computed using Euclidean distance in the corresponding high-dimensional feature spaces. No dimensionality reduction or task-specific feature adaptation was applied, in order to reflect the standard usage of deep features in conventional content-based image retrieval methods. It should be noted that the objective of this study is not to pursue state-of-the-art image retrieval performance through increasingly complex feature representations, but to investigate the effectiveness of an explicit, perception-aligned retrieval formulation. Therefore, representative handcrafted descriptors and widely adopted deep feature extractors (VGG16 and ResNet50) are selected as baselines to reflect conventional image-driven retrieval paradigms. More recent architectures typically require large-scale training data, task-specific adaptation, or high-dimensional embeddings, which are not directly comparable to the proposed low-dimensional perceptual feature space and reference-free query setting. Accordingly, such methods are beyond the scope of the current study and are left for future exploration.

3. Results

3.1. Perceptual Retrieval Examples

To visually examine the effectiveness of the proposed perception-driven texture retrieval framework, qualitative retrieval results are first presented. Given a user-defined perceptual feature input, the system computes perceptual distances between the query and all texture images in the selected perceptual feature library and returns the Top-K most similar textures ranked in ascending order of perceptual distance.
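The ranking step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's exact formulation: it assumes that ratings on the 9-point scale are linearly normalized to [0, 1] and that the perceptual distance is Euclidean. The function names, the three-attribute vectors (contrast, repetitiveness, directionality), and the texture names are hypothetical.

```python
import math

SCALE_MIN, SCALE_MAX = 1.0, 9.0  # endpoints of the 9-point Likert scale

def normalize(ratings):
    """Map Likert ratings in [1, 9] linearly onto [0, 1]."""
    return [(r - SCALE_MIN) / (SCALE_MAX - SCALE_MIN) for r in ratings]

def perceptual_distance(query, texture):
    """Euclidean distance between normalized rating vectors (assumed metric)."""
    q, t = normalize(query), normalize(texture)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, t)))

def retrieve_top_k(query, library, k):
    """Return the k (name, distance) pairs with the smallest distance, ascending."""
    scored = [(name, perceptual_distance(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Hypothetical library of aggregated ratings for
# (contrast, repetitiveness, directionality); values are made up.
art_library = {
    "tex_a": [8.0, 7.5, 2.0],
    "tex_b": [3.0, 2.0, 8.5],
    "tex_c": [7.0, 8.0, 3.0],
    "tex_d": [5.0, 5.0, 5.0],
}

# Reference-free query: the user states desired attribute levels, not an image.
query = [8.0, 8.0, 2.0]
top2 = retrieve_top_k(query, art_library, k=2)
print(top2)
```

Switching between user groups then amounts to passing a different library (for example, a non-art-major one) to the same `retrieve_top_k` call, which keeps the retrieval pipeline itself unchanged.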

Figure 7 illustrates representative retrieval results obtained using the perceptual feature library constructed from the art-major observer group, while Figure 8 presents corresponding results from the non-art-major perceptual feature library. In each figure, the retrieved textures are arranged from left to right according to increasing perceptual distance from the query. The visual patterns in the retrieved results demonstrate that textures ranked at higher positions share similar perceptual characteristics with the input perceptual descriptions, whereas textures ranked lower exhibit gradually increasing perceptual divergence.

A comparison between Figure 7 and Figure 8 shows that the retrieval framework produces perceptually coherent results for both observer groups. Although the specific ranking orders differ between the two perceptual feature libraries, the retrieved textures in each case remain consistent with the perceptual attributes specified in the query. These qualitative examples provide an intuitive illustration of how the proposed framework maps subjective perceptual descriptions to texture images without requiring reference images.

3.2. Quantitative Evaluation Using PAP@K

To quantitatively evaluate the perceptual consistency of the proposed reference-free texture retrieval framework, Perception-Aligned Precision at K (PAP@K) is adopted as the evaluation metric. Table 2 reports the PAP@3 and PAP@5 results of different texture representation methods, including the proposed perceptual feature libraries (art-major and non-art-major) and several representative handcrafted and deep feature baselines.
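PAP@K can be read as a precision-at-K style measure. The sketch below assumes, for illustration only, that each retrieved texture receives a binary judgment (1 if it is perceptually consistent with the query, 0 otherwise) and that PAP@K is the mean, over queries, of the fraction of consistent textures among the top K. This reading is an assumption rather than the paper's formal definition, and the function name `pap_at_k` and the toy judgments are hypothetical.

```python
def pap_at_k(judgments_per_query, k):
    """Mean percentage of perceptually consistent items among the top-k results.

    judgments_per_query: one list per query, ordered by rank, holding 1 for a
    consistent result and 0 otherwise (assumed binary judgments).
    """
    if not judgments_per_query:
        raise ValueError("at least one query is required")
    fractions = []
    for flags in judgments_per_query:
        if len(flags) < k:
            raise ValueError("each query needs at least k ranked results")
        fractions.append(sum(flags[:k]) / k)
    return 100.0 * sum(fractions) / len(fractions)

# Hypothetical judgments for three queries, five ranked results each.
toy_judgments = [
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
]
pap3 = pap_at_k(toy_judgments, 3)
pap5 = pap_at_k(toy_judgments, 5)
print(round(pap3, 1), round(pap5, 1))
```

Under this reading, PAP@3 and PAP@5 differ only in how deep into the ranked list the judgments are counted, so a method can score perfectly at K = 3 and still lose points at K = 5.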

As shown in Table 2, the proposed method achieves a PAP@3 of 100% with both the art-major and the non-art-major perceptual feature libraries under the current evaluation setting. For the top-ranked results, the proposed perceptual representations therefore consistently return textures that satisfy the specified perceptual criteria in both user-group settings. At K = 5, perceptual consistency remains high: the art-major library achieves a PAP@5 of 90% and the non-art-major library 85%. Both values exceed those of the handcrafted baselines (GLCM, LBP, and HOG) and the deep learning baselines (VGG16 and ResNet50); the strongest baseline, ResNet50, reaches 83.3% at PAP@3 and 73.3% at PAP@5.

Overall, the quantitative results in Table 2 demonstrate that the proposed perception-aligned texture representations provide stable perceptual retrieval performance for reference-free queries at different K values. The high PAP@3 performance observed for both perceptual feature libraries indicates a high level of perceptual consistency between the constructed perceptual feature space and the aggregated human ratings used for evaluation. The results suggest that the proposed perceptual representations can capture dominant perceptual characteristics reflected in the subjective ratings under the controlled experimental setting. Since the queries in this validation phase were derived from the aggregated perceptual ratings of the target textures, the results demonstrate that the system can identify and retrieve textures that are perceptually consistent with the specified query when the user’s perceptual intent aligns with the texture’s perceptual profile. The reported PAP@K results therefore serve as a quantitative basis for further analysis of perceptual behavior and user-group-specific effects, which are discussed in the following section.

4. Discussion

Table 3 summarizes the PAP@5 performance of different texture representation methods across individual perceptual attributes, providing a detailed quantitative basis for analyzing attribute-level retrieval behavior.

Overall, the results reveal clear and systematic differences in how various feature representations respond to distinct perceptual dimensions, highlighting both the strengths and inherent limitations of image-driven features when confronted with explicit perceptual attributes.

To facilitate a holistic interpretation of these attribute-wise results, Figure 9 presents a radar-chart-based visualization that summarizes perceptual behavior across representative methods.

As illustrated in Figure 9, conventional handcrafted and deep feature representations exhibit uneven performance across perceptual attributes, with notable fluctuations depending on the attribute type. In contrast, the proposed perceptual feature representations maintain relatively balanced performance across most perceptual dimensions. These results suggest that explicitly modeling perceptual attributes contributes to more consistent attribute-level retrieval behavior, particularly in scenarios where perceptual consistency is more critical than category discrimination.
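The unevenness described here can be read directly from Table 3. The sketch below summarizes each method by its mean, minimum, and maximum PAP@5 across the 12 attributes. The values are copied from Table 3 in attribute order (contrast through homogeneity); the names `TABLE3` and `summarize` are illustrative, and treating the Table 2 PAP@5 values as attribute-wise means is an observation from the reported numbers rather than something the source states.

```python
# PAP@5 (%) per attribute, in the row order of Table 3:
# contrast, repetitiveness, granularity, randomness, coarseness, feature density,
# directionality, structural complexity, roughness, regularity,
# local direction consistency, homogeneity.
TABLE3 = {
    "OURS (art model)":     [80, 100, 80, 100, 100, 80, 80, 100, 80, 80, 100, 100],
    "OURS (non-art model)": [80, 100, 80, 100, 80, 80, 80, 80, 80, 80, 100, 80],
    "GLCM":                 [20, 100, 40, 60, 40, 80, 100, 80, 40, 60, 40, 40],
    "LBP":                  [20, 100, 0, 40, 60, 40, 40, 60, 40, 20, 60, 80],
    "ResNet50":             [80, 80, 60, 100, 60, 80, 80, 60, 40, 80, 80, 80],
    "HOG":                  [20, 80, 40, 60, 60, 80, 100, 40, 60, 20, 0, 40],
    "VGG16":                [20, 40, 60, 80, 20, 80, 100, 60, 80, 60, 100, 60],
}

def summarize(values):
    """Return (mean rounded to one decimal, minimum, maximum)."""
    return (round(sum(values) / len(values), 1), min(values), max(values))

summary = {method: summarize(vals) for method, vals in TABLE3.items()}

for method, (avg, low, high) in summary.items():
    print(f"{method:22s} mean={avg:5.1f} min={low:3d} max={high:3d} range={high - low:3d}")
```

For the values in Table 3, the per-method means coincide with the PAP@5 column of Table 2 (for example, 90 for the art-major library and 73.3 for ResNet50). The proposed libraries stay between 80 and 100 across all attributes, whereas every baseline drops to 40 or below on at least one attribute.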

Beyond method-level differences, perceptual behavior may also vary across user groups with different perceptual backgrounds. To further examine the influence of perceptual sensitivity, Figure 10 provides a focused comparison between the art-major and non-art-major perceptual feature libraries. Overall, the two perceptual feature libraries exhibit highly consistent patterns across most perceptual attributes, suggesting the existence of a shared perceptual baseline in texture perception. This consistency implies that the constructed perceptual feature space captures dominant perceptual tendencies that remain robust across user backgrounds.

At the same time, modest yet systematic differences can be observed for certain perceptual attributes, reflecting variations in perceptual sensitivity associated with user background. These differences do not lead to drastic changes in retrieval behavior but instead manifest as subtle shifts in attribute-level performance. Importantly, such variations remain bounded and do not undermine the overall stability of the proposed retrieval framework.
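The size of these group differences can be read from the two OURS columns of Table 3. The short sketch below lists the attributes on which the art-major and non-art-major libraries differ at PAP@5; the variable names are illustrative, and the values are copied from Table 3.

```python
ATTRIBUTES = [
    "Contrast", "Repetitiveness", "Granularity", "Randomness",
    "Coarseness", "Feature Density", "Directionality", "Structural Complexity",
    "Roughness", "Regularity", "Local Direction Consistency", "Homogeneity",
]
# PAP@5 (%) per attribute from Table 3.
ART     = [80, 100, 80, 100, 100, 80, 80, 100, 80, 80, 100, 100]
NON_ART = [80, 100, 80, 100, 80, 80, 80, 80, 80, 80, 100, 80]

# Signed difference (art minus non-art) for attributes where the groups disagree.
differing = {
    name: art - non_art
    for name, art, non_art in zip(ATTRIBUTES, ART, NON_ART)
    if art != non_art
}
identical = len(ATTRIBUTES) - len(differing)
print(differing, identical)
```

Only three of the twelve attributes differ (coarseness, structural complexity, and homogeneity), each by 20 percentage points in favor of the art-major library, while the remaining nine are identical. If each attribute is evaluated with a single five-item result list, as the multiples of 20 in Table 3 suggest, a 20-point gap corresponds to one retrieved texture.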

From an attribute-level perspective, the results further indicate that perceptual attributes differ in their inherent difficulty for perception-driven texture retrieval. Attributes associated with global and visually salient texture properties, such as repetitiveness, randomness, and regularity, consistently achieve high PAP@5 values across perceptual feature libraries. These attributes correspond to prominent global patterns that can be more reliably perceived by observers.

Attributes describing structural organization and spatial arrangement, including structural complexity, coarseness, and feature density, also demonstrate relatively stable retrieval behavior. As shown in Table 3, retrieval accuracy under these attributes remains competitive across different methods, while the proposed perceptual representations preserve balanced performance. This suggests that explicit perceptual modeling is effective in capturing mid-level texture characteristics that are not fully determined by local pixel statistics.

In contrast, perceptual attributes related to finer local variations, such as granularity and roughness, exhibit greater variability across feature representation methods. While the proposed perceptual feature libraries maintain relatively stable accuracy under these attributes, traditional handcrafted features show more pronounced fluctuations. This observation highlights the inherent difficulty of representing subtle local texture cues using purely image-driven descriptors and underscores the advantage of incorporating subjective perceptual evaluations into texture representation.

Taken together, the results demonstrate that the proposed perception-aligned retrieval framework achieves balanced performance across diverse perceptual attributes while accommodating user-group-specific perceptual tendencies. By jointly modeling attribute-level behavior and user-group effects within a unified perceptual feature space, the framework avoids over-specialization and preserves system simplicity. This design is particularly suitable for reference-free texture retrieval scenarios, where user intent is expressed through subjective perceptual descriptions and perceptual background may influence interpretation.

From the perspective of controlled perceptual evaluation, the observed retrieval behavior is grounded in perceptual feature representations derived from aggregated subjective ratings. Standardized experimental conditions, predefined perceptual attributes, and consistent rating scales contribute to the stability of the constructed perceptual feature space. By aggregating perceptual ratings across multiple observers, individual variability is reduced while dominant perceptual tendencies are retained. The consistency observed across perceptual attributes and user groups therefore reflects both the effectiveness of the perception-aligned retrieval formulation and the reliability of the underlying psychophysical data. Within the scope of the current experimental setting, these results support the feasibility and practical effectiveness of using human-rated perceptual attributes as explicit feature dimensions for perception-driven texture retrieval.

Despite the promising results, this study has certain limitations that should be addressed in future work. First, the perceptual feature libraries were constructed based on data from 62 participants. While this controlled participant group enabled stable perceptual evaluation under standardized conditions, a larger and more culturally diverse demographic could further enhance the universality of the perceptual baseline. It should be noted that formal statistical significance testing was not the primary focus of this study, as the objective was to construct a stable perceptual feature space for retrieval rather than to compare competing hypotheses. Instead, the observed consistency across perceptual attributes and user groups provides practical evidence of perceptual alignment under the proposed framework. Second, the current framework relies on 12 predefined attributes. Although these cover major perceptual dimensions, they may not capture all subtle semantic nuances of complex artistic textures. Future research could explore expanding the attribute set or incorporating open-vocabulary descriptions to cover a broader range of visual semantics. Finally, the experiments were conducted on grayscale textures to control for color interference; extending the framework to fully incorporate color-perception interactions remains an important direction for practical applications.

5. Conclusions

This paper presents a perception-aligned texture image retrieval framework that operates directly in a human-centered perceptual feature space constructed from psychophysical experiments. By explicitly modeling texture appearance using subjective visual attributes, the proposed approach enables reference-free texture retrieval driven by perceptual descriptions rather than example images, addressing a key limitation of conventional image-driven retrieval methods.

Through controlled psychophysical experiments, perceptual ratings were collected under standardized conditions and aggregated to construct stable and interpretable perceptual feature representations. Based on this perceptual feature space, a unified retrieval formulation was developed, in which similarity is measured directly in the perceptual domain and retrieval results are obtained through distance-based ranking. Quantitative and qualitative evaluations demonstrate that the proposed method achieves consistent perception-aligned retrieval performance across different perceptual attributes and user groups.

In addition, the introduction of group-specific perceptual feature libraries for art-major and non-art-major participants provides a practical mechanism for accommodating perceptual sensitivity differences while preserving a unified retrieval pipeline. The discussion of attribute-level and user-group effects further illustrates the robustness and interpretability of the proposed perception-driven retrieval framework.

Overall, this work highlights the feasibility of integrating psychophysical perceptual data into texture retrieval system design. By bridging human subjective perception and computational retrieval models, the proposed framework offers an interpretable and flexible solution for perception-driven texture retrieval scenarios. From a system design perspective, such a perception-aligned retrieval framework can be readily integrated into interactive material search systems and human–computer interfaces, where users specify retrieval intent through perceptual descriptors rather than example images, thereby providing a practical foundation for future research on perceptual modeling and user-adaptive visual retrieval.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/electronics15030710/s1. The supplementary materials include: (1) a folder named “Dataset_Images” containing the 300 texture images used in the psychophysical experiments for perceptual feature evaluation; and (2) an Excel file named “aggregated_perceptual_feature_matrices.xlsx” containing the aggregated perceptual feature ratings of all texture samples across the 12 perceptual attributes. These materials provide the original experimental data used for constructing the perceptual feature library and for reproducing the experimental results reported in this study.

Author Contributions

Conceptualization, S.X. and Y.Z. (Yao Zheng); methodology, S.X.; software, S.X. and Y.Z. (Yichi Zhang); validation, S.X. and Y.Z. (Yao Zheng); formal analysis, S.X.; investigation, S.X. and Y.C.; resources, Y.Z. (Yao Zheng); data curation, S.X.; writing—original draft preparation, S.X.; writing—review and editing, Y.Z. (Yao Zheng); visualization, S.X.; supervision, Y.Z. (Yao Zheng); project administration, Y.Z. (Yao Zheng). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study involved a non-invasive, minimal-risk psychophysical evaluation of texture images. No medical intervention or collection of personally identifiable data was involved. According to applicable institutional policies for minimal-risk, non-interventional human studies, ethical review and formal IRB approval were waived/not required.

Informed Consent Statement

Informed consent was obtained from all participants prior to participation.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Overall framework of the proposed perception-aligned texture retrieval system.

Figure 2. Preprocessed grayscale images.

Figure 3. Psychophysical experiment procedure for perceptual attribute evaluation.

Figure 4. Illustrations of perceptual attribute definitions.

Figure 5. Illustration of perceptual query encoding into a perceptual feature vector.

Figure 6. Distribution of texture images in the perceptual feature space.

Figure 7. Retrieval results from the art-major perceptual database.

Figure 8. Retrieval results from the non-art-major perceptual database.

Figure 9. Radar chart comparison of perceptual attribute behaviors across different feature representation methods.

Figure 10. Radar chart comparison of perceptual attribute behaviors between art-major and non-art-major perceptual feature libraries.

Table 1. The 9-Point Likert Scale for the 12 Perceptual Attributes and the Perceptual Meaning of the Scale Endpoints.

Perceptual Attribute	Rating 1	Rating 9
Contrast	Low Contrast	High Contrast
Repetitiveness	Non-repetitive	Highly Repetitive
Granularity	Non-granular	Highly Granular
Randomness	Non-random	Highly Random
Coarseness	Coarse	Smooth
Feature Density	Low Density	High Density
Directionality	Non-directional	Highly Directional
Structural Complexity	Low Complexity	High Complexity
Roughness	Rough	Meticulous
Regularity	Irregular	Regular
Local Direction Consistency	Non-directional	Locally Directional
Homogeneity	Non-homogeneous	Homogeneous

Table 2. PAP@3 and PAP@5 results for different feature representation methods.

Model	PAP@3 (%)	PAP@5 (%)
OURS (art model)	100	90
OURS (non-art model)	100	85
GLCM	58.3	58.3
LBP	44.4	46.7
ResNet50	83.3	73.3
HOG	58.3	50
VGG16	66.6	63.3

Table 3. Comparison of PAP@5 accuracy (%) of different methods across perceptual attributes.

Perceptual Attribute	OURS (Art Model)	OURS (Non-Art Model)	GLCM	LBP	ResNet50	HOG	VGG16
Contrast	80	80	20	20	80	20	20
Repetitiveness	100	100	100	100	80	80	40
Granularity	80	80	40	0	60	40	60
Randomness	100	100	60	40	100	60	80
Coarseness	100	80	40	60	60	60	20
Feature Density	80	80	80	40	80	80	80
Directionality	80	80	100	40	80	100	100
Structural Complexity	100	80	80	60	60	40	60
Roughness	80	80	40	40	40	60	80
Regularity	80	80	60	20	80	20	60
Local Direction Consistency	100	100	40	60	80	0	100
Homogeneity	100	80	40	80	80	40	60

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
