Article

Evaluation and Improvement of Image Aesthetics Quality via Composition and Similarity

1 College of Software, Quanzhou University of Information Engineering, Quanzhou 362000, China
2 College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
3 Shaoxing Institute of Technology, Shaoxing 312000, China
4 Information and Education Technology Center, Zhejiang A&F University, Hangzhou 311300, China
* Authors to whom correspondence should be addressed.
Sensors 2025, 25(18), 5919; https://doi.org/10.3390/s25185919
Submission received: 26 July 2025 / Revised: 11 September 2025 / Accepted: 16 September 2025 / Published: 22 September 2025
(This article belongs to the Special Issue Recent Innovations in Computational Imaging and Sensing)

Abstract

The evaluation and enhancement of image aesthetics play a pivotal role in the development of visual media, impacting fields including photography, design, and computer vision. Composition, a key factor shaping visual aesthetics, significantly influences an image’s vividness and expressiveness. However, existing image optimization methods face practical challenges: compression-induced distortion, imprecise object extraction, and cropping-caused unnatural proportions or content loss. To tackle these issues, this paper proposes an image aesthetic evaluation with composition and similarity (IACS) method that harmonizes composition aesthetics and image similarity through a unified function. When evaluating composition aesthetics, the method calculates the distance between the main semantic line (or salient object) and the nearest rule-of-thirds line or central line. For images featuring prominent semantic lines, a modified Hough transform is utilized to detect the main semantic line, while for images containing salient objects, a salient object detection method based on luminance channel salience features (LCSF) is applied to determine the salient object region. In evaluating similarity, edge similarity measured by the Canny operator is combined with the structural similarity index (SSIM). Furthermore, we introduce a Framework for Image Aesthetic Evaluation with Composition and Similarity-Based Optimization (FIACSO), which uses semantic segmentation and generative adversarial networks (GANs) to optimize composition while preserving the original content. Compared with prior approaches, the proposed method improves both the aesthetic appeal and fidelity of optimized images. Subjective evaluation involving 30 participants further confirms that FIACSO outperforms existing methods in overall aesthetics, compositional harmony, and content integrity. Beyond methodological contributions, this study also offers practical value: it supports photographers in refining image composition without losing context, assists designers in creating balanced layouts with minimal distortion, and provides computational tools to enhance the efficiency and quality of visual media production.

1. Introduction

Image aesthetic evaluation and enhancement play a crucial role in advancing the field of visual computing, as they directly influence the quality of image interpretation and its emotional impact on viewers. The ability to assess and optimize the aesthetic elements of an image is vital in various domains, including photography, design, and computer vision, contributing to the creation of visually appealing and contextually meaningful images. Drawing upon aesthetic principles, image optimization encompasses a range of techniques, including image refocusing, brightness/contrast adjustments, and image composition optimization. Among these, image composition stands out as a critical element, reflecting the rational arrangement of elements within the frame and often serving as the primary focal point for appreciating and assessing the aesthetic quality of an image.
The primary aim of composition optimization is to augment the thematic impact of images, specifically aligning with the three fundamental principles of image aesthetics delineated by Luo et al. [1]. Recent research has focused on aesthetic principles for measuring image quality and has increasingly recognized the impact of aesthetics on the overall perception of images, and a series of valuable studies has emerged to fill this gap in image optimization [2,3,4]. For instance, Patnaik et al. [5] proposed AesthetiQ, which enhances graphic layouts via aesthetic preference alignment with multi-modal LLMs, together with layout-quality filtering and a new metric. Alsmirat et al. [6] proposed a supervised deep learning-based method for identifying the ideal image retargeting technique, which uses transfer learning to construct models such as ResNet18, DenseNet121, and InceptionV3 to predict the suitable retargeting method for an input image at a specific resolution. Hong et al. [7] proposed GenCrop, a weakly supervised approach that learns high-quality subject-aware cropping from professional stock images by combining them with a pretrained text-to-image diffusion model to generate cropped-uncropped training pairs automatically. Shen et al. [8] proposed a content-aware image retargeting method called PruneRepaint, which incorporates semantic importance for each pixel and an adaptive repainting module to maintain key semantics and achieve local smoothness, outperforming previous approaches in preserving semantics and aesthetics on the RetargetMe benchmark. Hong et al. [9] further demonstrated that GenCrop, trained on stock images paired with diffusion-generated uncropped counterparts, performs well against both supervised and weakly supervised cropping methods. Additionally, studies by Hussain et al. [10], Wang [11], and Gao et al. [12] have explored different perspectives and methods to promote the development of image composition optimization.
Presently, these methods can be broadly categorized into three types. The first type is rule-based optimization methods, which apply fundamental composition principles to optimize image composition. Despite their adherence to aesthetic rules, these techniques tend to be complex and may require manual intervention, reducing their efficiency and applicability in automated scenarios. The second type is learning-based optimization methods, which typically model compositional features from large datasets and apply these features to generate visually more pleasing images. Unfortunately, such methods mostly rely on cropping to process images and therefore suffer from content loss, which may cause important image details to disappear. The third type is example-based optimization methods, which adjust input images based on the composition of reference images. These methods usually require retrieving reference images from a database, but finding images with compositions similar to that of the input image is a challenging problem, limiting their practicality. Beyond the limitations of these three types of methods, existing image optimization approaches also face common practical challenges, including compression-induced distortion, imprecise object extraction, and unnatural proportions caused by cropping, all of which hinder further improvement of image aesthetic quality.
To address these issues, an IACS method was proposed. The proposed method achieved a balance between composition aesthetics and image similarity through a unified function, while allowing for precise control via parameter adjustments. In evaluating composition aesthetics, emphasis was placed on the distance between the main semantic line or salient object and the nearest rule-of-thirds line or central line. For images with prominent semantic lines, a modified Hough transform with length filtering and pixel spacing constraints was employed to detect the main semantic line. Similarly, for images containing salient objects, a salient object detection method based on LCSF was utilized to determine the salient object region. In evaluating the similarity to original images, edge similarity measured by the Canny operator was combined with the SSIM. Furthermore, a framework of image aesthetic evaluation with composition and similarity-based optimization (FIACSO) was proposed. After categorizing the composition of the image, the framework utilized a semantic segmentation network for image segmentation to obtain composition information. Subsequently, the IACS method was applied to optimize and adjust the image. Finally, the optimized segmentation result was refined using DeepSIM, a GAN-based image generation model. The purpose of this step is to ensure that the adjusted composition is seamlessly integrated into the original image, avoiding visual artifacts, boundary discontinuities, or distortions that may occur when only the segmentation result is used. In this way, the GAN produces a more natural and visually consistent final image. In brief, the main contributions of this study are as follows:
  • An image aesthetic evaluation with composition and similarity (IACS) method was proposed. The proposed method achieved a balance between composition aesthetics and image similarity through a unified function while allowing for precise control via parameter adjustments. In composition aesthetic evaluation, emphasis was placed on the distance between the main semantic line or salient object and the nearest rule-of-thirds line or central line. For images with prominent semantic lines, a modified Hough transform with length filtering and pixel spacing constraints was employed to detect the main semantic line. For images containing salient objects, a detection method based on LCSF was employed to determine the salient object region. In evaluating the similarity to original images, edge similarity measured by the Canny operator was combined with the SSIM.
  • A framework of image aesthetic evaluation with composition and similarity-based optimization (FIACSO) was proposed. This framework categorized the composition of the image initially and then utilized a semantic segmentation network to segment the image, extracting composition information. Subsequently, the IACS method was applied to optimize adjustments to the image. Finally, a generative adversarial network (GAN) was employed to generate optimized images that adhered to composition rules and closely resembled the original image.
  • A salient object detection method based on LCSF was introduced. Initially, the image was processed with Gaussian filtering and converted to the Lab color space. Afterwards, threshold segmentation and morphological operations were performed on the luminance channel to calculate saliency features and extract the maximum feature. Then, the GrabCut algorithm was applied for image segmentation to extract the foreground. The resulting foreground was multiplied by the saliency feature to generate a saliency map, effectively highlighting the salient object region within the image.

2. Materials and Methods

2.1. Main Ideas

In various themes and scenarios, diverse composition techniques are commonly employed. For instance, landscape photography frequently employs linear or symmetrical composition, while portrait photography leans towards central composition or rule of thirds (RoT). This paper delves into optimizing two distinct types of images: those with main semantic lines and those featuring salient objects. These categories of images are pervasive in practical applications and are pivotal in augmenting the visual allure and aesthetic appeal of images.
For images featuring main semantic lines, the main semantic line plays a crucial role in directing visual attention and emphasizing the structure of the image. As depicted in Figure 1, when optimizing the composition of such images, the role of semantic lines can be further emphasized, enabling viewers to focus more on the theme and understand the content of the image.
For images containing salient objects, the salient object serves as the central element of composition, capturing viewers’ attention at its core. As depicted in Figure 2, optimizing the composition of such images aids in accentuating the salient object, granting it a more prominent presence within the frame.
The process of FIACSO for predicted aesthetic score improvement, which integrates composition and similarity, was proposed in this paper and is illustrated in Figure 3.
From Figure 3, it is evident that FIACSO primarily comprises the following components.
  • Composition category determination: In this section, a composition category determination network was designed to detect the composition category of the input image. This network was capable of flexibly identifying complex composition characteristics during the composition category prediction phase and provided confidence scores for adherence to composition rules, thereby demonstrating high accuracy and generalization capabilities.
  • Image aesthetic evaluation with composition and similarity (IACS): In this section, a unified function was used to balance composition aesthetics and image similarity. The evaluation of composition aesthetics focused on the distance between the main semantic line or salient object and the nearest rule-of-thirds line or central line. For images with prominent semantic lines, a modified Hough transform was employed to detect the main semantic line. For images containing salient objects, a salient object detection method based on LCSF was utilized to determine the salient object region. In evaluating the similarity to original images, edge similarity measured by the Canny operator was combined with the SSIM.
  • Composition optimization adjustment: This section focuses on maximizing the aesthetic evaluation of composition while preserving the original semantic and structural information of the image. It involves a series of steps, including content-aware rotation, determining the position of the main semantic line or salient object, and gradually adjusting this position using the IACS method. The ultimate goal is to achieve the highest possible composition aesthetic evaluation.
The main objective of this study is to investigate methods for optimizing image composition. Therefore, special emphasis is placed on three key elements: determining the categories of composition, employing the IACS method for aesthetic evaluation, and making adjustments for composition optimization. Semantic segmentation and image generation methods are not the primary focus of this study. Instead, well-established methods are adopted for both tasks in subsequent experiments. The Swin-Base [13] was utilized as the semantic segmentation network model, while the DeepSIM [14], a generative adversarial network trained on single images, was employed for image generation.

2.2. Problem Definition

Unless otherwise specified, the symbols used in this paper are defined as follows:
X i : Original input image.
Y r : Optimized output image.
C : Composition category.
A c : Composition aesthetic evaluation.
A s : Similarity evaluation.
A : Comprehensive evaluation.
I i : Semantic segmentation result of image.
I o : Intermediate image during the optimization process.
λ_1, λ_2, λ_3: Weight parameters, each with a value range of [0, 1].
L : Length of the semantic line.
L _ m i n : Minimum length threshold of the semantic line.
d : Pixel spacing between points on the semantic line.
d _ m a x : Maximum pixel spacing threshold.
C = {L_1, L_2, …, L_n}: The collection of candidate main semantic lines.
L i : The main semantic line.
L l : The line of thirds or center line closest to line L i .
R i : The region of the salient object.
p i : The center point of R i .
p i s : The intersection point closest to the line of thirds or the center line, nearest to point p i .
D i : The central axis of R i .
D l : The line of thirds or center line closest to line D i .
d i s : Euclidean distance.
l ,   c , s : Luminance, contrast, and structure.
μ I i : The mean of I i .
μ I o : The mean of I o .
σ I i : The standard deviation of I i .
σ I o : The standard deviation of I o .
σ I o I i : The covariance between I o and I i .
I′_o: The edge detection result of the image during the optimization process.
I′_i: The edge detection result of the semantic segmentation result image.
σ_{I′_o}: The standard deviation of I′_o.
σ_{I′_i}: The standard deviation of I′_i.
σ_{I′_o I′_i}: The covariance between I′_o and I′_i.
C_1, C_2, C_3, C′_3: Constants incorporated to ensure the stability of calculations and prevent instability when a denominator approaches zero.
A_max: The maximum IACS score recorded during the optimization process.
X: The original dataset.
X′ = {X, T_θ(X)}: The dataset after augmentation, consisting of the original images X and their rotated copies T_θ(X).
The original image X i is initially categorized into composition category C , followed by semantic segmentation to obtain the segmented result image I i containing composition information. Subsequently, the image composition is incrementally adjusted using the IACS method, resulting in the adjusted image I o . Finally, a generative adversarial network combines the original image X i with the adjusted image I o to generate the image, producing the optimized result image Y r . This process enhances the composition aesthetics while preserving the content and features of the original image.

2.3. Composition Category Determination

In the domain of image composition optimization, prior research has predominantly focused on RoT and central composition methods [15]. However, this approach is relatively narrow and fails to encompass the full diversity and richness of image composition. To address this limitation, the current study embarks on a more extensive exploration, identifying linear, symmetrical, RoT, and central compositions as the primary categories. These four types of compositions integrate a wide range of common compositional techniques and principles, providing a comprehensive framework for design. Consequently, this study extends beyond previous studies by incorporating a broader array of factors in composition optimization, delivering deeper and more comprehensive insights into the art of image composition. A schematic illustration is provided in Figure 4.
To select an appropriate composition optimization module, it is essential to employ a network capable of determining the composition category for input images. The approach to choosing the composition categorization model in this study includes several key elements:
  • Image labeling and enhancement: This study categorized and labeled images from the composition classification dataset into four types: linear, symmetrical, RoT, and central compositions. All of these images were horizontally aligned professional photos without any skew. However, images captured in everyday situations often do not adhere to composition norms or horizontal alignment. Therefore, this paper introduced a data augmentation step involving random rotations of ±10 degrees to enhance the model’s ability to generalize [16].
  • Model selection and tuning: To identify the optimal network model for recognizing composition rules, this study experimentally evaluated various well-known CNN and transformer-based models. After fine-tuning, Swin-Base was selected as the most suitable model. Further details are available in Figure 5 of the experimental section.
  • Model Training: The AdamW optimizer [17] was utilized during the training process. Hyperparameters, learning rate ranges, and convergence strategies were carefully set to promote efficient learning of composition rules. Recognizing that an image might conform to multiple composition rules, a multi-label handling strategy was implemented: in each training epoch, one applicable label was selected as the ground truth, and during testing, a prediction was deemed correct if it corresponded to any of the image’s applicable labels (see the sketch after this list). The model produced confidence scores for each of the four composition categories; by establishing thresholds and decision criteria, it adeptly recognized and managed images with complex composition features, thereby enhancing the accuracy and robustness of composition categorization.
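The multi-label strategy above can be summarized by the following minimal sketch; the helper names, the use of PyTorch tensors, and the 0.5 confidence threshold are illustrative assumptions rather than the paper’s exact implementation.

import random
import torch

def pick_epoch_target(label_sets):
    # For each image, randomly pick one of its applicable composition labels
    # (0: Linear, 1: Symmetric, 2: RoT, 3: Center) as this epoch's ground truth.
    return torch.tensor([random.choice(sorted(labels)) for labels in label_sets])

def multilabel_accuracy(logits, label_sets, threshold=0.5):
    # A prediction counts as correct if the top-scoring category is among the
    # image's applicable labels; the threshold flags low-confidence predictions.
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    hits = [int(p.item() in labels and c.item() >= threshold)
            for p, c, labels in zip(pred, conf, label_sets)]
    return sum(hits) / len(hits)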

2.4. Image Aesthetic Evaluation with Composition and Similarity (IACS)

To ensure the quality of output images, optimization focused on two essential attributes. Firstly, the image composition was optimized according to specific compositional rules. Secondly, the optimized image preserved as much information from the original image as possible while minimizing visual flaws or distortions. To address these requirements, an IACS method was proposed. This method combined the aesthetics of image composition and the similarity between the final and the original images into a unified function, calculated according to the following Equation (1):
$A(I_o) = \lambda_1 \cdot A_c(I_o) + (1 - \lambda_1) \cdot A_s(I_o, I_i)$
where A_c represents the evaluation of the compositional aesthetics of the optimized output image I_o, and A_s represents the similarity assessment between I_o and the input image I_i. The parameter λ_1 ∈ [0, 1] regulates the relative impact of these two terms. The objective is to maximize the value of A(I_o). A higher λ_1 value enhances the compositional quality of the output, while a lower λ_1 maintains closer similarity to the input image. λ_1 was determined as 0.5 through the sensitivity analysis presented in Section 3.4.1. This value maximizes both the IACS comprehensive score (A = 0.80 ± 0.04) and the subjective preference rate (82 ± 4%), achieving an optimal balance between compositional aesthetic enhancement and original content preservation.
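As a minimal illustration of Equation (1), the comprehensive score can be computed as a simple weighted blend; the function below is a sketch, and the example input values are hypothetical.

def iacs_score(a_c, a_s, lam1=0.5):
    # Comprehensive IACS evaluation (Equation (1)): a weighted blend of the
    # composition aesthetics score a_c and the similarity score a_s.
    assert 0.0 <= lam1 <= 1.0
    return lam1 * a_c + (1.0 - lam1) * a_s

# Example: strong composition (0.85) but only moderate similarity (0.65).
print(iacs_score(a_c=0.85, a_s=0.65))  # 0.75 with the default lam1 = 0.5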

2.4.1. Composition Aesthetic Evaluation Based on Main Semantic Line

Linear and Symmetric compositions are particularly suited for images with semantic lines, including landscapes or architectural scenes. A direct and effective method to evaluate such compositions is by detecting the main semantic line and assessing its aesthetic quality using A c ( I ) .
Since the results of semantic segmentation may include multiple semantic lines, it is necessary to identify one main semantic line based on its position for aesthetic evaluation.
The Hough Transform [18] can detect semantic lines, including short and discontinuous ones. The core of the Hough Transform for line detection relies on the Hough theorem, which maps line detection in the image space ( x ,   y ) to peak detection in the parameter space ( ρ ,   θ ). The mathematical expression of the Hough theorem for line detection is given in Equation (2):
$\rho = x\cos\theta + y\sin\theta$
where ρ denotes the perpendicular distance from the origin of the image coordinate system to the straight line, and θ denotes the angle between the perpendicular line and the x-axis. In Hough space, each pixel ( x ,   y ) in the image space corresponds to a sinusoidal curve; the intersection of multiple sinusoidal curves in Hough space indicates that these pixels belong to the same straight line in the image space.
However, the standard Hough Transform may detect short or discontinuous lines that are not suitable as main semantic lines. Thus, this paper modified the Hough Transform as follows:
  • Semantic Line Length Filtering: To exclude overly short semantic lines, a minimum length threshold L _ m i n was established. Any detected semantic line, represented by length L , must meet the criteria of Equation (3) to be considered:
    $L \geq L_{min}$
  • Pixel Spacing Threshold: To ensure continuity of semantic lines, a maximum pixel spacing threshold d _ m a x was established. For a segment formed by pixels ( x 1 ,   y 1 ) and ( x 2 ,   y 2 ) , their distance must satisfy Equation (4):
    $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \leq d_{max}$
Determining the Main Semantic Line: Based on the filtering conditions from the first two steps, a semantic line L_i that meets both conditions is selected as the main semantic line from the candidate set C = {L_1, L_2, …, L_n}.
In evaluating composition aesthetics based on the main semantic line, the image was first converted to grayscale and processed through Canny edge detection [19] to produce a binary image. The modified Hough Transform was then used to detect the main semantic line. Finally, the compositional aesthetic evaluation is calculated by utilizing the Euclidean distance between the main semantic line and the nearest rule-of-thirds line or central line.
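A minimal OpenCV sketch of this detection pipeline is given below. The probabilistic Hough transform’s minLineLength and maxLineGap parameters play the roles of L_min and d_max; the specific threshold values, and the choice of keeping the longest surviving candidate as the main semantic line, are illustrative assumptions.

import cv2
import numpy as np

def detect_main_semantic_line(image_bgr, l_min=100, d_max=10):
    # Grayscale conversion and Canny edge detection produce the binary map.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    # Probabilistic Hough transform with length filtering (Eq. (3)) and a
    # maximum pixel-gap constraint (Eq. (4)).
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=l_min, maxLineGap=d_max)
    if lines is None:
        return None
    # Keep the longest surviving candidate as the main semantic line.
    def length(line):
        x1, y1, x2, y2 = line[0]
        return np.hypot(x2 - x1, y2 - y1)
    return max(lines, key=length)[0]  # (x1, y1, x2, y2)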
For linear composition, the alignment of the main semantic line with both horizontal and vertical rule-of-thirds lines is evaluated, and the axis yielding the smaller normalized distance is selected. The corresponding compositional aesthetic score A c is given by Equation (5):
$A_c(I) = \cos\left(2 \cdot \min\left\{\frac{dis(L_i,\, \mathrm{nearest}(\{x = w/3,\, 2w/3\}))}{w/3},\; \frac{dis(L_i,\, \mathrm{nearest}(\{y = h/3,\, 2h/3\}))}{h/3}\right\} \cdot \frac{\pi}{2}\right)$
where w and h respectively represent the width and height of the image, L_i is the main semantic line, and dis(·, ·) is the perpendicular Euclidean distance between lines.
For symmetrical composition, the alignment of the main semantic line with both the vertical and horizontal central lines is evaluated, and the corresponding compositional aesthetic score A c is given by Equation (6):
$A_c(I) = \cos\left(2 \cdot \min\left\{\frac{dis(L_i,\, x = w/2)}{w/2},\; \frac{dis(L_i,\, y = h/2)}{h/2}\right\} \cdot \frac{\pi}{2}\right)$
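Under the assumption that, after content-aware rotation, the main semantic line is roughly axis-aligned, Equations (5) and (6) can be sketched as follows; approximating the line-to-line distance dis(·, ·) by the distance from the line’s midpoint to the guide lines is a simplification introduced here only for illustration.

import numpy as np

def composition_score_semantic_line(line, w, h, mode="linear"):
    # line: (x1, y1, x2, y2) of the main semantic line; w, h: image size.
    x1, y1, x2, y2 = line
    mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    if mode == "linear":   # Eq. (5): rule-of-thirds guide lines
        dx = min(abs(mx - w / 3), abs(mx - 2 * w / 3)) / (w / 3)
        dy = min(abs(my - h / 3), abs(my - 2 * h / 3)) / (h / 3)
    else:                  # Eq. (6): symmetric composition, central guide lines
        dx = abs(mx - w / 2) / (w / 2)
        dy = abs(my - h / 2) / (h / 2)
    d = min(dx, dy)        # take the better-aligned axis
    return float(np.cos(2 * d * np.pi / 2))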

2.4.2. Composition Aesthetic Evaluation Based on Salient Object

RoT and Center compositions are widely used for photos with clear foreground salient objects. Such images are particularly popular in personal photo collections, including pictures of family members, friends, pets, and interesting objects such as flowers.
Detecting the salient object region is crucial for this type of composition and subsequent optimization. To this end, a salient object detection method based on LCSF was proposed to accurately distinguish between the foreground and background of an image, aiming to highlight the salient object region. The method involved processing the semantic segmentation results using a Gaussian filter and converting the image to the Lab color space. The luminance channel was then subjected to threshold segmentation and morphological operations to calculate salience features and extract the maximum features.
The calculation of salience features based on the luminance channel is given by Equation (7):
$I_{salient}(x, y) = \max\left(0,\; L(x, y) - \frac{1}{k}\sum_{(i, j) \in \Omega(x, y)} L(i, j)\right)$
where I_salient(x, y) represents the salience feature value at pixel (x, y); L(x, y) represents the luminance value of pixel (x, y) in the Lab color space; Ω(x, y) represents a 3 × 3 local neighborhood centered at (x, y); k = 9 represents the total number of pixels in the neighborhood; and max(0, ·) suppresses negative differences, retaining pixels with higher luminance than the local average.
Subsequently, the GrabCut algorithm [20] was used to segment the image, extract the foreground, and multiply it by the salience features. The fusion process of the foreground region and salience features is given by Equation (8):
$I_{salient\_image}(x, y) = I_{foreground}(x, y) \cdot I_{salient}(x, y)$
where I f o r e g r o u n d ( x ,   y ) represents the foreground mask output by GrabCut; I s a l i e n t _ i m a g e ( x ,   y ) represents the final salient object image. This formula enhances salient features in the foreground region through pixel-wise multiplication while suppressing background noise, effectively enhancing the visibility of the salient object region in the image.
The process of salient object region detection method based on LCSF proposed in this study is illustrated in Algorithm 1.
Algorithm 1. Salient object region detection based on LCSF
Input: I
Output: I_salient_object_image
function SalientObjectDetection(I)
    I_smooth ← GaussianFilter(I, G)
    L, a, b ← ConvertToLab(I_smooth)
    I_brightness ← L
    I_binary ← Thresholding(I_brightness, T)
    I_closed ← Closing(I_binary)
    I_salient ← ComputeSalientFeature(I_closed)
    I_mask ← CreateMask(I, I_salient)
    I_foreground ← GrabCut(I, I_mask)
    I_salient_object_image ← I_foreground × I_salient
    return I_salient_object_image
end function
Main:
I ← Input Image
I_salient_object_image ← SalientObjectDetection(I)
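A compact Python/OpenCV sketch of Algorithm 1 follows. The Gaussian kernel size, the use of Otsu thresholding, the structuring element, and the GrabCut initialization from probable foreground/background labels are illustrative choices that the paper does not specify.

import cv2
import numpy as np

def lcsf_salient_object(image_bgr):
    # Smooth the image and move to the Lab color space.
    smooth = cv2.GaussianBlur(image_bgr, (5, 5), 0)
    lab = cv2.cvtColor(smooth, cv2.COLOR_BGR2LAB)
    L = lab[:, :, 0].astype(np.float32)

    # Eq. (7): salience = max(0, L - local 3x3 mean of L).
    local_mean = cv2.blur(L, (3, 3))
    salient = np.maximum(0.0, L - local_mean)

    # Threshold segmentation and closing on the luminance channel.
    _, binary = cv2.threshold(L.astype(np.uint8), 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

    # GrabCut foreground extraction initialized from the coarse mask.
    mask = np.where(closed > 0, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
    foreground = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.float32)

    # Eq. (8): pixel-wise fusion of the foreground mask and the salience map.
    return foreground * salient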
Based on the detected salient object region, two elements need to be considered when calculating compositional aesthetics. The first element is the distance from the salient object to the four points of intersection of the rule-of-thirds or the center point in the image. The second element is whether the salient object is placed along the lines of the rule-of-thirds or the center line. For an image, the calculation was done according to the following Equation (9):
$A_c(I) = \lambda_2 \cdot A_{dis}(I) + (1 - \lambda_2) \cdot A_{pos}(I)$
where A_dis and A_pos respectively consider the aforementioned elements. The parameter λ_2 ∈ [0, 1] controls the influence of A_dis and A_pos. Through the sensitivity analysis presented in Section 3.4.2, λ_2 = 1/3 was identified as optimal: it prioritizes A_pos while retaining a moderate weight for A_dis, resulting in the highest average A_c (0.83 ± 0.07) and subjective aesthetic score (4.2 ± 0.2).
For the rule-of-thirds composition, the distance term A_dis is given by Equation (10):
$A_{dis}(I) = \cos\left(\left(\frac{|p_i^x - p_i^{s,x}|}{w/3} + \frac{|p_i^y - p_i^{s,y}|}{h/3}\right) \cdot \frac{\pi}{2}\right)$
where p i and p i s respectively represent the center point of the salient object region R i and the intersection point closest to p i .
For central composition, the corresponding distance term A_dis is given by Equation (11):
$A_{dis}(I) = \cos\left(\left(\frac{|p_i^x - p_i^{s,x}|}{w/2} + \frac{|p_i^y - p_i^{s,y}|}{h/2}\right) \cdot \frac{\pi}{2}\right)$
According to the above formulas, when the center of the salient object coincides with the corresponding rule-of-thirds intersection point or the image center, the value of A_dis reaches 1.
For most images with prominent salient objects, including a person or a tall building, the central axis is nearly vertical. Therefore, this paper calculates the central axis by dividing the salient object into two equal regions using a vertical line.
For rule-of-thirds composition, the position term A_pos is given by Equation (12):
$A_{pos}(I) = \cos\left(2 \cdot \min\left\{\frac{dis(D_i,\, \mathrm{nearest}(\{x = w/3,\, 2w/3\}))}{w/3},\; \frac{dis(D_i,\, \mathrm{nearest}(\{y = h/3,\, 2h/3\}))}{h/3}\right\} \cdot \frac{\pi}{2}\right)$
where D i represents the central axis of the salient object region R i .
For central composition, the position term A_pos is given by Equation (13):
$A_{pos}(I) = \cos\left(2 \cdot \min\left\{\frac{dis(D_i,\, x = w/2)}{w/2},\; \frac{dis(D_i,\, y = h/2)}{h/2}\right\} \cdot \frac{\pi}{2}\right)$
where D_i represents the central axis of the salient object region R_i.
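A sketch of Equations (9)–(13) is given below. It assumes the salient object’s centroid and its near-vertical central axis have been extracted from the LCSF saliency map, and, following the vertical-axis simplification described above, only the vertical guide lines are compared for A_pos; these are illustrative choices.

import numpy as np

def composition_score_salient_object(center, axis_x, w, h, mode="rot", lam2=1/3):
    # center: centroid (x, y) of the salient region R_i; axis_x: x-position of
    # its near-vertical central axis; w, h: image width and height.
    cx, cy = center
    if mode == "rot":
        # Eq. (10): centroid distance to the nearest rule-of-thirds intersection.
        px = min(abs(cx - w / 3), abs(cx - 2 * w / 3)) / (w / 3)
        py = min(abs(cy - h / 3), abs(cy - 2 * h / 3)) / (h / 3)
        # Eq. (12): alignment of the central axis with the rule-of-thirds lines.
        dax = min(abs(axis_x - w / 3), abs(axis_x - 2 * w / 3)) / (w / 3)
    else:
        # Eqs. (11) and (13): central composition uses the image center lines.
        px = abs(cx - w / 2) / (w / 2)
        py = abs(cy - h / 2) / (h / 2)
        dax = abs(axis_x - w / 2) / (w / 2)
    a_dis = float(np.cos((px + py) * np.pi / 2))
    a_pos = float(np.cos(2 * dax * np.pi / 2))
    return lam2 * a_dis + (1 - lam2) * a_pos   # Eq. (9)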

2.4.3. Evaluation of Similarity to Original Images

Image retargeting is a technique used to adjust the position of semantic lines or foreground salient objects in order to enhance compositional aesthetics. During this process, visual distortion affects the image, particularly when dealing with images featuring complex background structures. To manage this distortion within an acceptable range, this paper employed similarity measurement to quantify the visual variances between the optimized and original images.
While traditional quality metrics like mean square error (MSE) are computationally simple, they fail to accurately reflect visual quality from the perceptual standpoint. Hence, in some studies, SSIM [21] has been widely used to assess perceptual image similarity. Unlike MSE, SSIM is a perceptual model that aligns more closely with human visual perception. SSIM is calculated according to the following Equation (14):
$SSIM(I_o, I_i) = [l(I_o, I_i)]^{\alpha} \cdot [c(I_o, I_i)]^{\beta} \cdot [s(I_o, I_i)]^{\gamma}$
where l ,   c and s respectively represent the luminance, contrast, and structure between I o and I i .
The SSIM value typically ranges from 0 to 1, with values closer to 1 indicating greater similarity between the two images. The individual modules are calculated using the following Equation (15):
    $l(I_o, I_i) = \frac{2\mu_{I_o}\mu_{I_i} + C_1}{\mu_{I_o}^2 + \mu_{I_i}^2 + C_1}, \quad c(I_o, I_i) = \frac{2\sigma_{I_o}\sigma_{I_i} + C_2}{\sigma_{I_o}^2 + \sigma_{I_i}^2 + C_2}, \quad s(I_o, I_i) = \frac{\sigma_{I_o I_i} + C_3}{\sigma_{I_o}\sigma_{I_i} + C_3}$
where μ_{I_o} and μ_{I_i} respectively represent the means of I_o and I_i, while σ_{I_o} and σ_{I_i} represent their respective standard deviations. σ_{I_o I_i} represents the covariance between I_o and I_i. C_1, C_2, and C_3 are constants used to ensure computational stability and prevent instability when a denominator approaches zero.
However, the structural term of SSIM is poorly correlated with edge structure, making it difficult to measure differences in edge and contour information between images [22]. Because the human visual system is most sensitive to edge and contour structure, this paper improves upon SSIM by adding a measure of edge similarity. Canny edge detection was applied to both the semantic segmentation result image and the intermediate image obtained during optimization to generate the edge maps I′_i and I′_o. The edge similarity is then computed according to Equation (16):
$e(I'_o, I'_i) = \frac{\sigma_{I'_o I'_i} + C'_3}{\sigma_{I'_o}\sigma_{I'_i} + C'_3}$
where σ_{I′_o} and σ_{I′_i} respectively represent the standard deviations of the edge detection result images I′_o and I′_i, and σ_{I′_o I′_i} represents their covariance. The constant C′_3 ensures computational stability.
By combining Equations (14)–(16), the equation for calculating the similarity between the images, A_s(I_o, I_i), was obtained, as shown in Equation (17):
$A_s(I_o, I_i) = l(I_o, I_i) \cdot c(I_o, I_i) \cdot \left(\lambda_3 \cdot s(I_o, I_i) + (1 - \lambda_3) \cdot e(I'_o, I'_i)\right) = \left[\frac{2\mu_{I_o}\mu_{I_i} + C_1}{\mu_{I_o}^2 + \mu_{I_i}^2 + C_1}\right]^{\alpha} \cdot \left[\frac{2\sigma_{I_o}\sigma_{I_i} + C_2}{\sigma_{I_o}^2 + \sigma_{I_i}^2 + C_2}\right]^{\beta} \cdot \left[\lambda_3 \cdot \frac{\sigma_{I_o I_i} + C_3}{\sigma_{I_o}\sigma_{I_i} + C_3} + (1 - \lambda_3) \cdot \frac{\sigma_{I'_o I'_i} + C'_3}{\sigma_{I'_o}\sigma_{I'_i} + C'_3}\right]^{\gamma}$
where α, β, and γ were set to 1 to control the relative importance of the three components. The parameter λ_3 ∈ [0, 1] governs the relative influence of image structure and edges, and its value is set to 0.5 to balance the two factors. Typically, the constants are defined as C_1 = (K_1 L)^2, C_2 = (K_2 L)^2, and C_3 = C_2/2, where K_1 = 0.01, K_2 = 0.03, and L = 255. In this experiment, C′_3 = 1 × 10^{−4} was introduced to ensure numerical stability.
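The following sketch evaluates Equation (17) with global (whole-image) statistics on uint8 grayscale inputs; standard SSIM uses local sliding windows, so this single-window form, together with the Canny thresholds, is a simplification adopted here only for illustration.

import cv2
import numpy as np

def similarity_score(img_o, img_i, lam3=0.5, K1=0.01, K2=0.03, L=255, C3p=1e-4):
    # img_o, img_i: uint8 grayscale images of identical size.
    o = img_o.astype(np.float64)
    i = img_i.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2

    mu_o, mu_i = o.mean(), i.mean()
    sd_o, sd_i = o.std(), i.std()
    cov = ((o - mu_o) * (i - mu_i)).mean()

    l = (2 * mu_o * mu_i + C1) / (mu_o ** 2 + mu_i ** 2 + C1)   # luminance
    c = (2 * sd_o * sd_i + C2) / (sd_o ** 2 + sd_i ** 2 + C2)   # contrast
    s = (cov + C3) / (sd_o * sd_i + C3)                          # structure

    # Edge similarity e(I'_o, I'_i): same structural form on Canny edge maps.
    eo = cv2.Canny(img_o, 50, 150).astype(np.float64)
    ei = cv2.Canny(img_i, 50, 150).astype(np.float64)
    cov_e = ((eo - eo.mean()) * (ei - ei.mean())).mean()
    e = (cov_e + C3p) / (eo.std() * ei.std() + C3p)

    return l * c * (lam3 * s + (1 - lam3) * e)   # Eq. (17) with alpha = beta = gamma = 1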

2.5. Composition Optimization Adjustment

The optimized image I_o^* is the one that maximizes the function A(I_o), as calculated by Equation (18):
$I_o^{*} = \arg\max_{I_o} A(I_o)$
The solution is found through the following steps:
  • Image rotation. Any angle of tilt can disrupt the balance of the image. After detecting the main semantic lines, the image needs to be adjusted to the correct orientation. Rotation can result in the loss of some semantic information. To address this, a content-aware rotation [23] method was adopted, which maintains the aspect ratio and preserves the semantic and structural information of the original image to the greatest extent. This step is only applied to images with linear or symmetrical compositions.
  • Determine the optimal position P c of the main semantic line or salient object for compositional aesthetic evaluation A c ( I ) . Based on the semantic image I i of the original image, identify the main semantic line or salient object, and use the image composition aesthetic evaluation equation A c ( I ) to maximize this function to find the optimal position of the main semantic line or salient object P c , disregarding similarity to the original image. Practically, the optimal position is the closest line of thirds or center line to the main semantic line or salient object of the image.
  • Determine the adjustment position of the main semantic line or salient object for this round. In the semantic image I_i of the original image, incrementally adjust the main semantic line or salient object from its original position P_s, which is the position with the highest similarity, toward the optimal position P_c determined in step (2).
  • Adjust the main semantic line or salient object in the semantic image. The Seam Carving method [24] is employed in this paper, selectively removing or inserting connected paths of pixels (seams) without altering the image size while preserving important content, to move the position of the main semantic line or salient object, resulting in the adjusted semantic image I_o. Compared with traditional image cropping and scaling techniques, Seam Carving more effectively preserves image information without reducing image resolution.
  • Compositional aesthetic evaluation based on IACS. Calculate the aesthetics score A ( I ) for the semantic image after adjusting the position of the main semantic line or salient object using the IACS aesthetic evaluation method.
  • If the main semantic line or salient object has not yet reached the optimal position P_c determined in step (2), return to step (3); otherwise, proceed to step (7).
  • Among all the semantic images adjusted during the process, the one with the highest IACS evaluation A ( I ) is the final result of the composition optimization adjustment, with the position of the semantic line or salient object marked as P o .
During the adjustment process, the aesthetic evaluation result A c ( I ) continuously improves, while the similarity score to the original image A s ( I ) gradually decreases. The position where the combined IACS aesthetic evaluation A ( I ) reaches its maximum is the optimal position for compositional optimization adjustment of the main semantic line or salient object in the input image. At this point, the image was optimized according to specific compositional rules while retaining as much of the original image’s information as possible.
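The iterative adjustment described above can be sketched as a simple greedy loop. The hooks evaluate_ac, evaluate_as, and shift_element (e.g., a seam-carving based shift) are hypothetical placeholders for the components defined earlier; the step size is likewise an assumption.

def optimize_composition(seg_img, current_pos, target_pos, evaluate_ac,
                         evaluate_as, shift_element, step=5, lam1=0.5):
    # Start from the segmented image at its original position P_s and move the
    # main semantic line or salient object toward the optimal position P_c,
    # keeping the intermediate image with the highest IACS score A(I_o).
    best_img = seg_img
    best_score = lam1 * evaluate_ac(seg_img) + (1 - lam1) * evaluate_as(seg_img, seg_img)
    img, pos = seg_img, current_pos
    while abs(pos - target_pos) > step:
        pos += step if target_pos > pos else -step
        img = shift_element(img, pos)            # seam-carving style move
        score = lam1 * evaluate_ac(img) + (1 - lam1) * evaluate_as(img, seg_img)
        if score > best_score:
            best_img, best_score = img, score
    return best_img, best_score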

2.6. The Process of FIACSO

By following the steps outlined above, the process of the framework of image aesthetic evaluation with composition and similarity-based optimization (FIACSO) is detailed in Algorithm 2.
Algorithm 2. The process of the framework of image aesthetic evaluation with composition and similarity-based optimization (FIACSO)
Input: X_i
Output: Y_r
function CompositionClassification(X_i)
    Classify the composition category of input image X_i
    Obtain composition category label C
    return C
function SemanticSegmentation(X_i)
    Perform semantic segmentation on input image X_i
    Obtain segmented result image I_i
    return I_i
function CompositionOptimizationAdjustment(I_i, C)
    I_o ← I_i
    λ_1 ∈ [0, 1]: controls the influence of composition aesthetics and image similarity
    A_max ← −∞
    improvement ← true
    while improvement do
        A_c ← EvaluateComposition(I_o, C)
        A_s ← EvaluateSimilarity(I_o, I_i)
        A(I_o) ← λ_1 × A_c + (1 − λ_1) × A_s
        if A(I_o) > A_max then
            A_max ← A(I_o)
            I_o ← ApplyOptimizationAdjustment(I_o)
            improvement ← true
        else
            improvement ← false
        end if
    end while
    return I_o
end function
function GenerationOfOptimizedImage(I_o, X_i)
    Combine adjusted image I_o with input image X_i to generate optimized output image Y_r
    return Y_r
end function
Main:
C ← CompositionClassification(X_i)
I_i ← SemanticSegmentation(X_i)
I_o ← CompositionOptimizationAdjustment(I_i, C)
Y_r ← GenerationOfOptimizedImage(I_o, X_i)
  • Composition category determination: First, the input image X i undergoes a determination of its composition category C . This step involves analyzing the image’s features and attributes, classifying it into a specific composition type, and obtaining the corresponding composition category label C . This process is implemented by the function CompositionClassification .
  • Semantic segmentation: Next, a trained semantic segmentation network is used to segment the input image X i , resulting in a segmented image that contains semantic and compositional information. This process is implemented by the function SemanticSegmentation .
  • Compositional optimization adjustment: In the compositional optimization adjustment stage, the algorithm incrementally adjusts the image’s composition to enhance the aesthetic score while maintaining image similarity. This process is implemented by the function CompositionOptimizationAdjustment. Initially, the segmented image I_i is used as the initial optimized image I_o, and a parameter λ_1 is set to control the influence of compositional aesthetics and image similarity. Then, through an iterative process, the algorithm evaluates the adjusted image I_o’s compositional aesthetic score A_c and its similarity assessment A_s with respect to I_i, decides whether to accept the adjustment, and continues making compositional adjustments under the premise of preserving the image content until the optimal compositional effect is achieved.
  • Generation of the optimized image: Finally, by combining the original image X i and the adjusted image I o , the final optimized output image Y r is produced. This process is completed by the function GenerationOfOptimizedImage .
The entire algorithm framework considers both compositional aesthetics and image similarity, optimizing the image composition under the premise of maintaining content integrity, effectively enhancing the overall aesthetic quality of the image.
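The four stages above can be tied together in a minimal driver; the callables passed in are placeholders for the composition classifier (Swin-Base), the semantic segmentation network, the IACS-driven optimization of Section 2.5, and the DeepSIM-style GAN, and are not the paper’s actual interfaces.

def fiacso_pipeline(x_i, classify, segment, optimize, generate):
    # End-to-end FIACSO flow corresponding to Algorithm 2.
    c = classify(x_i)          # composition category C
    i_i = segment(x_i)         # semantic segmentation result I_i
    i_o = optimize(i_i, c)     # adjusted segmentation I_o (IACS-guided)
    y_r = generate(i_o, x_i)   # GAN-rendered optimized output Y_r
    return y_r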

3. Experiments

3.1. Experimental Software and Hardware Configurations

The training and testing of the network models in this paper were conducted using the PyTorch deep learning framework. The specific experimental hardware and software configuration is shown in Table 1.

3.2. Experimental Datasets

Two fundamental datasets, namely the KU_PCP dataset [25] and the ImageNet dataset [26] were utilized in this paper. The KU_PCP dataset was employed to conduct experiments to determine the network’s selection of composition categories. To mitigate overfitting risks associated with training on a small dataset, the network was pretrained on the large-scale ImageNet dataset for image classification. Subsequently, fine-tuning took place on the KU_PCP dataset to refine composition-related elements.
(1)
KU_PCP dataset
The KU_PCP dataset consists of images sourced from social sharing platforms and encompasses 4244 landscape images that adhere to diverse composition rules. The composition types are annotated by 18 annotators, with each image being assigned a corresponding label indicating its composition type. For the purpose of the network’s composition category determination, this paper employs images representing four composition methods: Linear, Symmetric, RoT, and Center.
(2)
ImageNet dataset
The ImageNet dataset is a comprehensive visual recognition dataset, containing over 14 million images. Each image is associated with a class label encompassing more than 20,000 categories, spanning various domains including animals and transportation. Widely employed in the field of computer vision, particularly in tasks like image classification and salient object detection, this dataset plays a crucial role.

Data Preprocessing and Augmentation

To ensure consistency and quality of the input data, all images were first resized to 256 × 256 pixels. When necessary, symmetric padding was applied to preserve the original aspect ratio, and pixel values were normalized to the range [0, 1] to accelerate convergence and improve training stability. These steps ensured that the input data were standardized and suitable for training deep neural networks.
In addition to preprocessing, data augmentation was applied to enhance generalization and alleviate the imbalance of composition categories in the KU_PCP dataset. The augmentation strategies included random rotations within ±10 degrees, horizontal flipping, and slight variations in brightness, contrast, and saturation, as illustrated in Figure 6.
These operations effectively expanded the dataset and introduced variations resembling real-world conditions including handheld camera skew or illumination changes. After data augmentation, the scale of the dataset is approximately three times that of the original, and the distribution of the four composition categories becomes more balanced, as presented in Table 2.
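An illustrative torchvision pipeline matching the preprocessing and augmentation described above is shown below; the jitter magnitudes and flip probability are assumed values, as the paper specifies only the ±10 degree rotation range.

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),                 # fixed input size
    transforms.RandomRotation(degrees=10),         # random rotation within +/-10 degrees
    transforms.RandomHorizontalFlip(p=0.5),        # horizontal flipping
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1),
    transforms.ToTensor(),                         # scales pixel values to [0, 1]
])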

3.3. Determining the Network’s Selection of Composition Categories

This section evaluates the suitability of Swin-Base for categorizing image compositions, using the augmented KU_PCP dataset to compare well-known CNN and transformer-based models across the four composition types: linear, symmetrical, RoT, and central. The CNN models included ResNet [27], ResNeXt [28], MobileNet [29], and EfficientNet [30]; the transformer models included ViT [31], Swin Transformer, and MobileViT [32]. Each model was fine-tuned with the final classification layer replaced to adapt to the four composition categories.
To ensure robust training and minimize overfitting, we employed a systematic parameter tuning process. The AdamW optimizer was used with an initial learning rate of 1 × 10−4, and a cosine annealing learning rate schedule gradually reduced the learning rate as training progressed. A batch size of 32 was chosen after empirical comparison, providing a good balance between convergence stability and computational efficiency. Early stopping with a patience of 15 epochs was introduced to prevent overfitting, while maintaining sufficient opportunity for convergence. All models were pretrained on ImageNet, and comparative results between CNN and Transformer models are shown in Table 3.
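A minimal PyTorch sketch of this fine-tuning setup is given below; torchvision’s swin_b is used here as a stand-in for the Swin-Base backbone, and the weight-decay and T_max values are assumptions, since the paper only states the optimizer, initial learning rate, schedule, and batch size.

import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained Swin-Base with a 4-way composition classification head
# (Linear, Symmetric, RoT, Center).
model = models.swin_b(weights=models.Swin_B_Weights.IMAGENET1K_V1)
model.head = nn.Linear(model.head.in_features, 4)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
criterion = nn.CrossEntropyLoss()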
From Table 3, it is evident that Transformer models generally outperform CNN models in terms of cross-validation accuracy, with CNN models averaging 78.96% and Transformer models achieving 80.80%. Notably, the Swin-Base excelled, achieving a cross-validation accuracy of 86.96%. Consequently, Swin-Base proves to be particularly effective for classifying image compositions.

3.4. Parameter Sensitivity Analysis

To address potential concerns regarding the arbitrariness in selecting key parameters λ 1 and λ 2 , as well as to validate the rationality of these two key parameters, a comprehensive sensitivity analysis was conducted on the augmented KU_PCP dataset. This section elaborates on the experimental design, presents the corresponding analysis results, and systematically clarifies the process for determining the optimal values of two core parameters: λ 1 is responsible for balancing composition aesthetics and image similarity, while λ 2 focuses on balancing distance and position in the context of salient-object-based aesthetic assessment.

3.4.1. Sensitivity of λ 1

The parameter λ 1 ∈ [0, 1] regulates the trade-off between composition aesthetics A c and image similarity A s in the IACS comprehensive evaluation. In the experiment, five representative values of λ 1 —0.1, 0.3, 0.5, 0.7, and 0.9—were tested to cover low, medium, and high weights. A total of 100 images were randomly selected from the augmented KU_PCP dataset, with 25 images drawn from each composition category: Linear, Symmetric, RoT, and Center, thereby ensuring consistency with the main experiments. Evaluation metrics comprised objective indicators, including the average values of A c , A s , and the overall IACS score A, as well as subjective validation. For the latter, 10 volunteers, five with photography expertise and five without, assessed the images by providing binary ratings: a score of 1 indicated preference and a score of 0 indicated non-preference, reflecting the perceived balance between aesthetic quality and content.
Table 4 summarizes the results. λ 1 = 0.5 achieved the highest average A (0.80 ± 0.04) and subjective preference rate (82% ± 4%): lower λ 1 values (0.1, 0.3) prioritized similarity ( A s ≥ 0.85) but failed to enhance aesthetics ( A c ≤ 0.63), while higher λ 1 values (0.7, 0.9) maximized A c (≥0.91) but caused severe similarity loss ( A s ≤ 0.58) and visual distortion. Thus, λ 1 = 0.5 is confirmed as optimal for balancing aesthetics and similarity.

3.4.2. Sensitivity of λ 2

The parameter λ_2 ∈ [0, 1] balances two components of A_c for RoT and Center compositions, where A_dis denotes the distance from the salient object’s center to the nearest rule intersection and A_pos represents the alignment of the salient object’s central axis with the rule lines. To isolate the impact of A_dis and A_pos, five values of λ_2 were tested: 0, 0.25, 0.33 (i.e., one third), 0.5, and 0.67. A total of 100 images with clearly identifiable salient objects were selected from the augmented KU_PCP dataset, consisting of 50 RoT images and 50 Center images. Evaluation metrics comprised objective indicators, including the average values of A_c and A, together with subjective validation, in which volunteers assessed aesthetic quality on a five-point scale, with 1 indicating poor and 5 indicating excellent.
Table 5 presents the results. λ 2   = 0.33 (1/3) achieved the highest average A c (0.83 ± 0.07) and subjective aesthetic score (4.2 ± 0.2): λ 2   = 0 (only A d i s considered) led to low A c (0.58 ± 0.09) due to axis misalignment with rule lines, while higher λ 2 values (0.5, 0.67) overemphasized A p o s and neglected A d i s , causing the salient object’s center to deviate from key intersections. Thus, λ 2   = 1/3 is verified as optimal for prioritizing rule alignment while retaining distance constraints.

3.5. Composition Optimization Results

3.5.1. Composition Optimization Results Based on Main Semantic Line

This section presents the composition optimization results based on the main semantic line, as shown in Table 6. The first three rows display the optimization results for linear compositions, and the following three rows for symmetric compositions. The first column shows the original images, and the second column shows the images after semantic segmentation. The third column shows the outputs of the composition category determination network, listing the confidence scores for Linear, Symmetric, RoT, and Center from top to bottom. The fourth column presents the optimized segmentation results, and the fifth column displays the images optimized by the generative adversarial network.
Table 7 details the changes in IACS evaluation during the image optimization process as shown in Table 6. The first column displays the position of the main semantic line without any composition optimization P s , which is the image’s initial state. The second column shows the optimal position of the main semantic line P c , as determined by maximizing the compositional aesthetic evaluation function A c ( I ) , which adjusts the main semantic line to the rule-of-thirds or the center line. The third column presents the final optimized position of the main semantic line P o , taking into account both compositional aesthetics and image similarity.
Regarding the evaluation metrics, A c represents the compositional aesthetic evaluation, measuring the aesthetic quality of the image composition; A s represents the similarity evaluation, assessing how similar the optimized image is to the original image; and A as a comprehensive evaluation, is a weighted average of A c and A s , providing a comprehensive score that considers both compositional aesthetics and similarity.
According to Table 6, the optimization process brings the main semantic lines closer to the rule-of-thirds or center line, resulting in a more harmonious overall composition. This optimization utilizes a semantic segmentation network that can precisely identify and divide different semantic regions within the image, and the network generates corresponding semantic segmentation result images. The composition category determination network outputs the category of the image’s composition, providing essential references for subsequent compositional optimization. The generative adversarial network outputs the optimized result images. The optimized segmentation results were further processed using DeepSIM to generate the final optimized images. To illustrate the necessity of this step, we conducted a comparative analysis of the images before and after GAN processing. As presented in Table 6, the “Optimized segmentation result” in the fourth column has already improved the compositional arrangement, while the “Optimized result” generated by the GAN in the fifth column further enhances visual realism by smoothing boundaries and preserving fine texture details. This indicates that the GAN not only retains the optimized composition but also enhances perceptual quality, resulting in outputs that are more natural and aesthetically consistent.
According to Table 7, it can be observed that the initial aesthetic evaluation of the composition A c under the initial state P s does not reach a high level, indicating that there is room for improvement in the image composition. For example, the A c of the image in the first row in the   P s state is only 0.38, but after optimization, it rises to 0.85.
In the second column, when the main semantic line is moved to the optimal composition position P_c, the aesthetic evaluation A_c is enhanced. For example, the image in the first row reaches an A_c of 1 at the P_c position because the main semantic line is moved precisely to the optimal location. However, such adjustments might reduce the similarity to the original content, causing the similarity evaluation A_s to drop, as seen in the first row where A_s decreases to 0.39.
In the third column, when the main semantic line is moved to its final optimized position P o after adjustments, the comprehensive evaluation A is higher than at the original position P s and optimal composition position P c . This indicates that the compositional optimization indeed helps enhance the overall aesthetic quality of the image. For example, the comprehensive evaluation A of the image in the first row at the P o position is 0.80, exceeding the P c position’s 0.69 and the P s position’s 0.70, demonstrating the positive effects of compositional optimization.

3.5.2. Composition Optimization Results Based on Salient Object

This section presents the composition optimization results based on the salient object, as shown in Table 8. The first three rows display the optimization results for RoT compositions, and the following three rows for center compositions. The first column shows the original images, and the second column shows the images after semantic segmentation. The third column shows the outputs of the composition category determination network, listing the confidence scores for Linear, Symmetric, RoT, and Center from top to bottom. The fourth column presents the optimized segmentation results, and the fifth column displays the images optimized by the generative adversarial network.
Table 9 details the changes in IACS evaluation during the image optimization process as shown in Table 8. The first column displays the position of the salient object without any composition optimization P s , which is the image’s initial state. The second column shows the optimal position of the salient object P c , as determined by maximizing the compositional aesthetic evaluation function A c ( I ) , which adjusts the salient object to the rule-of-thirds or the center line. The third column presents the final optimized position of the salient object P o , taking into account both compositional aesthetics and image similarity.
Regarding the evaluation metrics, A c represents the compositional aesthetic evaluation, measuring the aesthetic quality of the image composition; A s represents the similarity evaluation, assessing how similar the optimized image is to the original image; and A as a comprehensive evaluation, is a weighted average of A c and A s , providing a comprehensive score that considers both compositional aesthetics and similarity.
According to Table 8, the optimization process likewise brings the salient object closer to the rule-of-thirds or center line, resulting in a more harmonious overall composition. As before, the semantic segmentation network identifies and separates the semantic regions of the image, the composition category determination network outputs the composition category used to guide the optimization, and the generative adversarial network outputs the optimized result images.
According to Table 9, the initial compositional aesthetic evaluation A_c in the initial state P_s again does not reach a high level, indicating room for improvement in the image composition. For example, the A_c of the image in the first row is only 0.40 in the P_s state but rises to 0.87 after optimization.
In the second column, when the salient object is moved to the optimal composition position P_c, the aesthetic evaluation A_c is enhanced. For example, the image in the first row reaches an A_c of 1 at the P_c position because the salient object is moved precisely to the optimal location. However, such adjustments might reduce the similarity to the original content, causing the similarity evaluation A_s to drop; in the first row, A_s decreases to 0.48.
In the third column, when the salient object is moved to its final optimized position P_o after adjustment, the comprehensive evaluation A is higher than at both the original position P_s and the optimal composition position P_c. This indicates that compositional optimization indeed enhances the overall aesthetic quality of the image. For example, the comprehensive evaluation A of the image in the first row at the P_o position is 0.80, exceeding the P_s position’s 0.70 and the P_c position’s 0.74, demonstrating the positive effect of compositional optimization.
This leads to a crucial insight: even when the main semantic line or salient object is repositioned exactly to the compositionally optimal location, the comprehensive evaluation still falls short of the score obtained when composition and similarity are balanced. This finding emphasizes the importance of striking a balance between compositional aesthetics and similarity during optimization to attain the best visual outcome.
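The trade-off can be illustrated with a toy one-dimensional example. Assuming quadratic falloffs for the two terms (an assumption made purely for illustration, not the paper’s measured curves), the maximizer of the combined score lies strictly between the original position and the compositionally ideal position:

```python
import numpy as np

# Toy 1-D illustration with assumed quadratic penalties (not the paper's curves).
x_orig, x_target = 0.20, 1.0 / 3.0         # original and ideal (rule-of-thirds) positions
xs = np.linspace(x_orig, x_target, 201)    # candidate positions between them

a_c = 1.0 - ((xs - x_target) / 0.50) ** 2  # composition term: peaks at x_target
a_s = 1.0 - ((xs - x_orig) / 0.25) ** 2    # similarity term: peaks at x_orig, decays faster
a = 0.5 * a_c + 0.5 * a_s                  # comprehensive score with lambda_1 = 0.5

x_best = xs[np.argmax(a)]
print(f"best position {x_best:.3f} lies between {x_orig:.3f} and {x_target:.3f}")
```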

3.6. Comparative Experiments

To validate the effectiveness of the proposed method, we compared it with several previously released approaches for which source code or executables are publicly available: SVM [33], ACS [34], CAGIC [35], CGS [36], and GAIC-E [37], and the comparative results are shown in Figure 7.
Figure 7 displays the optimized results of all methods for the four test images, with the original images as the baseline. SVM enhances composition via simple rule-based stretching but fails to protect texture integrity; for example, in the third image of Figure 7 it introduces distortion in texture-rich regions. It also requires manual interaction to reposition salient objects, which limits its practicality.
Cropping-based methods ACS, CAGIC, CGS, and GAIC-E adjust composition by cropping, yet they share common limitations. First, background context is lost; for instance, in the first image of Figure 7, the details of distant scenes are removed. Second, element ratios become imbalanced; in the second image of Figure 7, the horizontal range is narrowed, which impairs visual harmony.
FIACSO (Ours) utilizes Seam Carving and GAN-based refinement (DeepSIM). Seam Carving avoids high-energy content regions, and GAN-based refinement preserves fine details, allowing FIACSO to retain complete salient objects and contextual details while aligning with composition rules. For instance, in the second image of Figure 7, FIACSO optimizes the linear semantic line to the rule of thirds without distorting the landscape’s horizontal balance. In the fourth image of Figure 7, it preserves background context while enhancing the salient object’s positioning.
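As a reference for how content-aware width adjustment can avoid detail-rich regions, the following is a minimal sketch of classic seam carving (gradient-magnitude energy plus a dynamic-programming seam search). It is a textbook version of the algorithm and is not the exact energy function or implementation used in FIACSO:

```python
import numpy as np

def energy_map(gray):
    """Gradient-magnitude energy: seams avoid high-energy (detail-rich) pixels."""
    gy, gx = np.gradient(gray.astype(float))
    return np.abs(gx) + np.abs(gy)

def find_vertical_seam(energy):
    """Dynamic programming: minimal-cost, 8-connected, top-to-bottom seam."""
    h, w = energy.shape
    cost = energy.copy()
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        up = cost[i - 1]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, up), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

def remove_vertical_seam(img, seam):
    """Delete one pixel per row along the seam, shrinking the width by one."""
    h, w = img.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1, *img.shape[2:])
```

Repeatedly removing (or, for expansion, duplicating) such seams shifts the relative position of semantic lines and salient objects while leaving high-energy content largely untouched.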

Subjective Evaluation Across Multiple Optimization Methods

To provide a rigorous comparison between FIACSO and established optimization techniques, we conducted a subjective evaluation experiment involving six methods: SVM, ACS, CAGIC, CGS, GAIC-E, and FIACSO. A total of 100 original images were randomly sampled from the KU_PCP dataset, with 25 images selected from each of the four composition categories, namely Linear, Symmetric, RoT, and Center. Each image was optimized by the six methods, producing 600 processed images in total.
Thirty participants took part in the evaluation, including 15 individuals with professional backgrounds in photography or design and 15 individuals without such training. All participants were asked to assess each image along three dimensions of aesthetic evaluation: overall aesthetic appeal, compositional harmony, and content integrity. Ratings were collected on a five-point Likert scale, where a score of one denoted very poor quality and a score of five denoted excellent quality.
Before conducting the main analysis, normality tests were applied to the three variables. Both the Shapiro–Wilk and Kolmogorov–Smirnov tests indicated significant departures from a normal distribution, with all p values less than 0.001, as shown in Table 10. This finding confirmed the need for non-parametric analysis. The descriptive statistics also revealed that the overall mean scores across methods were around 3.25, suggesting a moderate baseline aesthetic evaluation when no distinction among methods was made.
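For reproducibility, the normality checks can be run with standard SciPy routines, as in the sketch below. The ratings array here is a random placeholder rather than the collected data, and the Shapiro–Wilk test is applied to a 5000-sample subset because SciPy documents its p-value approximation as accurate only up to about 5000 samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder Likert ratings standing in for the 18,000 collected scores.
ratings = rng.integers(1, 6, size=18_000).astype(float)

sw_stat, sw_p = stats.shapiro(ratings[:5000])                # Shapiro–Wilk on a subset
ks_stat, ks_p = stats.kstest(stats.zscore(ratings), "norm")  # Kolmogorov–Smirnov vs. N(0, 1)
print(f"Shapiro–Wilk p = {sw_p:.3g}, Kolmogorov–Smirnov p = {ks_p:.3g}")
```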
The Kruskal–Wallis test was applied to evaluate whether significant differences existed among the six methods. Table 11 provides the detailed results. All three evaluation dimensions showed statistically significant group differences with chi-square statistics above 5700 and p values less than 0.001. Importantly, the median values indicate that FIACSO achieved higher subjective ratings than the other methods. For overall aesthetic appeal, methods 1 to 5 all yielded a median score of 3, whereas FIACSO reached a median of 5. For compositional harmony and content integrity, methods 1 to 5 also remained at a median of 3, while FIACSO obtained a median of 4. Moreover, the within-group dispersion of FIACSO was consistently lower, with a standard deviation of 0.50 compared to approximately 0.81 for the competing approaches, reflecting more stable ratings across participants.
To further explore where the differences lay, post hoc pairwise comparisons were conducted using the Mann–Whitney U test. Table 12 reports the results for comparisons between FIACSO and each of the other five methods. In all cases, FIACSO obtained significantly higher scores with p values less than 0.001. The effect sizes were larger than 2.0 according to Cohen’s d , representing very large differences in practical terms. By contrast, comparisons among the five conventional methods did not reveal consistent or substantial differences, and their effect sizes were negligible.
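The omnibus and post hoc tests described above map directly onto SciPy calls. The sketch below uses randomly generated placeholder scores (not the study data) and one common pooled-standard-deviation convention for Cohen’s d:

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (one common convention)."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                     / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

rng = np.random.default_rng(1)
# Placeholder ratings for six methods (3000 per method), not the study's data.
methods = ["SVM", "ACS", "CAGIC", "CGS", "GAIC-E"]
groups = {m: rng.integers(2, 5, size=3000).astype(float) for m in methods}
groups["FIACSO"] = rng.integers(4, 6, size=3000).astype(float)

h, p = stats.kruskal(*groups.values())                  # omnibus Kruskal–Wallis test
print(f"Kruskal–Wallis H = {h:.1f}, p = {p:.3g}")

for m in methods:                                       # post hoc pairwise comparisons
    u, p = stats.mannwhitneyu(groups["FIACSO"], groups[m], alternative="two-sided")
    d = cohens_d(groups["FIACSO"], groups[m])
    print(f"FIACSO vs {m}: U = {u:.0f}, p = {p:.3g}, d = {d:.2f}")
```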
This multi-method evaluation provides robust evidence that FIACSO consistently outperforms existing optimization approaches in terms of subjective image aesthetics. The combination of higher median scores, reduced rating variability, and very large effect sizes across all three dimensions demonstrates that FIACSO not only achieves statistical significance but also delivers practically meaningful improvements. The findings highlight its effectiveness in generating visually appealing images while maintaining compositional balance and content integrity, thereby aligning computational optimization with human perceptual judgment.

4. Limitations

While the proposed framework demonstrates effectiveness in aesthetics-guided image composition optimization, it still has several limitations. First, the quantitative metrics used in this work, including IACS and SSIM, approximate human aesthetic judgment but cannot fully reflect subjective preferences shaped by personal and cultural differences. Second, the current method is primarily validated on landscape images with clear semantic structures, and its generalizability to other types of imagery has not yet been verified. Third, for severely tilted images (angles beyond roughly 30 degrees), the content-aware rotation can straighten dominant semantic lines without distorting local textures, but the subsequent GAN-based generation stage loses texture. This is particularly noticeable in natural elements such as dunes, where fine-grained details, including subtle undulations and grainy surface patterns, are partially or completely lost, producing a smoother and less realistic result, as illustrated in Figure 8.

5. Conclusions

In this paper, a method called IACS is proposed, which integrates image composition and similarity. This method uses a unified function to balance the aesthetics of composition and image similarity, and adjusts their influence through parameters. In evaluating the composition aesthetics, it considers the distances between the main semantic line or salient object and the nearest rule-of-thirds line or central line. For images featuring prominent semantic lines, a modified Hough transform is utilized to detect the main semantic line. Similarly, for images containing salient objects, a salient object detection method based on LCSF is applied to determine the salient object region. In evaluating similarity to the original image, edge similarity (measured by the Canny operator) is combined with the SSIM for calculation. Furthermore, a framework of image aesthetic evaluation with composition and similarity-based optimization (FIACSO) is proposed. After categorizing the composition of an image, the framework uses a semantic segmentation network to segment the image (and thus obtain composition information) and applies the IACS method for optimization. Ultimately, it uses a GAN to generate an optimized image that adheres to compositional rules and closely resembles the original image.
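For readers who wish to prototype the line-detection step, the sketch below uses OpenCV’s standard probabilistic Hough transform and simply keeps the longest detected segment as the main semantic line; the modified Hough transform used in this paper, and its parameter settings, are not reproduced here, and the thresholds shown are assumptions for illustration only.

```python
import cv2
import numpy as np

def detect_main_line(img_bgr):
    """Illustrative dominant-line detection with the unmodified probabilistic
    Hough transform; thresholds below are assumptions, not the paper's values."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=img_bgr.shape[1] // 3, maxLineGap=10)
    if lines is None:
        return None
    # Keep the longest segment (x1, y1, x2, y2) as the main semantic line.
    return max(lines[:, 0, :], key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
```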
Experimental and comparative results show that FIACSO enhances image composition, minimizes visual distortion, and preserves the original content to a great extent, thereby improving image processing and elevating the image’s aesthetic value. It exhibits high accuracy and strong generalization capabilities in image composition optimization. Furthermore, subjective evaluations involving human participants demonstrate that FIACSO significantly outperforms existing methods in terms of overall aesthetics, compositional harmony, and content integrity, validating its effectiveness from a human-centric perspective.
Beyond empirical performance, the study makes a theoretical contribution by grounding the optimization process in Gestalt visual psychology. By enhancing figure–ground separation, maintaining structural continuity, and aligning with the principles of balance and proximity, the framework generates results that align with human perceptual mechanisms. This alignment between computational modeling and psychological theory highlights the robustness and interpretability of the method.
Equally important is the practical value of the framework. In professional contexts, it provides a reliable tool for photographers, designers, and visual curators to refine large-scale image collections with high efficiency and consistency. In digital platforms and online environments, it has the potential to be deployed in content creation, sharing, and recommendation systems, where automated aesthetic assessment and real-time optimization can improve both user experience and visual communication quality.
Nevertheless, the current focus on conventional compositional rules limits its scope. Future research will broaden the optimization strategy by incorporating multimodal visual features—including color harmony, texture, and lighting—as well as semantic dimensions such as object categories and scene context. Additionally, the proposed framework’s quantitative metrics cannot fully capture subjective aesthetic preferences shaped by personal and cultural factors; moreover, it may lose fine-grained texture details when optimizing severely tilted images. These are limitations to be addressed in subsequent work.

Author Contributions

Conceptualization, X.C., G.T., G.W., L.M. and S.Z.; methodology, X.C., G.T., G.W., L.M. and S.Z.; software, X.C.; validation, X.C.; formal analysis, X.C., S.Z. and L.M.; investigation, X.C., G.T., G.W., L.M. and S.Z.; data curation, X.C.; writing—original draft, X.C.; resources, G.T. and L.M.; writing—review and editing, G.T.; funding acquisition, L.M. and S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Key Research and Development Program of Zhejiang Province (Grant number: 2021C02005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code for our proposed framework FIACSO and dataset used in the experiments can be found on GitHub: https://github.com/zafucslab/FIACSO (accessed on 17 June 2024).

Acknowledgments

Thanks to the Natural Science Foundation of Fujian Province (Grant numbers: 2023J011807, 2023J05309) for its support of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, Y.-J.; Luo, X.; Xuan, Y.-M.; Chen, W.-F.; Fu, X.-L. Image retargeting quality assessment. Comput. Graph. Forum 2011, 30, 583–592. [Google Scholar] [CrossRef]
  2. Fan, S. Image Aesthetics and Visual Communication Optimization Based on Deep Convolutional Neural Network. Procedia Comput. Sci. 2025, 262, 236–243. [Google Scholar] [CrossRef]
  3. Uchida, T.; Kanamori, Y.; Endo, Y. 3D View Optimization for Improving Image Aesthetics. In Proceedings of the ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: New York, NY, USA, 2025; pp. 1–5. [Google Scholar]
  4. Naderi, M.R.; Givkashi, M.H.; Karimi, N.; Shirani, S.; Samavi, S. Aesthetic-aware image retargeting based on foreground–background separation and PSO optimization. Multimed. Tools Appl. 2024, 83, 34867–34886. [Google Scholar] [CrossRef]
  5. Patnaik, S.; Jain, R.; Krishnamurthy, B.; Sarkar, M. AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 10–17 June 2025; pp. 23701–23711. [Google Scholar]
  6. Alsmirat, M.; Kharsa, R.; Alzoubi, R. Supervised Deep Learning for Ideal Identification of Image Retargeting Techniques. IEEE Access 2024, 12, 190821–190837. [Google Scholar] [CrossRef]
  7. Hong, J.; Yuan, L.; Gharbi, M.; Fisher, M.; Fatahalian, K. Learning subject-aware cropping by outpainting professional photos. Proc. AAAI Conf. Artif. Intell. 2024, 38, 2175–2183. [Google Scholar] [CrossRef]
  8. Shen, F.; Li, C.; Geng, Y.; Deng, Y.; Chen, H. Prune and Repaint: Content-Aware Image Retargeting for any Ratio. arXiv 2024, arXiv:2410.22865. [Google Scholar] [CrossRef]
  9. Sheng, N.; Yang, S.; Liu, H.; Wang, K.; Ke, Y.; Qin, F. Optimizing photographic composition with deep reinforcement learning. Neurocomputing 2025, 640, 130363. [Google Scholar] [CrossRef]
  10. Hussain, I.; Tan, S.; Huang, J. A knowledge distillation based deep learning framework for cropped images detection in spatial domain. Signal Process. Image Commun. 2024, 124, 117117. [Google Scholar] [CrossRef]
  11. Wang, H. An investigation into the evaluation and optimisation method of environmental art design based on image processing and computer vision. Scalable Comput. Pract. Exp. 2025, 26, 277–286. [Google Scholar] [CrossRef]
  12. Gao, B.; Qian, H.; Jiang, L. Visual communication design and color balance algorithm for multimedia image analysis. Comput. Inform. 2024, 43, 900–925. [Google Scholar] [CrossRef]
  13. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  14. Vinker, Y.; Horwitz, E.; Zabari, N.; Hoshen, Y. Deep single image manipulation. In Proceedings of the ICLR 2021 Conference, Vienna, Austria, 3–7 May 2020. [Google Scholar]
  15. Bhattacharya, S.; Sukthankar, R.; Shah, M. A holistic approach to aesthetic enhancement of photographs. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2011, 7, 1–21. [Google Scholar] [CrossRef]
  16. Leyvand, T.; Cohen-Or, D.; Dror, G.; Lischinski, D. Data-driven enhancement of facial attractiveness. ACM Trans. Graph. (TOG) 2008, 27, 1–9. [Google Scholar] [CrossRef]
  17. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  18. Duda, R.O.; Hart, P.E. Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM 1972, 15, 11–15. [Google Scholar] [CrossRef]
  19. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 8, 679–698. [Google Scholar] [CrossRef]
  20. Rother, C.; Kolmogorov, V.; Blake, A. “GrabCut” interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. (TOG) 2004, 23, 309–314. [Google Scholar] [CrossRef]
  21. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  22. Yu, J.; Liang, D.-Q.; Bi, Q.; Bu, Y. Image quality assessment based on structural orientation information. J. Comput. Appl. 2010, 30, 1622. [Google Scholar] [CrossRef]
  23. He, K.; Chang, H.; Sun, J. Content-aware rotation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 553–560. [Google Scholar]
  24. Avidan, S.; Shamir, A. Seam carving for content-aware image resizing. Semin. Graph. Pap. Push. Boundaries 2023, 2, 609–617. [Google Scholar]
  25. Lee, J.-T.; Kim, H.-U.; Lee, C.; Kim, C.-S. Photographic composition classification and dominant geometric element detection for outdoor scenes. J. Vis. Commun. Image Represent. 2018, 55, 91–105. [Google Scholar] [CrossRef]
  26. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; IEEE: New York, NY, USA, 2009; pp. 248–255. [Google Scholar]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  28. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  29. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  30. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  31. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  32. Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
  33. Luo, P. Social image aesthetic classification and optimization algorithm in machine learning. Neural Comput. Appl. 2023, 35, 4283–4293. [Google Scholar] [CrossRef]
  34. Celona, L.; Ciocca, G.; Napoletano, P. A grid anchor based cropping approach exploiting image aesthetics, geometric composition, and semantics. Expert Syst. Appl. 2021, 186, 115852. [Google Scholar] [CrossRef]
  35. Horanyi, N.; Xia, K.; Yi, K.M.; Bojja, A.K.; Leonardis, A.; Chang, H.J. Repurposing existing deep networks for caption and aesthetic-guided image cropping. Pattern Recognit. 2022, 126, 108485. [Google Scholar] [CrossRef]
  36. Li, D.; Zhang, J.; Huang, K.; Yang, M.-H. Composing good shots by exploiting mutual relations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 1–10. [Google Scholar]
  37. Zeng, H.; Li, L.; Cao, Z.; Zhang, L. Grid anchor based image cropping: A new benchmark and an efficient model. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 2020, 44, 1304–1319. [Google Scholar] [CrossRef]
Figure 1. Example of images with semantic lines.
Figure 2. Example of images with salient objects.
Figure 3. Framework of image aesthetic evaluation with composition and similarity-based optimization (FIACSO) (arrows indicate the flow of the optimization process).
Figure 4. Four composition categories: (a) Linear, (b) Symmetric, (c) RoT, (d) Center.
Figure 5. Flowchart for determining compositional categories.
Figure 6. Data augmentation scheme.
Figure 7. Comparison of experimental results of FIACSO with SVM, ACS, CAGIC, CGS and GAIC-E (red boxes highlight the failure areas).
Figure 8. Failure case in severe tilting conditions (red boxes highlight the failure areas).
Table 1. Experimental hardware and software configuration.
Item | Detail
CPU | 13th Gen Intel® Core™ i7-13700KF @ 3.40 GHz
RAM | 32 GB
Operating system | Windows 11 64-bit
CUDA | 11.8
Python | 3.8
PyTorch | 1.10.0
PyCharm | 2025.1
Table 2. The number of images before and after data augmentation.
Composition Category | Before Augmentation | After Augmentation
Linear | 758 | 2274
Symmetric | 232 | 696
RoT | 923 | 2769
Center | 750 | 2250
Table 3. Comparative results between CNN and Transformer.
Model Type | Model Name | Image Size | Params (M) | Mean F1-Score | Top-1 Acc (%)
CNN | ResNet50 | 224 × 224 | 25.56 | 0.735 | 71.89
CNN | ResNext50 | 224 × 224 | 25.03 | 0.684 | 75.42
CNN | MobileNetV2 | 224 × 224 | 3.5 | 0.812 | 65.87
CNN | MobileNetV3 | 224 × 224 | 5.48 | 0.710 | 71.35
CNN | ConvNeXt | 224 × 224 | 87.57 | 0.811 | 83.58
Transformer | ViT-B/16 | 224 × 224 | 103.19 | 0.731 | 74.64
Transformer | ViT-L/16 | 224 × 224 | 326.74 | 0.772 | 78.42
Transformer | Swin-Base | 224 × 224 | 87.76 | 0.867 | 86.96
Transformer | Swin-Large | 224 × 224 | 228.57 | 0.814 | 84.90
Transformer | MobileViT-XXS | 224 × 224 | 1.27 | 0.812 | 80.14
Transformer | MobileViT-XS | 224 × 224 | 2.32 | 0.791 | 79.23
Transformer | MobileViT-S | 224 × 224 | 5.58 | 0.801 | 81.35
Note: Best scores are bolded.
Table 4. Sensitivity Analysis Results for λ1.
λ1 | Average A_c (±SD) | Average A_s (±SD) | Average A (±SD) | Subjective Preference Rate (%)
0.1 | 0.41 ± 0.08 | 0.92 ± 0.05 | 0.67 ± 0.06 | 28 ± 5
0.3 | 0.63 ± 0.07 | 0.85 ± 0.06 | 0.71 ± 0.05 | 55 ± 7
0.5 | 0.82 ± 0.06 | 0.74 ± 0.07 | 0.80 ± 0.04 | 82 ± 4
0.7 | 0.91 ± 0.05 | 0.58 ± 0.08 | 0.77 ± 0.05 | 63 ± 6
0.9 | 0.96 ± 0.04 | 0.40 ± 0.09 | 0.68 ± 0.07 | 35 ± 5
Note: Best scores are bolded.
Table 5. Sensitivity Analysis Results for λ2.
λ2 | Average A_c (±SD) | Average A (±SD) | Average Subjective Aesthetic Score (±SD)
0 | 0.58 ± 0.09 | 0.65 ± 0.07 | 2.8 ± 0.4
0.25 | 0.69 ± 0.08 | 0.73 ± 0.06 | 3.5 ± 0.3
0.33 | 0.83 ± 0.07 | 0.81 ± 0.05 | 4.2 ± 0.2
0.5 | 0.78 ± 0.08 | 0.77 ± 0.06 | 3.8 ± 0.3
0.67 | 0.65 ± 0.09 | 0.70 ± 0.07 | 3.2 ± 0.4
Note: Best scores are bolded.
Table 6. The composition optimization results based on main semantic lines.
Original Image | Semantic Segmentation Result | Classification Result | Optimized Segmentation Result | Optimized Result
(Six rows of example images.)
Note: The “Classification Result” column lists the four composition classes—Linear, Symmetric, RoT, Center—in top-to-bottom order. A “1” denotes the model’s predicted composition class for the input image, and “0” denotes classes that do not match the image’s composition.
Table 7. The IACS evaluation of the image optimization process with main semantic line.
A_c (P_s) | A_s (P_s) | A (P_s) | A_c (P_c) | A_s (P_c) | A (P_c) | A_c (P_o) | A_s (P_o) | A (P_o)
0.38 | 1 | 0.69 | 1 | 0.39 | 0.70 | 0.85 | 0.74 | 0.80
0.40 | 1 | 0.72 | 1 | 0.48 | 0.74 | 0.87 | 0.75 | 0.81
0.37 | 1 | 0.69 | 1 | 0.47 | 0.74 | 0.88 | 0.74 | 0.81
0.52 | 1 | 0.76 | 1 | 0.55 | 0.79 | 0.90 | 0.77 | 0.84
0.54 | 1 | 0.77 | 1 | 0.57 | 0.78 | 0.87 | 0.79 | 0.83
0.55 | 1 | 0.78 | 1 | 0.58 | 0.79 | 0.89 | 0.78 | 0.84
Note: Best scores are bolded.
Table 8. The composition optimization results based on salient object.
Original Image | Segmentation Result | Classification Result | Optimized Segmentation Result | Optimized Result
(Six rows of example images.)
Note: The “Classification Result” column lists the four composition classes—Linear, Symmetric, RoT, Center—in top-to-bottom order. A “1” denotes the model’s predicted composition class for the input image, and “0” denotes classes that do not match the image’s composition.
Table 9. The IACS evaluation of the image optimization process with salient object.
A_c (P_s) | A_s (P_s) | A (P_s) | A_c (P_c) | A_s (P_c) | A (P_c) | A_c (P_o) | A_s (P_o) | A (P_o)
0.40 | 1 | 0.70 | 1 | 0.48 | 0.74 | 0.87 | 0.72 | 0.80
0.42 | 1 | 0.71 | 1 | 0.52 | 0.76 | 0.86 | 0.76 | 0.82
0.49 | 1 | 0.75 | 1 | 0.51 | 0.76 | 0.87 | 0.74 | 0.81
0.50 | 1 | 0.75 | 1 | 0.54 | 0.77 | 0.88 | 0.79 | 0.84
0.48 | 1 | 0.74 | 1 | 0.57 | 0.79 | 0.87 | 0.78 | 0.83
0.46 | 1 | 0.73 | 1 | 0.55 | 0.78 | 0.89 | 0.77 | 0.83
Note: Best scores are bolded.
Table 10. Normality test results for subjective evaluation data.
Variable | N | Mean | SD | Skewness | Kurtosis | Shapiro–Wilk p | Kolmogorov–Smirnov p
Overall aesthetic | 18,000 | 3.256 | 0.956 | 0.055 | –1.098 | 0.000 *** | 0.000 ***
Compositional harmony | 18,000 | 3.246 | 0.952 | 0.068 | –1.089 | 0.000 *** | 0.000 ***
Content integrity | 18,000 | 3.244 | 0.951 | 0.065 | –1.094 | 0.000 *** | 0.000 ***
Note: *** denotes significance levels of 1%.
Table 11. Kruskal–Wallis test results for subjective evaluation across the optimization methods.
Analysis Item | Group | Sample Size | Median | SD | χ2 | p | Cohen’s f
Overall aesthetic | 1 | 3000 | 3 | 0.82 | 5754.524 | 0.000 *** | 0.011
Overall aesthetic | 2 | 3000 | 3 | 0.816
Overall aesthetic | 3 | 3000 | 3 | 0.815
Overall aesthetic | 4 | 3000 | 3 | 0.816
Overall aesthetic | 5 | 3000 | 3 | 0.82
Overall aesthetic | 6 | 3000 | 5 | 0.5
Overall aesthetic | Total | 18,000 | 3 | 0.956
Compositional harmony | 1 | 3000 | 3 | 0.81 | 5744.216 | 0.000 *** | 0.011
Compositional harmony | 2 | 3000 | 3 | 0.82
Compositional harmony | 3 | 3000 | 3 | 0.813
Compositional harmony | 4 | 3000 | 3 | 0.821
Compositional harmony | 5 | 3000 | 3 | 0.805
Compositional harmony | 6 | 3000 | 4 | 0.5
Compositional harmony | Total | 18,000 | 3 | 0.952
Content integrity | 1 | 3000 | 3 | 0.814 | 5705.669 | 0.000 *** | 0.011
Content integrity | 2 | 3000 | 3 | 0.819
Content integrity | 3 | 3000 | 3 | 0.817
Content integrity | 4 | 3000 | 3 | 0.806
Content integrity | 5 | 3000 | 3 | 0.817
Content integrity | 6 | 3000 | 4 | 0.5
Content integrity | Total | 18,000 | 3 | 0.951
Note: *** denotes significance levels of 1%. The χ2, p, and Cohen’s f values are reported once per analysis item.
Table 12. Pairwise Mann–Whitney U test results between the optimization methods.
Group A | Group B | Median A | Median B | U Statistic | p | Cohen’s d
Overall aesthetic_SVM | Overall aesthetic_ACS | 3 | 3 | 4,319,498.5 | 0.009 *** | 0.074
Overall aesthetic_SVM | Overall aesthetic_CAGIC | 3 | 3 | 4,376,036.5 | 0.100 | 0.051
Overall aesthetic_SVM | Overall aesthetic_CGS | 3 | 3 | 4,360,197.5 | 0.054 * | 0.057
Overall aesthetic_SVM | Overall aesthetic_GAIC-E | 3 | 3 | 4,444,223.5 | 0.756 | 0.023
Overall aesthetic_SVM | Overall aesthetic_FIACSO | 3 | 5 | 711,228.5 | 0.000 *** | 2.265
Overall aesthetic_ACS | Overall aesthetic_CAGIC | 3 | 3 | 4,557,187.5 | 0.732 | 0.023
Overall aesthetic_ACS | Overall aesthetic_CGS | 3 | 3 | 4,541,099.5 | 1.032 | 0.017
Overall aesthetic_ACS | Overall aesthetic_GAIC-E | 3 | 3 | 4,624,577.5 | 0.098 * | 0.051
Overall aesthetic_ACS | Overall aesthetic_FIACSO | 3 | 5 | 771,539.5 | 0.000 *** | 2.183
Overall aesthetic_CAGIC | Overall aesthetic_CGS | 3 | 3 | 4,483,933 | 1.599 | 0.007
Overall aesthetic_CAGIC | Overall aesthetic_GAIC-E | 3 | 3 | 4,567,885 | 0.566 | 0.028
Overall aesthetic_CAGIC | Overall aesthetic_FIACSO | 3 | 5 | 747,268 | 0.000 *** | 2.213
Overall aesthetic_CGS | Overall aesthetic_GAIC-E | 3 | 3 | 4,583,788 | 0.371 | 0.034
Overall aesthetic_CGS | Overall aesthetic_FIACSO | 3 | 5 | 754,623 | 0.000 *** | 2.204
Overall aesthetic_GAIC-E | Overall aesthetic_FIACSO | 3 | 5 | 731,087 | 0.000 *** | 2.237
Compositional harmony_SVM | Compositional harmony_ACS | 3 | 3 | 4,449,433.5 | 0.848 | 0.021
Compositional harmony_SVM | Compositional harmony_CAGIC | 3 | 3 | 4,411,494 | 0.323 | 0.036
Compositional harmony_SVM | Compositional harmony_CGS | 3 | 3 | 4,448,442 | 0.830 | 0.021
Compositional harmony_SVM | Compositional harmony_GAIC-E | 3 | 3 | 4,417,972.5 | 0.389 | 0.033
Compositional harmony_SVM | Compositional harmony_FIACSO | 3 | 4 | 716,578.5 | 0.000 *** | 2.258
Compositional harmony_ACS | Compositional harmony_CAGIC | 3 | 3 | 4,463,029.5 | 1.118 | 0.015
Compositional harmony_ACS | Compositional harmony_CGS | 3 | 3 | 4,499,008.5 | 1.975 | 0
Compositional harmony_ACS | Compositional harmony_GAIC-E | 3 | 3 | 4,469,865 | 1.268 | 0.012
Compositional harmony_ACS | Compositional harmony_FIACSO | 3 | 4 | 755,007 | 0.000 *** | 2.212
Compositional harmony_CAGIC | Compositional harmony_CGS | 3 | 3 | 4,535,960 | 1.139 | 0.015
Compositional harmony_CAGIC | Compositional harmony_GAIC-E | 3 | 3 | 4,507,227.5 | 1.818 | 0.003
Compositional harmony_CAGIC | Compositional harmony_FIACSO | 3 | 4 | 754,253.5 | 0.000 *** | 2.209
Compositional harmony_CGS | Compositional harmony_GAIC-E | 3 | 3 | 4,470,882.5 | 1.291 | 0.012
Compositional harmony_CGS | Compositional harmony_FIACSO | 3 | 4 | 755,760.5 | 0.000 *** | 2.211
Compositional harmony_GAIC-E | Compositional harmony_FIACSO | 3 | 4 | 738,430 | 0.000 *** | 2.227
Content integrity_SVM | Content integrity_ACS | 3 | 3 | 4,592,944.5 | 0.283 | 0.038
Content integrity_SVM | Content integrity_CAGIC | 3 | 3 | 4,536,948 | 1.118 | 0.015
Content integrity_SVM | Content integrity_CGS | 3 | 3 | 4,657,947.5 | 0.025 ** | 0.065
Content integrity_SVM | Content integrity_GAIC-E | 3 | 3 | 4,550,940.5 | 0.841 | 0.021
Content integrity_SVM | Content integrity_FIACSO | 3 | 4 | 783,360 | 0.000 *** | 2.177
Content integrity_ACS | Content integrity_CAGIC | 3 | 3 | 4,444,056 | 0.753 | 0.023
Content integrity_ACS | Content integrity_CGS | 3 | 3 | 4,563,176 | 0.636 | 0.026
Content integrity_ACS | Content integrity_GAIC-E | 3 | 3 | 4,458,047 | 1.014 | 0.017
Content integrity_ACS | Content integrity_FIACSO | 3 | 4 | 758,016 | 0.000 *** | 2.212
Content integrity_CAGIC | Content integrity_CGS | 3 | 3 | 4,620,184 | 0.115 | 0.049
Content integrity_CAGIC | Content integrity_GAIC-E | 3 | 3 | 4,513,984 | 1.650 | 0.006
Content integrity_CAGIC | Content integrity_FIACSO | 3 | 4 | 774,144 | 0.000 *** | 2.19
Content integrity_CGS | Content integrity_GAIC-E | 3 | 3 | 4,394,081 | 0.188 | 0.044
Content integrity_CGS | Content integrity_FIACSO | 3 | 4 | 710,400 | 0.000 *** | 2.269
Content integrity_GAIC-E | Content integrity_FIACSO | 3 | 4 | 770,304 | 0.000 *** | 2.195
Note: ***, **, * denote significance levels of 1%, 5%, and 10%, respectively.