1. Introduction
Computer vision is one of the most attractive research areas in artificial intelligence (AI), encompassing tasks such as object tracking, recognition, classification, and scene understanding [1,2]. Modern computer vision systems typically consist of complex neural network models that must process large amounts of high-dimensional data and achieve robust performance across diverse scenarios. This requires careful attention to the mathematical foundations and algorithmic reliability of system design, so that models remain stable and accurate in dynamic and complex visual environments [3]. Current research in this field is generally divided into two areas: improvements to the mathematical algorithms of fundamental models, and the application of mathematical methods to solve specific problems [4,5].
Despite extensive research in computer vision, many fundamental problems rooted in mathematical theory remain open. Key challenges include graph-based computation and geometric deep learning; the landscape of high-dimensional non-convex optimization; the theoretical limits of generalization in deep models; invariance and disentanglement principles in representation learning; the design of suitable loss functions; efficient sample selection strategies; and the mathematical characterization of image and object features [6,7].
To improve the mathematical algorithms of foundational models, Gu et al. [4] proposed the Mamba architecture, based on a selective state-space model, as an alternative to existing architectures such as transformers; this approach aims to overcome the efficiency and performance limitations of existing models in temporal modeling. Nevertheless, this direction still faces challenges such as insufficient computational accuracy and poor robustness, mismatches in feature representation across modalities, limited temporal modeling capability in existing models, and gaps in the current theoretical framework. In the area of mathematical methods for solving specific problems, Wang et al. [5] systematically discussed how AI technologies, including foundational models, can accelerate and transform scientific discovery. As their paper points out, however, this direction still faces challenges such as the difficulty of obtaining data for specific scientific domains, the authenticity of simulated data, the low efficiency and accuracy of traditional methods, and the insufficient training and generalization capabilities of AI models on specific scientific tasks.
In response to these open problems in both areas, this Special Issue brings together eight original research papers showcasing the latest progress in applying mathematics to computer vision problems. On improving the mathematical algorithms of fundamental models, the topics include robust geometric computation methods for Boolean operations on triangulated surfaces; a semi-supervised few-shot learning model based on pseudo-label-guided contrastive learning and local factor clustering; a temporal action localization framework built on bidirectional Mamba models; and a systematic review and analysis of deep learning loss functions. On applying mathematical methods to solve specific problems, the topics include Dongba scripture segmentation and detection using a multi-scale hybrid attention network; weakly supervised specular reflection removal using only specular images; neural network prediction of shale gas well productivity integrated with optimization algorithms; and an adaptive trajectory association method combining the longest common subsequence with classification theory.
2. Contributions
The eight original papers are summarized in detail below, grouped into two areas: improved mathematics in fundamental models, and mathematical methods for specific AI problems.
2.1. Improved Mathematics in Fundamental Models
The following provides an overview of mathematical methods in fundamental AI models:
Zhou et al. (Contribution 1) proposed a simple and robust Boolean operation method specifically for triangulated surfaces. This study focused on eliminating errors caused by floating-point operations. The method relies solely on entity indexing operations when combining the final results, eliminating the need for coordinate calculations, thus ensuring robustness. Specifically, the method consists of two main phases: the first phase employs an octree data structure and a parallel algorithm to efficiently compute the intersection lines of all intersecting triangle pairs; the second phase forms intersection rings, creates subsurfaces, and combines subblocks entirely by cleaning and updating the mesh topology without involving geometric coordinate calculations. Furthermore, the authors proposed a novel entity index-based method to distinguish between unions, differences, and intersections in Boolean operation results, replacing the traditional inside-out classification approach. Overall, the method strikes a good balance between algorithmic efficiency and complexity, and its effectiveness is demonstrated through operational examples on various models, including open surfaces, closed surfaces, and hybrids of the two.
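To make the first phase concrete, the sketch below illustrates broad-phase pruning of triangle pairs with axis-aligned bounding boxes (AABBs), the idea underlying spatial structures such as the octree used by the authors; this is a generic illustration with hypothetical function names, not the paper's implementation.

```python
# Hedged sketch: AABB broad-phase culling of triangle pairs, so that only
# pairs whose boxes overlap are passed to an exact intersection test.

def aabb(tri):
    """Axis-aligned bounding box of a triangle given as three (x, y, z) points."""
    xs, ys, zs = zip(*tri)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def aabb_overlap(tri_a, tri_b):
    """True if the bounding boxes of two triangles overlap on every axis."""
    (a_lo, a_hi), (b_lo, b_hi) = aabb(tri_a), aabb(tri_b)
    return all(a_lo[k] <= b_hi[k] and b_lo[k] <= a_hi[k] for k in range(3))

def candidate_pairs(mesh_a, mesh_b):
    """Triangle index pairs whose boxes overlap; only these need exact tests."""
    return [(i, j)
            for i, ta in enumerate(mesh_a)
            for j, tb in enumerate(mesh_b)
            if aabb_overlap(ta, tb)]
```

An octree accelerates the same culling step from quadratic to near-linear by only comparing triangles that share a cell, which is what makes the parallel intersection phase efficient.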
Lin et al. (Contribution 2) proposed a semi-supervised few-shot learning model that combines pseudo-label-guided contrastive learning (PLCL) and local factor clustering (LFC) to address the embedding mismatch caused by the discrepancy in data distribution between the base dataset and the novel dataset. The model is first pre-trained on a large base dataset and then fine-tuned on a new task with a small number of labeled samples and a large number of unlabeled samples. During fine-tuning, the LFC strategy fuses local feature information to generate pseudo-labels for the unlabeled samples. The PLCL module then uses these pseudo-labels for supervised contrastive learning, pulling similar samples closer together and pushing dissimilar samples apart in the feature space. Compared with two mainstream clustering methods, the proposed clustering strategy significantly improves accuracy in the targeted application scenarios.
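The core of pseudo-label-guided contrastive learning can be sketched as a standard supervised contrastive loss in which the labels may be pseudo-labels produced by clustering. The code below is a generic textbook formulation, not the authors' implementation; embeddings are assumed to be L2-normalized, and the temperature value is illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def supcon_loss(embeddings, labels, tau=0.5):
    """Supervised contrastive loss averaged over anchors that have positives.

    `labels` may be pseudo-labels; same-label samples are pulled together,
    different-label samples are pushed apart in the embedding space."""
    n, total, anchors = len(embeddings), 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        denom = sum(math.exp(dot(embeddings[i], embeddings[a]) / tau)
                    for a in range(n) if a != i)
        loss_i = -sum(math.log(math.exp(dot(embeddings[i], embeddings[p]) / tau)
                               / denom)
                      for p in positives) / len(positives)
        total += loss_i
        anchors += 1
    return total / anchors
```

When pseudo-labels agree with the geometry of the embedding space (nearby points share a label), this loss is small; noisy pseudo-labels raise it, which is why the quality of the LFC clustering step matters.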
Liu et al. (Contribution 3) proposed an innovative temporal action localization framework that combines a separated bidirectional Mamba (SBM) model with a boundary correction strategy (BCS). This framework aims to address the challenges of existing methods in capturing long-term actions and precisely localizing action boundaries. The proposed method employs a “pre-localization-optimization-re-localization” process: first, the SBM leverages its powerful long-term temporal modeling capability to encode video features, generating a preliminary localization result. Then, the BCS module analyzes these results, evaluating the actual contribution of each video frame to the action instance, and it adjusts and optimizes the original video features accordingly. Finally, these optimized features are fed back into the SBM and the detection head for a second localization, resulting in a more accurate and refined action boundary. Experimental results demonstrate that its performance surpasses the current state of the art.
Li et al. (Contribution 4) systematically restructured and categorized loss functions in deep learning. First, the review moves beyond the traditional binary division of loss functions into regression loss and classification loss by introducing and establishing metric loss as a new core category. Based on this three-fold classification (regression, classification, and metric), each category is then further subdivided, with the improvement paths and open issues of each type of loss function elaborated in turn. Finally, the review discusses emerging trends such as compound losses and generative losses, providing researchers in deep learning with a new perspective and a comprehensive reference resource.
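The three families can be illustrated with one canonical member each; these are generic textbook definitions given for orientation, not the formulations analyzed in the review.

```python
import math

def mse(y_true, y_pred):
    """Regression loss: mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(probs, label, eps=1e-12):
    """Classification loss: negative log-likelihood of the true class."""
    return -math.log(probs[label] + eps)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Metric loss: require the positive to be closer than the negative
    by at least `margin` in embedding space."""
    d = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)
```

The triplet loss exemplifies why metric losses deserve their own category: unlike the other two, it constrains relative distances between samples rather than scoring a single prediction against a target.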
2.2. Mathematical Methods for Specific AI Problems
An overview of mathematical methods for solving specific problems is presented as follows:
Xing et al. (Contribution 5) proposed a modular approach to Dongba scripture sentence segmentation line detection (DS-SSLD). The study used a self-constructed dataset (DBS2022) comprising 2504 images and their corresponding annotations, all sourced from authentic Dongba scriptures. To preserve data authenticity and support model generalization, the dataset was built directly from the original scanned images, without additional preprocessing such as noise reduction or blur removal. A multi-scale hybrid attention network (Multi-HAN) based on YOLOv5s was then developed; it employs a novel multi-hybrid attention unit (MHAU) and a multi-scale cross-stage partial unit (Multi-CSPU) to enhance detection accuracy. Overall, the trained Multi-HAN detects sentence segmentation lines in Dongba scripture text robustly and accurately, providing a solid foundation for sentence segmentation in Dongba scripture analysis.
Zheng et al. (Contribution 6) proposed a weakly supervised specular highlight removal method that requires only highlight images, addressing the difficulty of obtaining paired highlight/non-highlight image data for training deep learning models. The method estimates a mask using non-negative matrix factorization (NMF) to generate non-highlight images from the highlight images. These images are then fed into a sequence of modules resembling a cycle-consistent generative adversarial network (CycleGAN), comprising highlight generation, highlight removal, and reconstruction modules that are trained jointly. At inference time, only the efficient highlight removal module is retained. Experimental results demonstrate that the method outperforms existing approaches on multiple datasets, showing significant potential for improving image quality.
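NMF factors a non-negative matrix V into non-negative factors W and H with V ≈ WH, which is what allows a highlight layer to be separated from image data that is non-negative by construction. The sketch below shows the classic multiplicative-update algorithm on a toy matrix; the matrix sizes, iteration count, and code are illustrative and unrelated to the paper's actual mask-estimation pipeline.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def nmf(V, rank, iters=300, eps=1e-9, seed=0):
    """Factor V (m x n, non-negative) into W (m x rank) and H (rank x n)
    using Lee-Seung multiplicative updates, which preserve non-negativity."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H

def frob_error(V, W, H):
    R = matmul(W, H)
    return sum((V[i][j] - R[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0]))) ** 0.5
```

In the highlight-removal setting, the non-negativity constraint is the key property: both factors stay physically interpretable as intensity layers, unlike factorizations that allow negative entries.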
Peng et al. (Contribution 7) proposed a shale gas well productivity prediction model based on a neural network optimized by the cuckoo search (CS) algorithm. This model aims to address the issues of high computational cost and low accuracy in traditional analytical and numerical simulation methods, as well as the problem of standard neural networks easily becoming stuck in local optima. The model first identifies seven key geological and engineering factors influencing the absolute open flow rate of natural gas wells in a specific region of western China through covariance and Spearman rank correlation analysis. Then, the cuckoo search algorithm is used to globally train the neural network, optimizing its weights and biases to effectively avoid local optima. Experimental results demonstrate that the model can accurately predict productivity, contributing to efficient shale gas resource extraction, thus reducing development costs and improving the economic benefits of the oil and gas field.
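For intuition, the following is a simplified cuckoo search minimizing a toy quadratic objective that stands in for a network's training loss; in the paper the decision variables are the network's weights and biases. All parameter values, bounds, and function names here are illustrative assumptions, not the authors' configuration.

```python
import math, random

def levy_step(rng, beta=1.5):
    """Levy-flight step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta
                * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(f, dim, lo, hi, n_nests=15, iters=200, pa=0.25, seed=0):
    """Minimize f over [lo, hi]^dim with a basic cuckoo search:
    Levy flights for new solutions, greedy replacement, and abandonment
    of a fraction pa of the worst nests each iteration."""
    rng = random.Random(seed)
    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fit = [f(x) for x in nests]
    for _ in range(iters):
        best = nests[min(range(n_nests), key=lambda i: fit[i])]
        for i in range(n_nests):
            # Levy flight around the current nest, biased toward the best nest
            new = [min(hi, max(lo, x + 0.01 * levy_step(rng) * (x - b)))
                   for x, b in zip(nests[i], best)]
            fn = f(new)
            if fn < fit[i]:  # keep the new solution only if it improves
                nests[i], fit[i] = new, fn
        # abandon the worst nests to escape local optima
        order = sorted(range(n_nests), key=lambda i: fit[i], reverse=True)
        for i in order[:int(pa * n_nests)]:
            nests[i] = [rng.uniform(lo, hi) for _ in range(dim)]
            fit[i] = f(nests[i])
    j = min(range(n_nests), key=lambda i: fit[i])
    return nests[j], fit[j]
```

The abandonment step is what distinguishes CS from plain local search and is the mechanism the paper relies on to keep the network's weights from settling into a local optimum.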
Zhang et al. (Contribution 8) proposed an adaptive feature extraction method based on the longest common subsequence (LCSS) combined with classification theory, aiming to overcome the problems of traditional machine learning methods in trajectory association, such as difficulty in obtaining training samples, long training time, and insufficient model generalization ability. This method first uses the LCSS algorithm to measure trajectory similarity and initially classifies trajectory pairs into three categories, namely “definite association”, “definite non-association”, and “ambiguous association”. Subsequently, it automatically extracts discriminative features from the definite categories using a membership function-based method and trains a support vector machine (SVM) model using these features. Finally, the trained SVM is used to accurately classify the ambiguously associated trajectories. Experimental results show that this method significantly improves the accuracy and recall of trajectory association, resolving the inherent contradictions of traditional fuzzy methods. Furthermore, it exhibits high computational efficiency, with computation time being only one-third that of traditional machine learning algorithms, demonstrating excellent adaptability to various scenarios.
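The LCSS similarity measure and the three-way split can be sketched as follows; this uses the standard dynamic-programming formulation of LCSS, and the matching radius and category thresholds are hypothetical placeholders rather than the paper's calibrated values.

```python
import math

def lcss_length(ta, tb, eps=1.0):
    """Longest common subsequence of two 2-D point sequences, where two
    points 'match' if they lie within distance eps of each other."""
    m, n = len(ta), len(tb)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if math.dist(ta[i - 1], tb[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def lcss_similarity(ta, tb, eps=1.0):
    """Normalized similarity in [0, 1]."""
    return lcss_length(ta, tb, eps) / min(len(ta), len(tb))

def classify_pair(sim, hi=0.8, lo=0.3):
    """Three-way split applied before SVM refinement (thresholds illustrative)."""
    if sim >= hi:
        return "definite association"
    if sim <= lo:
        return "definite non-association"
    return "ambiguous association"
```

Only pairs falling in the middle band are passed to the SVM, which is why the method needs far less training time than approaches that learn a classifier over all trajectory pairs.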
3. Conclusions and Future Scope
This Special Issue showcases the latest advances in mathematical methods for computer vision. Specifically, it features original contributions from academics and industry experts that address the core challenges presented by the increasing complexity and precision required in image and video analysis. These contributions will foster interdisciplinary research that integrates theoretical models with practical applications, aiming to enhance the robustness, accuracy, and interpretability of computer vision algorithms and models, enabling breakthrough applications in key technologies such as tracking, understanding, and recognition.
With the development of large language models, future research will focus on the application of mathematics within them, particularly on the optimal combination and application of mathematical theories to address specific bottlenecks. For example, this will involve exploring how different mathematical frameworks can be integrated to improve model training stability or to build trustworthy AI. At the same time, it will be necessary to closely monitor emerging branches at the intersection of mathematics and artificial intelligence, as well as their theoretical innovations in the field of large models. Finally, these findings must be systematically integrated into the foundational theoretical framework of the next generation of general artificial intelligence.