Cognitive Classifier of Hand Gesture Images for Automated Sign Language Recognition: Soft Robot Assistance Based on Neutrosophic Markov Chain Paradigm

Abstract: In recent years, Sign Language Recognition (SLR) has become an increasingly prominent topic in the human–computer interaction (HCI) field. The most significant difficulty confronting SLR is finding algorithms that scale effectively with a growing vocabulary size and a limited supply of training data for signer-independent applications. Due to its sensitivity to shape information, automated SLR based on hidden Markov models (HMMs) cannot characterize the confusing distributions of the observations in gesture features with sufficiently precise parameters. In order to model uncertainty in hypothesis spaces, many scholars have extended HMMs with higher-order fuzzy sets to produce interval type-2 fuzzy HMMs. This extension is helpful because it brings the uncertainty and fuzziness of conventional HMM mapping under control. In this work, neutrosophic sets are used to deal with indeterminacy in a practical SLR setting. Existing interval type-2 fuzzy HMMs cannot consider uncertain information that includes indeterminacy, whereas the neutrosophic hidden Markov model successfully identifies the best route between states even when there is vagueness. The three neutrosophic membership functions (truth, indeterminacy, and falsity grades) provide additional degrees of freedom for assessing the HMM's uncertainty. This approach could be helpful for an extensive vocabulary and hence seeks to solve the scalability issue. In addition, it can function independently of the signer, without needing data gloves or any other input devices. The experimental results demonstrate that the neutrosophic HMM is comparable in computational cost to the fuzzy HMM while achieving similar performance and greater robustness to gesture variations.


Introduction
SLR aims to provide a reliable method for transcribing sign language into written or spoken form, facilitating communication between deaf people and those with normal hearing [1]. Automatic gesture recognition is very engaging since hand gestures vary both spatially and temporally. The primary difficulty in sign language recognition is finding a model that can grasp the language and scale up to a large vocabulary [2,3]. Gesture recognition can be achieved in one of two ways [4]: either by wearing sensory gloves on one's hands or with the aid of computer vision.
Glove-based approaches use sensory gloves to measure the angles and spatial positions of the hand and fingers. Computer-vision-based approaches to hand gesture detection need either a single camera or multiple cameras to acquire images. Using gloves or similar devices as input makes gestures difficult to perform naturally and particularly hard to recognize in real time. Thus, in recent years, researchers have built several SLR systems using the available computer vision methods [5,6]. Regular human–computer interaction (HCI) should be glove-free, rapid, reliable, and acceptable.
In the literature, standard hand-communication systems may be broken down into three layers: detection, tracking, and recognition [7]. The task of the detection layer is to identify and extract visual features that may be linked to the presence of hands in the camera's view. For the system to know "what is where" at all times, it is up to the tracking layer to establish temporal data links between successive frames of images. The recognition layer is responsible for integrating the collected spatiotemporal data from the previous layers and communicating the resulting clusters, which are annotated with labels related to certain types of gestures/postures.
Building a hand gesture detection and recognition system involves several main steps. A high-level overview is provided as follows (see Figure 1) [1-4]: (1) Data Collection: Gather a diverse dataset of hand gesture images. These should cover different hand poses, lighting conditions, backgrounds, and skin tones. (2) Preprocessing: Preprocess the collected data to improve their quality and suitability for training. This may involve tasks such as resizing, normalization, noise reduction, and background subtraction. (3) Hand Region Detection: Isolate the region of an image that contains the hand(s). Several techniques can be used for hand region detection, such as skin color segmentation. (4) Feature Extraction: Extract features relevant to hand gestures from the preprocessed data. This may include hand region extraction, a contour analysis, a histogram of oriented gradients (HOG), or other deep-learning-based feature extraction techniques. (5) Gesture Detection and Recognition: A machine learning model is employed to detect and recognize hand gestures based on the extracted features. This involves classifying the hand gestures into predefined categories. (6) Output/Feedback: The recognized gestures are conveyed as output, which can be used for various purposes, such as controlling devices, interacting with applications, or providing feedback to the user.
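The steps above can be sketched as a minimal, illustrative pipeline. Everything here is a hedged stand-in: the simple intensity threshold replaces the skin-color segmentation of step (3), the singular-value descriptor anticipates the SVD features used later in this paper, and the nearest-prototype classifier stands in for the actual recognition model; all function names are our own, not the paper's implementation.

```python
import numpy as np

def preprocess(img):
    """Step (2): normalize pixel intensities to [0, 1] (resizing omitted here)."""
    img = img.astype(np.float64)
    rng = img.max() - img.min()
    return (img - img.min()) / rng if rng > 0 else np.zeros_like(img)

def detect_hand_region(img, threshold=0.5):
    """Step (3): illustrative stand-in for skin segmentation -- plain thresholding."""
    return (img > threshold).astype(np.uint8)

def extract_features(mask, k=8):
    """Step (4): leading singular values of the binary mask as a compact descriptor."""
    s = np.linalg.svd(mask.astype(np.float64), compute_uv=False)
    return s[:k]

def classify(features, prototypes):
    """Step (5): nearest-prototype classification over predefined gesture classes."""
    labels = list(prototypes)
    dists = [np.linalg.norm(features - prototypes[lbl]) for lbl in labels]
    return labels[int(np.argmin(dists))]
```

A synthetic image run through these four functions yields a label from the prototype dictionary, which is the shape of the end-to-end flow described above, independent of which concrete detector or classifier is plugged in at each stage.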
Different types of SLR may be broken down further into signer-related (dependent) and signer-free (independent) subcategories, depending on whether or not the signer is actively involved in the identification process. Signer-free SLR is crucial because it enables a technique to be applied directly and developed further for signers who were previously unseen [8,9]. There are two issues with signer-free SLR [7]. First, there is a noticeable divergence in the signals (postures) of different people, which increases the model convergence complexity; in order to obtain a robust recognition paradigm, the training data must cover a large number of signers. The second issue concerns the inadequate signatures (descriptions) extracted from several signers' data. The investigation into extracting an SLR signature is only just beginning, as opposed to speech recognition, where voice signatures have been searched exhaustively. Properly eliciting general signatures from varied postures is a challenging issue that demands a solution [8].
Several methods have been employed for signer-independent SLR, with the HMM being the most popular option because of its many benefits [8,9]. The HMM provides the capacity to model a time-domain process while taking into consideration the positions and orientations of gestures over time. However, there are limitations to the standard HMM. One such limitation is the assumption that a mixture of Gaussian functions can adequately characterize the distributions of the observations. The scalability of HMMs is another major limitation when using them for gesture recognition: there are hundreds of symbols in the SLR dictionary, and it is impractical and time-consuming to train and test thousands of HMMs. As a result of these two issues with signer-free SLR, it is clear that the conventional HMM cannot adequately model posture descriptions.
Incomplete information and the non-specificity caused by gesture vagueness are two examples of the types of vagueness in gesture recognition, which also include noisy or nonstationary data, uncertain matching degrees, fuzzy similarity matches, and fuzzy decision functions [10]. When the parameters of the HMM are stated, they are assumed to be fully identified. As a consequence of incomplete or noisy data in sign recognition, these parameters might not precisely reflect the underlying distribution of the observations. Moreover, estimating the probabilities of an uncertain HMM from the observations can be complicated. Even though this does not create a serious problem for some applications, the ambiguous parameters of the HMM can still be described to account for the uncertain likelihoods. A generalized HMM with a fuzzy measure and a fuzzy integral (the fuzzy hidden Markov model) is one useful method. Many researchers have used the fuzzy HMM to recognize postures and have obtained great results; even so, these limits remain in place [11-13].
Recently, the neutrosophic hidden Markov model (NHMM) has been employed as a significant mathematical model for uncertainty, redundancy, inconsistency, and ambiguity. The neutrosophic variable (NV) is subject to change due to both randomness and indeterminacy, in contrast to the conventional stochastic (random) variable, which accounts only for changes due to randomness [14]. The neutrosophic probability (NP) of an event E is defined as NP(E) = (the chance that the event occurs, the indeterminate chance of the event occurring, the chance that the event does not occur). The NHMM explicitly quantifies indeterminacy; truth, indeterminacy, and falsity are independent.
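As a concrete illustration (ours, not the paper's), an NP triple can be encoded directly. Each grade lies independently in [0, 1], so the sum of the three components may range over [0, 3], per the standard neutrosophic definition; the complement rule shown is one common convention among several in the literature.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NeutrosophicProb:
    """NP(E) = (truth, indeterminacy, falsity). The three grades are
    independent and each lies in [0, 1]; their sum may range over [0, 3]."""
    t: float  # chance the event occurs
    i: float  # indeterminate chance of the event occurring
    f: float  # chance the event does not occur

    def __post_init__(self):
        for v in (self.t, self.i, self.f):
            if not 0.0 <= v <= 1.0:
                raise ValueError("each grade must lie in [0, 1]")

    def complement(self):
        # one common convention: swap truth and falsity, keep indeterminacy
        return NeutrosophicProb(self.f, self.i, self.t)
```

Note that, unlike an ordinary probability, NeutrosophicProb(0.7, 0.2, 0.3) is legal even though 0.7 + 0.2 + 0.3 > 1: the indeterminacy grade is a separate degree of freedom rather than the remainder of the truth grade.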

Motivation
Researchers in the field of gesture recognition are driven by the fact that gestural communication and gesture-based device control are becoming more commonplace. Numerous applications now rely on camera-based gesture detection technologies, allowing for two-way communication between humans and computers. Both the feature and hypothesis spaces in gesture/posture categorization involve ambiguities, and HMM-based approaches can accurately characterize these data only when given sufficient training data. In practice, the HMM still performs insufficiently on test data because of noise, insufficient training data, and incomplete information. Therefore, ambiguity (uncertainty) must be materialized in the HMM. In order to improve performance in terms of resilience to postural anomalies, generalization capacity, and recognition accuracy, we combine high-order fuzzy sets (a neutrosophic set in our case) with HMM classifiers. The neutrosophic set is essentially characterized by three membership functions (truth, indeterminacy, and falsity) to manage greater vagueness in real-world applications.

Contribution and Novelty
The work presented in this paper proposes a novel neutrosophic-based hidden Markov model (NHMM) to address the issues with signer-independent SLR. It is motivated by the desire to provide users with an intuitive gesture input system that performs well on a large vocabulary with a small set of training data under conditions of uncertainty. The classification of hand gestures involves ambiguities in both the feature space and the hypothesis space. Typically, class-conditional probability density functions in the feature space allow for the extraction of arbitrary observations. The variables that make up the decision function in the hypothesis space are random variables whose prior distributions are obtained using training data. Because they focus exclusively on whether an element is a member of a particular class or not, fuzzy logic controllers and related logics do not incorporate the idea of a range of neutralities. As a result, these logics are ill-equipped to handle any uncertainty that may occur during data collection. Consequently, when the obtained data may be uncertain or insufficient, a neutrosophic controller is proposed as a solution.
According to recent reviews, neutrosophic logic with HMM classifiers has not been adapted for SLR tasks, despite its use in many decision-function tasks; hence, the current study offers something new in this area. The system is able to model the different changes in hand postures by recognizing a posture based on the structure of hand features. The observation sequence used to characterize the states of the NHMM is derived from the features extracted from the segmented hand image using Singular Value Decomposition (SVD), which is considered a valuable matrix decomposition approach, particularly for data exploration and dimensionality-reducing transformations.
The remainder of this paper consists of the following sections: Section 2 provides a literature review of relevant publications regarding the SLR framework.The suggested approach is presented in Section 3. The assessment of the suggested technique, including results and discussion, is presented in Section 4. The study is concluded, and possible future directions are discussed, in Section 5.

State of the Art
Hand gesture recognition is often divided into two categories [4,15-23]: (1) Rule-based techniques, which rely on a set of rules manually configured over the feature inputs. After a set of features is retrieved and matched against the encoded rules, the rule that matches the input is output as the gesture. The fundamental flaw of rule-based systems is their reliance on human encoding expertise. (2) Machine-learning-based techniques, which treat a gesture as the result of a stochastic process. Among these methods, HMMs have been extensively studied for their ability to classify gestures. On the other hand, accurate and real-time SLR with an extensive vocabulary still faces several obstacles. If the number of expressions increases, the hybrid HMM model will not be able to scale finely enough to handle them. Moreover, the vast majority of such systems are signer-dependent SLRs.
A deep-learning-based method that can recognize and classify words from a person's gestures is proposed in ref. [24]. Finding a suitable model to deal with the problems of continuous signer-independent signs is the tricky part. Building a continuous and accurate model is problematic for various reasons, including the fact that signers' rates and durations vary. Two distinct deep-learning-based methods are used in this work for the purpose of effective and accurate sign identification. The authors generate a single hybrid algorithm after training both methods independently for static and continuous signs. To obtain a hand's feature data, the Leap Motion sensor is used in ref. [25]. A hybrid HMM-DTW model is suggested for dynamic gesture recognition after evaluating the recognition effects of the HMM and dynamic time warping (DTW) algorithms. It is clear that the model is significant based on the Kappa coefficient.
In ref. [26], the authors used a well-defined stochastic mathematical technique to build a sophisticated hand motion detection system for use in VR applications. First, the user and system do not make physical contact; second, geometric factors may dictate the hand gesture's rotation; and third, to increase measurement performance, the model parameters must be adjusted. The suggested study hybridizes a bio-inspired metaheuristic approach, specifically the cuckoo search algorithm, to reduce the complicated trajectory in the HMM model, and this is compared to existing classification strategies used in hand gesture detection. The experimental results show how to increase system performance metrics by validating the HMM model using the optimizer's cost value.
The use of hidden Markov models as classifiers was confirmed in the method for identifying chosen human–computer interaction (HCI) gestures published in ref. [27]. Using a system developed for recognizing VR gaming gestures, the studies examined the possibility of applying it to the classification of HCI gestures. In this study, the authors evaluated several approaches to discretizing forearm orientation data and compared their classification accuracy.
The study in ref. [28] presented an autonomous method for simultaneous and delay-free hand gesture detection and prediction by combining hidden Markov models (HMMs) with deep neural networks (DNNs). An HMM may be used for feature extraction, a forward spotting mechanism with variable sliding window sizes can be utilized to identify the meaning of gestures, and, finally, deep neural networks can be employed for recognition. Hence, to reliably detect a significant number of gestures (0-9), a stochastic approach to building a non-gesture model using HMMs and no training data was presented. The authors use the confidence measure from the non-gesture model as an adaptive threshold to find the beginning and end of meaningful gestures in the input video stream. The experimental data show that their system has a 94.70% success rate in detecting and predicting significant movements.
For the purpose of hand gesture recognition, a multistage spatial-attention-based neural network has been suggested in ref. [29]. The suggested paradigm has three stages, each building on a CNN. Before highlighting the dataset's practical features, the authors apply a feature extractor and a spatial attention module, utilizing the original dataset's self-attention. Then, they multiply the feature vector with the attention map. After that, they look at features that might be combined with the initial dataset to achieve modality feature embedding. They used the same approach in the second stage to build an attention map and feature vector using the feature extraction architecture and self-attention method. In the third stage, a classification module is used to predict the label of the corresponding hand motion.
For skeleton-based hand gesture identification, the research in ref. [30] recommended a deep ensemble architecture known as the multi-model ensemble gesture recognition network (MMEGRN). For the purpose of extracting and categorizing skeleton sequences, the authors offered an architecture that makes use of four sub-networks and three spatiotemporal feature classifiers. Late feature fusion creates a new fusion classifier by merging the features obtained from each sub-network's feature extractor. The usefulness and superiority of the proposed framework compared to previous models in terms of recognition accuracy were shown via extensive testing on three skeleton-based hand gesture recognition datasets.
In order to assess the temporal hand trajectories derived from the identified hand poses, the segmentation process in ref. [31] employs the adaptive-region-based active contour (ARAC) method, which is built on the meta-heuristic foundation of the opposition strategic velocity updated beetle swarm optimization (OSV-BSO). To reduce dimensionality, principal component analysis (PCA) is used after feature extraction. Standard features used in this process include Oriented FAST and Rotated BRIEF (ORB), region props, and the histogram of oriented gradients (HOG). An optimized probabilistic neural network (PNN) uses these essential features to complete the recognition phase. The authors adjust the training weights based on the meta-heuristic basis of the OSV-BSO to enhance the performance of the PNN as a deep learning model.
To recognize hand gestures by extracting all conceivable types of skeleton-based gestures, the authors in ref. [32] presented a multi-branch attention graph and a generic deep-learning model. Using a generic deep neural network module, the final branch collects features based on general deep learning. Concatenating the spatial, temporal, and general features and feeding them into the fully connected layer produced the final feature vector. The suggested technique surpassed the current state-of-the-art methods due to its low computational cost and excellent accuracy.
A lightweight deep neural network with enhanced processing was proposed for real-time dynamic sign language recognition (DSLR) in ref. [33]. There are three primary parts of the solution approach. Dataset preprocessing involves first standardizing the number of frames in the incoming dataset. Then, in order to find and identify them, the MediaPipe framework takes hand and position landmarks (features). In order to correctly identify the DSL, the features of the models are passed on after processing the body's depth and position unification. The study results demonstrate that this solution approach can identify dynamic signs in real time with remarkable speed and accuracy.
To recognize the static gestures of the hands in sign language, a novel deep learning neural network architecture is presented in ref. [34]. Classical, non-intelligent feature extraction and convolutional neural networks (CNNs) are part of the suggested architecture. Following background removal and preprocessing, the suggested structure uses three streams of feature extraction to extract useful features and assign a class to the gesture image. Three commonly used approaches in hand gesture classification (CNN, the Gabor filter, and the Oriented FAST and Rotated BRIEF (ORB) feature descriptor) make up these three streams, each of which extracts its own distinctive features. The final feature vector is formed by merging these features. Not only does the suggested structure become more resistant to uncertainty and ambiguity in the hand gestures, it also achieves very high accuracy in hand gesture classification by combining these efficient approaches.
In ref. [35], the authors introduced a portable convolutional neural network (CNN), called the hybrid feature attention network (HyFiNet), for accurate hand gesture identification. The architecture of HyFiNet consists of four multi-scale refined edge extraction modules (REEMs). The REEM module incorporates a hybrid feature attention (HyAttention) block to collect the finer edge information of hand motions. In order to learn the discriminable semantic structure of hand postures, the HyAttention block is designed to zero in on efficient key features from multi-receptive fields. When compared to the current state-of-the-art networks, the results of the study and visual representation show a significant improvement in accuracy.
In ref. [36], the authors detail their work on a computer vision system that uses hidden Markov models (HMMs) to identify hand gestures. After receiving video input of hand gestures, the system performs morphological operations, segmentation of the hands, tracking of the hands, trajectory smoothing, and skin-color-based segmentation. An HMM learning package is used to implement the HMM with Gaussian emissions. The Viterbi technique is employed to decode observation sequences and ascertain the most probable state sequences, together with their corresponding probabilities. A validation set is used to examine the maximum likelihood classifier's sign recognition accuracy, while a test set of observation sequences is used to evaluate the system's performance. This system's ability to correctly recognize hand gestures and their matching letters is confirmed by the results. The reader can refer to refs. [37-39] for a comprehensive study on the challenges and solutions in hand gesture recognition.

The Need to Extend the Related Work
Many scholars have presented automated SLR systems based on HMMs, using a large dataset to identify a few isolated signs under controlled illumination; backgrounds and visuals were severely constrained throughout the data collection phase. Given the behavioral complexity and uncertainty of hand movement changes in SLR, this paper's primary objective is to investigate the efficacy of the neutrosophic HMM in signer-independent SLR. The aim is to improve the scalability of SLR systems, with the ultimate goal of processing more hand signs and predicting more correct models under different types of uncertainty, unlike current methods, which do not deal with all cases of uncertain data. Not all uncertain (indeterminate) data can be represented by high-order fuzzy sets, whereas neutrosophic sets deal with all types of indeterminacy.

Methodology
With a large enough vocabulary, the suggested posture recognition system may be used in real-world scenarios. It is a vision-based method for identifying isolated signs in posture images without any constraints on the signer or the background. By using neutrosophic logic, this technique enhances the classification ability of HMMs under uncertainty. Neutrosophic sets, one of the fuzzy set extensions, can deliver more successful results when modeling uncertainty since they contain membership functions for truth, indeterminacy, and falsity, rather than only a single membership function, thus providing a fair estimate of the reliability of the information. The independence of the truth-membership (TM) and indeterminacy-membership (IM) functions from each other, and the expression through the IM function of the fact that an individual does not have full control of the issue, play an essential role in modeling uncertainty problems [40]. The left-right non-skip HMM topology is often used to model posture signals; in this topology, states with indices lower than the current state cannot be re-entered via transitions. The neutrosophic hidden Markov model's (NHMM) parameters are defined using a neutrosophic probability, which is determined from the training samples. Based on neutrosophic numbers, the training is carried out using the Baum-Welch method. To determine the most likely state sequence, the recognition phase employs the Viterbi technique.
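For reference, the classical (crisp-probability) Viterbi recursion used in the recognition phase can be sketched as follows; the neutrosophic variant in this paper replaces these point probabilities with (truth, indeterminacy, falsity) grades, and the toy two-state parameters below are illustrative only.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for a discrete-emission HMM, in log space.

    obs: sequence of observation symbol indices
    pi : (S,) initial state probabilities
    A  : (S, S) transitions, A[i, j] = P(next = j | current = i)
    B  : (S, V) emissions,  B[i, k] = P(symbol k | state i)
    """
    eps = 1e-300                                  # guard against log(0)
    logA, logB = np.log(A + eps), np.log(B + eps)
    delta = np.log(pi + eps) + logB[:, obs[0]]    # best log-score per state
    backpointers = []
    for o in obs[1:]:
        scores = delta[:, None] + logA            # scores[i, j]: best path via i -> j
        backpointers.append(np.argmax(scores, axis=0))
        delta = np.max(scores, axis=0) + logB[:, o]
    path = [int(np.argmax(delta))]                # trace back from the best final state
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]

# toy two-state model with near-deterministic alternation (illustrative values)
pi = np.array([1.0, 0.0])
A = np.array([[0.1, 0.9],
              [0.9, 0.1]])
B = np.array([[0.9, 0.1],
              [0.1, 0.9]])
```

Working in log space avoids underflow on long observation sequences, which is why practical HMM decoders (crisp or otherwise) are usually written this way.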
The many advantages of the proposed scheme include the following: the features are unaffected by the scaling or rotation of gestures under different conditions inside an image, making the system more adaptive, and Singular Value Decomposition (SVD) minimizes the feature vector required for a particular posture image. In addition, the feature vector that is unique to each motion is formed by features constructed using this approach. Real-time online SLR posture detection and interactive training are both possible with this technology. It is also capable of learning new postures intelligently, which may be used for recognition purposes in the future. The proposed SLR system is detailed below, and Figure 2 shows how it works.

Hand Detection
Due to limitations in vision-based methods, such as changing illumination conditions and composite image backgrounds, distinguishing hands is essential to success in any gesture classification effort. Two primary methods exist for skin recognition: color-based and region-based division [41,42]. Due to the difficulties encountered while recognizing skin colors and the unpredictability of illumination conditions, skin-color identification alone may not be reliable. There must be sufficient contrast between the object and background for methods that use shape outlines (region-based) to work. Considering the trade-off between computational effort and identification accuracy, the proposed system employs a pixel-based, non-parametric (histogram) segmentation approach in the YCbCr color space [43]. The main steps include computing the histogram, selecting thresholds, applying the thresholds to segment the image, and optionally performing post-processing for refinement (morphological operations). In order to classify skin pixels and provide robust parameters under different lighting situations, the color space transformation is intended to reduce the overlap between skin and non-skin pixels [44].
Let I be a digital posture image, represented as a matrix of pixel intensities I(x, y) for x = 1, 2, . . ., M and y = 1, 2, . . ., N, where M and N are the dimensions of the image. The histogram H of image I is defined as the discrete function

H(i) = Σ_{x=1}^{M} Σ_{y=1}^{N} δ(I(x, y) − i),

where δ is the Dirac delta function and i is the intensity level. The histogram represents the frequency distribution of pixel intensities in the image. The image is then segmented into K + 1 regions R_0, R_1, . . ., R_K (two regions in our case), where region R_i consists of the pixels with intensities in the ith interval,

R_i = {(x, y) : T_{i−1} ≤ I(x, y) < T_i},

where T_{i−1} and T_i are the lower (inclusive) and upper (exclusive) bounds of the ith interval, respectively.
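The histogram and threshold-segmentation steps just described can be sketched as follows; the single threshold in the usage example is illustrative, whereas in practice the bounds T_i are selected from the histogram itself.

```python
import numpy as np

def histogram(img, levels=256):
    """H(i) = number of pixels with intensity i (the discrete delta-sum above)."""
    h = np.zeros(levels, dtype=np.int64)
    for v in img.ravel():
        h[int(v)] += 1
    return h

def segment(img, thresholds, levels=256):
    """Assign each pixel to region R_i, where T_{i-1} <= I(x, y) < T_i.
    thresholds: interior bounds T_1, ..., T_K in increasing order."""
    bounds = [0] + list(thresholds) + [levels]
    labels = np.zeros(img.shape, dtype=np.int64)
    for i in range(len(bounds) - 1):
        labels[(img >= bounds[i]) & (img < bounds[i + 1])] = i
    return labels
```

With a single interior threshold this yields the two-region (K + 1 = 2) segmentation used here, separating candidate hand pixels from the background.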
The input in this approach may be an image or a video frame obtained from any camera, including webcams. The input RGB color image is first downsized to save processing time and then converted to a CbCr chrominance image (chrominance vector). In order to mitigate the effects of lighting variation, the system uses only the chrominance matrices to fully describe the color data, ignoring the Y matrix [44]. Building a decision process that can differentiate between pixels that contain skin and those that do not is the last objective of skin color detection. One way to create a skin classifier is to define the skin cluster's borders in the CbCr color space using a set of rules. Refer to [45,46] for a comprehensive overview of skin clusters.
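Such a rule-based chrominance classifier can be sketched as follows. The RGB-to-YCbCr conversion uses the standard ITU-R BT.601 coefficients, and the rectangular Cb/Cr skin-cluster bounds are illustrative values commonly reported in the skin-detection literature, not necessarily the exact rules tuned for this system.

```python
import numpy as np

def rgb_to_cbcr(rgb):
    """Convert an (H, W, 3) RGB array (0-255) to Cb and Cr planes (BT.601).
    The luma plane Y is deliberately discarded to reduce lighting sensitivity."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Rule-based skin classifier: a rectangular cluster in the CbCr plane.
    The default bounds are illustrative literature values, not tuned ones."""
    cb, cr = rgb_to_cbcr(rgb)
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```

The resulting boolean mask is the per-pixel skin/non-skin decision; morphological post-processing, as noted above, can then clean up isolated misclassified pixels.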


Feature Extraction
Because hand motions are so varied in form, movement, and texture, choosing features is critical to gesture detection. The following three approaches are the most common ones that may produce features: viewpoint-based, model-based (kinematic model), and appearance-based low-level feature approaches. Refer to [47][48][49][50] for a comprehensive overview of gesture feature extraction for static gesture recognition. In order to improve the accuracy of the hand posture identification scheme and to circumvent issues caused by modifications such as rotation, resizing, and relocation, the SVD feature extraction method is applied to the skeleton of the hand shape, using the fewest number of pixels without losing the contour data, in order to identify unique and distinguishable features [50]. This is achieved using the binary segmented hand image from the previous step.
SVD helps to reduce the feature space's dimensionality, which can improve computational efficiency and reduce the risk of overfitting. The singular vectors obtained from SVD can provide insights into the principal components of the data, aiding in understanding the most influential features for gesture recognition [51][52][53]. For any m × n matrix A, there exists an orthogonal decomposition

A = U Σ V^T = Σ_i σ_i u_i v_i^T,

where σ_i is the ith singular value of A in a non-increasing sequence, and u_i and v_i are the ith left and right singular vectors of A for i = 1, . . ., min(m, n), in that order. The ith largest singular value σ_i of A is the Euclidean norm of the ith major projected vector Ax in the direction x that is perpendicular to all of the i − 1 preceding vectors:

σ_i = max_{U ⊆ R^n, dim(U) = i} min_{x ∈ U, ∥x∥_2 = 1} ∥Ax∥_2,    (7)

where the maximum is taken over all i-dimensional subspaces U ⊆ R^n. In this case, the HMM observation vectors consist of the σ_1, σ_2, . . ., σ_min(m,n) coefficients produced by applying the SVD transform to each training sign. The number, and particularly the order, of the coefficients play a significant role in producing an acceptable recognition model [54].
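The SVD feature extraction described above might look as follows in NumPy. The 52-value vector size matches the experiments reported later; the zero-padding and function name are illustrative choices:

```python
import numpy as np

def svd_features(binary_hand, k=52):
    """Extract the k largest singular values of a binary hand image as an
    observation vector, padded with zeros if the image yields fewer values."""
    # numpy returns the singular values already sorted in non-increasing order.
    sigma = np.linalg.svd(binary_hand.astype(float), compute_uv=False)
    features = np.zeros(k)
    features[:min(k, sigma.size)] = sigma[:k]
    return features
```

Because singular values are invariant under orthogonal transformations, this vector changes little under the rotations and relocations mentioned above.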

Neutrosophic Hidden Markov Model
Hidden Markov Models (HMMs) are a powerful tool for modeling sequential data, making them well-suited for gesture recognition tasks where the temporal aspect of the data is crucial [55][56][57]. Handling uncertainty in HMMs is crucial for many real-world applications, as it allows the model to account for the inherent ambiguity and variability in the observed data. Combining neutrosophic logic with HMMs introduces a framework for modeling uncertainty, indeterminacy, and imprecision in the context of sequential data analysis. This combination is referred to as Neutrosophic Hidden Markov Models (NHMMs) [58,59].
NP is the neutrosophic probability in which each event is associated with three values: T (truth), F (falsehood), and I (indeterminacy).These values represent the degrees of truth, falsehood, and indeterminacy associated with an event.
An interval neutrosophic stochastic process {X(n) : n ∈ N} is said to be an interval neutrosophic Markov chain if it satisfies the Markov property

NP(X(n+1) = j | X(n) = i, X(n−1) = k, . . ., X(0) = m) = NP(X(n+1) = j | X(n) = i),

where i, j, and k belong to the state space S of the process. Here,

∼P_ij = ⟨[T_ij^L, T_ij^U], [I_ij^L, I_ij^U], [F_ij^L, F_ij^U]⟩

is called the interval-valued neutrosophic probability of moving from state i to state j in one step, where T_ij^L and T_ij^U are the lower and upper truth-memberships of the transition from state i to state j, respectively, I_ij^L and I_ij^U are the lower and upper indeterminacy-memberships of the transition from state i to state j, respectively, and F_ij^L and F_ij^U are the lower and upper falsity-memberships of the transition from state i to state j. The matrix P = [∼P_ij] is called the interval-valued neutrosophic transition probability matrix.
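One possible in-memory representation of the interval-valued neutrosophic transition matrix is sketched below; the class name and accessor are hypothetical, with each of the six arrays holding one bound of one membership grade, indexed [i, j]:

```python
import numpy as np

class NeutrosophicTransitions:
    """Interval-valued neutrosophic transition matrix: six bound arrays,
    one per (grade, bound) pair, each of shape (n_states, n_states)."""

    def __init__(self, t_low, t_up, i_low, i_up, f_low, f_up):
        for lo, up in ((t_low, t_up), (i_low, i_up), (f_low, f_up)):
            assert lo.shape == up.shape and np.all(lo <= up), "bounds must satisfy L <= U"
        self.t = (t_low, t_up)
        self.i = (i_low, i_up)
        self.f = (f_low, f_up)

    def step(self, i, j):
        """Interval-valued neutrosophic probability of moving i -> j in one step."""
        return {"T": (self.t[0][i, j], self.t[1][i, j]),
                "I": (self.i[0][i, j], self.i[1][i, j]),
                "F": (self.f[0][i, j], self.f[1][i, j])}
```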
The Neutrosophic Hidden Markov Model (NHMM) is a neutrosophic Markov chain {X_n}_{1≤n≤N} whose states are not directly observable but can be observed through a sequence of observations {O_n, n ≥ 1}; each O_n is generated by the state X_n and is conditional only on X_n. The NHMM is similar to a fuzzy hidden Markov chain [60,61], except that the arithmetic operations are neutrosophic. The NHMM, like any other HMM, consists of the initial NP distribution, the neutrosophic transition probability matrix, and the observation matrix providing the conditional NPs, together denoted ∼λ. The main problem of any hidden Markov chain involves the decoding part, which finds the best state sequence {X_n}_{1≤n≤N} given the model ∼λ and an observation sequence of length N. Traditionally, the Viterbi algorithm is used to solve this problem, with the aim of finding ∼γ_n(i), which provides the highest possibility along a single path at time step n. See [62] for more details.
Here is how the NHMM could be applied to gesture recognition [14,58,59]: - Neutrosophic Representation of Gestures: Represent the different possible gesture states using neutrosophic sets. Each state could have associated truth-membership, indeterminacy-membership, and falsity-membership values, capturing the uncertainty in recognizing a particular gesture. For observations, capture the uncertainty in detecting and interpreting individual features or components of gestures using neutrosophic sets. This accounts for variations in the observed data due to noise or imprecision.
In hand gesture recognition, neutrosophic membership functions can be used to model the uncertainty associated with recognizing gestures. Consider a simple example with three possible gestures: "open hand", "closed fist", and "pointing finger". We can define neutrosophic membership functions for each gesture based on the degrees of truth, indeterminacy, and falsity, and represent these membership functions using triangular functions, where the parameters defining the functions (e.g., the peaks, widths, and slopes) are determined based on the specific characteristics of each gesture and the available data. For example, for the "open hand" gesture, the membership functions could be defined as follows. Truth (T): the hand is fully open, so the truth membership function peaks at one and decreases as the observed hand gesture deviates from the fully open hand. Indeterminacy (I): there might be some ambiguity if the hand is partially open or partially closed, so the indeterminacy membership function peaks around the intermediate state. Falsity (F): the hand is in neither a closed fist nor a pointing gesture, so the falsity membership function is near zero for a fully open hand and increases as the observed hand gesture becomes more similar to a closed fist or pointing finger.
Similarly, membership functions for other gestures can be defined based on their characteristic features and the degree of ambiguity associated with their recognition. Here is a simple triangular representation for each membership function [14]:

For Truth (T): T(x) = (x − a)/(b − a) for a ≤ x ≤ b; T(x) = (c − x)/(c − b) for b ≤ x ≤ c; T(x) = 0 otherwise.
For Indeterminacy (I): I(x) = (x − d)/(e − d) for d ≤ x ≤ e; I(x) = (f − x)/(f − e) for e ≤ x ≤ f; I(x) = 0 otherwise.
For Falsity (F): F(x) = (x − g)/(h − g) for g ≤ x ≤ h; F(x) = (i − x)/(i − h) for h ≤ x ≤ i; F(x) = 0 otherwise.

where a < b < c are the parameters defining the truth membership function, d < e < f those defining indeterminacy, and g < h < i those defining falsity. More complex functions can be used depending on the complexity of the available hand gesture samples.
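The triangular grades above can be sketched directly. The corner parameters for the "open hand" example are invented for illustration and are keyed to a hypothetical normalized hand-openness feature in [0, 1]:

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising linearly to 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def open_hand_grades(openness):
    """Illustrative (T, I, F) grades for the "open hand" gesture; all corner
    parameters below are assumptions, not values from the paper."""
    t = triangular(openness, 0.4, 1.0, 1.6)   # peaks when the hand is fully open
    i = triangular(openness, 0.2, 0.5, 0.8)   # peaks around a half-open hand
    f = triangular(openness, -0.6, 0.0, 0.6)  # peaks when the hand is fully closed
    return t, i, f
```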
The inference engine processes these data to make decisions or inferences, considering the uncertainties inherent in the data.Let A be a set representing a piece of information.
We denote by ∼A a neutrosophic set ∼A = {(x, T(x), I(x), F(x)) | x ∈ U}, where x represents an element in the universe of discourse U, and T(x), I(x), and F(x) represent the truth, indeterminacy, and falsity degrees associated with x, respectively. The inference engine operates based on a set of rules R, where each rule r_i is defined as a triplet (A_i, B_i, C_i), in which A_i, B_i, and C_i are neutrosophic sets representing the antecedent, consequent, and indeterminate parts of the rule, respectively. A_i is a neutrosophic set defining the conditions or criteria that must be satisfied for the rule to apply. B_i is a neutrosophic set defining the outcomes or conclusions that follow if the antecedent is satisfied. C_i is a neutrosophic set capturing the degree of indeterminacy associated with the rule, indicating the uncertainty or ambiguity in the inference process.
Given an input neutrosophic set X, the inference engine evaluates each rule against X to derive the degree of support for each consequent. This process involves determining the intersection between the input set X and the antecedent A_i of each rule r_i, resulting in a new set Y_i. The degree of support for the consequent B_i is then determined based on the truth, indeterminacy, and falsity degrees of the elements in Y_i. If multiple rules provide conclusions, the engine aggregates these conclusions to derive a final decision or inference. This aggregation process may involve combining the degrees of support for each consequent using aggregation operators such as max, min, or weighted sum. Finally, based on the aggregated conclusions, the engine generates an output classification, decision, or inference, along with associated measures of confidence or uncertainty.
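A toy version of this rule evaluation might look as follows. The min/max intersection convention, the max aggregation, and the feature names are assumptions made for illustration, since the text does not fix particular operators:

```python
def intersect(x, a):
    """Neutrosophic intersection Y = X ∩ A over shared elements:
    min on truth, max on indeterminacy and falsity (one common convention)."""
    return {e: (min(x[e][0], a[e][0]), max(x[e][1], a[e][1]), max(x[e][2], a[e][2]))
            for e in x if e in a}

def support(y):
    """Degree of support: best truth grade among matched elements (0 if none match)."""
    return max((t for t, _, _ in y.values()), default=0.0)

def infer(x, rules):
    """Evaluate every rule (label, antecedent) against X and pick the best consequent."""
    scores = {label: support(intersect(x, antecedent)) for label, antecedent in rules}
    return max(scores, key=scores.get), scores
```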
-Neutrosophic Transition Probabilities: Define the transition probabilities between neutrosophic gesture states.These transition probabilities should reflect the uncertainty in transitioning from one gesture state to another.For example, transitioning from a "hand raised" state to a "hand lowered" state may have some indeterminacy due to variations in the gesture execution.

A. Training phase
Given observable sequences for each training set, the NHMM can be trained with the neutrosophic form of the Baum-Welch algorithm. The Baum-Welch algorithm, also known as the forward-backward algorithm, is an expectation-maximization (EM) algorithm used for training hidden Markov models (tuning the parameters of the HMM). Adapting the Baum-Welch algorithm for an NHMM involves handling neutrosophic sets and updating the parameters accordingly [63,64]. Here is a high-level overview of the adaptation of the Baum-Welch algorithm for neutrosophic HMMs:
1. Initialization: Initialize the parameters of the NHMM, including the initial state neutrosophic measures π, transition neutrosophic measures A, and emission neutrosophic measures B.
2. Forward Pass (Forward Algorithm): Compute the forward neutrosophic probabilities using the current model parameters. This involves propagating the neutrosophic measures through the states and time steps of the NHMM.
3. Backward Pass (Backward Algorithm): Compute the backward neutrosophic probabilities, representing the probability of observing the remaining sequence given the current state's neutrosophic measures at a specific time.
4. Expectation Step: Use the forward and backward neutrosophic probabilities to compute the expected number of transitions and emissions for each state. This step involves calculating the expected neutrosophic counts based on the current model parameters.
5. Maximization Step: Update the model parameters (transition neutrosophic measures, emission neutrosophic measures) using the expected counts obtained in the expectation step. This step involves maximizing the likelihood of the neutrosophic observed data given the model.
6. Iterative Refinement: Repeat steps 2-5 until convergence or until a specified number of iterations is reached. Convergence can be determined by monitoring the change in the neutrosophic log-likelihood between iterations.
7. Final Model: The final model parameters obtained after convergence represent the trained neutrosophic HMM.
Given the model λ = (A, B, π), the Baum-Welch algorithm trains the NHMM on a fixed set of observations O = (O_1, O_2, . . ., O_T). By adjusting its parameters A, B, and π, the Baum-Welch algorithm maximizes P(O; λ). The parameters are updated iteratively (for i, j = 1, 2, . . ., N and t = 1, 2, . . ., T − 1) using ξ_t(i, j), the probability of a path reaching state q_i at time t and transitioning to state q_j at time t + 1:

ξ_t(i, j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O; λ).

Summing over a set of observations, we can obtain values for the expected number of transitions from or to any arbitrary state. Thus, it is straightforward to update the NHMM parameters: the model parameters are re-estimated iteratively as λ̄ = (Ā, B̄, π̄), where Ā = {ā_ij}, B̄ = {b̄_j(k)}, and π̄ = {π̄_i}. In this case, α_{t−1}(j) a_ji is the probability of the joint event that O_1, O_2, . . ., O_{t−1} are observed, given by α_{t−1}(j), and that there is a transition from state q_j at time t − 1 to state q_i at time t, given by a_ji; b_i(O_t) is the probability that O_t is observed from state q_i. Similarly, we can define the backward variable β_t(i) as the probability of the observation sequence from time t + 1 to the end, given state q_i at time t and the model λ: β_t(i) = P(O_{t+1}, O_{t+2}, . . ., O_T | S_t = q_i, λ) [63,64].
It is important to note that dealing with neutrosophic measures introduces additional complexities, and the mathematical operations involved in the computations are adapted to handle neutrosophic sets.The pseudocode for the neutrosophic Baum-Welch algorithm might look similar to the standard Baum-Welch algorithm but with additional considerations for neutrosophic measures and computations.The implementation may require specialized libraries or tools capable of handling neutrosophic sets and operations.See [65] for more details.
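As a concrete, simplified sketch, the forward-backward quantities of one E-step can be computed on the truth components alone, treating them like ordinary probabilities; a full neutrosophic version would repeat this with neutrosophic arithmetic on the indeterminacy and falsity grades as well:

```python
import numpy as np

def forward_backward(a, b, pi, obs):
    """One E-step on the truth components of an NHMM (a simplifying assumption):
    a[i, j] transition grades, b[i, k] emission grades, pi initial grades."""
    n_states, T = a.shape[0], len(obs)
    alpha = np.zeros((T, n_states))
    beta = np.zeros((T, n_states))
    alpha[0] = pi * b[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = a @ (b[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O; lambda),
    # the expected-transition quantity consumed by the M-step.
    xi = np.array([np.outer(alpha[t], b[:, obs[t + 1]] * beta[t + 1]) * a / likelihood
                   for t in range(T - 1)])
    return alpha, beta, xi, likelihood
```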

B. Recognition phase
Recognizers primarily function to classify testing gestures into their respective training classes. In order for the recognizer to find the corresponding trained gesture class, it must first rank the classes according to the available testing gesture [17,66,67]. The NHMM-based gesture recognition paradigm's final parameters are derived using the neutrosophic Baum-Welch recursive approach. In order to maximize the probability of the state sequence for the given observation sequence, the neutrosophic Viterbi algorithm selects the highest-scoring state series [68,69]. The key steps for adapting the Viterbi algorithm to an NHMM are as follows:
1. Initialization: Initialize the Viterbi matrix to store the partial probabilities of the most likely sequence up to each state for each time step. Initialize the back pointer matrix to keep track of the path leading to the most likely sequence. Initialize the first column of the Viterbi matrix based on the neutrosophic emission probabilities and the initial probabilities of each neutrosophic state.
2. Recursion: For each subsequent time step, calculate the partial probabilities in the Viterbi matrix based on the neutrosophic transition probabilities and the neutrosophic emission probabilities. Use the recursive formula that incorporates the truth, indeterminacy, and falsity memberships in the calculation.
3. Backtracking: Update the back pointer matrix as the algorithm progresses, keeping track of the most likely path leading to each state.
4. Termination: Once the entire sequence has been processed, find the final state with the highest probability in the last column of the Viterbi matrix. This state represents the most likely ending state of the sequence. Use the back pointer matrix to backtrack from the final state to the initial state, reconstructing the most likely sequence of neutrosophic states.
The formal definition of the Viterbi recursion is as follows:
- Initialization: v_1(j) = π_j b_j(O_1), bt_1(j) = 0.
- Recursion: v_t(j) = max_{i=1..N} v_{t−1}(i) a_ij b_j(O_t); bt_t(j) = argmax_{i=1..N} v_{t−1}(i) a_ij.
- Termination: The best path probability: P* = max_{i=1..N} v_T(i). The start of the back trace: q_T* = argmax_{i=1..N} v_T(i).
Here v_{t−1}(i) is the previous Viterbi path probability from the previous time step, a_ij is the transition probability from the previous state q_i to the current state q_j, and b_j(O_t) is the state observation likelihood of the observation symbol O_t given the current state j [68,69].
The pseudocode of the suggested model is summarized in Algorithm 1.
In conclusion, the system provides the option of combining hidden Markov chains with neutrosophic logic. This blending entails replacing the basic HMM mathematical procedures with neutrosophic sets and numbers. Here are some potential advantages of using neutrosophic HMMs for gesture recognition: (1) Handling Uncertainty and Ambiguity: Neutrosophic logic allows for the representation of indeterminacy, uncertainty, and ambiguity in a more explicit manner. In gesture recognition, where the input data may be noisy or ambiguous, NHMMs can better handle uncertain information. (2) Flexibility in Modeling: Neutrosophic sets provide a flexible framework for modeling different degrees of truth, falsehood, and indeterminacy. This flexibility is particularly beneficial when dealing with complex and diverse gesture patterns. (3) Adaptability to Dynamic Environments: NHMMs can adapt to changes in gesture patterns over time. In dynamic environments, where gestures may evolve or new gestures may emerge, the flexibility of neutrosophic logic can be advantageous.
While NHMMs offer several advantages for gesture recognition, they also come with certain disadvantages and challenges. It is essential to consider these aspects when deciding whether to use NHMMs for a particular application. Here are some potential disadvantages: (1) Computational Complexity: Neutrosophic logic involves the manipulation of truth, indeterminacy, and falsity values, which can increase the computational complexity of NHMMs. This may result in longer training and inference times than traditional HMMs. (2) Learning Neutrosophic Parameters: Estimating the parameters of neutrosophic sets during the training process can be complex. The learning algorithm must effectively capture the various degrees of truth, falsehood, and indeterminacy, which requires careful tuning and validation. (3) Resource Intensity: Implementing NHMMs may demand more computational resources and memory than simpler models. This can be a concern in resource-constrained environments, such as embedded systems or real-time applications.

Results and Discussions
The accuracy of the proposed NHMM-based sign language recognizer was verified using gesture image samples downloaded from the Web and the Kaggle website. These samples included an extensive vocabulary with over 6000 isolated gesture signs selected from several sign language dictionaries, with ten samples for each gesture sign. Variations in brightness, scaling, distortion, rotation, and viewpoint were applied to those samples. Five samples, chosen randomly for each sign, were employed as the training samples; the registered test group (Reg) received two of the remaining samples, and the unregistered test group (Unreg) received the other samples. The hand gesture recognition rate, i.e., the proportion of hand postures that were correctly identified relative to the total number of hand postures, was used as the objective evaluation metric [3]. The prototype recognition method was developed using Google Colab Python version 2.7 (https://saturncloud.io/blog/how-toupdate-google-colabs-python-version/ accessed on 1 January 2024) in a modular form and tested on a Dell Inspiron N5110 laptop (Dell Computer Corporation, Texas) with the following specifications: 64-bit Windows 7 Home Premium, 4.00 GB RAM, Intel(R) Core(TM) i5-2410M CPU, 2.30 GHz.
The current version of our model can effectively recognize single words. While determining word boundaries in sentences can be challenging, modern NLP systems have developed sophisticated methods to handle these complexities effectively. Python offers various libraries and frameworks for hand gesture recognition, leveraging techniques from computer vision and machine learning. Some commonly used functionalities and libraries include:
- OpenCV: a popular Python library for computer vision tasks. It provides functionalities for image and video processing, including hand detection and tracking. OpenCV can be used for tasks such as contour detection, hand segmentation, and gesture recognition.
- Scikit-learn: a widely used machine learning library in Python. It provides a high-level interface for various machine learning algorithms, including dimensionality reduction techniques such as Principal Component Analysis (PCA) and Truncated SVD (a variant of SVD). The TruncatedSVD class in scikit-learn allows for SVD-based feature extraction that is suitable for large datasets.
- Gesture Recognition Toolkit (GRT): a C++ library with Python bindings that provides tools for real-time gesture recognition. It offers algorithms for feature extraction, classification, and gesture spotting, making it suitable for building gesture recognition systems.
- NumPy: a fundamental package for Python numerical computing. It provides support for multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
- SciPy: offers a wide range of numerical algorithms and mathematical functions that are not available in NumPy, including optimization, interpolation, integration, signal processing, and statistical functions. It provides high-level interfaces to optimize and integrate numerical algorithms efficiently, making it suitable for various scientific and engineering applications.
While specific Python libraries or packages dedicated to NHMMs may not be readily available, we can leverage general-purpose libraries for probabilistic modeling and machine learning (such as NumPy, SciPy, and scikit-learn) to implement the NHMM functionalities.
We tested our method on a large vocabulary with varying numbers of states and variable numbers of Gaussian mixtures per state, using the same observation vector of 52 SVD features per sign. Table 1 displays the test results of the isolated SLR performance of HMMs based on the conventional Gaussian mixture model (GMM), the fuzzy Gaussian mixture model (FGMM), the type-2 fuzzy Gaussian mixture model (IT2GMM), and the neutrosophic Gaussian mixture model (NGMM) [70]. On the unregistered test set, the tests show that the suggested system improves recognition accuracy by 7%, 17%, and 24% compared to IT2FHMM, FHMM, and HMM, respectively, while dealing with large vocabulary sizes and sign data variations. Table 1 reveals that six states and ten Gaussian mixtures are optimal. In this experiment, we used a training set of five samples for each gesture class. These statistics often differ across datasets due to the direct impact of the observation distribution.
One probable explanation is that the neutrosophic sets' greater capacity for uncertainty and the HMM's excellent sequential handling features, when combined in a single structure, complement one another to provide better results. In other words, when faced with large vocabulary sizes, the suggested approach shows promising results in signer-independent hand gesture recognition. Neutrosophic logic explicitly accommodates unknown or undefined information through the concept of "indeterminacy". This can be useful when dealing with incomplete or missing data, providing a way to express the lack of clear information, as in the case of hand movement. Type-2 fuzzy logic, on the other hand, deals with uncertainty through the use of membership functions but may not provide a direct representation of unknown information. The three truth values (truth, indeterminacy, falsity) can be more naturally associated with real-world situations where information might be vague or incomplete. In contrast, the additional complexity introduced by type-2 fuzzy logic, with higher-order fuzzy sets, might make interpretation more challenging. The relationship between the recognition ratio and the size of the SVD vector is also shown in Table 2. Adding features increases the recognition rate only up to a point, so controlling the right feature size is crucial. The relationship between the feature vector size and recognition rate is complex, and finding the right balance is often a matter of experimentation and fine-tuning. It involves considering the trade-offs between computational efficiency, model complexity, and the quality of the information represented by the features. In high-dimensional spaces, the "curse of dimensionality" can become a challenge. This phenomenon can lead to increased computational complexity, data sparsity, and the risk of overfitting. A larger feature vector size does not always guarantee improved recognition rates. In fact, including
irrelevant or redundant features may negatively impact model performance. Feature selection and dimensionality reduction techniques are often employed to identify and retain only the most informative features. Generally, a well-balanced feature vector size containing relevant information without unnecessary complexity is more likely to lead to better generalization [71]. In the third experiment, using signer-independent Arabic sign language, the suggested system is compared to the one in ref. [72]. Both systems use the same datasets. However, they employ distinct feature extraction techniques, settings, and classifiers. According to Table 3, the proposed system outperforms the vision-based deep learning approach (an average improvement of 3%). This is because the NHMM is capable of handling modeling parameters that, because of issues with sign identification, insufficient or noisy data, or both, might fail to accurately represent the initial distribution of the observations. In general, NHMMs can handle missing data more gracefully than many deep learning architectures: they can model the uncertainty associated with missing observations through their probabilistic framework.
Furthermore, NHMMs inherently possess a stateful memory, which allows them to capture the dependencies and temporal relationships between observations. This makes them practical for tasks where the history of observations is crucial. The parameters of an HMM (transition probabilities, emission probabilities) often have clear interpretations, making it easier for domain experts to provide insights and fine-tune the model. Finally, NHMMs can perform well with relatively small amounts of training data. This is especially useful in scenarios where obtaining large labeled datasets for deep learning may be challenging. The efficacy of the proposed model for addressing uncertainty associated with hand signs in Arabic SLR applications was further demonstrated in a subsequent set of experiments, which compared it to other solutions stated in ref. [73] that were employed as black boxes with default parameters. As shown in Table 4, while specific classifiers, including Naïve Bayes and MLP neural networks, achieved a better accuracy than the suggested model, the results are still relatively close. Traditional classifiers, such as k-Nearest Neighbors (kNN), Support Vector Machines (SVM), and Naïve Bayes, typically operate under assumptions of deterministic classification, where each input instance is assigned to a single class label, without quantifying uncertainty. Here are some reasons why traditional classifiers may struggle to handle the uncertainty associated with hand signs:
- Traditional classifiers aim to find deterministic decision boundaries that separate different classes in the feature space. This approach may not adequately capture uncertainty when there are overlapping regions between classes or the decision boundary is ambiguous due to variations in hand signs.
- Traditional classifiers typically assign a single class label to each input instance based on its feature representation. This binary decision-making process does not provide information about the confidence or uncertainty associated with the assigned class label, making it challenging to assess the reliability of classification results.
- Traditional classifiers often make simplifying assumptions about data distribution and may not explicitly model uncertainty in their classification process. This can lead to suboptimal performance, especially when dealing with ambiguous or uncertain hand signs.
- Hand signs in sign language can be inherently ambiguous, with similar hand configurations or movements representing multiple meanings depending on context or subtle variations. Traditional classifiers may struggle to disambiguate between such signs and accurately assign class labels in uncertain scenarios.
To address these limitations and better handle the uncertainty associated with hand signs, our work explores the neutrosophic HMM, which explicitly represents uncertainty using neutrosophic logic, allowing for the representation of indeterminacy, uncertainty, and contradiction in sign language data. This enables NHMMs to capture and quantify the uncertainty associated with variations in hand positions, orientations, and movements, providing a principled framework for handling uncertainty in SLR. Finally, having a larger number of samples per person helps to reduce uncertainty in the recognition process. Uncertainty arises from variations in how individuals produce signs, as well as from environmental factors such as lighting conditions, background clutter, and occlusion. By providing more examples of each sign from different perspectives and contexts, the recognition system can better learn to distinguish between different variations and reduce uncertainty in the recognition process.
The impact of NHMM topologies on the proposed system's recognition rate is examined in the fourth experiment. There are three types of models: the fully connected ergodic model, the left-right (LR) model, and the left-right banded (LRB) model. In the former, every state may be reached from any other state, while in the latter two, each state can only stay in place or move forward [74]. The LRB topology with six states achieved an average recognition rate of 95.76%, as shown in Table 5. Compared to the LR and ergodic topologies, the LRB topology performed better. In general, the LRB topology requires fewer parameters than fully connected topologies. The reduced number of parameters can lead to more efficient training, especially when dealing with limited data. Furthermore, the constrained structure of the LRB topology can help mitigate overfitting, as the model is forced to learn a more constrained set of transitions. This is particularly beneficial when dealing with limited training data. While LR strictly enforces transitions from left to right without skipping any states, LRB introduces a degree of flexibility by allowing transitions between neighboring states within a specified band or distance. The choice between LR and LRB depends on the characteristics of the underlying process being modeled. LR may be sufficient for scenarios with strict sequential dependencies, while LRB can be beneficial when more flexible state transitions are needed while still maintaining a general left-to-right structure [74]. Using a confusion matrix that contrasts the classifier's assigned class (column) with the correct initial class (row), as shown in Table 6 for randomly selected samples, the fifth experiment assesses the accuracy of the NHMM recognizer with the experimentally obtained optimal parameters. Items on the diagonal are recognized accurately; recognition errors occur when items fall off the diagonal. The results of the tests reveal that the proposed system is reliable, as it correctly recognizes sign language with few false alarms. It is clear that some signs have relatively low recognition rates. For the gesture "B", for instance, the recognition rate is 80%, and it is often categorized as "C". This is because both motions have striking similarities in the placement, motion, and orientation of the hand. Hence, it is quite probable that the observation (feature) vectors generated from the hand-tracking segment will be exceptionally closely spaced. Therefore, the system will confuse the two signs, resulting in a high error rate for these particular gestures. Sign language involves a wide range of gestures, and there can be significant variability in how individuals express the same concept. This variability can make it challenging for NHMMs to accurately model and recognize different signing patterns. Furthermore, the recognition rates may be affected by environmental factors such as lighting conditions, background noise, or camera angle. NHMMs may be sensitive to such variations, and if the training conditions differ significantly from the testing conditions, recognition rates can drop. Finally, NHMMs may face challenges in adapting to individual signing styles. If there is significant variation in signing styles among different signers, the model may struggle to adapt and generalize well across diverse signing patterns.
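The parameter saving of the LRB topology discussed above is easy to see by constructing its mask of allowed transitions; the helper below is an illustrative sketch (the band width parameter is an assumption):

```python
import numpy as np

def lrb_transition_mask(n_states, band=1):
    """Boolean mask of allowed transitions for a left-right banded topology:
    each state may loop on itself or move forward by up to `band` states."""
    mask = np.zeros((n_states, n_states), dtype=bool)
    for i in range(n_states):
        mask[i, i:min(i + band + 1, n_states)] = True
    return mask
```

For six states with band 1, only 11 of the 36 ergodic transitions remain free parameters, which is one reason the LRB model trains well on limited data.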
There are several scenarios in which a hand gesture recognition system may produce incorrect classifications, leading to entries in the confusion matrix that lie off the diagonal (where the predicted class equals the actual class). Here are a few common scenarios: (1) Gestures that are visually or semantically similar may be confused with each other, leading to misclassifications; for example, two gestures that involve similar hand movements may be too difficult for the system to distinguish accurately. (2) In real-world scenarios, input data may be noisy or ambiguous, making it challenging for the system to make accurate predictions; for example, poor lighting conditions, occlusions, or variations in hand orientation may lead to misclassifications. (3) The hand gesture recognition model may have limitations in capturing the complexity or variability of gestures in the dataset; for example, the model may not generalize well to unseen variations or may be biased towards certain gestures. Analyzing these errors can help identify areas for improvement, such as collecting more training data, refining the feature extraction process, or fine-tuning the model parameters.
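The diagonal/off-diagonal reading described above can be sketched in a few lines. The helper below is illustrative (not the paper's evaluation code) and reproduces the "B misclassified as C at 80%" scenario with hypothetical labels:

```python
import numpy as np

def per_class_rates(true_labels, pred_labels, classes):
    """Confusion matrix (rows: true class, columns: predicted class)
    and per-class recognition rate (diagonal entry / row sum)."""
    idx = {c: k for k, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        cm[idx[t], idx[p]] += 1
    rates = cm.diagonal() / cm.sum(axis=1)
    return cm, rates

# Hypothetical data: 10 samples of "B", two confused with "C" -> 80% for "B".
true = ["B"] * 10 + ["C"] * 10
pred = ["B"] * 8 + ["C"] * 2 + ["C"] * 10
cm, rates = per_class_rates(true, pred, ["B", "C"])
print(cm)       # [[ 8  2]
                #  [ 0 10]]
print(rates)    # [0.8 1. ]
```

The off-diagonal entry cm[0, 1] = 2 is exactly the "B confused as C" error mass discussed in the text.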

Model's Computational Complexity
Determining the exact time complexity (in big-O notation) of a neutrosophic hidden Markov model (NHMM)-based hand gesture recognition system is complex and depends on various factors, such as the size of the input data, the complexity of the NHMM, the number of states, the number of observations, the training algorithm used, and the inference algorithm employed. Below, we provide an overview of the standard components of the suggested system and their computational complexities:
- Preprocessing: The complexity of the preprocessing step in hand gesture recognition can be influenced by the specific combination of preprocessing techniques used and the size of the input images. In many cases, the preprocessing step has linear or near-linear time complexity with respect to the number of pixels in the input image, i.e., O(N × M), where N × M is the number of pixels in the image.
- Feature extraction: The complexity of the SVD-based feature extraction step depends on the size of the input data and the desired number of features to be extracted. Consider an input matrix of size T_S × T_f, where T_S is the number of samples (e.g., gesture images) and T_f is the number of features. The time complexity of computing the full SVD of a T_S × T_f matrix is generally considered to be O(min(T_S T_f², T_S² T_f)). After computing the SVD, the next step typically involves selecting a subset of the computed singular vectors/values as features.
Thus, the overall complexity of NHMM-based hand gesture recognition is dominated by the largest of these terms, O(S² T_O), where S is the number of states and T_O is the number of observations. The overall complexity of the system also depends on the combination of these factors and any additional preprocessing or post-processing steps involved. It is important to note that while NHMMs offer certain advantages over traditional HMMs in terms of handling uncertainty and imprecision, they also introduce additional computational complexity.
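As a sketch of the feature extraction step (assuming flattened gesture images and NumPy's SVD; the actual pipeline and the helper name `svd_features` are illustrative), projecting each sample onto the top-k right singular vectors reduces the feature dimension from T_f to k:

```python
import numpy as np

def svd_features(X, k):
    """Project samples onto the top-k right singular vectors.

    X: (T_S, T_f) matrix of T_S flattened gesture images with T_f pixels each.
    Computing the full SVD costs O(min(T_S*T_f^2, T_S^2*T_f)); keeping only
    the first k components shrinks each feature vector from T_f to k values.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T          # (T_S, k) reduced feature matrix

X = np.random.rand(40, 256)      # e.g. 40 samples of 16x16 images, flattened
F = svd_features(X, k=10)
print(F.shape)                   # (40, 10)
```

The reduced (T_S, k) matrix is what would feed the NHMM as its observation sequence features.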

Limitations
The suggested model handles static hand postures and cannot capture dynamic gestures that involve continuous movement or changes in hand position over time. As a result, dynamic gestures, such as waving or pointing, cannot be effectively represented in static images alone; in real-time communication or interaction scenarios, dynamic gesture recognition systems are more suitable. The quality of the image-capturing device and the resolution of the image can significantly affect the accuracy of hand gesture recognition: low-quality or low-resolution images may not provide enough detail for accurate recognition. Finally, NHMMs may require a significant amount of data to accurately estimate model parameters, especially when dealing with neutrosophic uncertainties; insufficient data can lead to poor model performance and unreliable inference results. The inference and learning algorithms for NHMMs may also be computationally intensive, particularly when dealing with large datasets or complex models, resulting in longer training times and increased computational resources compared to traditional HMMs.

Conclusions
To improve the accuracy of signer-independent isolated SLR, this research introduces an adaptive technique based on the neutrosophic HMM. Using the extracted SVD feature space, the NHMM generates a robust approximation of the distinct gestures to be recognized. After training on the set of images using the neutrosophic-based forward-backward technique, the approach scores each test image against the trained models using the neutrosophic-based Viterbi algorithm and identifies the most likely gesture. NHMMs can offer certain advantages for sign language recognition, especially when dealing with the inherent uncertainties and complexities in sign language data.
Integrating neutrosophic logic with HMMs can provide a more robust framework for sign language recognition. Here are some potential advantages: (1) Sign language gestures often exhibit inherent uncertainty and vagueness. Neutrosophic logic allows for the representation of indeterminate and vague information, providing a more flexible and realistic approach to modeling uncertainty in sign language recognition. (2) Ambiguity is common in sign languages, where different signs may share similar or overlapping visual characteristics. Neutrosophic logic can handle ambiguity by representing the degree of truth, falsity, and indeterminacy associated with different interpretations of a sign.
(3) Neutrosophic logic can handle contradictions, which can be helpful when dealing with conflicting or inconsistent information in sign language recognition. This can enhance the robustness of the model in scenarios where multiple interpretations or conflicting visual cues are present. (4) Neutrosophic logic extends traditional fuzzy logic by explicitly handling indeterminacy, thus providing a more nuanced representation of fuzzy concepts. This can be beneficial when dealing with subtle and fuzzy distinctions in sign language recognition.
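To make the three membership grades concrete, the sketch below implements one common convention for single-valued neutrosophic arithmetic. The sum and product rules vary across the literature, so this is an illustrative convention, not the paper's exact definition:

```python
from dataclasses import dataclass

@dataclass
class SVNN:
    """Single-valued neutrosophic number: truth, indeterminacy, and falsity
    grades, each in [0, 1]. The operations below follow one common set of
    SVNS rules; other definitions exist in the literature."""
    t: float
    i: float
    f: float

    def __add__(self, o):        # neutrosophic "sum" (union-like)
        return SVNN(self.t + o.t - self.t * o.t, self.i * o.i, self.f * o.f)

    def __mul__(self, o):        # neutrosophic "product" (intersection-like)
        return SVNN(self.t * o.t, self.i + o.i - self.i * o.i,
                    self.f + o.f - self.f * o.f)

a = SVNN(0.7, 0.2, 0.1)
b = SVNN(0.6, 0.3, 0.2)
print(a + b)   # truth rises, indeterminacy and falsity shrink
print(a * b)   # truth shrinks, indeterminacy and falsity rise
```

Under these rules the sum behaves like evidence accumulation (truth grows, doubt shrinks), while the product behaves like a conjunction, which is the behavior an NHMM needs when chaining transition and emission grades.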
While NHMMs offer particular advantages in dealing with uncertainty and vagueness, they also have disadvantages, such as the following: (1) Complexity: NHMMs can introduce additional complexity compared to traditional HMMs due to the incorporation of neutrosophic logic. This complexity may manifest in model formulation, inference algorithms, and parameter estimation. (2) Generalization: The generalization of NHMMs to new or unseen data may be challenging, particularly if the model is overfit to the training data or if the neutrosophic uncertainties are not representative of real-world uncertainties.
(3) Model Selection: NHMMs introduce additional parameters related to neutrosophic uncertainties, which can complicate the model selection process. Choosing an appropriate model structure and complexity for a given application may require careful consideration and experimentation.
The results of the experiments demonstrate that the proposed model outperforms the type-2 fuzzy HMM, with an improvement of around 7% in SLR recognition accuracy. Future investigations will aim to develop more sophisticated structures for neutrosophic-based HMMs to capture the dynamic and complex nature of sign language gestures. This may involve exploring different state transition models and emission probabilities, incorporating additional contextual information, and enhancing the computational efficiency of neutrosophic-based HMMs to enable real-time processing for applications such as interactive sign language recognition systems or assistive devices, for example through optimization techniques, parallel processing, or hardware acceleration.

Figure 1. Schematic diagram of a hand gesture recognition system.


Figure 2. Schematic diagram of the proposed neutrosophic HMM-based hand gesture recognition system.

Algorithm 1. Neutrosophic HMM-based hand gesture recognition system.

Step 1: Data Preprocessing
# Extract features from hand gesture images
def extract_features(gesture_images):
    # Implement feature extraction technique
    features = []
    for img in gesture_images:
        feature = SVD_extract_feature_from_image(img)
        features.append(feature)
    return features

Step 2: Neutrosophic HMM Training
Initialize:
- Define the number of states (N) and observations (M).
- Initialize neutrosophic parameters (truth, indeterminacy, falsity) for the transition and emission probabilities.
- Initialize the initial state probabilities using neutrosophic parameters.

Training (Baum-Welch algorithm):
1. Initialize the transition and emission probability matrices with random neutrosophic parameters.
2. Repeat until convergence:
   a. Forward pass: compute the forward probabilities using neutrosophic arithmetic.
      # Neutrosophic arithmetic is employed for operations involving neutrosophic
      # parameters, such as addition, multiplication, and comparison.
   b. Backward pass: compute the backward probabilities using neutrosophic arithmetic.
   c. Update the transition and emission probabilities using neutrosophic arithmetic.

Inference (Viterbi algorithm):
1. Given an observation sequence O (O = features), initialize the Viterbi matrix and the back-pointer matrix.
2. For each observation in O:
   a. Update the Viterbi matrix using neutrosophic arithmetic.
   b. Update the back-pointer matrix.
3. Terminate: find the most likely sequence of states by backtracking.
- Neutrosophic Emission Probabilities: Associate neutrosophic emission probabilities with each gesture state. These probabilities represent the likelihood of observing specific features or components given the current gesture state; again, this accounts for uncertainty in the observed data.
- Learning and Inference: Train the NHMM using a dataset of neutrosophically represented gestures. Learning involves estimating the neutrosophic parameters, such as the transition and emission probabilities, from the training data. During inference, the NHMM is used to recognize and classify new gestures. The model considers the uncertainty associated with each aspect of the gesture and provides a more nuanced understanding of the gesture recognition process.
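The inference step of Algorithm 1 can be sketched by first collapsing each (truth, indeterminacy, falsity) triple to a scalar score and then running the standard Viterbi recursion on those scalars. This is a deliberate simplification of full neutrosophic arithmetic, shown only to make the control flow concrete; the score function (2 + T - I - F)/3 is one common convention, and all model parameters below are hypothetical:

```python
import numpy as np

def score(tif):
    """Collapse a (truth, indeterminacy, falsity) triple into a scalar;
    (2 + T - I - F) / 3 is one common score function for single-valued
    neutrosophic numbers (other conventions exist)."""
    t, i, f = tif
    return (2 + t - i - f) / 3

def neutro_viterbi(pi, A, B, obs):
    """Most likely state path. pi (S,), A (S, S), and B (S, M) hold
    (T, I, F) triples, which are scored first; the recursion is then
    the standard Viterbi algorithm in log space."""
    lp = np.log([[score(x) for x in row] for row in A])
    lb = np.log([[score(x) for x in row] for row in B])
    d = np.log([score(x) for x in pi]) + lb[:, obs[0]]
    back = []
    for o in obs[1:]:
        m = d[:, None] + lp              # m[i, j]: best score ending in j via i
        back.append(m.argmax(axis=0))
        d = m.max(axis=0) + lb[:, o]
    path = [int(d.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

# Two states, two symbols; state 0 favours symbol 0, state 1 favours symbol 1.
hi, lo = (0.9, 0.1, 0.1), (0.1, 0.5, 0.8)
stay, move = (0.8, 0.1, 0.1), (0.2, 0.4, 0.6)
pi = [hi, lo]
A = [[stay, move], [move, stay]]
B = [[hi, lo], [lo, hi]]
print(neutro_viterbi(pi, A, B, [0, 0, 1, 1]))   # [0, 0, 1, 1]
```

Replacing `score` with genuine neutrosophic comparison and multiplication rules (as in Algorithm 1) changes the arithmetic but not the back-pointer structure of the recursion.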

Table 1. Comparison results of different HMM models with different numbers of states and Gaussian mixtures for an unregistered test set.

Table 2. Recognition rate of different sizes of SVD features for an unregistered test set (6 states/10 mixtures).

Table 3. Comparison with a similar free-hands signer-independent isolated SLR system.

Table 4. Comparison of results of different solutions stated in Ref. [73] that use isolated static gesture signs.

Table 5. Isolated gesture recognition results with different NHMM topologies (six states/four mixtures).

Table 6. Confusion matrix (10 samples per class) with experimentally obtained optimal parameters.
Common methods for selecting features include choosing the top-k singular vectors/values.
- Training Complexity: Training an NHMM typically involves parameter estimation via the Baum-Welch algorithm. The time complexity of training an NHMM can be quite high, and it is often polynomial in the number of observations and the number of states, with complexity O(S T_O I), where S is the number of states, T_O is the number of observations, and I is the number of iterations required for convergence.
- Inference Complexity: Inference in NHMMs involves computing the likelihood of the observed data given the model, which typically requires the Viterbi algorithm. The time complexity of inference in NHMMs is often polynomial in the number of states and the number of observations, with complexity O(S² T_O).
- Prediction Complexity: If the prediction of future observations is required, it generally has the same complexity as inference.
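The stated costs can be made concrete with rough operation counts. These mirror the big-O terms above with constants ignored; they are illustrative tallies, not measured runtimes:

```python
def train_ops(S, T_O, I):
    """Rough operation count matching the stated training cost O(S * T_O * I)."""
    return S * T_O * I

def infer_ops(S, T_O):
    """Viterbi recursion: each step scores all S x S transitions -> O(S^2 * T_O)."""
    return S * S * T_O

# e.g. 6 states, 50 observations, 20 Baum-Welch iterations:
print(train_ops(6, 50, 20))   # 6000
print(infer_ops(6, 50))       # 1800
```

Doubling the number of states doubles the training tally but quadruples the inference tally, which is why inference, O(S² T_O), dominates as the state count grows.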