Artificial Intelligence-based algorithms in medical image scan segmentation and intelligent visual-content generation -- a concise overview

Recently, Artificial Intelligence (AI)-based algorithms have revolutionized medical image segmentation. Precise segmentation of organs and their lesions may contribute to efficient diagnostics, a more effective selection of targeted therapies, and an increase in the effectiveness of medical training. In this context, AI may help automate the image scan segmentation process and increase the quality of the resulting 3D objects, which may lead to the generation of more realistic virtual objects. In this paper, we focus on AI-based solutions applied in medical image scan segmentation and intelligent visual-content generation, i.e. computer-generated three-dimensional (3D) images in the context of Extended Reality (XR). We consider the different types of neural networks used, with a special emphasis on the learning rules applied, taking into account algorithm accuracy and performance as well as open data availability. This paper attempts to summarize the current development of AI-based segmentation methods in medical imaging and of intelligent visual-content generation applied in XR. It also discusses possible developments and open challenges for AI in Extended Reality-based solutions. Finally, future lines of research and development directions for Artificial Intelligence applications in both medical image segmentation and Extended Reality-based medical solutions are discussed.


Introduction
The human brain, a paramount example of evolutionary biological sophistication, transcends its anatomical categorization. Constituted by an estimated 86 billion neurons linked through an intricate web of synapses (numbering in the trillions), it is the epicenter of our cognitive, emotional, and consciousness-related functions [1]. This masterful structure of the central nervous system represents a nexus of myriad neurobiological processes, intricately overseeing sensory input conversion, motoric responses, and advanced cognitive functionalities. As a product of relentless evolutionary adaptation spanning millions of years, the brain epitomizes the apex of neurobiological optimization, synergizing complex neural circuitry with higher-order cognitive undertakings such as reasoning, emotional homeostasis, and the intricate processes of memory encoding, storage, and retrieval [1][2][3]. Thus, the human brain is a super-complex system whose functioning and intelligence depend on the types of neurons (according to their role in the brain), their connections, and the way energy is supplied to neurons, rather than on the sheer number of neurons [2]. It is an ideal reference model for the foundations of Artificial Intelligence (AI) [3,4].
Processing and analysis of biomedical data for diagnostic purposes is thus a multidisciplinary field that combines AI, Machine Learning (ML), biostatistics, time series analysis, statistical physics, and algebra (e.g. graph theory) [3]. Variables derived from biomedical phenomena can be described in several ways and in different domains (time, frequency, spectral values, state spaces describing the biological system), depending on the characteristics and type of signal. Effective diagnosis of the early stages of a disease, as well as the determination of disease development trends, is a very difficult problem that requires taking many factors and parameters into account. The state spaces of biomedical signals are therefore huge and impossible to fully search, analyze, and classify, even with powerful computational resources. It is thus necessary to use Artificial Intelligence, in particular bio-inspired AI methods, to limit the search to a smaller but significant part of the state space.
Recently, computer-generated three-dimensional (3D) images have become increasingly important in medical diagnostics [5,6]. In particular, Extended Reality (XR), the so-called Metaverse, is increasingly used in health care and medical education, since it enables a deeper experience of the virtual world, especially through the development of depth perception and the rendering of several modalities such as vision, touch, and hearing [7]. Medical images come in different modalities, and their accurate classification at the pixel level enables the accurate identification of disorders and abnormalities [8,7]. However, creating a 3D model of organs and/or their abnormalities is time-consuming and is often done manually or semi-automatically [10]. AI can automate this process and also contribute to increasing the quality of the resulting 3D objects [11,12] as well as of visual content in the Metaverse [4,13]. To give users a real sense of visual immersion, developers should implement virtual objects of high quality [14]. In the context of medicine, this requires good-quality medical data and high-accuracy classification/segmentation algorithms, so that the content can be faithfully reproduced in virtual three dimensions.
In this study, we aim to determine existing research gaps in the area of broadly understood medicine, including clinical trials, in the application of explainable Artificial Intelligence. For that reason, this paper presents an overview of Artificial Intelligence-based algorithms in medical image scan segmentation and intelligent visual-content generation in Extended Reality, covering the different types of neural networks and learning rules used, and taking into account mathematical/theoretical foundations, algorithm accuracy, and performance, as well as open data availability. Specifically, we aim to answer the research question of whether AI-based algorithms can improve medical image scan segmentation and intelligent visual-content generation. We concentrate on the theoretical foundations of neural communication, neuron models, types of neural networks, and learning rules, with a special emphasis on their application in medical image scan segmentation and intelligent visual-content generation. We analyze Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Spiking Neural Networks (SNNs), as well as Generative Adversarial Networks (GANs), Graph Neural Networks (GNNs), and Transformers. The first is built from the simplest neuron model (i.e. perceptrons) and can process information only in one direction. The second consists of multilayer perceptrons and contains one or more convolutional layers responsible for creating feature maps, which are subjected to nonlinear processing. RNNs save the output of the processing nodes and feed the result back into the network (bidirectional information processing). The last type is the closest to the real nervous system: SNNs transmit information only when the membrane potential of a neuron reaches a threshold, rather than in every propagation cycle as in the other listed neural networks. Another field that we analyze in the context of medicine is learning rules, including backpropagation (i.e. a rule in which the network weights are updated according to the chain rule applied to the partial derivatives of the error function), ANN-SNN Conversion (i.e. transforming trained ANNs into SNNs so that the learning rules that are efficient in ANNs can be exploited), Supervised Hebbian Learning (i.e. based on the postulate that neurons that fire together during learning strengthen their connection), Reinforcement Learning with Supervised Models (i.e. enabling the monitoring of the reaction to the learning signal), Chronotron (i.e. learning rules that take into account both the spiking neuron and the time of spiking), and biologically inspired network learning algorithms.

Neural communication
Neurons, the basic building blocks of the brain, function as its core computational units, underpinning the vast expanse of conscious and subconscious processes and defining our neural identity with each electrochemical interaction [11]. Neurons communicate with other neurons and with non-neuronal cells such as muscles and glands through biological connections called chemical synapses. These are the communication points at which the sending nerve cell, called the presynaptic neuron, transmits a message to the receiving nerve cell, called the postsynaptic neuron. The presynaptic neuron releases neurotransmitters, a diverse group of chemicals, into the synaptic cleft (i.e. the small gap at which neurons communicate). Following their release, these compounds traverse the synaptic gap and interact with receptors on the postsynaptic membrane, eliciting a series of intracellular events that potentially lead to the generation of an action potential, a transient depolarizing event propagated along the neuronal membrane.
Since the famous experiments of Adrian [17,177], it has been assumed that in nervous systems (including the brain) information is transmitted through weak electric currents (on the order of 100 mV), in particular by means of action potentials (spikes): transient, sudden (1-2 millisecond) changes in the membrane potential of the cell/neuron associated with the transmission of information [18]. The stimulus for the creation of an action potential is a change in the electric potential in the cell's external environment. A traveling action potential is called a nerve impulse. In the literature [19,20] it is assumed that sequences of such action potentials, called spike trains, play a key role in the transmission of information, and that the times of appearance of these action potentials are significant. Mathematically, such time sequences can be, and are, modeled (in particular after digitalization) as trajectories (or their various variants) of certain stochastic processes (Bernoulli, Markov, Poisson, ...) [19,[21][22][23][24][25][26][27].
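As a concrete illustration, a spike train from a homogeneous Poisson process can be approximated after digitalization by a Bernoulli trial in each small time bin. The sketch below uses only the standard library; the rate, duration, and seed are arbitrary illustrative values, not taken from the cited literature:

```python
import random

def poisson_spike_train(rate_hz, duration_s, dt=0.001, seed=42):
    """Bernoulli approximation of a Poisson process: in each bin of
    width dt, a spike occurs with probability rate_hz * dt."""
    rng = random.Random(seed)
    n_bins = int(duration_s / dt)
    return [t * dt for t in range(n_bins) if rng.random() < rate_hz * dt]

spikes = poisson_spike_train(rate_hz=20.0, duration_s=5.0)
# the empirical firing rate should be close to the requested 20 Hz
empirical_rate = len(spikes) / 5.0
```

For small `dt`, the Bernoulli bin model converges to the Poisson process, which is why both appear in the list of stochastic models above.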

Taxonomy of neural networks applied in the medical image segmentation process
The Artificial Neural Networks (ANNs) are constructed with the perceptron neuron model [28], which is based on a binary decision rule: if the linearly weighted (weights $w_i$) sum of the input signals (input vector $x$) exceeds the threshold $U_{th}$, the neuron fires (i.e. the output is equal to 1); otherwise the output is equal to 0.
The basic input function is described as follows:

$$z = \sum_i w_i x_i \qquad (1)$$

The output vector of all neurons in the $l$-th layer can be expressed as the combination of a linear transformation and a nonlinear mapping (i.e. the ANN activation values) [29]:

$$a^l = h\!\left(W^l a^{l-1}\right) \qquad (2)$$

where $W^l$ is the weight matrix between layer $l$ and layer $l-1$, $h(\cdot)$ denotes the activation function, in this case the Rectified Linear Unit (ReLU) $h(x) = x^+ = \max(0, x)$, and the vector $a^l$ denotes the output of all neurons in the $l$-th layer. The formula (2) is quoted following the notation of [29]. Neuron models from the Integrate-and-Fire family are among the simplest, yet also the most frequently used. They are classified as spiking models. From a biophysical point of view, action potentials are the result of currents flowing through ion channels in the membrane of nerve cells. The Integrate-and-Fire neuron model [30,31] focuses on the dynamics of these currents and the resulting changes in membrane potential. Therefore, despite numerous simplifications, these models can capture the essence of neuronal behavior in terms of dynamical systems.
The concept of the Integrate-and-Fire neuron is the following: the input ion stream depolarizes the neuron's cell membrane, increasing its electrical potential. An increase of the potential above a certain threshold value $U_{th}$ produces an action potential (i.e. an impulse in the form of a Dirac delta), after which the membrane potential is reset to the resting level. The leaky Integrate-and-Fire (LIF) neuron model [30,31] is an extension of the Integrate-and-Fire neuron in which the issue of time-independent memory is solved by equipping the cell membrane with a so-called leak. This mechanism causes ions to diffuse in the direction of lowering the potential toward the resting level or another level $U_0 \to U_r < U_{th}$. The third generation of neural networks, i.e. the Spiking Neural Networks (SNNs) [32], are mostly based on the LIF model, in which the membrane potential $U(t)$ is determined by the equation

$$\tau_m \frac{dU(t)}{dt} = -U(t) + R_m I(t) \qquad (3)$$

where $\tau_m$ is the membrane time constant of the neuron, $R_m$ is the total membrane resistance, and $I(t)$ is the electric current passing through the electrode. The spiking events are not explicitly modeled in the LIF model. Instead, when the membrane potential $U(t)$ reaches a certain threshold $U_{th}$ (spiking threshold), it is instantaneously reset to a lower value $U_{rest}$ (reset potential) and the leaky integration process starts anew with the initial value $U_r$. To add a bit of realism to the dynamics of the LIF model, it is possible to include an absolute refractory period $\Delta_{abs}$ immediately after $U(t)$ hits $U_{th}$. During the absolute refractory period, $U(t)$ may be clamped to $U_r$, and the leaky integration process is re-initiated following a delay of $\Delta_{abs}$ after the spike. More generally, the membrane potential (3) can be presented as

$$U(t) = \sum_{i=1}^{n} w_i\, \kappa(t - t_i), \quad t_i < t \qquad (4)$$

where $\kappa(\cdot)$ is a fixed causal temporal kernel, i.e. an operation that allows scale covariance and scale invariance in a causal temporal and recursive system over time [33], and $w_i$, $i = 1, \ldots, n$ denote the strengths of the neuron's synapses. Following Equation (2), the neuron's output $v^l(t)$ (membrane potential after the neuron fires) can be described as follows [29]:

$$m^l(t) = v^l(t-1) + W^l x^{l-1}(t), \quad l = 1, \ldots, L \qquad (5)$$

where $m^l(t)$ denotes the membrane potential before the neuron fires, $W^l$ is the weight matrix in the $l$-th layer ($l$ denotes the layer index), and $x^{l-1}(t)$ is the input from the previous layer. To avoid the loss of information, the "reset-by-subtraction" mechanism was introduced [34]:

$$v^l(t) = m^l(t) - s^l(t)\,\theta^l \qquad (6)$$

where $v^l(t)$ is the membrane potential after firing, $m^l(t)$ the membrane potential before firing, $s^l(t)$ refers to the output spikes of all neurons, and $\theta^l$ is a vector of the firing thresholds $\theta$. There are also applications of the meta-neuron model in SNNs [35]. The main difference between the LIF neuron and meta neurons lies in the integration process, where meta neurons use a second-order ordinary differential equation and an additional hidden variable. The basic differences between ANNs and SNNs (taking into account the type of neuron models) are presented in Figure 1.
Figure 1. The scheme of the basic differences between ANN and SNN, taking into account the type of neuron models.
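The LIF dynamics described by Equation (3) can be sketched in a few lines with discrete-time Euler integration. This is a minimal illustration; the time step, time constant, threshold, and reset level below are arbitrary demonstration values:

```python
def simulate_lif(I, dt=0.1, tau_m=10.0, R_m=1.0, U_rest=0.0, U_th=1.0):
    """Euler integration of tau_m * dU/dt = -(U - U_rest) + R_m * I(t).

    Returns the membrane-potential trace and the spike step indices.
    """
    U = U_rest
    trace, spikes = [], []
    for step, current in enumerate(I):
        # leaky integration toward U_rest, driven by the input current
        U += dt / tau_m * (-(U - U_rest) + R_m * current)
        if U >= U_th:          # threshold crossing -> emit a spike
            spikes.append(step)
            U = U_rest         # reset to rest (no refractory period here)
        trace.append(U)
    return trace, spikes

# a constant suprathreshold current produces a regular spike train
trace, spikes = simulate_lif([1.5] * 500)
```

With a constant input current the model fires periodically, while a subthreshold current only charges the membrane toward its steady-state value without ever spiking.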

Convolutional Neural Network
The most commonly used deep neural network (DNN) in medical image classification is the two-dimensional (2D) Convolutional Neural Network (CNN) [36,37]. In Figure 2 the basic scheme of the CNN is presented. Its principle of operation is based on linear algebra, in particular matrix multiplication. CNNs consist of three types of layers: a convolutional layer, a pooling layer, and a fully connected layer. In fact, most computations are performed in the convolutional layer or layers. The image (pixels) is converted into numerical values and patterns are searched for. Every convolutional layer computes a dot product between two matrices: one matrix is a set of learnable parameters (the kernel), and the second matrix is a limited part of the receptive field. Each subsequent layer contains a filter/kernel that allows features to be classified with greater efficiency. A pooling layer reduces the number of parameters in the input, which causes the loss of part of the information computed in the convolutional layer/layers, but improves the efficiency of the CNN. This operation is performed with sliding windows [38]. Next, the output of these two layers is transformed into a one-dimensional vector, i.e. the input to the fully connected layer. In this last type of layer, image classification based on the features extracted in the previous layers is performed, i.e. the object in the image is recognized. The output $y_{i,j}^{(k)}$ of a CNN layer can be described as follows:

$$y_{i,j}^{(k)} = f\!\left( \sum_{c} \sum_{m,n} w_{m,n}^{(c,k)}\, x_{i+m,\, j+n}^{(c)} + b^{(k)} \right) \qquad (7)$$

where $x_{i,j}^{(c)}$ denotes the input to the network at the spatial location $(i, j)$ in channel $c$, $f$ is the activation function, $w_{m,n}^{(c,k)}$ is the weight of the $k$-th kernel at the $c$-th channel producing the $k$-th feature map, and $b^{(k)}$ is the bias for the $k$-th feature map.
In the case of large datasets, CNNs achieve high efficiency and are resistant to noise [39]. The crucial disadvantages of CNNs in image processing are high computational requirements and difficulty in achieving high efficiency on small datasets (i.e. if the dataset is too small, the network may overfit to the training data and poorly recognize new data).
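The convolutional layer's dot product between a kernel and a receptive field, as in Equation (7), can be sketched for the single-channel case in plain Python. The image and kernel below are illustrative toy values:

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Valid 2D convolution of a single-channel image with one kernel,
    followed by ReLU -- a single-channel sketch of Equation (7)."""
    H, W = len(image), len(image[0])
    kH, kW = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kH + 1):
        row = []
        for j in range(W - kW + 1):
            # dot product between the kernel and the receptive field at (i, j)
            s = sum(kernel[m][n] * image[i + m][j + n]
                    for m in range(kH) for n in range(kW))
            row.append(max(0.0, s + bias))   # ReLU activation
        out.append(row)
    return out

# a vertical-edge detector on a tiny image with a step edge
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]   # responds where intensity rises left-to-right
feature_map = conv2d_relu(image, kernel)
```

The feature map is nonzero exactly at the columns where the intensity step occurs, which is the "pattern searching" behavior the paragraph above describes.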

Recurrent Neural Network
Another neural network commonly applied in medical data analysis is the Recurrent Neural Network (RNN) [40]. In Figure 3 the basic scheme of the RNN is presented. This type of network contains at least one feedback connection. The output of an RNN can be expressed as [41]

$$h_t = \mathcal{H}\!\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right), \qquad y_t = W_{hy} h_t + b_y \qquad (8)$$

where $x_t$, $t = 1, \ldots, T$ is the input sequence of $T$ states $(x_1, \ldots, x_T)$ with $x_t \in \mathbb{R}^d$; $W_{xh}$, $W_{hy}$, $W_{hh}$ denote weight matrices; $b_h$, $b_y$ are bias vectors; and $\mathcal{H}$ is a non-linear activation function, for example ReLU, the sigmoid $\sigma(x) = \frac{1}{1+e^{-x}}$, or the hyperbolic tangent $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$. The network operation is recursive, since the hidden-layer state depends on the current input and the previous state of the network. Thus, the hidden state $h_{t-1}$ is the memory of past inputs.
Thus, the RNN can operate on sequential datasets and has an internal memory. It may have many inputs. However, RNNs exhibit learning-related problems, namely vanishing gradients (i.e. when gradients are small, the parameter updates become insignificant) and exploding gradients (i.e. the superposition of large error gradients leads to very large parameter updates). These contribute to long training processes, a low level of accuracy, and low network performance.
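The recurrence in Equation (8) can be sketched as a simple unrolled loop with tanh as the hidden activation. The matrix sizes and random initialization below are purely illustrative:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll a vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),
    y_t = W_hy h_t + b_y.  The hidden state h carries the memory."""
    h = np.zeros(W_hh.shape[0])
    ys = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # depends on current input AND past state
        ys.append(W_hy @ h + b_y)
    return np.array(ys), h

rng = np.random.default_rng(0)
W_xh, W_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
W_hy, b_h, b_y = rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2)
xs = rng.normal(size=(5, 3))          # a sequence of five 3-d inputs
ys, h_last = rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y)
```

Because tanh is bounded, every component of the hidden state stays in [-1, 1], which is also the mechanism behind the vanishing-gradient problem mentioned above: repeated saturating activations shrink the backpropagated error.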

Spiking Neural Networks
Besides the artificial neural networks discussed above, i.e. CNNs and RNNs, bio-inspired neural networks such as Spiking Neural Networks can also be applied to medical signals [41,42]. In Figure 4 the basic scheme of the SNN is presented. SNNs encode information using spike signals and are promising for effectuating more complicated tasks, since more spatiotemporal information can be encoded with spike patterns [43]. They are mostly based on the LIF neuron model. SNNs were formulated to map organic neurons, i.e. the appearance of a presynaptic spike at a synapse triggers an input signal $I(t)$ (the value of the current), which in the simplified case can be written as follows:

$$I(t) = \sum_{t_f \in S,\; t_f \le t} e^{-(t - t_f)/\tau_s} \qquad (9)$$

where $\tau_s$ denotes the synaptic time constant, $S$ is a presynaptic spike train, and $t$ is time [44]. In contrast, the majority of DNNs do not take temporal dynamics into account [45]. In fact, SNNs show promising capability in achieving performance similar to that of living brains. Moreover, the binary activation in SNNs enables the development of dedicated hardware for neuromorphic computing [46]. The potential benefits are low energy usage and greater parallelizability due to the local interactions.
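A minimal sketch of such an exponentially decaying synaptic input current triggered by presynaptic spikes; the spike times and the time constant are illustrative values:

```python
import math

def synaptic_current(t, spike_times, tau_s=5.0):
    """Sum of exponentially decaying contributions, one per presynaptic
    spike that has already occurred by time t."""
    return sum(math.exp(-(t - t_f) / tau_s)
               for t_f in spike_times if t_f <= t)

# two presynaptic spikes; the current jumps at each spike and decays between them
spike_times = [2.0, 10.0]
current = [synaptic_current(t, spike_times) for t in range(20)]
```

Each spike adds a unit-amplitude contribution that decays with time constant `tau_s`, so the trace jumps at t = 2 and t = 10 and relaxes toward zero in between.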

Learning algorithms
The heart of Artificial Intelligence is its learning algorithms. At their core, they strive to automate the learning process, enabling machines to recognize patterns, make decisions, and predict outcomes based on data. Their design is often a balance between theoretical rigor and practical applicability. While mathematics and statistics provide the foundation, translating these into algorithms that can operate on vast and diverse datasets requires creative programming skills [22]. One can distinguish many types of network training algorithms [47]. Below we briefly discuss the most important of them, taking into account their theoretical foundations.

Back Propagation Algorithm
The most commonly used learning algorithm is the back propagation (BP) algorithm. It iterates over weight optimizations via error propagation in the neural network. BP plays a pivotal role in enabling neural networks to recognize complex and non-linear patterns in large datasets [23,48,49]. From the mathematical point of view, it is the calculation of a cost function that minimizes the error of the output using gradient descent or the delta rule [50]. It can be split into three stages: forward calculation, backward calculation, and computing the updated biases and weights. The input to the hidden layer $h_j$ is the weighted sum of the outputs of the input neurons and can be described as [51]

$$h_j = b_j + \sum_{i=1}^{n} w_{ij} x_i \qquad (10)$$

where $x_i$ is the input to the network (input layer), $n$ is the number of neurons in the input layer, $b_j$ is the bias of the input layer, and $w_{ij}$ denotes the weight associated with the $i$-th input neuron and the $j$-th hidden neuron. The output $y_k$ is as follows:

$$y_k = f\!\left(b_k + \sum_{j=1}^{m} w_{jk}\, h_j\right) \qquad (11)$$

where $f(\cdot)$ is a transfer function, $m$ is the number of neurons in the hidden layer, and $b_k$ is the bias of the hidden layer. The most commonly used transfer function is the sigmoid $f(h) = \frac{1}{1 + e^{-h}}$. The back propagation algorithm is especially effective when used in multi-layered neural architectures such as feed-forward neural networks, convolutional neural networks, and recurrent neural networks [26]. In image recognition, CNNs, energized by BP, can independently identify hierarchical features, from basic edges to detailed structures. Similarly, RNNs, amplified by BP, are adept at sequence-driven tasks such as machine translation or speech recognition, as they incorporate previous data to influence present outputs. It is one of the most effective deep learning methods. However, BP requires large amounts of data and enormous computational effort.
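The three stages named above (forward calculation, backward calculation, and updating weights and biases) can be sketched on the classic XOR task. The network size, learning rate, and epoch count are illustrative choices, not values from the cited works:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(epochs=2000, lr=0.5, seed=0):
    """One-hidden-layer network trained with plain backpropagation."""
    rng = np.random.default_rng(seed)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])          # XOR targets
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    losses = []
    for _ in range(epochs):
        # 1) forward calculation
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        losses.append(float(np.mean((out - y) ** 2)))
        # 2) backward calculation: error propagated via partial derivatives
        d_out = (out - y) * out * (1.0 - out)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        # 3) computing the updated weights and biases (gradient descent)
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0)
    return out, losses

predictions, losses = train_xor()
```

The mean squared error decreases over training, illustrating the gradient-descent minimization of the cost function described above.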

ANN-SNN Conversion
Artificial Neural Networks and Spiking Neural Networks are both computational models inspired by biological neural networks. While ANNs have been the mainstream for most deep learning applications due to their simplicity and effectiveness, SNNs are gaining traction because they mimic the behavior of real neurons more closely by using spikes, or binary events, for communication. Obtaining accuracy with an SNN-based algorithm similar to that of an ANN-based one by, for example, a BP-type training rule consumes a lot of hardware resources, and the existing platforms have limited optimization possibilities. Thus, the conversion of ANNs to SNNs seeks to harness the energy efficiency and bio-realism of SNNs without reinventing the training methodologies [28]; it is based on the ReLU activation function and the LIF neuron model [52]. The basic principle of the conversion of ANNs to SNNs is mapping the activation value of an ANN neuron to the average postsynaptic potential (in fact, the firing rate) of SNN neurons, and the change of the membrane potential (i.e. the basic function of spiking neurons) can be expressed by the combination of Equation (2) and Equation (6) [29]:

$$v^l(t) - v^l(t-1) = W^l x^{l-1}(t) - s^l(t)\,\theta^l \qquad (12)$$

Here $s^l(t)$ refers to the output spikes of all neurons in layer $l$ at time $t$.
Tuning the right thresholds is paramount for the SNN to represent information effectively and accurately. Incorrectly set thresholds can lead to either too frequent or too rare spiking, potentially affecting the accuracy of the SNN post-conversion [35]. On the other hand, neuromorphic hardware platforms that support SNNs natively can offer energy-efficiency benefits primarily through converting ANNs to SNNs. Due to their event-driven nature, SNNs can be more computationally efficient [36]. However, the challenge lies in maintaining accuracy post-conversion. Some information may be lost during the transition, and not all ANN architectures and layers convert neatly to their SNN equivalents. The conversion from ANNs to SNNs is a promising direction, merging the advanced training methodologies of ANNs with the energy efficiency of SNNs. As we delve deeper into the realm of neuromorphic computing, this conversion process will play a pivotal role in bridging traditional deep learning with biologically inspired neural models [37,38].
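The core conversion idea, i.e. that the firing rate of an integrate-and-fire neuron with reset-by-subtraction (Equation (6)) approximates the ReLU activation of the corresponding ANN neuron, can be sketched as follows; the threshold and simulation length are illustrative:

```python
def if_rate(x, T=100, theta=1.0):
    """Integrate-and-fire neuron with reset by subtraction.

    For a constant input x, the firing rate over T steps approximates
    ReLU(x)/theta -- the mapping exploited by ANN-SNN conversion."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += x                 # charge: potential accumulates the input
        if v >= theta:
            spikes += 1
            v -= theta         # reset by subtraction keeps the residual
    return spikes / T

# the spike rate tracks the ReLU activation for inputs in [0, theta]
rates = [if_rate(x) for x in (0.3, 0.7, -0.5)]
```

Subtracting (rather than zeroing) the threshold at each spike preserves the residual potential, which is exactly the information-loss problem the reset-by-subtraction mechanism addresses.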

Supervised Hebbian Learning (SHL)
Taking into account Artificial Intelligence, Supervised Hebbian Learning (SHL) can be described as a general methodology for weight changes [53]. A weight increases when two neurons fire at the same time, while it decreases when two neurons fire independently. According to this rule, the change in weight can be written as

$$\Delta w = \eta\,(t_a - t_d) \qquad (13)$$

where $\eta$ is the learning rate (in fact a small scalar that may vary with time, $\eta > 0$), $t_a$ is the actual time of the postsynaptic spike, and $t_d$ is the time of firing of the presynaptic spike [54,55]. The crucial disadvantage of Hebbian learning is that its efficiency decreases as the number of hidden layers increases, although with 4 layers it is still competitive [56].
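The qualitative Hebbian principle can be sketched with the classic rate-based form Δw = η·x·y, a simplification of the spike-time rule above in which a weight grows only where pre- and postsynaptic activity coincide; all values below are illustrative:

```python
def hebbian_update(w, pre, post, eta=0.1):
    """Rate-based Hebbian step: weights grow where pre- and postsynaptic
    activity coincide, and stay unchanged where either side is silent."""
    return [wi + eta * x * post for wi, x in zip(w, pre)]

w = [0.0, 0.0, 0.0]
for _ in range(10):                     # repeated co-activation
    w = hebbian_update(w, pre=[1.0, 0.0, 1.0], post=1.0)
```

After ten co-activations, only the weights of the active inputs have grown; the silent middle input keeps its weight at zero, mirroring the "fire together, wire together" postulate.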

Reinforcement Learning with Supervised Models
By adding constraints to the SHL rule, Reinforcement Learning with Supervised Models (ReSuMe) was proposed [54]. ReSuMe is a dynamic hybrid learning paradigm. It effectively combines the resilience of Reinforcement Learning (RL) with the precision of Supervised Learning (SL). This fusion empowers ReSuMe to leverage the feedback-driven mechanisms inherent in RL and to take advantage of the labeled guidance typical of SL [37,38,39]. The difference with respect to SHL is that the learning signal is expected to have no, or only a marginal, direct effect on the value of the postsynaptic somatic membrane potential [57]; the synaptic weights are thus modified as follows:

$$\frac{dw(t)}{dt} = \eta\,\big(S_d(t) - S_o(t)\big)\,\bar{S}_{in}(t) \qquad (14)$$

where $\eta$ denotes the learning rate, $S_d(t)$ is the desired/target spike train, $S_o(t)$ is the output of the network (spike train), and $\bar{S}_{in}(t)$ expresses the low-pass filtered input spike train. One of ReSuMe's most salient features is guided exploration. By leveraging labeled data via SL, ReSuMe can effectively steer RL exploration, ensuring agents avoid falling into the trap of suboptimal policies. The hybrid nature of ReSuMe also grants it a unique resilience, especially in the face of noisy data or in reward-scarce environments. Moreover, its adaptability is noteworthy, making it an ideal choice for tasks that combine immediate feedback (through SL) with long-term strategic maneuvers (through RL). However, ReSuMe is not without challenges. A potential bottleneck is computational complexity, as managing both RL and SL can strain computational resources. Another challenge is the precise tuning of the $\eta$ coefficient. The key is to find a balance where neither RL nor SL overly dominates the learning process. By melding immediate feedback from supervised learning with a deep reinforcement learning strategy, ReSuMe establishes itself as a formidable tool in Machine Learning [49,50,52].
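A minimal discrete-time sketch of a ReSuMe-style update, assuming the weight change is driven by the difference between the desired and actual output spikes, gated by the filtered presynaptic activity; the function form and constants are illustrative simplifications:

```python
def resume_step(w, s_d, s_o, s_in_filtered, lr=0.05, a=0.1):
    """One discrete update: the weight moves in proportion to the
    difference between the desired (s_d) and actual (s_o) output spikes,
    gated by the low-pass filtered presynaptic activity."""
    return w + lr * (s_d - s_o) * (a + s_in_filtered)

w = 0.5
w_up = resume_step(w, s_d=1, s_o=0, s_in_filtered=0.8)    # missed desired spike
w_down = resume_step(w, s_d=0, s_o=1, s_in_filtered=0.8)  # spurious output spike
```

A missed desired spike strengthens the synapse, a spurious output spike weakens it, and when the output matches the target the weight stays put, which is the supervised error signal at the heart of the rule.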

Chronotron
The Chronotron, by its essence, challenges and reshapes our understanding of how information can be encoded and processed in neural structures [50,55]. Traditional neural models have predominantly focused on the spatial domain, emphasizing the architecture and interconnections between neurons. While this spatial component is undeniably critical, it offers only part of the full informational symphony that the brain plays. Just as the rhythm and cadence of a song contribute as much to its essence as its melody, in the vast theater of the brain timing is not just a factor; it is a storyteller in its own right. The brilliance of the Chronotron lies in its ability to discern and respond to this temporal narrative. Unlike its counterparts, which often treat time as a secondary parameter, the Chronotron places it center stage. As a consequence, it acknowledges and leverages the intricate interplay of spatial and temporal dynamics in neural computation. This means that it does not just consider which neurons are firing, but also pays meticulous attention to when they fire with respect to one another. The membrane potential is

$$u(t) = \sum_{j} w_j \sum_{t_j^f \le t} \kappa\big(t - t_j^f\big) \qquad (15)$$

where the model's refractoriness is caused by past spikes, $w_j$ is the synaptic efficacy, $t_j^f$ is the time of appearance of the $f$-th presynaptic spike at the $j$-th synapse, and $\kappa(t - t_j^f)$ denotes a normalized kernel [58]. When $u(t)$ reaches the threshold level, a spike is fired and $u(t)$ is reset to the reset potential. In this approach, it is crucial to find an appropriate error function, i.e. an error function that can be minimized with a gradient descent method [59]. The advantage of this learning rule is that it uses the same coding for inputs and outputs. The Chronotron's hallmark, its granularity, can sometimes surge computational demands, especially during intense training. And like many cutting-edge neural frameworks, harnessing the Chronotron's full potential can be intricate, necessitating fine-tuned parameters and rich, well-timed data.

Bio-inspired Learning Algorithms
Brain-inspired Artificial Intelligence approaches, in particular spiking neural networks, are becoming a promising energy-efficient alternative to traditional artificial neural networks [60]. However, the performance gap between SNNs and ANNs has been a significant obstacle to the wide application of SNNs. To fully exploit the potential of SNNs, including the detection of irregularities in biomedical signals and the design of more specific networks, their training mechanisms should be improved; one possible direction of development is bio-inspired learning algorithms. Below we briefly discuss the most important of them.

Spike Timing Dependent Plasticity
Spike Timing Dependent Plasticity (STDP) is rooted in the idea that the precise timing of neural spikes critically affects changes in synaptic strength [61]. This principle highlights the intricate dance between time and neural activity, showcasing the dynamics of our neural circuits. This biologically plausible learning rule is a timing-dependent specialization of Hebbian learning (13) [62]. STDP sheds light on the intricate interplay between timing and synaptic modification. It is based on the change in a synaptic weight function

$$\Delta w = (\eta + \xi)\, W\!\big(s;\, t_{post} - t_{pre}\big) \qquad (16)$$

where $\eta$ denotes the learning speed, $\xi$ is Gaussian white noise with zero mean, and $W(s;\, t_{post} - t_{pre})$ is the function that determines long-term potentiation (LTP, i.e. the presynaptic spike precedes the postsynaptic one) and long-term depression (LTD, i.e. the presynaptic spike follows the postsynaptic one) in the time window $t_{post} - t_{pre}$:

$$W(s;\, \Delta t) = \begin{cases} A_+(w)\, e^{-\Delta t/\tau_+} & \text{for } \Delta t > 0 \\ -A_-(w)\, e^{\Delta t/\tau_-} & \text{for } \Delta t \le 0 \end{cases} \qquad (17)$$

where $A_\pm(w)$ are scaling functions that determine the weight dependence, and $\tau_-$ denotes the time constant for depression [61][62][63]. STDP's significance is underpinned by its numerous advantages. Chiefly, it offers a biologically authentic model by mimicking the temporal dynamics observed in real neural systems. Furthermore, its event-centric nature promotes unsupervised learning, enabling networks to autonomously adjust based on the temporal patterns present in the input data. This time-based sensitivity equips STDP to adeptly process data with spatiotemporal attributes and to detect intricate temporal relationships within neuronal signals [64,65]. However, STDP is not without its complexities. A prominent challenge is the fine-tuning of parameters. The exact values assigned to constants such as $A_\pm(w)$ and $\tau_\pm$ can substantially dictate the behavior and efficacy of STDP-informed networks. Balancing these values requires a meticulous approach. Moreover, the precision demanded by STDP's time-centric nature often calls for higher computational rigor, especially within simulation contexts. STDP stands as a testament to the elegance and intricacy of neural systems. By emphasizing the role of spike timing, it offers a vivid depiction of how synaptic interactions evolve [66,67].
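The two-branch STDP window with exponential potentiation and depression, as in Equation (17), can be sketched as follows; the amplitudes and time constants are illustrative values with constant (weight-independent) scaling:

```python
import math

def stdp_dw(dt, a_plus=0.1, a_minus=0.12, tau_plus=20.0, tau_minus=20.0):
    """STDP window: potentiation when the presynaptic spike precedes the
    postsynaptic one (dt = t_post - t_pre > 0), depression otherwise."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)     # LTP branch
    return -a_minus * math.exp(dt / tau_minus)       # LTD branch

# sample the window over a range of spike-time differences (in ms)
curve = [(dt, stdp_dw(dt)) for dt in range(-50, 51, 10)]
```

The weight change is largest for near-coincident spikes and decays exponentially as the spikes move apart in time, in either direction.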
5.6.2 Spike-Driven Synaptic Plasticity
Spike-Driven Synaptic Plasticity (SDSP) offers the ability to elucidate causality in neural communication. It operates on a fundamental principle: the sequence and timing of spikes determine whether a synapse strengthens or weakens. If a neuron consistently fires just before its downstream counterpart, this is a strong indication of its influential role in the latter's activity. Such "pre-before-post" firing often leads to synaptic strengthening, cementing the relationship between the two neurons. Conversely, if the sequence is reversed, with the downstream neuron firing before its predecessor, the connection may weaken, reflecting a lack of causal influence [68,69]. This causative aspect of SDSP provides valuable insights into the learning mechanisms of the brain. It suggests that our neural circuits are continually evolving, adjusting their connections based on the flow of spike-based information. Such adaptability ensures that our brains remain receptive to new information, enabling us to learn and adjust to ever-changing environments. Moreover, SDSP emphasizes the significance of precise spike timing. In the realm of neural computation, milliseconds matter. Small shifts in spike timing can change a synapse's fate, showcasing the brain's precision and sensitivity. This meticulousness in spike-driven modifications underscores the importance of timing in neural computations, hinting at the brain's capacity to encode and process temporal patterns with remarkable accuracy [70][71][72][73]. In this learning rule, the change in synaptic weight can be expressed as [64]

∆w = A₊ exp(−∆t/τ₊) for ∆t > 0, and ∆w = A₋ exp(∆t/τ₋) for ∆t < 0,

where A₊ > 0 and A₋ < 0 denote the learning parameters, τ₊ and τ₋ are time constants, and ∆t is the time difference between the post- and pre-synaptic spikes. This representation, while streamlined, encapsulates the principle that the mere presence of a spike can induce modifications in the synaptic weight, either strengthening or weakening the connection based on the specific neural context and the directionality of the spike's influence [70].
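The pair-based weight change described above can be sketched in a few lines. The following is a minimal illustration only; the parameter values are arbitrary placeholders, not taken from the cited works:

```python
import math

def sdsp_weight_change(delta_t, a_plus=0.1, a_minus=-0.12,
                       tau_plus=20.0, tau_minus=20.0):
    """Pair-based spike-driven update for delta_t = t_post - t_pre (ms).

    Pre-before-post firing (delta_t > 0) potentiates the synapse (LTP);
    post-before-pre (delta_t < 0) depresses it (LTD). a_plus > 0 and
    a_minus < 0 are the learning parameters; tau_* are the time constants.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    elif delta_t < 0:
        return a_minus * math.exp(delta_t / tau_minus)
    return 0.0

# Pre fires 5 ms before post -> strengthening; reversed order -> weakening.
ltp = sdsp_weight_change(5.0)   # positive change (LTP)
ltd = sdsp_weight_change(-5.0)  # negative change (LTD)
```

Note how the magnitude of the change decays exponentially with |∆t|, which captures the "milliseconds matter" sensitivity discussed above.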
The appeal of Spike-Driven Synaptic Plasticity is manifold; its primary virtue is its biological relevance. Focusing on individual spike occurrences mirrors the granular events that take place in real neural systems. Such an approach facilitates the modeling of neural networks in scenarios where individual spike occurrences are of paramount importance. Furthermore, by anchoring plasticity on singular events, this model is inherently suitable for real-time learning and rapid adaptability in dynamic environments.
A crucial challenge lies in the accurate capture and interpretation of individual spikes, especially in densely firing neural environments. Moreover, the plasticity model's sensitivity to single events means that it can be susceptible to noise, requiring sophisticated filtering mechanisms to discern genuine learning events from spurious spikes. SDSP thus elucidates the profound influence of singular neuronal events on the grand tapestry of neural learning and adaptation [75].

5.6.3 Tempotron Learning Rule
One of the most interesting biologically inspired learning algorithms is the tempotron rule [65,76,77]. It is designed to adapt synaptic weights based on the precise temporal patterns of incoming spikes, rather than only the frequency of such spikes. While traditional neural models might emphasize synaptic weights or connection topologies, the tempotron underscores that the 'when' of a neural event can be as informative, if not more so, than the 'where' or 'how often' [78][79][80]. The tempotron learning rule is based on the LIF neuron model: the neuron fires when the membrane potential described by Equation (4) exceeds the threshold (a binary decision). Thus, one can define the potential of the neuron's membrane as a weighted sum of postsynaptic potentials (PSPs) from all incoming spikes [77]

V(t) = Σᵢ ωᵢ Σ_{tᵢ} K(t − tᵢ) + V_rest, (19)

with the normalized PSP kernel

K(t − tᵢ) = V₀ [exp(−(t − tᵢ)/τ) − exp(−(t − tᵢ)/τ_s)], (20)

where τ is the decay time constant of membrane integration, τ_s denotes the decay time constant of synaptic currents, and V₀ normalizes the PSP so that the maximum kernel value equals 1. The neuron fires when the membrane potential described by Equation (19) exceeds the firing threshold; afterwards, the potential smoothly decays to the resting value V_rest. In a segmentation/classification task, the input to the neuron may belong to one of two classes: P⁺, for which the neuron should fire when the stimulus (pattern) is presented, and P⁻, for which the neuron should not fire. Each input consists of N spike trains. The tempotron learning rule is then

∆ωᵢ = λ Σ_{tᵢ < t_max} K(t_max − tᵢ), (21)

where t_max is the time at which the membrane potential (19) reaches its maximum value, and λ is a constant that is greater than zero for P⁺ patterns and smaller than zero for P⁻ patterns. In this way, the tempotron implements gradient-descent dynamics, minimizing for each input pattern a cost function that measures the maximum voltage generated by erroneous patterns. In comparison to the STDP learning rule, the tempotron can make the appropriate decision under a supervisory signal while tuning fewer parameters than STDP; like STDP, it uses LTP and LTD mechanisms. A further advantage of the tempotron learning rule is its speed of learning.
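Equations (19)-(21) can be condensed into a short sketch. This is a minimal, illustrative implementation under the stated assumptions (toy parameter values, a single error step rather than a full training loop):

```python
import math

TAU, TAU_S = 15.0, 3.75  # membrane and synaptic decay time constants (ms), toy values

def psp_kernel(s, tau=TAU, tau_s=TAU_S):
    """Normalized PSP kernel K(s) = V0*(exp(-s/tau) - exp(-s/tau_s)) for s >= 0.

    V0 is chosen analytically so the kernel's maximum equals 1, as Eq. (20) requires.
    """
    if s < 0:
        return 0.0
    s_max = (tau * tau_s / (tau - tau_s)) * math.log(tau / tau_s)  # argmax of K
    v0 = 1.0 / (math.exp(-s_max / tau) - math.exp(-s_max / tau_s))
    return v0 * (math.exp(-s / tau) - math.exp(-s / tau_s))

def membrane_potential(t, spike_trains, weights, v_rest=0.0):
    """Eq. (19): V(t) = sum_i w_i * sum_{t_i <= t} K(t - t_i) + V_rest."""
    v = v_rest
    for w, train in zip(weights, spike_trains):
        v += w * sum(psp_kernel(t - ti) for ti in train if ti <= t)
    return v

def tempotron_update(spike_trains, weights, t_max_v, lam):
    """Eq. (21): w_i += lam * sum_{t_i < t_max} K(t_max - t_i).

    lam > 0 corrects a missed P+ pattern; lam < 0 corrects a false alarm on P-.
    """
    return [w + lam * sum(psp_kernel(t_max_v - ti) for ti in train if ti < t_max_v)
            for w, train in zip(weights, spike_trains)]
```

A positive λ applied after a missed P⁺ pattern raises exactly the weights of afferents whose spikes preceded the voltage maximum, which is the supervised, timing-sensitive correction described above.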

Neural networks and learning algorithms in the medical image segmentation process
Image segmentation plays a crucial role both in medical diagnosis supported by image analysis and in virtual object creation, such as medical digital twins (DTs) of organs [66,67], holograms of human organs [81,82], and virtual medical simulators [68,83]. One can split the image segmentation process into semantic segmentation (i.e., assigning a label or category to each pixel), instance segmentation (i.e., identifying and separating individual objects in an image and assigning a label to each), and panoptic segmentation (i.e., a more complex task that combines the two above) [77,78]. The application of AI increases the efficiency and speed of these processes [84]. Table 1 compares the AI-based algorithms applied in medical image scan segmentation, taking into account the neuron model, the type of neural network, the learning rule, and biological plausibility. It turns out that the networks most commonly used in image segmentation are CNNs, in particular the U-Net architecture and its variations [71,72,74,75,85]. In [73] the authors modified this network structure by adding dense and nested skip connections (UNet++), while [178] added residual blocks and attention modules to enable the network to learn deeper features and increase the effectiveness of segmentation. To combine efficient segmentation with access to global semantic information, CNNs are often combined with transformer blocks [85][86][87]. Another CNN-based algorithm commonly used in medical image segmentation is You Only Look Once (YOLO), open-source software released under the GNU General Public License v3.0 [88,179]. It uses one fully connected layer, a number (depending on the version) of convolution layers pre-trained with a CNN backbone (YOLO v1 ImageNet, YOLO v2 Darknet-19, YOLO v3 Darknet-53, YOLO v4 CSPNet, YOLO v5 EfficientNet, YOLO v6 EfficientNet-L2, YOLO v7 ResNet, YOLO v8 ResNet), and a pooling layer. The algorithm divides the input image into a grid of cells and then uses the CNN to generate bounding boxes and class predictions. Recently, SNNs have become more popular in image classification [78,79] due to their low power consumption; however, SNN training rules require refinement to reach ANN accuracy. In any case, the development of an efficient, automatic segmentation procedure is of high importance [207].
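Detector-style models such as YOLO score predicted bounding boxes against ground truth by their overlap. A minimal, self-contained sketch of the standard Intersection-over-Union computation (illustrative; not taken from any cited implementation):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

In detection pipelines this score is typically thresholded (e.g., IoU ≥ 0.5) to decide whether a predicted box counts as a correct detection.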
Recently, Transformer networks, originally designed for machine translation (a Natural Language Processing task), have been applied in the field of image processing, including medical image processing [180]. This architecture is based on network normalization, feed-forward networks, and residual structures (namely Multi-Head Attention (MHA) and Position-wise Feed-Forward Networks), while it does not contain any convolutions [181]. Such an architecture gives it a powerful ability to represent long-range dependencies. In the field of computer vision, transformer architectures comprise mainly Vision Transformers (ViT) and Swin Transformers [182,183]. The MHA has multiple attention modules that learn different aspects in different subspaces. In [184] it was shown that Transformers may achieve a higher level of efficiency in image processing compared to CNNs when trained on large datasets. To increase the applicability and accuracy of transformers in image processing, data augmentation and regularization strategies are used, among others [185]. On the other hand, vision transformers do not contain inductive biases. The combination of CNNs and Transformers has also been applied to image processing [186], which contributed to reducing the consumption of computing resources and training time [187,188]. The main disadvantages of Transformers are the need for large amounts of computational resources and long training times. Generative Adversarial Networks (GANs) are also applied to medical image fusion [190]. This approach divides the neural network into two parts: a generator (which learns to generate plausible data; the generated instances become negative examples for training the second part of the network) and a discriminator (i.e., a binary classifier that learns to distinguish the generator's output from real data, see Figure 5). The discriminator imposes a penalty on the generator for producing implausible results. The output of the generator is connected directly to the input of the discriminator, and in backpropagation the discriminator's classification provides the signal that the generator uses to update its weights. In effect, GANs are pairs of CNNs connected adversarially; the difference between the two parts is their approach to producing results. For example, in [191,192] GANs were successfully applied to the segmentation of retinal and coronary blood vessels with high accuracy. However, centralized training algorithms can potentially mishandle sensitive information such as medical data. Additionally, GANs have significant security issues, such as vulnerabilities that exploit the real-time nature of the learning process to generate prototype samples of private training sets [194]. Moreover, the use of deep neural networks such as CNNs and GANs is limited by the need for large annotated datasets, which is quite a challenge, especially in medicine [193]. All the above solutions are based on Euclidean-space data with fixed dimensions. However, data can also be represented in non-Euclidean space, i.e., as graphs: sets of objects (vertices) and relationships between these objects (edges) [195]. This kind of data has dynamic dimensions, i.e., the input data do not have to be in a particular order as in the case of Euclidean-space data [196]. Medical data also exhibit irregular spatial patterns that may be important from the diagnostic point of view. The analysis of these patterns is a challenge that has been proposed to be solved by applying Graph Neural Networks (GNNs) [197,202,204]. GNNs are based on the convolution operation on graphs, see Figure 6. One disadvantage of GNNs is that they depend strongly on the geometry of the graph; consequently, the network must be retrained every time data are added. In the context of large-image diagnostics, this can make the approach less practical. Also, the low computational speed of GNNs in medical data processing means that GNNs need further development for practical applications. As a solution to improve computational efficiency, GraphSAGE, a framework for inductive representation learning on large graphs, was proposed [198]. Thus, GNNs can expand the possibilities of training CNNs on non-grid data [199]. In the field of medical image segmentation, GNNs find application especially in tissue semantic segmentation in histopathology images [203,205,206]. In the case of tumor segmentation, applying a CNN alone leads to a number of parameters that entail high computational complexity. Here, the combination of CNN and GNN is a very promising solution, as in [206]: first, a two-layer CNN was applied to create the feature maps, and then two GNN layers were used to selectively filter out the discriminative features.
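The core of graph convolution, aggregating each node's neighborhood before mixing it with the node's own features, can be shown with a toy sketch. This is a deliberately simplified, GraphSAGE-flavored illustration: the scalar `weight` stands in for a learned weight matrix, which is an assumption for brevity:

```python
def graph_conv_layer(adj, features, weight):
    """One mean-aggregation graph convolution step (sketch).

    adj:      dict node -> list of neighbor nodes
    features: dict node -> feature vector (list of floats)
    weight:   scalar mixing factor between self and aggregated neighbor
              features (toy stand-in for a learned weight matrix)
    """
    out = {}
    for node, feats in features.items():
        neigh = adj.get(node, [])
        if neigh:
            dim = len(feats)
            # Mean-aggregate the neighbors' feature vectors
            mean = [sum(features[n][d] for n in neigh) / len(neigh)
                    for d in range(dim)]
        else:
            mean = [0.0] * len(feats)
        # Mix self features with the aggregated neighborhood
        out[node] = [(1 - weight) * f + weight * m for f, m in zip(feats, mean)]
    return out
```

Because the aggregation is defined per node over its neighbors, the same layer applies to graphs of any size or ordering, which is exactly the non-Euclidean flexibility discussed above; the dependence on `adj` also makes concrete why GNNs are sensitive to graph geometry.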
Recently, Sinusoidal Representation Networks (so-called SIRENs) have also been applied to image segmentation. The essence of this approach is the use of periodic activation functions for implicit neural representations; in practice, this AI solution mostly applies the sine as the periodic activation function. In [208] it was proposed for analyzing images, and in [209,210] for segmenting medical images (cardiac MRI). However, in the field of medical image segmentation this approach still requires improvement.
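The SIREN idea, a coordinate-based network with sine activations representing an image implicitly, can be sketched as follows. This is a toy, untrained illustration (the layer shapes and ω₀ value are assumptions, not from the cited works):

```python
import math

def siren_neuron(x, weights, bias, omega0=30.0):
    """One SIREN unit: sin(omega0 * (w . x + b)).

    The periodic activation keeps all derivatives smooth, which is the key
    property for implicit neural representations.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return math.sin(omega0 * z)

def siren_forward(coord, layer1, layer2, omega0=30.0):
    """Tiny coordinate network: pixel coordinate -> intensity (sketch).

    layer1: list of (weights, bias) tuples for the sine hidden layer.
    layer2: (weights, bias) for a linear output layer, as is common for
            implicit representations.
    """
    hidden = [siren_neuron(coord, w, b, omega0) for w, b in layer1]
    w, b = layer2
    return sum(wi * h for wi, h in zip(w, hidden)) + b
```

Trained on (coordinate, intensity) pairs, such a network stores the image in its weights; segmentation variants predict a label value per coordinate instead.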
Another interesting algorithm for natural image segmentation, recently developed (April 2023) by Meta, is the Segment Anything Model (SAM) [89,90]. This AI-based algorithm enables cutting out any object from an image with a single click. It uses CNNs and transformer-based architectures for image processing; in particular, transformer-based architectures are applied to extract the features, compute the embeddings, and prompt the encoder. First attempts have been made to apply it in the field of medical imaging; however, in medical segmentation it is still not as accurate as in other application fields [91,92]. The imperfections of the SAM algorithm in medical image segmentation are mainly connected to insufficient amounts of training data. In [93], the authors proposed the Med SAM Adapter to overcome the above limitations; pre-training methods such as Masked Autoencoder (MAE), Contrastive Embedding-Mixup (e-Mix), and Shuffled Embedding Prediction (ShED) were applied. There is a lot of work in the area of medical image segmentation using machine learning, but relatively little addresses the network learning process itself, which, along with the data, is a key element in achieving high accuracy [94].
A comparative overview of the neural network architectures, learning algorithms, and datasets applied in the field of image segmentation in medicine is shown in Table 1. It turns out that the most commonly used in these areas (taking prediction accuracy into account) are still ANNs and CNNs constructed with perceptron or LIF neuron models and BP learning rules. Thus, the learning algorithms most commonly used in medical image segmentation remain at a low level of biological plausibility. On the other hand, in other image segmentation tasks biologically plausible learning algorithms are applied, for example in the field of images of handwritten digits [77]. Table 1 presents works that contain information about the neuron model, the architecture and type of neural network, the input and output parameters of the network, and the type of learning algorithm. The segmented structures (in this case organs and their disorders) may next be applied to the development of 3D virtual environments [105]. These 3D objects may be implemented, for example, as holograms displayed in head-mounted displays (HMDs) such as Mixed Reality glasses, in medical diagnostics [113], pre-operative imaging [114], surgical assistance [115,116], robotic surgery [117], and medical education [81,82]. However, the crucial issue is the quality of the obtained segmented structures, and this process can be significantly accelerated and improved by the use of Artificial Intelligence.
A crucial issue when an AI-based system is developed is the accuracy and performance of its algorithms. Many metrics have been introduced in image segmentation that enable the evaluation of algorithms. They can be split into two types: binary metrics, which consider two classes, and multi-class metrics, in which the number of classes is higher than two. These metrics have been widely described in [163,164]. The metrics most commonly used in medical image segmentation are the binary F-measure, i.e., the F1-score (also called the Dice Coefficient) [165], Mean Absolute Error (MAE) [166], Mean Squared Error (MSE), Root-Mean-Squared Error (RMSE), Area Under the Receiver Operating Characteristic Curve (AUROC) [167], as well as the Index of Union [168].
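Two of the metrics listed above are simple enough to compute directly on flattened masks and predictions. A minimal sketch (illustrative conventions only, e.g. defining the Dice of two empty masks as 1.0):

```python
def dice_coefficient(pred, target):
    """Dice Coefficient / F1-score for binary masks given as flat 0/1 lists.

    Dice = 2 * |P ∩ T| / (|P| + |T|). Two empty masks are treated as a
    perfect match (score 1.0), a common but not universal convention.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total > 0 else 1.0

def mean_absolute_error(pred, target):
    """MAE over paired predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
```

For segmentation masks, Dice and the set-based F1-score coincide; MAE and its squared relatives (MSE, RMSE) are computed per pixel in the same flattened fashion.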

Data availability
One of the key issues in the development of AI algorithms in the field of medicine is the availability and quality of data, i.e., access to electronic health records (EHRs) [118,119]. Such medical data should be anonymized. Table 2 presents a summary of publicly available retrospective medical image scan databases. Some authors also provide anonymized data upon request. It is worth stressing that data, including medical image scans, are subject to various types of bias [120]. The databases listed in Table 2 do not contain precise information regarding, for example, the ethnic composition of the study participants; their age ranges and gender are usually disclosed. Moreover, another important issue concerning medical data is connected with segmentation errors in publicly available (Internet-sourced) datasets, as pointed out in [169]. The authors discovered that a publicly available dataset contained duplicate records, which may contribute to the over-learning of some patterns by AI and ML models and result in false predictions. A random procedure for splitting such a database into training and testing sets will then influence the results obtained and, as a consequence, may lead to inflated classification scores.
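The duplicate-leakage problem described above can be mitigated by deduplicating records before the random split, so an identical record can never appear on both sides. The following is a minimal sketch (a hypothetical helper, not the procedure from [169]; exact-duplicate detection via a content hash is an assumption, as near-duplicates would need fuzzier matching):

```python
import hashlib
import random

def split_without_leakage(records, test_fraction=0.2, seed=42):
    """Drop exact duplicates (by content hash) before a random train/test split.

    Without this step, a duplicated record can land in both the training and
    testing sets, inflating the measured classification accuracy.
    """
    seen, unique = set(), []
    for rec in records:
        digest = hashlib.sha256(repr(rec).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(unique)
    n_test = int(len(unique) * test_fraction)
    return unique[n_test:], unique[:n_test]

# Example: six records, one exact duplicate -> five unique records are split.
train, test = split_without_leakage(
    [("a", 1), ("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)])
```

For medical data, the same idea is usually applied at the patient level (all scans of one patient go to one side), which this exact-duplicate sketch does not cover.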

Discussion and conclusions
The most commonly used neural networks in the field of medicine are CNNs and ANNs, see Table 1. Moreover, although the combination of Transformers and CNNs, as well as GANs, allows users to achieve increasingly accurate results, these methods require refinement. It is also worth noting that diagnostic processes require the interpretation of visual scenes, and here GNN-based solutions such as scene graphs [200] and knowledge graphs [201] may be beneficial. It is also important to remember that GNNs are designed to perform tasks that networks like CNNs cannot. SIRENs also seem to be an interesting solution. What was surprising was the fact that many works on the use of Machine Learning do not contain a detailed description of the neural network architecture or the learning procedure, and even the description of the datasets is very general (i.e., AI is treated as a black box), although these are key issues responsible for the accuracy and reliability of the approach used. The effectiveness of learning algorithms is compared, among others, in terms of the number of learning cycles, the number of objective function evaluations, the number of floating-point multiplications, computation time, and sensitivity to local minima. In addition to the selection of appropriate parameters and network structure, the selection of an appropriate (effective) learning algorithm is of key importance. The most commonly applied learning algorithm in ANNs is backpropagation; however, it has a rather slow convergence rate and, as a consequence, the resulting ANN has more redundancy [146]. On the other hand, the training of SNNs remains a challenge due to their complicated dynamics and the non-differentiable nature of spike activity [147]. Three types of ANN and SNN learning rules can be distinguished: unsupervised learning, indirect supervised learning, and direct supervised learning. A commonly used learning algorithm in SNNs is the arithmetic rule SpikeProp, which is similar in concept to the backpropagation (BP) algorithm, in which network parameters are iteratively updated in a direction that minimizes the difference between the final outputs of the network and the target labels [148,149]. The main difference between SNNs and ANNs is their output dynamics. However, arithmetic-based learning rules are not a good choice for building biologically efficient networks. Other learning methods have been proposed for this purpose, including bio-inspired algorithms such as spike-timing-dependent plasticity [150], spike-driven synaptic plasticity [151], and the tempotron learning rule [65,76,77]. STDP is unsupervised learning, which characterizes synaptic changes solely in terms of the temporal contiguity of presynaptic spikes and postsynaptic potentials or spikes [152], while spike-driven synaptic plasticity is supervised learning and uses rate coding. However, an ANN with BP learning still achieves better classification performance than SNNs trained with STDP. To obtain better performance, the combination of layer-wise STDP-based unsupervised learning and supervised spike-based BP was proposed [153,154]. Other commonly used learning algorithms are ReSuMe [57] and Chronotron [58]. The tempotron learning rule implements gradient-descent dynamics, minimizing a cost function that measures the amount by which the maximum voltage generated by erroneous patterns deviates from the firing threshold. Tempotron learning is efficient for spiking patterns in which information is embedded in precise spike timing (temporal coding). Alternatively, [155] proposed a neuron normalization technique and an explicitly iterative neuron model, which resulted in a significant increase in the SNNs' learning rate; however, training the network still requires many labeled samples (input data). Another class of learning algorithms is indirect: an ANN (built with perceptrons) is first trained and then transformed into an SNN with the same network structure (i.e., ANN-SNN conversion) [156]. The disadvantage of such learning is that reliably estimating firing frequencies requires a nontrivial passage of time, and this learning rule fails to capture the temporal dynamics of a spiking system. The most popular direct supervised learning method is gradient descent using first-spike times to encode input signals [157]; it minimizes the difference between the network output and the desired signals, a process similar to traditional BP. The application of a temporal-coding-based learning rule, which can potentially carry the same information using fewer spikes than rate coding, can thus help to increase the speed of calculations. On the other hand, active learning methods, including bio-inspired active learning (BAL), bio-inspired active learning on firing rate (BAL-FR), and bio-inspired active learning on membrane potential (BAL-M), have been proposed to reduce the size of the input data [158]. During the learning procedure, labeled datasets are used to train the empirical behaviors of patterns, while the generalization behavior of patterns is extracted from unlabeled datasets. The method leverages the difference between empirical and generalization behavior patterns to select the samples unmatched by the known patterns. This approach is based on the behavioral pattern differences of neurons in SNNs for active sample selection and can effectively reduce the sample size required for SNN training.
The impact of bio-inspired AI-based systems on clinical practice has significant potential for clinicians and medical experts. As can be clearly observed, further directions of development of Artificial Intelligence lean towards no longer treating it as a black box and towards the development of Biological Artificial Intelligence, i.e., using neuron models that accurately reproduce experimentally measured values, understanding how information is transmitted, encoded, and processed in the brain, and mapping these mechanisms onto learning algorithms. The main issue is how to replicate the architecture of the human brain and the mechanisms governing it. Biologically realistic large-scale brain models require a huge number of neurons as well as connections between them. Estimating the behavior of a neuronal network requires accurate models of the individual neurons along with accurate characterizations of the connections among them. In general, these models should contain all essential qualitative mechanisms and should provide results consistent with experimental physiological data. To fully characterize and predict the behavior of an identified network, one would need to know its architecture as well as any external currents or driving forces and afferent input applied to it. Information transmission efficiency thus essentially depends on how neurons cooperate in the transfer process. The specific network architecture, i.e., the presence and distribution of long-range connections and the influence of inhibitory neurons, in particular the appropriate balance between excitatory and inhibitory neurons, makes information transmission more effective [170]. Taking all these factors into account will provide insight into what makes the human brain such a remarkable computing machine; these mechanisms can then be translated into improvements of AI methods [171]. Moreover, this will inform the development of next-generation AI, including Autonomous AI (AAI), as well as the development of brain simulators that balance computational complexity, energy efficiency, biological plausibility, and intellectual competence.
The second issue is connected with the so-called open data policy [211]. However, publicly available datasets are not numerous; they are very often unlabeled, described only in general terms, subject to bias, and additionally burdened with segmentation errors.
Another observable trend concerns the compliance of Artificial Intelligence with human rights, bioethics principles, and universal human values, which is especially important in medicine. For example, in Germany a patient must give informed consent to the use of AI in the process of their diagnosis and treatment, which we believe is good practice. Rules that an AI-based system should fulfill, such as the Assessment List for Trustworthy Artificial Intelligence (ALTAI) [173][174][175], have also been formulated. In [163,176], 10 ethical risk points (ERPs) important to institutions, policymakers, teachers, students, and patients were defined, including potential impacts on design, content, delivery, and AI-human communication in the field of AI- and Metaverse-based medical education. Moreover, links between technical risks and ethical risks have been established. Procedures now need to be developed to enable their practical enforcement.
Thus, the integration of AI and the Metaverse is a fact, and it suggests that AI may become the dominant approach for image scan segmentation and intelligent visual-content generation in the whole virtual world, not just in medical applications [6,159]. Recently, the Segment Anything Model (SAM) based on AI was introduced for natural images [89]; in [160] SAM was proposed for application to medical images with a high level of accuracy. Better image segmentation contributes to higher-quality virtual objects. AI application in the context of the Metaverse is connected with the identification and categorization of Metaverse virtual items [161]. Moreover, AI may lead to more efficient cybersecurity solutions in the virtual world [162]. However, this is closely related to the accuracy of AI-based algorithms and, consequently, the accuracy of their training.
Supplementary Materials: Not applicable.

Figure 1 .
Figure 1.The scheme of the methodology of literature review.

Figure 2.
Figure 2. The basic scheme of a simple Convolutional Neural Network.

Figure 3 .
Figure 3.The basic scheme of the simple Recurrent Neural Network.

Figure 4 .
Figure 4.The basic scheme of the simple Spiking Neural Network.

Figure 5 .
Figure 5.The basic scheme of the simple Generative Adversarial Network.

Figure 6 .
Figure 6.The basic scheme of the simple Graph Neural Network.

Table 1 .
The comparison of the AI-based algorithms applied in medical image scan segmentation

Table 2 .
A summary of publicly available retrospective image scan medical databases.
KURIAS-ECG: a 12-lead electrocardiogram database with standardized diagnosis ontology — ECG, 147 subjects
VinDr-PCXR: an open, large-scale pediatric chest X-ray dataset for interpretation of common thoracic diseases — chest radiography (CXR), 9125 subjects
VinDr-SpineXR: a large annotated medical image dataset for spinal lesion detection and classification from radiographs — 10,466 spine X-ray images