A Dynamic Probabilistic Model for Heterogeneous Data Fusion: A Pilot Case Study from Computer-Aided Detection of Depression

The present paper, in the framework of a search for a computer-aided method to detect depression, deals with experimental data of various types, with their correlation, and with the way relevant information about depression delivered by different sets of data can be fused to build a unique body of knowledge about individuals’ mental states facilitating the diagnosis and its accuracy. To this aim, it suggests the use of a recently introduced «limiting form» of the kinetic-theoretic language, at present widely used to describe complex systems of objects of the most diverse nature. In this connection, the paper mainly aims to show how a wide range of experimental procedures can be described as examples of this «limiting case» and possibly rendered by this description more effective as methods of prediction from experience. In particular, the paper contains a simple, preliminary application of the method to the detection of depression, to show how the consideration of statistical parameters connected with the analysis of speech can modify, at least in a stochastic sense, each diagnosis of depression delivered by the Beck Depression Inventory (BDI-II).


Introduction
This paper deals with a special but important and interesting case of the problem of treating experimental data of various types (heterogeneous data) together, to provide a detailed description of a subject, or in particular to classify it with a property chosen a priori.
The starting point of the work is finding methods and tools for implementing computeraided devices for the detection of major depressive disorders (MDD), i.e., the automatic classification of any given subject with respect to a set of previously identified possible depressive states.This task requires in general the analysis of a rather large amount of data obtained from experiments of various kinds and settled for investigating whether features describing the collected data can discriminate between healthy and depressed participants.The involved experiments aimed to gather behavioral data (such as speech, handwriting, facial, vocal, and gestural expressions) from healthy/depressed diagnosed subjects by defining behavioral tasks known to be affected by depressive states.The collected data were then appropriately processed to extract features describing changes in the healthy perception of individuals due to depressive states.
Depression is classified as a mood disorder by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5, American Psychiatric Association, 2013).The DSM-5 report as its symptoms the impairment impaired of emotional expressions, cognitive functions, social behavior, social relationships, and physical functioning (changes in appetite or in weight, headaches, pain).However, not all depressed individuals meet these criteria.Some individuals may show irritable rather than depressed mood and may not be classified as depressed."The conditions and underlying physiologies associated with the experience of depression may vary widely, and no core composition can be assumed" ( [1], p. 1).Given the complexity of the accompanying symptoms, it was shown that the automatic analysis of behavioral data can provide quantitative measures for describing changes affecting healthy/depressed individuals (see [2] for depression symptoms and [3] and reference therein for automatic behavioral analyses) calling for the identification of a multi-dimensional mathematical model serving for the implementation of automated and cost-effective technological systems devoted to the early detection of depressive states.
The mathematical model proposed in this paper has been applied to speech data.More precisely, in this case, for any subject, the experimental procedure involves the collection of speech data obtained from the reading of a narrative text and the free description of daily activities performed during a week, and the answers provided by the participants to data collection to the questions of BDI-II [4].Further data can be collected on this line of conduct, other than speech, including facial expressions, handwriting and drawing, body movements, hands' gestures, and any other activity useful for behavior analysis ( [5][6][7][8][9][10][11][12][13], among others).Though all the outcomes of these experiments can be (and actually are) given a quantitative expression, by means of suitably defined measures or scores, nevertheless, this raises a new problem, regarding the comparison of heterogeneous measures.
To solve this problem, a good method seems to be shifting the attention from the measures as such to their joint probabilities and, as a consequence, to their mutual conditional probabilities.This goal has been achieved by the use of a recently introduced «limiting form» of the kinetic-theoretic language [14], a scheme of models based on the use of a generalization of Boltzmann equation governing gas dynamics [15,16], which in the last fifty years has been proposed to describe the evolution of many-particle systems in which the interactions between elements go far beyond purely mechanical ones, and can produce unexpected and not deterministically predictable behaviors, both mechanical and non-mechanical.The elements of the system are themselves no longer just particles in the mechanical sense, but individuals of the most varied nature: cells [17][18][19][20][21], cars [22,23], insects (dynamics of swarms) [24], and finally, human beings and possibly care structures [21,[25][26][27][28][29].
The key idea the whole scheme relies upon is that the results of pairwise interactions between the «particles» of the system cannot be predicted with certainty, but they still produce a deterministic evolution of the (relative) frequency (or, in appropriate contexts, probability) distribution on the set (space) of possible states of particles [15,30].
One of the techniques often employed in the analysis of the dependence of this distribution on the interactions consists in subdividing the system into homogeneous subsystems (i.e., containing the same kind of objects), which can also have different spaces of states.No models, however, have been outlined in which at least one functional subsystem consisted of abstract objects, e.g., «procedures» or «experiments» or «diagnostic methods».
In addition, as is quite natural, in all models based on the Kinetic Theory, the time variable is continuous, and as a consequence the evolution of the system turns out to be governed by a system of differential equations, when the space D of all possible states of the «particles» is discrete (see Section 2) or of integro-differential equations, when D is continuous, and described by either functions of time or functions of state variables and time.In this paper, a formulation of the model introduced in [14] is used, in which the complex system under examination contains such abstract objects as «random variables» and the time variable is allowed to be discrete.
The proposed model is applied to speech data, collected from depressed (diagnosed from medical doctors) and healthy subjects.The research assumes that in the acoustic and linguistic content of people's speeches it is possible to capture information regarding their psychological state, according to the narrative psychological content analysis framework [31].
There are several studies showing that there are vocal changes in the speech of subjects with depression.These changes concern, mainly, (a) the fundamental frequency f0, which seems to reduce its frequency range and average values in the acute stages of the pathology; (b) the speaking rate which seems to increase as the pathology improves, and finally, that this improvement is related to c) a decrease in the number and duration of speech pauses (for a review, see [3,32]).
The data utilized for the following research included 24 subjects divided into an experimental and control group.The experimental group, included 12 subjects (nine females) with severe (four subjects), moderate (five) and mild (three) depression, aged from 27 to 60 years (mean age = 41.07 years; standard deviation = 12.5).The control group was selected to match the experimental groups for age and gender.
The diagnostic criteria of the Beck Depression Inventory, Second Edition (BDI II, [4]), an instrument for self-assessment of depression severity in already diagnosed patients and for detecting the risk of depression in the normal population, were used to assess depression.The BDI consists of 21 sets of statements to be rated on a four-point scale.A score of >16 indicates the presence of depression in the patient.
Subjects with depression were recruited in Italy, from the mental health department of the ASL of Caserta, the mental hygiene institute of the ASL of Santamaria Capua Vetere, the Psychological Listening Center of Aversa, and a private medical practice.
The task performed by the subjects, divided into two parts, consisted of a reading of the Aesop's tale "the North Wind and the Sun" visualized on a computer screen (see Figure 1).In the second part, participants were required to produce a spontaneous narrative (diary), describing salient events which happened during the previous week, highs and lows with friends, relatives, co-workers and also, in many cases, opinions or criticisms of the Italian political situation.
During the reading and diary sections, the subjects were audio and/or video recorded upon consent.The duration of the recordings varied.The diary ranges from a minimum of 2 min to a maximum of 5/6 min; while the reading takes about 50 s.The total time of the task, from meeting with the examiner to the end of the recording, was around 15 min.
The recordings were made using a clip-on microphone (Audio-Technica ATR3350), with an external USB sound card.Speech was sampled at 16 kHz and quantized at 16 bits.This software/hardware was developed inside the European Space Agency (ESA 2012) funded project (n.HSO-US 2012-108) "Psychological Status Monitoring by Computerised Analysis of Language phenomena (COALA)" of which one of the authors (Professor Anna Esposito) was the coordinator.It utilized standard software/hardware equipped with a sound card to sample the speech data.It can run on any operating system.It was installed on Window 7 for collecting the COALA data.In the following years it has been installed on the operating systems Windows 8, 9 and 10 and can also be installed on Windows 11.The data exploited in this study were collected between years 2016 and 2017, after obtaining the permission of the ethical committee of the department of psychology of the Università della Campania "Luigi Vanvitelli".The date displayed in Figure 1 and the time in which the data were collected do not affect the originality and innovation of the results presented in this study, which seeks to propose a mathematical model to fuse heterogeneous data.
The recording took place in an isolated and soundproofed room, granted by the health center where the data collection was taking place.
Prior to the recording, subjects signed an informed consent form for data processing and filled out the BDI II [33], while the examiner prepared the instrumentation consisting of a personal computer and recording software.
The research had received the approval from the ethical committee of the department of Psychology, University of Campania "Luigi Vanvitelli" with the protocol code 09/2016.
The model suggests a way to give a sharp definition of states and a precise method to place each individual in a state and to confirm or modify the placement after each «interaction».This point will be discussed in details in the following Sections.More precisely, in Sections 2 and 3, the formulation of the «discretized» model is presented, in Sections 4-6, the language to the detection of depression using the results reported in [34] is applied, and in Section 7 some possible research perspectives are outlined.

The Standard Model
In order to provide the reader at least some hints about the origin of the methods that will be used in the next Sections to analyze the correlation between different features possibly describing depression, the current one is devoted to recall the main features of the kinetic-theoretic language in its most general form.
Our starting point is a set S of a very large number of objects of any nature (cells, living individuals, and also-as was shown in [21] and will be discussed in more details in the next Section-care structures, e.g., hospitals and systems of care tools and experimental devices) called individuals or (active) particles.In most cases of interest, S is assumed to be the joining of a family {S 1 , S 2 , . . ., S k } of subsystems, called its functional subsystems.The introduction of such subsystems finds its natural application in biology, for instance to describe the fight between tumor cells and immune system, or the competition between different species to model Darwinian selection, or in social sciences, to model interactions between social classes, or in economics, to model the fluxes of wealth.Accordingly, for each subsystem S i , the state of each individual in S i is defined according to the context in which the present description should be developed, and can be expressed as a scalar variable u i or a vector variable u i ≡ (u 1 i , u 2 i , . . ., u m i ) which will be called state variable.This variable may describe any property we can find suitable for our research according to the context (biological, economic, social, etc.) in which it is performed.According to the context, the state variable may be assumed to take its values in a discrete domain or in a continuous domain: in the former case, the domain can be a finite or countable subset of Z or Z m , while in the latter case it can be a bounded or unbounded real interval or a bounded or unbounded domain of R m .In both cases, the domain D u i (or D u i ) of the state variable is called the state space of subsystem S i (in some cases, the same variable can be used to identify the state of the members of all the functional subsystems, but in most cases different state variables must be used for different subsystems).
As in Boltzmann's Kinetic Theory of Gases, the mathematical framework developed in [21] is statistical, i.e., one is interested to describe the evolution in time of the system as a whole rather than the evolution of the states of single particles.More precisely, it must be acknowledged that any precise description of the state of each particle of S must be given up, for both theoretical and practical reasons [16,35], so one cannot but decide to study the probability (expressed experimentally in terms of relative frequency) distribution over the state space at each instant.This means that the state variable is conceived as a random variable at each instant and the aim of the study is the forecast of its probability density function at any time.
In this Section, with no loss of generality, and only for the sake of simplicity and in view of the application to be discussed in the sequel, it will be described here in some detail only the case of a system S consisting in two different subsystems S 1 and S 2 such that the state of individuals in each S h is described by a different discrete scalar variable u h .We set For any t in a time-interval I ⊆ R, the state of system S h at time t will be identified by a probability distribution (also called state vector) Now, if the evolution of the system S is envisaged as a time-continuous stochastic process, then the time derivative of each probability function f k is expressed, according to the law of alternatives, in terms of transition probabilities.More precisely, 1.
for any (r, s, j) ∈ {1, 2, . . .m h } 3 , the symbol F r sj (h) ≡ F h (u h,s , u h,j ; u h,r ) will denote the probability that a particle of S h falls from the state u h,s to the state u h,r after an interaction with another particle of S h which is in the state u h,j : accordingly,

2.
for any (r, s) ∈ {1, 2, . . .m h } 2 , and for any j ∈ {1, 2, . . ., m k } (with k = h), the symbol Φ r sj (h, k) ≡ Φ hk (u h,r , u k,j ; u h,s ) will denote the probability that a particle of S h falls from the state u h,s to the state u h,r after an interaction with a particle of S k which is in the state u k,j : accordingly, So, the law of alternatives yields the following system of differential equations: where, for any k ∈ {1, 2} and any (r, s) ∈ {1, 2, . . ., m k } 2 , τ k rs ≡ τ(u k,r , u k,s ) is the so-called encounter rate of particles of S k in the states u k,r and u k,s .It is the number of pairwise interactions between particles of S k , that are in the states u k,r and u k,s , per time unit, and for any sufficiently small ∆t the product τ k rs ∆t is the probability that one such interaction occur in the time interval ∆t between individuals belonging to S k with states u k,r and u k,s respectively.As a consequence, τ k rs ∆t is a conditional probability and τ k rs is the ratio of a conditional probability to time.Its presence in the first two terms at the right-hand side of each equation of system ( 1) is due to the fact that these terms express two basic properties: the increase in probability of state u k,h is the probability that some «candidate» particles of S k in the state u k,r interact with some «field» particles of S k in the state u k,s : they fall in the state u k,h with a positive probability just in consequence of the interaction; 2.
the decrease in probability of state u k,h is the probability that some «test» particles of S k in the state u k,h interact with some «field» particles of S k in the state u k,s ; they leave the state u k,h with a positive probability only in consequence of the interaction.
The coefficients η k rs (for r ∈ {1, 2, . . ., m } and s ∈ {1, 2, . . ., m k }) in the other terms at the right-hand side of Equation ( 1) have the same meaning: for any ( , k) ∈ {1, 2} 2 , η k rs is the encounter rate between any particle of S which is in the state r and any particle of S k which is in the state s.
Notice that, as we shall do from now on to the end of the paper, the above system of equations has been identified by a single numerical label: from now on, each of the equations that compose a system will be distinguished by a subscript corresponding to its position in it: for instance, the third equation of system (X) will be indicated as (X) 3 .
For the sake of completeness, it will be suitable to recall that it is customary in the literature to write system (1) in the form where For obvious reasons, the term G ( ) ) , f (2) ](t) is called gain term, while the term L ( ) h [ f (1) , f (2) ](t) is called loss term.Note that system (2) consists of m = m 1 + m 2 equations.

Formulation and Interpretation of the Special Model
In this Section, the general model outlined in the previous one will be strongly modified under three aspects.A system S is considered, split into two subsystems S 1 and S 2 , but 1.
subsystem S 1 , which from now on will be also referred to as «the system of subjects», consists of only one member, so that interactions between subjects are excluded as meaningless; 2.
subsystem S 2 , which from now on will be also referred to as «the system of experiments», consists of m members (also called the «questions about the subject»), but again the interactions between these members are excluded; 3.
to each experiment in S 2 a different state space D i is associated, which is a set of positive integers (the outcomes of the experiment): we denote by y i the state variable in D i so that D i = {y i1 , y i2 , . . ., y im i }; 5.
interactions between the subject and experimental devices do not modify the states of the latter; 6.
the time variable is assumed to be discrete, i.e., the process described here is a stepwise process.
Remark 1.In virtue of condition 1, τ 1 ij = 0 for any couple (u i , u j ) of states in D.
Remark 2. It is necessary to point out, for the sake of the readability of the text, that some symbols introduced in Section 2 and appearing in system (1) as well as in definitions (4) must be modified because of condition 4. Precisely, the interaction rates τ 2 hj should be written τ 2 h,k;j,r , where h and j are indexes identifying the state spaces D h and D j (hence the experiments) involved in the interaction, and k and r are the states (in D h and D j , respectively) of interacting «questions».The same is true for the transition probabilities F i hj (2) (to be written F ri r,h;s,j (2)).However, of course, in virtue of condition 2, this remark can be completely disregarded, since all the terms containing the considered interaction rates and transition probabilities vanish.Remark 3. The situation is quite different for the «mixed» interaction rates η 12 ij and η 21 ij and transition probabilities Φ h ij (1, 2) and Φ h ij (2, 1), which should be written as η 12 i;kj , η 21 ki;j , Φ h i;kj (1, 2) and Φ k;h ki;j (2, 1), respectively, where k identifies in all cases the state space D k of the experiment involved in the interaction.Notice however that, in virtue of condition 5, Φ k;h ki;j (2, 1) = δ h i for any k ∈ {1, 2, . . ., m} and any j ∈ {0, 1, . . ., s}.

Remark 4. Analogously, f
h must be written f (2) kh .This last symbol expresses the probability that the k-th experiment is in the h-th state y kh of D k .Remark 5. Finally, condition 6, according to which the evolution of the system will be described as a stepwise process allows us to consider a sequence {t h } h∈N and to replace the time derivatives at the left-hand side of Equation (1) by finite differences of the form f In particular, we shall take n = 0.
In view of the above remarks, system (1) simply becomes to which the initial conditions must be associated.An interpretation as complete as possible of the special model obtained starting from conditions 1-6 will be outlined, not only in order to explain the meaning of these hypotheses, but also to turn the attention of the reader to the significant application of the system obtained above to the detection of depression.In this connection, what it will be necessary to stress is that system (5) is meant to describe the results of interactions between information rather than between objects.More precisely, the values u i of the variable in D should be interpreted as labels that define, in the appropriate context, possible states of the subject; then, the interaction between the subject and the i-th experience on it places this latter into a state, expressed by a real value y ih , that is the outcome of a measurement of a physical parameter.Now, in virtue of the information carried by the value y ih , the i-th experience interacts with the initial definition, in the sense that it can either confirm the initial state (by increasing its probability) or raise doubts about it (by lowering its probability).Accordingly, the terms f   5) must be interpreted as the absolute probability that the definition of the state of the subject before the experience is the value u i and the absolute probability that the outcome of the h-th experience is the state y kj , respectively; furthermore, the term Φ h i;kj (1,2) is the probability that definition u i , after «interaction» with information y kj becomes definition u h .This outline of the methodological meaning of system (5) will be now explained in somehow more precise terms by referring to a special experimental problem connected to the detection of depression.In the next sections, by means of an example based on actual numerical data, it will be shown how the values of f kj (t 0 ) and Φ h i;kj (1, 2) should be assigned according to a suitable preliminary statistical analysis.
Therefore, the problem which gave rise to the present analysis will be now described, namely, the search for automated procedures to diagnose the possible depressive state of a human subject.As is well known to psychologists, depression is nowadays mainly diagnosed by means of questionnaires.Among them, of particular interest seems to be the one devised by Aaron Beck called BDI (Beck Depression Inventory).This is a list of 21 questions, each admitting four possible answers, with different scores varying from 0 to 3. According to the total score obtained by his/her answers, the subject is placed in a "class of depression", namely, with a score from 1 to 10, the "normal class" (which will be denoted here as the state 0), with a score from 11 to 16, the "mild mood disturbance class" (the state 1), with a score from 17 to 20, the "borderline clinical depression class" (the state 2), with a score from 21 to 30, the "moderate depression class" (the state 3), with a score from 31 to 40, the "severe depression class" (the state 4), and finally, with a score over 40, the "extreme depression class" (the state 5).Now, it is well known that any psychological state produces (and is expressed by) particular behaviors, which can be taken as "symptoms" of that state.These behaviors can be in turn described by the values of physical parameters, for instance voice fundamental frequencies (pitches) in a sequence of sufficiently small time intervals, or the number and the length of empty (or silent) pauses, and the speed of speech and the choice of words when reading a text or simply talking about everyday life (see [36], also for more complete references).Thus, the problem of establishing in which cases the measurements of these and other parameters confirm the results of BDI arises spontaneously, and in which cases they modify the conclusions drawn from the questionnaire.The final diagnosis should result from a suitable "superposition" of all these results.More precisely, one takes a subject initially placed by his (or her) BDI score in a state k, then tries to determine the value of one of the physical parameters that could reveal depression or a normal state: for instance, he submits him or her to a check of the pitches of his or her voice while reading a text, or of the number and the length of his or her empty pauses when telling his or her experiences during the last week.As laid out above, the parameters defined by these tests, that are random variables, will be denoted by the symbols Y 1 , Y 2 , . . ., Y m .If the i-th test gives the outcome Y i = y ih (i.e., the result of the physical interaction between the subject and the experimental device is y ih ), provided one has defined for any j ∈ D ≡ {0, 1, . . ., 5} the probability Φ j k;ih (1, 2) ≡ Φ j kh that this result "moves" the subject from the state k (initial diagnosis) to the state j, system (5) allows us to find a final probability distribution on D (in this connection, notice that the initial probability distribution vector ( f (1) h0 ) 1≤h≤5 ≡ ( f h0 ) 1≤h≤5 is assigned according to our confidence in the reliability of the BDI questionnaire: a good starting point could be to set f k0 = 1 and f h0 = 0 for any h = k).Thus, the key step to make system (5) effective is to determine the transition probabilities Φ j kh ≡ Φ j k;ih (1, 2) (j, k ∈ D).The way in which this can be performed will be discussed in detail in Section 4. At this moment, it will be appropriate to conclude the present Section with some important remarks.
First of all, it should be stressed that the first step to compute the transition probabilities of the values of Y that confirm the diagnosis delivered by BDI score, or, what is the same, to synchronize the responses of BDI and of the additional test.In other words, for any state k, one must know the class of values of the additional parameter Y that seems to "characterize", in a suitable sense of the word, the state k.This will therefore be performed preliminarily in the next Section.
Thus, the procedure will enable the researchers to find the final probability distribution on the set of possible states of the subject, after all the tests devised to determine the level of his or her mental health.Remark 6.It should be carefully noted that the use of system (5) can be replaced by the repeated use of system kh (t k ) − f More precisely, for any k ∈ {1 , . . ., m}, the system for the parameter Y k can be written, where the values f h (t k−1 ) are delivered by the same system applied to the random variable Y k−1 .By the way, this will be the procedure which will be adopted in Section 6 in our application of the method to the detection of depression in human subjects.
Remark 7. It should also be noted that the method outlined in Remark 6 and the use of system (5) are not equivalent.The former is applied and illustrated in Section 6, while the latter roughly produces a «superposition» of the results obtained in the first step of the procedure shown therein.
h (t i m )} 1≤h≤s will be equal.As a matter of fact, as we shall see in Section 6, to obtain a single reliable distribution from the set of experiments, one needs to mediate between the different distributions obtained by following the different plots.

Detecting Symptoms of Depression: General Formulation
In this Section, the model described in purely theoretical terms in the previous one will be applied to a set of data, in order to exhibit a concrete example of the results of the outlined method.Three physical (behavioral) parameters will be introduced: (a) the total variation T of pitch over the speech while reading a prescribed text; (b) the average variation A of pitch over the same reading, and (c) the percentage O of inversions-also called oscillations-of the sign of two subsequent differences of pitches over the whole reading), and system (7) will be applied to them.
Accordingly, first of all, the problem of «synchronizing» the values of these parameters with BDI labels must be solved.From a theoretical viewpoint, this must be performed by the following procedure.First, a sufficiently large sample S of both healthy and depressed subjects is considred.Setting n = |S| and denoting by Y any one of the parameters T, A and O, we also set |C h | = n h so that n = ∑ 5 h=0 n h ; 4.
q h ≡ n h /n = P(C h ) the fraction of the subjects in S who are placed by BDI in the class {I h (Y)} 0≤h≤5 is a partition of R Y ; 3.
when a subject is assigned to class C h by BDI, then the corresponding value of Y has a high probability to be in I h (Y) (and a low probability to be in any interval In the present context, the three conditions above are assumed to be satisfied, and any description of the procedure one should follow to meet them is omitted.The reader eager for details can usefully consult any university textbook about statistics and data analysis.In the next Section, however, an example of such a procedure will be shown in a simplified case. Accordingly, a family of 6 sets I h ≡ I h (Y) (h = 0, 1, . . ., 5), some of which will be possibly split in two intervals, is assumed to be given.We now denote by n hk the number of the subjects in C h for which the value of the parameter Y belongs to I k , so that and, in accordance with condition 3, Obviously, and, setting p h ≡ p hh , For the sake of completeness, it must be recalled that It must be stressed that (a) whenever an individual is taken randomly from the considered sample without checking its BDI class, given the value of Y associated to it, the above relation gives the probability that this individual belongs to any BDI class, and (b) the left-hand side of relation (8) only serves as a reminder of how one should compute the required probability from the sample data: if one decides to export the values of probabilities q h and p hk to the whole, potentially infinite population of past, present and future human beings, then the middle side of relation ( 8) must be used to express the probability P(C h | I k ).
Now, for any k ∈ {0, . . ., 5}, the law ) is a probability distribution on the "state space" {0, . . ., 5} induced, for any examined subject, by the experimental result about the value of parameter Y, and it should be carefully noted that, when applied to the whole population, it does not depend on the BDI class to which the subject was assigned a priori, and this is quite natural since, as it has been just seen, each probability P(C h | I k ) is computed under the assumption to ignore that class.In order to determine the probabilities P(C h | C r ∩ I k ), that take into account the information about the BDI class to which the subject is assumed to belong before the experiment, one has to solve the following system of equations: Notice that, referring to system (7), one has P(C h | C r ∩ I j ) = Φ h r;j (where the index relative to the parameter has been dropped as useless for the sake of simplicity), so that the above system gives the transition probabilities needed to use system (7).

Detecting Symptoms of Depression: Computing Parameters on a Sample Study
Now, system (9) can be easily solved if one has enough time and patience, or simply a computer.Nevertheless, the general case will be left aside and the above procedure will be now applied to a reference sample already used to the same aim in [34].It could be objected that the considered sample cannot be defined "sufficiently large".As a matter of fact, the present application aims to be nothing more than an example of the way in which the method should work, and a suitable extension (and better validation) of the method outlined here is planned to be the object of a forthcoming paper.Furthermore, just in connection with the small size of our sample, the consideration of all the five possible states defined by BDI will be renounced, and only two states will be considered, 0 = healthy, and 1 = depressed.This simplifies the procedure and, at the same time, can provide the reader with a sufficiently clear plot of the method.
So, in the case under consideration, we have n because a straightforward calculation, based on the relation shows at once that system (11) is redundant, and that the third and the fourth of Equation ( 11) are reproductions of the first and second ones respectively.Now, as a system of two equations with four unknowns, system (12) does not allow us to determine all the required transition probabilities: two of them must be assigned as parameters.Therefore, setting and the parameters must satisfy the conditions The above procedure can be reproduced step by step for each of the two remaining parameters A and O.One only needs to insert in system (11) the values of n hk resulting from Tables 1-6 in [34] for these parameters.These values are listed in the following Table 1 (while, for the sake of completeness, the intervals I 0 and I 1 are specified for all the three parameters T, A and O in Tables 2-4).
So, with the same definition of parameters λ and µ and the same calculations as before, we find for λ and µ the conditions collected in Table 5.

Remark 9.
As probabilities, λ and µ belong to [0, 1], and the above conditions, as derived by means of straightforward algebraic computations, simply give lower bounds to their values.As a consequence, the negative values in the third column simply mean that, for the parameter O, λ and µ are free to run over the whole interval [0, 1].
Finally, it is possible to write the matrices of the transition probabilities for all three parameters in consideration.More precisely, for the parameter T; for the parameter A, and for the parameter O.
Remark 10.Notice that, though the same symbols λ and µ have been used for all three parameters T, A and O, they need not to take the same values in all cases.On the contrary, λ and µ will take in general different values from case to case.The reason for the differences will be explained in Remark 11.Remark 11.The meaning of the parametric nature of the solution as well as of these last conditions now requires careful discussion.As a matter of fact, beyond the mere algebraic condition, there is at least one deep conceptual reason why two of the above transition probabilities cannot be determined only on the base of the experienced relative frequencies of intervals I k : a careful check of the above conditions on parameters, as well as of the consequences of assigning to them either the least or the greatest value allowed, shows that the choice of the values for λ and µ is strictly related to our trust in the effectiveness of our experiments.More precisely, consider for simplicity only parameter T, and assume to have chosen λ = 5/7.It follows that So, this choice for parameters λ and µ corresponds to ascribing to the experiment an heavy bias towards the result «healthy» (or «not depressed»).Indeed, a subject classified as «healthy» by BDI is still classified «healthy» after the experiment even if the value of T belongs to I 1 , while a subject classified as «depressed» by BDI is classified «healthy» after the experiment when the value of T belongs to I 0 .Conversely, choose λ = 1 and µ = 5/9.These considerations can be repeated word by word for parameters A and O, and lead us to the conclusion that in each case the values of parameters λ and µ must be «tuned» according to our trust in the effectiveness of our experiment.At a first glance, the choice that seems not to introduce biases is

One finds P(C
Of course, this choice requires to be revised and corrected after a sufficiently large number of additional experiments, by comparison with other evidence. Remark 12.It is readily seen that with the above choice of the values of λ and µ the matrices (15)-( 17) assume the forms for the parameter T; for the parameter A, and for the parameter O, respectively.Now, it is to be noted that while the transition matrices for parameters T and A have one row (1, 0) or (0, 1) corresponding to a certain transition of the subject from the state C 1 to the state C 0 or vice versa according to whether the value of the parameter belongs to I 0 or to I 1 , the corresponding rows for the parameter O are in both cases (1/2, 1/2), expressing a complete uncertainty about the «final» state of the examined subject.This allows us to state that parameters T and A contribute information much richer than that transmitted by O, so that the computation of the value of parameter O could also be given up, or should be used to «put in doubt» the results of any previous sequence of experiments.This point will be illustrated more clearly through examples in the next Section.

Detecting Symptoms of Depression: How to Classify New Cases
The present Section is devoted to show the way in which the above model and the data obtained from the basic sample can be used to update the probability distribution on the states 0 (healthy) and 1 (depressed) for any additional case.In particular, two subjects not contained in the starting sample will be considered here, the former assigned by BDI in the class C 1 (depressed), the latter assigned instead to the class C 0 (healthy).
The first case, who from now on will be referred to as "S37", a forty year-old female labeled without uncertainty as «depressed» by BDI, has obtained for the parameters T, A and O the values shown in Table 6, which shows that S37 is depressed according to each experiment.For subject S08, the following list of plots is obtained: These paths from initial health and depression probabilities, as assigned to subject S08 by BDI, to final probabilities obtained according to different test arrangements are illustrated in Figure 3 Finally, consider again a subject-who will be identified as S15-labelled as depressed by BDI, but described as healthy by two indicators and as depressed by only one of them according to Table 8 below.According to the above results, it seems that the probabilities of health and depression for each subject, when updated according to the outcomes of tests about indicators T, A and O, depend on the order in which the values of indicators are registered and compared.Nevertheless, at least when the indicators are all in accordance, the variations in the final probability distributions are very limited.This suggests referring to the average probabilities (on the set of all possible permutations of the triple (T, A, O)) rather than choosing any one of the values listed in the final column, and to compute explicitly the standard deviation in order to get a quantitative expression of their spreading.More precisely, let us denote by µ h and µ d , respectively, the average values of probabilities for healthy and depressed state at each step (I, II or III), and by σ all the corresponding standard deviations.Thus, postponing to the next Section a discussion of the meaning of the results, we may collect them in Table 9.Average values for updated probabilities A comparison of this picture with Figures 2-4 highlights that the final results of the different test arrangements are much more dispersed around the average results for subjects S08 and S15 than for subject S37.This seems to reflect the presence at each step of results in apparent mutual contrast.

Discussion and Research Perspectives
As it has been seen in Section 6, the order in which the values of parameters T, A and O are considered influences the resulting probability distribution on the set {healthy, depressed}.Nevertheless, when the values of all these parameters are in agreement (i.e., they assign a subject to the same BDI class), the procedure turns out to be approximately «commutative», in the sense that the probability distributions obtained by following the different plots differ very little from each other.A perceivable difference can be observed when some parameters confirm BDI classification and others refute it.It is precisely this circumstance that suggests to use the averages of the outcomes of different plots, taking also into account the very small values of standard deviations at each step.
At a first glance, it can be considered as a loss of information to have transformed the BDI classification, which is assumed to be sharp, into a probabilistic statement.In this connection, however, it must be recalled that, strictly speaking, BDI classification is in turn probabilistic, and our choice to assign the values 0 and 1 to the initial probabilities simply expresses our absolute confidence in the response of the questionnaire.If one questions at least partially the veracity of this decision, and make different hypotheses about the reliability of the additional indicators (for instance, assuming them more accurate in negative evaluations than in positive ones or vice versa), so that parameters λ and µ are given different values, then the probabilistic character of all the results becomes evident.
Naturally, the assignment of the values of the initial probabilities and of the parameters λ and µ should not be thought of as definitive, but in turn subjected to repeated updates by further experiments on more subjects.This immediately defines the natural development of the research and the problems to be tackled.
First, it is evident that the reference sample used to define the confidence intervals of the parameters introduced here should be greatly expanded; a sufficient expansion would certainly modify the ranges of possible values for λ and µ and their choice.It can be expected that after a sufficiently large number of examined subjects, both ranges and choices will stabilize, especially if one establishes the «relative weight» that should be assigned to the BDI initial classification and to the values of parameters T, A and O, respectively.So, the first step in the future development of this research must be to increase the size of the reference sample.In addition, the values of λ and µ should be arbitrarily changed in order to determine through experimental controls the different degrees of belief that could be assigned to BDI and to parameters T, A and O, respectively.
In connection with the expansion of the sample, of particular interest would be to conduct an analysis of the depressive effects of COVID on a large sample, both through the BDI and through speech analysis.Unfortunately, we do not yet have sufficient data on this subject, but we intend to develop research in this direction as soon as we have collected enough data.
A natural second step will then consist in automatizing the computation of the transition matrices and, as a consequence, of the solutions of Equation ( 7) 1 .In such a way, Equation (7) 1 will behave similarly to a tool for training (on the reference sample) and then to use on any subsequent subject in the form of software for diagnosing depression.
Finally, the question about the possibility of reproducing the method presented here for any kind of diagnostic problems arises spontaneously, once the experimental devices and the parameters of interest will be defined for them.The answer can be readily seen to be affirmative.A very good example of this kind of development of the studies about possible applications of the method presented here can be found in [37,38], where a study of mental health based on an electroencephalogram headset is proposed: the application of such a study should be of the greatest interest with regard to depression and the comparison of its results with those of the BDI using the method presented here.A more general exposition of the method, showing at once its independence of the context, and its effectiveness in solving any classification problem, is given in [14].

Figure 1 .
Figure 1.The Aesop's tale "The North Wind and the Sun" as it was presented to the participants involved in the data collection.

Figure 4 .
Figure 4. Graphic representation of the way in which the estimated probabilities of health and depression for subject S15 vary after each speech test, depending on the different relative importance assigned to each measured parameter.

Figure 5 .
Figure 5.The plots above depict the evolution of the average depression probabilities for subjects S37, S08 and S15.The coordinates of the edges of the plot are the average health and depression probabilities of these subjects at each step, as updated by the triples of outcomes of experiments.A comparison of this picture with Figures2-4highlights that the final results of the different test arrangements are much more dispersed around the average results for subjects S08 and S15 than for subject S37.This seems to reflect the presence at each step of results in apparent mutual contrast.

Remark 8 .
(7)logously, consider a sequence {Y k } 1≤k≤m of parameters.One cannot in general expect that, if Equation(7)is applied according to the plot Y 1 → Y 2 → . ..→Y m or according to a plot Y i 1 → Y i 2 → . ..Y i m (where {i 1 , i 2 , . . ., i m } is any permutation of the set {1, 2, . . ., m}), the same result will be obtained, i.e., the probability distributions { f 2. In general, one expects that h= k does not imply [e h (Y), e h (Y)] ∩ [e k (Y), e k (Y)] = ∅.So, one has to determine, for any h ∈ {0, 1, 2, 3, 4, 5}, a set I h (Y) in such a way that 1.for any h, I h (Y) ⊆ R h (Y) is either a real interval or the join of a finite number of disjoint intervals; 2.

Table 1 .
Values of n hk for T, A, O.

Table 2 .
Confidence intervals for T.

Table 3 .
Confidence intervals for A.

Table 4 .
Confidence intervals for O.

Table 5 .
Algebraic conditions on the least values of λ and µ for T, A, O.

Table 6 .
Values of parameters T, A, O and confidence intervals for S37.

Table 7 .
values of parameters T, A, O and confidence intervals for S08.Graphic representation of the way in which the estimated probabilities of health and depression for subject S37 vary after each speech test, depending on the different relative importance assigned to each measured parameter.The coordinates of the edges of each plot are the health and depression probabilities of subject S37 after each test.More precisely, the x-coordinate is the health probability and the y-coordinate is the depression probability. .
Figure 3. Graphic representation of the way in which the estimated probabilities of health and depression for subject S08 vary after each speech test, depending on the different relative importance assigned to each measured parameter.

Table 9 .
Average values and standard deviations for updated probabilities.The results on µ h (and, consequently, on µ d ) listed in the above table are illustrated in Figure5.