Visualizing Collaboration in Teamwork: A Multimodal Learning Analytics Platform for Non-Verbal Communication

Abstract: Developing communication skills in collaborative contexts is of special interest for educational institutions, since these skills are crucial to forming competent professionals for today’s world. New and accessible technologies open a way to analyze collaborative activities in face-to-face and non-face-to-face situations, where collaboration and student attitudes are difficult to measure using traditional methods. In this context, Multimodal Learning Analytics (MMLA) appears as an alternative to complement the evaluation and feedback of core skills. We present an MMLA platform to support collaboration assessment based on the capture and classification of non-verbal communication interactions. The developed platform integrates hardware and software, including machine learning techniques, to detect spoken interactions and body postures from video and audio recordings. The captured data are presented in a set of visualizations designed to help teachers obtain insights about the collaboration of a team. We performed a case study to explore whether the visualizations were useful to represent different behavioral indicators of collaboration in two teamwork situations: a collaborative situation and a competitive situation. We discussed the results of the case study in a focus group with three teachers to get insights into the usefulness of our proposal. The results show that the measurements and visualizations are helpful to understand differences in collaboration, confirming the feasibility of the MMLA approach for assessing and providing collaboration insights based on non-verbal communication.


Introduction
Teamwork and collaboration have become relevant as the complexity of today's problems surpasses individual capabilities [1,2]. Collaboration, defined as a group of people (or organizations) working together to achieve a common goal [3], requires effective communication among the participants. In the educational context, collaboration has been identified as an important learning component that helps to improve students' performance [4] and to develop higher-level reasoning. Companies are now looking for graduates who possess these new skills, together with many others, such as decision making, problem-solving, time management, and critical thinking [2,5,6,7].
The above poses a new challenge for Higher Education Institutions (HEIs), since they need to provide relevant knowledge and practices that allow their students to be highly productive and prepared for these new industry requirements [5,8,9]. Traditional methods struggle to assess the learning of these skills, as they usually focus on the results rather than on the processes that led learners to acquire and/or develop them. In the specific case of collaboration, there are difficulties in producing standardized tests the theoretical framework on which we base to elicit the requirements for the platform [17]. Section 4 presents the design and technical considerations of the proposed system. Then, Section 5 presents the case study along with the results. In Section 6 we discuss the implications of the results for the observation of collaboration constructs. Finally, Section 7 presents our conclusions and discusses future work.

Related Work
Non-verbal communication is defined as the behavior of the face, body, or voice, without linguistic content, i.e., everything except words [22]. Non-verbal communication involves, for example, facial expressions, gestures, voice tonalities, and speaking time, among many others. The work of [15] approaches the assessment of non-verbal collaboration, but considers only the non-verbal elements of spoken interactions. Despite this limitation, that work serves as an initial foundation for our proposal, which aims to extend its spoken interaction-based approach to body posture analysis. This section reviews works on the use of body posture and time in collaborative contexts.
Postures may provide information related to the sentiments and intentions of a person or indicate power and social status [23]. For instance, the respect and disposition towards the participants during the interaction may be identified by the individual's posture [23]. In this sense, a closed and inflexible posture is less attractive than an open and relaxed posture. Identifying postures during collaboration may be important complementary information about the participants and may help to better understand the entire learning process [24].
Andolfi et al. [25] investigated how posture influences the generation of novel ideas in the context of creativity by proposing two studies. The first study used a sample of 102 students divided into two balanced groups. Each subgroup completed one of two creative tasks, and the researchers asked the students to adopt randomly assigned open or closed postures while describing their ideas. The findings support the hypothesis that posture influences creative task performance, but they did not establish that the facilitating effects of open postures are specific to creativity. The second study involved 20 students and added further dimensions to the analysis, incorporating different physiological measures and a logical task not requiring creativity. The results showed that postures specifically influence the performance of creative tasks.
Hao et al. [26] incorporate the component of emotions in the participants. The method is very similar to that proposed by Andolfi et al., but here the emotions are induced by watching videos, and the participants are standing. The authors show that participants exhibited the greatest associative flexibility in the open-positive posture and the greatest persistence in the closed-negative posture. These findings show that compatibility between body posture and emotion is beneficial for creativity. This work makes us reflect on how an individual's posture might influence the ability to solve collaborative problems with a creative component or how it can affect the creativity of other team members.
Moreover, Latu et al. [27] investigate how the behavior of visible leaders empowers women in leadership tasks. They hypothesize that women tend to imitate the empowered posture of successful women. Experiments showed that, in groups, women adopted the postures of the female leaders when these were famous models (but not when women were exposed to non-famous models). The above suggests that finding mimicry between postures may be a reflection of leadership among interlocutors.
From the MMLA perspective, collaboration and communication among students have been studied from different points of view. Grover et al. [28] developed a framework to capture multimodal data (video, audio, clickstream) from pairs of programmers while they were working together to solve a problem, in order to predict their level of collaboration. Starr et al. [29] studied how delivering feedback to students regarding collaboration can affect productive small learning group interactions. This feedback can be delivered by a traditional method (verbally delivered interventions) or by a multimodal one (real-time). One of their findings is that simple verbal interventions can help participants pay attention to specific aspects (e.g., how much they talk and how much space they provide to their partner). However, they did not find evidence that continuous feedback supports collaboration. On the other hand, Davidsen et al. [30] describe how two 9-year-olds collaborate through gestures and body movements. The experiment showed that differences of opinion were reflected in oppositional gestures and movements in the face of the same phenomenon. Cornides-Reyes et al. [31] analyzed the collaboration and communication of students in a Software Engineering course in an exploratory study. They collected data using multidirectional microphones and applied social network analysis techniques and correlational analysis. Their findings show that MMLA techniques are a feasible way to support the skill development process in students.
Some of the mentioned articles consider multiple modalities of communication, such as posture, proxemics, and chronemics. However, the tools used to measure the data are traditional, such as recordings or data collection systems tailored to the experiment. In the case of Riquelme et al. [15], a tool was developed to provide automatic feedback to teachers. However, it only considers the chronemic component of communication. Therefore, there is an opportunity to expand and integrate new aspects of communication. On the other hand, Järvelä et al. [32] conclude that multimodal data can help understand regulatory processes in collaboration. Furthermore, a relevant factor pointed out by the authors is the delivery of timely information to improve results. Table 1 shows a synoptic summary of the research mentioned in this section.

Table 1. Synoptic summary of the related work.
[27] | | | Instruments to measure mimicry. | Videos and post analysis. | Postures | In groups, women adopted the postures of the female leaders when these were famous models (but not when women were exposed to non-famous models). |
[29] | Studied how delivering feedback to students regarding collaboration can affect productive small learning group interactions. | Yes | The participants used a block-based programming language to navigate a robot through a maze. | Pre- and post-test assessments based on fill-in-the-blank questions. Self-assessment questionnaire, post-experiment. | Body tracking system data | One of their findings is that simple verbal interventions can help participants pay attention to specific aspects. | No
[30] | Analyzed how two 9-year-old boys collaborate through gestures and body movements around a touch screen. | Yes | They collected data regarding the movement of the children's bodies around a touchscreen. | The data were analyzed through the observation of movements, speech, screen touch, and gestures. | Gestures and body movements | The experiment showed that differences of opinion were reflected in oppositional gestures and movements in the face of the same phenomenon. | No
[31] | Analyzed the collaboration and communication of students in a Software Engineering course in an exploratory study. | Yes | The collected data were based on the DiSC factor (Dominance, Influence, Steadiness, and Compliance). The data were gathered by a series of low-cost sensors distributed in the classroom. | Social network analysis techniques and correlational analysis | They collected data using multidirectional microphones | MMLA techniques offer considerable feasibility to support the skill development process in students. |

Background: Collaboration and Multimodal Learning Analytics
Boothe et al. [21] have presented a framework to close the gap between research efforts on the theoretical understanding of the collaboration process and the multimodal learning analytics approach. The framework aims to connect collaboration theory constructs with MMLA measurements, supporting the study of collaboration constructs with quantitative measurements.
The framework is based on six collaboration constructs proposed by [17] (contribution, assimilation, team coordination, self-regulation, cultivation of environment, and integration), which are in turn grouped into three categories: cognition, metacognition, and affect (see Table 2). Regarding the cognition category, the contribution construct refers to a cognitive action that advances the collaborative goal, while the assimilation construct concerns the actions performed when receiving a contribution from another team member. Concerning metacognition, the team coordination construct refers to the actions taken to improve the team's overall efficiency, while the self-regulation construct deals with the individual actions through which a group member adapts his or her behavior to facilitate participation in the group. Finally, concerning the affect category, cultivation of environment refers to subjects supporting other team members through verbal or non-verbal signals of acceptance, while integration addresses affective actions of a group member towards the cohesion of the group.
According to the framework, the collaboration constructs are first refined into behavioral indicators (e.g., subjects have a positive attitude when interacting) and then into traces of behavior from different communication modalities (e.g., an open body posture when speaking) [17]. With MMLA tools, it is possible to use sensors to collect media from different communication channels (e.g., audio and video) and then process them to extract communication features (e.g., speaking time and body postures) that support the observation of traces of behavior. The extracted features are organized and visually displayed to provide feedback analytics (e.g., a timeline with the spoken interactions and the different body postures of all the group members), in order to support the observation of behavioral indicators and provide insights about the collaboration constructs.
Our proposal aims to exploit the above framework by designing an MMLA platform to study collaboration constructs from non-verbal communication. Therefore, we consider the challenges of MMLA identified in [33], such as heterogeneity of data measurements, data integration, and generalization of the study, among others.
Table 2. Framework based on [21]

Developed Solution
In this section, we present the design of a system to support the multimodal analysis of collaboration constructs. From a methodological point of view, we based our research on the Design Science (DS) methodology, particularly on the interpretation by Wieringa [34]. Design science specifies four stages to design and research artifacts in their context: problem definition, treatment design, treatment validation, and treatment implementation. This article covers the problem definition and treatment design stages. In the problem definition stage, the stakeholders' goals and needs are identified, for which we use Boothe's framework [21]. In the treatment design stage, the tool must be designed, developed, and tested to determine whether it could contribute to the stakeholders' goals, which we achieve through the case study and the focus group. The remaining DS stages, which cover validating the tool and transferring it to a real-world context, are outside the scope of this paper.
The design goal of the developed solution is to help teachers to understand how a team collaborates using MMLA. To achieve this goal, we have instantiated the framework by Ochoa [17] by proposing a set of behavioral indicators and their associated requirements for feedback analytics, as well as the behavior traces and their respective feature extraction requirements. We summarize these definitions in Table 2.
In order to meet the above requirements, we propose to provide feedback analytics in the form of a set of visualizations based on measurements of the spoken interaction and the postures of the subjects.
We designed five visualizations to address the six feedback analytic requirements presented in Table 2, which are detailed below.
• Timeline: This visualization jointly depicts the spoken interactions (bars) and the body postures of each subject (circles) throughout the activity. The widths of the bars and circles show the length of each interaction and posture, respectively. With this visualization we aim to support the understanding of the assimilation, self-regulation, and cultivation of environment constructs.
• Spoken interaction graph: In this visualization, each subject is represented by a node, whose relative size represents the number of spoken interactions. The directed arcs between the nodes are stronger (thicker) when a spoken interaction from a subject, represented by the source node, is followed by a spoken interaction of another subject, represented by the target node. This visualization was designed to support the contribution, team coordination, and integration constructs.
For our proposal, we take as a starting point our previous work [15], which supports capturing, storing, analyzing, and visualizing voice data coming from collaborative discussion groups. Multidirectional microphones provide the captured voice data, and we use social network analysis techniques for data analysis. We extend this work by incorporating four cameras and machine learning techniques to recognize the participants' postures. This involves addressing one of the challenges for MMLA researchers associated with synchronous multimodal data collection [35]. We have incorporated this kind of device/technique to present a panoramic scenario to the educator/researcher.
Next, we present the technical environment of the system, which includes the high-level architecture and the technologies used. Figure 1 illustrates the high-level architecture of the developed system. It focuses on the distribution of the hardware used and the context of use.
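To illustrate how the node sizes and arc weights of the spoken interaction graph described above could be derived, the following is a minimal sketch. It assumes the spoken interactions are available as an ordered list of speaker identifiers; the function name `interaction_graph` and the sample data are hypothetical and do not correspond to the platform's actual code.

```python
from collections import Counter


def interaction_graph(turns):
    """Derive node sizes and directed arc weights from an ordered list of speakers."""
    # Node size: number of spoken interactions per subject.
    node_size = Counter(turns)
    # Arc weight: how often a turn by `source` is followed by a turn by `target`.
    arcs = Counter()
    for source, target in zip(turns, turns[1:]):
        if source != target:  # assumption: consecutive turns by one subject are not an exchange
            arcs[(source, target)] += 1
    return node_size, arcs


# Hypothetical sequence of turns in a four-person activity.
turns = ["S1", "S4", "S2", "S4", "S3", "S4", "S2"]
nodes, arcs = interaction_graph(turns)
print(nodes["S4"])         # 3
print(arcs[("S4", "S2")])  # 2
```

Thicker arcs in the visualization would then correspond to larger values in `arcs`.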
The system has a data-collection device, composed of a Raspberry Pi 4, which integrates the ReSpeaker, for audio data capture, and a group of four USB camera modules, for video data capture. The ReSpeaker consists of a group of multidirectional microphones that allow, through an algorithm, voice activity detection (VAD) and the estimation of the direction of arrival (DOA) for four individuals within a capture radius of three meters. Furthermore, the camera modules are used to obtain the images of the four participants around the device. Thus, this device was designed to be located at the center of the interaction for the purpose of individualizing the participants. The device communicates with a server, which is in charge of storing and processing the data for visualization. An application was developed to control the operation of the ReSpeaker and the cameras. It receives data from the ReSpeaker through the GPIO connection and from the cameras through the USB ports. It was divided into two independent modules written in Python 3.7 and C. This application collects audio and video and then transmits this information wirelessly to a previously configured server using the UDP protocol. The transmission includes the audio from the four microphones and the images from the four cameras.
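As an illustration of how a DOA estimate could be used to individualize the participants, the following minimal sketch assigns an angle to the nearest seat. The seat angles (0, 90, 180, and 270 degrees) and the function name `participant_from_doa` are assumptions made for this example; the source does not specify the seating geometry or the assignment rule.

```python
def participant_from_doa(angle_deg, seats=(0, 90, 180, 270)):
    """Return the index of the seat whose direction is closest to the DOA angle."""

    def circular_distance(a, b):
        # Smallest angular difference on a 360-degree circle.
        d = abs(a - b) % 360
        return min(d, 360 - d)

    return min(range(len(seats)), key=lambda i: circular_distance(angle_deg, seats[i]))


print(participant_from_doa(10))   # 0
print(participant_from_doa(350))  # 0
print(participant_from_doa(100))  # 1
print(participant_from_doa(200))  # 2
```

The circular distance avoids treating 350 degrees and 0 degrees as far apart, which a plain absolute difference would do.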
The server receives and processes the data transmitted by the device, as shown in Figure 2. It deploys a web application composed of a front-end developed with the Flask 2.0.1 Framework and a back-end developed in Python 3.7. In addition, MongoDB 1.21 has been used as the database management system. This web application allows the user to record the sessions of an activity, process the data, and visualize the results. The process starts when the user sets up an activity. The user then indicates the start of the recording of the activity. This generates a command on the server to record the audio and video and to start extracting audio features in real time. Then, the user indicates the end of the recording, so the server ends the recording process. After the activity is recorded, the user starts the video processing, which consists of two parts: obtaining the features and their subsequent classification. Finally, the visualizations are obtained. The platform processes the audio data in real time, from which it obtains the first metrics (speaking time and number of interventions). These first metrics are stored locally in the database with a time tag. The metrics are related to the analysis of the participants' interventions as described in [15]. The raw data are recorded and stored in WAV and AVI file formats for audio and video, respectively. Due to hardware limitations, the video data processing is performed after the activity and is focused on posture metrics.
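The two audio metrics mentioned above (speaking time and number of interventions) can be sketched as follows. This is a minimal example under stated assumptions: the VAD/DOA output is taken to be one speaker identifier (or `None` for silence) per fixed-length frame, and the frame length of 0.5 s and the function name `speaking_metrics` are hypothetical, not taken from the platform.

```python
def speaking_metrics(frames, frame_s=0.5):
    """Compute speaking time (seconds) and number of interventions per speaker.

    frames: ordered list with one speaker id per frame, or None for silence.
    """
    time_s = {}
    interventions = {}
    previous = None
    for speaker in frames:
        if speaker is not None:
            time_s[speaker] = time_s.get(speaker, 0.0) + frame_s
            if speaker != previous:
                # A new run of consecutive frames counts as a new intervention.
                interventions[speaker] = interventions.get(speaker, 0) + 1
        previous = speaker
    return time_s, interventions


frames = [1, 1, None, 1, 2, 2, 2, None, 4]
time_s, interventions = speaking_metrics(frames)
print(time_s)         # {1: 1.5, 2: 1.5, 4: 0.5}
print(interventions)  # {1: 2, 2: 1, 4: 1}
```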
Video processing is divided into two components. The first one consists of taking a frame (image) from the video and obtaining the key points. The key points, i.e., the parts of the body that describe the human anatomy, were estimated from the image using OpenPose [36], which uses a previously trained convolutional neural network. This method has been previously employed in the literature [37][38][39][40]. The second component takes the key points and classifies the pose. The classification model used is a multilayer perceptron (MLP), which is a helpful tool for classification problems and has been previously used to classify poses either from the image perspective [41,42] or from 2D and 3D skeletons [43,44]. An MLP has three types of layers: the input layer, the output layer, and the hidden layers between them. In this work, the input layer has 100 neurons with a data input of 30. Then, the hidden layers consist of 21 neurons with a ReLU activation function. Finally, the output layer has 8 neurons with a softmax function to determine each pose.
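To make the layer dimensions concrete, the following minimal sketch runs a forward pass through a network with the sizes given above (30 input values, 100 neurons, 21 neurons, and 8 softmax outputs). It is not the trained model: the weights are random, the ReLU activation on the first layer is an assumption (the text only states it for the hidden layers), and all function names are hypothetical.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real model would load trained values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
layers = [init_layer(30, 100, rng), init_layer(100, 21, rng), init_layer(21, 8, rng)]


def predict_proba(keypoints):
    h = relu(dense(keypoints, *layers[0]))  # 30 -> 100 (ReLU assumed)
    h = relu(dense(h, *layers[1]))          # 100 -> 21
    return softmax(dense(h, *layers[2]))    # 21 -> 8 pose probabilities


probs = predict_proba([rng.random() for _ in range(30)])
pose = max(range(8), key=probs.__getitem__)  # index of the most likely pose
```

The predicted pose is the index with the highest softmax probability; with trained weights, each index would map to one of the posture classes.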
The postures were determined from the definition of a closed posture. A closed posture is defined as any posture that involves covering the body and/or bending or crossing the limbs, such as crossing an arm, hand, leg, or foot with its opposite [45]. Therefore, the opposite is understood as an open posture. Moreover, the choice of postures was derived from those presented in [46,47], taking into account the camera angle and the fact that the individual is seated.
In Figure 3
For MLP training, we constructed the dataset from 2-min videos in which a person performs the postures. The dataset was converted from videos to key points using OpenPose. In total, there were 16,640 samples. These were divided into 75% for training and 25% for testing. The training result achieved 99% accuracy.
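The 75%/25% division of the 16,640 samples corresponds to 12,480 training samples and 4,160 test samples. A minimal sketch of such a split is shown below; the shuffling, the seed, and the function name `split_dataset` are assumptions, since the source does not describe how the samples were assigned to each subset.

```python
import random


def split_dataset(samples, train_fraction=0.75, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Integers stand in for the key-point samples.
train, test = split_dataset(range(16640))
print(len(train), len(test))  # 12480 4160
```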

Case Study
In order to validate the proper achievement of the design goal presented in Section 4, we performed a case study to answer the following research question: Does the analytics feedback collected by the tool provide insights about the collaboration constructs? To get insights on this matter, we decided to compare two different teamwork activities, with high contrast between collaborative and non-collaborative work. The first activity, namely Collaborative Activity, aimed to explore whether the MMLA visualizations on subjects interacting collaboratively effectively support the observation by the teacher of behavioral indicators and traces of collaboration. The second activity, namely Competitive Activity, aimed to identify indicators and traces of non-collaborative behavior in an activity designed to produce conflict and more chaotic interactions among the subjects.
We collected data automatically (through the MMLA platform) and manually (taking field notes) in both activities. Field notes allowed us to describe the flow of interaction of the subjects and to record observations about the six collaboration constructs of the framework [17]. The automatic data collection performed by the MMLA platform followed the requirements presented in Table 2: measurement of the number of spoken interventions, speaking time (per intervention), and type of posture (open, closed, hands on the hips, hands on the head, and hugging the opposite arm). The visualizations and the field notes were handed to two members of the research team who hold the degree of Master in Teaching for Higher Education, hereafter the reviewers. In the two activities, subjects received a task to be performed in five minutes, without further instructions about how to interact to achieve it. The same four subjects took part in both activities. We recorded audio and video of each of them during the whole activity.
The four subjects are students from different degree programs and universities: Psychology, Auditing Accountant, Industrial Management Execution Engineer, and Business Administration. All participated voluntarily and gave their informed consent. The group is composed of three women and one man between 25 and 28 years old who did not know each other.

Collaborative Activity
The subjects were asked to collaboratively write, in five minutes, a sentence about what might be the first article of Chile's new Constitution (at the time of the case study, Chile was in the midst of the process of writing its new political constitution). Field notes were taken about the interaction flow and the subjects' attitudes during the activity. According to the field notes, four main stages were identified during the activity:
(S1) A brief initial coordination, where the subjects agreed to present their opinions sequentially and then write the sentence in agreement.
(S2) The first exposition (by Subject 4), which ended after approximately one minute, interrupted by another subject worried about the time remaining to complete the activity.
(S3) The rest of the presentations, which continued sequentially with sporadic interventions by the rest of the subjects.
(S4) An attempt to write down an agreement, although subjects could not successfully finish the activity during the remaining time.
Regarding the subjects' attitude, the experimenter noted a dominant attitude in Subject 4, as the subject constantly commented on the positions of the rest of the group's members. The experimenter observed the rest of the members as open to hearing and collaborating. The recorded data, presented in Table 3, summarize the number of interventions recorded for each subject, as well as the number of posture changes identified by the system.
When asked about the degree to which the visualizations contribute to understanding the interaction flow of the subjects, the reviewers agreed that the timeline visualization was the most valuable because it clearly shows that the subjects took turns to present their positions. The timeline visualization, presented in Figure 4, depicts what the reviewers characterize as the four stages described in the experimenter's notes. The analysis criterion agreed to by the reviewers was to ignore isolated detections that could be produced by noise or slight changes in posture. Instead, they focused on the big blocks of interaction. The timeline clearly shows how all four subjects speak to agree on the interaction procedure during Stage 1, while the dominance of Subject 4 is shown in Stage 2. Then, Stage 3 interactions show the presentation of Subject 3, one comment by Subject 4, and then a brief exposition by Subject 2, complemented by Subject 1. Finally, Stage 4 interaction shows how Subject 4 starts wrapping up with the contribution of the other subjects.
Another useful visualization for the interaction flow is the spoken interaction graph in Figure 5A. Although it does not provide a temporal representation of events, it clearly shows how Subjects 2 and 4 dominate the number of interventions. Note that this visualization does not show how long the speaking interventions of the subjects took. Therefore, Figure 5B helps to better understand the distribution of the speaking time: Subject 4, again, shows some of the most extended interventions (22 s), while most of the interventions of the rest of the subjects are no longer than six seconds.
When asked about subjects' attitudes during the activity, the timeline visualization in Figure 4 was also preferred by the reviewers to have an initial idea of the subjects' performance. Subject

Competitive Activity
In this activity, the same four subjects were asked to jointly decide who should be saved in a bunker in an apocalyptic scenario. Again, subjects had five minutes to reach an agreement while the experimenter took notes about the interaction flow and the subjects' attitude. The experimenter's notes describe that the activity was as chaotic as predicted: no interaction agreement was defined by the group, and each subject started to argue that they themselves were the best choice to be saved. The experimenter noticed that Subject 4 kept a dominant attitude, but in this case, Subject 2 was more active in presenting their arguments, while Subject 3 was remarkably overwhelmed by the situation. Subject 1 showed a calm attitude, although their interventions were longer than the ones from the Collaborative Activity. Analogously to the Collaborative Activity, Table 4 presents the recorded data for this second activity. Figure 6 presents the timeline visualization. In this case, reviewers found less value in the visualization regarding the interaction flow. The reason is that the subjects' chaotic interventions can hardly be distinguished from what was considered noise by the reviewers in the previous case. However, in this case, the spoken interaction graph in Figure 7A was highly valuable, as the reviewers found that it reflects an intensive interchange of ideas between Subject 4 and Subjects 1 and 3, with strongly colored arcs. Comparing this visualization with the analogous one in the Collaborative Activity (Figure 5A), the reviewers concluded that this visualization might be helpful to identify when the subjects are discussing a topic. Regarding the duration distribution of the spoken interactions, both reviewers agreed that there were no differences between the collaborative and competitive activities.
Finally, regarding the subjects' attitude, reviewers agreed that the timeline, in this case, is valuable to understand the intensity of the discussion, as non-open postures were prevalent in all four subjects. When comparing the timelines of both cases, reviewers considered that the postures shown in the timeline could provide insights about the intensity of the debate and even help indicate changes in its dynamics. For instance, in the last minute, Subjects 1, 2, and 3 seem to anticipate the end of the activity with a calm attitude, unlike Subject 4, who consistently raised his arms. The posture proportion visualization in Figure 7B also supports this, as a higher proportion of non-open postures is found for all the subjects, unlike the results of the Collaborative Activity (Figure 5D).
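As an illustration of the measure behind a posture proportion visualization, the following minimal sketch computes the share of each posture for one subject from a list of per-frame posture labels. The label names, the sample data, and the function name `posture_proportions` are hypothetical; the source does not detail how the proportions are computed.

```python
from collections import Counter


def posture_proportions(labels):
    """Return the fraction of frames classified as each posture."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {posture: n / total for posture, n in counts.items()}


# Hypothetical per-frame classifications for one subject.
labels = ["open", "closed", "closed", "hands_on_hips"]
props = posture_proportions(labels)
non_open = 1.0 - props.get("open", 0.0)
print(props["closed"])  # 0.5
print(non_open)         # 0.75
```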

Discussion
In this section, we present a focus group conducted to explore the usefulness of the visualizations. Then, we discuss the focus group results and their relationship with the design goals and requirements.

Focus Group
To discuss the potential applications of the visualizations, we conducted a focus group with three teachers from Chile. The main research question was: What visual feedback elements can help you to assess whether a group is performing well in a collaborative activity or not? The three teachers differ in experience and discipline, but all teach primary or secondary students and have educational backgrounds. Teacher 1 (T1) is a secondary teacher of mathematics with six years of experience. Teacher 2 (T2) is a secondary teacher of history and geography with ten years of experience. Teacher 3 (T3) is a primary teacher of English with four years of experience. Two researchers conducted a 60-min focus group. The three stages of the activity and its main results are detailed below.
Blind Stage: The guiding question of this stage was "what non-verbal and paraverbal communicative characteristics does a collaborative team have?". We call this stage "blind" because none of the teachers has seen the visualizations.
T1 and T2 commented that the members look at each other when talking in a collaborative group: "when everyone is looking at what they have to do individually, it is often a non-collaborative group" (T1). T2 and T3 agreed that engaged students have high kinesthetic activity: "generally a non-collaborative group is a group that does not express much with its body, because it has no interest, it is more individualistic" (T2). The three participants also agreed that open body postures show that team members are eager to collaborate: "when you're standing with your arms crossed, all in a little more rigid or backward position as you mention, it's a posture, shall we say, of little interest in collaborating" (T3).
Guessing Stage: In this stage, we presented the visualizations of the collaborative (Figures 4 and 5) and competitive activity (Figures 6 and 7) to the three teachers, without telling them which type of activity it was. The visualizations of the collaborative and competitive activities were tagged as Group A and Group B, respectively. The guiding question was "which of the two groups is collaborative?".
T1 and T2 agreed that Group A was collaborative because the timeline visualizations showed more structured interactions: "Each one had its moment, you could even see that subjects 1 and 2 of Group A as there was an interaction between the two of them in the last part, they interact in an orderly way, and in the other one (Group B) no, you don't see a process"..."I can think that they interrupt each other many times because one speaks, then the other speaks and they are almost speaking at the same time." (T1), and "generally when doing collaborative work, it is important that I give my point of view and that others listen to me" (T2).
Also, T1 and T2 agreed that the postures and hand movements of Subject 3 in Group B were signs of non-collaborative behavior: "He's kind of hedging, probably being a little bit more defensive. In my opinion, in the classroom this has a relationship with being individualistic" (T2).
Teachers T1 and T2 commented on the postures that accompany the interactions, both of those who speak and those who listen: "Subject 4 of group B has, as far as I can see, purple circles at the moment of interacting, that is, talking, it is also a characteristic of nonverbal language" (T1), and "subject 1 did not show a variation because he kept himself in something that we know as active listening. Therefore, as subject 4 in group A was talking, moving, explaining, probably the others were with their hands down listening." (T2). T3 indicated agreement with these statements.
On the other hand, T3 stated that Group B also seems collaborative from the point of view of collaborative language activities: "there are short dialogues, clearly there is less speaking time and in fact it is very good that everyone gets to speak for the same amount of time. Otherwise it becomes a monologue and the children don't practice the language" (T3).
Usefulness Stage: The guiding question of this stage was "Which alerts or indicators could help you to improve the collaboration facilitation and assessment of the groups?". All the teachers agreed on the following indicators and alerts for the group: participation time and its distribution among team members, and an alert when the collaboration flow differs from a previously designed structure. The teachers also agreed on the importance of showing subjects' kinesthetic activity and knowing whether they are looking at each other, as a sign of engagement in the activity. The teachers also agreed on alerting when a team member speaks significantly more than the others and when just a single team member is receiving all the interactions (as a sign that only one team member is doing all the work). Finally, all the participants agreed on alerting when a subject does not look at other team members.
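As an illustration of the participation alert requested by the teachers, the following minimal sketch computes each member's share of speaking time and flags members who speak far more than an equal share. The data layout (tuples of subject, start, and end in seconds), the function names, and the `ratio` threshold are assumptions made for this example and are not part of the platform described here.

```python
from collections import defaultdict

def speaking_shares(interactions, subjects):
    """Return each subject's fraction of the total speaking time.

    interactions: iterable of (subject, start_s, end_s) tuples (assumed layout).
    subjects: all team members, so that silent members get a share of 0.
    """
    totals = defaultdict(float)
    for subject, start, end in interactions:
        totals[subject] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {s: 0.0 for s in subjects}
    return {s: totals[s] / grand_total for s in subjects}

def dominance_alerts(interactions, subjects, ratio=2.0):
    """Flag subjects whose share exceeds `ratio` times an equal share.

    The threshold is a hypothetical choice; a teacher-facing tool would
    need to calibrate it for the activity.
    """
    shares = speaking_shares(interactions, subjects)
    equal_share = 1.0 / len(subjects)
    return sorted(s for s, share in shares.items() if share > ratio * equal_share)

team = ["S1", "S2", "S3", "S4"]
unbalanced = [("S1", 0, 60), ("S2", 60, 70), ("S3", 70, 80), ("S4", 80, 90)]
print(dominance_alerts(unbalanced, team))
```

In this toy input, one member holds two thirds of the speaking time, which exceeds twice the equal share of one quarter, so only that member is flagged.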

Discussion on Feedback Usefulness
The results from the case study constitute a starting point to provide feedback for the behavioral indicators and traces proposed in Table 2. In the following paragraphs, we detail our insights about each of the collaboration constructs.
Regarding cognitive contribution, as the two activities were mainly spoken, we believe it was possible to trace the contribution of each member by the number and duration of the spoken interactions, as presented in Figures 5A and 7A. Furthermore, the activity's short duration helped the subjects to focus on contributing. In this context, we think that the provided visualizations could be helpful for teachers to observe and understand the cognitive contribution of the subjects. More complex activities requiring more coordination, or longer activities where subjects could speak about topics other than the required task, would require identifying the subject matter of each spoken interaction before considering it a cognitive contribution. Moreover, complementary measures would be needed for activities with other types of cognitive contribution (e.g., collaborative writing or modeling).
Concerning the assimilation construct, we believe that the results for the competitive activity successfully show the criticality behavior indicator in the overlapped, short-timed spoken interactions depicted in Figure 6, which are characteristic of a non-collaborative behavior. We think this result could allow teachers to identify whether a team needs their intervention to avoid excessive criticality between the subjects.
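The overlapped, short-timed pattern described above can be quantified with simple interval arithmetic. The sketch below counts pairs of spoken interactions by different subjects that overlap in time and reports the mean interaction duration. The tuple layout and function names are assumptions for illustration, not the analytics implemented in the platform.

```python
def count_overlaps(interactions):
    """Count pairs of interactions by different subjects that overlap in time.

    interactions: list of (subject, start_s, end_s) tuples (assumed layout).
    """
    ordered = sorted(interactions, key=lambda it: it[1])
    overlaps = 0
    for i, (subj_a, _start_a, end_a) in enumerate(ordered):
        for subj_b, start_b, _end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start time: later ones cannot overlap either
            if subj_b != subj_a:
                overlaps += 1
    return overlaps

def mean_duration(interactions):
    """Mean length in seconds of the spoken interactions (0.0 if none)."""
    durations = [end - start for _, start, end in interactions]
    return sum(durations) / len(durations) if durations else 0.0

turn_taking = [("S1", 0, 10), ("S2", 10, 20), ("S3", 20, 30)]
heated = [("S1", 0, 4), ("S2", 2, 5), ("S3", 3, 6)]
print(count_overlaps(turn_taking), mean_duration(turn_taking))
print(count_overlaps(heated), mean_duration(heated))
```

A high overlap count combined with a low mean duration would correspond to the pattern seen in the competitive activity, although any threshold for raising an alert would have to be set empirically.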
Regarding the team coordination construct, the graphs in Figures 5 and 7 depict that team members communicated with each other. We expected that in the visualization for the competitive activity, it would be apparent how a subject was less involved in the debate. However, the graphs do not seem to show any insights into this fact. It seems the proposed analytic and visualization do not provide enough insight into team coordination. An improvement could be measuring the spoken interventions of the subjects aimed to achieve team coordination.
For the self-regulation coordination, the timeline visualizations make it possible to clearly observe differences in how the subjects adapt their behavior to achieve a collaborative goal: while in the collaborative activity each team member takes a turn to contribute, in the competitive activity the chaotic interaction shows no adaptations to collaborate. We think that this visualization might be helpful for teachers to distinguish teams that are capable of self-regulating from groups that would need their help to get coordinated, such as presented in [48].
For the cultivation of the environment, the differences in body postures presented in Figure 5B,D clearly show that subjects kept an open posture in the collaborative activity, in contrast with more varied postures in the competitive activity. The emergence of expansive postures (e.g., hands to the head shown by Subject 4 in the competitive activity) and defensive ones (e.g., hugging the opposite arm by Subject 3 in the same activity) seems to provide insights about a change of attitudes that could affect the collaborative environment. However, this holds only when observing the same subjects in different situations. Besides the visualization, it would be helpful to notify the teacher when there is a change in the typical collaborative postures of the team's subjects.
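The notification suggested above could be approximated by comparing a subject's posture proportions in a baseline window against a current window. The following sketch assumes that the posture classifier emits one label per frame. The label names, the `open_label` parameter, and the `max_drop` threshold are hypothetical and chosen only for this example.

```python
from collections import Counter

def posture_proportions(labels):
    """Fraction of frames assigned to each posture label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}

def open_posture_dropped(baseline_labels, current_labels,
                         open_label="open", max_drop=0.3):
    """True if the share of open postures fell by more than `max_drop`.

    Both the label name and the threshold are assumptions for this sketch.
    """
    base = posture_proportions(baseline_labels).get(open_label, 0.0)
    now = posture_proportions(current_labels).get(open_label, 0.0)
    return (base - now) > max_drop

baseline = ["open"] * 8 + ["arms_crossed"] * 2
current = ["open"] * 3 + ["arms_crossed"] * 7
print(open_posture_dropped(baseline, current))
```

Because the comparison is made per subject against that subject's own baseline, it respects the caveat that posture changes are meaningful only for the same subjects across situations.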
Finally, concerning integration, we think that the spoken contribution visualization in Figures 5A and 7A is helpful given the specific characteristics of the activity, as subjects cannot participate in any way other than speaking. In this context, the visualization is valuable in identifying subjects with less spoken interaction, allowing teachers to intervene to foster the integration of those subjects.

Limitations and Validity Discussion
In this section, we comment on the limitations of the designed tool and of the initial empirical evaluation.
Concerning the tool's design, our application of the framework by Boothe et al. [21] is constrained to non-verbal communication. Since our overarching goal is to provide real-time feedback for many groups simultaneously, we did not consider verbal communication or content analysis due to the technical limitations of analyzing multiple voice streams in real time. That said, we think that behavioral indicators combining non-verbal and verbal communication can better inform collaboration constructs, which is the focus of our future work. Another constraint for defining behavioral indicators is that the case study presented in Section 5 was performed under COVID-19 restrictions, so the participants were wearing masks. Features such as facial expressions could therefore not be extracted to inform behavioral indicators. However, thanks to the tool's architecture, they can easily be integrated without significant changes.
The initial empirical evaluation is limited to assessing whether the designed tool contributes to the stakeholders' goals, and further studies are required to validate the tool's effect on collaborative learning. With this aim, we explicitly decided to ask the subjects to perform two types of opposite collaborative behaviors to emphasize the differences in the visualizations for their discussion in the focus group. Alternative study designs, such as comparing the analytics and the performance of several groups performing a collaborative activity, are being considered for validating the tool.
Finally, the design and sample size of the focus group do not allow us to generalize the results. However, since we are not validating the tool but exploring if it helps stakeholders achieve their goals, we opted for a freer focus group design, favoring deeper discussions among participants, which is appropriate to our methodological framework.

Conclusions and Future Work
The gradual incorporation of technologies in educational environments can support teachers in developing highly valued competencies in the work environment [49]. Under this perspective, the measurement of aspects associated with non-verbal communication becomes relevant since it allows us to understand how subjects interact in collaborative activity, as well as providing effective feedback to both students and teachers.
This paper presents the design and development of an MMLA platform using sensors to capture and visualize audio and video data. It graphically provides feedback analytics to support collaboration assessment in face-to-face environments (co-located collaboration). For this purpose, we integrated hardware and software, and incorporated machine learning techniques to develop a scalable system. The platform detects the number and duration of the team members' spoken interactions, as well as their body postures and gestures. These features are presented in five different visualizations to provide insights about theoretical collaboration constructs.
We conducted a case study to compare the visualizations provided by the system in two different situations: collaborative and competitive activities. The results suggest that the provided visualizations help to identify issues with the cognitive contribution, assimilation, self-regulation, and integration of the team members. They could also help teachers decide whether they must assist a team in fostering collaboration.
While the results are naturally constrained to the characteristics of the activities in which we tested the platform, they provide initial evidence about the technical feasibility of extracting behavioral indicators and traces using MMLA to give insights on team collaboration.
Future work will focus on improving the platform's scalability in order to allow real-time monitoring of various teams. Moreover, future work will cover the extraction of features from verbal communication, allowing the identification of the topics of the team members' spoken interactions and better supporting different collaboration constructs in more extended and complex activities. Once real-time monitoring is implemented, we intend to assess to what extent teachers' actions based on the visualizations affect students' participation in the activities and help to enhance their collaboration. For that, we intend to follow some conditions and guidelines for fruitful collaboration identified by [50].

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available from the corresponding author upon request.

Conflicts of Interest:
The authors declare no conflict of interest.