Protected Network Architecture for Ensuring Consistency of Medical Data through Validation of User Behavior and DICOM Archive Integrity

The problem of consistency of medical data in Hospital Data Management Systems is considered in the context of correctness of medical images stored in a PACS (Picture Archiving and Communication System) and legality of actions authorized users perform when accessing MIS (Medical Information System) facilities via web interfaces. The purpose of the study is to develop a SIEM-like (Security Information and Event Management) architecture for offline analysis of DICOM (Digital Imaging and Communications in Medicine) archive integrity and users' activity. To achieve acceptable accuracy when validating DICOM archive integrity, two aspects are taken into account: correctness of periodicity of the incoming data stream and correctness of the image data (time series) itself for the considered modality. Validation of users' activity assumes application of model-driven approaches using state-of-the-art machine learning methods. This paper proposes a network architecture with guard clusters to protect sensitive components such as the DICOM archive and the application server of the MIS. New server roles were designed to perform traffic interception, data analysis and alert management without reconfiguration of production software components. The cluster architecture allows the analysis of incoming big data streams with high availability, providing horizontal scalability and fault tolerance. To minimize possible harm from spurious DICOM files, the approach should be considered as an addition to other securing techniques such as watermarking, encryption and testing data conformance with a standard.


Introduction
Today, healthcare facilities are vast ecosystems made up of a large number of network devices, equipment and systems that often require connection to external systems. Medical data are very sensitive to change, and any unauthorized modification poses a real threat to the health and life of patients. One does not need to have special skills to become familiar with the potential vulnerabilities that a healthcare facility may face. Therefore, the security of medical data must be ensured at every stage of receiving, transferring, processing and storing information, so as to preserve the confidentiality of patient data as well as the availability and sustainability of health services [1]. Consequently, manufacturers of medical systems, as well as organizations that provide support, need to implement measures that ensure the necessary level of protection against cyber threats and increase the safety of patients and of the infrastructure of the medical institution as a whole. The prevalent directions of securing medical data are considered below.

1. Ensuring incoming data conformance with DICOM standards. This should be implemented on the server side, in the DICOM server or in a filtering component running as a front end for the DICOM server. The datum of interest is IOD (Information Objects

Since any software may have vulnerabilities, the problem of authenticity of medical data arises, i.e., whether a file in the DICOM archive was legally added and is not fabricated. All the techniques mentioned above are necessary to ensure security of the data transferred to and stored in the archive. We propose the architecture of a versatile and extensible information system designed to work in a SIEM cluster with traffic interception, analytics facilities and alert management. The main architectural purpose is to allow integration of present and future state-of-the-art model-driven approaches to medical data and user behavior validation mentioned in paragraphs 4 and 5 above.

Since human life is of the highest value, all measures to ensure data security in medical information systems are aimed at ensuring the safety of life and health of patients. A dangerous situation can arise due to the actions of an intruder gaining access to the institution's network by exploiting vulnerabilities in the external perimeter and/or overly permissive access policies within the local computer network. An attacker can not only spoil or fabricate medical images in the DICOM archive, but also harm the patient and the institution by taking over MIS user accounts (doctors' credentials are of particular interest) and adjusting the patient's course of treatment at their own discretion. Two measures under consideration will help protect patients and doctors from misdiagnoses and the consequences of inappropriate treatment:

(1) Data protection in DICOM archives, eliminating the influence of damaged or fake medical images on decisions made by the diagnostician.
(2) Analysis of user activity, making it possible to reveal extraordinary and dangerous behavior of MIS users associated with destructive interference in the activities of doctors.

Protection of a medical institution's network implies both basic perimeter protection using firewalls and access rules implemented in network equipment, and the DICOM archive and application server protection components with analysis of user activity, as proposed in this paper. Thus, the protection of technical objects of the infrastructure of a medical institution ensures the protection of the life and health of patients, as well as the professional reputation of individual specialists and of the institution as a whole.

Materials and Methods
This section is organized as follows. First, the formalization of the problem of data legality is presented, where three aspects are discussed: correctness of periodicity of the incoming data stream, correctness of image data (time series) for a considered modality, and legality of users' behavior. Second, we describe architectural features of subsystems intended for offline analysis of DICOM files and user activity.

Consider data series whose samples are periodically posted into an archive. They may be generated not only by stationary equipment, but also by wearable devices, such as biomedical electrocardiogram sensors. Hardware provides data on demand of software, which usually performs periodic requests and sends data to the cloud [22][23][24][25]. If periodicity of the incoming data stream is violated, an attack is highly likely, unless a simple loss of network connection is the cause. The following describes the proposed model of periodicity correctness for incoming data streams. It allows some variance of the periodicity to handle rare network connection issues.
Let $X = X_1, X_2, \ldots, X^r_1, X^r_2, \ldots, X^r_n, \ldots, X^d_1, X^d_2, \ldots, X^d_m, \ldots$ be the time series of the considered modality. It is divided into two parts: a reference part (samples denoted as $X^r_\bullet$) and a part for analysis (samples denoted as $X^d_\bullet$). Let $\mathrm{time}(X^r_\bullet)$ be the function returning the timestamp of the sample $X^r_\bullet$ passed as the argument.

Let $t^r_i = \mathrm{time}(X^r_{i+1}) - \mathrm{time}(X^r_i)$, $i = 1, \ldots, n - 1$, be the time interval between two consecutive samples in the reference part.

Then the average time interval for the reference part is

$$\bar{t}^r = \frac{1}{n-1} \sum_{i=1}^{n-1} t^r_i.$$

The model is parameterized by $T^r$, the expected period, and $\Delta^r$, the allowed variation of the expected period. Both are user-defined external parameters represented by positive datetime values.
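The periodicity model can be illustrated with a minimal Python sketch. The averaging follows the definition of the reference-part interval; the decision rule used here (an interval is accepted when it deviates from the expected period by no more than the allowed variation) is an assumption made for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def mean_interval(timestamps: Sequence[datetime]) -> timedelta:
    """Average gap between consecutive samples (needs at least two timestamps)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def reference_is_periodic(reference: Sequence[datetime],
                          expected: timedelta,
                          allowed: timedelta) -> bool:
    """Assumed rule: the mean reference interval stays within expected +/- allowed."""
    return abs(mean_interval(reference) - expected) <= allowed


def suspicious_indices(analysed: Sequence[datetime],
                       expected: timedelta,
                       allowed: timedelta) -> List[int]:
    """Indices of samples whose gap to the previous sample breaks the assumed rule."""
    return [
        j + 1
        for j, (a, b) in enumerate(zip(analysed, analysed[1:]))
        if abs((b - a) - expected) > allowed
    ]
```

In this sketch a single late sample is reported by its index in the analysed part, which leaves the decision about alerting to the caller.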
A malicious attack may involve not only the sending of knowingly fake data, breaking the periodicity, but also damage to images already recorded in the PACS. The second aspect we take into account is the legality of the medical data themselves [26]. Special methods and algorithms have been developed [27][28][29][30] for different modalities to analyze the files and estimate the possibility that medical data with correct timestamps have been fabricated. Section 2.2.1 presents the architecture of an analyzing cluster for offline analysis of DICOM archive content using the proposed periodicity correctness model and existing model-driven methods, implementations of which can be integrated via a pluggable modules mechanism. Development of such plugins is out of the scope of the present research, since the purpose is to provide their integration within a dedicated subsystem to protect a DICOM archive.

Legality of User Actions
User accounts also form part of the attack surface. An attacker can take control of a specialist's account and carry out formally correct operations leading to a drastic change in the patient's treatment plan, which can threaten the patient's life. Special methods and algorithms have been developed to perform user behavior analysis on a model-driven basis. Section 2.2.2 presents the architecture of an analyzing cluster for offline analysis of user activity using modern methods based on machine learning, implementations of which can be integrated via a pluggable modules mechanism, in the same way as for DICOM archive content validation. Development of such plugins is out of the scope of the present research, since the purpose is to provide their integration within a dedicated subsystem to protect an application server of an MIS.

Validation of DICOM Archive Integrity
The proposed network analyzer (cluster) is intended for intercepting DICOM traffic and offline analysis of medical data of observed patients. Low-level software components rely on libpcap library [31] to capture traffic from network interfaces on Linux machines. All Linux functions and libpcap capabilities are labeled as LinuxKernel components in the presented UML (Unified Modelling Language) diagrams for simplicity.
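As a hedged illustration of what inspecting intercepted DICOM traffic can involve, the sketch below parses the six-byte header of a DICOM Upper Layer PDU (one byte of PDU type, one reserved byte, a four-byte big-endian length) from a captured TCP payload. It assumes the payload starts at a PDU boundary, which is not guaranteed for arbitrary TCP segments; the function name is hypothetical and is not part of the described classes.

```python
import struct
from typing import Optional, Tuple

# PDU types defined by the DICOM Upper Layer protocol.
PDU_TYPES = {
    0x01: "A-ASSOCIATE-RQ",
    0x02: "A-ASSOCIATE-AC",
    0x03: "A-ASSOCIATE-RJ",
    0x04: "P-DATA-TF",
    0x05: "A-RELEASE-RQ",
    0x06: "A-RELEASE-RP",
    0x07: "A-ABORT",
}


def parse_pdu_header(payload: bytes) -> Optional[Tuple[str, int]]:
    """Return (PDU name, declared body length) or None if this is not a PDU header."""
    if len(payload) < 6:
        return None
    # ">BBI": big-endian type byte, reserved byte, unsigned 32-bit length.
    pdu_type, _reserved, length = struct.unpack(">BBI", payload[:6])
    name = PDU_TYPES.get(pdu_type)
    if name is None:
        return None
    return name, length
```

A payload that does not start with a known PDU type is simply reported as `None`, so unrelated traffic can be passed through and logged without further parsing.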
The class diagram in Figure 1 illustrates the anatomy of traffic analysis components relevant to medical data. All the methods are documented in Tables A1-A5.
The purpose of the classes on the diagram is:
1. DICOMInterceptor class is for inspecting DICOM packets, initiating content checking and generating alerts in case of detection of suspicious DICOM files. It is the main class interacting with LinuxKernel.
2. LegalityCheckingProvider class is for configuration management and providing functionality for medical data analysis (testPatientSeries, testPatientFile methods). The patient's medical data in the DICOM archive is only analyzed if tracking is enabled for the respective patient and data series (lookOnPatientSeries method). In this case, the associated machine learning models must be initialized with a training sample of regular legal data specified by a time interval (retrainPatientSeriesModel method).
3. SeriesTester class loads and uses various plugins to check legality of data for different modalities (types of data series). It also uses the instance of PeriodAnalyzer to test periodicity of incoming data.
4. PeriodAnalyzer class is for analyzing the periodicity of incoming data in accordance with the proposed periodicity correctness model (getSuspiciousFiles method).
5. ILegalityChecker is a common interface each modality-checking plugin must implement to provide functionality for initialization and utilization of the machine learning model intended for data legality checking (trainModel and testSeries methods).
6. MRChecker, RGChecker, ECGChecker are MR (Magnetic resonance imaging), RG (Radiographic imaging), ECG (Electrocardiography) modality sample checking plugins implementing ILegalityChecker interface, respectively.

Validation classes for different modalities are implemented in separate components.
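The plugin mechanism described by these classes can be sketched in Python as follows. The names ILegalityChecker, trainModel, testSeries, SeriesTester and getCheckerForSeries come from the text; the method signatures, the registerChecker helper and the toy RangeChecker plugin are assumptions introduced only for illustration and stand in for a real modality model.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence


class ILegalityChecker(ABC):
    """Common interface of modality-checking plugins (signatures are assumed)."""

    @abstractmethod
    def trainModel(self, reference: Sequence[float]) -> None: ...

    @abstractmethod
    def testSeries(self, series: Sequence[float]) -> List[int]: ...


class RangeChecker(ILegalityChecker):
    """Toy stand-in for a plugin such as ECGChecker: flags out-of-range values."""

    def __init__(self) -> None:
        self._low: Optional[float] = None
        self._high: Optional[float] = None

    def trainModel(self, reference: Sequence[float]) -> None:
        self._low, self._high = min(reference), max(reference)

    def testSeries(self, series: Sequence[float]) -> List[int]:
        if self._low is None or self._high is None:
            raise RuntimeError("model has not been trained")
        return [i for i, v in enumerate(series) if not self._low <= v <= self._high]


class SeriesTester:
    """Keeps one plugin per modality and selects it for a given series."""

    def __init__(self) -> None:
        self._plugins: Dict[str, ILegalityChecker] = {}

    def registerChecker(self, modality: str, checker: ILegalityChecker) -> None:
        # Hypothetical helper; the paper only states that plugins are loaded.
        self._plugins[modality] = checker

    def getCheckerForSeries(self, modality: str) -> ILegalityChecker:
        return self._plugins[modality]
```

Keeping the interface this narrow means a new modality can be supported by adding one class and one registration, without touching the interceptor.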
The component diagram in Figure 2 illustrates the relationship of components in the system and other software with which it interacts, for example, dcm4che, which provides the functionality of a DICOM archive. The source of data for the latter is various medical hardware and software systems and wearable devices (the designation of their firmware appears in the component diagram as MREquipmentFirmware, RGEquipmentFirmware, ECGEquipmentFirmware, WearableEquipmentFirmware). The activity of the attacker's software can also affect the contents of the archive; it is denoted on the component diagram as AttackersSoftware, which, like medical equipment, also uses dcm4che.

DICOMAnalyzer, MRAnalyzer, ECGAnalyzer, RGAnalyzer (and other validation components for different modalities) are packaged as separate artifacts (with the same names, Figure 3). They should all be deployed to the LegalityAnalysisBackend node. These nodes should be clustered for high availability. Inbound traffic destined for PACS is monitored at the TCPInterceptorFrontend node and routed to LegalityAnalysisBackend nodes for analysis with round-robin balancing. dcm4che, which provides the DICOM archive functionality, and PostgreSQL, which provides the database for it, are deployed on separate DICOMArchive and DICOMArchiveDatabase nodes, respectively.
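Round-robin balancing between the frontend and the analysis backends can be sketched as below. This is a minimal illustration only: the class name and the backend addresses are hypothetical, and a production balancer would also need health checks to skip failed nodes.

```python
from itertools import cycle
from typing import List


class RoundRobinBalancer:
    """Hands out backend addresses in a fixed rotating order."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._ring = cycle(backends)

    def next_backend(self) -> str:
        # Each call advances the rotation by one backend.
        return next(self._ring)
```

For example, with three backends the fourth request is sent to the first backend again.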
The activity diagram in Figure 4 illustrates the process of inspecting and analyzing DICOM packets in a system on an analytic cluster. The process shown in the diagram assumes that the model for observing the series has already been trained (training was initiated manually or automatically using planning tools). Only valid DICOM packets can be parsed. When a packet arrives at the Linux kernel on a TCPInterceptorFrontend node, it is redirected for analysis to one of the LegalityAnalysisBackend nodes, where it is checked and the timestamp value is retrieved from it (DICOMAnalyzer component). If the packet being checked is a valid DICOM packet, the parsing procedure runs asynchronously. It waits until the DICOM file appears in the DICOM archive, and then begins analysis for correctness of periodicity and content using the PeriodAnalyzer and [Modality]Checker instances. The getCheckerForSeries method of the SeriesTester class decides which plugin to select for content analysis. When suspicions arise, the postAlert method is called, which can provide the required functionality for alerting about threats and additional event logging. In all cases, network activity is logged (passThroughAndLog). A more detailed description of the interaction of components is illustrated in the sequence diagram in Figure A1.
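The asynchronous part of this flow (wait for the file to appear in the archive, then run the periodicity and content checks and raise an alert on failure) can be sketched with asyncio. All names below are hypothetical, the archive lookup is reduced to a callable, and raising an alert when the file never appears is an assumption that the text does not specify.

```python
import asyncio
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[str], bool]]  # (check name, returns True if the file passes)


async def wait_for_file(exists: Callable[[str], bool], file_id: str,
                        poll_s: float = 0.01, timeout_s: float = 1.0) -> bool:
    """Poll the archive until the file shows up or the timeout expires."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while loop.time() < deadline:
        if exists(file_id):
            return True
        await asyncio.sleep(poll_s)
    return exists(file_id)


async def analyse_file(file_id: str, exists: Callable[[str], bool],
                       checks: List[Check],
                       post_alert: Callable[[str], None]) -> bool:
    """Run all checks once the file is archived; return True if nothing is suspicious."""
    if not await wait_for_file(exists, file_id):
        post_alert(f"{file_id}: not found in archive")  # assumed behavior
        return False
    failed = [name for name, check in checks if not check(file_id)]
    if failed:
        post_alert(f"{file_id}: failed {', '.join(failed)}")
    return not failed
```

Because the analysis is a coroutine, the interceptor can schedule it as a task and continue passing traffic through while the archive is still writing the file.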

Validation of User Behavior
The network analyzer (cluster) is designed to intercept HTTP (Hypertext Transfer Protocol) traffic and autonomously analyze incoming events generated by specialists interacting with the interfaces of the medical information system. Examples of interfaces are a radiologist's personal account, an ECG specialist's personal account and an anesthesiologist's personal account. Low-level software components rely on mitmproxy (man-in-the-middle proxy) [32] functionality to capture traffic destined for the MIS web server. All Linux functions are labeled as a LinuxKernel component in the presented UML diagrams for simplicity.
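For illustration, a mitmproxy addon is an ordinary Python object whose `request` method is called with the intercepted flow. The sketch below records one event per request that carries a user identifier. The header name `X-User-Id` is a hypothetical convention (a real MIS would more likely derive the user from its session), and `fake_flow` is only a stand-in that lets the class be exercised without running the proxy.

```python
from types import SimpleNamespace
from typing import List, Optional, Tuple

Event = Tuple[str, str, str, float]  # (user id, HTTP method, path, timestamp)


class ActivityRecorder:
    """Shaped like a mitmproxy addon: request(flow) is invoked per client request."""

    def __init__(self, user_header: str = "X-User-Id") -> None:
        self.user_header = user_header
        self.events: List[Event] = []

    def request(self, flow) -> None:
        req = flow.request
        user = req.headers.get(self.user_header)
        if user is None:
            return  # not attributable to a specialist; pass through untouched
        self.events.append((user, req.method, req.path, req.timestamp_start))


def fake_flow(method: str, path: str, user: Optional[str] = None, ts: float = 0.0):
    """Minimal stand-in for a mitmproxy flow, for trying the addon offline."""
    headers = {} if user is None else {"X-User-Id": user}
    request = SimpleNamespace(method=method, path=path,
                              headers=headers, timestamp_start=ts)
    return SimpleNamespace(request=request)


# mitmproxy picks up addons from this module-level list when the script is loaded.
addons = [ActivityRecorder()]
```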
The class diagram in Figure 5 illustrates the anatomy of the user behavior analysis components. All the methods are documented in Tables A6-A10.
The purpose of the classes on the diagram is:
1. HTTPInterceptor class is designed to parse HTTP packets, run content inspection and generate alerts in case of detection of dangerous user actions. This is the main class that interacts with mitmproxy.
2. UserActivityJournal class is designed to manage the user activity log and provide data to analytic components.
3. IUserActivityJournal is an interface for working with the user activity log.
4. AuthenticityCheckingProvider class is designed to manage configuration and provide compliance checking functionality for the User Behavior Analysis Component (HTTPAnalyzer). User activity data is only analyzed if tracking is enabled for the respective user (trackUser method). In this case, the associated machine learning models must be initialized with a training sample of regular legal data specified by a time interval (retrainUserAuthModel method).
5. UserActivityTester class loads and uses various plugins to validate the authenticity of user activity for various medical specializations.
6. IAuthenticityModel is a common interface that every behavior validation plugin for any medical specialization must implement to provide functionality for initialization and utilization of the machine learning model intended for data legality checking (trainModel, refineModel and testSeries methods).
7. MRSpecialistModel, CTSpecialistModel, AnesthetistModel are MR, CT (Computed tomography imaging), anesthesiologist specialist behavior validator plugins that implement the IAuthenticityModel interface, respectively.

All classes, except for the behavior testing classes for various medical specializations (MRSpecialistModel, CTSpecialistModel, AnesthetistModel), are implemented in the HTTPAnalyzer component. Behavior validation classes for different medical specializations are implemented in separate components. The component diagram in Figure 6 illustrates the relationship between the components in the system and other software with which it interacts. One example is the application server that provides the interfaces of specialists' personal accounts. The source of events for the latter is various client devices of specialists with a web browser. The activity of the attacker's software can also affect the state of the application server; it is denoted on the component diagram as AttackerHTTPClient, which, like specialists' browsers, also uses the web server.

Appl. Sci. 2021, 11, 2072

HTTPAnalyzer, MRSpecialistValidator, CTSpecialistValidator, AnesthetistValidator (and other user behavior validation components for various medical specializations) are packaged as separate artifacts (with corresponding names, Figure 7). They should all be deployed to the BehaviorAnalysisBackend node. These nodes should be clustered for high availability. Inbound traffic destined for the application server is intercepted at the HTTPInterceptorFrontend node and routed to BehaviorAnalysisBackend nodes for analysis with round-robin balancing.
User activity log functionality can be provided by the rsyslog component, which is deployed to a separate JournalBackend node. An application server that provides the functionality of specialists' personal accounts should be deployed on a separate WebServer node.
The activity diagram in Figure 8 illustrates the process of inspecting and analyzing HTTP packets in a system on an analytic cluster. The process shown in the diagram assumes that the behavioral models of the specialists have already been trained (training was initiated manually or automatically using the planning tools). Only valid HTTP packets carrying specialist activity data can be parsed. When a packet arrives at the mitmproxy proxy on the HTTPInterceptorFrontend node, it is redirected for analysis to one of the BehaviorAnalysisBackend nodes, where it is checked, and the timestamp value is retrieved from it (HTTPAnalyzer component). If the packet being checked is a valid HTTP packet associated with a specialist activity, the following happens: the user ID is determined and, if the user is being tracked, the event is logged and the counter of new incoming events is incremented. If the minimum threshold for the number of new incoming events is reached, the procedure for analyzing them starts asynchronously. The time interval for collecting events is passed to it (eventsCollectingStartTime defines the beginning, the timestamp of the last incoming event defines the end). An analysis of the authenticity of user activity using the ModalitySpecialistModel begins, the new incoming event counter is reset to zero, and the current time is captured as a new point to start collecting events from. The getCheckerForUser method of the UserActivityTester class decides which plugin to select for events analysis. When suspicions arise, the postAlert method is called, which can provide the required functionality for alerting about threats and additional event logging. In all cases, network activity is logged (passThroughAndLog). A more detailed description of the interaction of components is illustrated in the sequence diagram in Figure A2.
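The threshold-based batching of user events described in this process can be sketched as follows. The class and method names are hypothetical; the analysis callback is invoked synchronously here, whereas the text describes an asynchronous start, and the timestamp of the last event is reused as the new collection start instead of the current wall-clock time so that the sketch stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class UserEventBatcher:
    threshold: int                                  # minimum number of new events
    analyse: Callable[[str, float, float], None]    # (user id, interval start, interval end)
    _count: Dict[str, int] = field(default_factory=dict)
    _start: Dict[str, float] = field(default_factory=dict)

    def track(self, user_id: str, start_time: float) -> None:
        """Enable tracking for a user and open the first collection interval."""
        self._count[user_id] = 0
        self._start[user_id] = start_time

    def on_event(self, user_id: str, timestamp: float) -> bool:
        """Count an event; return True if it triggered an analysis run."""
        if user_id not in self._count:
            return False  # untracked users are only passed through and logged
        self._count[user_id] += 1
        if self._count[user_id] < self.threshold:
            return False
        self.analyse(user_id, self._start[user_id], timestamp)
        self._count[user_id] = 0
        self._start[user_id] = timestamp  # assumed stand-in for "current time"
        return True
```

With a threshold of three, the third event of a tracked user triggers one analysis call covering the interval from the collection start to that event, and counting starts again from zero.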

Place of Server Roles in Network Infrastructure
New server roles TCP Interceptor Frontend, Legality Analysis Backend, together with the existing DICOM Archive and DICOM Archive Database, form a PACS cluster for which it is advisable to allocate a separate VLAN1 (Virtual Local Area Network). Traffic that was previously directed to the DICOM Archive must be directed to the TCP Interceptor Frontend. After passing through the Legality Analysis Backend, it is directed to the DICOM Archive automatically: this is the default behavior. The latter aspect is regulated by a separate option to allow alternative analytic cluster deployment. Traffic directed directly to the DICOM Archive can be duplicated on the TCP Interceptor Frontend using port mirroring on the managed switch. In this case, routing packets from the Legality Analysis Backend to the DICOM Archive would be redundant and should, therefore, be disabled.
New server roles HTTP Interceptor Frontend, Behavior Analysis Backend and Journal Backend, together with the existing Web Server, form a Workplaces Servers Cluster, for which it is advisable to allocate a separate VLAN2. Traffic that was previously directed to the Web Server must be directed to the HTTP Interceptor Frontend. After passing through the Behavior Analysis Backend, it is directed to the Web Server automatically: this is the default behavior. The latter aspect is regulated by a separate option to allow an alternative analytic cluster deployment. Traffic directed directly to the Web Server can be duplicated to the HTTP Interceptor Frontend using port mirroring on the managed switch. In this case, routing packets from the Behavior Analysis Backend to the Web Server would be redundant and should, therefore, be disabled.
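The two deployment alternatives described for both clusters (inline interception with forwarding, or port mirroring with forwarding disabled) can be captured by a single configuration option. The sketch below is a hypothetical representation of that option; the paper does not specify its name or format.

```python
from dataclasses import dataclass
from enum import Enum


class InterceptMode(Enum):
    INLINE = "inline"   # traffic is addressed to the interceptor frontend
    MIRROR = "mirror"   # the switch duplicates traffic to the interceptor


@dataclass(frozen=True)
class InterceptorConfig:
    protected_host: str                          # e.g. the DICOM Archive or Web Server
    mode: InterceptMode = InterceptMode.INLINE   # inline is the default behavior

    @property
    def forward_to_protected(self) -> bool:
        # With mirroring the protected server already receives the original
        # packets, so forwarding from the backend would deliver them twice.
        return self.mode is InterceptMode.INLINE
```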
Traffic routing rules, including between VLANs, should be developed taking into account the specifics of the location of specialists' workplaces, and should not be overly permissive in order to reduce the network attack surface.
The DICOM archive, application servers, and the proposed components of their protection form a centralized system intended for deployment within the local computer network of a medical institution. The databases required for the functioning of the protected components (DICOM Archive Database and other databases of the MIS application servers), as well as the user activity log (Journal Backend), cannot be moved outside the LAN perimeter, since they require connections with low time delays. They contain all the medical data that come to the institution or are generated within it, which are the subject of interest of this work. The system has decentralized analytical cluster subsystems to provide horizontal scalability and improve fault tolerance.

Figure 9 illustrates the place of network analyzers (clusters) in a typical hospital network infrastructure.

The initial start-up of system functions requires the participation of a medical specialist. The patient's medical data in the DICOM archive, as well as MIS user activity data, are only analyzed if tracking is enabled for the respective patient or user and the associated machine learning models are trained with an appropriate sample, specified by a specialist as a time interval. After this, the system's behavior boils down to continuous automatic analysis of incoming big data streams and issuing warnings in case of detection of unwanted activities, e.g., suspicious DICOM files or dangerous user actions.

Place of Server Roles in Network Infrastructure
for which it is advisable to allocate a separate VLAN2. Traffic that was previously directed to the Web Server must be directed to the HTTP Interceptor Frontend. After passing through the Behavior Analysis Backend, it is directed to the Web Server automatically: this is the default behavior. The latter aspect is regulated by a separate option to allow an alternative analytic cluster deployment. Traffic directed directly to the Web Server can be duplicated to the HTTP Interceptor Frontend using port mirroring on the managed switch. In this case, routing packets from the Behavior Analysis Backend to the Web Server would be redundant and should, therefore, be disabled. Traffic routing rules, including between VLANs, should be developed taking into account the specifics of the location of specialists' workplaces, and should not be overly permissive in order to reduce the network attack surface.

Discussion
Commercial SIEM solutions are common practice for healthcare facilities, as technical support is often critical. One of the leading long-standing solutions on the market is IBM QRadar [33], which addresses many of the problems raised in this work: it allows IBM technology partners to develop extensions, called QRadar Applications [34], it includes the native IBM QRadar User Behavior Analytics application [35], and it has mechanisms for maintaining high availability. Unfortunately, we could not find extensions for DICOM traffic mining. Among the undeniable architectural advantages of QRadar, it is worth noting its cluster of DataNode components, in which each node adds storage space and computing resources [36]. High availability relies on the presence of primary and secondary analysis hosts, with the secondary host acting as a standby that is constantly synchronized with the primary.
According to a Gartner study [37], another common solution is the USM (Unified Security Management) Appliance [38] from AT&T Cybersecurity, which also contains UBA modules but does not allow the development of third-party extensions; only native plugins that parse events using regular expressions are allowed [39]. The HA support mechanism relies on a passive copy of the installation (slave) [40].
The proposed architecture does not limit the number of analytical backends in the cluster. Therefore, from the point of view of horizontal scalability, it can be considered more reliable than the solutions discussed above. However, the intercepting front-end servers can be considered a single point of failure. To improve reliability, each of them can have a backup "twin". There is no need to cluster them with load balancing over TCP connections, since the primary traffic processing is lightweight and does not use local file systems to store any intermediate data.
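The backup "twin" arrangement amounts to choosing the first healthy front-end in a fixed priority order. The sketch below assumes a hypothetical `is_healthy` probe supplied by the deployment (for example a TCP connect check); neither the function names nor the host names are taken from the paper.

```python
from typing import Callable, Optional, Sequence


def select_active(frontends: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy front-end in priority order (primary first).

    Returns None when neither the primary nor its twin responds.
    """
    for host in frontends:
        if is_healthy(host):
            return host
    return None


# Hypothetical host names: the primary interceptor and its backup twin.
FRONTENDS = ("interceptor-primary", "interceptor-twin")
```

Because the front-ends keep no intermediate data on local file systems, switching to the twin requires no state transfer, which is why a simple priority list is sufficient in this sketch.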

Conclusions
The architecture of analytical clusters is intended to extend the functionality of SIEM environments deployed in hospitals. It is designed specifically to integrate validation of the medical data themselves with user behavior analytics that validate the activity of medical specialists. In practice, implementing and deploying the software improves the security of sensitive medical imagery. Since plugins have no requirements other than implementing one common interface that provides ML model management, no additional technical effort is needed to implement any model-driven approach to data validation. This allows programmers to extend the system with new analyzers as new effective methods appear.
Software implementation of the proposed architectural principles can form the basis of a framework designed to collect and analyze medical data for its authenticity, as well as user behavior data for its legality. The availability of such a toolkit will allow researchers working in any of these areas to simplify the tasks of collecting data and conducting experiments when debugging machine learning models.
The development of fault-tolerant event stores is outside the scope of this article, but it is one of the subjects for further research, namely optimizing the Apache Hadoop stack [41] to build an event store ready for big data analysis. The Apache Spark MLlib [42] implementations of many machine learning algorithms provide a suitable integrated solution for use in analytic plugins. API development, the implementation of thin wrappers for such use, and the analysis and optimization of event-handling performance are subjects of further work.

Conflicts of Interest:
The authors declare no conflict of interest.

Returns hashmap with event timestamps as keys and their authenticity estimation as boolean values. ML model source is defined by user identified by modelUserID, test data is defined by activity of the user identified by userID. If users identified by modelUserID and userID have incompatible ML model types, an exception is raised.

Appendix B. Classes for Validation of User Behavior
refineModel: Refine model for user identified by modelUserID with data associated with activity of the user identified by userID. If users identified by modelUserID and userID have incompatible ML model types, an exception is raised.

Returns hashmap with event timestamps as keys and their estimation as boolean values. «True» stands for authentic, «False» stands for illegal. ML model used is associated with the user identified by modelUserID, data series are defined by activity of user identified by userID. Users identified by modelUserID and userID must be associated with the same ML model type.

refineModel: Refine model for user identified by modelUserID with data associated with activity of the user identified by userID. Users identified by modelUserID and userID must be associated with the same ML model type.

testSeries

Record event data to journal with specified timestamp and user identifier.

Figure A2. Analysis of user activity (sequence diagram).
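The semantics of refineModel and testSeries described in this appendix can be illustrated with a small sketch. It is an assumption-laden toy, not the authors' class design: `ToyModel` stands in for a real ML model by simply remembering the actions it has seen, `BehaviorValidator`, `IncompatibleModelTypeError` and `log_event` are hypothetical names, and the Python methods `refine_model` and `test_series` merely mirror the documented refineModel and testSeries operations.

```python
from dataclasses import dataclass, field


class IncompatibleModelTypeError(Exception):
    """Raised when two users are associated with different ML model types."""


@dataclass
class ToyModel:
    """Stand-in for a real ML model: remembers actions seen during refinement."""
    model_type: str
    known_actions: set = field(default_factory=set)

    def refine(self, actions) -> None:
        self.known_actions.update(actions)

    def is_authentic(self, action) -> bool:
        return action in self.known_actions


class BehaviorValidator:
    def __init__(self):
        self.models = {}   # user ID -> ToyModel
        self.journal = {}  # user ID -> list of (timestamp, action)

    def log_event(self, timestamp, user_id, action) -> None:
        """Record event data to the journal (hypothetical method name)."""
        self.journal.setdefault(user_id, []).append((timestamp, action))

    def _check_types(self, model_user_id, user_id) -> None:
        a, b = self.models[model_user_id], self.models[user_id]
        if a.model_type != b.model_type:
            raise IncompatibleModelTypeError(
                f"{model_user_id}: {a.model_type}, {user_id}: {b.model_type}")

    def refine_model(self, model_user_id, user_id) -> None:
        """Refine the model of model_user_id with the activity of user_id."""
        self._check_types(model_user_id, user_id)
        actions = [action for _, action in self.journal.get(user_id, [])]
        self.models[model_user_id].refine(actions)

    def test_series(self, model_user_id, user_id) -> dict:
        """Map each event timestamp of user_id to True (authentic) or False (illegal)."""
        self._check_types(model_user_id, user_id)
        model = self.models[model_user_id]
        return {ts: model.is_authentic(action)
                for ts, action in self.journal.get(user_id, [])}
```

Separating the model owner (modelUserID) from the data owner (userID) lets one user's activity be tested against, or used to refine, another user's model, provided that both are associated with the same model type.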