File System Support for Privacy-Preserving Analysis and Forensics in Low-Bandwidth Edge Environments

Abstract: In this paper, we present initial results from our distributed edge systems research in the domain of sustainable harvesting of common good resources in the Arctic Ocean. Specifically, we are developing a digital platform for real-time privacy-preserving sustainability management in the domain of commercial fishery surveillance operations. This is in response to potentially privacy-infringing mandates from some governments to combat overfishing and other sustainability challenges. Our approach is to deploy sensory devices and distributed artificial intelligence algorithms on mobile, offshore fishing vessels and at mainland central control centers. To facilitate this, we need a novel data plane supporting efficient, available, secure, tamper-proof, and compliant data management in this weakly connected offshore environment. We have built our first prototype of Dorvu, a novel distributed file system in this context. Our devised architecture, the design trade-offs among conflicting properties, and our initial experiences are further detailed in this paper.


Introduction
Numerous Internet of Things (IoT) devices are being deployed in geo-distributed locations far outside traditional computing facilities [1,2]. Examples include video surveillance cameras, home security devices, activity trackers, logistic tracking devices, and smart factory equipment. High velocity, high volume, and heterogeneous data are continuously produced by these devices at an unparalleled scale. A key challenge is to analyze and obtain trusted, timely insight from these data streams.
Distributed system architects must carefully consider structuring alternatives to centralized on-premise or public cloud services for data analysis. Moving computations closer to data sources is likely a better option than federating and centralizing all this data [3,4]. Hence, edge computing can supplement a two-tier centralized architecture by providing an additional computing infrastructure closer to the data sources, residing between IoT devices and centralized back-end services. Edge computed data can still be federated for further analysis and storage at the central locations, but now pre-processed, filtered, and transmitted in reduced volumes.
Edge devices can produce data that are too voluminous to transmit over edge networks, and too heterogeneous to adhere to a unified set of data management rules. To tackle this challenge, we introduce the Dorvu file system, a storage system that can be extended with policies and fine-grained access control to solve problems of bandwidth, privacy, and compliance. The Dorvu file system is part of the larger Dutkat [5] framework, comprising a distributed Artificial Intelligence (AI) hybrid cloud and edge system. This system is motivated by the need for sustainable harvesting of resources from the sea, including commercial fisheries in the Arctic Ocean. The global population depends on food obtained from the sea, and this dependence is growing [6]. This can become problematic because the global sea ecosystems have been, and still are, under serious pressure from human activities and might not be able to meet this growing demand. Challenges include overfishing and depleted fish stocks, destroyed or polluted sea ecosystems, increased water temperatures, and lack of management and control regimes for sustainable fisheries. According to the United Nations Office on Drugs and Crime, fishery crimes are frequently transnational and organized in nature, and include illegal fishing, document fraud, drug trafficking, and money laundering [7]. As a result, several governments have proposed surveillance systems to track the activity of workers on fishing vessels [8–10], which some have met with criticism and claims of privacy intrusion and mass surveillance [11]. The Dutkat framework aims to provide some of the proposed sustainability benefits [12], while preserving the privacy of a fishing vessel crew.
In this work, we address and present some specific parts of the envisioned Dutkat system. Specifically, the main contributions are that (1) we conceptualize a system for privacy-preserving analysis of continuously produced data, incorporating challenges such as potentially poorly connected fishing vessels moving about in the Arctic Ocean, and (2) we show how multi-sensory data are handled, and we showcase how the Dorvu file system can simplify decisions about what type of data to store, in what format, and how to mandate access control and encryption on it. Overall, we present an alternative approach to existing surveillance programs [9,10] by replacing human inspection of video footage with a combination of automated processing of sensor data, and access control policies enforced on the storage layer at the time of data creation. We conjecture that incorporating Dorvu in this process better preserves the privacy of people working in proximity to edge sensors, through flexible, fine-grained data access, and storage policies that can retain sustainability-relevant data while discarding privacy-intrusive footage, resulting in a less invasive system. The rest of this paper presents the motivation behind our edge analysis system, the architecture of our cloud and edge hybrid storage system, and the implementation of a prototype of the Dorvu file system. Particular focus is on edge nodes and our design choices targeting a distributed file system that tolerates failures, survives adversarial attacks, and meets compliance requirements.

A Mobile AI Edge System at Sea
The Dorvu storage system is intended to serve as a storage layer in the larger Dutkat [5] project, which revolves around monitoring professional fishing activities in isolated, offshore areas while retaining the privacy of those working in close proximity to areas subject to surveillance. Its design involves installing robust and safe monitoring devices and related software on board commercial fishing vessels with a licence to fish in Norwegian parts of the Arctic Ocean. Each such vessel has been granted a specific quota from the government detailing the amount of fish it is allowed to catch, the fish species, fish sizes, and similar.
Video surveillance of fishing vessels to combat sustainability issues and enforce fishing quotas has been proposed by some governments [8,9] and explored by others [10]. The Dutkat system is designed to provide some of the sustainability benefits claimed by the proposed national surveillance systems, without infringing on the privacy of workers on fishing vessels.

Geo-Distribution
The overall architecture of Dutkat reflects the widely distributed and mobile nature of this application domain. We will first detail the horizontal dimension of the architecture, which reflects physical distribution among three separate components: (1) one or several back-end control centres, (2) a collection of traditional computers on board each vessel, and (3) IoT devices primarily located outside on ship decks. We consider each of the participating large fishing vessels as individual edge nodes in a hybrid cloud computing system, where each edge node has sufficient power facilities and indoor space for deployment of a collection of computers. These computers will for security and fault-tolerance be configured to only run local Dutkat communication, storage, and analysis software parsing locally produced data from the IoT devices outside. Consider such a configuration as implementing certain features of a digital version of a local fishing inspector, which will act as an algorithmic intermediary between the IoT devices on deck and the back-end centralized control centres on mainland.
The Dutkat software must be safeguarded and stable for 24/7 operability, while at the same time ensuring compliance and non-invasion of the daily operations of the vessel crew. Particularly, AI analysis performed at the edge must be able to detect local anomalies and activities, and consequently persist the relevant ground truth data locally while sending relevant insights to the mainland operational centres for further analysis. Performing data persistence and analysis locally aids in preserving the privacy of the vessel crew.
A hybrid architecture is needed, with edge nodes connected to centralized structures. The problem at hand is complex and requires more input than the insights from a single fishing vessel. For instance, one algorithmic trigger that requires input from several edge nodes is the comparison of reported catch from different fishing vessels in the same offshore proximity. Anomalies can be detected through such comparisons, one example being vessels reporting disproportionate amounts of fish caught relative to other vessels in the same area and their allocated quota.
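The cross-vessel comparison described above can be illustrated with a minimal sketch. It is not the Dutkat trigger itself: the function name, the median baseline, and the ratio threshold are assumptions chosen for illustration, and the sketch ignores the allocated quota for brevity.

```python
from statistics import median

def flag_catch_anomalies(reports, ratio_threshold=3.0):
    """Return ids of vessels whose reported catch is disproportionate.

    `reports` maps a vessel id to its reported catch (kg) for one area
    and period. Each vessel is compared against the median of the other
    vessels in the same area. The threshold is a hypothetical value.
    """
    flagged = []
    for vessel, catch in reports.items():
        others = [c for v, c in reports.items() if v != vessel]
        if not others:
            continue  # nothing to compare against
        baseline = median(others)
        too_high = catch > ratio_threshold * baseline
        too_low = catch * ratio_threshold < baseline
        if baseline > 0 and (too_high or too_low):
            flagged.append(vessel)
    return sorted(flagged)

reports = {"vessel-a": 1200.0, "vessel-b": 1100.0,
           "vessel-c": 90.0, "vessel-d": 1300.0}
print(flag_catch_anomalies(reports))  # ['vessel-c']
```

A comparison of this kind needs reports from several edge nodes at once, which is why it would run at a control centre rather than on a single vessel.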
A main problem in this domain is connectivity, since digital communication between these mobile vessels and mainland operational centres is primarily facilitated by satellites. We conjecture that, by moving computations closer to the data sources, the amount of data that must be transmitted over satellite links can be reduced to a practical level. This is enabled by edge computing where local data filtering, analysis, and storage can be carried out in real-time.
Evaluating and filtering data streams close to their sources are well-known concepts for scaling distributed systems producing large quantities of data [13]. By this upstream evaluation structuring approach, algorithms can parse and analyze entire streams of data on the vessels, without adhering to the limitations of low-bandwidth satellite links.

Vertical Distribution
The vertical dimension of the Dutkat architecture determines separation of concerns at the individual horizontal components. The relationship between computers running Dutkat software at back-end control centers and on the edge is illustrated in Figure 1. Overall, (1) a persistent storage layer is at the bottom, (2) followed by a data transfer layer, (3) a data consumption layer, and finally (4) a user interface layer. For edge deployments, the interface layer can be omitted, and IoT devices can interface with the storage system as data producers, as illustrated in Figure 1. The data storage layer will be further detailed in Section 3. Figure 1 shows the generic distribution of data and its relation to the software components in Dutkat. Specifically, the system is deployed to store and transmit multimedia data, like video and images. Edge nodes generate heterogeneous multimedia data, which can vary in content, type, and sensitivity level (i.e., the amount of private information contained in the data). For example, a collection of recordings from an edge video surveillance system may vary in sensitivity level if only a subset of the collection contains footage of people. Similarly, users responsible for generating data may have consented to different data sharing policies, while still contributing to the same dataset. At the same time, data consumers may have varying rights to view this data. In the scenarios presented in Section 1, the data that local law enforcement, fishing crew, and other interested parties may consume can differ. This results in a system of several edge nodes collaborating to produce a multimedia dataset, consumed by several nodes, both in cloud and edge deployments, that differ in their rights to view various parts of the whole dataset. Privileges can be enforced throughout the dataset by applying fine-grained access control mechanisms on individual files in the set.
Our system facilitates real-time decision-making from land observers, based on data generated from edge devices. This is a process that involves transmitting as much meaningful data as possible from edge nodes to land nodes. This process is restricted by the low bandwidth that is expected in edge environments, such as when using satellite communication. Dutkat and the Dorvu file system are designed to support a model of decision-making in which the semantic meaning extracted from multimedia recordings is of highest priority to expend bandwidth on transmitting, and the ground truth data, i.e., the full-sized media files, is of less importance and may be retrieved eventually, when the network tolerates it. An example of the extraction of meaningful data from a larger multimedia file is shown in Figure 2. Here, the smallest significant data item extracted from a video file is the acknowledgement of the existence of a file, representing an event. Extracted metadata and still images from the video provide greater detail at the expense of more bandwidth. Finally, the original video file gives an overview of the entire event but is not available to all parties.
Figure 2. Example of a multimedia data pipeline transmitting information extracted from a video recording from an edge device, to two parties with differing privileges to view the data.
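The prioritization described above, where small semantic items are sent first and full-sized media is deferred, can be sketched as a simple planning function. This is an illustrative sketch only: the tier names, file names, sizes, and the byte budget are assumptions, not part of the Dorvu or Dutkat implementation.

```python
import heapq

# Hypothetical priority tiers (lower number = transmitted earlier).
PRIORITY = {"event_ack": 0, "metadata": 1, "still_image": 2, "video": 3}

def plan_transmission(items, budget_bytes):
    """Choose what to send within a byte budget, most meaningful tiers first.

    `items` is a list of (kind, name, size_bytes) tuples. Items that do
    not fit are deferred and can be retrieved when the network allows it.
    """
    heap = [(PRIORITY[kind], size, name) for kind, name, size in items]
    heapq.heapify(heap)
    sent, deferred = [], []
    remaining = budget_bytes
    while heap:
        _, size, name = heapq.heappop(heap)
        if size <= remaining:
            sent.append(name)
            remaining -= size
        else:
            deferred.append(name)
    return sent, deferred

items = [
    ("video", "event-17.mp4", 40_000_000),
    ("still_image", "event-17-frame.jpg", 80_000),
    ("metadata", "event-17.json", 2_000),
    ("event_ack", "event-17.ack", 64),
]
sent, deferred = plan_transmission(items, budget_bytes=100_000)
print(sent)      # ['event-17.ack', 'event-17.json', 'event-17-frame.jpg']
print(deferred)  # ['event-17.mp4']
```

With a budget of 100 kB, the acknowledgement, metadata, and still image are sent, while the original video remains on the edge node.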

System Properties and Architecture
The overall architecture of our distributed file system is based on a central hub structure with a cluster of file servers connected with multiple mobile edge nodes, each with local file storage capacity. This resembles a client-server star network with clients on the edge perimeter and servers in the centre. Another resemblance is with the distributed file system Coda [14] with back-end server clusters supporting numerous light-weight personal computers. Of particular interest is the support Coda has for disconnected operations, a situation still plausible in an offshore environment.
The centralized server hub is resource-rich and can use existing public cloud file systems supporting efficient, reliable, and centralized storage of multimedia data, sensor data, and machine learning results from the edge nodes. The edge nodes are less resource-rich and are physically located on active fishing vessels. When along the coast and near the shore, communication options include cellular networks and radio networks, but when more distant and offshore, satellite communication is the main option. Novelty in our work is primarily related to the edge nodes, and how and what they communicate back to the mainland-located servers.

System Requirements
There are special application-specific demands that need software tailoring and customization. The file system is intended for use in an area with very limited computational and communication resources since its mobile edge clients consist of fishing vessels moving about in large ocean areas. Communication in such a widely geo-distributed mobile edge computing environment is through partially disconnected, low-bandwidth polar orbit satellite links. Since the fishing vessels in our system operate in the far north of the globe, the geo-stationary satellite solutions we previously utilized are not adequate as they do not cover the northernmost latitudes [15]. Adding to the complexity, this distributed file system must be scalable, secure, fault-tolerant, and compliant with privacy regulations, particularly the EU General Data Protection Regulation (GDPR) [16].
Special properties of the selected problem domain motivate the design of our system as follows. First, we need to be able to continuously capture and store video and sensor data for fish management, control, and forensics purposes. An example is a continuously captured surveillance video of equipment stored on the deck of a fishing vessel. This is a resource-demanding file storage challenge, and the file system should be used to store video sequences when specific activities or events are detected, and apply access policies based on the contents of the events. Video data are in any case high-volume data, which are challenging to reliably transmit to mainland operational surveillance and control centers from the offshore mobile fishing vessels. The bandwidth delivered by satellite-based solutions is not adequate to support live video streaming, even more so for networks based on the low-frequency L-band.
Next, we need to provide redundancy for fail-safe storage of vital data, through data copies at multiple local disks. The replicas are physically distributed on board each vessel to reduce the probability of data corruption, loss, or unavailability. Redundancy and update techniques similar to the Google File System [17] are adopted, with a master control node typically administering three replicas updated in a pipeline fashion. The master node will raise a flag, i.e., transmit a signal to the mainland operational centre, if the number of available replicas falls below a certain threshold. Additionally, since the master node might be a single point of attack or failure, we provide primary-backup replication with a hot stand-by node ready to take over. Hot stand-by implies that a sequence of the latest data written to disk is kept in main memory. Consider this as a large ring buffer whose content will be streamed to disk upon fail-detection and fail-over. The Google File System and similar master-based distributed systems [18] do not provide such replication due to complexity with consistency, impact on cost and performance, the deployment in a trusted enterprise environment, and the observation that master node failures seldom happen. In our case, executing on the edge in a less trusted environment, we cannot tolerate a single-node failure weakness in the critical data storage path.
The hot stand-by keeps a large enough sliding window of data to be backed up in case of failure so that a primary node failure will be transparently handled. Notice that inconsistency problems among the data storage replicas or master replicas are to a large extent avoided, since data to be permanently stored on the edge nodes are tagged with a timestamp and are immutable. This way, no read-write conflict will appear.
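The sliding window kept by the hot stand-by can be sketched as a bounded in-memory buffer of timestamped, immutable records. This is a minimal illustration under stated assumptions: the class name, the byte-based capacity, and the way the last persisted timestamp is communicated are hypothetical and not taken from the Dorvu prototype.

```python
from collections import deque

class HotStandby:
    """Keeps the most recent writes in memory, bounded by a byte capacity."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.window = deque()  # (timestamp, payload) pairs, oldest first
        self.used = 0

    def mirror(self, timestamp, payload):
        """Record a write that the primary has also been asked to persist."""
        self.window.append((timestamp, payload))
        self.used += len(payload)
        # Evict the oldest records once the window exceeds its capacity,
        # always keeping at least the newest record.
        while self.used > self.capacity_bytes and len(self.window) > 1:
            _, old = self.window.popleft()
            self.used -= len(old)

    def fail_over(self, last_persisted_timestamp):
        """Return records newer than the last one known to be on disk."""
        return [(t, p) for t, p in self.window if t > last_persisted_timestamp]

standby = HotStandby(capacity_bytes=8)
standby.mirror(1, b"aaaa")
standby.mirror(2, b"bbbb")
standby.mirror(3, b"cccc")   # evicts the record with timestamp 1
print(standby.fail_over(last_persisted_timestamp=2))  # [(3, b'cccc')]
```

Because records are timestamped and never modified, replaying the window after fail-over cannot conflict with data that is already on disk; at worst, a record is written twice with identical content.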

Data Classification
Fail-safe storage implies that there are very strict access policies affiliated with some of this data. We must therefore distinguish between and classify data according to different compliance, safety, and liveness properties. That is, we provide different guarantees with regard to data reliability, availability, privacy, and confidentiality based on how the data are classified. Data must be classified as either RAW, FILTERED, RESTRICTED, or SECRET. As will become apparent, this classification differs from traditional security classification schemes, as it supports implementation of non-functional aspects other than security. This classification and its various properties are explained in the following paragraphs.
Data tagged RAW contains the continuous stream of data produced by video cameras and sensor devices. Select crew members on board the vessel where the data are produced can gain access to this data in real-time. This can be through real-time streaming to display monitors, or it might be made available as a configuration option if some of these data are persisted on local disks. No encryption of the data is mandated, and only privacy-preserving aspects must be handled. This might involve the fact that vessel crew members grant consent to store and access this data, or that software applies masking of any personally identifiable characteristics.
FILTERED data consist of processed select RAW data that can be persisted to disk and/or used as input to local analysis applications. Such FILTERED data can for instance contain a video sequence with human activities, or sequences of video captured upon other sensory input. Depending on their content, these data can be required to be modified before they are persisted to disk or used in analysis applications.
Data tagged RESTRICTED are not accessible by any of the crew on board the vessel and are intended for surveillance operations. These data contain specific results obtained from local edge analysis software processing either RAW or FILTERED data. This classification also indicates that stricter access policies need to be applied, since data might be annotated with additional information from analysis software and because they are expected to be transmitted over the network to a central control center at some point. Examples of this type of data include output from edge-located machine learning applications analyzing activities at the fishing vessels, select I-frames from specific surveillance videos, and other sensor data detecting for instance amount of fish caught, fish types, their average size, and relative distribution among species.
The purpose of this data classification is to provide context for central control centres, and extract semantic meaning from a larger data set generated at edge nodes. This serves two purposes: (1) reduce the amount of data transmitted, to support lower bandwidths, and (2) transmit as little data as required to provide meaning and perform forensics. This is to apply a principle of minimal privilege of access to parts of a live surveillance feed, as only the data deemed necessary to provide ground truth to some observation or detected event will be transmitted from the edge node.
The goal of providing these data overlaps with a goal of the overarching Dutkat system [5], which is to provide a probabilistic and evidence-based approach to inspection of fishery activities, and to shield crew members from being subjected to continuously transmitted surveillance.
SECRET data are a complete log of RAW data that are encrypted upon storage and persisted in a highly fault-tolerant manner. These data must be stored locally on the edge nodes due to their sheer volume, and access to them is governed by very restrictive access policies. Notably, SECRET implies that nobody on board the fishing vessel may access the data, and that the data are immutable and cannot be altered or deleted. These data can optionally be preprocessed upon storage, blurring out personally identifying characteristics. The data can only be accessed and decrypted by a trusted third party with legal, explicit authorization to do so. This can be a fishery inspector or other forensic parties inspecting a vessel with access permissions according to existing laws and regulations.
In general, we build this distributed file system adhering to the proportionality principle in a legal context, striking a balance between human privacy rights and the claimed sustainability benefits of video and sensory surveillance of fishing waters [12]. Invasive surveillance of a physically limited area such as a trawler deck, which the vessel crew cannot avoid, might violate privacy principles well grounded in constitutional, national, and international laws. One example of this is that people in a video sequence can be personally identified while working.
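The four classes and the properties stated for them can be summarized in a small lookup table. The sketch below is only an illustration: the type and field names are hypothetical, and a value of None marks a property that the description above does not state for that class.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DataClass(Enum):
    RAW = "RAW"
    FILTERED = "FILTERED"
    RESTRICTED = "RESTRICTED"
    SECRET = "SECRET"

@dataclass(frozen=True)
class Policy:
    # None means the property is not stated in the text for this class.
    crew_readable: Optional[bool]      # may (select) crew on board read it?
    encrypted_at_rest: Optional[bool]  # is encryption mandated on storage?
    sent_to_shore: Optional[bool]      # is it expected to leave the vessel?

POLICIES = {
    # RAW: select crew may access it; no encryption is mandated.
    DataClass.RAW: Policy(True, False, None),
    # FILTERED: persisted and/or fed to local analysis; access not stated.
    DataClass.FILTERED: Policy(None, None, None),
    # RESTRICTED: hidden from crew; expected to be transmitted to a centre.
    DataClass.RESTRICTED: Policy(False, None, True),
    # SECRET: encrypted, kept locally, not accessible to anyone on board.
    DataClass.SECRET: Policy(False, True, False),
}

def crew_may_read(data_class):
    """Deny by default: only an explicit True grants on-board access."""
    return POLICIES[data_class].crew_readable is True

print([c.name for c in DataClass if crew_may_read(c)])  # ['RAW']
```

Treating unstated properties as a denial is a design choice of this sketch, in line with the principle of minimal privilege mentioned above.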

Implementation Details
The data plane described in Section 2.2 is implemented by the Dorvu file system, which is POSIX-compatible in order to support heterogeneous data formats and pre-existing tools for analysis and surveillance. In this section, we describe some of the implementation details of our prototype.
Customization and adaptability are core aspects of the architecture, where the software aims to provide a basic layer of traditional data storage, with the possibility to interpose extra functionality modules between applications and the file system storage. We refer to the modules attached to a file as its meta-code, similar to the work done in [19]. This provides a means for transparently adding custom software modules in the critical path of data storage.
The version of Dorvu implemented for this paper includes (1) encryption as a module between the user and disk, (2) file versioning based on user access rights, and (3) an interface for a user to configure the encryption module and access control. Additionally, we investigate the performance overhead of this functionality and the userspace file system platform.
Interfacing with the Dorvu file system can be described in three steps, beyond reading and writing as if to a local and unencrypted file system: (1) Users can define access rights for their own files, indicating identities with public key signatures; (2) When a user creates a file, a corresponding configuration file is created by Dorvu. This defines the versions of the file that are available to the members of the listed access groups; (3) When writing to a file, its file extension, referenced in its corresponding configuration file, decides what access control and encryption the file system will apply to the newly written data.

FUSE
Dorvu is implemented as a File System in Userspace (FUSE) application [20]. FUSE is a library and Linux kernel module that enables user-level programs to function as mountable file systems, by calling the FUSE kernel module via the FUSE userspace library. In short, a FUSE daemon can service Linux Virtual File System (VFS) calls despite running with userspace privileges. Dorvu is implemented with the Rust programming language, using the Fuser library (https://crates.io/crates/fuser, accessed on 12 October 2021) to interface with the FUSE kernel module. Fuser provides a userspace library that is implemented separately from the FUSE reference library libfuse.
Dorvu implements storage by mirroring the contents of a directory on a local file system. By using Dorvu while it is mounted to a local folder, the mirrored folder will be populated with internal files and encrypted data files. These files can only be read through Dorvu, or by means of manual decryption.

File Definitions
Dorvu handles three different types of files internally. Interfacing with Dorvu as a regular file system is done by creating, writing, and reading files. These files of arbitrary content are referred to as data files in the context of this implementation. Creating a data file automatically creates an auxiliary configuration file. A data file must always have a configuration file in order to be visible in a Dorvu directory listing. This file contains a JSON specification of the different available versions of a file (referred to as layers in the file), in addition to a path to the access group definition to use for the corresponding data file. The group definition file is the second type of auxiliary file used in Dorvu. Access group definitions in these files are listed in JSON and require a name and a list of public key SHA-256 signatures. Examples of these files are shown in Figure 3.
As manually maintaining configurations for every newly created file can be time consuming and increase the chance of human error, Dorvu includes the concept of configuration templates. A configuration template is a directory-wide configuration file that is applied to every newly created file. This feature is introduced for ease-of-use; the expected usage pattern of Dorvu is to set up file configurations before any automated data production begins. In that case, the expected access control will be applied to newly created files without the need for manual maintenance.
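To make the relationship between configuration files, layers, and access groups more concrete, the sketch below builds two JSON documents and resolves which layers a given key may see. The actual schema is the one shown in Figure 3; every field name here (groups, keys, groups_path, layers, name) is a hypothetical stand-in, and the key identifier is assumed to be a hex SHA-256 digest of the public key bytes.

```python
import hashlib
import json

def fingerprint(public_key_bytes):
    """Hex SHA-256 digest used in this sketch to identify a public key."""
    return hashlib.sha256(public_key_bytes).hexdigest()

# Placeholder key material; real entries would come from users' public keys.
inspector_fp = fingerprint(b"inspector-public-key-placeholder")
crew_fp = fingerprint(b"crew-public-key-placeholder")

# Hypothetical group definition file: a name and a list of key digests.
group_definitions = json.loads(json.dumps({
    "groups": [
        {"name": "inspectors", "keys": [inspector_fp]},
        {"name": "crew", "keys": [crew_fp]},
    ]
}))

# Hypothetical per-file configuration: layers plus a path to the groups.
file_config = json.loads(json.dumps({
    "groups_path": "groups.json",
    "layers": [
        {"name": "original", "groups": ["inspectors"]},
        {"name": "metadata", "groups": ["inspectors", "crew"]},
    ],
}))

def visible_layers(config, groups, key_fp):
    """List the layers of a file that the holder of `key_fp` may access."""
    member_of = {g["name"] for g in groups["groups"] if key_fp in g["keys"]}
    return [layer["name"] for layer in config["layers"]
            if member_of.intersection(layer["groups"])]

print(visible_layers(file_config, group_definitions, crew_fp))       # ['metadata']
print(visible_layers(file_config, group_definitions, inspector_fp))  # ['original', 'metadata']
```

A configuration template would then amount to copying one such configuration to every file newly created in a directory.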

Encryption
With the encryption module implemented in Dorvu, files classified as SECRET are stored encrypted by default, using the OpenSSL implementation (https://crates.io/crates/openssl, accessed on 12 October 2021) of 128-bit AES-CBC. The AES key for a given file is encrypted with RSA once for each user with access to that file, using their 2048-bit public key. A Base64-encoded version of this encrypted AES key is stored in the file's configuration file, as shown in Figure 3.
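The storage cost of this scheme can be estimated with simple arithmetic. The sketch below assumes that OpenSSL's default PKCS#7 padding is left enabled for AES-CBC, which the text does not state, and it leaves out the initialization vector, whose storage location is not described. The function names are hypothetical.

```python
import base64

AES_BLOCK_BYTES = 16    # AES block size, independent of key length
RSA_2048_BYTES = 256    # an RSA-2048 ciphertext is as long as the modulus

def cbc_ciphertext_len(plaintext_len):
    """Ciphertext length for AES-CBC with PKCS#7 padding (assumed).

    PKCS#7 always appends between 1 and 16 bytes, so block-aligned
    input grows by one full block.
    """
    return (plaintext_len // AES_BLOCK_BYTES + 1) * AES_BLOCK_BYTES

def wrapped_key_b64_len():
    """Characters needed to store one RSA-wrapped AES key as Base64."""
    return len(base64.b64encode(bytes(RSA_2048_BYTES)))

def config_key_overhead(num_users):
    """Base64 characters added to a file's configuration for all users."""
    return num_users * wrapped_key_b64_len()

print(cbc_ciphertext_len(4096))   # 4112
print(wrapped_key_b64_len())      # 344
print(config_key_overhead(3))     # 1032
```

Under these assumptions, the per-file data overhead is at most one AES block, while the configuration file grows linearly with the number of users who have access.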

I/O Speed and Overhead Measurement
We want to gain insight into the potential overhead cost by adding Dorvu as an extra layer of indirection in the critical data path for disk access. This experiment is performed by measuring time taken for read and write operations on various storage back-ends. The test environments are chosen to provide information about the expected sources of performance overhead: the cost of encryption, the cost of file versioning and access control, and the cost of utilizing a FUSE-based file system rather than a kernel-integrated file system.
To observe the impact of encryption on read/write throughput, we deployed two configurations of Dorvu, one with encryption enabled and one with all encryption features disabled. To observe the costs associated with deploying a FUSE file system, we implemented a simple FUSE application that forwards all operations to an ext4 file system, labeled FUSE passthrough in our experiment. Our assumption is that the maximum possible throughput of Dorvu will be that of the FUSE passthrough application, and that throughput loss between the FUSE implementation and the ext4 storage back-ends will be outside of the control and scope of our implementation. The difference between the throughput of the encrypted and unencrypted Dorvu configurations will indicate the cost of encryption, and the difference between the unencrypted Dorvu configuration and the FUSE passthrough application will indicate the cost of file versioning and other metadata operations.

Experimental Setup
This experiment was performed on a desktop workstation with an AMD Ryzen 5 3600 6-Core processor running at 3.60 GHz, and a Kingston UV400 solid state drive storage device, running Ubuntu 18.04. Eight randomly generated files of varying sizes were read and written 10 times per storage environment. These file sizes are chosen due to our use-case of storing media files suited for low-bandwidth network transfer, while we acknowledge that system overhead and inefficiencies are more easily observed during longer operations with larger files. Because of this, later experiments described in Sections 5.2 and 5.3 utilize smaller files.
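A measurement of this kind can be sketched with a small timing harness. This is not the benchmark used for the reported results: the function name, the file sizes, and the use of fsync are assumptions, and reads of freshly written files are likely to be served from the page cache unless it is dropped between runs.

```python
import os
import tempfile
import time

def measure(path, payload, repeats=10):
    """Return mean (write_seconds, read_seconds) for one payload."""
    write_s = read_s = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # include the time to reach the device
        write_s += time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            data = f.read()
        read_s += time.perf_counter() - start
        assert data == payload  # sanity check on the storage back-end
    return write_s / repeats, read_s / repeats

# Point `root` at a mounted file system (e.g. a Dorvu or passthrough
# mount point) to compare back-ends; a temporary directory is used here.
with tempfile.TemporaryDirectory() as root:
    for size in (4096, 1 << 20):
        payload = os.urandom(size)
        w, r = measure(os.path.join(root, "sample.bin"), payload)
        w, r = max(w, 1e-9), max(r, 1e-9)  # guard against a zero reading
        print(f"{size} bytes: write {size / w / 1e6:.1f} MB/s, "
              f"read {size / r / 1e6:.1f} MB/s")
```

Running the same harness against each storage environment gives per-size averages that can be compared in the way described in the previous section.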

Results
The results from this experiment are shown in Figure 5. Results show that, for reading our largest files of 64 MB and 128 MB, encryption was the biggest source of overhead. For every other test case, however, the difference between the FUSE passthrough performance and unencrypted Dorvu performance indicates that file versioning and metadata operations are the biggest software bottlenecks in Dorvu. We theorize that this is particularly pronounced during file writes because these operations are split into smaller operations of individual page sizes of 4096 bytes on our test system, resulting in worse performance during writes than reads, relative to the baseline ext4 environment. We further hypothesize that, while performance penalties associated with encryption are expected, file versioning and metadata overhead observed in both read and write operations can be investigated through software profiling, and reduced by further optimization of the file system.
We observe that costs associated with utilizing a FUSE implementation are negligible in many of our observations, particularly during reads because the majority of overhead relative to the ext4 environment is visible in the Dorvu environment, and not in the FUSE passthrough application. It is worth noting that the measurements for both the FUSE and ext4 storage deviate by up to 30% between minimum and maximum observations, but their averages are within each other's standard deviation for every file size in the experiment.
The FUSE passthrough comparison measurement was also performed in [19], but was re-implemented for this experiment due to our adoption of the Rust programming language and its Fuser library. A more thorough examination of the performance implications of utilizing user-space file systems is given by Vangoor et al. [21]. They show that the throughput penalty of using FUSE over ext4 can be as low as 5%, but that certain workload characteristics can severely negatively impact this performance.
We conjecture that, for our use-case of optimizing for low-bandwidth transfer, utilizing a FUSE implementation does not significantly negatively impact the performance of Dorvu, while recognizing that a future use-case involving larger file sizes or different workload characteristics may change this outlook.

Satellite Latency and Bandwidth
Network communication between edge components in Dorvu and the Dutkat system will be provided by satellite broadband. Observations of the capabilities of the available satellite network are key to designing communication models and delegating tasks to edge and land components in the system. In addition to measuring the sustained average bandwidth provided by the network, we measure the transfer speed of individual files of specific sizes. This is both to emulate file system usage on the network, and to give an indication of the impact of network latency when transferring small amounts of data.

Experimental Setup
The experiment was performed by using the Linux curl command to download files from a test GitHub repository with files generated for the purpose of this experiment. The experiment is performed on the Iridium Certus 200 broadband satellite service, an L-band non-geostationary satellite network claiming global satellite coverage and download and upload speeds of up to 176 kilobits per second [22].
We connect to this network through a Thales Avionics VesseLink 200 broadband terminal, consisting of an antenna and router for maritime use [23]. The experiment was run on an HP EliteDesk 800G6 workstation with the VesseLink providing its only connected network. The equipment is stationed at the University of Tromsø, Norway, and was tested during cloudy weather conditions with drizzle. Each download was repeated five times; this number of downloads was chosen to adhere to restrictions on network resources.
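A measurement loop of this kind could be sketched as below. This is not the script used in the experiment: the function names are hypothetical, and the use of curl's `--write-out` variables `size_download` and `time_total` is our assumption about how the per-file speed might be obtained. The network call is kept in a separate function so that the parsing and summary steps can be checked offline.

```python
import subprocess
from statistics import mean, stdev


def parse_curl_stats(output: str) -> float:
    """Convert '<bytes> <seconds>' from curl's --write-out into kilobits per second."""
    size_bytes, seconds = output.split()
    return float(size_bytes) * 8 / 1000 / float(seconds)


def measure_download_kbps(url: str) -> float:
    """Download one file with curl, discard the body, and return the observed speed.

    Assumes a Linux system with curl installed; the URL is supplied by the caller.
    """
    result = subprocess.run(
        ["curl", "-s", "-o", "/dev/null",
         "-w", "%{size_download} %{time_total}", url],
        capture_output=True, text=True, check=True,
    )
    return parse_curl_stats(result.stdout)


def summarize(samples):
    """Return the mean and sample standard deviation of repeated measurements."""
    return mean(samples), stdev(samples)


# Usage sketch (not executed here, since it needs network access):
#   samples = [measure_download_kbps(test_file_url) for _ in range(5)]
#   average_kbps, deviation_kbps = summarize(samples)
```

Five repetitions per file, as in the text, give a small sample, so the standard deviation returned by `summarize` should be read as a rough indication of spread.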

Results
The purpose of this experiment is to observe the capabilities and limitations of the hardware and network available to our system when deployed in the targeted environment and weather conditions. The experiment is not intended as an exhaustive evaluation of the feasibility of satellite broadband, or the performance of this particular satellite service in broadly defined use-cases, but to explore various network conditions that our system must handle gracefully. These results will also be applied in the experiment described in Section 5.3.
The resulting measurements from this experiment are shown in Figures 6 and 7. Note that, in both of these figures, the y-axis follows a logarithmic scale, while the x-axis follows no particular scale, rather representing a set of files. The standard deviation on the y-axis is plotted as error bars.
Figure caption: Observed download speed when downloading files of various sizes over a satellite broadband connection. The y-axis represents download speed, measured in kilobits per second, of the files represented on the x-axis. Additionally, the theoretical maximum network bandwidth as defined by the system provider [22,23] is shown.
Figures 6 and 7 both show the expected result that the average download speed over the course of a download is lower when transmitting smaller files. We assume this is because the time spent initiating the file transfer connection and the cost of transferring metadata become proportionally more significant as the payload shrinks. Some storage systems are designed specifically to handle and distribute large numbers of small files, to alleviate weaknesses of existing protocols in this use-case [24,25].

It is observed in Figure 7 that the achieved download speed is considerably lower than the network maximum, which can be influenced by several factors, such as weather conditions, relative satellite location, and network traffic [26].
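The effect of per-transfer overhead on small files can be illustrated with a simple model in which every download pays a fixed setup cost before data flows at the link rate. This is a sketch under stated assumptions, not a fit to our measurements: the function name and the two-second setup cost are hypothetical, and the 176 kbps link rate is the provider's stated maximum rather than an observed value.

```python
def effective_kbps(size_bytes: int, link_kbps: float = 176.0,
                   setup_seconds: float = 2.0) -> float:
    """Average speed of one transfer under a fixed-setup-cost model.

    Assumptions: the link sustains link_kbps once data is flowing, and each
    transfer pays setup_seconds of overhead (connection setup, metadata).
    """
    bits = size_bytes * 8
    transfer_seconds = bits / (link_kbps * 1000)
    return bits / 1000 / (setup_seconds + transfer_seconds)


# Illustrative file sizes only; these are not the sizes used in the experiment.
for size in (1_000, 100_000, 10_000_000):
    print(f"{size} bytes -> {effective_kbps(size):.1f} kbps")
```

Under this model the average speed approaches the link rate only for large files, which matches the qualitative trend in the figures. Factors such as weather and satellite position are not captured.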

Machine Learning Workloads on the Edge vs. a Centralized Hub
We argue that analysis should be performed on the edge to preserve privacy. Additionally, based on our end-to-end satellite communication experiments in Section 5.2, we conjecture that transferring raw video data from the edge nodes to a centralized mainland hub is limited by the available bandwidth. We would therefore like to evaluate whether such a centralized system is feasible. The typical workload in our fishery use-case is activity recognition based on video data to determine whether, e.g., discard of fish has occurred. Therefore, we test whether a centralized system could work for an activity recognition workload.
In order to evaluate the throughput of a centralized system, we focus on the bitrate required to run inference on videos in real time and compare against the average bandwidth measured for the satellite connection. We send different levels of compressed video data over our satellite connection and identify the top-1 video-level accuracy of models trained on this data. We compress the video by reducing the resolution and/or reducing the frame rate. We aim to see how the compression affects the accuracy of the machine learning model chosen. If the accuracy decreases significantly from the base case (112 × 112, 30 fps), then it is not feasible to send data over the satellite connection and the inference should be performed locally.
If we use the best average results for bandwidth from Figure 7, we get an average bandwidth of ≈35 kbps over the satellite connection. Given that the video files, on average, are much smaller than 1 MB, this is a conservative estimate. The optimal bandwidth is taken from documentation sheets for the satellite router [22,23]. The bandwidth required to send data over the satellite connection should be lower than the average bandwidth measured.
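The feasibility condition above can be written as a small check. The sketch below also computes the uncompressed bitrate of the base case as an upper bound, assuming 24-bit RGB frames. The bitrates used in our evaluation are those of compressed videos and are not reproduced here, and the function names are hypothetical.

```python
AVERAGE_LINK_KBPS = 35.0  # approximate average bandwidth taken from the text


def raw_bitrate_kbps(width: int, height: int, fps: float,
                     bits_per_pixel: int = 24) -> float:
    """Uncompressed video bitrate in kilobits per second (assumes 24-bit RGB)."""
    return width * height * bits_per_pixel * fps / 1000


def fits_link(required_kbps: float, available_kbps: float = AVERAGE_LINK_KBPS) -> bool:
    """Real-time transfer is feasible only if the required bitrate is below the link rate."""
    return required_kbps < available_kbps


base_case = raw_bitrate_kbps(112, 112, 30)
print(f"Uncompressed base case: {base_case:.2f} kbps")
print(f"Compression ratio needed to fit the link: {base_case / AVERAGE_LINK_KBPS:.0f}x")
print(f"Fits without compression: {fits_link(base_case)}")
```

The gap between the uncompressed bitrate and the measured link rate shows why resolution and frame rate must be reduced substantially before real-time transfer can be considered.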

Experimental Setup
In our experiments, we utilize an 18-layer R(2+1)D network, introduced by Tran et al. [27], to perform action recognition. The network was pretrained on the Kinetics-400 dataset [28] and then fine-tuned on the HMDB51 dataset [29]. The resolution and frame rate of the video data are reduced to various degrees, and the required bandwidth is measured for each setting. The network is fine-tuned over 50 epochs, and the weights from the epoch that gave the best validation accuracy are kept. This network is then run on a test set, giving the final accuracy in Figure 8.
We train on 16-frame clips, as was done during pre-training on the Kinetics-400 dataset. If the frame rate is too low, we repeat the last frame until we fill the tensor. The frames are consecutive and we apply temporal jittering while training. The video-level accuracy is calculated by taking the average prediction of 20 different clips from the same video, and then we choose the top-1 result. The bandwidth required for the different levels of compressed video data was calculated by taking the average bitrate of all compressed videos in the HMDB51 dataset.
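The clip padding and video-level scoring described above can be sketched in plain Python as follows. This is an illustration rather than our training code: the function names are hypothetical, frames are represented by arbitrary placeholder objects, and each clip prediction is assumed to be a list of per-class scores.

```python
def pad_clip(frames, clip_len=16):
    """Return a clip of exactly clip_len frames, repeating the last frame if needed."""
    if not frames:
        raise ValueError("a clip needs at least one frame")
    clip = list(frames[:clip_len])
    # Repeat the final frame until the clip is full.
    clip.extend([clip[-1]] * (clip_len - len(clip)))
    return clip


def video_level_top1(clip_scores):
    """Average per-class scores over all clips of a video and return the top class index."""
    num_clips = len(clip_scores)
    num_classes = len(clip_scores[0])
    averaged = [
        sum(scores[c] for scores in clip_scores) / num_clips
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=averaged.__getitem__)


# Illustrative values only: a 3-frame video padded to 5 frames, and two clips
# scored over two classes.
print(pad_clip(["f0", "f1", "f2"], clip_len=5))
print(video_level_top1([[0.6, 0.4], [0.1, 0.9]]))
```

In the experiment, 20 clips per video are averaged. The sketch accepts any number of clips, so the same function covers shorter videos.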
We implemented the experiment using PyTorch [30], and the model was trained on an Nvidia RTX 2080 Ti. The model was imported from the torchvision module. The frames are extracted from the videos and are resized and combined into a tensor in the batch generator. The frames are augmented randomly using horizontal flips and affine translations before they are normalized according to the means and standard deviations of the Kinetics-400 dataset [31].

Results
We hypothesized that compressing data, in order to adhere to bandwidth restrictions, would lead to lower accuracy for the action recognition model. As we can observe in Figure 8, reducing the resolution results in a dramatic decrease in model performance.
Reducing the frame rate also decreases the accuracy, but not to the same degree. The highest accuracy we achieve at a required bandwidth below the measured average bandwidth is 43.81%, which is much lower than our highest accuracy of 69.3%. Assuming we want a high-performing model, with results as close to state-of-the-art as possible (see Figure 8), we conclude that performing inference on a centralized hub is infeasible. The discrepancy between our highest accuracy and the accuracy documented in [27] might be due to numerous factors, such as training time, different augmentation schemes, and learning rate scheduling.
Based on our experiments, upstream evaluation is a more realizable design option for our application scenario [13]. As our system should support real-time monitoring, we will choose the evaluation scheme and inference location based on the capabilities of transferring results from edge nodes in real time. Hence, data should be analyzed close to its source, with inference on video data performed on the edge, on board the fishing vessel where the video camera is located. This design choice also complies with the privacy-preserving design of the system, with the edge nodes performing privacy-critical operations.

Privacy-Preserving Surveillance
A surveillance system with built-in video processing and access control based on video analysis was presented in IBM's PrivacyCam [32]. This surveillance system provides cameras with the capability to re-render an input video stream with features such as persons or objects removed. Unedited output streams can be provided for authorized users. Various techniques were applied in similar surveillance systems, for instance by removing distinctive facial features [33], encrypting faces [34], or obscuring people based on specific visual markers [35].

File Systems
Numerous systems provide encryption as a transparent file system feature or as software on top of a traditional file system. Cryptfs [36] and its successor eCryptfs [37] are cryptographic file systems included in the Linux kernel that provide encryption to files located in local or remote file systems, by storing cryptographic metadata in headers of individual files. Similarly, software like TrueCrypt [38], VeraCrypt [38] and Apple FileVault provide a decrypted view of an encrypted directory mounted elsewhere in the file system.
Several projects provide cryptographic file systems implemented in FUSE. EncFS [39] runs in userspace, mounts to a directory in a local file system, and encrypts all data written, storing encrypted versions of these files in a separate location. Gocryptfs [40] is a similar project implemented in the Go programming language with the Go-FUSE kernel bindings library. SecureFS [41] is a C++ FUSE project that aims to provide features similar to EncFS and Gocryptfs on multiple operating systems. Common to these FUSE file systems and Dorvu is that they are implemented as overlay file systems, providing a layer of indirection before writing to a separate local or remote file system. The design of generalized layered file systems and the technology that enables them on various platforms are reviewed and discussed by Zadok et al. [42].

Extensibility
Architecting extensible software in the offshore domain resembles how we structured our StormCast system [15,43], which further motivated the early mobile agent system TACOMA [44] built for shipping code and state around in a network for remote installation. Our current work utilizes the meta-code concept [19] for extending and customizing remote nodes where remote software can be configured with mobile code.
Our previous Balava file system [45] was built with FUSE and meta-code for managing computations that coupled multiple public clouds together transparently, and involved data with confidentiality constraints. As in Dorvu, meta-code is used as a structuring toolkit, but not in a weakly connected, mobile edge environment. In Balava, meta-code transparently glues together a hybrid cloud system that interconnects private environments with public clouds such as Microsoft Azure and Amazon Web Services.

Data Transmission
To avoid transmitting irrelevant and redundant data over the bandwidth-limited links from the remote edge devices to the central cloud-based servers, we aim to apply several data reduction mechanisms. By performing most of the analysis locally, transmission of large amounts of data can be reduced. This is especially important for bandwidth-hungry data types like images and videos. Multiple approaches have been explored for reducing the amount of data generated and for reducing data transmission. For instance, Gurrin et al. [46] propose a system that detects action in images and keeps only images where action is detected. Ji et al. [47] extract features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. Such approaches are used to reduce data for storage, transmission, and later analysis. Further reductions can be achieved by reducing image or frame dimensions and sizes without losing important information [48], and by analyzing the trade-offs between better quantization and reducing the frame rate [49].
Compression of video data using machine learning is also something we investigated and compared against in Section 5.3. Related approaches include Nvidia Maxine [50,51], a recently developed tool for massive compression of video data for video conferences. This application domain involves videos of faces with typically static backgrounds. It is challenging to apply this approach to our application domain, with video data depicting general activities, because it requires large amounts of data to train an equivalent generative adversarial network. Similar video analysis must be performed for privacy, e.g., to avoid showing faces or objects that should for some reason be protected. For example, Fitwi et al. [52] describe a system for masking private information in video frames from surveillance cameras by doing detection and filtering on the edge. Moreover, D'souza et al. [53] describe a similar system that uses object detection for surveillance camera video streams, and whitelists classes of objects that should not be censored. Thus, such approaches both reduce bandwidth and provide support for privacy preservation. Neto et al. [54] describe an edge-based system for smart city applications that processes data in real time while preserving privacy, and that uses a workload balancer to balance tasks across multiple edge nodes. However, this workload distribution is not applicable to our application, since edge nodes are expected to be physically distant from each other.

Centralized Data Analysis
We have proposed that the desired rate of data production in our system is larger than the targeted satellite communication link can transfer in real time. Despite bandwidth restrictions, it can still be advantageous to analyze data from multiple nodes and sensors, and potentially in combination with additional data collected from third-parties, such as sales notes and weather data.
Recent work shows that multimodal analysis of data usually leads to better and more accurate results, but comes with additional hardware costs [55,56]. In particular, ensembles of expert models work well with multimodal data streams and complex task analysis [57,58], which makes them a good alternative for the presented use case. For future work, we can use pre-analyzed streams of data as input to an expert ensemble model in which each expert sub-network focuses on learning the specific patterns of its particular data stream.

Conclusions
We are developing a geo-distributed, loosely coupled AI system for surveillance of fishing activities in the Arctic Ocean. The development and deployment of this system come with several challenges, due to the nature of the data produced and the targeted edge environment. For example, continuous production of multimedia data requires privacy compliance and fault-tolerance, while the bandwidth of edge networks hinders data transmission and real-time monitoring from non-edge components in the system. We observe that our available satellite broadband networks are not suitable for real-time video transmission for activity recognition, and we propose a system for analysis and data storage on the edge to facilitate this.
We have presented details of a prototype of Dorvu, a geo-distributed file system with support for fine-grained access control policies and software modules. Our prototype demonstrates an implementation of encryption and file versioning based on access rights, and we outline the expected I/O overhead of encryption and metadata operations associated with these capabilities. The deployed version of this system will span edge nodes on fishing vessels out at sea, connected with mainland centralized file servers, in order to utilize a combination of data filtering, analysis, and access control, to serve as a privacy-preserving alternative to manual video surveillance.

Data Availability Statement: Not applicable; the study does not report any data.