Fully Homomorphically Encrypted Deep Learning as a Service

Fully Homomorphic Encryption (FHE) is a relatively recent advancement in the field of privacy-preserving technologies. FHE allows for the arbitrary depth computation of both addition and multiplication, and thus the application of abelian/polynomial equations, like those found in deep learning algorithms. This project investigates, derives, and proves how FHE with deep learning can be used at scale, with relatively low time complexity, the problems that such a system incurs, and mitigations/solutions for such problems. In addition, we discuss how this could have an impact on the future of data privacy and how it can enable data sharing across various actors in the agri-food supply chain, hence allowing the development of machine learning-based systems. Finally, we find that although FHE incurs a high spatial complexity cost, the time complexity is within expected reasonable bounds, while allowing for absolutely private predictions to be made, in our case for milk yield prediction.


I. INTRODUCTION
Trust, data quality, data quantity, and data integrity are critical ingredients required for the successful application of deep learning:
• Trust; is necessary to access data in the first instance. Without trust there is usually a rightful unwillingness to collaborate and subsequently share data, unless the data or collaboration is very insensitive and cannot be harmful, or is purposefully open for some other reason.
• Data Quality; good quality data is necessary to train and to infer. If the data does not have internal consistency, regularity, and a relation to some output, then it can be difficult or impossible to use it to predict that output.
• Data Quantity; there is some threshold below which the data, even under perfect use, simply does not hold enough information to properly train a model or to make a reasonable inference.
• Data Integrity; both the absence of tampering and the data being properly representative of the scenario it tries to predict are important. The lack of the former would leave room for potential malicious actors to disrupt predictive algorithms, toward some damage or harm. The lack of the latter would make any successfully trained neural network useless in real scenarios, because the distribution it has learned is not properly representative of the ground truth.

G. Onoufriou was with the Department of Computing Science, University of Aberdeen, Aberdeen AB243UE, UK, and is now with the School of Computer Science, University of Lincoln, Lincoln, LN67TS, UK, email: gonoufriou@lincoln.ac.uk
P. Mayfield is with Scotland's Rural College and SAC Consulting, email: Paul.Mayfield@sac.co.uk
G. Leontidis is with the Department of Computing Science, University of Aberdeen, Aberdeen, AB243UE, UK, email: georgios.leontidis@abdn.ac.uk
We seek to help solve some of these problems, as far as is reasonably possible, by the use, application, and evaluation of fully homomorphically encrypted deep learning at scale:
• Trust; if there is no requirement for trust, because the data is undecryptable and the whole process is an auditable, open, Kerckhoffian procedure in both encryption and processing, then this barrier to collaboration is removed or at least significantly mitigated.
• Data Quality; unfortunately, encrypting data in this manner before it can be analyzed and properly engineered by a potentially more specialized entity naturally means the data quality cannot be assured during processing, since there is no means by which to verify it. However, given that other barriers are being diminished, it is possible that highly detailed sensitive data could now become available.
• Data Quantity; data that would otherwise be too litigious or sensitive to use could now become readily available in a fully homomorphically encrypted form, significantly increasing the quantity of usable data toward better trained, although single purpose or bespoke, models.
• Data Integrity; similarly to data quality, the relevance of the data to the underlying ground truth cannot be assessed, but there is some protection against tampering, since any encrypted cyphertext that did not originate from the data owner is not decryptable by that owner's keys, nor processable by their public, relinearisation, and other keys, halting the process immediately.
Especially if this can be accomplished at some scale, we believe reducing these barriers by implementing fully homomorphically encrypted deep learning as a service (EDLaaS) represents a large stride towards a more private and secure transaction between data owners and data processors. EDLaaS would also mean a large change in the development, training, and general principles of deep learning towards a much more sustainable future for this science, in which privacy requirements are satisfied, and fears from risk are significantly reduced if not nullified.
An example where EDLaaS could benefit individual users is that of home assistants, such as Google Home and Alexa. Given that the end devices could each individually house their own private keys, user voice could be fully homomorphically encrypted and still used by the back-end service to infer the instructions, which are then executed by the same end device. This protects the consumer from data privacy worries, and the operator from GDPR [1] concerns in both transit and operational use, since the data never requires decryption outside of the home and is indeed undecryptable by the back-end system.
Another example, this time for agriculture and industry, which we have actively explored and discuss further in this paper, is that of a data processor for milk yield prediction. In many industries there are large concerns over potential data leaks, owing to perceived or real sensitivity latent in their data, such as genetics and breeding, or feed composition in the case of the milk industry. In addition, concerns around food traceability and safety, along with how information can be safeguarded against malicious input, are very important in the agri-food sector [2]. Moreover, considering that data in such industries is very valuable intellectual property, stakeholders are hesitant to share their data, even when they see some benefit in doing so. There is too much at stake for them; therefore solutions that enable data sharing, or alternatively the sharing of encrypted data that can be used to develop machine learning applications, would be a game changer for the sector [3]. To test and evaluate our implementation we were provided with the last 30 years of breeding, feeding, and milk yield data by the Langhill Dairy herd based at the SRUC Dairy Innovation Centre, Dumfries.
For the sake of clarity, FHE [4], and more specifically the CKKS scheme [5], [6], already exists as a technology; we therefore do not seek to prove that it is secure, even though we believe that it currently is. Instead, our contribution is the derivation of a method, along with an application, that shows how it can be used not just in laboratory conditions but also in production-like environments as a means to conduct encrypted deep learning as a service, the penalties we incur when adopting such a technology, and our solutions to other problems along the way. In this way we can help bring FHE out of emerging or proof-of-concept status, by doing much of the hard work needed to use it at scale.

A. Motivation
Neural networks (NNs) are an ML algorithm that can be used in various settings and with many types of data, from images and time series to point clouds and Fourier-transformed data. NNs in the context of agri-food have already been used across a number of settings, e.g. yield forecasting [7]-[9], crop and fruit detection [10], pest detection [11], etc. To train (deep) neural networks and exploit their full potential we require more and more data. Data owners have concerns, specifically about the sensitivity of their data and its (mis)use. This sensitivity creates a reluctance to share, especially if the collaboration is new, as there is a lack of trust and there are issues around background and foreground IP. Thus, if we want to create more and new collaborations in order to enable the net-zero transition, enhance environmental sustainability, and improve productivity, it is necessary either to build up this trust or to create a system in which data owners do not need to trust the data processor because it works with encrypted data. This lends itself to fully homomorphically encrypted data: owners no longer need to trust the data processor, since their data cannot be decrypted, read, or leaked, but it can still be used for computation to produce effective predictions for them.
Stemming from all the above, our motivation has been to investigate how FHE can be used to enable the exchange of encrypted data in an agri-food setting, and to enable the use of ML as part of the pipeline via edge devices and virtual machines (VMs). FHE as part of an ML pipeline is still in its infancy [12], which means that, in contrast with ML methods that work with non-encrypted data, no off-the-shelf approaches exist that can be routinely used. This makes adoption a hard process, and FHE is hence still considered an emerging and possibly disruptive technology.
Finally, the aim of this paper is to test the feasibility of using FHE on dairy milk data. In simple terms, applying FHE to a set of data enables useful operations to be conducted on encrypted values without decrypting them first. Both the input and output data remain encrypted (see Fig. 1). Homomorphic encryption addresses a vulnerability inherent in conventional approaches to data protection: traditional public key encryption requires that data be decrypted before it can be analyzed or manipulated, exposing the data to security and manipulation threats while in the decrypted state. FHE allows any data to remain encrypted while it is being processed, eliminating the need for decryption and providing another layer of security to the data. Such a tool can be used to develop trust in data sharing between businesses, practitioners, and research organisations in the food and drink sector.

II. FULLY HOMOMORPHIC ENCRYPTION
Fully Homomorphic Encryption (FHE) is a structure-preserving encryption transformation [13], first appearing in 2009 [4], with several advancements since to improve its efficiency and speed [13]. FHE largely depends on commutative algebra, in particular modelling the ring learning with errors (RLWE) problem. Commutative rings are sets in which it is possible to add, subtract (via the additive inverse), and multiply, and still result in a member of the set [13]. For more in-depth detail about rings, fields, and the associated axioms that must be met by any deep learning algorithm, please see II-A. In summary, the primary consequence of working in a ring rather than a field is the lack of division, since we do not have access to the multiplicative inverse. A ring does still guarantee the additive inverse, meaning we can still subtract by adding a negative. The lack of division causes issues with components such as activation functions: the sigmoid function $\frac{1}{1+e^{-x}}$ cannot be computed directly, meaning we should use a different function or approximate it. One such possibility is a Taylor series expansion, which closely approximates sigmoid, although a number of alternative methods have been proposed [14], [15].

FHE/RLWE has recently been paired with deep learning with success, causing some movement toward FHE in the literature, but primarily for convolutional neural networks (CNNs). There is a gap in that homomorphic encryption has not been applied to more complex and up-to-date methods, nor to real-world problems with their own added considerations, which is reinforced by the Royal Society's statement that FHE is still a proof-of-concept technology [12]. The majority of research in privacy-preserving neural networks is with CNNs applied to the MNIST dataset [16]-[18]. The key aspect these papers focus on is the need to use FHE-compatible equivalents for components such as the activation functions, e.g. a sigmoid approximation. This is because, depending on the FHE implementation, only addition and multiplication may be applied to the cyphertext while still maintaining its usability. These same papers suggest that FHE is 4-5 times slower to train and infer than the unencrypted equivalent, so this is also something that should be improved upon.
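As an illustration of the approximation route described above, the following minimal sketch evaluates a truncated Maclaurin (Taylor) series of the sigmoid using only additions and multiplications, the operations available on a cyphertext. It runs on plaintext floats purely to show the arithmetic; the function names are illustrative and are not taken from any FHE library.

```python
import math


def sigmoid(x: float) -> float:
    """Reference sigmoid; needs exp and division, so it is not FHE friendly."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_taylor(x: float) -> float:
    """Degree-5 Maclaurin series of sigmoid: 1/2 + x/4 - x^3/48 + x^5/480.

    The coefficients are precomputed plaintext constants, so evaluating the
    polynomial only requires additions and multiplications.
    """
    x2 = x * x
    x3 = x2 * x
    x5 = x3 * x2
    return 0.5 + 0.25 * x + (-1.0 / 48.0) * x3 + (1.0 / 480.0) * x5


if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"x={x:+.1f}  sigmoid={sigmoid(x):.6f}  taylor={sigmoid_taylor(x):.6f}")
```

A truncated Taylor series is accurate close to zero and degrades as |x| grows, which is one reason alternative polynomial fits over a wider interval are often preferred.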

A. Commutative Rings Formalisation
Commutative rings are sets in which it is possible to add, subtract (via the additive inverse), and multiply, and still result in a member of the set. This includes the sets:
• $\mathbb{Z}$; integers, e.g. (−1, 0, 1, 2, ...). Formally: an integer is any number that has no fractional part (not a decimal).
• $\mathbb{Q}$; rational numbers, e.g. (5, 1.75, 0.001, −0.1, ...) = ($\frac{5}{1}, \frac{7}{4}, \frac{1}{1000}, \frac{-1}{10}$, ...). Formally: a rational number is a number that can be written in the fractional form $\frac{a}{b}$, where a and b are integers and b is non-zero.
• $\mathbb{R}$; real numbers.
• $\mathbb{C}$; complex numbers.
This does not include the sets:
• $\mathbb{I}$; imaginary numbers, e.g. where $i = \sqrt{-1}$: (i, −i, 39.8i, ...). Formally: imaginary numbers are any real numbers multiplied by the imaginary unit i.

$R$, for (commutative) ring, shall henceforth be one of the four sets $\mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}$. In contrast, a field ($F$) is any commutative ring ($R$) in which division may also be performed and still result in elements from that ring. This includes only the sets $\mathbb{Q}, \mathbb{R}, \mathbb{C}$, as not all elements in the set of integers ($\mathbb{Z}$) can be divided by another integer and still result in an integer [19]. These rings are used through polynomial expressions, instead of the discrete matrices of learning with errors (LWE), hence ring learning with errors (RLWE). For formalization, if all of the following axioms are fulfilled then the resulting set is called a field.

Addition axioms; given $x, y, z \in R$, then:
$$(x + y) + z = x + (y + z), \quad x + y = y + x, \quad x + 0 = x, \quad x + (-x) = 0 \tag{1}$$

Multiplication axioms; given $x, y, z \in R$, then:
$$(x y) z = x (y z), \quad x y = y x, \quad x \cdot 1 = x, \quad x \cdot x^{-1} = 1 \ (x \neq 0) \tag{2}$$

Multiplicative-additive axiom; given $x, y, z \in R$, then:
$$x (y + z) = x y + x z \tag{3}$$

Here the four identities in (1) are additive associativity, commutativity, identity, and inverse; the four in (2) are multiplicative associativity, commutativity, unity, and inverse; and (3) is distributivity. If all but the multiplicative inverse are fulfilled, then this is a commutative ring with 1; if multiplicative unity is also not fulfilled, then this is just a commutative ring [19].

III. METHODOLOGY
To evaluate FHE's applicability as an EDLaaS we needed to create, and mimic as closely as possible, what we expect to be a standard industrial use case for third-party data processing, on which we can evaluate the effect of FHE, along with evaluating FHE's time and spatial complexity itself. To this end we devised a two-part client-server system, depicted in Fig. 1.

In creating this evaluable pipeline we had to overcome a few shortfalls we found at the time that prevented FHE from being integrated into an EDLaaS scenario or pipeline.
• Fully homomorphic encryption itself, specifically CKKS [5], [6], and adapting it to be usable at some scale.
• Combining FHE with deep learning, which had only been peripherally explored at this point.

A. Data Pipeline
Broadly, our data pipeline can be abstracted into two parts, both necessary to test and evaluate FHE at some scale and in a practical manner to garner real results: the data source, where data is wrangled and encrypted, and the data sink, where the data is processed on much more powerful and fully featured machines.
1) Data Wrangling: For our study we used data on dairy herds over the last 30 years, provided by the Langhill Dairy herd based at the SRUC Dairy Innovation Centre, Dumfries. We use this data as ground truth to encrypt and infer on using time series neural networks, in our case a one-dimensional convolutional neural network (1D CNN). We normalised this data to the range 0-1, one-hot encoded categorical features, and used the historic feeding, genetics, and subsequent milk yield as examples in time leading up to the current milk yield prediction. This is fundamentally the same as any other data wrangling where data is prepared for processing by a neural network, with the sole exception that the data is then encrypted. This means it is the final form of the data and cannot be changed before training, although it can of course be iteratively adapted, if it does not provide the best results, by simply feeding more but differently wrangled encrypted data. Since there is usually a significant number of empty slots in the encryption vector, it may be possible to optimize further by merging multiple examples into a single encrypted vector. However, this emptiness allows for a lot of variance in wrangling techniques without the need to create whole new neural network architectures.
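The wrangling steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual preprocessing code; the feature values, category names, and helper names are hypothetical, and the slot count of half the polynomial modulus degree follows the CKKS convention used elsewhere in this paper.

```python
from typing import List, Sequence


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """Scale values linearly into the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Encode a categorical value as a 0/1 vector over the known categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def pad_to_slots(features: Sequence[float], poly_modulus_degree: int) -> List[float]:
    """Zero-pad a feature vector to the number of slots in one encrypted vector."""
    slots = poly_modulus_degree // 2
    if len(features) > slots:
        raise ValueError("feature vector does not fit in one encrypted vector")
    return list(features) + [0.0] * (slots - len(features))


if __name__ == "__main__":
    yields = [18.0, 22.5, 27.0]      # hypothetical milk yields
    feed_types = ["low", "high"]     # hypothetical feed categories
    row = min_max_normalise(yields) + one_hot("high", feed_types)
    vector = pad_to_slots(row, poly_modulus_degree=16384)
    print(len(vector), vector[:6])
```

The padded vector is what would be handed to the encryptor; everything in this sketch happens in plaintext on the data source, before encryption.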
2) Client/ Data Source: The data source, usually a small embedded device (in our case, an NVIDIA Jetson Nano), is responsible for data wrangling and encryption. Once encrypted, the data can no longer be seen and cannot be verified, hence the need to transform it before encryption. The data source must be the encryptor so that it is the only entity with a private key with which to decrypt the data again.
Normally, the data owner cannot be expected to be familiar with FHE and deep learning, and thus with the requirements the data must meet to be properly processable. It is necessary that some form of interaction with, or awareness of, the data occurs, such that appropriate auditable, open-source data processors can be provided. In the ideal scenario this would not be necessary and the data owner would be capable of wrangling the data according to their needs, but this is an unlikely occurrence given that there are more data sources than there is expertise, and the existence of expertise reduces the likelihood of needing an EDLaaS. However, given client expertise, or at least proficiency in data cleaning, and the use of open-source helpers and documentation, no embedded device would be necessary, and the client could instead submit their cyphertext directly for processing.
Data from the client is serialised by the embedded device after encryption ready for transmission to the data processor, without the presence of the private key, ensuring the transmitted serialised cyphertext is undecryptable during processing in the later stages.These keys should instead remain indexed on the client machine/ embedded device.
3) Server/ Data Processor: Data from the data source is serialised and transmitted using standard HTTPS requests, to model how it would likely function in such an Internet service. The data sink then deserialises the data and applies the arbitrary computation, in our case a neural network, and then serialises and transmits the still encrypted but now transformed data back to the data source for final decryption and use. An example encrypted output can be seen in Fig. 2, showing cyphertexts, the associated keys, the name of the data set it is associated with, who owns this data, and when it was submitted. In practice, we would filter unnecessary information, like the plain parameters, to reduce the space and time costs of storing and transmitting this data, hence improving speed; and of course we would not handle the private key, which is shown here for experimental purposes only.
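The filtering step described above can be sketched as below. This is only an illustration of the idea under stated assumptions: the field names (`private_key`, `plain_parameters`, and so on) are hypothetical and do not reflect the actual serialised layout, the byte strings are placeholders rather than real cyphertexts, and JSON with base64 is assumed here merely as a convenient transport encoding.

```python
import base64
import json
from typing import Dict

# Hypothetical field names; the real serialised record may differ.
PRIVATE_FIELDS = {"private_key"}          # must never leave the data source
OPTIONAL_FIELDS = {"plain_parameters"}    # not needed in transit


def to_transmittable(record: Dict[str, object]) -> str:
    """Drop private and unnecessary fields, then encode byte arrays as base64 JSON."""
    safe = {}
    for key, value in record.items():
        if key in PRIVATE_FIELDS or key in OPTIONAL_FIELDS:
            continue
        if isinstance(value, (bytes, bytearray)):
            value = base64.b64encode(bytes(value)).decode("ascii")
        safe[key] = value
    return json.dumps(safe, sort_keys=True)


if __name__ == "__main__":
    record = {
        "owner": "example-farm",
        "dataset": "milk-yield",
        "cyphertext": b"\x00\x01",    # placeholder bytes, not a real cyphertext
        "public_key": b"\x02",
        "relin_keys": b"\x03",
        "private_key": b"\x04",
        "plain_parameters": {"poly_modulus_degree": 16384},
    }
    print(to_transmittable(record))
```

Using an explicit deny-list keeps the sketch short; an allow-list of fields permitted to leave the device would be the more defensive design choice.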
We can process data as in Fig. 2 using our library described in III-C and with our techniques outlined in III-D. Once the data has been processed, to minimise space consumption we switch all the way down the remaining coefficient modulus chain, to create the smallest possible and most quickly deciphered cyphertext, thus saving space and time while the data is stored until the data owner decrypts and uses the results.

Fig. 2. Serialised representation of encrypted data using the CKKS scheme, including all private, relin, and public keys, where objects here are byte arrays.

B. Interface
For ease of use and testing we created a web app interface for the data owner, to simplify the process of submitting data to the embedded client device for encryption and transformation. It serves the purpose of user authentication (Fig. 3), acts as a main dashboard that prompts the user to upload the data to be encrypted (Fig. 4), and provides an opportunity to view the encrypted data (Fig. 5). The training process, given the complexity involved, can run either on the Jetson device or on a remote host, e.g. a high-performance computing facility. The user may also select what type of 1D CNN to use to provide their predictions. As far as our techniques are concerned, they can run on both edge and non-edge devices, depending on the scale of the problem and other constraints related to the amount of data, training time, etc.

C. Fully Homomorphic Encryption Library
At the time, there was little work available in the community for easily using fully homomorphic encryption in conjunction with deep learning frameworks, and what was available often did not support the more complicated Cheon, Kim, Kim and Song (CKKS) [5], [6] FHE scheme and its serialisation, or harboured some hidden catches. The CKKS scheme can operate on floating point numbers, which is critically necessary since normalised neural networks operate on floating point numbers, usually in the range 0-1. The base library we used, after quite some deliberation, was the Microsoft Simple Encrypted Arithmetic Library (MS-SEAL), since it supports CKKS and some form of serialisation, which is necessary to transmit over the internet. However, this did not solve the incompatibility with GPU compute and with deep learning libraries, neither of which MS-SEAL supported. As such, we created our own open-source abstraction library that would allow us to use Python and Python libraries to speed up the research and development of our system at some scale, and to use it easily in conjunction with web servers. Secondly, MS-SEAL does not yet support bootstrapping, meaning there is a limit to the number of computations we can process; however, we intended to stay within this limit anyway for the sake of noise budgeting.
Most deep learning applications, and the associated plethora of libraries, primarily use the programming language Python. MS-SEAL is available in C++ and C#, but not Python. Thus, to use MS-SEAL in conjunction with other machine learning libraries, it was a requirement to create bindings from the C++ implementation to Python. Luckily there existed many early attempts to create MS-SEAL Python bindings, although most of them used already outdated versions of MS-SEAL that did not have good serialisation support. We found a few likely candidates in the community that were using current versions, collaborated with them, and extended their implementation to form our first basic serialisable version of MS-SEAL bound to Python [20]. These bindings, however, needed some further abstraction in Python to make them usable at scale, which led to a further effort creating our ReSeal library, which includes all the serialisation logic and abstractions necessary to make MS-SEAL easily usable in conjunction with larger frameworks. Our ReSeal implementation is open source (OSLv3 licensed) and freely available [21]. In future we would like to improve some aspects of serialisation and to make installation of ReSeal easier, and this is something we intend to iterate and improve on over time.

Fig. 6. The computational graph of our encrypted 1D CNN, displaying the computational steps and gates necessary to calculate a given output y from input x. This also abstracts these computational steps into groups or modules, such as the sigmoid approximation module. Modules shown: CNN convolution (cross-correlation), bias, sigmoid approximation.
Deep learning can broadly be abstracted into three stages, namely the forward pass, the backward pass, and the weight update. The forward pass propagates some input against the weights and internal activation functions of the whole network to produce some form of prediction, thus moving from left to right in the computational graph (Fig. 6). The backward pass calculates the effect of all weights and biases on the final result by differentiation and the chain rule, thus operating from right to left in the computational graph (Fig. 6). The weight update takes these weights and adjusts them given the loss (a measure of wrongness) and the gradient of the weight in question, to approach a lower loss, usually using the gradient descent algorithm.
Fully homomorphic encryption imposes certain constraints, such as the inability to compute division; we therefore describe our process, and how we overcome these obstacles, as follows.
1) Forward Pass: As briefly described previously, the forward pass takes some input x, applies some transformation according to the internal weights of the neural network, and outputs some prediction (6). However, depending on the neural network in question, these transformations are usually not FHE compatible, or are not performant under FHE. An example of FHE incompatibility is most activation functions, e.g. ReLU and sigmoid, which require operations such as max and division that are impossible, requiring context and a non-abelian operation respectively. To overcome this, we found in the literature approximations for, in particular, sigmoid (5), which use polynomials to overcome the barrier of the division in the standard sigmoid (4):

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{4}$$

$$\sigma(x) \approx \sigma_a(x) = 0.5 + 0.197x - 0.004x^3 \tag{5}$$

This approximation closely follows the standard sigmoid between -5 and 5, which is more than sufficient for our purposes since, when normalised, most data, weights, and subsequently activations will likely fall in the range 0-1.

The approximation in question (5), proposed by Chen [22], is used interchangeably with sigmoid in our equations; thus our neural network equation (6) will stay relatively normal, aside from the use of time t as in a 1D (time series) CNN.
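To make the "closely follows between -5 and 5" claim concrete, the sketch below measures the worst-case gap between the standard sigmoid and a cubic polynomial approximation on plaintext floats. The coefficients 0.5, 0.197, and −0.004 are assumed here as the commonly quoted form of this approximation; the helper names are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    """Standard sigmoid, which requires division and exponentiation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_approx(x: float) -> float:
    """Cubic approximation 0.5 + 0.197x - 0.004x^3 (assumed coefficients).

    Only additions and multiplications are used, so the same expression can
    in principle be evaluated on a cyphertext.
    """
    return 0.5 + 0.197 * x + (-0.004) * (x * x * x)


def max_abs_error(lo: float, hi: float, steps: int = 1000) -> float:
    """Largest absolute gap between the two functions on an evenly spaced grid."""
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        worst = max(worst, abs(sigmoid(x) - sigmoid_approx(x)))
    return worst


if __name__ == "__main__":
    print("max error on [-5, 5]:", max_abs_error(-5.0, 5.0))
    print("max error on [-8, 8]:", max_abs_error(-8.0, 8.0))
```

Running the comparison on a wider interval shows the error growing quickly outside [-5, 5], which is why keeping inputs, weights, and activations normalised matters for this kind of approximation.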
where:
σ = sigmoid / sigmoid approximation
x = some input vector x
e = Euler's number

The specific neural network we used for this forward pass was a 1D convolutional neural network (1D CNN), where we substituted space for time; thus our activation equation becomes what is depicted by equation (6). We used a 1D CNN over traditional time series neural networks, despite having a time series dataset, because not only have 1D CNNs been shown to be good time series predictors that are more parallelisable, but the nature of a CNN also means computations are wider rather than deeper. In practice, this means that fewer expensive operations such as bootstrapping are necessary, since after a few computations in depth it is necessary to bootstrap and shrink the cyphertext, thus improving the overall time efficiency of the resulting computational graph. Our CNN is of shape (timesteps, 1, polynomial modulus degree / 2), since the process of encryption in the CKKS scheme utilises a set number of slots based on the polynomial modulus degree used. This means that, with the exception of the total length of the encrypted vector, the absolute number of slots populated cannot be assessed, and any unpopulated slots are padded with 0s prior to encryption.
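The following plaintext sketch shows the general shape of such a forward pass: a weighted sum over timesteps applied slot-wise, a bias, and a polynomial activation. It is a stand-in under stated assumptions rather than the implementation evaluated in this paper: lists of floats take the place of encrypted vectors, one scalar kernel weight per timestep is assumed, and the cubic activation coefficients are assumed as well.

```python
from typing import List, Sequence

Vector = List[float]


def sigmoid_approx(v: Vector) -> Vector:
    """Slot-wise cubic sigmoid approximation (assumed coefficients)."""
    return [0.5 + 0.197 * x + (-0.004) * (x * x * x) for x in v]


def conv1d_forward(inputs: Sequence[Vector], kernel: Sequence[float], bias: float) -> Vector:
    """Weighted sum over timesteps, plus bias, followed by the activation.

    inputs: one slot vector per timestep (stand-in for one cyphertext each).
    kernel: one plaintext weight per timestep (an assumption of this sketch).
    """
    if len(inputs) != len(kernel):
        raise ValueError("one kernel weight per timestep is expected")
    slots = len(inputs[0])
    acc = [0.0] * slots
    for x_t, k_t in zip(inputs, kernel):
        for i in range(slots):
            # Under FHE this is a cyphertext-plaintext multiply followed by an add.
            acc[i] = acc[i] + k_t * x_t[i]
    acc = [a + bias for a in acc]
    return sigmoid_approx(acc)


if __name__ == "__main__":
    # Two timesteps, two slots; the second slot is zero padding.
    x = [[1.0, 0.0], [0.5, 0.0]]
    print(conv1d_forward(x, kernel=[0.2, 0.4], bias=0.1))
```

Note that the zero-padded slot still produces a non-zero output (the activation of the bias alone), so the data owner has to know which slots are meaningful after decryption.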
2) Backward Pass: The backward pass, as touched on before, is the process of calculating the gradients of the weights and biases with respect to the output. However, to do this, some cached input is required, which is used to derive the differentials. An example of this dependency on the input is sigmoid (4), whose gradient calculation (7) requires x, the encrypted input.
where:
σ = sigmoid / sigmoid approximation
x = some input vector x
$\frac{df}{d\sigma}$ = differential with respect to sigmoid

Since x is required to calculate the sigmoid's gradient (7), either this gradient becomes encrypted in the process of using the currently encrypted x, or it can only be calculated in plaintext. For now, we choose to calculate gradients in plaintext. Encrypting the gradients would inevitably mean encrypting the new weights, which would mean every operation would be between two cyphertexts, adding substantial time complexity (an order of magnitude) when this is unnecessary for our purposes, and it would limit the neural network to that one specific key from that one specific data owner. While this may be desirable in certain bespoke situations, for a data processor it is generally undesirable, since it is necessary to serve multiple data owners or sources simultaneously. It may instead be possible to have generic models that can then be privately tailored to the specific data in question. For these bespoke models, this could potentially be represented somewhat like a graph or digraph for each individual weight with respect to the generic models, such that if in future the data owner allows these weights to be decrypted, they can be retroactively used to update other pre-existing models stemming from the same generic model or node.
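As a worked illustration of why the cached input is needed, assume the cubic approximation $\sigma_a(x) = 0.5 + 0.197x - 0.004x^3$ (coefficients assumed here). Differentiating term by term gives

$$\frac{d\sigma_a}{dx} = 0.197 - 0.012x^2,$$

which depends on x directly. The sketch below caches x on the forward pass and applies the chain rule on the backward pass, in plaintext; the class and method names are hypothetical.

```python
from typing import Optional


class SigmoidApprox:
    """Plaintext activation node that caches its input for the backward pass."""

    def __init__(self) -> None:
        self._x: Optional[float] = None

    def forward(self, x: float) -> float:
        self._x = x  # cached so the gradient can be computed later
        return 0.5 + 0.197 * x + (-0.004) * (x * x * x)

    def backward(self, upstream: float) -> float:
        """Chain rule: local gradient (0.197 - 0.012 x^2) times upstream gradient."""
        if self._x is None:
            raise RuntimeError("forward must be called before backward")
        local = 0.197 + (-0.012) * (self._x * self._x)
        return local * upstream


if __name__ == "__main__":
    node = SigmoidApprox()
    y = node.forward(0.7)
    print("activation:", y, "gradient:", node.backward(1.0))
```

If x were a cyphertext, `local` and everything computed from it would also be encrypted, which is the trade-off discussed above.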
3) Weight Update: The weight update using gradient descent simply moves the weights in whichever direction approaches a lower or minimum loss, according to some optimisation function. Equation (8) represents a simple optimisation function we used to test weight update functionality with FHE to approach a lower loss.
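A minimal sketch of such an update is given below, assuming plain (vanilla) gradient descent with a fixed learning rate; this is a generic illustration rather than necessarily the exact optimisation function used in this work, and the function name and toy loss are hypothetical.

```python
from typing import List, Sequence


def gradient_descent_step(
    weights: Sequence[float], gradients: Sequence[float], learning_rate: float = 0.01
) -> List[float]:
    """Move each weight against its gradient: w <- w - learning_rate * dL/dw."""
    if len(weights) != len(gradients):
        raise ValueError("weights and gradients must have the same length")
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


if __name__ == "__main__":
    # Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
    w = [1.0]
    for _ in range(100):
        grad = [2.0 * (w[0] - 3.0)]
        w = gradient_descent_step(w, grad, learning_rate=0.1)
    print("weight after 100 steps:", w[0])
```

Because gradients are computed in plaintext in our setting, this update is an ordinary plaintext operation on the data processor's side.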
In the work presented in this paper, we created a two-part client and server system to facilitate EDLaaS at scale, just like any other platform as a service. We also created our own libraries, which have consequences for the computational speed of the whole pipeline. Table I presents the computational complexity we achieved against an arbitrary dataset, in our case milk yield prediction using a 1D convolutional neural network as outlined in III-D. These results are averages over examples of their time of execution, including, in the case of the remote examples, the transmission time. For the sake of consistency, all results presented were obtained on the same local area network (LAN), to prevent the effect of otherwise uncontrollable conditions and traffic over the wide area network (WAN).

While this is still a fairly small scale relative to production settings, our use of containers can easily be expanded upon, such as with Kubernetes, Apache Mesos, or Docker Swarm, allowing for scaling up and down. The most difficult task, that of handling these requests in a scalable, encrypted manner, has been tackled, paving the way for such expansions.

It may also be noted that some fields in Table I are left blank. These blank fields in the remote column are remote operations left unimplemented, as they are too low-level to be worth the overhead cost of transmission. Simple operations alone, such as cyphertext + cyphertext, are also not in and of themselves a form of deep learning, and would not have been worthy candidates to implement, taking time away from more critical research.
It can be seen in Table I that local time complexity effectively represents the efficiency of our rebinding and abstraction implementation of MS-SEAL, whereas the remote results represent the time complexity of the pipeline overall, that is to say remote encryption, remote decryption, and remote inference. One thing not shown here is the time complexity of training, as this is an ongoing area of work for us, where significant decisions must be made about the neural networks as described in III-D2, along with how best to deal with and integrate bespoke models for problems where the client or data owner must maintain absolute, uncompromising privacy, thus not allowing any backpropagation to occur.
Space complexity, as shown in Table II, shows how much larger an encrypted cyphertext is relative to its unencrypted counterpart. In the case where the polynomial modulus degree is 16384, and thus the length of the array is half of the polynomial modulus degree at 8192, the plaintext numpy array is only 0.0656MB, compared to 9.60MB once encrypted. However, this is the total size of the cyphertext including all the other information stored with it, namely its private key, and at the highest point in the modulus switching chain, which is only necessary before computation begins. Thus there will be some gains once information unnecessary to the data processor is stripped, and once the data has been computed on and so approaches the end of its modulus switching chain, which produces a smaller cyphertext as a side effect, while also being the result of several other cyphertexts combined into a single prediction. This space complexity can likely be optimised, as the size is partly a result of some difficult serialisation logic necessary when going from MS-SEAL in C++ to ReSeal in Python, and this is likely an area where easy gains can be made in future.
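The figures quoted above can be put side by side with a few lines of arithmetic. The sketch below assumes 8-byte (float64) slot values and ignores any array header, which is an assumption on our part; the two sizes in megabytes are the values reported in the text, and the helper names are illustrative.

```python
def slot_count(poly_modulus_degree: int) -> int:
    """Number of slots available: half the polynomial modulus degree."""
    return poly_modulus_degree // 2


def raw_plaintext_bytes(slots: int, bytes_per_value: int = 8) -> int:
    """Raw payload of a plaintext float64 array, ignoring any header."""
    return slots * bytes_per_value


def expansion_factor(cipher_mb: float, plain_mb: float) -> float:
    """How many times larger the serialised cyphertext is than the plaintext."""
    return cipher_mb / plain_mb


if __name__ == "__main__":
    slots = slot_count(16384)
    print("slots:", slots)
    print("raw plaintext bytes:", raw_plaintext_bytes(slots))
    # Reported sizes: 0.0656MB plaintext versus 9.60MB encrypted.
    print("expansion factor:", round(expansion_factor(9.60, 0.0656), 1))
```

With the reported sizes this gives an expansion of roughly two orders of magnitude before any stripping of keys or modulus switching is applied.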
V. CONCLUSION
FHE is possibly one of the technologies that, in the era of privacy, will receive even further attention in the next few years, in particular as a component of machine learning applications. In this paper, we have found that FHE can be successfully applied to deep learning at scale to create Encrypted Deep Learning as a Service. We have also found the time complexity increase to be within acceptable bounds already, despite there being many areas where improvement can be had. In addition, we have conceived and implemented an open-source collection of software to facilitate EDLaaS, which we continue to improve upon. During our developments, we have overcome a plethora of difficulties with regard to combining deep learning with FHE, which we have discussed here in detail along with our solutions and mitigations. Finally, we have outlined a few areas where special consideration is required in the future, such as bespoke models with encrypted weights.

Fig. 1. Pipeline that demonstrates the key stages of our project, from the client and raw data (upper left) to the data processing and analytics (lower right).

Fig. 5. Minimal data preview, so users can assess what data exists, by date and given data set name for reference.