An Architectural Based Framework for the Distributed Collection, Analysis and Query from Inhomogeneous Time Series Data Sets and Wearables for Biofeedback Applications

The increasing professionalism of sports persons and the desire of consumers to imitate this have led to an increased metrification of sport. This has been driven in no small part by the widespread availability of comparatively cheap assessment technologies and, more recently, wearable technologies. Historically, whilst these have produced large data sets, often only the most rudimentary analysis has taken place (Wisbey et al. in: “Quantifying movement demands of AFL football using GPS tracking”). This paucity of analysis is due in no small part to the challenge of analysing large sets of data, often from disparate data sources, to glean useful key performance indicators, which has largely been a labour intensive process. This paper presents a framework that can be cloud based for the gathering, storing and algorithmic interpretation of large and inhomogeneous time series data sets. The framework is architecture based and technology agnostic in the data sources it can gather, and presents a model for multi-set analysis at inter- and intra-device levels and for individual subjects. A sample implementation demonstrates the utility of the framework for sports performance data collected from distributed inertial sensors in the sport of swimming.


Introduction
The sports community is an acknowledged early adopter of technologies for quantifying the movement demands of athletes, such as inertial sensors [1]. Such devices typically produce large datasets. However, until recently, often only basic analysis was performed [2]. These data have often been used to augment typical methods of data capture in specialist sports laboratories, which can restrict the activities that can be assessed (for example, equipment cabling can reduce the range of motion of joints) or can only capture small volumes of data (for example, limited gait kinematics within the capture volume of a 3D motion analysis system). This adoption of wearable technology is driven by the desire to improve sporting performance outcomes through technique, training and performance analysis, and through injury prevention and mitigation by workload assessment [3]. To accomplish this, the sports community uses a range of technologies for movement analysis [4] that typically produce time series data. These include 3D motion capture systems, 2D video, force plates, heart rate and electromyography data, and, more recently, ambulatory body worn sensors (such as inertial sensors) and even smart phones [5]. These have been combined in applications as diverse as gait studies [6], swimming monitoring [7] and bowling action in cricket [8]. As standalone instruments, they have proven able to provide effective feedback for interventions to improve sporting outcomes. However, with the rise of connectedness, there is a desire and need to integrate them more efficiently than through labour intensive methods [9].
Feedback of performance and biomechanics data to both coaches and athletes is fundamental to the learning process. As an aid to learning, this information can come from different sporting periods (intra-session, inter-session, seasonal and multi-year) and from multiple athletes or teams [10]. To gain competitive advantages, individuals and teams have inherently required greater detail and more information. Meeting this ongoing demand necessitates moving from single small data sets to large multimodal data from sources ranging from wearables to laboratory equipment and video, for effective intervention. The growth in this demand is evidenced by the parallel expansion, over more than 30 years, in the recruitment of sport science professionals as support staff, as well as by the worldwide establishment of government backed institutes of sport, in part to meet the increased need to analyse larger volumes of data in order to gain or maintain a competitive edge. Similar initiatives are on the rise in many disciplines, including health domains [11], expertise and topologies from collaboration type tools [12] and innovation in programming methodologies [13]. These examples are useful points of reference for technological uptake, as they serve similar purposes in different applications. This paper presents a multidisciplinary approach grounded in feedback from wearables fused with various other data sources. The remainder of this document presents a basic architecture based framework along with a cloud based application, together with a sample implementation using commonly available tools and a high performance computation environment, specifically MATLAB (version 7.11.0.584 (R2010b), The MathWorks, MA, USA).

Approach and Implementation
Developing a framework for fusing wearables and other data sources requires an intimate knowledge of the needs of the application area; thus, listening to the demand pull, rather than pushing particular technologies towards a predetermined solution, was critical. We adopted a user centric needs analysis approach [14], in which the current and future needs of typical deployment scenarios were considered from existing research and consultancy relationships. From this, the partners' test sites for deployment and their data needs were considered (Table 1).

Homogeneous Data Structure for Inhomogeneous Datasets
One of the largest challenges in developing the data management system was the ability to store, index and modify multiple data streams without exhaustive development cycles for the adaptive processing of inhomogeneous data sources. Specifically, inhomogeneous data are sourced from different technologies and require processing to provide evidence of a specific outcome, in this case typically kinematic measures. This required additional development time to produce a data structure that was abstract enough to accommodate various sensor types and data formats, yet specific enough to ensure appropriate indexing. Previous work was undertaken on the development of the ADAT Toolbox (Version 1.0, SABEL Laboratories, Brisbane, Australia) [19], a toolbox designed for the storage and processing of time series inertial data of elite athletes for post-process cognitive feedback. As part of the Toolbox, a MATLAB data structure was developed to hold the sensory information and the associated data source metadata for a single sensory input. A limitation of this structure, in the context of distributed collections of inhomogeneous data sources, was that it could only retain context for a single time series data source, for example, a single inertial sensor. Because the primary objective was to provide a unified platform for the distributed collection of varying datasets, modifications were required to accommodate the disparity between the various collection methods.
Redeveloping and extending the existing structure for a distributed implementation presented additional challenges. In particular, the distributed implementation generated a unique requirement: a generalised interface had to be constructed between multiple different platforms through common data source interfaces.
The Athdata structure was used as the central communication tool between the logical layers of data collection, storage, processing and feedback to users for the intended purposes, e.g., motor learning applications, whether near real-time or for post-processing intervention. This was initially determined by a detailed user requirement analysis for the sport of swimming [20]. The functional blocks consisted of data sources, data transport, staging/transformation, storage, analysis and retrieval. Existing development on the Athdata structure used the MATLAB data science environment as the primary platform of implementation. However, this limited the development of distributed collection and management of inhomogeneous data sources, particularly because of MATLAB specific data structures. The primary focus of the redevelopment and extension of the structure was therefore to use primitive, generic data types, so that various implementation methods could accommodate collections of disjointed data; this was modelled in the Unified Modelling Language (UML). The UML representation of the improved structure used for the implementation is shown in Figure 1. This was extended to a cloud based system. The major challenge in developing the structure was to provide sessional context for a data source in relation to varying inhomogeneous data sources. From an implementation perspective, a major challenge arose from the requirement for the existing MATLAB structure to be implemented in standard data types, for example, to allow implementation in an SQL database backend. Thus, the updated structure was intended to be provided as a specification of implementation, rather than as an implementation method per se. On balance, the use of MATLAB was considered advantageous, despite it being a proprietary technology, as it was a common tool across the deployment sites.
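Because the revised structure is meant to be expressible in standard data types rather than MATLAB-specific ones, one way to picture this is a relational mapping. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names (session, dataset, sample, placement and so on) are hypothetical and are not taken from the Athdata specification.

```python
import sqlite3
from typing import List

# Hypothetical relational mapping of a session-centred sensor structure.
# Table and column names are illustrative only, not the published Athdata schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS session (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    sport TEXT
);
CREATE TABLE IF NOT EXISTS dataset (
    id          INTEGER PRIMARY KEY,
    session_id  INTEGER NOT NULL REFERENCES session(id),
    source_type TEXT NOT NULL,  -- e.g. 'inertial', 'video', 'force_plate'
    placement   TEXT,           -- spatial context on the body, e.g. 'sacrum'
    sample_rate REAL            -- Hz, NULL for sources without a fixed rate
);
CREATE TABLE IF NOT EXISTS sample (
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    t          REAL NOT NULL,   -- seconds from the start of the session
    channel    TEXT NOT NULL,
    value      REAL NOT NULL
);
"""

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a store whose tables use only standard SQL column types."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def datasets_at(conn: sqlite3.Connection, session_id: int, placement: str) -> List[int]:
    """Ids of the datasets in one session recorded at a given body placement."""
    rows = conn.execute(
        "SELECT id FROM dataset WHERE session_id = ? AND placement = ? ORDER BY id",
        (session_id, placement),
    )
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = open_store()
    conn.execute("INSERT INTO session (id, name, sport) VALUES (1, 'trial', 'swimming')")
    conn.executemany(
        "INSERT INTO dataset (session_id, source_type, placement, sample_rate) VALUES (?, ?, ?, ?)",
        [(1, "inertial", "sacrum", 100.0), (1, "inertial", "wrist", 100.0), (1, "video", None, None)],
    )
    print(datasets_at(conn, 1, "sacrum"))  # [1]
```

Keeping sensor placement as a plain column is what lets session-level context be queried from any SQL backend, independent of the analysis environment.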
The majority of the existing structure remains in the updated implementation, with key elements added to satisfy the requirement for data source context in relation to distributed inhomogeneous data sources. Within the structure, session information provides indexing of session context both for the specific sensor and for the whole session of a collection of sensors. For example, the context of a sensor includes the definition of its spatial placement in relation to the test subject, such as its placement on the human body. Modifications were made to the "sensor" structure to ensure that the updated structure was inclusive of varying data source collection methods, moving away from device specific structures. The main modification to the athlete data (Athdata) structure was the inclusion of two additional structures, referred to as "Metadata" and "Segments" (Figure 1). Both were developed to provide indexing interfaces to inhomogeneous data sources, allowing searchable references to metadata tags to be associated with a whole session, a dataset and specific regions of time series data.
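To make the roles of sensor context, Metadata and Segments concrete, here is a minimal Python sketch using dataclasses. It assumes that a segment is a tagged half-open range of sample indices and that channels are plain lists of floats; all class and field names are hypothetical, and the actual Athdata fields are those depicted in Figure 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical Python rendering of a session that holds sensors, each with
# Metadata tags and tagged Segments. Field names are illustrative only; the
# actual Athdata fields are those given in the UML of Figure 1.

@dataclass
class Segment:
    """A tagged region of a time series as sample indices [start, stop)."""
    start: int
    stop: int
    tags: List[str] = field(default_factory=list)

@dataclass
class Sensor:
    source_type: str                      # e.g. 'inertial'
    placement: str                        # spatial context, e.g. 'sacrum'
    sample_rate_hz: float
    channels: Dict[str, List[float]] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)
    segments: List[Segment] = field(default_factory=list)

    def segment_data(self, channel: str, tag: str) -> List[List[float]]:
        """Slices of one channel covered by the segments carrying `tag`."""
        data = self.channels[channel]
        return [data[s.start:s.stop] for s in self.segments if tag in s.tags]

@dataclass
class Session:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    sensors: List[Sensor] = field(default_factory=list)

    def sensors_at(self, placement: str) -> List[Sensor]:
        """Session-level context: every sensor placed at one body location."""
        return [s for s in self.sensors if s.placement == placement]

if __name__ == "__main__":
    sacrum = Sensor("inertial", "sacrum", 100.0,
                    channels={"gyr_x": [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]})
    sacrum.segments.append(Segment(1, 4, tags=["lap1"]))
    session = Session("freestyle trial", {"sport": "swimming"}, [sacrum])
    print(session.sensors_at("sacrum")[0].segment_data("gyr_x", "lap1"))  # [[0.5, 1.0, 0.5]]
```

Storing segments as index ranges, rather than as copies of the data, keeps a single source of truth for each time series while still allowing any region to be tagged and retrieved.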

Framework for the Aggregation and Visualisation of Inhomogeneous Data Sources
Initial requirements for the system revolved around the primary objective of implementing a common interface for the storage, collection and modification of inhomogeneous data sources. From these requirements, several phases of the system were developed as a high level abstract framework, represented visually in Figure 2. The framework consisted of the following high level data components: sources, transport, staging and transformation, data storage, analysis, and retrieval and visualisation. The framework was developed using open source tools, with platform independence and scalability as important factors. Although the system was initially implemented on a single workstation, a browser based interface was seen as desirable because of its platform independence and its scalability to multi-workstation and cloud based functionality [21]. Using open source tools was considered to provide resources that improve system development while reducing development time and dependency on proprietary systems. JavaScript, and in particular Node.JS [22], was selected for its available packages and wide community, and because it can be used on both the client and the server side. It is reasonably portable and offers favourable development times. Whilst legacy code exists within MATLAB, implementation in the Python scripting language was seen as favourable because of its comparable processing capability, its popularity with the data science community in similar applications [23] and its backward compatibility with the data structures.
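The staging and transformation component is where inhomogeneous sources are conformed to a single structure. A minimal sketch, assuming each source format can be identified by a simple key and converted by one small function, is a registry of converters; the formats, the uniform record layout and all function names below are hypothetical and only illustrate the pattern.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Uniform record (hypothetical): a sample rate plus channel name -> samples.
Uniform = Dict[str, object]

# Registry from a source-format key to its converter (formats are illustrative).
PARSERS: Dict[str, Callable[[str], Uniform]] = {}

def parser(fmt: str):
    """Decorator that registers a converter for one source format."""
    def register(fn: Callable[[str], Uniform]) -> Callable[[str], Uniform]:
        PARSERS[fmt] = fn
        return fn
    return register

@parser("csv")
def from_csv(text: str) -> Uniform:
    # Assumes a header row of channel names and a fixed 100 Hz rate.
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    names: List[str] = list(reader.fieldnames or [])
    return {"sample_rate_hz": 100.0,
            "channels": {n: [float(r[n]) for r in rows] for n in names}}

@parser("json")
def from_json(text: str) -> Uniform:
    # Assumes {"rate": <Hz>, "data": {<channel>: [<samples>]}}.
    doc = json.loads(text)
    return {"sample_rate_hz": float(doc["rate"]),
            "channels": {k: [float(v) for v in vals] for k, vals in doc["data"].items()}}

def transform(fmt: str, text: str) -> Uniform:
    """Conform one raw upload to the uniform record, rejecting unknown formats."""
    if fmt not in PARSERS:
        raise ValueError(f"unsupported source format: {fmt}")
    return PARSERS[fmt](text)
```

With this pattern, supporting a new device means registering one more converter, while the storage and analysis layers continue to see the same uniform record.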

System Framework
The developed system was modelled closely on a service-orientated framework, in which each module is developed as a micro-service application that plays a specific, dedicated role within the cloud system stack. The system was abstracted into four primary modules, named the Data Process Queue, Data Management Frontend, Data Management Backend and Web Proxy (Figure 3).
The Data Management Frontend micro-service handles all HTTP traffic directed to it by the Web Proxy through a public SSL (Secure Sockets Layer) pipeline, rendering the static frontend HTML application through which the user interacts with the system. The Data Management Backend was implemented to store all inhomogeneous data sources conformed to the revised data structure, and it also provides an Application Programming Interface (API) to the stored and indexed data sources. Finally, the Data Process Queue was implemented to handle the aggregation of varying inhomogeneous data sources, conforming them to a single uniform Athdata structure to be stored within the backend. The Process Queue also provides visualisation functionality with real-time rendering of time series datasets, allowing operators to collaborate in processing distributed data sources.
Because of the time-dependent nature of feedback, the efficient handling and flow of time series data was critical. In all applications, data transport involved moving data from specific data sources, either through operators manually uploading files or through automatic transport from wearable devices using modern internet based protocols, e.g., IPv6. Transformation is the process of taking various distributed inhomogeneous data sources and compiling them into a uniform data structure, which allows both autonomous and operator based metadata indexing for future processing. Storage provides an interface for the long-term storage and retrieval of indexed data sources, including file storage of the archived original data sources and of the exported transformed data structures. Analysis provides interfaces for processing the transformed and indexed data sources. This allows cached, indexed homogeneous data structures to be retrieved through exported data sources, and supports system visualisation and aggregation through integrated layers (Figure 2).

User Interface and User Workflow
Ensuring the success of the platform requires close attention to detail in the user interface, enabling a seamless collaborative workflow and rapid, easy user adoption. The primary aim in developing the user interface was to provide a simple, easy to use interface with minimal start-up time for new users. Figure 4 illustrates the user workflow for the typical use case of generating new sessions and migrating existing session information. With the seamless integration of session creation and migration of existing data sources, rapid user adoption can be achieved.
The primary objective was to provide a collaborative and highly socialised platform for continuous, distributed and seamless integration in research and high performance sport environments with multiple members. Development was operator centric, giving multiple operators with varying access levels the ability to collaborate on, process, index and export various supported datasets. Providing functionality for indexing and searching raw datasets, as well as for collaborative data processing, builds a repository of existing work.
Once an operator has been granted access to the system, either through self-registration or registration via a lab manager, the operator is presented with a list of sessions. A session encapsulates an experiment, allowing an operator to register various data sources as datasets for a created session (Figure 5a). An operator can also register additional metadata for both the session information and specific datasets. The operator can associate any supported data source as a dataset, which provides a basis for further data processing and collaboration with other operators. The operator with administrative privileges for a dataset can search for other operators and assign them various access levels for a defined session.
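The text states that operators hold various access levels per session but does not name them. A minimal sketch, assuming three ordered levels (the names VIEW, EDIT and ADMIN, and the class SessionAcl, are hypothetical), shows how a per-session permission check and administrator-only assignment could work.

```python
from enum import IntEnum
from typing import Dict, Tuple

class Access(IntEnum):
    """Hypothetical ordered access levels: each level includes those below it."""
    VIEW = 1
    EDIT = 2
    ADMIN = 3

class SessionAcl:
    """Per-session access control list (illustrative only)."""

    def __init__(self) -> None:
        self._levels: Dict[Tuple[str, str], Access] = {}

    def create(self, session_id: str, creator: str) -> None:
        # Assumption: the operator who creates a session administers it.
        self._levels[(session_id, creator)] = Access.ADMIN

    def level(self, session_id: str, operator: str) -> int:
        # 0 means the operator has no access to this session.
        return int(self._levels.get((session_id, operator), 0))

    def allows(self, session_id: str, operator: str, needed: Access) -> bool:
        return self.level(session_id, operator) >= needed

    def grant(self, session_id: str, granter: str, operator: str, level: Access) -> None:
        # Only an administrator of the session may assign access levels.
        if not self.allows(session_id, granter, Access.ADMIN):
            raise PermissionError(f"{granter} cannot assign access on {session_id}")
        self._levels[(session_id, operator)] = level
```

Ordering the levels means every permission check reduces to a single comparison.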
Key features of the platform were the ability to index and search inhomogeneous data structures from various data sources for a defined captured session. The primary method of indexing is assigning tags to various components of a session. Operators can assign tags to sessions, to datasets and to indexed regions of time series data, referred to as events. Searchable indexes into specific areas of time series data allow operators to store information for later processing, enabling collaborative data analysis by multiple operators (Figure 5b). Any operator assigned to a session can export datasets in various formats, including archived versions of the original uploaded file and homogeneously structured data conforming to the previously discussed structure in a variety of languages (Figure 5c), in addition to universal standardised formats such as JavaScript Object Notation (JSON) for currently unsupported languages.
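Tags at the three levels described above (session, dataset and event) can be searched through one inverted index. The sketch below assumes that each tagged item is identified by a (level, identifier) pair and that an event stores start and stop times in seconds; these conventions and the class name TagIndex are hypothetical. It also shows that a JSON export can be produced with Python's standard json module.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Item = Tuple[str, str]  # (level, identifier), level in {'session', 'dataset', 'event'}

class TagIndex:
    """Inverted index from tag to tagged items (illustrative only)."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[Item]] = defaultdict(set)
        self.events: Dict[str, Dict[str, object]] = {}

    def tag(self, level: str, ident: str, *tags: str) -> None:
        for t in tags:
            self._index[t.lower()].add((level, ident))

    def add_event(self, ident: str, dataset: str, start_s: float, stop_s: float, *tags: str) -> None:
        # An event is a tagged region of one dataset's time series.
        self.events[ident] = {"dataset": dataset, "start_s": start_s, "stop_s": stop_s}
        self.tag("event", ident, *tags)

    def search(self, *tags: str) -> List[Item]:
        """Items carrying all the given tags (case-insensitive), sorted."""
        sets = [self._index.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*sets)) if sets else []

    def export_events(self) -> str:
        """Events as a JSON string, for languages without a native exporter."""
        return json.dumps(self.events, sort_keys=True)
```

Because all three levels share one index, a single query can return a whole session, a dataset, or just the event regions within it.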
A key requirement of the system is the migration of users' existing datasets into the new collaborative cloud based system. A major challenge in designing and implementing the import functionality was encouraging users to fill out the required information conveniently and efficiently. This was resolved by segmenting the required information through an "import wizard", allowing users to stage multiple datasets efficiently through an iterative process.
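The iterative staging behind the import wizard can be sketched as follows, assuming that values entered once (such as the target session) are reused for every staged file and that the wizard then asks only for the fields each file still lacks. The required field names and the ImportWizard class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical fields that every staged dataset must have before commit.
REQUIRED = ("session", "source_type", "placement")

@dataclass
class StagedDataset:
    filename: str
    fields: Dict[str, str] = field(default_factory=dict)

    def missing(self) -> List[str]:
        return [k for k in REQUIRED if not self.fields.get(k)]

class ImportWizard:
    """Stages several uploads, then asks only for what each one still lacks."""

    def __init__(self, defaults: Dict[str, str]) -> None:
        # Values entered once (e.g. the target session) are reused for every file.
        self.defaults = dict(defaults)
        self.staged: List[StagedDataset] = []

    def stage(self, filename: str, **fields: str) -> StagedDataset:
        item = StagedDataset(filename, {**self.defaults, **fields})
        self.staged.append(item)
        return item

    def next_step(self) -> Optional[Tuple[str, List[str]]]:
        """(filename, missing fields) of the first incomplete item, or None."""
        for item in self.staged:
            gaps = item.missing()
            if gaps:
                return item.filename, gaps
        return None

    def commit(self) -> List[StagedDataset]:
        if self.next_step() is not None:
            raise ValueError("some staged datasets are still incomplete")
        done, self.staged = self.staged, []
        return done
```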

System Implementation
Following the service orientated framework, the implementation of the cloud based system was isolated into three primary components: the frontend application server for a web based application; the backend API server, which provided interfaces to processed data; and the Python data processing server, which provided interfaces for batch processing of inhomogeneous time series data sources. The development of the cloud application stack presented unique challenges for the imp.
The frontend of the system handles all browser based user interaction and was implemented using open source tools, including the Node.JS Express web framework; it presents a single page application to the user. The backend provides data management and storage functionality. It is implemented using Node.JS and a Restify API linked to a relational database. The backend also provides user security through token authentication. The data processing server handles the staging, transformation, analysis and visualisation processing of the data. It was implemented using Python and the Flask web framework. Data processing is handled by the SciPy data science module and managed by a Celery task queue.
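The text states only that the backend secures users through token authentication, without specifying the token format. Purely to illustrate the idea, and not as the platform's Node.JS implementation, the following Python sketch uses only the standard library to issue and verify HMAC-signed tokens that carry an operator name and an expiry time. The token layout, claim names and function names are assumptions; a deployed system would more typically rely on an established format such as JSON Web Tokens through a maintained library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

def _unb64(text: str) -> bytes:
    # Restore the '=' padding stripped by _b64.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def issue_token(secret: bytes, operator: str, ttl_s: int = 3600,
                now: Optional[float] = None) -> str:
    """Create '<payload>.<signature>', the payload naming the operator and expiry."""
    now = time.time() if now is None else now
    payload = _b64(json.dumps({"sub": operator, "exp": int(now) + ttl_s}).encode())
    sig = _b64(hmac.new(secret, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{sig}"

def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the operator name if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".")
    except ValueError:
        return None
    expected = _b64(hmac.new(secret, payload.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(_unb64(payload))
    return claims["sub"] if claims["exp"] > now else None
```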

Sample Application
A key advantage of the framework is the ability of a number of researchers to access shared data captured from multiple data sources for subsequent analyses. In this sample application, the test data set came from six inertial sensors per participant (n = 12), with each participant completing six sessions accompanied by video data segments. Specific details of the study's purpose, including researcher and participant obligations, were given, and, after agreeing to participate, volunteers signed the informed consent form approved by the associated institute's ethics committee (ENG/05/10/HREC). Each session involved a two-lap swim of a 25 m swimming pool.
This large amount of data would be difficult to distribute and search if it were stored as flat files. Using the developed framework, the captured data can be indexed and becomes accessible to any researcher with whom it is shared. The indexed metadata is also searchable, providing easy access to sessions and data relevant to a current area of interest. Analysis of data is typically performed using standalone tools, either commercially available or written in-house. One such tool is the SABEL data analysis tool [23], which was applied to one of the swimmers in the data capture set (Figure 6). This tool enables the time series inertial sensor data to be synchronised to video data, thereby giving the data context. The data can then be analysed to find key patterns or events for use in specific algorithm development.
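Synchronising inertial data to video amounts to mapping between two clocks. A minimal sketch, assuming a constant sensor sample rate, a constant video frame rate and a single known offset between the recordings (for example, from a synchronisation event visible in both), is shown below. The function names and the rates used in the tests are hypothetical and are not those of the study or of the SABEL tool.

```python
def frame_to_sample(frame: int, fps: float, sample_rate_hz: float, offset_s: float) -> int:
    """Nearest sensor sample index for a video frame.

    offset_s is the video timestamp (s) at which sensor sample 0 was recorded.
    A negative result means the frame precedes the start of the sensor data.
    """
    t_video = frame / fps
    return round((t_video - offset_s) * sample_rate_hz)

def sample_to_frame(sample: int, fps: float, sample_rate_hz: float, offset_s: float) -> int:
    """Nearest video frame for a sensor sample index (inverse of frame_to_sample)."""
    t_video = sample / sample_rate_hz + offset_s
    return round(t_video * fps)
```

For instance, with 50 fps video, a 100 Hz sensor and a 0.4 s offset, frame 70 occurs at 1.4 s of video, i.e. 1.0 s into the sensor recording, which is sample 100.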
To illustrate the concept, data from one of the swimmers in the data capture was used to visualise body roll when swimming a single lap of freestyle. Body roll is defined as the clockwise and anti-clockwise roll of the body whilst the arms perform the strokes. The aim for the athlete is to make the strokes as consistent as possible to minimise extraneous effort (e.g., from profile drag); this, in turn, means that the body roll will also be consistent. The sensor unit was located on the sacrum of the swimmer, and the gyroscope channel aligned to the direction of the spine was examined. The data were initially examined using the tool displayed in Figure 6. The state-space diagram with a delay of three was generated using the method detailed in Rowlands et al. [24] (Figure 7). The maximum and minimum roll velocities of the body during the left and right arm stroke actions are given at the extents of the diagram. By examining the contours at the extents of the diagram, it can be seen that the action is asymmetric, with variability that changes over time. The coach can then use this information to provide feedback to the swimmer.
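A state-space diagram of this kind plots the signal against a delayed copy of itself. The exact procedure of Rowlands et al. [24] is not reproduced here; the sketch below is a generic time-delay embedding, assuming that the delay of three is counted in samples and that the gyroscope channel is available as a list of roll velocities. The synthetic signal and its sample rate are hypothetical and only make the extents unequal, mimicking an asymmetric stroke.

```python
import math
from typing import List, Sequence, Tuple

def delay_embed(x: Sequence[float], delay: int = 3) -> List[Tuple[float, float]]:
    """Time-delay embedding: pairs (x[n], x[n + delay]) tracing a 2-D state space."""
    if delay < 1:
        raise ValueError("delay must be a positive number of samples")
    return [(x[n], x[n + delay]) for n in range(len(x) - delay)]

def extents(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Smallest and largest coordinate on the diagram (the peak roll velocities)."""
    values = [v for p in points for v in p]
    return min(values), max(values)

if __name__ == "__main__":
    fs = 100.0  # hypothetical gyroscope sample rate in Hz
    # Synthetic roll velocity, with one roll direction 20 % larger than the other.
    roll = [math.sin(2 * math.pi * 0.5 * n / fs) for n in range(400)]
    roll = [1.2 * v if v > 0 else v for v in roll]
    trajectory = delay_embed(roll, delay=3)
    print(len(trajectory), extents(trajectory))
```

Plotting the trajectory stroke by stroke would then show how the contours at the extents vary over time.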

Conclusions
Development of the application framework was targeted specifically at time series sport data, to limit project scope and to focus the available resources on sports biomechanical feedback applications. Most of the application design focused on enabling the inclusion of varying system architectures and physical system resources within the framework. Allowing hardware allocation to expand continually, from a single multicore machine to an array of high performance servers, provides sufficient flexibility for scaling and tailored customisation.
The cloud based inhomogeneous time series framework has been designed to promote the adoption of varying systems for distributed data collection in a collaborative yet decentralised manner, moving away from sole reliance on a proprietary platform towards a larger, open, standardised platform within the data science community. Access to the stored data would enable the generation of algorithms based upon the activity performed. The results from the algorithms could then be presented to the appropriate stakeholders in the most meaningful form for them, e.g., customised visualisation. These stakeholders would initially be research based. However, the tool is not intended to remain solely research focused; the ultimate aim is broader application design for use in the public realm, with developments for general users.
Continued development of the open standardised platform will require seamless aggregation of data from wearable sensors through modern network protocols, moving from dedicated desktop applications to cloud based data management and analysis tools and allowing wearable sensors to push data to the cloud platform for real-time biofeedback analysis of athletes. Future development would also expand support for additional data sources to meet future trends in the adoption of wearables in sports science, thus providing a greater ecosystem for traditionally unsupported hardware, data formats and emergent applications.

Figure 1. Unified Modelling Language (UML) top-down depiction of the revision for athlete homogeneous wearable sensor data structure.

Figure 2. Visualisation of a typical user workflow and application for aggregation of inhomogeneous datasets.

Algorithms 2017, 10, 23

Figure 3. Visualisation of a typical user workflow and application for aggregation of inhomogeneous datasets in the context of system stack interaction.

Figure 4. Illustration of a typical user workflow and application for aggregation of inhomogeneous datasets.

Figure 5. (a) Operator uploading several data sources for a newly created session; (b) operator session with six data sources shared with multiple operators; and (c) detailed dataset information with time-series dataset rendering.

Figure 6. SABEL data analysis with synchronised inertial sensor data and video.

Figure 7. State-Space Based Visualisation technique demonstrating the consistency of the strokes using the body roll.