Round Robin Testing: Exploring Experimental Uncertainties through a Multifacility Comparison of a Hinged Raft Wave Energy Converter

Abstract: The EU H2020 MaRINET2 project has a goal to improve the quality, robustness and accuracy of physical modelling and associated testing practices for the offshore renewable energy sector. To support this aim, a round robin scale physical modelling test programme was conducted to deploy a common wave energy converter at four wave basins operated by MaRINET2 partners. Test campaigns were conducted at each facility to a common specification and test matrix, providing the unique opportunity for intercomparison between facilities and working practices. A nonproprietary hinged raft, with a nominal scale of 1:25, was tested under a set of 12 irregular sea states. This allowed for an assessment of power output, hinge angles, mooring loads, and six-degree-of-freedom motions. The key outcome to be concluded from the results is that the facilities performed consistently, with the majority of variation linked to differences in sea state calibration. A variation of 5–10% in mean power was typical and was consistent with the variability observed in the measured significant wave heights. The tank depth (which varied from 2–5 m) showed remarkably little influence on the results, although it is noted that these tests used an aerial mooring system with the geometry unaffected by the tank depth. Similar good agreement was seen in the heave, surge, pitch and hinge angle responses. In order to maintain and improve the consistency across laboratories, we make recommendations on characterising and calibrating the tank environment and stress the importance of the device–facility physical interface (the aerial mooring in this case).


Introduction
Physical model testing in wave basins is a key part of the research and development of wave energy converters (WECs) and other marine technologies. These facilities provide modelling services at a variety of scales and vary considerably in terms of configuration and physical size. Given the importance of the test outcomes to developers and other stakeholders, it is essential to understand the validity and comparability of the results obtained across different laboratories. The EU H2020 MaRINET2 project aims to improve the quality, robustness and accuracy of physical modelling and test practices implemented by test infrastructures. As part of this process, a number of "round robin" testing programmes were conducted where nonproprietary wave, floating wind and tidal devices were tested in different infrastructures to assess the influence the facility and local test practices had on the experimental results. This project's predecessor, the EU FP7 MaRINET project, conducted a tidal round robin programme [1], but attempts to gain meaningful wave data were unsuccessful. This paper describes the outcome of tests with a hinged raft WEC under irregular (pseudo-random) sea states and builds on previously published regular wave analysis [2]. This previous work highlighted the importance of agreeing on a common methodology for a test campaign, in particular with regard to the calibration and measurement of the generated waves and the analysis window that is selected. Despite some differences across facilities, the trends were the same across all facilities.
The round robin campaign was conducted over several months at four different European laboratories: Centrale Nantes (ECN-France); The University of Plymouth (UoP-UK); University College Cork (UCC-Ireland); and The University of Edinburgh (UoE-UK). IHCantabria (IHC-Spain) acted as the witness to the tests.
Errors in experimental testing commonly result from precision and bias errors [3] and can be affected by both modelling and testing uncertainty [4]. Bias errors may be introduced by scale effects, model inaccuracies, testing errors (e.g., setup and/or calibration), environmental modelling inaccuracy (e.g., wave spectral shape) and tank effects (e.g., reflections). The variability across facilities has not, to the authors' knowledge, been successfully studied in the field of wave energy converters. If the broader sector is examined, Reference [4] provided a useful practical examination of the sources and magnitudes of uncertainty for a floating wind turbine structure. It is notable that this work approached the problem from a more applied perspective, as might be applied by practitioners in the field. Concentrating on wave energy specifically, the International Towing Tank Conference (ITTC) produced a quality system manual on the subject of uncertainty analysis for wave energy converters [5]. This provides comprehensive tools for applying an analysis for a specific experiment, but does not address the likely magnitude of errors, nor their variability across facilities. Similarly, much of the guidance for conducting tank testing in the wave energy field (e.g., [6]) provides highly useful guidance on the sources of error and uncertainty in a test programme, but little in the way of understanding the specific influence of a facility. This is not surprising, as the data to compare an identical model across facilities are either nonexistent or commercially sensitive and not in the public domain. Studies on nonproprietary technologies, such as the experimental study described in [7], tend to explore the uncertainty when using experimental data to validate a numerical model or extrapolate to full scale. What generally has not been captured in previous works is how facility variability manifests in the model outputs of most interest to a wave energy developer. 
A round robin programme across multiple facilities provides this opportunity, while also providing a dataset that may be interrogated in detail by future projects to further explore the sources of this uncertainty.
The previous attempt at a wave round robin programme followed an approach more typical of a commercial procurement programme, with each laboratory building the model, executing the test and conducting the analysis to a common specification. In practice, this made a meaningful technical comparison difficult. In this test programme, the variables were reduced to a more manageable level by testing with the same model and conducting common analyses on the combined dataset from all four facilities.
In this particular programme, a hinged raft device was designed and built by ECN. Hinged rafts have been explored commercially by several developers and so present a relevant and representative case study. The device deployed in this programme had a relatively simple generic geometry, but incorporated a controllable power takeoff (PTO) and force instrumentation typical of a higher technology readiness level (TRL) test programme. A parallel round robin programme explored lower TRL testing using an oscillating water column, the uncertainty analysis of which was presented in [8]. Testing uncertainty for the raft model (unconnected to the specific infrastructure) was subject to a separate study for which multiple repeat tests were undertaken, and therefore, this element is not discussed here. This paper introduces the model WEC and experimental design followed by a description of key comparison parameters with results across all facilities. Finally, the results and implications for the physical testing of WECs are discussed.

Hinged Raft WEC Model Properties
A two-body hinged raft was selected to be the higher TRL concept for the MaRINET2 wave round robin tests. This device includes a PTO emulator that produces a controlled torque. Such advanced modelling of the PTO is usually implemented for concepts having reached a TRL of 3. At TRL 3, the model scale is generally between 1:30 and 1:10. For this study, a 1:25 scale factor was chosen to model this deep-water device. The depths of the wave tanks involved vary, and therefore, the corresponding full-scale configurations will have different depths. For most of the tests, the waves of interest were deep-water waves in the different tanks.
In order to provide some inputs for the design of the model and its PTO system, preliminary numerical simulations were performed using Centrale Nantes internal codes and InWave and OrcaFlex software, following the work performed in [9]. Several iterations of the calculations were performed to reach a design at model-scale with a natural pitch period of approximately 1.5 s and a maximum torque at the hinge of approximately 70 Nm. The raft main dimensions are given in Table 1. The front floater includes the control and monitoring system and the motor. The back floater contains a set of lead weights placed to balance the model. The centres of gravity of each floater were measured with a three-point load measurement system. The model was moored with four aerial lines connected to four mooring points on the front floater. The same mooring setup was reproduced in all facilities with anchoring points at the corners of a square with 11.8 m sides, centred on the middle of the front floater. Each mooring line was composed of a stiff rope made of polyethylene fibres and one calibrated spring with a 27 N/m stiffness.

Model Axes
A common axis convention was defined for all rectangular basins in order to provide the locations of the anchor points, the wave gauges and the model. This Galilean reference system is given as (O, x_0, y_0, z_0), where the x_0 axis is in the wave direction, the y_0 axis is perpendicular to the wave direction and the z_0 axis points vertically upward. As shown in Figure 1, the body-fixed reference systems are given as (O_f, x_f, y_f, z_f) for the front floater and (O_b, x_b, y_b, z_b) for the back floater. O_f and O_b are at the centres of gravity of the floaters. x_f and x_b are parallel to the deck plane, pointing forward and facing the waves. y_f and y_b are parallel to the deck plane. z_f and z_b are perpendicular to the deck plane and point upward. These axes were also used to define the rigid bodies for the six-degree-of-freedom motion capture in all facilities.

Power Takeoff
The PTO is one of the more challenging aspects of model design, and its complexity is one of the key differentiators between the conceptual stage testing and the more representative higher TRL testing, as outlined in the stage development approach defined by the IEC [10] and adopted by MaRINET2 [11]. The PTO supports higher TRL testing by being controllable as per a full-scale prototype and is monitored and controlled with a real-time embedded controller. In order to facilitate the processing of the data from the tests, it was decided that the resisting torque representing the PTO system should follow a simple control law. A simple linear damping was implemented, with the PTO torque proportional to the angular velocity of the relative pitch motion between the two bodies.
An electrical motor was connected to the hinge to generate a resisting torque to represent the power takeoff. The control law was a simple proportional law, given by Equation (1). All symbols and units are described in Table 2.
In order to have a system that works for low-speed and low-torque conditions, friction must be minimized, and therefore, the use of a gearbox was avoided. Furthermore, the motor must have a torque range up to 70 Nm with limited dimensions and weight. This led to the selection of a direct drive motor from Kollmorgen (Model C061B-13-3105), with a 75 Nm peak torque capacity.
The model was assembled at ECN, and the commissioning tests were carried out in the wave tank. First, the natural period of the relative pitch motion of the device without any PTO was located at 1.55 s with decay tests and a set of tests with regular waves of 50 mm height and periods from 1 s to 2.4 s. Then, the optimal damping coefficient was identified with tests using the same regular wave of 50 mm in height, a 1.55 s period and different values of K from 10 to 70 N·m·s·rad⁻¹. For each test, the average mechanical power was calculated as per Equation (4), and the maximum was found for a damping coefficient of 20 N·m·s·rad⁻¹. Finally, the PTO behaviour was checked with the signals recorded during the regular wave tests. The measured motor torque correctly followed its target, i.e., the product of the measured angular velocity and the applied damping coefficient.
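The linear damping law and the mean power calculation used in these commissioning tests can be sketched as follows. This is a minimal illustration, assuming the control law takes the form T = K·θ̇ (the sign convention is left open) and that mechanical power is the product of torque and angular velocity. The function names, the 5° hinge amplitude and the sinusoidal motion are hypothetical and are not taken from the test data.

```python
import math

def pto_torque(damping_k, hinge_velocity):
    """Linear damping: torque (N·m) proportional to hinge velocity (rad/s)."""
    return damping_k * hinge_velocity

def mean_power(torque, velocity):
    """Time-averaged mechanical power (W) from equally spaced samples."""
    return sum(t * w for t, w in zip(torque, velocity)) / len(torque)

# Hypothetical sinusoidal hinge motion at the 1.55 s natural period.
K = 20.0                     # N·m·s/rad, the optimum reported in the text
amp = math.radians(5.0)      # assumed hinge amplitude, for illustration only
omega = 2.0 * math.pi / 1.55
dt = 0.01                    # 100 Hz sampling
n = 1550                     # exactly ten periods of 1.55 s
velocity = [amp * omega * math.cos(omega * i * dt) for i in range(n)]
torque = [pto_torque(K, w) for w in velocity]
p_mean = mean_power(torque, velocity)
```

For a pure sinusoid the result should match the analytical value K·(A·ω)²/2, which gives a quick sanity check on any implementation of the averaging step.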

Measurement and Instrumentation
The measurement equipment can be divided into two types: (a) instruments directly connected to the model and used in all the facilities, and (b) additional measurement systems provided by the facility itself. The latter were captured by the individual data capturing systems, but in all cases, the synchronisation was based on an electrical pulse provided by the tank. Different measurement frequencies were used across the different data capturing systems and facilities, but all data were resampled to 100 Hz for the combined dataset and data analysis.
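The resampling step can be illustrated with a short sketch. The text does not state which resampling method was used, so linear interpolation is an assumption here, and the function name and the 128 Hz source rate are hypothetical. When downsampling real records, a low-pass filter would normally be applied first to avoid aliasing.

```python
def resample_linear(times, values, rate_hz=100.0):
    """Linearly interpolate (times, values) onto a uniform grid at rate_hz."""
    dt = 1.0 / rate_hz
    n_out = int((times[-1] - times[0]) * rate_hz + 1e-9) + 1
    out_t, out_v = [], []
    j = 0
    for i in range(n_out):
        t = times[0] + i * dt
        # Advance to the source interval that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        frac = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append(values[j] + frac * (values[j + 1] - values[j]))
    return out_t, out_v

# Hypothetical 128 Hz record of a linear ramp, 1 s long.
src_t = [i / 128.0 for i in range(129)]
src_v = [2.0 * t for t in src_t]
new_t, new_v = resample_linear(src_t, src_v)
```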

Model Instruments and Data Acquisition
In order to provide commonality and avoid calibration discrepancies across the infrastructures, the same National Instruments CompactRIO data acquisition system was used for all tests. A custom LabView programme, developed by ECN, provided the acquisition and control of the model. Power was inferred from hinge torque and hinge velocity, avoiding any losses through bearings and couplings to the motor. This approach is also consistent with current best practice and guidance (e.g., [10]). The torque transducer was a DRBK model from ETH Messtechnik with a 100 Nm range. The angular velocity was given by the encoder associated with the motor.
Four Applied Measurements DBBSMM 250 N axial load cells were mounted at the mooring attachment points on the raft and were also captured by the common data acquisition system. Some failures were encountered, and some, or all, of the load cells were substituted with the laboratories' own instruments for tests at UCC and UoE. In these cases, the loads were recorded on the laboratory's own data acquisition system and afterwards combined with the dataset.

Motion Capture
All the infrastructures were equipped with similar video motion capture (MoCAP) systems provided by Qualisys. These systems use multiple cameras to capture the position of reflective markers in three-dimensional space. Affixing multiple markers to each part of the model at known locations allows a rigid body to be defined, and based on this, the six-degree-of-freedom (DoF) body motion was calculated. This was performed for the front and rear floaters separately based on the local coordinate systems defined in Figure 1. Rotations around the x and z axes were identical for both floaters and can be used as a quality control of the definition, and the angle θ was calculated based on the difference of the rotation around the y axis. The global coordinate system was defined at the beginning of the experimental investigation, and a regular refinement calibration ensured that the high accuracy of the system (<1 mm) could be maintained for the complete testing campaign.

Wave Measurement
Each infrastructure used its own wave measurement system. The wave gauge (WG) locations are outlined below, but in all cases, a wave gauge was placed at the nominal model location to conduct an "open tank" characterisation of the sea states in the absence of the WEC.

Facilities
The tests of the model were conducted at four different facilities, namely Centrale Nantes (ECN), the University of Plymouth (UoP), University College Cork (UCC) and the University of Edinburgh (UoE). All facilities hosted the same model (Section 2.1) in late 2020, as shown in Figure 2. The testing facilities are all large-scale basins with comparable capabilities, but also unique features. Table 3 provides an overview of the three rectangular and one circular wave tank, and Figure 3 illustrates the configuration of the test in schematic form. The following section provides a brief description of each facility and any relevant tank-specific features for the model considerations.

Table 3. Features of the test facilities including the key length introduced in Figure 3. The distance x_WM is measured from the wave makers (WMs) to the centre of Raft 1.

Note to Table 3: WG2 was moved to the nominal model location for the "open tank" characterisation tests.

Centrale Nantes
The Hydrodynamic and Ocean Engineering Tank of Centrale Nantes is 50 m long, 30 m wide and 5 m deep. It is equipped with a wave maker composed of 48 independent paddles, generating waves up to 1 m in height from crests to troughs. Owing to its size and its generation capacities, it is currently the largest tank in France dedicated to hydrodynamic studies. For these raft tests, specific anchoring points for the mooring lines were installed on the bridges across the basin to respect the common mooring setup.

University of Plymouth
The test facility at UoP is the Ocean Basin located in the Coastal, Ocean and Sediment Transport (COAST) laboratory. The Ocean Basin is 35 m long by 15.5 m wide with a moveable floor that allows different operating depths of up to 3 m. The waves are generated by 24 individually controlled hinged-flap absorbing paddles. In order to minimise the reflected waves, a convex absorbing beach is used at the other end of the basin. The paddles produce regular waves with an approximate maximum height of 0.9 m at 0.4 Hz and a wave height above 0.2 m over the range 0.166–1 Hz. Wave synthesising software allows long- and short-crested spectral sea states to be generated, as well as special wave effects.
There are two moveable gantries on the basin that allow the model and wave gauges to be deployed in suitable locations. A gantry crane helps to lift the model during the installation process. The Qualisys motion capture system with 8 × Oqus 310+ cameras is used to capture the 6-DoF motion of the model. The cameras are designed to capture accurate MoCAP data with very low latency and work with both passive and active markers. The model was maintained in position by four mooring lines with anchoring points arranged in a square with an 11.8 m side length.

University College Cork
The test facility at UCC is known as the Lir National Ocean Test Facility (Lir NOTF) and is located in Ringaskiddy, Co. Cork. The round robin testing took place in the Deep Ocean Basin, a 35 m × 12 m rectangular basin with a 12 m × 12 m movable floor plate that allows the water depth to be varied between 0 and 3 m. The basin is equipped with 16 force feedback hinged paddles on one of the short sides with a metal beach at the opposite end. The mooring lines securing the hinged raft were connected to a metal frame fixed to the side walls of the basin to achieve the specified 11.8 m between the anchoring points. Additional wave probes were installed for the tests at UCC to facilitate a reflection analysis.

University of Edinburgh
The FloWave Ocean Energy Research Facility [12,13], in the School of Engineering at the University of Edinburgh, is the only circular wave tank among the compared facilities. One hundred and sixty-eight wave makers are arranged in a circle with a diameter of 25 m and both generate and absorb the waves. This allows waves to be generated from any direction, as well as allowing complex multidirectional sea states to be generated. The upper part of the wave tank, which forms the test area, has a constant water depth of 2 m. A lower water volume acts as a recirculation chamber for current generation. Twenty-eight flow drives are arranged in a circle and can introduce current in the main testing area around the centre of the tank. Figure 3 illustrates that the mooring footprint can be fully placed inside the circular tank. Two poles were attached to the movable gantry to create the two mooring attachment points downwave of the model. Two temporary towers were installed forward of the model to ensure a stable fixture of the mooring lines. Wave Gauges (WGs) 1 and 2 had to be placed on long outriggers to reach the required locations, and WGs 3 and 4 were mounted on separate towers, each with a top outrigger. Additional wave probes were installed for the tests at UoE to facilitate a reflection analysis.

Sea States
The model was tested over a range of irregular and regular waves. For irregular waves, three different target significant wave heights (H_s) were generated (0.05 m, 0.1 m and 0.15 m) for a set of four peak wave periods (T_p) (1.3 s, 1.55 s, 1.8 s and 2.05 s). These were generated with JONSWAP spectra with a gamma value of 3.3.
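The irregular test matrix and a target spectrum can be sketched as below. This is an illustrative construction only: the peak-width parameters (0.07 and 0.09) are the conventional JONSWAP values and are assumed rather than taken from the text, the spectrum is rescaled numerically so that 4·√m0 equals the target H_s, and the function names and frequency grid are hypothetical. Each facility used its own wave synthesis software.

```python
import math

def trapezoid(xs, ys):
    """Trapezoidal integral of ys over xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

def jonswap(freqs, hs, tp, gamma=3.3):
    """JONSWAP spectral density (m^2/Hz), rescaled so 4*sqrt(m0) equals hs."""
    fp = 1.0 / tp
    shape = []
    for f in freqs:
        sigma = 0.07 if f <= fp else 0.09  # conventional widths (assumed)
        enhancement = gamma ** math.exp(-((f / fp - 1.0) ** 2) / (2.0 * sigma ** 2))
        shape.append(f ** -5 * math.exp(-1.25 * (f / fp) ** -4) * enhancement)
    scale = (hs / 4.0) ** 2 / trapezoid(freqs, shape)
    return [scale * s for s in shape]

# 3 significant wave heights x 4 peak periods = 12 irregular sea states.
test_matrix = [(hs, tp) for hs in (0.05, 0.10, 0.15)
               for tp in (1.3, 1.55, 1.8, 2.05)]

freqs = [0.2 + 0.005 * i for i in range(561)]  # 0.2 Hz to 3.0 Hz
spectrum = jonswap(freqs, hs=0.10, tp=1.55)
```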

Key Parameters
In order to provide a consistent comparison methodology, the data were packaged into a common format and each analysis was conducted by a single project partner. In order to represent the outputs typical of an appraisal of a WEC technology, the following areas were explored:
• Environmental conditions (i.e., sea state calibration);
• Device motions (of fore/aft rafts and the hinge angle);
• Mooring loads;
• Power output.
In the case of motion measurements, the device behaviour was characterised in terms of a spectral response amplitude operator (RAO), as defined in Equation (2). For translation motions (heave and surge in this case), this provides a dimensionless characterisation (e.g., heave (m)/wave amplitude (m)). In the case of the rotational degrees of freedom (pitch and hinge angles), the value is expressed as the angle per unit of wave amplitude (deg/m). The rotational DoFs (y-axis) were converted to their full-scale equivalent to provide more intuitive values. The wave period (x axis) remained at tank scale to aid comparison with the translational RAO plots.
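A spectral RAO of this kind is commonly estimated as the square root of the ratio of the response spectrum to the wave spectrum. The sketch below assumes that form, which may differ in detail from Equation (2), and assumes both spectra have already been estimated on the same frequency bins. The energy threshold, the function names and the example values are hypothetical. The full-scale conversion relies on Froude scaling, under which angles are unchanged and lengths scale by 25, so a value in deg/m is divided by the scale factor.

```python
import math

def spectral_rao(response_spectrum, wave_spectrum, min_energy_fraction=0.01):
    """RAO per bin as sqrt(S_response / S_wave); None where wave energy is negligible."""
    threshold = min_energy_fraction * max(wave_spectrum)
    rao = []
    for s_resp, s_wave in zip(response_spectrum, wave_spectrum):
        rao.append(math.sqrt(s_resp / s_wave) if s_wave > threshold else None)
    return rao

def to_full_scale_rotation(rao_deg_per_m, scale=25.0):
    """Convert a rotational RAO (deg/m) from model scale to full scale."""
    return [None if r is None else r / scale for r in rao_deg_per_m]

# Hypothetical spectra on four shared frequency bins.
wave = [0.0, 1.0e-4, 4.0e-4, 1.0e-4]      # m^2/Hz
heave = [0.0, 0.25e-4, 4.0e-4, 2.25e-4]   # m^2/Hz
heave_rao = spectral_rao(heave, wave)
```

Masking bins with very little wave energy avoids the large, noisy ratios that otherwise appear at the edges of the spectrum.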
In addition to the mean, median and maximum values, some parameters were also characterised by the significant value, an average of the largest third of values (analogous to H_1/3 as a measure of significant wave height).
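The significant value can be computed in a few lines. Whether the input is a set of peak amplitudes, absolute values or another derived series is not specified here, so the choice of input is left to the caller; the function name is hypothetical.

```python
def significant_value(values):
    """Mean of the largest third of the supplied values (at least one value)."""
    ordered = sorted(values, reverse=True)
    count = max(1, len(ordered) // 3)
    return sum(ordered[:count]) / count

# Nine hypothetical peak amplitudes: the largest third is 7, 8 and 9.
example = significant_value([1, 2, 3, 4, 5, 6, 7, 8, 9])
```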
The following sections primarily concentrate on comparisons at the middle wave height of H_s = 0.1 m for reasons of clarity and space. Fuller results are provided in tabular form, or highlighted if behaviours at other wave heights are significantly different.

Sea States
The sea states were calibrated based on the wave gauge (WG2) located at the nominal location of the model (Figure 3). The model was not present for these tests (i.e., open tank configuration). The measured spectra for the nominal H_s seas are presented in Figure 4. The wave heights (significant and maximum) and measured peak period are summarised in Figure 5 for all sea states. It is noted that the wave periods were generally consistent across facilities, with the exception of the 2.05 s sea at ECN. Some undergeneration was noted, in particular for the larger wave heights. The sea states were generated as pseudorandom processes, with the wave component phases determined by a uniformly distributed random number algorithm. The time series were not intended to be consistent across facilities, and each facility generated its own randomised realisation of the sea state. Hence, the maximum wave heights showed variation, as would be expected of this probabilistic process. The targeted H_s values are indicated on each plot along with the expected Rayleigh distribution extreme value based on an informal industry-standard 1000-wave test (1.85 · H_s) as per Equation (3), where N is the number of waves. The actual test lengths were 512 s, as used by UoE, UCC and ECN, while UoP selected a test length of 720 s. As these tests contained fewer than 1000 waves, the expected maxima were somewhat smaller, predicted to be approximately 1.7 · H_s for a nominal 2 s period.
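The quoted extreme-value factors can be checked with the standard Rayleigh result for the most probable maximum wave height in N waves, H_max ≈ H_s·√(ln N / 2). This form is assumed to correspond to Equation (3); the function names are hypothetical.

```python
import math

def expected_max_ratio(n_waves):
    """Most probable H_max / H_s for n_waves Rayleigh-distributed wave heights."""
    return math.sqrt(0.5 * math.log(n_waves))

def wave_count(test_length_s, nominal_period_s):
    """Approximate number of waves in a test of the given length."""
    return test_length_s / nominal_period_s

ratio_1000 = expected_max_ratio(1000)                    # about 1.86
ratio_512s = expected_max_ratio(wave_count(512.0, 2.0))  # about 1.67
ratio_720s = expected_max_ratio(wave_count(720.0, 2.0))  # about 1.72
```

Under this assumption, both the 512 s and 720 s tests give a factor close to 1.7 for a nominal 2 s period, in line with the figure stated above.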

Hinge Position
The hinge position, or relative pitch, was simply defined as the relative angle between the fore and aft bodies, with 0° being the still-water "flat" state. The significant and maximum absolute values are presented in Figure 6 for the H_s = 0.10 m sea states. It is noted that there was minimal variation in the significant values across facilities with the total spread being approximately 3% of the mean value for any given period. The maximum hinge values showed considerably greater spread: 35% of the mean value in the case of the 1.8 s period seas. The maximum values are also noted to be approximately double the significant values. This behaviour was similar across all significant wave heights, as summarised in Table 4. This behaviour is intuitively correct if the raft was closely following the wave surface elevation, with the significant and maximum values in the hinge being similar to the wave heights.
The RAO of the hinge is plotted in Figure 7, calculated for all sea states at H_s = 0.10 m. In general, the behaviour was similar across the facilities, with some deviation noted around the peak response period of approximately 1.5 s. It was observed that the results for each facility were consistent for different realisations of the spectrum, suggesting that this was not a result of experimental uncertainty, but a feature of the tank or experimental configuration. However, the most obvious variable of tank depth did not appear to be a factor, with ECN and UoP, depths 5 m and 3 m, respectively, showing the greatest disagreement, while UoE and UCC (2 m and 3 m) were very similar. Potential model configuration sensitivities are discussed below.

Mooring Loads
The four mooring line loads are expressed in terms of static (Figure 8) and dynamic forces (Figure 9). The static load is simply the load at the beginning of an experimental run (i.e., before wave propagation). The dynamic load is the total mooring load, minus the static load.
Observing the static loads illustrated some of the experimental challenges. Ideally, the loads would be consistent not only across facilities, but across each model quadrant (LC1-LC4) and across all tests. However, both systematic deviations (e.g., UoE LC2 was consistently low) and individual anomalies (ECN LC1 and LC4 at 2.05 s, thought to be due to sensor drift) are noted. It is suggested that this may be due, at least in part, to practical difficulties in replicating the exact mooring footprint amongst all facilities, combined with some unreliability of the load cells that required their replacement with different models (hence with different masses and sizes) for the UCC and UoE tests.
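The static and dynamic split can be sketched as follows. Taking the static load as the mean over a short still-water window at the start of the run is an assumption made here for robustness against noise; the function name, window length and sample values are hypothetical.

```python
def split_mooring_load(load, n_static_samples):
    """Return (static, dynamic): static is the mean of the first samples,
    dynamic is the total load minus that static value."""
    static = sum(load[:n_static_samples]) / n_static_samples
    dynamic = [x - static for x in load]
    return static, dynamic

# Hypothetical load-cell record (N): three still-water samples, then waves.
static_load, dynamic_load = split_mooring_load([5.0, 5.0, 5.0, 6.0, 4.0], 3)
```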
Regardless of the variation in the static load, the trends in the dynamic loads were similar across all facilities. The mean loads were generally in good agreement (typically varying in the order of 10%), although it is noted that the absolute values were small at model-scale, with mean values below 2 N and even maximum values below 10 N (approximately 5% of the load cell range). The maximum values showed more variation than the mean dynamic loads, as expected. However, it was observed that there was a tendency for some laboratories to produce larger results, with ECN typically giving the greatest loads and UoP the lowest. Aside from tank parameters such as water depth, a variable not quantified was the stiffness of the mooring "anchor". The station-keeping mooring system was entirely above water on a horizontal plane, as described above. Each laboratory had to implement a different solution to support the anchor point (e.g., freestanding towers vs. hard points), and it was postulated that these systems may vary significantly in stiffness. A lesser influence may be the dimensional accuracy of the mooring footprint. This may explain some of the differences in the observed static load and dynamic load behaviour. However, the processes used to position the anchors are likely to result in location errors only in the order of 10 mm, which is not sufficient to explain the observed variation.

Motions: Pitch, Heave and Surge
RAOs were calculated for pitch (Figure 10), heave (Figure 11) and surge (Figure 12). The behaviour was similar across all facilities, with the exception of forward raft heave, which is discussed in further detail below.
The pitch response exhibited clear peaks for the forward and aft rafts at 1.5-1.6 s and 1.6-1.65 s, respectively. As noted previously for the hinge angle, there were some differences in the magnitude of the response (approximately ±10% from the median value) that were not readily explained by the tank configuration or the calibrated sea states. It is again suggested that this may be related to mooring configuration or anchor stiffness.
The behaviour in heave was again similar across all facilities, with a trend towards a greater response at longer periods. The raft's heave behaviour was not expected to be greatly influenced by the mooring system, which was on the horizontal plane. A number of differing features are noted: a peak at approximately 1.2 s on the forward raft as measured at UCC; and some deviations at higher periods (particularly on the aft raft) for the ECN tests. As ECN is deeper than the other facilities by at least 2 m, this may explain the behaviour at high periods where the other tanks were starting to operate on the borders of an intermediate water depth (d/L < 0.5).
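The relative depth d/L at each facility can be estimated from the linear dispersion relation ω² = g·k·tanh(k·d). The sketch below solves it by Newton iteration from the deep-water guess; linear wave theory, the function names and the value of g are assumptions of this illustration rather than part of the reported analysis.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def wavelength(period_s, depth_m, tol=1e-12):
    """Wavelength (m) from the linear dispersion relation, by Newton iteration."""
    omega2 = (2.0 * math.pi / period_s) ** 2
    k = omega2 / G  # deep-water initial guess
    for _ in range(50):
        th = math.tanh(k * depth_m)
        f = G * k * th - omega2
        df = G * th + G * k * depth_m * (1.0 - th * th)
        step = f / df
        k -= step
        if abs(step) < tol:
            break
    return 2.0 * math.pi / k

def relative_depth(period_s, depth_m):
    """d/L; values above 0.5 are conventionally treated as deep water."""
    return depth_m / wavelength(period_s, depth_m)

# Longest peak period (2.05 s) at the three tank depths used.
ratios = {d: relative_depth(2.05, d) for d in (2.0, 3.0, 5.0)}
```

At the 2.05 s peak period this gives d/L of roughly 0.3 in 2 m, just under 0.5 in 3 m and about 0.76 in 5 m, whereas the 1.3 s period is in deep water even at 2 m depth.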
The majority of the results from the facilities showed good agreement in surge, although there was more deviation than observed in the heave measurements, which is to be expected given the deviation in mooring loads and pretension. It is noted that the UCC data for the fore raft differed significantly from the other facilities, especially for the lower periods. It was unusual that this behaviour was not also observed on the aft raft given that the behaviours were coupled, and the hinge motions were consistent with the other facilities. Further investigation of the data, and observations of the tests, suggested this was likely a result of water overtopping the forward raft and splashing on the motion capture markers. Overtopping was observed in all facilities, but did not appear to have caused a similar contamination of the data elsewhere. Other facilities either added additional markers (for redundancy), fitted foam elements to deflect the spray, or simply had camera configurations that were less severely affected. The splashing resulted in gaps in the data, which became more apparent in the analysis at high frequencies (low periods). The surge RAO also appeared most sensitive to this effect. It is suggested that as the expected response was very small up to 1.25 s, the noise and artefacts of the software's gap filling algorithm were more apparent. To add further clarity, the RAOs for the smaller and larger wave heights (H_s of 0.05 m and 0.15 m) are reproduced in Figure 13. It was observed that the issue was not present in the smaller seas, but worsened in the larger tests, supporting the observations regarding motion capture marker splashing.

Power
The device power was calculated from power conversion chain measurements, as recommended by the IEC 62600-103 technical specification [10]. The instantaneous power (P_hinge) was therefore calculated as per Equation (4), where torque (T_transducer) and angular velocity (θ̇_encoder) were measured at the hinge. This approach ensured that the measurements were taken upstream of transmission system losses to reduce uncertainty.
The inferred power outputs for each test are plotted in Figure 14. The interquartile range (IQR) (25th to 75th quantiles) and median (50th quantile) are illustrated in the boxplot, while the whiskers represent the 5th and 95th quantiles. The median and IQR values showed broadly the same behaviour across laboratories, in particular at the lower significant wave heights of 0.05 m and 0.1 m. There was no clear variation with wave period, with all facilities performing consistently across the 1.3–2.05 s T_p range. The power outputs observed from UCC were lower by approximately 30% for the largest nominal H_s of 0.15 m, as would be expected, in line with the lower measured H_s values observed and discussed above. The upper limits were examined in terms of the 95th quantile to reduce the influence of uncertainty due to random variation in the outputs. The upper output was generally correlated with the median power, with no clear facility-specific behaviours apparent. For the 25th, 50th (median), 75th and 95th quantiles, the values varied by ±8–10% from the quantile mean (with the exception of the anomalous UCC data at H_s = 0.15 m). The skewed nature of the power distribution means that the 5th quantile was close to 0 W in all seas, and therefore was not examined in detail.
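The boxplot statistics described above can be reproduced with the Python standard library. This is a sketch only: the quantile interpolation method used for Figure 14 is not stated, so the "inclusive" method is an assumption, and the function name and example data are hypothetical.

```python
import statistics

def boxplot_summary(power):
    """5th, 25th, 50th, 75th and 95th percentiles of an instantaneous power record."""
    cuts = statistics.quantiles(power, n=100, method="inclusive")
    # cuts[i - 1] is the i-th percentile.
    return {q: cuts[q - 1] for q in (5, 25, 50, 75, 95)}

# Hypothetical record: the integers 0..100, so each percentile equals its index.
summary = boxplot_summary(list(range(101)))
iqr = summary[75] - summary[25]
```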
Further examination of the power performance was possible through the mean power output, important when predicting the energy yield from a device. The mean power values, along with the standard deviations over the test duration, are provided in Table 5. Taking the H s = 0.1 m test series, a variation of ±∼8% was typically observed in the values. However, it is noted that this figure carries significant uncertainty due to the small sample sizes. If the deviations in the UCC results are excluded, the variation drops to ∼4%. Standard deviations were also similar across each test. The mean coefficient of variation for each facility (the ratio of the standard deviation to the population mean) varied across the range 1.35-1.40 at H s = 0.1 m, supporting the similar IQR values observed in Figure 14.
In order to understand if the power data from each tank were drawn from the same distribution for a given sea state, a Kruskal-Wallis test was conducted. This particular test was chosen given the non-normal distribution of the power data (rather than the one-way ANOVA, as might be used for parametric data), and the resulting p-values are reported in Appendix A. Based on the low p-values, the null hypothesis that the samples were drawn from the same population was rejected for all sea states. The Kruskal-Wallis test does not provide information on which samples are dominant. Therefore, also supplied in Appendix A are the pairwise Mann-Whitney U test results, a nonparametric method that tests the null hypothesis that two populations have the same median. Cases are highlighted where the null hypothesis was not rejected at the 5% significance level.
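To make the statistics above concrete, the sketch below computes the coefficient of variation and the Kruskal-Wallis H statistic using only the Python standard library. It is a simplified illustration rather than the authors' analysis: no tie correction is applied, the function names are hypothetical, and no p-value is computed. In practice a library routine such as SciPy's `scipy.stats.kruskal` (with `scipy.stats.mannwhitneyu` for the pairwise comparisons) would normally be used.

```python
import statistics
from itertools import chain


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def average_ranks(values):
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of samples (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)


# Hypothetical samples from three "facilities" (not measured data).
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h)
```

Under the null hypothesis, H approximately follows a chi-squared distribution with (number of groups - 1) degrees of freedom. For four facilities this gives three degrees of freedom, for which the 5% critical value is 7.815.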

Variability of Motions, Loads and Power
The key parameters examined across the four facilities related to the motions of the fore and aft rafts (pitch, heave and surge), hinge angle, mooring loads and power. These outputs were deemed to provide the key dynamic and kinematic information typically used in the characterisation of a WEC (e.g., [10]). The analysis could be further expanded as required using the collated dataset, as noted and discussed below.
In broad terms, the agreement across the facilities was good and the fundamental behaviours of the WEC were consistent across test programmes. This was observed both in the frequency domain characterisation (RAOs) and summary parameters (e.g., mean, median, significant value). Where significant deviations did occur (e.g., forward raft RAO at higher wave heights), this could be traced to measurement issues, rather than a change in response of the WEC itself. For this particular device, the motions and power output showed the least variability. Ideally, data from more than four facilities would be available to give a more reliable quantification, but there are clear practical and financial challenges in achieving even larger multifacility deployments. However, in the case of mean power, it is suggested that a variation of ±5-10% can be expected for facilities operating with their own practices. Similar variability was present in the motion outputs. Interestingly, depth, and therefore wavelength, had no clear influence on the response in this programme. The facility depths ranged from 2 m to 5 m, which at the T p values in question span deep to intermediate water depths. However, at the "borderline" intermediate water depths in question, the power resource was expected to change by between 1% and 5% for the tested range of periods, based on the method outlined in [14]. While this is similar to the observed variability, there was no clear trend to suggest it was the dominant factor, either across facilities or across periods (the depth effect would be strongest at higher periods). It is also noted that the dry station-keeping system deployed for these tests has a geometry that is independent of water depth, as opposed to, e.g., a catenary system. This would further reduce the influence of water depth on the WEC's behaviour.
In the regular wave test programme [2], the wave periods were adjusted to provide the same wavelength across facilities with no discernible difference in the results.
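The statement that the tested depths span deep to intermediate water can be checked with the linear dispersion relation, ω² = g k tanh(k h). The sketch below solves it by Newton iteration and classifies the regime using the conventional thresholds of depth-to-wavelength ratio (deep above 1/2, shallow below 1/20). It is an illustration only: it is not the method of [14], and the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def wavenumber(period_s, depth_m, tol=1e-12, max_iter=100):
    """Solve omega^2 = g k tanh(k h) for k (rad/m) by Newton iteration."""
    omega = 2.0 * math.pi / period_s
    k = omega ** 2 / G  # deep-water value as the initial guess
    for _ in range(max_iter):
        th = math.tanh(k * depth_m)
        f = G * k * th - omega ** 2
        df = G * th + G * k * depth_m * (1.0 - th ** 2)
        step = f / df
        k -= step
        if abs(step) < tol * k:
            break
    return k


def relative_depth(period_s, depth_m):
    """Water depth divided by wavelength."""
    wavelength = 2.0 * math.pi / wavenumber(period_s, depth_m)
    return depth_m / wavelength


def depth_regime(period_s, depth_m):
    """Classify as 'deep', 'intermediate' or 'shallow' water."""
    ratio = relative_depth(period_s, depth_m)
    if ratio >= 0.5:
        return "deep"
    if ratio > 0.05:
        return "intermediate"
    return "shallow"


# Extremes of the tested peak-period range and facility depths.
for tp, h in [(1.3, 5.0), (1.3, 2.0), (2.05, 5.0), (2.05, 2.0)]:
    print(tp, h, depth_regime(tp, h))
```

With these thresholds, only the longest period in the shallowest tank falls in the intermediate regime among the four corner cases, which is consistent with the "borderline" description above.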
The clearest deviations were seen in the measurement of the mooring loads, in particular the static loads. The trends in the dynamic loads (i.e., the loads induced by the response to wave environment) were similar, suggesting that interfacility variation was less significant than the test setup and configuration.
It is noted that the uncertainty appeared to be lower in the irregular seas than the regular tests from the same experimental programme, provided in [2]. It is suggested that in regular wave testing, any variation in individual wave heights (e.g., due to reflections) is much more apparent, and the results are more sensitive to decisions on the sampling window for each facility.

Influence of Experimental Setup and Calibration
The experiment was conducted to minimise any uncertainty resulting from the operation of the WEC by using the same software, physical hardware and technical personnel (on some occasions, through remote access) to maintain operational consistency. The remaining operational areas of uncertainty were primarily the experimental inputs (i.e., the sea states) and integration of the WEC device into the facility.
All facilities employed the method of calibrating the sea states "open tank", with a gauge deployed at the nominal model location. Appropriate gain corrections were then applied to achieve the correct significant wave height. The wave period did not typically require correction due to the deterministic frequency control employed by the wave tanks in this experimental programme. Nevertheless, deviations in H s were noted across the facilities when the data were analysed using a common method. In this case, this appeared as undergeneration of the target wave height in the order of 10-15% in the worst cases. Several factors were considered as possible contributors to this variation:
• Operational calibrations conducted using spectra averaged across multiple gauges to minimise the influence of hotspots and reflections;
• Inconsistency in facility procedures between using incident spectra (obtained through multigauge reflection analysis) vs. total spectra;
• Differences in facility procedures in terms of accepted uncertainty.
The final point above did not contribute, as all the facilities here would expect measured H s values to be within 5% of the target values, and more typically 1-2%. The first two points are more relevant, and this work does not draw conclusions on which is the correct method. Indeed, the choice between basing the input on incident or total spectra can be device-specific and depend on the directional sensitivity of the WEC (e.g., point absorber vs. attenuator). Accounting for the behaviour of hotspots and reflections is also complicated by the fact that metrics characterising a sea or facility are not routinely supplied (although such measures are available, e.g., [15]). The reflection coefficient (ratio of reflected to incident wave height) is often provided, but this is a frequency-dependent parameter. Facilities such as UoE, which rely entirely on active absorption, may produce similar average values to facilities that use passive absorption (combined with active absorbing wavemakers), but the distribution of values across the operating frequency will differ. This is due to active absorption typically being more effective at low frequencies, but less effective at high frequencies. Whether this is an advantage or disadvantage will depend very much on the model's operating envelope. In this case, the device was designed to operate within a comfortable range of frequencies for all facilities.
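The gain correction step described above can be illustrated with a short sketch. It assumes that the significant wave height is taken as the spectral estimate, four times the square root of the zeroth spectral moment, and that a single "broad" amplitude gain is applied to the whole input spectrum. The function names and numerical values are hypothetical and do not represent any facility's actual procedure.

```python
import math


def spectral_moment_m0(freqs_hz, density_m2_per_hz):
    """Zeroth spectral moment (m^2) by trapezoidal integration."""
    m0 = 0.0
    for i in range(1, len(freqs_hz)):
        df = freqs_hz[i] - freqs_hz[i - 1]
        m0 += 0.5 * (density_m2_per_hz[i] + density_m2_per_hz[i - 1]) * df
    return m0


def significant_wave_height(freqs_hz, density_m2_per_hz):
    """Spectral significant wave height, 4 * sqrt(m0), in metres."""
    return 4.0 * math.sqrt(spectral_moment_m0(freqs_hz, density_m2_per_hz))


def broad_gain(target_hs_m, measured_hs_m):
    """Single amplitude gain that scales the measured H s to the target.

    Wave height scales linearly with the amplitude gain, so spectral
    density scales with the gain squared.
    """
    return target_hs_m / measured_hs_m


# Hypothetical case: a 0.1 m target that was undergenerated by 15%.
print(broad_gain(0.1, 0.085))
```

A single gain of this kind corrects the overall energy but cannot correct the spectral shape, which is one reason a facility might instead adjust each frequency bin individually.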
The facilities in this study vary significantly in size and configuration, and therefore, a variety of different solutions to anchoring the above-water station-keeping system were required. These involved either a freestanding tower, support on an overhead gantry, or attachment to another hard structure. The differences in the stiffness of these systems are thought to be the cause of the variation in mooring pretension (i.e., static load). A contributing factor may be differences in the physical footprint. This uncertainty is difficult to quantify, but it was not deemed significant.

Recommendations for WEC Tank Testing Consistency
It is noted that the clearest source of inconsistency between laboratories was in the calibration of sea states. The effect of this was exaggerated in this study as several outputs (e.g., mean power) were being compared with no reference to the measured wave parameters. Outputs such as RAOs, when calculated using measured, rather than target, spectra are less affected in this regard and are therefore deemed more useful when comparing results from different laboratories. In making recommendations, we distinguish between characterisation and calibration. The former is obtaining an accurate measurement of the test environment, while the latter is the extension of this where the input variables are iterated to meet a specific target.
Based on the experiences of this test group, the following recommendations are made to ensure consistency when testing WECs at scale:

1. Sea state characterisation should be clearly reported along with the facility's methodology. It must be clear whether the values relate to the total or incident spectrum;
2. Where possible, the characterisation should be based on an average of 3-5 wave gauges measuring in the absence of the model ("open tank"). These gauges would typically cover the area occupied by the model, suggesting a footprint of 1-2 m. The gauges may be spaced to support reflection analysis, allowing the incident spectrum to be reported;
3. As a minimum, the H s value from the total spectrum should be reported as averaged across the gauges. It is noted that wave periods in contemporary wave tanks are reproduced very accurately; nevertheless, it may be desirable to report mean periods as a measure of quality assurance. Full reporting requirements were outlined in [10] as recommended by the IEC;
4. Where reflection analysis is conducted, the incident parameters may be reported alongside the total spectrum. The reflection analysis methodology should also be referenced;
5. Accurate characterisation of the sea state is considered as a prerequisite to any sea state calibration. An established facility with experienced operators is likely to be capable of producing sea states to within 5% of the target values with no (or minimal) iteration. In a time-limited programme, it may therefore be acceptable to run uncalibrated sea states, provided they are characterised as detailed above;
6. Where sea state calibration is conducted, the methodology must be reported, in particular the adjustments made to the input spectrum. For example, is the target H s achieved through the application of a broad gain function, or is a frequency-dependent gain function applied (adjusting each frequency bin individually)? The latter method is recommended where practical. It is suggested that a standardised procedure should be adopted (e.g., by the IEC) for sea state characterisation and calibration given its clear importance for maintaining consistency between laboratories. This influence was even more pronounced on the parallel regular wave tests that accompanied the work reported here;
7. The use of a design wave (i.e., recreating a specific time series at the model location) should be considered as a benchmarking tool to aid comparison between test programmes at different facilities. It is also recommended that this approach be considered for any future round robin test programmes;
8. Assuming that the model itself maintains consistency between programmes, the key source of variability is the interface with the facility (e.g., the moorings). In addition to ensuring dimensional accuracy, it is suggested that pull tests, to establish the stiffness of the system, be conducted.
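The multigauge characterisation and frequency-dependent gain in recommendations 2 and 6 could be sketched as below. This is a minimal illustration under stated assumptions: all gauges share the same frequency bins, the per-bin amplitude gain is the square root of the ratio of target to measured spectral density, and bins with negligible measured energy are left unchanged. The function names and the `floor` parameter are hypothetical.

```python
import math


def average_spectrum(gauge_spectra):
    """Average spectral density per frequency bin across several gauges.

    gauge_spectra is a list of equal-length lists, one per gauge.
    """
    n_gauges = len(gauge_spectra)
    n_bins = len(gauge_spectra[0])
    if any(len(s) != n_bins for s in gauge_spectra):
        raise ValueError("all gauges must share the same frequency bins")
    return [sum(s[i] for s in gauge_spectra) / n_gauges for i in range(n_bins)]


def frequency_dependent_gain(target_density, measured_density, floor=1e-12):
    """Per-bin amplitude gain, sqrt(S_target / S_measured).

    Bins where the measured density is below `floor` keep a gain of 1.0
    to avoid amplifying noise in bins with negligible energy.
    """
    gains = []
    for s_target, s_measured in zip(target_density, measured_density):
        if s_measured < floor:
            gains.append(1.0)
        else:
            gains.append(math.sqrt(s_target / s_measured))
    return gains


# Hypothetical two-gauge, two-bin example (arbitrary units).
measured = average_spectrum([[1.0, 4.0], [3.0, 4.0]])
print(frequency_dependent_gain([8.0, 1.0], measured))
```

Averaging before computing the gains reduces the influence of local hotspots at any single gauge, at the cost of mixing incident and reflected energy unless a reflection analysis is applied first.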

Dataset Availability and Applications
The dataset obtained from this experimental programme will be made publicly available through the MaRINET2 project's data preservation tasks. In addition to the wave and model parameters examined here, it also includes additional wave gauge data (as illustrated in Figure 3), more detailed PTO data (e.g., damping coefficients, hinge torque, motor/generator output) and full 6-DoF motions for both rafts. Regular wave tests, as analysed in [2], will also be made available.
Additional irregular sea data are also available, primarily repeats of the seas outlined here. These data form the basis of a parallel MaRINET2 study quantifying the uncertainty associated with the physical model testing of WECs.

Conclusions
The round robin testing programme, conducted at four established European facilities participating in the MaRINET2 programme, explored experimental variability through the deployment of a common wave energy converter model working with the same experimental test programme. The results suggested that physical model testing for the sector is a reliable tool, with variability in key parameters in the order of 5%. Given that most WEC testing is still exploring major design iterations at each stage, this level of variability is unlikely to be problematic. However, as the industry moves to finer design iterations, this may no longer hold true. The key areas for reducing variability were outlined, the most significant being in the characterisation and calibration of the input sea states. Some easily applied recommendations on multiple gauge calibration were provided. Secondly, the interface with the tank (e.g., mooring) should be carefully characterised both in terms of dimensions and mechanical properties (e.g., stiffness).
The facilities themselves all performed remarkably consistently, with the variability largely traced to different working methodologies (e.g., sea state characterisation), instrumentation and model interfacing. The tanks varied significantly in footprint and depth (over the range of 2-5 m); yet, this appeared to have minimal influence on the model behaviour. This comes with the caveat that the mooring system on this particular model is not dimensionally influenced by water depth.

Data Availability Statement: The data recorded here will be archived and made publicly available in NetCDF format through the MaRINET2 project's data preservation activities.
Where the p-value is greater than 0.05 (the 5% significance level), this is highlighted in bold text, indicating that the null hypothesis is not rejected, as discussed in Section 3.6.