Empirical Application of Generalized Rayleigh Distribution for Mineral Resource Estimation of Seabed Polymetallic Nodules

Yu, Gordon; Parianos, John

doi:10.3390/min11050449

by Gordon Yu * and John Parianos

Nautilus Minerals Pacific Pty Ltd., East Brisbane, Brisbane 4169, Australia

* Author to whom correspondence should be addressed.

Minerals 2021, 11(5), 449; https://doi.org/10.3390/min11050449

Submission received: 16 March 2021 / Revised: 13 April 2021 / Accepted: 21 April 2021 / Published: 23 April 2021

(This article belongs to the Special Issue Exploration of Polymetallic Nodules)

Abstract

An efficient empirical statistical method is developed to improve the process of mineral resource estimation of seabed polymetallic nodules and is applied to analyze the abundance of seabed polymetallic nodules in the Clarion Clipperton Zone (CCZ). The newly proposed method is based on three hypotheses as the foundation for a model of “idealized nodules”, which was validated by analyzing nodule samples collected from the seabed within the Tonga Offshore Mining Limited (TOML) exploration contract. Once validated, the “idealized nodule” model was used to deduce a set of empirical formulae for predicting the nodule resources, in terms of percentage coverage and abundance. The formulae were then applied to a total of 188 sets of nodule samples collected across the TOML areas, comprising box-core samples and towed camera images as well as other detailed box-core sample measurements from the literature. Numerical results for nodule abundance and coverage predictions were compared with field measurements, and unbiased agreement was obtained. The new method has the potential to achieve more accurate mineral resource estimation with reduced sample numbers and sizes. It may also have application in improving the efficiency of design and configuration of mining equipment.

Keywords: polymetallic nodules; mineral resource estimation; statistical analysis; Generalized Rayleigh Distribution; Clarion Clipperton Zone

1. Introduction

Polymetallic nodules are mineral particles found in many of the world’s oceans [1]. A major deposit lies within the Clarion Clipperton Zone (CCZ) of the tropical North Pacific [2]. Nodules grow via precipitation in an organized manner in and on clay-ooze at the seabed [2] and they are often found with others of similar size and form [3,4,5]. Nodule “abundance” is the mass of nodules in kilograms (usually wet) per square metre of seabed and is used to estimate the tonnage of nodules in a mineral resource estimation (as the surrounding clay-ooze should be able to be disregarded at the first step of mining [4,6,7,8,9,10]). Interest in the deposit, from the perspectives of development, marine environment and regulation, has increased over the last 10 years [5,11].

The use of nodule long (or major) axis in predicting individual nodule weights has been long understood [12,13,14,15], even if application via seabed photographs is restricted to areas where the nodules are largely exposed in the host clay-ooze [4,16]. Ultimately, box-core samples are seen to be the most reliable source of abundance data [4], but their relatively high cost makes the use of seabed photographs appealing to workers trying to improve the confidence in abundance estimation [17]. Efforts to use percentage coverage to predict nodule abundance have so far not been effective [18,19].

The distribution of nodule long-axis lengths has been recognised to be often positively skewed (e.g., [4,15]), but such distributions are not known to have been used in the mineral resource estimation process.

In Section 2, three hypotheses are proposed as the basis for an idealized model of seabed polymetallic nodules. The hypotheses made for the “idealized nodule” model are based on analyses of nodule samples collected from the seabed in the CCZ. One of the key hypotheses is based on numerical evidence that the long axes of the seabed nodules follow the Generalized Rayleigh Distribution (GRD).

Section 3 presents the mathematical characteristics of the GRD pertaining to the analysis of nodule samples. The traditional statistical methods for estimating the parameters of the sample distribution, and for performing the goodness-of-fit test for the GRD, are discussed. While they are useful for analysing nodule samples, the traditional numerical procedure is too complex for practical applications.

In Section 4, based on the “idealized nodule” model discussed in Section 2, a simplified practical approach is developed to replace the complex numerical methods in Section 3. As a result, empirical formulae are derived to directly predict the percentage coverage and abundance of seabed nodules.

Section 5 shows the numerical results of testing the three hypotheses that form the basis of the “idealized nodule” model. Strong numerical evidence is found, validating the hypotheses and consequently the “idealized nodule” model.

In Section 6, 188 samples of long axes of seabed polymetallic nodules from the CCZ are analysed using the empirical method developed in Section 4. The resource estimations in terms of percentage coverage and abundance are compared with the field measurements.

2. Three Hypotheses for an Idealized Model of Seabed Polymetallic Nodules

Polymetallic nodules from the CCZ are found in a wide range of forms [4], but within parts of the central and eastern CCZ covered by an exploration contract held by Tonga Offshore Mining Limited (TOML) they often take irregular, slightly prolate spheroid-like forms ([4]; Figure 1). Growth around the horizontal axes (X and Y) is believed to be a function of horizontal space and mineral supply, and growth along the vertical axis is also a function of a permissive layer of chemical conditions termed the geochemically active layer [4]. Nodules have a very consistent density [4] and a relationship between the major horizontal axis and nodule weight (i.e., volume) has been long recognized [4,12,13].

Based on the above observation, to allow mathematical modelling of seabed polymetallic nodules, the following three somewhat severe fundamental hypotheses are constructed:

1. Each nodule is of ellipsoidal shape (e.g., Figure 2a,b), defined by its three axes $X_{i}$, $Y_{i}$ and $Z_{i}$, where $i = 1, 2, 3, \dots, N$, with $N$ being the number of nodules. Here $X_{i}$ is the long or major axis, which is usually in the horizontal plane, while $Y_{i}$ and $Z_{i}$ are the two typically shorter minor axes in the horizontal and vertical planes, respectively.
2. Within a certain boundary (domain) on the seabed, the ellipsoidal nodules are similar in shape, i.e., the ratios between the two minor axes and the major axis, $ε_{1} = \frac{Y_{i}}{X_{i}}$ and $ε_{2} = \frac{Z_{i}}{X_{i}}$, are constant.
3. Within a certain boundary (domain) on the seabed, the long axis of the nodules $X_{i}$ follows a Generalized Rayleigh Distribution (GRD), which is defined by a pair of parameters $α$ and $β$ (see Section 3).

The above idealization is supported by analysis of nodule data, and the hypotheses were found to be accurate to a certain degree. Specifically, hypotheses 1 and 2 are justified using regression analysis of the dimensions and weights of seabed nodule samples in Section 5.1, while hypothesis 3 is validated by Anderson–Darling goodness-of-fit tests in Section 5.2 using nodule samples collected from the TOML areas.

3. Generalized Rayleigh Distribution (GRD) and the Traditional Method

The Rayleigh distribution has been widely used to model phenomena in various technical fields. For instance, in the field of oceanography, Longuet-Higgins [20] showed that the heights of narrow-banded random ocean waves follow the Rayleigh distribution. Generalized Rayleigh Distributions (GRD), a family of two-parameter variations, have also been proposed, although their practical application is limited. For a random variable $X$ following the GRD, its probability density function (PDF) $f(x)$ is of the form:

$$f(x) = 2 α β^{2} x \, e^{-(β x)^{2}} \left[1 - e^{-(β x)^{2}}\right]^{α - 1}, \quad x > 0, \tag{1}$$

where $α > 0$ and $β > 0$ are the shape and scale parameters, respectively.

The cumulative distribution function (CDF) $F(x)$ is given by:

$$F(x) = \left[1 - e^{-(β x)^{2}}\right]^{α}, \quad x > 0. \tag{2}$$

Figure 3 shows the PDF of the Generalized Rayleigh Distribution for various values of the parameters $α$ and $β$. For a typical statistical analysis of seabed polymetallic nodules, the parameter range is $α \geq 1$.
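As an illustration of Equations (1) and (2), the following minimal Python sketch evaluates the GRD density and distribution function. The function names (`grd_pdf`, `grd_cdf`, `grd_quantile`) are hypothetical and are not part of the original work; the quantile function is simply Equation (2) inverted for $x$.

```python
import math


def grd_pdf(x, alpha, beta):
    """PDF of the Generalized Rayleigh Distribution, Equation (1)."""
    if x <= 0:
        return 0.0
    t = (beta * x) ** 2
    # -expm1(-t) equals 1 - exp(-t) but keeps precision for small t
    return 2.0 * alpha * beta**2 * x * math.exp(-t) * (-math.expm1(-t)) ** (alpha - 1.0)


def grd_cdf(x, alpha, beta):
    """CDF of the Generalized Rayleigh Distribution, Equation (2)."""
    if x <= 0:
        return 0.0
    return (-math.expm1(-((beta * x) ** 2))) ** alpha


def grd_quantile(p, alpha, beta):
    """Inverse of Equation (2): the x for which F(x) = p, with 0 < p < 1."""
    return math.sqrt(-math.log1p(-(p ** (1.0 / alpha)))) / beta


if __name__ == "__main__":
    # Illustrative parameter values only, not taken from the nodule data
    for x in (2.0, 4.0, 6.0):
        print(x, grd_pdf(x, 3.0, 0.3), grd_cdf(x, 3.0, 0.3))
```

With $α = 1$ the expressions reduce to the ordinary Rayleigh form $F(x) = 1 - e^{-(βx)^{2}}$, which gives a quick sanity check.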

3.1. Mean and Standard Deviation of the Generalized Rayleigh Distribution (GRD)

As derived in Appendix A (Equations (A9) and (A15)), the mean $μ$ and the standard deviation $σ$ of the Generalized Rayleigh Distribution can be written as:

$$\begin{aligned} μ &= \frac{α}{β} F_{1}(α) \\ σ &= μ \sqrt{G(α)} \end{aligned} \tag{3}$$

where:

$$\begin{aligned} F_{1}(α) &= \int_{0}^{\infty} \sqrt{z} \, e^{-z} \left[1 - e^{-z}\right]^{α - 1} dz \\ F_{2}(α) &= \int_{0}^{\infty} z \, e^{-z} \left[1 - e^{-z}\right]^{α - 1} dz \\ G(α) &= \frac{F_{2}(α)}{α \left[F_{1}(α)\right]^{2}} - 1 \end{aligned} \tag{4}$$

Formally, Equation (3) can be used to estimate $α$ and $β$ when $μ$ and $σ$ are known. However, due to the complexity of evaluating the functions $F_{1}(α)$ and $F_{2}(α)$ in Equation (4), the solution process is rather tedious. An empirical method is therefore developed in Section 4 to simplify the solution procedure for practical applications.
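To make Equations (3) and (4) concrete, the sketch below evaluates the integrals numerically. It is only one possible approach and is not the procedure used by the authors: the substitution $z = t^{2}$ and the use of composite Simpson's rule are assumptions made here for illustration, as are the function names and the truncation point `t_max`.

```python
import math


def moment_integral(m, alpha, t_max=8.0, n=4000):
    """Approximate F_m(alpha) = int_0^inf z^(m/2) e^(-z) (1 - e^(-z))^(alpha-1) dz.

    Substituting z = t^2 gives 2 * int_0^inf t^(m+1) e^(-t^2) (1 - e^(-t^2))^(alpha-1) dt,
    which is integrated with composite Simpson's rule on [0, t_max] (n must be even).
    Intended for alpha >= 1, the range of interest for nodules.
    """
    def g(t):
        if t == 0.0:
            return 0.0
        return 2.0 * t ** (m + 1) * math.exp(-t * t) * (-math.expm1(-t * t)) ** (alpha - 1.0)

    h = t_max / n
    total = g(0.0) + g(t_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0


def grd_mean_std(alpha, beta):
    """Mean and standard deviation of the GRD from Equations (3) and (4)."""
    f1 = moment_integral(1, alpha)
    f2 = moment_integral(2, alpha)
    g_alpha = f2 / (alpha * f1 * f1) - 1.0
    mu = alpha / beta * f1
    return mu, mu * math.sqrt(g_alpha)


if __name__ == "__main__":
    # Illustrative values only
    print(grd_mean_std(2.0, 0.3))
```

For $α = 1$ the integrals have closed forms, $F_{1}(1) = \sqrt{π}/2$ and $F_{2}(1) = 1$, which can be used to check the numerical routine.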

3.2. Test of Goodness-of-Fit of Generalized Rayleigh Distribution

3.2.1. Parameter Estimation by Maximum Likelihood Estimation (MLE)

For a random sample $X_{1}, X_{2}, \dots, X_{n}$ of size $n$ following the Generalized Rayleigh Distribution (GRD), the two parameters $α$ and $β$ defining the GRD can be determined by Maximum Likelihood Estimation (MLE). Maximizing the log-likelihood function gives the following pair of equations (Abd-Elfattah [21]):

$$\frac{\sum_{i=1}^{n} x_{i}^{2} e^{-β^{2} x_{i}^{2}} / \left(1 - e^{-β^{2} x_{i}^{2}}\right)}{\sum_{i=1}^{n} \ln\left(1 - e^{-β^{2} x_{i}^{2}}\right)} + \frac{\sum_{i=1}^{n} x_{i}^{2} / \left(1 - e^{-β^{2} x_{i}^{2}}\right)}{n} = \frac{1}{β^{2}} \tag{5}$$

$$α = \frac{-n}{\sum_{i=1}^{n} \ln\left[1 - e^{-β^{2} x_{i}^{2}}\right]} \tag{6}$$

In a typical solution process for $α$ and $β$, Equation (5) is first solved iteratively by the Newton–Raphson method to yield $β$, and Equation (6) is then used to calculate $α$. This solution process is quite tedious. An empirical alternative is, therefore, devised in Section 4 to simplify the process for practical application.

3.2.2. The Anderson–Darling Test Statistics

Once the parameters $α$ and $β$ are estimated as above, it is important to test whether they yield a Generalized Rayleigh Distribution that gives a good fit to the sample. For computational purposes, the Anderson–Darling (AD) test statistics $A_{n}^{2}$ and $V_{n}^{2}$ can be written as:

$$\begin{aligned} A_{n}^{2} &= -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[\ln z_{i} + \ln\left(1 - z_{n+1-i}\right)\right] \\ V_{n}^{2} &= \frac{n}{2} - 2 \sum_{i=1}^{n} z_{i} - \sum_{i=1}^{n} \left[2 - \frac{2i - 1}{n}\right] \ln\left(1 - z_{i}\right) \end{aligned} \tag{7}$$

Here $z_{i} = F(x_{i})$, where $F(x_{i})$ is the cumulative distribution function (CDF) of the fitted GRD, calculated using Equation (2) above, with the sample values $x_{i}$ arranged in ascending order.
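Equation (7) translates directly into code. The following Python sketch computes both statistics from the fitted CDF values $z_{i}$; the function name is hypothetical, and the input is assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def anderson_darling_stats(z):
    """Return (A_n^2, V_n^2) of Equation (7) for fitted CDF values z_i = F(x_i).

    The values are sorted into ascending order first; each must satisfy 0 < z_i < 1.
    """
    z = sorted(z)
    n = len(z)
    # z[i - 1] is z_i and z[n - i] is z_(n+1-i) in the 1-based notation of Equation (7)
    a2 = -n - sum(
        (2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
        for i in range(1, n + 1)
    ) / n
    v2 = n / 2.0 - 2.0 * sum(z) - sum(
        (2.0 - (2 * i - 1) / n) * math.log(1.0 - z[i - 1])
        for i in range(1, n + 1)
    )
    return a2, v2


if __name__ == "__main__":
    # Illustrative CDF values only
    print(anderson_darling_stats([0.12, 0.35, 0.48, 0.71, 0.93]))
```

The resulting values would then be compared with critical values obtained separately, which this sketch does not attempt to compute.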

3.2.3. The Test Criteria for Hypothesis

The values of $A_{n}^{2}$ and $V_{n}^{2}$ calculated above are then compared with their corresponding critical values $y_{γ}$ and $u_{γ}$, respectively. If:

$$\begin{aligned} A_{n}^{2} &< y_{γ} \\ V_{n}^{2} &< u_{γ} \end{aligned} \tag{8}$$

the null hypothesis that the sample data follow the Generalized Rayleigh Distribution is accepted at the significance level $γ$ (or at the $1 - γ$ confidence level). The critical values $y_{γ}$ and $u_{γ}$ are simulated by the Monte Carlo method, as discussed in Appendix B.

4. A New Empirical Method and Its Application to Nodule Resources

As shown above in Section 3, the traditional formulation can be used (1) to analyse whether a numerical sample follows the Generalized Rayleigh Distribution in a statistically significant way and, if it does, (2) to estimate the parameters that define the GRD giving the best fit to the sample. However, the complexity of the numerical process described above in Section 3 makes it difficult to apply in practice. In this Section, we introduce a simplified numerical procedure for predicting the mineral resource of seabed nodules.

4.1. Empirical Estimation of Parameters of the Generalized Rayleigh Distribution

As shown in Section 3.1, Equations (3) and (4) can formally be used to estimate $α$ and $β$ when $μ$ and $σ$ are known. To simplify the calculation of $F_{1}(α)$ and $F_{2}(α)$, nonlinear regression can be used to show that, for the range of interest $1 \leq α \leq 9$, the following empirical formulae:

$$\begin{aligned} F_{1}(α) &= \frac{1.26364}{(α + 0.51265)^{0.85786}} \\ F_{2}(α) &= \frac{1.73929}{(α + 1.11917)^{0.73864}} \\ G(α) &= \frac{0.19249}{(α - 0.41497)^{0.67379}} \end{aligned} \quad \text{for } 1 \leq α \leq 9 \tag{9}$$

are accurate to an accuracy of $10^{-4}$ (see Figure 4).

Also included in Figure 4 is $F_{3}(α)$, which is discussed further in Section 4.2 below. The empirical form of $F_{3}(α)$ is:

$$F_{3}(α) = \frac{2.48274}{(α + 1.71977)^{0.62304}} \tag{10}$$

Combining Equations (3) and (9), it is straightforward to derive:

$$\begin{aligned} α &= 0.41497 + 0.08669 \left(\frac{μ}{σ}\right)^{2.96828} \\ β &= \frac{1.26364 \, α}{(α + 0.51265)^{0.85786}} \left(\frac{1}{μ}\right) \end{aligned} \tag{11}$$

For a random sample $X_{1}, X_{2}, \dots, X_{n}$ of size $n$, the mean $μ$ and standard deviation $σ$ are first estimated from the sample, and then the two formulae in Equation (11) can be used to estimate the parameters $α$ and $β$.
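The two formulae of Equation (11) can be applied in a few lines of Python. In the sketch below the function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which estimator is used.

```python
import math
import statistics


def grd_params_from_moments(mu, sigma):
    """Empirical estimates of alpha and beta from Equation (11)."""
    alpha = 0.41497 + 0.08669 * (mu / sigma) ** 2.96828
    beta = 1.26364 * alpha / (alpha + 0.51265) ** 0.85786 / mu
    return alpha, beta


def estimate_grd_params(long_axes):
    """Estimate alpha and beta from a sample of nodule long-axis lengths."""
    mu = statistics.fmean(long_axes)
    sigma = statistics.stdev(long_axes)   # assumption: sample (n - 1) estimator
    return grd_params_from_moments(mu, sigma)


if __name__ == "__main__":
    # Hypothetical long-axis lengths in cm, for illustration only
    sample = [3.1, 4.6, 2.8, 5.9, 4.2, 3.7, 6.4, 2.5, 4.9, 3.3]
    print(estimate_grd_params(sample))
```

Because $β$ scales with $1/μ$, the estimate carries the inverse of the length unit used for the long axes, so the same unit must be used consistently in any later coverage or abundance calculation.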

It is evident from the first formula in Equation (11) that the shape parameter $α$ is determined by the ratio of $μ$ to $σ$ (in effect the reciprocal of the coefficient of variation, or the signal-to-noise ratio). The scale parameter $β$, on the other hand, is related to the mean $μ$. This is consistent with the observation in Figure 3. It can be posited that geologically these two parameters may in turn relate to the stability and thickness of the geochemically active layer in which the nodules grow, as described in [4].

4.2. Resource Estimation for Seabed Polymetallic Nodules Using Coverage and Abundance

To estimate the percentage coverage and abundance of seabed polymetallic nodules using measurements of their long axes, the first two hypotheses from Section 2 are applied:

1. The nodule is assumed to have an idealized ellipsoidal shape.
2. Nodules within a certain boundary are assumed to be “similar” in shape, i.e., the ratios between the lengths of the two minor axes and the major axis (denoted by $ε_{1}$ and $ε_{2}$) are constant.

4.2.1. Prediction of Nodule Coverage: Idealized Nodules

Assuming $X_{1}, X_{2}, \dots, X_{N}$ are samples of the long axes of ellipsoidal nodules, with $N$ being the number of nodules in the photo, the total area $S_{c}$ covered by nodules can be calculated as the sum of the elliptical projections $S_{i}$ of the nodules in the photo image:

$$S_{c} = \sum_{i=1}^{N} S_{i} = \sum_{i=1}^{N} \frac{π}{4} X_{i} Y_{i} \tag{12}$$

where $Y_{i} = ε_{1} X_{i}$ is the shorter horizontal axis ($ε_{1} \leq 1$). Equation (12) can then be re-arranged as:

$$S_{c} = \sum_{i=1}^{N} \frac{π}{4} X_{i} Y_{i} = \frac{π}{4} \sum_{i=1}^{N} X_{i} (ε_{1} X_{i}) = \frac{π ε_{1}}{4} \sum_{i=1}^{N} X_{i}^{2} = \frac{π ε_{1}}{4} N \bar{X^{2}} \tag{13}$$

where $\bar{X^{2}}$ is the mean of $X_{i}^{2}$. Using Equation (A8) in Appendix A and taking $m = 2$, Equation (13) becomes:

$$S_{c} = \frac{π ε_{1}}{4} N \bar{X^{2}} = \frac{π ε_{1}}{4} N \frac{α}{β^{2}} F_{2}(α) \tag{14}$$

Assuming $S_{p}$ is the total area of the photo and using Equation (9), the nodule percentage coverage $C_{N}$ becomes:

$$C_{N} = \frac{S_{c}}{S_{p}} = \frac{π ε_{1} N}{4 S_{p}} \frac{α}{β^{2}} F_{2}(α) = \frac{ε_{1} N}{S_{p}} \frac{1.36603 \, α}{β^{2} (α + 1.11917)^{0.73864}} \tag{15}$$
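Equation (15) can be evaluated as in the short Python sketch below. The function and argument names are hypothetical. The result is a fraction of the photo area (multiply by 100 for a percentage), and it is assumed that $1/β$ and the photo area use consistent length units, for example cm and cm².

```python
import math


def f2_empirical(alpha):
    """Empirical F_2(alpha) from Equation (9), intended for 1 <= alpha <= 9."""
    return 1.73929 / (alpha + 1.11917) ** 0.73864


def nodule_coverage(n_nodules, photo_area, eps1, alpha, beta):
    """Fraction of the photo area covered by nodules, Equation (15)."""
    mean_x2 = alpha / beta**2 * f2_empirical(alpha)   # mean of X^2, as in Equation (14)
    return math.pi * eps1 * n_nodules * mean_x2 / (4.0 * photo_area)


if __name__ == "__main__":
    # Hypothetical inputs: 120 nodules in a 50 cm x 50 cm image, beta in 1/cm
    print(nodule_coverage(120, 50.0 * 50.0, eps1=0.8, alpha=3.0, beta=0.35))
```

Only the nodule count, the photo area, the axis ratio $ε_{1}$ and the two GRD parameters are needed, so no individual nodule outline has to be digitized.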

4.2.2. Prediction of Nodule Abundance I: Based on “Idealized Nodule” Model

Similarly, the total weight $W_{a}$ of nodules in the photo can be calculated as the sum of the weights $W_{i}$ of the nodules in the photo:

$$W_{a} = \sum_{i=1}^{N} W_{i} = \sum_{i=1}^{N} \frac{π}{6} ρ X_{i} Y_{i} Z_{i} \tag{16}$$

where $ρ$ is the nodule density. Assuming the two minor axes $Y_{i}$ and $Z_{i}$ are related to the major axis $X_{i}$ by $Y_{i} = ε_{1} X_{i}$ and $Z_{i} = ε_{2} X_{i}$, Equation (16) can be re-arranged as:

$$W_{a} = \sum_{i=1}^{N} \frac{π}{6} ρ X_{i} Y_{i} Z_{i} = \frac{π}{6} ρ \sum_{i=1}^{N} X_{i} (ε_{1} X_{i}) (ε_{2} X_{i}) = \frac{π ρ ε_{1} ε_{2}}{6} \sum_{i=1}^{N} X_{i}^{3} = \frac{π ρ ε_{1} ε_{2}}{6} N \bar{X^{3}} \tag{17}$$

where $\bar{X^{3}}$ is the mean value of $X_{i}^{3}$. Using Equation (A8) in Appendix A and taking $m = 3$, Equation (17) becomes:

$$W_{a} = \frac{π ρ ε_{1} ε_{2}}{6} N \bar{X^{3}} = \frac{π ρ ε_{1} ε_{2}}{6} N \frac{α}{β^{3}} F_{3}(α) \tag{18}$$

Assuming $S_{p}$ is the area of the photo and using Equation (10), the nodule abundance $A_{N}$ becomes:

$$A_{N} = \frac{W_{a}}{S_{p}} = \frac{π ρ ε_{1} ε_{2} N}{6 S_{p}} \frac{α}{β^{3}} F_{3}(α) = \frac{ρ ε_{1} ε_{2} N}{S_{p}} \frac{1.29996 \, α}{β^{3} (α + 1.71977)^{0.62304}} \tag{19}$$
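A minimal Python sketch of Equation (19) follows. The function names and the example inputs are hypothetical. The units are an assumption for illustration: with lengths in cm, area in cm² and density in g/cm³ the result is in g/cm², and multiplying by 10 converts it to kg/m².

```python
import math


def f3_empirical(alpha):
    """Empirical F_3(alpha) from Equation (10)."""
    return 2.48274 / (alpha + 1.71977) ** 0.62304


def nodule_abundance(n_nodules, photo_area, density, eps1, eps2, alpha, beta):
    """Nodule abundance (mass per unit seabed area), Equation (19)."""
    mean_x3 = alpha / beta**3 * f3_empirical(alpha)   # mean of X^3, as in Equation (18)
    return math.pi * density * eps1 * eps2 * n_nodules * mean_x3 / (6.0 * photo_area)


if __name__ == "__main__":
    # Hypothetical inputs: cm, cm^2 and g/cm^3, so the result is in g/cm^2
    a_g_cm2 = nodule_abundance(120, 50.0 * 50.0, density=1.93, eps1=0.8, eps2=0.5,
                               alpha=3.0, beta=0.35)
    print(a_g_cm2 * 10.0, "kg/m^2")   # 1 g/cm^2 = 10 kg/m^2
```

The density of 1.93 g/cm³ in the usage lines is the value reported in Section 5.1; the axis ratios and GRD parameters are placeholders.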

4.2.3. Prediction of Nodule Abundance II: With Empirical Long-Axis-Weight Relationship

According to [4,13] and Case 4 in Section 5.1 below, the weight of a nodule is correlated to its long axis by:

$$\log_{10} W_{i} = k \log_{10} X_{i} + b \tag{20}$$

where $k$ and $b$ are constants and $k$ is usually slightly smaller than, but close to, 3.0 (see the discussion in Section 5.1). Equation (20) can be re-arranged as:

$$W_{i} = 10^{b} X_{i}^{k} \tag{21}$$

Then, the total weight $W_{a}$ of nodules in the photo can be calculated by adding the weights $W_{i}$ of the nodules in the photo:

$$W_{a} = \sum_{i=1}^{N} W_{i} = \sum_{i=1}^{N} 10^{b} X_{i}^{k} = 10^{b} \sum_{i=1}^{N} X_{i}^{k} \tag{22}$$

Using Equation (A8) in Appendix A and taking $m = k$, Equation (22) becomes:

$$W_{a} = 10^{b} \sum_{i=1}^{N} X_{i}^{k} = 10^{b} N \bar{X^{k}} = 10^{b} N \frac{α}{β^{k}} F_{k}(α) \tag{23}$$

$F_{k}(α)$, with $2 < k \leq 3$, can be approximated by linear interpolation between $F_{2}(α)$ and $F_{3}(α)$:

$$F_{k}(α) = \left[F_{3}(α) - F_{2}(α)\right] k + \left[3 F_{2}(α) - 2 F_{3}(α)\right] \tag{24}$$

Assuming $S_{p}$ is the area of the photo and using Equation (24), the nodule abundance $A_{N}$ becomes:

$$A_{N} = \frac{W_{a}}{S_{p}} = \frac{10^{b} N}{S_{p}} \frac{α}{β^{k}} F_{k}(α) = \frac{10^{b} N}{S_{p}} \frac{α}{β^{k}} \left\{\left[F_{3}(α) - F_{2}(α)\right] k + \left[3 F_{2}(α) - 2 F_{3}(α)\right]\right\} \tag{25}$$
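The interpolation of Equation (24) and the abundance formula of Equation (25) can be sketched in Python as below. The function names are hypothetical, and the values of $k$ and $b$ in the usage lines are placeholders rather than fitted coefficients; in practice they would come from a regression such as Case 4 in Section 5.1, in the same units as the long axes and weights used there.

```python
def f2_empirical(alpha):
    """Empirical F_2(alpha) from Equation (9)."""
    return 1.73929 / (alpha + 1.11917) ** 0.73864


def f3_empirical(alpha):
    """Empirical F_3(alpha) from Equation (10)."""
    return 2.48274 / (alpha + 1.71977) ** 0.62304


def fk_interpolated(alpha, k):
    """Linear interpolation between F_2 and F_3 for 2 < k <= 3, Equation (24)."""
    f2, f3 = f2_empirical(alpha), f3_empirical(alpha)
    return (f3 - f2) * k + (3.0 * f2 - 2.0 * f3)


def nodule_abundance_power_law(n_nodules, photo_area, k, b, alpha, beta):
    """Nodule abundance from the long-axis-weight relationship, Equation (25)."""
    return 10.0**b * n_nodules / photo_area * alpha / beta**k * fk_interpolated(alpha, k)


if __name__ == "__main__":
    # Placeholder coefficients k and b, for illustration only
    print(nodule_abundance_power_law(120, 2500.0, k=2.7, b=-0.2, alpha=3.0, beta=0.35))
```

At $k = 2$ and $k = 3$ the interpolated value coincides with $F_{2}(α)$ and $F_{3}(α)$, respectively, which is a convenient check on the implementation.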

4.2.4. Relation between Nodule Percentage Coverage and Abundance

The abundance $A_{N}$ is related to the percentage coverage $C_{N}$ by dividing Equation (19) by Equation (15):

$$\frac{A_{N}}{C_{N}} = \frac{2}{3} \frac{ρ ε_{2}}{β} \frac{F_{3}(α)}{F_{2}(α)} \tag{26}$$

Eliminating $β$ using the first formula in Equation (3), Equation (26) becomes:

$$\frac{A_{N}}{C_{N}} = \frac{2}{3} ρ ε_{2} μ \frac{F_{3}(α)}{α F_{1}(α) F_{2}(α)} \tag{27}$$

Using the technique of nonlinear regression, Equation (27) can be rewritten in the form:

$$\frac{A_{N}}{C_{N}} = ρ ε_{2} μ \frac{0.70834 \, (α + 0.41373)^{0.99261}}{α} \tag{28}$$

For the range $1 < α < 4$, which is of interest for nodules, Equation (28) can be further approximated, within 0.2% error, by:

$$\frac{A_{N}}{C_{N}} \approx ρ ε_{2} μ \left[1 - 0.3 \left(1 - \frac{1}{α}\right)\right] \tag{29}$$

The above formulation, based on the “idealized nodule” model, provides two alternative ways to estimate the nodule resources:

1. Equations (15) and (19) can be used independently to calculate the nodule percentage coverage $C_{N}$ and the abundance $A_{N}$, respectively.
2. If an estimate of $C_{N}$ is already available (e.g., from digitization of seabed imagery), then Equation (28) or Equation (29) can be used to compute $A_{N}$.
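The second route, going from a measured coverage to an abundance, can be sketched in Python as follows. The function names are hypothetical; the coverage is taken as a fraction (not a percentage), and $ρ$, $μ$ and the resulting abundance are assumed to use consistent units (for example g/cm³, cm and g/cm²).

```python
def abundance_coverage_factor(alpha, simplified=False):
    """Dimensionless factor multiplying rho * eps2 * mu in Equation (28) or (29)."""
    if simplified:
        return 1.0 - 0.3 * (1.0 - 1.0 / alpha)                 # Equation (29)
    return 0.70834 * (alpha + 0.41373) ** 0.99261 / alpha      # Equation (28)


def abundance_from_coverage(coverage, density, eps2, mu, alpha, simplified=False):
    """Abundance A_N from a known coverage fraction C_N."""
    return coverage * density * eps2 * mu * abundance_coverage_factor(alpha, simplified)


if __name__ == "__main__":
    # Hypothetical inputs: 30% coverage, mean long axis 4.2 cm, density in g/cm^3
    for flag in (False, True):
        a_g_cm2 = abundance_from_coverage(0.30, 1.93, eps2=0.5, mu=4.2, alpha=2.5,
                                          simplified=flag)
        print(flag, a_g_cm2 * 10.0, "kg/m^2")
```

Comparing the two variants over the range of $α$ of interest shows how little is lost by using the simpler Equation (29).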

5. Test of Hypotheses of the Idealized Nodule Model

The three fundamental hypotheses that define the “idealized nodule” in Section 2 find considerable support from numerical measurements of samples of seabed nodules.

5.1. Test of Hypotheses 1 and 2: Linear Regression Analyses on Nodule Dimensions and Weights

In order to examine the validity of the first two hypotheses for the “idealized nodules”, linear regression analyses were carried out on published data of nodule major and minor axes, volume and weight from two sites (BGR East and GSR Central Regions) in the CCZ [14]. Specifically, linear regressions were carried out for the following cases:

Case 1: Nodule long axis and its horizontal minor axis;
Case 2: Nodule long axis and its vertical minor axis;
Case 3: Nodule weight and its volume; and
Case 4: Nodule long axis and its weight.

The numerical results are shown in Figure 5 (for BGR East Region) and Figure 6 (for GSR Central Region), with charts (a) to (d) representing Cases 1 to 4, respectively, and are also summarized in Table 1. The linear regression was performed for Cases 1 to 3 with the intercept forced to be zero, to match the fact that the two variables vanish at the origin.
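The two kinds of regression described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the zero-intercept fit corresponds to Cases 1 to 3, and the log-log fit is an assumed form for Case 4 that matches the long-axis-weight relationship of Equation (20). The data in the usage lines are made up.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope c of y = c * x with the intercept forced to zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def loglog_fit(long_axes, weights):
    """Ordinary least squares for log10(W) = k * log10(X) + b; returns (k, b)."""
    lx = [math.log10(v) for v in long_axes]
    lw = [math.log10(v) for v in weights]
    mx = sum(lx) / len(lx)
    mw = sum(lw) / len(lw)
    k = sum((a - mx) * (c - mw) for a, c in zip(lx, lw)) / sum((a - mx) ** 2 for a in lx)
    return k, mw - k * mx


if __name__ == "__main__":
    # Made-up measurements: long axes X, horizontal minor axes Y, weights W
    xs = [2.0, 3.5, 4.1, 5.0, 6.2]
    ys = [1.7, 2.9, 3.2, 4.1, 5.0]
    ws = [4.0, 19.0, 30.0, 52.0, 95.0]
    print("eps1 estimate:", slope_through_origin(xs, ys))
    print("k, b estimate:", loglog_fit(xs, ws))
```

Under the "idealized nodule" model the zero-intercept slopes of Cases 1 and 2 play the role of the axis ratios $ε_{1}$ and $ε_{2}$.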

The results from Cases 1 and 2 show that the two minor axes are correlated to the long axis in statistically significant ways. With R² > 95%, a great majority (>95%) of the data points supports the hypothesis that, within a certain boundary, the ratios between the length of the major axis X and the lengths of the horizontal and vertical minor axes Y and Z can be considered as constants. It is also noted that the ratios vary between regions. The result from Case 3 indicates a nodule density of 1.93 g/cm³, although it is only supported by a small sample. Case 4 indicates that the nodule weight is strongly correlated with the 2.5056th and 2.7210th powers of the long axis, supported by about 88% and 93% of the data points from the two regions, respectively. It is worth noting that, for “idealized nodules”, the nodule weight is exactly proportional to the cube (the 3rd power) of the long axis.

While the above results give broad, yet strong, support to the first two hypotheses for “idealized nodules” in Section 2 (ellipsoidal form and similarity of form within a domain), additional analysis is used in Section 5.2 to validate the 3rd hypothesis regarding the GRD.

5.2. Test of Hypothesis 3: Goodness-of-Fit Test of Generalized Rayleigh Distribution for Nodule Long Axes

In this Section, the traditional Anderson–Darling goodness-of-fit tests, as outlined in Section 3, are carried out to check the validity of the 3rd hypothesis for the “idealized nodules”. The aim is to check whether the long axes of seabed nodules follow the Generalized Rayleigh Distribution (GRD) in a statistically significant way. A total of 9 samples (5 towed photos and 4 washed samples) of nodule long axes were analysed. Table 2 shows the key statistical properties of the 9 data sets, and Table 3 presents the results of the goodness-of-fit tests.

For each set of data in Table 3, $α$ and $β$ are solved iteratively from Equations (5) and (6), using the MLE method described in Section 3.2.1. The Anderson–Darling (AD) test statistics $A_{n}^{2}$ and $V_{n}^{2}$ are then calculated by Equation (7) and compared with their critical values $y_{γ}$ and $u_{γ}$, which are computed by the Monte Carlo simulation in Appendix B. According to the test criteria in Equation (8), 7 of the 9 sets of samples passed the AD tests at the 95% confidence level. This indicates a high probability that the long axes of polymetallic nodules do follow the Generalized Rayleigh Distribution, although AD tests on more samples are needed to check the generality. This conclusion may be conditional (e.g., by geological domain), and more research is needed to identify the conditions.

Figure 7 and Figure 8 show a visual comparison of the distributions of nodule long axes obtained from the raw data, by the traditional method, and by the new empirical method. Probability density functions (PDF) and cumulative distribution functions (CDF) are plotted in Figure 7 and Figure 8, respectively, using Sample ID: 2015_08_29_131349 as an example. For the raw data, a bin size of 0.25 was selected to create a histogram of the sample of long axes, from which the PDF and the CDF were computed. The dark blue line shows the raw counts of the original data set, with the light blue line showing the data smoothed using the Savitzky–Golay filter [22] for the PDF in Figure 7. The green line shows the PDF and CDF based on the parameters $α$ and $β$ calculated iteratively by the MLE method described in Section 3.2.1. The red line shows the PDF and CDF based on the parameters $α$ and $β$ calculated by the empirical formulae in Equation (11). The empirical formulae, while much more straightforward to use, give reasonably accurate results for the statistical distributions in practical applications.

5.3. Comments on the Level of Support

Significantly, the linear regression analysis in Section 5.1 and the goodness-of-fit test in Section 5.2 do support the three seemingly drastic fundamental hypotheses made in Section 2 for “idealized nodules”. However, more analyses are needed to check the generality. Nonetheless, as the hypotheses have been validated for the available samples, the empirical method developed in Section 4 can be used for nodule resource prediction in the next Section.

Possible reasons for achieving the statistically significant validations include that the samples are located within a particular growth domain (the CCZ), and that the conditions of growth within this domain are remarkably consistent, as nodules grow in effect from the ocean’s epibenthos and slowly enough to “average out” short-term (millennia-scale) variances in growth conditions.

6. Numerical Results of Nodule Resource Prediction Using the New Empirical Method

Now that the newly proposed “idealized nodule” model has been validated in the previous Section, the empirical formulation developed in Section 4, particularly Equation (19) (or Equation (25)) for nodule abundance $A_{N}$, is applied to a total of 188 samples of seabed polymetallic nodules collected in 2015 as part of the TOML CCZ15 marine expedition.

6.1. Sample Datasets

The 188 samples of seabed polymetallic nodules used for the empirical analyses can be grouped in three datasets:

1. Dataset 1: regional scale box-core sample dataset (physical weights). This involves four TOML exploration contract areas (TOML B, C, D, F; Figure 9) spanning some 2000 km of longitude and 700 km of latitude. The dataset thus allows for examination of a general relationship.
2. Dataset 2: local scale box-core sample dataset (physical weights) of two distinct facies types but only within the TOML F area (~200 × 200 km). Type 1 nodules are smaller and often densely packed; type 2 nodules are significantly larger and more variable (cf. [5]). The dataset thus allows for examination of differences between nodule types in an area where the distinction between types is simple and straightforward.
3. Dataset 3: two local scale towed photo sample datasets (long-axis abundance estimate) from the TOML B and C areas (~300 km apart). The dataset is limited in that actual nodule weights cannot be compared, but it allows for larger datasets from two distinctly different areas to be compared.

Coverage was also measured for datasets 2 and 3 from seabed photographs (box-core mounted and towed, respectively) using ImageJ software. Coverage could not be measured for Dataset 1 due to a lack of images (the box-core camera had frequently malfunctioned).

A summary of the datasets used for analysis is shown in Table 4 and Figure 9.

6.2. Prediction of Abundance of Seabed Nodules

To make a thorough assessment of the accuracy of the abundance predictions made by the new empirical method, particularly Equation (19) (or Equation (25)), Figure 10, Figure 11 and Figure 12 show the ratios between the predicted abundances and those from the actual box-core measurements (for datasets 1 and 2) or the available estimates from long-axis measurements (for dataset 3). In the Figures, ratios are plotted against the mean long axis of each sample. For each dataset three charts are presented:

Chart (a) showing ratios based on abundance calculated directly by the empirical formula Equation (19), which is strictly based on the three hypotheses for the idealized nodule model in Section 2. The axis ratios $ε_{1}$ and $ε_{2}$ used in the formula are extracted from the analyses of BGR East and GSR Central data (Table 1 in Section 5.1);
Chart (b) showing the ratios in Chart (a) corrected by a “linear adjustment”. Each individual ratio in Chart (a) is divided by the result of a linear regression of the ratios, and the corrected results are shown in Chart (b); and
Chart (c) showing ratios based on abundance calculated by the empirical formula Equation (25), incorporating the long-axis-weight relationship observed by several researchers (e.g., Felix [13]), which indicates that the nodule weight is correlated to the 2.7th–2.8th power of its long axis (noting that for “idealized nodules” it is the 3rd power).
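One way to read the “linear adjustment” of Chart (b) is sketched below in Python. This is an interpretation and not necessarily the authors' exact procedure: it assumes that the predicted-to-measured ratios are regressed linearly on the mean long axis of each sample by ordinary least squares, and that each ratio is then divided by its fitted value. The function name and the example numbers are hypothetical.

```python
def linear_adjustment(mean_long_axes, ratios):
    """Divide each ratio by the value of a straight-line fit of ratio vs. mean long axis.

    Returns (adjusted_ratios, intercept, slope). The straight line is fitted by
    ordinary least squares, which is an assumption made for this sketch.
    """
    n = len(ratios)
    mx = sum(mean_long_axes) / n
    mr = sum(ratios) / n
    sxx = sum((x - mx) ** 2 for x in mean_long_axes)
    slope = sum((x - mx) * (r - mr) for x, r in zip(mean_long_axes, ratios)) / sxx
    intercept = mr - slope * mx
    adjusted = [r / (intercept + slope * x) for x, r in zip(mean_long_axes, ratios)]
    return adjusted, intercept, slope


if __name__ == "__main__":
    # Made-up ratios that drift upward with nodule size
    axes = [2.5, 3.0, 4.0, 5.5, 7.0]
    ratios = [0.95, 1.02, 1.08, 1.21, 1.30]
    adjusted, a, c = linear_adjustment(axes, ratios)
    print(a, c, adjusted)
```

After the adjustment, the ratios scatter around 1 without a trend in nodule size, which is the behaviour described for Chart (b).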

Figure 10 shows the results for Dataset 1, which was collected across the TOML areas. It is observed that the empirical formula Equation (19), which is strictly based on the three hypotheses for the idealized nodule model in Section 2, gives a slight bias of over-predicting the abundance for larger nodules. Unbiased prediction can be achieved once either the “linear adjustment” or the long-axis-weight relationship is applied.

Figure 11, depicting Dataset 2, shows that nodule types can have an influence on the prediction. While the empirical formula Equation (19) results in a similar bias to that seen for Dataset 1, a facies-specific “linear adjustment” results in unbiased estimates with slightly higher levels of scatter. Use of the empirical formula Equation (25) based on the coefficients of [13] works better for Type 1 nodules than for Type 2, suggesting that different coefficients may work better for Type 2.

Figure 12, with Dataset 3, shows that the accuracy of the prediction made by the empirical formula Equation (19) may vary for nodules from different areas. In effect, the nodules from TOML B1 appear to deviate more from the “idealized nodules” of Section 2 than the nodules from TOML C1. This may be due to the fact that the TOML B1 nodules are likely older and more often formed from multiple generations of growth (i.e., fragments of nodules with younger concentric growth phases). This could predispose them to be more equant in shape. Again, the linear adjustment addresses the size bias seen in the direct application of Equation (19). Application of Equation (25) gives results in (c) that are broadly similar to those with the “linear adjustment” in (b).

The slight bias of over-estimating the abundance for larger nodules reveals a limitation of the empirical formula Equation (19), which is directly based on the three fundamental hypotheses for the “idealized nodules”. While the first two hypotheses state that the nodules are ellipsoidal and “similar” in shape, in reality the nodule shapes are complex and the ratios between the minor axes and the long axis may vary with nodule size. However, this bias seems much less severe when the empirical formula Equation (25) is applied, which is based on an empirical relationship between the nodule long axis and its weight (e.g., Felix [13]).

Estimates of coverage using the new empirical method show mixed but encouraging results when compared with field measurements from three areas (Figure 13). Dataset 2 from TOML F shows a systematic bias independent of nodule facies type. In contrast, Dataset 3 from TOML B and C does not display any appreciable bias. This is likely related to the difference in the degree of clay-ooze sediment cover between the areas (Figure 14).

7. Conclusions

It is concluded that:

There is statistically significant evidence that the forms of CCZ polymetallic nodules resemble an “idealized nodule” model based on three hypotheses: (1) broadly ellipsoidal shape, (2) similar forms between nodules in a given area and (3) nodule long axes that follow a two-parameter Generalized Rayleigh Distribution (GRD). These three hypotheses were tested using field measurements from available nodule samples collected from the CCZ. Numerical evidence supports the three hypotheses, possibly due to the relatively stable seabed environment and the long growth period of the nodules removing short-term transient effects.
The distribution of nodule sizes and associated parameters can be estimated using empirical formulae. Specifically, explicit empirical formulae have been derived for direct calculation of the GRD parameters α and β (Equation (11)), for percentage coverage C_N (Equation (15)), and for abundance A_N (Equation (19) or Equation (25)). These formulae are found to be sufficiently accurate for mineral resource estimation and are much easier to use than the traditional analytical methods for the GRD.
The direct application of the formula for A_N does display a slight bias of over-estimating the abundance for larger nodules. However, unbiased accurate prediction of nodule abundance can be achieved by applying either a “linear adjustment” or a long-axis-weight relationship.
Coverage estimates from the new empirical method agree closely with field measurements for two of the TOML areas (B and C), but show a consistent offset for the third (F). This may be related to the degree of clay-ooze sediment cover in that third area. Analyses of samples from other regions will be needed to better understand the generality of the empirical model and its derived formulae. Such analysis is needed in any event to calibrate the model in other areas.
The new empirical method with derived explicit formulae has shown the potential of achieving more accurate mineral resource estimation with reduced sample numbers and sizes. The new understanding of the nodule size distribution can likely also improve the efficiency of design and configuration of mining equipment with limitations regarding particle size.

Author Contributions

Conceptualization, G.Y. and J.P.; derivation of formulae, G.Y.; sample collection and selection, J.P.; software and modelling, G.Y.; validation, G.Y. and J.P.; formal analysis, G.Y.; data curation, J.P. Both authors drafted sections and figures and discussed and reviewed each other’s contributions. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

This research was completed by the authors while under the employment of Nautilus Minerals Pacific, and their allocation of time to spend on the subject is gratefully acknowledged. The sample images used are property of Tonga Offshore Mining Limited and their permission to use the images for this research is also gratefully acknowledged. The authors would also like to thank their wives, Yuet Terry Ting and Nicola Parianos for their encouragement and support during the course of this work.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Functions $F_{1} (α)$ , $F_{2} (α)$ and $F_{3} (α)$

Consider a random sample $X_{1}, X_{2}, \dots, X_{N}$ of size $N$ that follows the Generalized Rayleigh Distribution (GRD), with probability density function (PDF) $f(x)$ of the form:

$$f(x) = 2 α β^{2} x e^{-(β x)^{2}} \left[1 - e^{-(β x)^{2}}\right]^{α - 1}, \quad x > 0 \tag{A1}$$

Its mean value $\bar{X}$, or $μ$, can be expressed in the integral form below:

$$\bar{X} = μ = \frac{1}{N} \sum_{i=1}^{N} X_{i} = \int_{0}^{\infty} x f(x)\, dx = 2 α β^{2} \int_{0}^{\infty} x^{2} e^{-(β x)^{2}} \left[1 - e^{-(β x)^{2}}\right]^{α - 1} dx \tag{A2}$$

Similarly, the means of the squares and cubes, $\overline{X^{2}}$ and $\overline{X^{3}}$, can be written, respectively, as:

$$\overline{X^{2}} = \frac{1}{N} \sum_{i=1}^{N} X_{i}^{2} = \int_{0}^{\infty} x^{2} f(x)\, dx = 2 α β^{2} \int_{0}^{\infty} x^{3} e^{-(β x)^{2}} \left[1 - e^{-(β x)^{2}}\right]^{α - 1} dx \tag{A3}$$

and:

$$\overline{X^{3}} = \frac{1}{N} \sum_{i=1}^{N} X_{i}^{3} = \int_{0}^{\infty} x^{3} f(x)\, dx = 2 α β^{2} \int_{0}^{\infty} x^{4} e^{-(β x)^{2}} \left[1 - e^{-(β x)^{2}}\right]^{α - 1} dx \tag{A4}$$

Equations (A2)–(A4) can then be combined formally as below:

$$\overline{X^{m}} = \frac{1}{N} \sum_{i=1}^{N} X_{i}^{m} = \int_{0}^{\infty} x^{m} f(x)\, dx \tag{A5}$$

Substituting $z = (β x)^{2}$, so that $dz = 2 β^{2} x\, dx$, Equation (A5) becomes:

$$\overline{X^{m}} = 2 α β^{2} \int_{0}^{\infty} x^{m + 1} e^{-(β x)^{2}} \left[1 - e^{-(β x)^{2}}\right]^{α - 1} dx = α \int_{0}^{\infty} \frac{z^{\frac{m}{2}}}{β^{m}} e^{-z} \left[1 - e^{-z}\right]^{α - 1} dz = \frac{α}{β^{m}} \int_{0}^{\infty} z^{\frac{m}{2}} e^{-z} \left[1 - e^{-z}\right]^{α - 1} dz \tag{A6}$$

Further defining:

$$F_{m}(α) = \int_{0}^{\infty} z^{\frac{m}{2}} e^{-z} \left[1 - e^{-z}\right]^{α - 1} dz, \quad m = 1, 2, 3, \tag{A7}$$

Equation (A6) can be written as:

$$\overline{X^{m}} = \frac{α}{β^{m}} F_{m}(α) \tag{A8}$$

From Equation (A8), when $m = 1$, the mean $μ$ of the sample is:

$$μ = \frac{α}{β} F_{1}(α) \tag{A9}$$

The variance $σ^{2}$ of the Generalized Rayleigh Distribution can be calculated as:

$$σ^{2} = \int_{0}^{\infty} (x - μ)^{2} f(x)\, dx = \int_{0}^{\infty} x^{2} f(x)\, dx - 2 μ \int_{0}^{\infty} x f(x)\, dx + μ^{2} \int_{0}^{\infty} f(x)\, dx = \int_{0}^{\infty} x^{2} f(x)\, dx - 2 μ^{2} + μ^{2} = \int_{0}^{\infty} x^{2} f(x)\, dx - μ^{2} \tag{A10}$$

or:

$$σ^{2} + μ^{2} = \int_{0}^{\infty} x^{2} f(x)\, dx = \overline{X^{2}} = \frac{α}{β^{2}} F_{2}(α) \tag{A11}$$

Combining Equations (A9) and (A11) gives:

$$\frac{σ^{2} + μ^{2}}{μ^{2}} = \frac{\frac{α}{β^{2}} F_{2}(α)}{\frac{α^{2}}{β^{2}} [F_{1}(α)]^{2}} = \frac{F_{2}(α)}{α [F_{1}(α)]^{2}} \tag{A12}$$

Defining:

$$G(α) = \frac{F_{2}(α)}{α [F_{1}(α)]^{2}} - 1 \tag{A13}$$

Equation (A12) can be rewritten as:

$$\frac{σ^{2}}{μ^{2}} = G(α) \tag{A14}$$

or:

$$σ = μ \sqrt{G(α)} \tag{A15}$$
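As a quick consistency check of Equations (A13) to (A15), consider the special case $α = 1$, for which the PDF in Equation (A1) reduces to the classical one-parameter Rayleigh distribution. The worked example below evaluates $F_{1}(1)$ and $F_{2}(1)$ directly from Equation (A7); it is an illustrative sketch with rounded numerical values, not part of the original derivation.

```latex
\begin{aligned}
F_{1}(1) &= \int_{0}^{\infty} z^{1/2} e^{-z}\,dz = \Gamma\!\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2},
\qquad
F_{2}(1) = \int_{0}^{\infty} z\,e^{-z}\,dz = \Gamma(2) = 1, \\
G(1) &= \frac{F_{2}(1)}{1 \cdot [F_{1}(1)]^{2}} - 1 = \frac{4}{\pi} - 1 \approx 0.2732, \\
\frac{\sigma}{\mu} &= \sqrt{G(1)} \approx 0.5227,
\qquad
\mu = \frac{F_{1}(1)}{\beta} = \frac{\sqrt{\pi}}{2\beta}.
\end{aligned}
```

This matches the known coefficient of variation of the Rayleigh distribution. Provided $G(α)$ decreases with $α$ (an assumption here, though one consistent with the samples in Tables 2 and 3), a coefficient of variation below about 0.52 corresponds to $α > 1$.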

From Equation (A7), the functions $F_{1}(α)$, $F_{2}(α)$ and $F_{3}(α)$ can be expressed as:

$$F_{m}(α) = \int_{0}^{\infty} z^{\frac{m}{2}} e^{-z} \left[1 - e^{-z}\right]^{α - 1} dz, \quad m = 1, 2, 3 \tag{A16}$$

Using the generalized Binomial Theorem, Equation (A16) gives:

$$F_{m}(α) = \int_{0}^{\infty} z^{\frac{m}{2}} e^{-z} \left[\sum_{k=0}^{\infty} \binom{α - 1}{k} 1^{(α - 1) - k} (-e^{-z})^{k}\right] dz = \sum_{k=0}^{\infty} \binom{α - 1}{k} \int_{0}^{\infty} z^{\frac{m}{2}} e^{-z} (-e^{-z})^{k}\, dz = \sum_{k=0}^{\infty} \binom{α - 1}{k} (-1)^{k} \int_{0}^{\infty} z^{\frac{m}{2}} e^{-(k + 1) z}\, dz \tag{A17}$$

Substituting $y = (k + 1) z$, the integral in Equation (A17) becomes:

$$\int_{0}^{\infty} z^{\frac{m}{2}} e^{-(k + 1) z}\, dz = \int_{0}^{\infty} \left(\frac{y}{k + 1}\right)^{\frac{m}{2}} e^{-y}\, d\!\left(\frac{y}{k + 1}\right) = \frac{1}{(k + 1)^{\frac{m}{2} + 1}} \int_{0}^{\infty} y^{\frac{m}{2}} e^{-y}\, dy = \frac{1}{(k + 1)^{\frac{m}{2} + 1}} Γ\!\left(\frac{m}{2} + 1\right) \tag{A18}$$

where $Γ(\frac{m}{2} + 1)$ is the Gamma function. Inserting Equation (A18) into Equation (A17) gives:

$$F_{m}(α) = \sum_{k=0}^{\infty} \binom{α - 1}{k} (-1)^{k} \frac{Γ(\frac{m}{2} + 1)}{(k + 1)^{\frac{m}{2} + 1}} = Γ\!\left(\frac{m}{2} + 1\right) \sum_{k=0}^{\infty} \frac{(α - 1)!}{k!\, (α - 1 - k)!} \frac{(-1)^{k}}{(k + 1)^{\frac{m}{2} + 1}} = Γ\!\left(\frac{m}{2} + 1\right) \sum_{k=0}^{\infty} \frac{(α - 1)(α - 2) \dots [(α - 1 - k) + 1]}{(k + 1)!} \frac{(-1)^{k}}{(k + 1)^{\frac{m}{2}}} = Γ\!\left(\frac{m}{2} + 1\right) \sum_{k=0}^{\infty} (-1)^{k} \frac{(α - 1)(α - 2) \dots (α - k)}{(k + 1)!\, (k + 1)^{\frac{m}{2}}} \tag{A19}$$

Noting that, for $m = 1$, 2 and 3,

$$Γ\!\left(\frac{1}{2} + 1\right) = Γ\!\left(\frac{3}{2}\right) = \frac{\sqrt{π}}{2}, \quad Γ\!\left(\frac{2}{2} + 1\right) = Γ(2) = 1 \quad \text{and} \quad Γ\!\left(\frac{3}{2} + 1\right) = Γ\!\left(\frac{5}{2}\right) = \frac{3}{4} \sqrt{π} \tag{A20}$$

respectively, $F_{1}(α)$, $F_{2}(α)$ and $F_{3}(α)$ can then be expressed as the infinite series:

$$F_{1}(α) = \frac{\sqrt{π}}{2} \sum_{k=0}^{\infty} (-1)^{k} \frac{(α - 1)(α - 2) \dots (α - k)}{(k + 1)!\, \sqrt{k + 1}} = \frac{\sqrt{π}}{2} \left[1 - \frac{(α - 1)}{\sqrt{2} \cdot 2!} + \frac{(α - 1)(α - 2)}{\sqrt{3} \cdot 3!} - \frac{(α - 1)(α - 2)(α - 3)}{\sqrt{4} \cdot 4!} + \dots\right] \tag{A21}$$

and:

$$F_{2}(α) = \sum_{k=0}^{\infty} (-1)^{k} \frac{(α - 1)(α - 2) \dots (α - k)}{(k + 1)!\, (k + 1)} = \left[1 - \frac{(α - 1)}{2 \cdot 2!} + \frac{(α - 1)(α - 2)}{3 \cdot 3!} - \frac{(α - 1)(α - 2)(α - 3)}{4 \cdot 4!} + \dots\right] \tag{A22}$$

and:

$$F_{3}(α) = \frac{3}{4} \sqrt{π} \sum_{k=0}^{\infty} (-1)^{k} \frac{(α - 1)(α - 2) \dots (α - k)}{(k + 1)!\, (k + 1)^{\frac{3}{2}}} = \frac{3}{4} \sqrt{π} \left[1 - \frac{(α - 1)}{2 \sqrt{2} \cdot 2!} + \frac{(α - 1)(α - 2)}{3 \sqrt{3} \cdot 3!} - \frac{(α - 1)(α - 2)(α - 3)}{4 \sqrt{4} \cdot 4!} + \dots\right] \tag{A23}$$

The above infinite series are particularly useful for calculating $F_{1}(α)$, $F_{2}(α)$ and $F_{3}(α)$ when $α$ is a positive integer. In this case, the $(α + 1)$-th and all subsequent terms are zero, so only the first $α$ terms need to be included in the calculation.
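The finite sum for integer $α$ can be checked against a direct numerical evaluation of Equation (A7). The sketch below is illustrative only: the function names `f_m_series` and `f_m_midpoint` are hypothetical, the quadrature is a plain midpoint rule truncated at $z = 50$ (an assumption chosen so that the neglected tail is negligible), and it is intended for small integer $α$, where cancellation in the alternating sum is not a concern.

```python
import math


def f_m_series(m, alpha):
    """F_m(alpha) from the series in Equation (A19), for a positive integer alpha.

    Only the first alpha terms (k = 0 .. alpha - 1) are non-zero.
    """
    total = 0.0
    coeff = 1.0  # running product (alpha-1)(alpha-2)...(alpha-k); empty product at k = 0
    for k in range(alpha):
        if k > 0:
            coeff *= alpha - k
        total += (-1) ** k * coeff / (math.factorial(k + 1) * (k + 1) ** (m / 2))
    return math.gamma(m / 2 + 1) * total


def f_m_midpoint(m, alpha, upper=50.0, steps=50000):
    """F_m(alpha) from Equation (A7) by the midpoint rule on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        one_minus = -math.expm1(-z)  # 1 - exp(-z), computed stably for small z
        total += z ** (m / 2) * math.exp(-z) * one_minus ** (alpha - 1)
    return total * h


if __name__ == "__main__":
    for alpha in (1, 2, 3, 4):
        for m in (1, 2, 3):
            print(alpha, m, f_m_series(m, alpha), f_m_midpoint(m, alpha))
```

For $m = 2$ the series can also be checked by hand: with $α = 2$, Equation (A22) gives $1 - 1/4 = 0.75$.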

Appendix B. Computation of Critical Values for Goodness-of-Fit Test of Generalized Rayleigh Distribution by Monte Carlo Simulations

The critical values $y_{γ}$ and $u_{γ}$ of the test statistics $A_{n}^{2}$ and $V_{n}^{2}$ can be computed as their percentage points using Monte Carlo simulations. Given the parameters $α$ and $β$ and the sample size $n$, the numerical procedure consists of the following steps:

1. A set of $n$ random numbers is generated in the interval $(0, 1)$ as values of the cumulative distribution function (CDF). Equation (2) is used to back-calculate a sample $X_{1}, X_{2}, \dots, X_{n}$ of size $n$ for the given $α$ and $β$.
2. For the sample $X_{1}, X_{2}, \dots, X_{n}$, Equations (5) and (6), based on the MLE method, are solved iteratively to estimate the parameters $\hat{α}$ and $\hat{β}$.
3. Parameters $\hat{α}$ and $\hat{β}$ are used in Equation (2) to calculate $z_{i} = F(x_{i})$, with values in ascending order.
4. Equation (7) is used to calculate the test statistics $A_{n}^{2}$ and $V_{n}^{2}$, using the values of $z_{i}$ calculated in step 3.
5. Steps 1 to 4 above are repeated to generate a sample of $A_{n}^{2}$ and $V_{n}^{2}$ values.
6. The percentiles of $A_{n}^{2}$ and $V_{n}^{2}$ are calculated as critical values. The $100(1 - γ)$th percentile is taken as the critical value for the level of significance $γ$.

For each set of critical values for given parameters $α$ and $β$ and sample size $n$, 250,000 Monte Carlo simulations are carried out to ensure convergence to two digits. The simulated results are presented in Table A1, Figure A1, and Figure A2 below. In Figure A1 and Figure A2, results for both 200,000 and 250,000 simulations are shown, and the relative error is within 0.1%.
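The procedure above can be sketched in a few dozen lines. The code below is a minimal illustration under several assumptions, not the implementation used for Table A1: the CDF $F(x) = [1 - e^{-(βx)^{2}}]^{α}$ is obtained by integrating Equation (A1); the MLE is found by a golden-section search on the profile log-likelihood in $β$ (assuming a single interior maximum within a heuristic bracket) rather than by iterating Equations (5) and (6); only the standard Anderson–Darling statistic is computed, on the assumption that it corresponds to $A_{n}^{2}$ in Equation (7); and $V_{n}^{2}$ is omitted. All function names are hypothetical.

```python
import math
import random


def sample_grd(n, alpha, beta, rng):
    """Step 1: inverse-CDF sampling from F(x) = (1 - exp(-(beta*x)^2))^alpha."""
    xs = []
    while len(xs) < n:
        v = rng.random() ** (1.0 / alpha)
        if 0.0 < v < 1.0:  # skip the rare draws that round to exactly 0 or 1
            xs.append(math.sqrt(-math.log1p(-v)) / beta)
    return xs


def fit_grd_mle(xs):
    """Step 2: maximise the profile log-likelihood in beta by golden-section search."""
    n = len(xs)
    sum_sq = sum(x * x for x in xs)
    rms = math.sqrt(sum_sq / n)

    def log_sum(beta):  # sum of ln(1 - exp(-(beta*x)^2)), always negative
        return sum(math.log(-math.expm1(-(beta * x) ** 2)) for x in xs)

    def profile(beta):  # log-likelihood (up to a constant) with alpha = -n / log_sum(beta)
        s = log_sum(beta)
        return n * math.log(-n / s) + 2 * n * math.log(beta) - beta * beta * sum_sq - s

    lo, hi = 0.1 / rms, 5.0 / rms  # heuristic bracket; an assumption of this sketch
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = profile(c), profile(d)
    for _ in range(40):
        if fc > fd:  # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = profile(c)
        else:  # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = profile(d)
    beta = 0.5 * (lo + hi)
    return -n / log_sum(beta), beta


def anderson_darling(xs, alpha, beta):
    """Steps 3-4: standard Anderson-Darling statistic from the fitted CDF values."""
    n = len(xs)
    log_z = sorted(alpha * math.log(-math.expm1(-(beta * x) ** 2)) for x in xs)
    total = 0.0
    for i in range(1, n + 1):
        log_one_minus = math.log(-math.expm1(log_z[n - i]))  # ln(1 - z_{n+1-i})
        total += (2 * i - 1) * (log_z[i - 1] + log_one_minus)
    return -n - total / n


def critical_value(alpha, beta, n, gamma, reps, seed=0):
    """Steps 5-6: the 100(1 - gamma)th percentile of the simulated statistic."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        xs = sample_grd(n, alpha, beta, rng)
        a_hat, b_hat = fit_grd_mle(xs)
        stats.append(anderson_darling(xs, a_hat, b_hat))
    stats.sort()
    k = min(reps - 1, max(0, math.ceil((1.0 - gamma) * reps) - 1))
    return stats[k]


if __name__ == "__main__":
    # Far fewer repetitions than the 250,000 used for Table A1, so the result is noisy.
    print(critical_value(alpha=2.0, beta=1.0, n=100, gamma=0.05, reps=2000))
```

With only a few thousand repetitions the estimated percentile carries visible sampling noise, which is why a much larger number of simulations is needed before two-digit convergence can be expected.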

Table A1. Critical values $y_{γ}$ and $u_{γ}$ for various shape parameters $α$ and sample sizes $n$.

Shape Parameter α			1.0			1.5			2.0			2.5			3.0			3.5			4.0
Significance Level γ			10%	5%	1%	10%	5%	1%	10%	5%	1%	10%	5%	1%	10%	5%	1%	10%	5%	1%	10%	5%	1%
Sample Size n	100	$y_{γ}$	0.66	0.79	1.10	0.65	0.78	1.08	0.64	0.77	1.07	0.64	0.76	1.06	0.64	0.76	1.05	0.64	0.76	1.05	0.64	0.76	1.05
	100	$u_{γ}$	0.34	0.41	0.58	0.34	0.41	0.57	0.33	0.41	0.57	0.33	0.40	0.57	0.33	0.40	0.57	0.33	0.41	0.57	0.34	0.41	0.57
	200	$y_{γ}$	0.66	0.79	1.11	0.65	0.78	1.08	0.65	0.77	1.07	0.64	0.77	1.05	0.64	0.76	1.05	0.64	0.76	1.06	0.64	0.76	1.05
	200	$u_{γ}$	0.34	0.41	0.58	0.34	0.41	0.58	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.40	0.57	0.34	0.41	0.57	0.34	0.41	0.57
	300	$y_{γ}$	0.66	0.79	1.10	0.65	0.78	1.08	0.65	0.77	1.08	0.64	0.77	1.06	0.64	0.76	1.05	0.64	0.76	1.05	0.64	0.76	1.06
	300	$u_{γ}$	0.34	0.41	0.58	0.34	0.41	0.58	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.41	0.57
	400	$y_{γ}$	0.66	0.80	1.11	0.65	0.78	1.08	0.65	0.77	1.07	0.64	0.77	1.06	0.64	0.76	1.06	0.64	0.76	1.06	0.64	0.76	1.05
	400	$u_{γ}$	0.34	0.41	0.59	0.34	0.41	0.58	0.34	0.41	0.58	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.41	0.57
	500	$y_{γ}$	0.66	0.80	1.11	0.65	0.78	1.09	0.64	0.77	1.07	0.64	0.77	1.06	0.64	0.77	1.06	0.64	0.76	1.06	0.64	0.76	1.05
	500	$u_{γ}$	0.34	0.41	0.59	0.34	0.41	0.58	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.41	0.57	0.34	0.41	0.57

Figure A1. Critical values $y_{γ}$ versus shape parameters $α$ and sample sizes $n$ with 200,000 and 250,000 simulations.

Figure A2. Critical values $u_{γ}$ versus shape parameters $α$ and sample sizes $n$ with 200,000 and 250,000 simulations.

Appendix C. Determination of Minimal Sample Size for Estimation of Statistical Distribution Using Monte Carlo Simulations

It is of practical importance to estimate the minimal sample size necessary to achieve the desired accuracy of estimation. To test whether a sample size of 100 is sufficient to yield an accurate estimate of the statistical distribution, 500 subsets of size 100 were randomly selected from the whole sample of size 440. For each subset, the parameters $α$ and $β$ were computed empirically using Equation (11), and a corresponding GRD was generated.

For a typical simulation, the mean values of $α$ and $β$, calculated by averaging the results from the 500 simulations, are within 4% of the values of $α$ and $β$ calculated from the whole sample (size = 440). The maximum errors for $α$ and $β$ between a small sample (size = 100) and the whole sample (size = 440) are about 10% and 5%, respectively. Figure A3 and Figure A4 show CDFs and PDFs from 10 simulations (randomly selected from the total of 500), compared with those calculated from the whole sample (size = 440). The reasonably good agreement between the results from samples of size 100 and those from the whole sample (size = 440) indicates that a smaller sample size of 100 can generally produce a good estimate of the statistical distribution.
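The subsampling experiment can be reproduced in outline as follows. This is a sketch under stated assumptions rather than the authors' calculation: the measured long axes are not available here, so a synthetic "whole sample" of 440 values is drawn from a GRD (the values $α = 1.965$ and $β = 0.210$ are taken from Table 3 purely as plausible inputs); and instead of the empirical formulae of Equation (11), the parameters are estimated by moment matching, inverting $G(α)$ from Equation (A13) with a lookup table and then applying Equation (A9). All function names are hypothetical, and the inversion assumes $G(α)$ decreases monotonically over the tabulated range.

```python
import math
import random
import statistics


def f_m(m, alpha, upper=7.0, steps=400):
    """F_m(alpha) of Equation (A7), via Simpson's rule after substituting z = t**2."""
    def g(t):
        if t == 0.0:
            return 0.0  # the integrand vanishes at t = 0 for alpha >= 0.5 and m >= 1
        one_minus = -math.expm1(-t * t)  # 1 - exp(-t^2)
        return 2.0 * t ** (m + 1) * math.exp(-t * t) * one_minus ** (alpha - 1.0)
    h = upper / steps
    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0


# Lookup table of G(alpha), Equation (A13), on a log-spaced grid over [0.5, 20].
ALPHAS = [0.5 * 40.0 ** (i / 199.0) for i in range(200)]
G_TABLE = [f_m(2, a) / (a * f_m(1, a) ** 2) - 1.0 for a in ALPHAS]


def estimate_grd(xs):
    """Moment-based (alpha, beta) from Equations (A14) and (A9); stand-in for Equation (11)."""
    mu = statistics.fmean(xs)
    g = statistics.pvariance(xs, mu) / mu ** 2  # squared coefficient of variation
    g = min(max(g, G_TABLE[-1]), G_TABLE[0])    # clamp to the tabulated range
    i = 0
    while i < len(ALPHAS) - 2 and G_TABLE[i + 1] > g:
        i += 1
    w = (G_TABLE[i] - g) / (G_TABLE[i] - G_TABLE[i + 1])
    alpha = ALPHAS[i] + w * (ALPHAS[i + 1] - ALPHAS[i])  # linear interpolation
    beta = alpha * f_m(1, alpha) / mu
    return alpha, beta


def sample_grd(n, alpha, beta, rng):
    """Inverse-CDF sampling from F(x) = (1 - exp(-(beta*x)^2))^alpha."""
    xs = []
    while len(xs) < n:
        v = rng.random() ** (1.0 / alpha)
        if 0.0 < v < 1.0:
            xs.append(math.sqrt(-math.log1p(-v)) / beta)
    return xs


def subsample_experiment(whole, size=100, reps=500, seed=1):
    """Compare estimates from random subsets with the whole-sample estimate."""
    rng = random.Random(seed)
    a_all, b_all = estimate_grd(whole)
    fits = [estimate_grd(rng.sample(whole, size)) for _ in range(reps)]
    mean_a = statistics.fmean(a for a, _ in fits)
    mean_b = statistics.fmean(b for _, b in fits)
    max_err_a = max(abs(a / a_all - 1.0) for a, _ in fits)
    max_err_b = max(abs(b / b_all - 1.0) for _, b in fits)
    return (a_all, b_all), (mean_a, mean_b), (max_err_a, max_err_b)


if __name__ == "__main__":
    whole = sample_grd(440, 1.965, 0.210, random.Random(0))  # synthetic stand-in data
    print(subsample_experiment(whole))
```

Because both the data and the estimator differ from those used in the paper, the deviations printed by this sketch should not be expected to reproduce the quoted percentages; it only illustrates the structure of the test.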

Figure A3. CDF simulated using subsets of sample with Size = 100.

Figure A4. PDF simulated using subsets of sample with Size = 100.

References

1. Haynes, B.W.; Law, S.L.; Barron, D.C.; Kramer, G.W.; Maeda, R.; Magyar, M. Pacific manganese nodules: Characterisation and processing. U.S. Geol. Surv. Bull. 1985, 679, 44.
2. International Seabed Authority. A Geological Model of Polymetallic Nodule Deposits in the Clarion-Clipperton Fracture Zone; International Seabed Authority: Kingston, Jamaica, 2010.
3. Fouquet, Y.; Depauw, G. GEMONOD Polymetallic Nodules Resource Classification. In Proceedings of the Workshop on Polymetallic Nodule Resources Classification, Goa, India, 13–17 October 2014; International Seabed Authority: Kingston, Jamaica, 2014.
4. Lipton, I.; Nimmo, M.; Parianos, J. TOML Clarion Clipperton Zone Project, Pacific Ocean; AMC Consultants Pty Ltd.: Brisbane, Australia, 2016.
5. Lipton, I.; Nimmo, M.; Stevenson, I. NORI Area D Clarion Clipperton Zone Mineral Resource Estimate-Update; AMC Consultants Pty Ltd.: Brisbane, Australia, 2021.
6. Ruhlemann, C.; Kuhn, T.; Wiedicke, M.; Kasten, S.; Mewes, K.; Picard, A. Current Status of Manganese Nodule Exploration in the German Licence Area. In Proceedings of the Ninth (2011) ISOPE Ocean Mining Symposium, Maui, HI, USA, 19–24 June 2011; International Society of Offshore and Polar Engineers, Ed.; International Society of Offshore and Polar Engineers: Mountain View, CA, USA, 2011; pp. 19–24.
7. Yuzhmorgeologia. The concept of the Russian exploration area polymetallic nodules resource and reserve categorization. In Proceedings of the Workshop on Polymetallic Nodule Resources Classification, Goa, India, 13–17 October 2014; International Seabed Authority: Kingston, Jamaica, 2014.
8. Korea Institute of Ocean Science & Technology. Status of Korea Activities in Resource Assessment and Mining Technologies. In Proceedings of the Workshop on Polymetallic Nodule Resources Classification, Goa, India, 13–17 October 2014; International Seabed Authority: Kingston, Jamaica, 2014.
9. Deep Ocean Resources Development Co Ltd. Polymetallic Nodule Resources Evaluation—How we are doing. In Proceedings of the Workshop on Polymetallic Nodule Resources Classification, Goa, India, 13–17 October 2014; International Seabed Authority: Kingston, Jamaica, 2014.
10. Interoceanmetal Joint Organization. Activities of the IOM within the scope of geological exploration for polymetallic nodule resources. In Proceedings of the Workshop on Polymetallic Nodule Resources Classification, Goa, India, 13–17 October 2014; International Seabed Authority: Kingston, Jamaica, 2014.
11. International Seabed Authority. Secretary General Annual Report; International Seabed Authority: Kingston, Jamaica, 2020.
12. Kaufman, R. The Selection and Sizing of Tracts Comprising a Manganese Nodule Ore Body. In Proceedings of the All Days, Houston, TX, USA, 5–7 May 1974.
13. Felix, D. Some problems in making nodule abundance estimates from sea floor photographs. Mar. Min. 1980, 2, 293–302.
14. Schoening, T.; Gazis, I.-Z. Sizes, Weights and Volumes of poly-Metallic Nodules from Box Cores Taken during SONNE Cruises SO268/1 and SO268/2. Available online: https://doi.pangaea.de/10.1594/PANGAEA.904962 (accessed on 9 February 2021).
15. Ellefmo, S.L.; Kuhn, T. Application of Soft Data in Nodule Resource Estimation. Nat. Resour. Res. 2020, 30, 1069–1091.
16. Mucha, J.; Wasilewska-Błaszczyk, M. Estimation Accuracy and Classification of Polymetallic Nodule Resources Based on Classical Sampling Supported by Seafloor Photography (Pacific Ocean, Clarion-Clipperton Fracture Zone, IOM Area). Minerals 2020, 10, 263.
17. Parianos, J.; Lipton, I.; Nimmo, M. Aspects of Estimation and Reporting of Mineral Resources of Seabed Polymetallic Nodules: A Contemporaneous Case Study. Minerals 2021, 11, 200.
18. Sharma, R. Computation of Nodule Abundance from Seabed Photos. In Proceedings of the Offshore Technology Conference, Houston, TX, USA, 1–4 May 1989; Offshore Technology Conference: Houston, TX, USA, 1989.
19. Park, S.-H.; Park, C.-W.; Kim, C.-W.; Kang, J.K.; Kim, K.-H. An Image Analysis Technique for Exploration of Manganese Nodules. Mar. Georesour. Geotechnol. 1999, 17, 371–386.
20. Longuet-Higgins, M.S. On the statistical distribution of the heights of sea waves. J. Mar. Res. 1952, 11, 245–266.
21. Abd-Elfattah, A.M. Goodness of fit test for the generalized Rayleigh distribution with unknown parameters. J. Stat. Comput. Simul. 2011, 81, 357–366.
22. Savitzky, A.; Golay, M.J.E. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627–1639.

Figure 1. Example towed seabed photo (a) and box-core sample (b). Mounds of clay-ooze without nodules in the seabed photo are caused by bioturbation.

Figure 2. An idealized nodule of ellipsoidal shape (a) and an example from the TOML C1 area (b).

Figure 3. PDF of Generalized Rayleigh Distribution for Varying Parameters α (a) and β (b).

Figure 4. $F_{1}(α)$, $F_{2}(α)$, $F_{3}(α)$ and $G(α)$ as functions of $α$: Numerical Integration vs. Empirical Formulas.

Figure 5. Linear Regression between Nodule Dimensions and Weights for BGR East Region. Data source [14]. Note in (d) the legend shows the slopes and intercepts of the linear regressions of the data of Felix [13] and TOML [4], where the intercepts are converted from cm to mm (in logarithmic scale).

Figure 6. Linear Regression between Nodule Dimensions and Weights for GSR Central Region. Data source [14]. Note in (d) the legend shows the slopes and intercepts of the linear regressions of the data of Felix [13] and TOML [4], where the intercepts are converted from cm to mm (in logarithmic scale).

Figure 7. Probability Density Function (PDF) of the example towed seabed photograph (2015_08_29_131349 in Table 2 and Table 3).

Figure 8. Cumulative Distribution Function (CDF) of the example towed seabed photograph (2015_08_29_131349 in Table 2 and Table 3).

Figure 9. Sample datasets locations.

Figure 10. Comparisons of the measured Abundances and the predicted ones by the new empirical method with Dataset 1: (a) by empirical formula Equation (19); (b) by empirical formula Equation (19) with a linear adjustment; (c) by empirical formula Equation (25).

Figure 11. Comparisons of the measured Abundances and the predicted ones by the new empirical method with Dataset 2: (a) by empirical formula Equation (19); (b) by empirical formula Equation (19) with a linear adjustment; (c) by empirical formula Equation (25).

Figure 12. Comparisons of the measured Abundances and the predicted ones by the new empirical method with Dataset 3: (a) by empirical formula Equation (19); (b) by empirical formula Equation (19) with a linear adjustment; (c) by empirical formula Equation (25).

Figure 13. Comparisons of the measured Percentage Coverage and the values predicted by the new empirical method, using empirical formula Equation (15), for Dataset 2 in (a) and Dataset 3 in (b).

Figure 14. Seabed photos from TOML B, C and F. (a) TOML B, CCZ15-F02: 2015_08_17_032745; (b) F05: 2015_08_29_071000; (c) TOML F type 1 nodules CCZ15-B105; (d) TOML F type 2 nodules CCZ15-B99. Images in (a) and (b) are 2.4 m × 1.6 m in area. Trigger weight in images (c) and (d) is 28 cm × 16 cm long.

Table 1. Results of Linear Regression for Nodule Dimensions and Weights.

Case	Regression Parameters		Regions	Sample Size	Estimated Slope	Estimated Intercept	Coefficient of Determination (R²)
1	Minor Axis Y (Horizontal, mm)	Long Axis X (mm)	BGR East	1376	0.7350	0 (Forced)	97.52%
1	Minor Axis Y (Horizontal, mm)	Long Axis X (mm)	GSR Central	259	0.7618	0 (Forced)	97.63%
2	Minor Axis Z (Vertical, mm)	Long Axis X (mm)	BGR East	1376	0.4762	0 (Forced)	95.09%
2	Minor Axis Z (Vertical, mm)	Long Axis X (mm)	GSR Central	259	0.5389	0 (Forced)	96.83%
3	Weight (g)	Volume (cm³)	BGR East	99	1.9269	0 (Forced)	90.93%
3	Weight (g)	Volume (cm³)	GSR Central	No Data	/	/	/
4	Weight (Logarithmic, g)	Long Axis X (Logarithmic, mm)	BGR East	1376	2.5067	−2.6245	87.68%
4	Weight (Logarithmic, g)	Long Axis X (Logarithmic, mm)	GSR Central	259	2.7210	−2.9439	93.13%

Table 2. Summary of 9 Sets of Samples of Nodule Long Axes used for Goodness-of-Fit Test.

No	Sample ID	TOML Area	Type	Sample Size	Mean	Standard Deviation
1	2015_08_10_172643	B	Towed Photo	336	3.091	2.512
2	2015_08_10_220159	B	Towed Photo	153	7.767	2.387
3	2015_08_11_121357	B	Towed Photo	403	5.978	1.732
4	2015_08_29_131349	C	Towed Photo	440	5.425	1.995
5	2015_09_02_185307	C	Towed Photo	113	3.827	1.270
6	CCZ15-B51	D	Washed Sample	67	7.486	2.404
7	CCZ15-B102	F	Washed Sample	278	4.318	1.705
8	CCZ15-B106	F	Washed Sample	559	3.681	1.298
9	CCZ15-B110	F	Washed Sample	135	6.910	2.602

Table 3. Results of Goodness-of-Fit Test for 9 Sets of Samples of Nodule Long Axes.

No	Sample ID	α	β	$A_{n}^{2}$	$y_{γ}$	$V_{n}^{2}$	$u_{γ}$	Conclusions
1	2015_08_10_172643	0.623	0.211	9.957	0.833	5.436	0.43	Not Generalized Rayleigh
2	2015_08_10_220159	2.714	0.162	0.66	0.784	0.257	0.414	Generalized Rayleigh Dist. at 5% Level of Significance
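3	2015_08_11_121357	3.598	0.226	0.69	0.770	0.226	0.410	Generalized Rayleigh Dist. at 5% Level of Significance
4	2015_08_29_131349	1.965	0.210	0.541	0.777	0.305	0.408	Generalized Rayleigh Dist. at 5% Level of Significance
5	2015_09_02_185307	2.690	0.327	0.159	0.789	0.076	0.419	Generalized Rayleigh Dist. at 5% Level of Significance
6	CCZ15-B51	1.701	0.144	0.376	0.804	0.200	0.423	Generalized Rayleigh Dist. at 5% Level of Significance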
7	CCZ15-B102	1.396	0.243	1.435	0.790	0.605	0.412	Not Generalized Rayleigh
8	CCZ15-B106	2.410	0.321	0.738	0.778	0.314	0.410	Generalized Rayleigh Dist. at 5% Level of Significance
9	CCZ15-B110	1.890	0.171	0.361	0.791	0.192	0.418	Generalized Rayleigh Dist. at 5% Level of Significance

Table 4. Summary of 188 Sample Sets of Nodule Long Axes to be Analysed.

Data-Set	TOML Areas	Number of Samples	Comparative Data Type	Range of Measured Abundances	Range of Mean Long Axes *	Range of Coefficient of Variation *
1	B, C, D, F	2, 3, 7, 3	Washed sample weights	3.2 to 25.7 kg/m²	2.2 to 7.6 cm	0.23 to 0.86
2	F	11 (Type 1)	Washed sample weights	1.2 to 21.3 kg/m²	2.2 to 3.9 cm	0.28 to 0.45
2	F	9 (Type 2)	Washed sample weights	3.3 to 29.1 kg/m²	2.6 to 9.2 cm	0.28 to 0.72
3	B	68	Long axis estimates on individual nodule images	0.03 to 31 kg/m²	1.6 to 7.8 cm	0.24 to 0.96
3	C	85	Long axis estimates on individual nodule images	0.01 to 18 kg/m²	1.5 to 6.1 cm	0.25 to 0.83

* For datasets 1 and 2 long axes measured from grid photos of the nodules after collection, separation from the host clay-ooze and washing. For dataset 3 long axes measured from photos of the seabed as detailed in [4].

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

