Towards Age Determination of Southern King Crab ( Lithodes santolla ) Off Southern Chile Using Flexible Mixture Modeling

This study addresses the problem of age determination of the southern king crab (Lithodes santolla). Given that recapture is difficult for this species and, thus, age cannot be directly determined with the help of the annual marks on the shell, the von Bertalanffy growth function (vBGF) cannot be used to directly model length-frequency data (LFD). To determine age classes, some researchers have proposed using the MIX algorithm that consists of sampling realization of a finite mixture of normal (FMN) distributions for each LFD. However, normality assumption in age-length data has been questioned in several works related to fish growth analysis. For this study, we considered the biological information of the southern king crab for the period 2007–2015 and localization between 50◦06′–53◦15′ S and 76◦36′–72◦18′ W. We assumed that LFD could be modelled by the novel class of finite mixture of skew-t (FMST). Assigned age classes were used to estimate the vBGF parameters. The estimated vBGF parameters were L∞ = 176.756 cm, K = 0.151 year−1, t0 = −1.678 year for males, and L∞ = 134.799 cm, K = 0.220 year−1, t0 = −1.302 year for females. This study concludes that (a) FMST modal decomposition can detect a group of younger individuals at age 2, given that those individuals have LFD with a left heavy-tail and asymmetry; (b) FMST produces a better representation of LFD than the FMN model; (c) males have bigger L∞ but grow slower than females; and (d) as expected, a high correlation exists among the vBGF estimates.


Introduction
Exploitation of the southern king crab started in 1928 off the west coast of Tierra del Fuego (austral Chilean waters) [1].This fishery is crucial for the local economy as it represents the main fishing activity (80% of all fisheries) in the Magallanes Region [2].However, catches have also been reported from the city of Valdivia in the Los Ríos region (50 • 06 -55 • 59 S).The Servicio Nacional de Pesca (Chilean National Fishery Service) reported 6490 tons of landings during 2012, the highest value of landings in southern king crab fishery history [3].
Female southern king crabs reach maturity at 86.51 mm at L 50 (50% of female population).Vinuesa et al. [4] described that length-frequency data (LFD) aspects, such as molt frequency, decrease with age.Female southern king crabs molt six to seven times in the first year, four to five times in the second, and three times in their third year.From then on, females molt annually and start channeling energy toward gonad development.Estimates of von Bertalanffy [5] growth parameters and age composition for the southern king crab are a key issue in population dynamic models used for stock assessment [6].Southern king crabs grow mainly through the increment in size per molt and molting frequency over a period of time.Growth models consider those discontinuities in the growth process.Annual marked tagging is possible for this species; however, recapture proves difficult.
Many researchers found that growth models such as the von Bertalanffy growth function (vBGF) can be used to model LFD, especially for species that do not directly show ageing with annual marks on their shell.An early work by Roa and Tapia [7] considers the MIX algorithm [8], which consists of sampling realization of a mixture of probability distributions for each red squat lobster Pleuroncodes monodon LFD and the visual identification of modes.Parameter estimates of each mixture are obtained via maximum likelihood and assuming normally distributed year classes.Given the assigned groups, it is possible to consider a respective age for each mode and, therefore, an age group for each carapace-length distribution.This way, a growth function is fitted to this set of observed carapace length by unobserved age, being, for example, the von Bertalanffy [5], Richards [9], or Schnute functions [10,11].MULTIFAN is a more sophisticated method than MIX because the mean of assumed normal distribution is obtained directly from vBGF estimates, creating a more realistic log likelihood from LFD [12].MULTIFAN also identifies the ages as a nonobservable variable derived from the classification of a set of groups or modes from a typical mixture of normal densities.The number of mixture-components can be detected via the information criteria for model selection, such as Akaike's (AIC) or Bayesian (BIC) information criteria, or given by expert criteria [13].
Roa-Ureta [11] considers a more robust estimation method with multivariate LFD analysis.Beyond that, MULTIFAN does not accomplish regularity conditions for a likelihood ratio test and, because parameters are uncorrelated, Roa-Ureta's method accounts for the complete multivariate structure by considering each vector of mean-length estimates from each LFD set as the unitary observation value in Schnute's growth model, a more general interpretation of red squat lobster LFD.More recently, Yáñez et al. [3] studied the southern king crab Lithodes santolla LFD using methods of modal decomposition of LFD described in Reference [14].Both assumed that each modal component corresponds to one age group, and its identification is based on Finite Mixture of Normal (FMN) densities.In addition, they assumed different initial and fixed L ∞ parameters, and then estimated K and L 0 (the first observed carapace length in LFD).The best model was chosen according to AIC criteria.
However, normality assumption in age-length data has also been discussed in several investigations related to fish growth analysis.
For example, Contreras-Reyes et al.
[15], López Quintero et al. [16], and Contreras-Reyes et al. [17] considered skewed and heavy-tailed errors in the vBGF through the presence of extreme observations in southern blue whiting (Micromesistius australis) and pink cusk eel (Genypterus blacodes).See references therein for more examples of the use of skewed and heavy-tailed distributions in the vBGF model errors.To find the Kaplan-Meier survival estimates and growth in three pelagic larval stages from three populations in the Northwest Atlantic, Ouellet et al. [18] considers the skew-normal (SN) [19] distribution to account for asymmetry in survival data, where they detect higher values toward higher temperatures.
Typical distribution of carapace length for each cohort at recruitment is assumed normal by researchers, so the mixture models for each LFD are considered normal mixture-distributed, where each age component is an annual cohort [7,11].In this paper, we consider the proposed methodology [11] assuming that LFD could be modeled by the novel class of a finite mixture of skew-t (FMST) [20].Both classes provide some advantages over the FMN model.For instance, normal components can lead to wrong classification because they allow an arbitrarily close modeling of any distribution by increasing the number of groups of carapace lengths represented by asymmetrically distributed LFD [20].The FMST components capture both skewness and extreme carapace lengths due to their flexibility.Thus, this work considers the scale mixtures of SN (SMSN) family distributions oriented to finite mixtures, which includes as a particular case the FMST model.This model is used first to determine modes of southern king crab LFD on each year sample, and for individuals classified by sex (males and females).In a second stage, age determination using Roa-Ureta's procedure is considered based on previously obtained modes.In a third stage, vBGF is used for the age-length modeling determined in the second stage.

Methods
A finite-mixture model is a combination of two or more probability density functions (pdf), allowing to approximate any arbitrary distribution with the mixture of the most used normal ones [21].This capability has been used in several applications in fishery science, as well as catch-rate standardization [21] and age-length analysis [22].For the latter, the number of year classes composing each mixture of normal distribution is unknown given that age is a latent variable.This issue is treated as a model identification problem, where selection criteria are commonly used for this effect [11].
Regarding model selection, AIC and BIC criteria are used for selection in age-length models [11,15,22].BIC presents some advantages over AIC criteria: (a) BIC is more appropriate for large samples because it penalizes the log-likelihood function using sample size and number of parameters, and (b) AIC is less penalized by the number of parameters (parsimonious models) than BIC [15].
Below, we propose a mixture of flexible models to describe the LFD and determine the age of southern king crabs (Lithodes santolla) off Chile.

Finite Mixtures of Flexible Distributions
The pdf of an m-component mixture model with parameter vector set θ is where ß = (π 1 , . . ., π m ) is a vector of mixing weights π i , with π i ≥ 0, ∑ m i=1 π i = 1, and f (y; θ i ) a particular pdf depending on a flexibility degree that could be selected.In a general context, Basso et al. [23] considered SMSN distributions as a robust estimation method of finite components.Random variable Y follows the SMSN family if it can be written as where µ is a location parameter, κ(•) a positive weight function, U a random variable with distribution function H(•; ν) and density h(•; ν), ν is a scalar or vector parameter indexing the distribution of U and Z ∼ SN(0, σ 2 , η).From (2), the pdf of The generalization of skewed distributions (3) has been used in age-length modeling by Contreras-Reyes et al. [15] and López Quintero et al. [16], mainly through the following: (a) when U ∼ Gamma(ν/2, ν/2) in (3), ν > 0, we obtain the skew-t distribution (ST) [24] denoted by Z ∼ ST(ξ, σ 2 , η, ν) and with the pdf given by where z 0 = (z − ξ)/σ, t(z 0 ; ν) and T(z 0 ; ν) denote the pdf and cumulative density function (cdf) of the standard t distribution with ν degrees of freedom, respectively.ST distribution was introduced to achieve a higher degree of excess kurtosis produced by extreme observations.ST distribution converges to SN distribution as ν → ∞ and is the t distribution when η = 0; (b) The stochastic representation (2) of SMSN allows to determine the exact density of conditional distributions necessary for the ECME algorithm.That is, using Lemma 2 of Basso et al. [23], variable Y in ( 2) is represented conveniently by a hierarchical representation and, in this form, makes it possible to obtain the conditional maximization (CM) steps.
Considering the finite mixture of distributions (1) of SMSN distributions (FM-SMSN), we can determine the FM-SMSN model by using density f (y) for f (y; θ i ), with θ i = (ξ i , σ 2 i , η i , ν), i = 1, . . ., m, concerning the parameter ν of the mixing distribution H(•; ν), taking into account that ν is assumed for all components i = 1, . . ., m for computational convenience.Following Shertzer et al. [25], to capture the group membership of southern king crab i, we considered latent indicator variable Z i such that As is described in (c), Z appears in the hierarchical representation using Lemma 2 of Basso et al. [23] for parameter estimation via the ECME algorithm.
In this paper, we considered the FMST distribution as a robust method for components estimation and its log-likelihood function that generalized the information proportioned by FMN and FMSN distributions.In (1), we considered the parameter vector set θ = (ξ, Σ, η, ν), where ξ, Σ and η are defined as in a); ν = (ν 1 , . . ., ν m ) is a set of m degree of freedom parameters, and f (y

Growth Modeling
The vBGF model [5] explains the carapace length of one individual in terms of its age by means of nonlinear function which also depends on three parameters: L ∞ (cm), the asymptotic carapace length of the species; K (year −1 ), the growth rate coefficient; and t 0 (year)-age at zero carapace length.To fit Model (5) from an empirical dataset, (y i , x i ), i = 1, ..., n, the vBGF model can be described in terms of an additive nonlinear regression, y i = L i + ε i , where y i is the ith observed carapace length at age x i , L i = L(x i ), and ε i are independent and identically distributed (iid) N(0, σ 2 ) random errors [26].
The vBGF parameters are estimated from an observed age-length pair (x i , L(x i )), i = 1, . . ., n, where L(x i ) is the ith observed carapace length at age x i .

Implementation
Methods described in Sections 2.1 and 2.2 are implemented as follows: The cluster means are ordered and grouped into cohorts, so that no year is repeated within each group."Premise II" of Reference [11] is considered as a criterion to determine the cohort point between age classes; textually, this is: strong assumption that each year no more than one cohort (means that it is not possible that two mean carapace lengths with the same year index fall into the same age class) and no less than one cohort (means that it is not possible that two consecutive mean carapace lengths with the same year index fall into two nonconsecutive age classes) enters the population.Given that age classes 0 and 1 include individuals with molt frequency decreasing with age (six to seven molts in the first and four to five in the second year), we opted to label the first group with Year '2' and then estimate the vBGF parameters.3. vBGF model.Given the estimated year in Step 2, the formed age-length data serve to evaluate the vBGF (5) growth function.

Data
The data analyzed in this study correspond to biological information of the southern king crab for the 2007-2015 period and localization 50 • 06 -53 • 15 S and 76 • 36 -72 • 18 W. Observations localized between 54 • 13 -55 • 59 S and 69 • 40 -66 • 41 W for the 2007-2010 period were not considered because they present too few individuals and years (see Table 1).Figure 1 shows the spatial distribution of both groups.Northern individuals are concentrated off the city of Puerto Natales, while southern individuals are concentrated off the town of Cabo de Hornos (on the border with Ushuaia, Argentina).The individuals were differentiated by sex (males and females), and, for each individual, carapace lengths from 14 mm (males) to 212 mm (females) were considered.

Computational Aspects
All models and summaries in this article were estimated with free software environment for statistical computing and graphics R [27].All computational estimations was made under Linux v. 4.15 and MacOS v. 10.13 operating systems.Particularly, to estimate the mixture of distributions, we used the mixsmsn package, developed by Prates et al. [28].The mixsmsn package considers the Expectation-Maximization algorithm [29] for FMST modal decomposition.For the vBGF estimates and initial values, the nls and FSA packages were used, respectively.More details appear in Reference [30].

Results
Table 1 provides descriptive statistics by sex and year.Included are minimum, lower quantile, median, mode, mean, upper quantile, maximum, and standard deviations.Our main interest was on the growth-pattern differences between males and females for northern individuals, as explained in Section 3.1.At first sight, males are larger than females considering the range of their carapace lengths.
In this section, the methodology described in Section 2 is applied to southern king crab LFD by year and sex.As each step is done for each combination, some verbose repetition is anticipated in the subsections.In addition, FMST models were ran with m = 2, . . ., 9 (the most simple case of m = 1 was omitted) and also the respective BIC for each combination.2).The BIC values of FMST were also smaller than FMN ones for all years.We could observe that the smallest and largest means were 58.301 and 152.129 mm for 2009 and 2014, respectively (Table 3).In 2009, the smallest mean was provided, which produces a left heavy-tail (Figure 2).This mean represents a mode of male juvenile southern king crab from the northern group that would be helpful in the assignment of the first age class.In general, the FMST model provides good fits of annual LFD and well-separated cohort groups (see panels of Figure 2).Results of 10 assigned age classes are provided in the bar chart of Figure 3a.Assigned age classes are posteriorly used as age-length data for vBGF modeling (Table 4, Figure 3b).We observed that all vBGF parameters were significant and correlated with L ∞ , and K is the highest among the three vBGF parameters.For all cases, high and negative correlations between L ∞ and K, and L ∞ and t 0 , respectively, and high and positive correlation between K and t 0 .We can see that t 0 is negative and close to −1.7, becoming relevant for younger individuals, as illustrated in Figure 3b.The distance of modes related to the third is more pronounced than in the fourth age class, except for the largest mean (81.542 mm; see also Figure 3a).The means of the oldest, the 10th, age class are close to the previous, the ninth, age class.In general, vBGF fitted in the middle of all modes in each age class (Figure 3b), as observed in the residuals plot of Figure 3c.Generally, residual values are concentrated at approximately 3 mm and indicate a constant trend, thereby suggesting uncorrelated observations.q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 50 100 150 Age group (years) Carapace length (mm) b) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 5 10 15 Age group (years) Carapace length residuals (mm)

Females
In Table 3, we observed that the FMST model provides two means for 2007 and 2012, three means for 2008, 2010, and 2013, four means for 2011, eight for 2014, and nine means for 2015; all obtained considering BIC criteria (Table 2).For males, the BIC values of FMST were smaller than those obtained by the FMN model and for all years.We could observe that the smallest and largest means were 60.367 and 137.902 mm for 2009 and 2014, respectively.In 2009, the smallest mean was provided, which produces a left heavy-tail (Figure 4).This mean represents a mode of female juvenile southern king crab that help in the assignment of the first age class.In general, the FMST model provides good fits of annual LFD and well-separated cohort groups (see panels of Figure 4).Results of assigned age class are provided in the bar chart in Figure 5a.Therefore, LFD were assigned 11 age classes, where the ninth and 11th age classes (oldest individuals) had the smallest number of modes.This information is used as age-length data for vBGF modeling in Figure 5b.vBGF estimates appear in Table 4 and are all significant.For males, high and negative correlations between L ∞ and K, and L ∞ and t 0 , and high and positive correlation between K and t 0 .We can see that t 0 is negative and close to −1, taking relevance for younger female individuals (Figure 5b).The vBGF model fit is illustrated in Figure 5b where a clear distance of modes is observed for the third with respect to the fourth age class.The mode of the oldest, the 11th, age class is significantly distant from the previous, the 10th, age class.In contrast to the vBGF fit for males, the vBGF in females was not fitted in the middle of all modes in each age class (seventh-ninth and 11th).However, as can be seen in the residuals plot in Figure 5c, residual values are concentrated at approximately 3 mm and generally indicate a constant trend and uncorrelated observations, except for younger individuals (second age class) and older individuals (7-11th age classes).q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 50 100 150 Age group (years) Carapace length (mm)

b)
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 2 4 6 8 10 12 Age group (years) Carapace length residuals (mm)

Discussion
In this study, we addressed the age determination and flexible mixture modeling for the southern king crab off southern Chile.This study mainly suggests that (a) FMST modal decomposition can detect a group of younger individuals at age 2, given that those individuals have LFD with a left heavy-tail and asymmetry; (b) based on BIC values, FMST produces a better representation of LFD than the FMN model, thus inducing more realistic vBGF estimates; (c) males are larger (biggest L ∞ ) but grow slower than females; and (d) as expected, high correlation exists among the vBGF estimates.
High molting frequency found in age group 2 explains the better fit of the FMST model, while the FMN model produces a worse fit to the LFD.In the southern king crab, females concentrate their energy on the reproduction process instead of somatic growth (evolutionary strategy), which explains why they grow faster than males trying to reach maturity sooner.Egg production is limited by body size and, therefore, it is an advantage for females grow faster, reach sexually maturity/capacity, and improve survival and competition.In addition, there is often considerable variation in molt increment based on foraging success prior to molt, and the resulting energetic state at the time of molt.Indeed, results assume that differences in size reflect different cohorts.However, it is entirely possible that some of these differences reflect age-independent size variation due to differences in molt increment.High vBGF parameter correlations, especially the negative one between L ∞ and K parameters, are in line with somatic growth and the reproductive process mentioned above.
Yáñez et al. [3] (see  [14] having the following disadvantages: (i) number of classes is not adequate for each combination from the data used in this work and is arbitrary because LFD changes through the years; and (ii) it does not allow to directly determine t 0 estimate, which is found by replacing L ∞ and K in vBGF but standard deviation cannot be determined.
The differences in the estimates of our study and those of Reference [3] can be interpreted in light of our novel incorporation of a wide range of carapace lengths and a larger sample size (Table 1).The approach developed here is more accurate in terms of error description in distribution, as we found a greater presence of extreme values and variability of length-at-age data [15,16].The comparison of the growth curves by selection criteria generated different growth curves between the sexes of the southern king crab.
For the combinations between the sexes analyzed here, we obtained good fits of vBGF on LFD.However, for females we observed difficulty for vBGF to fit in all assigned ages.Such errors are common; for example, Roa-Ureta [11] had problems with estimation and therefore preferred Schnute's curve instead of a vBGF curve.In some cases, given that the southern king crab grows quickly at first (1-3 years) and then slowly, perhaps a more flexible growth model could be implemented to realize change in age at maturity.Indeed, Ohnishi et al. [31] proposed a variant of vBGF by including two additional parameters: discontinuous change in age at maturity, t m (year), and growth rate coefficient post-age at maturity (change of growth rate), and υ (year −1 ), to define the function and to be inserted in (5) as L(t) = L ∞ (1 − e −KT(t) ), with t = x i (year) as the assigned age.This model represents the time delay to attain a certain body size in t ≥ t m due to change in energy allocation.Consequently, the growth curve becomes biphasic, combining two independent vBGFs.However, inferential aspects must be addressed to biphasic vBGF such as maximum likelihood function and Fisher information matrix derivation.
To determine the start of the age group, a balance was established between computational stability for growth-function fit and LFD modes.We assumed at least two years as the start of the age group because molt frequency is highest in this period [4].Thereby, vBGF residuals are uncorrelated and with a constant trend among the assigned ages.
The proposed methods for age determination in the southern king crab crucially depend on the available LFD.In the case that all growth stages are not well-represented in the LFDs, results of FMST modal decomposition produce a misspecification of the assigned age with respect to real age.The criteria used by Reference [11] to determine the cohort point between age classes are very sensible in vBGF estimates.Depending on sample success, it is easy to consider instances where the sample is missing a cohort representative from one or more years.In addition, vBGF estimates are crucial for the study of stock-assessment models [2,3], but are affected by gear selectivity because it produces censored samples when young individuals are missing.Indeed, a lack of comprehensive age information leads to poor understanding of life history schedules, difficulty in the estimation of vBGF parameters necessary for modeling population dynamics and uncertainty.Therefore, the natural path for age-length modeling is direct age determination by growth band counts in the southern king crab [32].
Our proposed method allows one to obtain an estimate of vBGF parameters from a mixture of distributions, but further research about a direct relationship among vBGF estimates and the observed maximum age and carapace length is necessary.Unfortunately, actual relationships are related to fish resources [33].

4. 1 .
Males BIC criteria for the FMST models provided two means for 2007; three means for 2010, 2012, and 2013; five means for 2008, 2009, and 2011; six means for 2014, and eight means for 2015 (Table

Figure 3 .
Figure 3. (a) Barplot of assigned age from FMST modal decomposition, (b) vBGF model fits for estimated age-length, and (c) its respective absolute residuals for male southern king crabs.

Figure 5 .
Figure 5. (a) Barplot of assigned age from FMST modal decomposition, (b) vBGF model fits for estimated age-length, and (c) its respective absolute residuals for female southern king crabs.

1 .
Moda16]ecomposition.The FMST model was carried out for southern king crab LFD with ν degrees of freedom and m components by zone, year, and sex.For example, degrees of freedom ν = 5 indicate a high presence of heavy-tails in LFD[15,16].The order of m depends on the reported BIC for each combination, where the 'best' model for each m is selected through the smallest BIC. 2. Age-class assignment.From Step 1, take into account that m modes provide m modes used for age-class determination, which is at least m.Considering the classified carapace lengths, a bar chart of means is built where gray and white colors alternate and represent classified age classes.

Table 1 .
Descriptive statistics for the southern king crab grouped by sex and year.LQ and UQ stand for Lower Quantile and Upper Quantile, respectively.

Table 2 .
Bayesian information criterion (BIC) values of the finite mixture of normal (FMN) and finite mixture of skew (FMST) fitted models (m = 2, . . ., 9) for each specification (sex and year).The smallest values for each model and year are marked in bold.

Table 3 .
Estimate modes for the southern king crab using the FMST model for each sex and year.

Table 4 .
Von Bertalanffy growth function (vBGF) estimates, standard error (SE), Student (t) value, and p-value, Pr(> |t|), for the southern king crab for each sex.Estimated correlations between vBGF parameters are in the last three columns.