1. Introduction
There has been increasing interest in culturing microalgae at a large scale for producing food, pharmaceutical products, and biodiesel [
1]. More specifically, it was shown that some microalgae can provide polysaccharides, vitamins,
β-carotene, long-chain polyunsaturated fatty acids, and antioxidants [
2,
3]. In fact, some microalgal species are used as aquafeed for molluscs, crustaceans, and fish [
4]. Recently, the cultivation of microalgae as an alternative crop was proposed [
5], as they can provide nutritious food for humans. Furthermore, algal microorganisms that can be used for producing antibiotics, toxins, pigments, plant growth regulators, and bioactive compounds have also been discussed [
6]. In terms of environmental protection, there are microalgae, including
Chlamydomonas reinhardtii and
Chlorella vulgaris, used for cleaning polluted water [
7]. Bioethanol components in some microalgae, such as
Desmodesmus sp., are employed to produce biodiesel [
8], as up to 70% of those microalgae’s dry mass consists of hydrocarbons.
In practice, microalgae can be cultured in raceways, open ponds, or closed cultivation systems such as photobioreactors. While cultivating microalgal species in an open pond has a low cost and can provide flexible scalability, a closed system of culturing microalgae is more efficient in terms of controlling the growth rate and biomass productivity [
1], which are important when the microalgae are treated as commercial products. For example, a 40-liter closed system for culturing flagellates to concentrate the microalgal biomass was proposed in [
9]. Two types of microalgae for larval fish,
Tetraselmis suecica and Brachionus plicatilis, can be continuously cultivated in a customized cultivation system designed and built by Sananurak et al. in [
10]. Naumann et al. developed a twin-layer solid-state bioreactor [
11] for growing four types of microalgae, namely
Isochrysis sp.
T.ISO,
Tetraselmis suecica,
Phaeodactylum tricornutum, and
Nannochloropsis sp., which can be utilized as live feeds in hatcheries. It is noted that a closed cultivation system allows growers to easily monitor crucial information about the microalgae over time, which they can then employ to implement a closed-loop control mechanism that automatically and efficiently operates all of the cultivation procedures [
12]. For instance, an optimized, up-scaled bioreactor for cultivating the microalgal strain
S. platensis was proposed in [
13], where environmental parameters such as the pH condition, liquid level, and temperature during the culturing operations are remotely monitored through a smartphone app. Some discussions about employing advanced technologies including the Internet of Things or machine learning in intelligently farming microalgae were also presented in a recent work [
14]. Nevertheless, one critical quantity that normally needs to be monitored when culturing microalgae, particularly in real time, is their density [
6], since it allows growers to optimally control both nutrients and cultivating conditions.
The density of microalgae can be defined by the number of microalgal cells per mL [
2,
15]. In fact, an accurate number of algal cells can be obtained by carefully examining an aliquot through a microscope [
16]. However, this process is tedious and time-consuming, and it is particularly impractical in closed cultivation systems where an automatic control strategy is implemented. Thus, to estimate the microalgal density efficiently and practically, several estimation methods have been proposed [
17]. For instance, the authors of the work [
18] proposed measuring the oxygen levels generated by microalgae during photosynthesis in a closed photobioreactor and analyzing the data to indirectly estimate their density. Zhou et al. in [
19] also relied on photosynthesis but exploited an in situ optical device to directly examine microalgae’s in vivo synthesis quantity to compute a density value. In a similar manner, the work [
20] exploited another optical meter based on the spectrophotometric and fluorimetric principles to measure the turbidity of microalgae in a photobioreactor. The optical density indicator results were then used to calculate the concentration of those algal microorganisms.
Another class of methods proposed to monitor microalgal density is based on image processing. Thanks to low-cost camera sensors, image-based approaches are widely used [
2,
6,
16,
21,
22,
23,
24]. Specifically, images of microalgae can be captured by a camera and then analyzed by an image processing technique, and the extracted information can be utilized to calculate the density of the microalgae present in the images. An advantage of these methods is that they are less invasive, nondestructive, and more biosecure. In the work [
6], Jung et al. proposed to employ a camera sensor to digitally capture a top view of a microalgal photobioreactor and then analyze light distribution profiles in the captured images to indirectly quantify the cell density. Uyar in their work [
22] developed a mechanism that relies only on a blue color channel of microalgal images to predict the algal cell concentration. In contrast, in our previous work [
16], when taking
Chlorella vulgaris microalgae into consideration, we discovered that the information carried by a blue channel in digitized images is insignificant. We then proposed to use only averages of red and green color channels of captured images for estimating the density of algal species. By considering all three red, green, and blue (RGB) color channels of captured images, Winata et al., in the work [
24], first normalized them separately and then converted normalized RGB images to grayscale ones, where the microalgal concentration can be quantified.
So far, to the best of our knowledge, most image-based microalgal density estimation methods average the pixel values of measured images and then correlate these averages with known density values in a regression model to estimate the concentration of microalgae in a new image. For instance, in [
2,
6,
23,
24], the authors, by exploiting different ratios of the red, green, and blue color components, converted RGB color images to grayscale ones. Each grayscale image was then averaged to a scalar value to input into a linear regression model. In our previous work [
16], we averaged the red and green color channels of each measured RGB color image as inputs of a nonlinear Gaussian process model for prediction.
Since averaging pixel values in digitized images may not provide rich information about the microalgae present in measured images, in this work, we investigated more advanced image features as inputs of a regression model. As can be seen in the previous works [
16,
24] and in
Figure 1 in the following discussion, since the microalgal image data are quite uniform, analyzing the textures of images is critical for understanding microalgal density information. In fact, the texture of an image is defined by the spatial relationship of pixel values or the variation in color intensity within the image [
25]. Once the image texture is analyzed, information about the spatial distribution of pixel values in a given image can be quantified. Thus, in this work, we propose exploiting image texture characteristics as features to be input into a regression model for estimating the microalgal density. In addition to averages of pixel values, the image texture features considered in this work include confidence intervals of the means of pixel values, powers of the spatial frequencies present in images, and entropies accounting for the pixel distribution. These diverse features can provide more information about the microalgal density, which can lead to more accurate estimation results.
On the other hand, since the relationship between the image features and microalgal densities can be linear or nonlinear, we propose to exploit an L1-regularization-based approach called least absolute shrinkage and selection operator (LASSO) [
26,
27] as a regression model. Fundamentally, regularization can help a regression model avoid the over-fitting of data [
28]. Furthermore, coefficients of the features in the model can be optimized in a way that prefers the more informative features, which results in a model that fits the data well. Consequently, the microalgal density can be estimated more accurately. In order to evaluate the effectiveness of the proposed approach, we applied it to a dataset collected by a sensing system [
29] monitoring the
Chlorella vulgaris microalgae strain. The obtained results demonstrate that the proposed method outperforms several existing methods.
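To make the role of the L1 penalty concrete, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent. It is not the implementation used in this work: the function names, the objective scaling (1/(2n)), the fixed iteration count, and the assumption that features and targets are already centred are all illustrative choices.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X w||^2 + lam * ||w||_1.

    X is a list of rows and y a list of targets; both are assumed to be
    centred beforehand (an assumption of this sketch, so no intercept is fitted).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - X w, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w
```

The soft-thresholding step is what sets the coefficients of weakly informative features exactly to zero, which is the feature-preference behaviour described above.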
The remainder of the paper is arranged as follows.
Section 2 presents how a dataset of the
Chlorella vulgaris microalgae can be gathered by using our camera-based algae monitoring system [
16]. In
Section 3, we discuss how to design features of microalgal images by analyzing their textures. The LASSO-based regression model is introduced in
Section 4, where a model for estimating the density of microalgae given their image features is derived. We then extensively discuss possible combinations of image features as inputs and corresponding results as outputs of the estimation model in
Section 5, where the effectiveness of the proposed approach is verified in real-world experiments. Some conclusions of the discussion are drawn in
Section 6.
5. Experimental Result Analysis
Given the LASSO estimation approach, we now discuss how to apply the features introduced in
Section 3 to efficiently estimate the density of microalgae given their images. Since multiple features are extracted from an image, multiple scenarios of combining different features can be investigated. Not all features provide useful information about the microalgae, and different combinations of features may lead to different estimation results. Therefore, in this work, we consider the typical combinations from four sets of the features presented in
Section 3 and analyze the results to derive the best combination of the features, which can lead to the most accurate estimation of microalgal density, at least with our dataset. The procedure can be easily extended to any other microalgae datasets.
In order to validate the estimation accuracy in each scenario, we randomly partitioned our 129 data samples, including 129 color images and 129 ground truth density values, into two subsets. The first subset of 100 samples was used for training, and the second subset of 29 samples was used for testing. The validation framework is summarized in
Figure 9. Since the training and testing data are selected randomly in each validation, we repeated the implementation 1000 times in each scenario to statistically verify the effectiveness of the proposed approach. At each run, we computed the root mean square error (RMSE) between the estimated density results and the ground truth in the testing subset. The 1000 RMSEs in each scenario were then condensed into two statistical parameters, the mean and the standard deviation (std), for comparisons among the combination scenarios.
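The validation procedure described above can be sketched as follows. This is a minimal illustration rather than the code used in this work: `fit_predict` is a hypothetical callable standing in for any regression model, the seed is arbitrary, and the use of the sample standard deviation is an assumption.

```python
import math
import random
import statistics


def rmse(estimates, truths):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))


def repeated_holdout(features, densities, fit_predict, n_train=100, n_runs=1000, seed=0):
    """Return (mean, std) of test RMSEs over repeated random train/test splits.

    fit_predict(train_X, train_y, test_X) is a hypothetical callable that
    trains a model and returns its density estimates for test_X.
    """
    rng = random.Random(seed)
    indices = list(range(len(features)))
    errors = []
    for _ in range(n_runs):
        rng.shuffle(indices)  # new random partition at every run
        train, test = indices[:n_train], indices[n_train:]
        estimates = fit_predict(
            [features[i] for i in train],
            [densities[i] for i in train],
            [features[i] for i in test],
        )
        errors.append(rmse(estimates, [densities[i] for i in test]))
    # Sample standard deviation; the convention is an assumption of this sketch.
    return statistics.mean(errors), statistics.stdev(errors)
```

With 129 samples and `n_train=100`, each run leaves 29 samples for testing, matching the split described above.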
5.1. Single Feature Combinations
In the first evaluation, we consider four scenarios where each scenario contains a single type of feature as discussed in
Section 3. The combinations are as follows:
S1: Features of average values of all three color channels: R, G and B.
S2: The six interval features introduced in Section 3.
S3: The three spatial frequency features introduced in Section 3.
S4: The three entropy features introduced in Section 3.
The evaluation results presented by the mean and std values obtained in these four scenarios are summarized in
Table 2.
It can be seen from
Table 2 that, in the first three scenarios, the estimation results are quite comparable. Nevertheless, in the fourth scenario, the estimated density results are highly inaccurate. It seems that the entropy features do not carry much information about microalgae in the images.
5.2. Two-Feature Combinations
If a single type of feature does not provide a good estimation of the microalgal density, combining two types of features in a learning model may enrich the information of microalgae, which can lead to better prediction results. There are six possible combinations of two types of features from four sets of the features that we discuss in this work. Let us examine those scenarios as follows:
S5: Three color channel averages (R, G, B) and three entropies.
S6: Six interval bounds and three entropies.
S7: Three color channel averages (R, G, B) and three spatial frequency powers.
S8: Three color channel averages (R, G, B) and six interval bounds.
S9: Three spatial frequency powers and three entropies.
S10: Six interval bounds and three spatial frequency powers.
The validation results in these six scenarios of two-feature-type combinations are tabulated in
Table 3. Overall, the estimated results obtained by combining two types of features are better than those obtained with a single type of feature. Moreover, whenever the entropy appears in the feature sets, the prediction results are slightly better than the others; that is, when mixed with other features, the entropy can provide rich information about the microalgae.
5.3. Three and Four-Feature Combinations
We now consider scenarios of combining three or four types of features to see whether a greater variety in the types of image features involved in one learning process can result in a more accurate prediction of the microalgal density. There are five scenarios when combining three or four types of image texture features, which are as follows:
S11: Three color channel averages (R, G, B), six interval bounds, and three entropies.
S12: Three color channel averages (R, G, B), six interval bounds, and three spatial frequency powers.
S13: Three color channel averages (R, G, B), three entropies, and three spatial frequency powers.
S14: Six interval bounds, three entropies, and three spatial frequency powers.
S15: All four types of features discussed in Section 3: three color channel averages (R, G, B), six interval bounds, three spatial frequency powers, and three entropies.
Similar to the previous considerations, the mean and std results obtained in these five scenarios are also summarized in
Table 4 for comparison. It can be seen that combining more than two types of image features for estimating the microalgal density does not enhance the estimation results compared with those in the cases where two types of features are used, as demonstrated in
Table 3. Therefore, we propose to employ only two types of features in the training and prediction processes to reduce the computational complexity.
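The fifteen scenarios S1 to S15 correspond to all non-empty combinations of the four feature types (4 single, 6 pairs, 4 triples, and 1 with all four). A short sketch of how such scenarios could be enumerated is given below; the group labels and the function name are illustrative, and the enumeration order does not follow the S1 to S15 numbering used above.

```python
from itertools import combinations

# Illustrative labels for the four feature types discussed in this section.
FEATURE_GROUPS = (
    "channel averages",
    "interval bounds",
    "spatial frequency powers",
    "entropies",
)


def feature_group_combinations(groups=FEATURE_GROUPS):
    """List every non-empty combination of feature groups, smallest first."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]
```

Each returned tuple can then be mapped to the columns of the feature matrix and passed through the same validation procedure.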
5.4. Higher-Order and Nonlinear Entropy Features
We now propose to create new features from the existing ones extracted from images, e.g., powers or products of existing features, and incorporate them into the input of the training data. Manipulating the existing features to higher-order or nonlinear features may weigh the LASSO model in a different manner, which can lead to a better estimation. Out of the four types of features, we picked the entropy for the manipulation. Though the entropy features, when standing alone, do not provide good estimation results, as demonstrated in
Section 5.1, when combined with other types of features, they could lead to an improvement in the estimation accuracy as discussed in
Section 5.2. By using the higher-order and nonlinear entropy features, we generated six combination scenarios for discussion as follows:
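S16: Features including R, G, B, the three entropies, and the second-order terms of both the color channel averages and the entropies.
S17: Features including R, G, B, the three entropies, and the second-order entropies.
S18: Features including the three entropies and the second-order entropies.
S19: Features including R, G, B, the three entropies, and the second-order and third-order entropies.
S20: Features including R, G, B, the three entropies, the second-order entropies, and the three pairwise products of the entropies.
S21: Features including all features of S20 and the product of three features.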
After running the six combination scenarios using the higher-order and nonlinear features, we tabulated the obtained results in
Table 5 for further analysis. Compared with the results in the previous discussion, the estimation results in five out of these six scenarios are more accurate.
In order to demonstrate why the entropy is used to create higher-order and nonlinear features, in scenario S16, we exploited the second order of both the color channel averages and the entropies in the LASSO model. In scenario S17, however, we dropped the second-order color channel averages. The results of both scenarios in
Table 5 are consistent; that is, the second-order color channel averages do not add more information about the microalgae to the model. To reduce the computational complexity, we do not incorporate them in further consideration.
In scenario S18, we even dropped all of the features relating to the color channel averages. As can be seen in
Table 5, the obtained results get worse. In other words, the features
R,
G, and
B should remain in the training dataset. We then extended the entropy features to the third order in scenario S19. Nonetheless, the results are not better than those in S17, which empirically suggests that the third-order features are insignificant.
Moreover, we considered a multiplication interaction between two features. For instance, in scenario S20, we employed three new features created as pairwise products of the entropies. The prediction results obtained in
Table 5 show a significant improvement in the accuracy compared with all of the others. We then added a multiplication interaction among three features in scenario S21. Nonetheless, the obtained results were not enhanced compared with those in S20, though the computation was more complicated. Hence, we accepted scenario S20, as it can provide an acceptable estimation accuracy in our application [
16].
It is noted that the procedure discussed in this section can be easily extended to other microalgal density estimation applications, where new features can be created and combined with others as the input of the LASSO model to improve the estimation accuracy to an expected level. However, the trade-off between the accuracy and the computational complexity should be considered in practice.
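As an illustration of how higher-order and interaction features can be generated, the sketch below builds an S20-style feature vector for one image. It assumes that S20 comprises the color channel averages, the entropies, the squared entropies, and the pairwise entropy products; the function name and the ordering of the outputs are hypothetical.

```python
from itertools import combinations


def expand_entropy_features(channel_means, entropies):
    """Build an S20-style feature vector from one image's basic features.

    channel_means: the (R, G, B) averages.
    entropies: the three entropy features of the same image.
    Returns means + entropies + squared entropies + pairwise entropy products
    (this composition is an assumption of the sketch).
    """
    squares = [h * h for h in entropies]
    pairwise = [a * b for a, b in combinations(entropies, 2)]
    return list(channel_means) + list(entropies) + squares + pairwise
```

Applying the function to every image yields the input matrix of the LASSO model, and further terms, such as a product of three features, can be appended in the same manner.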
5.5. Estimation Results
We now take the features combined in scenario S20 as the input of the LASSO model and examine the performance of the proposed approach on our dataset, along with that of two other methods, namely the Gaussian process (GP)-based algorithm [
16] and GS2-based technique [
2].
Let us consider one example of the training and testing data sets, where 100 image samples and 100 ground truth (GT) values of microalgal density in the training data were utilized to train three models, namely LASSO, GP, and GS2. Given the 29 images in the testing data, the trained models were then employed to estimate the densities of the corresponding microalgae. The estimation results are compared with the GT in the testing data, as demonstrated in
Figure 10. In an ideal case of perfectly accurate estimation, the estimation results should lie on the GT line (i.e., the blue line in
Figure 10). In other words, the further the estimated density values lie from the GT line, the less accurate the estimation method is. As can be seen in
Figure 10, the proposed LASSO approach outperforms both the GP and GS2-based algorithms in the example.
In order to statistically confirm that the LASSO estimation method outperforms the others, we repeated the implementation of the three techniques 1000 times, where, in each implementation, the training and testing data sets were randomly selected. The RMSEs between the estimated density results and the GT in the corresponding testing data set were computed in all 1000 implementations. The RMSE results obtained by the three methods are summarized by boxplots, as illustrated in
Figure 11. It can be clearly seen that the results obtained by the LASSO approach are more accurate than those obtained by the GP and GS2 techniques. More specifically, the mean and interquartile range of the estimation RMSEs in the LASSO implementations are 1.54 and 0.28, whereas those in the GP and GS2 experiments are 2.16 and 0.43, and 3.68 and 0.55, respectively.
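The two summary statistics reported above can be computed from a list of RMSE values as sketched below. The function name is illustrative, and the quartile convention (here the "inclusive" method of the Python standard library) is an assumption, since different plotting tools compute quartiles slightly differently.

```python
import statistics


def summarize_rmse(errors):
    """Return (mean, interquartile range) of a list of RMSE values."""
    # Quartile convention is an assumption; other tools may differ slightly.
    q1, _median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return statistics.mean(errors), q3 - q1
```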