Classification of Exoplanetary Light Curves Using Artificial Intelligence
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors present a study on exploring lightning in exoplanetary environments and its implications for the search for biosignatures using AI. The results are sound and well presented.
The introduction has a thorough literature review, and the logic of the whole paper is clear.
The only issue I see is that the readability of Figures 4-11 is quite low. I suggest considering better color usage and formatting to present the results.
Author Response
1. The authors present a study on exploring lightning in exoplanetary environments and its implications for the search for biosignatures using AI. The results are sound and well presented. The introduction has a thorough literature review, and the logic of the whole paper is clear. The only issue I see is that the readability of Figures 4-11 is quite low. I suggest considering better color usage and formatting to present the results.
Response 1. The colors of Figures 4 to 11 have been changed to blue, with representative points highlighted in red. The font size has also been increased.
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
You can see my report in the PDF file.
Comments for author File: Comments.pdf
Author Response
- The paper attempts to use artificial intelligence to explore lightning in exoplanetary environments. However, the word 'lightning' is never used again in the paper. As I understand it, the paper aims to derive a classification of the Kepler light curves. Therefore, it seems that the title of the paper is not connected to its content.
A1: We have changed the title to: "Classification of Exoplanetary Light Curves Using Artificial Intelligence."
- In Table 1, there are references that are not mentioned in the reference list, such as (Olmedo, Bass).
Additionally, some papers in the reference list are not mentioned within the text, and there are others with missing titles.
A2. Olmedo and Bass have been included in line 319, and any references that were not used or mentioned in the text have been removed. The missing citations have been added in lines 297 to 331.
- In several lines, there are characters such as '??' referring to some tables, for example, in lines 227 and 234, which makes the text difficult to understand. Additionally, Tables 2 and 3 contain exactly the same data.
A3: The incomplete references to tables have been corrected, and Table 3 has been replaced.
- Figures 3 to 11 are unclear. The X-axis, which represents time, and the Y-axis, which represents the NUV flux, do not include any units. The scale of the time axis consistently ranges from 0 to 10, yet line 139 mentions a timescale of 27.4 days. Additionally, in the same figures, there are continuous lines that occasionally return from the end of the time scale back to the beginning, as seen in Figures 4 to 7.
A4: Figures 3 to 11 have been re-plotted to include units of measurement. The X-axis now shows the NUV flux, ranging from 0 to 9 in magnitude, while the Y-axis displays the final observation time.
5. In the ANN analysis, we can see the training sets and results in Table 4; however, the test datasets, which assess the ability of an ANN to predict results without overfitting, are not included. The type of activation function used for the hidden nodes is also missing. Additionally, there is no description of the inputs for equations (1) and (2), and Figure 12 lacks descriptions for the X- and Y-axes.
A5: Yes, they are included, but only a representative example is shown in Figure 14(c), where the histogram shows how an error of 0.2167 is reached. Within the cross-validation process, the training set is shown in blue, the validation set in green, and the test set in red. The activation functions for the hidden nodes are now included, and information on the inputs to equations (1) and (2) has been added. The description of Figure 12 now names the X and Y axes: the X-axis displays the 760 stars in the collection, and the Y-axis displays the statistic named in each graph's title.
Reviewer 3 Report
Comments and Suggestions for Authors
I appreciate the opportunity to review the data descriptor article entitled ‘Exploring lightning in exoplanetary environments and its implications for the search of biosignatures using artificial intelligence’ by Flores-Pulido et al.
The authors claim to have proposed a robust star classification technique using light curves and by employing a bagging neural network approach.
While the topic of the paper is very interesting and relevant to the application of AI in astronomy, the paper is very ambiguously presented, both in terms of concepts and methodology. I will try to summarize my findings, which demonstrate the need for major revisions before this paper can be considered for publication:
Major concerns and comments:
- The title is vague; the paper talks only about star classification methods, and there is no discussion about the implications of biosignatures in exoplanets. The title should be modified to reflect the actual content of the paper.
- Throughout the paper, various acronyms are used without any explanations, making it very difficult for the reader to understand what the authors are referring to using those acronyms.
- At places, the language in the paragraphs is very hard to comprehend, and there are multiple instances of missing references, including one specific case in section 3.1 where lines 94-103 and lines 104-110 convey exactly the same information, only paraphrased. This shows a lack of care in composing the paper and should be corrected properly. Tables 2 and 3 are also identical in their content.
- The paper mentions three different classification methods in section 2, lines 78-87, but when the results are discussed, only the third method is extensively discussed without proper comparisons with the first two approaches.
- In section 3.1, line 111, the authors claim the BAPENN model uses three different neural network algorithms, but in line 84, they mention that it uses an ensemble of multilayer perceptron neural networks with varying numbers of hidden nodes, which indicates that only one type of algorithm is used, with varying configurations. These contradictory explanations are hard for the reader to follow.
- There is also a contradiction in explaining the reasoning behind the choice of the number of hidden nodes for each of the neural networks. Line 192 says there are 17 input nodes and 10 hidden nodes in architecture 2, but lines 202-203 mention 27 input nodes with 13 hidden nodes.
- Finally, the conclusion section 5 is very ambiguously written. It’s not very clear what the takeaway points are from the experiments performed.
Other comments:
Line 4 - the abstract says ‘760 types of stars’ - but later in the text, the authors mention only 9 types.
Table 1 - a lot of use of acronyms without explanations.
Line 50 - missing reference for GALEX
Lines 62-64 - please elaborate on why neural networks are only effective with values between 0 and 1; it's not clear in the text.
Line 89 - how is the certainty percentage calculated?
Figure 1 - what is GCK in the second box in the top row?
Table 2 and Table 3 - the tables are identical. Also, what does the error value mean? What is the unit of measurement?
Line 116 - please define the performance values.
Figure 2 - shows a confusion matrix, but section 3 does not mention the confusion matrix or how it is used.
Figure 3-Figure 10 - please mention the units for the x-axis and y-axis.
Line 148 - Percy 2007 is cited using an inconsistent referencing style with the rest of the paper.
Line 214 - please elaborate on how the non-discriminant statistics are determined.
Line 217 - for equations 1 and 2, please explain each of the acronyms used for the statistics.
Line 223 - what is PPV?
Lines 227 and 234 - missing reference for the table
Table 4 - please explain what the dots are
Figure 14 - if possible, update the plot quality; it’s not very readable.
Author Response
1.-The title is vague; the paper talks only about star classification methods, and there is no discussion about the implications of biosignatures in exoplanets. The title should be modified to reflect the actual content of the paper.
A1. The title was modified to enhance coherence and content, now reading "Classification of Exoplanetary Light Curves Using Artificial Intelligence."
2. Throughout the paper, various acronyms are used without any explanations, making it very difficult for the reader to understand what the authors are referring to using those acronyms.
A2. We have added an appendix containing the acronyms.
3. At places, the language in the paragraphs is very hard to comprehend, and there are multiple instances of missing references, including one specific case in section 3.1 where lines 94-103 and lines 104-110 convey exactly the same information, only paraphrased. This shows a lack of care in composing the paper, which should be corrected properly. Tables 2 and 3 are identical in their content.
A3: The text has been rewritten to correct the omission of references, including a specific scenario in section 3.1. Lines 94-103 and 104-110 have been corrected. Table 4 has been updated.
4. The paper mentions three different classification methods in section 2, lines 78-87, but when the results are discussed, only the third method is extensively discussed without proper comparisons with the first two approaches.
A4: Corrections have been made in lines 81-90 clarifying that approaches 1 and 2 are not shown because of their poor results. The authors have since removed these approaches and instead describe the three variations of BAPENN, an ensemble of three architectures: 1) an architecture with five hidden nodes, 2) an architecture with ten hidden nodes, and 3) an architecture with fifteen hidden nodes.
5. In section 3.1, line 111, the authors claim the BAPENN model uses three different neural network algorithms, but in line 84, they mention that it uses an ensemble of multilayer perceptron neural networks with varying numbers of hidden nodes, which indicates that only one type of algorithm is used, with varying configurations. These contradictory explanations are hard for the reader to follow.
A5: This has been addressed in the response to point 4.
6. There is also a contradiction in explaining the reasoning behind the choice of the number of hidden nodes for each of the neural networks. Line 192 says there are 17 input nodes and 10 hidden nodes in architecture 2, but lines 202-203 mention 27 input nodes with 13 hidden nodes.
A6: Initially, there were 15 nodes, which were increased to prevent overfitting. The hidden nodes varied based on their characteristics, which caused the network architecture to change according to its training.
7. Finally, the conclusion section 5 is very ambiguously written. It's not very clear what the takeaway points are from the experiments performed.
A7: The conclusion has been updated, starting at line 260.
Other comments:
1. Line 4 - the abstract says ‘760 types of stars' - but later in the text, the authors mention only 9 types.
A1: Only 9 types are considered. Within the collection, there are 760 star samples, as clarified in line 4.
2. Table 1 - a lot of use of acronyms without explanations.
A2. We have added an appendix containing the acronyms.
3. Line 50 - missing reference for GALEX
A3: The reference to GALEX was added in line 317.
4. Lines 62-64 - please elaborate on why neural networks are only effective with values between 0 and 1; it's not clear in the text.
A4: The multilayer perceptron architecture trained with the backpropagation algorithm has the advantage of working with real-valued inputs between 0 and 1; this information has been added in lines 82-90.
5. Line 89 - how is the certainty percentage calculated?
A5: The classification percentage is calculated both for the correctly classified samples and for the misclassification error, based on the total number of samples and the desired outputs provided to the algorithm during training. The confusion matrices also give the percentages of false positives and false negatives for each type of star and for each dataset; these values vary with the number of light-curve points in each dataset. This explanation has been added to the manuscript.
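Rescaling inputs into the [0, 1] range, as described in A4, is commonly done with min-max normalization. A minimal sketch in Python, assuming a simple per-feature linear rescaling; the function name `min_max_scale`, the sample values, and the choice to map constant features to zero are illustrative assumptions, not the authors' preprocessing code:

```python
def min_max_scale(values):
    """Linearly rescale a sequence of numbers to the range [0, 1].

    Assumption (illustrative only): a constant feature is mapped to all
    zeros, since it has no spread to rescale.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical raw values of one light-curve statistic for three stars.
print(min_max_scale([12.0, 15.0, 18.0]))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would usually be taken from the training set and reused for the validation and test data, so that all inputs share one scale.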
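The quantities described in A5 can be derived from a confusion matrix. A minimal sketch, assuming rows index the desired (true) class and columns the predicted class; the labels "A", "B", "C" are hypothetical stand-ins for star types, and the helper names are illustrative, not taken from the manuscript:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count samples: rows are desired (true) classes, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def classification_percentages(matrix):
    """Return (percent correctly classified, percent misclassified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    accuracy = 100.0 * correct / total
    return accuracy, 100.0 - accuracy


def per_class_fp_fn(matrix):
    """Per-class (false positives, false negatives) as raw counts."""
    n = len(matrix)
    result = []
    for k in range(n):
        fp = sum(matrix[i][k] for i in range(n) if i != k)  # predicted k, truly other
        fn = sum(matrix[k][j] for j in range(n) if j != k)  # truly k, predicted other
        result.append((fp, fn))
    return result


labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C", "C", "A"]
y_pred = ["A", "B", "B", "B", "C", "A", "C", "A"]
m = confusion_matrix(y_true, y_pred, labels)
print(m)                              # [[2, 1, 0], [0, 2, 0], [1, 0, 2]]
print(classification_percentages(m))  # (75.0, 25.0)
print(per_class_fp_fn(m))             # [(1, 1), (1, 0), (0, 1)]
```

Dividing the per-class counts by the total number of samples turns them into the percentages mentioned in the response.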
6. Figure 1 - what is GCK in the second box in the top row?
A6: Galex Cause Kepler (GCK)
7. Table 2 and Table 3 - the tables are identical. Also, what does the error value mean? What is the unit of measurement?
A7: Table 3 was replaced. The error value is the SNR measurement error of the observation instrument. SNR stands for signal-to-noise ratio, a key concept in fields such as telecommunications, audio engineering, and data transmission. In observational contexts, the SNR indicates the quality and clarity of a signal relative to the background noise, and it is expressed in decibels (dB).
8. Line 116 - please define the performance values.
A8: Six interpretations of performance are used in this research:
1. Performance is used as a synonym for functioning in diagrams, and it signifies the flow of information in BAPENN architecture.
2. Performance is the selection of the best percentage among the three variations of hidden nodes.
3. Performance is measured in terms of the confusion matrices for each dataset.
4. Performance is the convergence of learning.
5. Performance is the training of the ensemble.
6. Performance is related to the neural network architectures.
9. Figure 2 - shows a confusion matrix, but section 3 does not mention the confusion matrix or how it is used.
A9: The confusion matrix is now explained in the description of Figure 2.
10. Figure 3-Figure 10 - please mention the units for the x-axis and y-axis.
A10: The X-axis represents time, and the Y-axis represents the NUV flux.
11. Line 148 - Percy 2007 is cited using an inconsistent referencing style with the rest of the paper.
A11: The inconsistencies in the references have been corrected. References are listed by surname and year only in Table 1, for the reader's convenience; the rest of the article uses the standard bracketed numbering format.
12. Line 214 - please elaborate on how the non-discriminant statistics are determined.
A12: The non-discriminating statistics are those highlighted in red in Figure 12, because their graphs show no variation. This means that their feature vectors are entirely continuous and unchanged.
13. Line 217 - for equations 1 and 2, please explain each of the acronyms used for the statistics.
A13: The acronyms in equations 1 and 2 are now described.
14. Line 223 - what is PPV?
A14: PPV is defined as the ratio of the difference to the sum of the maximum and minimum of the flux, taking the errors into account.
15. Lines 227 and 234 - missing reference for the table
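A minimal sketch of one common peak-to-peak variability formulation that matches the description in A14, assuming the index is built from the error-corrected extremes of the flux, upper = max(f_i - σ_i) and lower = min(f_i + σ_i), as (upper - lower) / (upper + lower). The exact formula in the manuscript may differ, the function name is hypothetical, and the sample values are illustrative:

```python
def peak_to_peak_variability(flux, err):
    """Assumed peak-to-peak variability index of a light curve.

    upper = max(flux_i - err_i), lower = min(flux_i + err_i)
    PPV   = (upper - lower) / (upper + lower)
    Assumes positive fluxes. Values <= 0 mean every error bar overlaps a
    single common level, i.e. no variability beyond the errors.
    """
    if not flux or len(flux) != len(err):
        raise ValueError("flux and err must be non-empty and of equal length")
    upper = max(f - e for f, e in zip(flux, err))
    lower = min(f + e for f, e in zip(flux, err))
    return (upper - lower) / (upper + lower)


# Hypothetical fluxes and 1-sigma errors for a short light curve.
print(round(peak_to_peak_variability([10.0, 12.0, 14.0], [1.0, 1.0, 1.0]), 4))  # 0.0833
```

Subtracting and adding the errors before taking the extremes keeps noisy points from inflating the index, which fits the response's mention of both flux and errors.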
A15: They have been added.
16. Table 4 - please explain what the dots are
A16: The number of points that describe a light curve depends on the dataset and its corresponding error; the best percentage was obtained among the 15 light-curve datasets used. The variation is due to the error of the instrument providing the data, which can be affected by noise or static in the signal or from the telescope. Although this does not affect the operation of the classifier built from the ensemble of neural networks, it does affect how the graphs look. It does not prevent effective classification: it is demonstrated that despite these details in the light-curve samples, the uncertainty, ambiguity, or noise in the data, whether due to static or instrument measurements, is suppressed by the ensemble, because its effect on the classification performed by the neural networks is not significant.
17. Figure 14 - if possible, update the plot quality; it's not very readable.
A17: The graphs have been recreated, where possible, for the reader's convenience. Some continuity errors remain in the graphs; these are due to the error of the instrument providing the data, arising from noise or static in the signal or from the telescope. Although this does not affect the operation of the classifier built from the ensemble of neural networks, it does affect how the graphs look. It does not prevent effective classification: it is demonstrated that despite these details in the light-curve samples, the uncertainty, ambiguity, or noise in the data, whether due to static or instrument measurements, is suppressed by the ensemble, because its effect on the classification performed by the neural networks is not significant.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
You can find my report in the PDF file.
Comments for author File: Comments.pdf
There are some points that need corrections in English, particularly in the captions of Figures 3 to 11.
Author Response
Q1 Aside from the English corrections needed, we understand that the time unit is one month, with the X-axis displaying time and the Y-axis displaying flux. The authors should correct or clarify this point.
A. We have added a red caption to each figure in the updated manuscript.
Q2 A thorough revision of all of Figures 3 to 11 is needed.
A. We have thoroughly revised the figures.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
I appreciate the authors making the changes. I have one remaining concern:
For Figures 3 to 10, the plots look updated and do not seem to be identical to those in the previous manuscript. The individual time series patterns are also not very clear from the plots. The axes labels are also barely readable. I would suggest improving the plots by properly separating the individual time series in each plot.
Author Response
Q1 For Figures 3 to 10, the plots look updated and do not seem to be identical to those in the previous manuscript. The individual time series patterns are also not very clear from the plots. The axes labels are also barely readable. I would suggest improving the plots by properly separating the individual time series in each plot.
A. Yes, the samples are not the same because they are selected randomly; these new images were produced by running the model.
The algorithm cannot separate the samples into individual series because the dataset is already assembled. If the noise were removed from the set of samples, the exposure time would be lost, making it difficult to tell where one star's series ends and the next begins.
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
I see that the captions in the Figures have been corrected, so I have no further comments.
Reviewer 3 Report
Comments and Suggestions for Authors
Thanks for the explanation. I would recommend it for publication.