Effort and Cost Estimation Using Decision Tree Techniques and Story Points in Agile Software Development
Round 1
Reviewer 1 Report
The purpose of this paper is to explore the use of hybrid models that combine algorithmic models and learning-oriented techniques as a method for project-level effort estimation in agile software development. Effort estimation in agile frameworks such as Scrum uses the story point approach, which measures the effort required to complete a release of the system using an arithmetic scale.
In this paper, the authors use labeled historical data to estimate the completion time and total cost of a project, measured in days and rupees, respectively. The Decision Tree, Random Forest, and AdaBoost techniques are used to improve the accuracy of predictions.
Models are trained using 10-Fold cross-validation and the relative error is used to compare the results with existing literature. The Bagging ensemble made up of the three techniques provides the highest accuracy and project classification also improves the estimates. This study contributes to the limited research on effort estimation in agile software development using artificial intelligence and provides insight into the potential benefits of using hybrid models for project-level effort estimation.
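To make the evaluation setup concrete, the following is a minimal sketch of how three such regressors could be combined and scored with 10-fold cross-validation in scikit-learn. The synthetic data, the `VotingRegressor` averaging scheme, and all hyperparameters are illustrative assumptions, not the authors' exact Bagging ensemble or dataset.

```python
# Hypothetical sketch: averaging Decision Tree, Random Forest, and AdaBoost
# regressors and evaluating with 10-fold cross-validation. Synthetic data
# stands in for the paper's labeled historical projects.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, VotingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(120, 1))         # e.g. story points per release
y = 2.5 * X.ravel() + rng.normal(0, 5, 120)    # e.g. completion time in days

ensemble = VotingRegressor([
    ("dt", DecisionTreeRegressor(max_depth=4, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("ab", AdaBoostRegressor(n_estimators=50, random_state=0)),
])

# 10-fold CV; one negated mean-absolute-error score per fold
scores = cross_val_score(ensemble, X, y, cv=10,
                         scoring="neg_mean_absolute_error")
print(f"mean absolute error over 10 folds: {-scores.mean():.2f}")
```

The same cross-validation loop could report relative rather than absolute errors, which is closer to how the paper compares against the literature.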
This study contributes to Models in Software Engineering and fits the scope of this special issue. However, it would be even better if the following issues were considered:
It would be helpful to provide more context and background information on agile software development and the importance of effort and cost estimation in the field.
The choice of Decision Trees, Random Forest, and AdaBoost techniques should be explained and justified in more detail, including the specific reasons for why these techniques were chosen over others.
The labeling of the projects should be explained in greater detail, including the method used for labeling and why this method was deemed the best choice.
The results section could be made clearer and more concise, perhaps by presenting the results in the form of tables or graphs, and including a discussion of the limitations of the study and any areas for future research.
It would be useful to compare the results of this study with previous research in the field, and to discuss how the results contribute to the overall understanding of effort and cost estimation in agile software development.
The conclusions section should be strengthened by highlighting the implications of the study for practice, including specific recommendations for practitioners who want to improve their estimation accuracy.
Consider including more information about the dataset used in the study, including the size, source, and any relevant details that would help readers understand the validity and generalizability of the results.
The language and terminology used in the paper should be made more accessible to a wider audience, especially those who may not be familiar with the technical details of machine learning techniques.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Effort and cost estimation in software development projects is a crucial aspect for ensuring successful delivery of the project within the given timeline and budget. The paper under consideration, titled "Effort and Cost Estimation Using Decision Tree Techniques and Story Points in Agile Software Development", proposes the use of hybrid models composed of algorithmic models and learning-oriented techniques for project-level effort estimation in agile frameworks.
The authors note that limited research has been done on the topic of effort estimation in agile software development using artificial intelligence. Therefore, the research project presents a valuable contribution to strengthening the use of hybrid models for effort estimation in agile frameworks. The proposed approach utilizes the story point approach, which measures the effort required to complete a release of the system, and labeled historical data to estimate completion time and total cost of the project. The authors have used Decision Tree, Random Forest, and AdaBoost techniques to improve the accuracy of predictions, and Bagging ensemble to provide the highest accuracy and project classification.
Overall, this paper is a well-written and valuable contribution to the field of effort and cost estimation in agile software development. The authors have made a convincing argument for the use of hybrid models in agile frameworks, and their approach is grounded in well-established techniques. However, there are some areas where the paper could be improved. For instance, the authors could provide more detail on how they obtained their labeled historical data and how they validated their results. Additionally, the paper would benefit from a more in-depth discussion of the limitations of their approach and potential avenues for future research.
In information technology projects, why is it required to make an early effort estimation, and what are the benefits of doing so?
Why has so little research been done using artificial intelligence to estimate the effort involved in agile software development?
How exactly can estimation of the amount of work involved in agile frameworks be improved by using hybrid models that combine learning-oriented methodologies and algorithmic models?
When it comes to estimating the amount of work involved in agile approaches such as Scrum, what are the drawbacks of employing the story point approach?
How does a user story capture the actions that stakeholders need to conduct through the system as well as the demands of the client?
In determining the amount of time needed to finish a project and the overall cost of doing so in rupees, how important is it to have labeled historical data?
How exactly do the techniques of Decision Tree, Random Forest, and AdaBoost improve the accuracy of predictions made regarding the amount of effort and expense involved?
What exactly is 10-Fold cross-validation, and how exactly does it contribute to the training of the models that are being utilized in this project?
How does the relative error in this project compare with the results found in the literature?
What exactly is the Bagging ensemble, and why is it considered the most effective strategy for this project?
What are the implications of using the Bagging ensemble strategy for improving the precision of effort and cost estimates?
In other agile software development projects, what are some of the potential issues that could arise while applying the Bagging ensemble approach?
How does the Bagging ensemble approach compare with other ensemble techniques in terms of accuracy and efficiency?
How can the findings of this project be implemented in situations that take place in the real world, and what are the possible benefits of doing so?
When it comes to estimating the amount of time and money required for software development projects, what ethical considerations should be taken into account when utilizing algorithms and machine learning techniques?
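On the relative-error question raised above: effort-estimation studies commonly report the magnitude of relative error (MRE) per project and its mean (MMRE). The following is a minimal sketch under that assumption; the paper's exact error definition and the numbers shown here are illustrative, not taken from the study.

```python
# Hypothetical MMRE computation: mean magnitude of relative error,
# a common accuracy measure in effort-estimation research. The paper's
# exact relative-error definition may differ.
def mmre(actual, predicted):
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 80.0, 120.0]     # e.g. actual completion times in days
predicted = [90.0, 88.0, 132.0]   # model estimates
print(round(mmre(actual, predicted), 3))  # → 0.1, i.e. 10% mean relative error
```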
With these changes, I recommend that the paper be accepted for publication.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
I reviewed your paper and want to express my appreciation for your contribution to the field. I found your work insightful and thought-provoking. I also want to share some thoughts on how the paper's structure and content could be expanded.
1) To start with, would you be open to expanding the paper's contribution?
2) The proposed working method is very well explained and contains many exciting ideas; however, the comparative study is poorly presented. I think the authors should consider more current state-of-the-art methods to strengthen their hypotheses.
3) It is worth mentioning that the results obtained by some ML algorithms cannot easily be compared to the result of their ensemble. Therefore, the authors should vary the parameter values to assess the performance thoroughly and extensively.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
This paper contributes to strengthening the use of hybrid models composed of algorithmic models and learning-oriented techniques as a project-level effort estimation method in agile frameworks. Results indicate that the proposed method is superior to baseline models. Generally speaking, it is well written and easy to follow. However, I have some comments as follows.
1. As we know, deep learning has become the most promising method in data mining, and has achieved favorable results in a variety of disciplines, including computer vision, natural language processing, and demand prediction. Please refer to the following papers.
"Sensing Data Supported Traffic Flow Prediction via Denoising Schemes and ANN: A Comparison." IEEE Sensors Journal, 2020, 20(23): 14317-14328.
"Exploring Influence Mechanism of Bikesharing on the Use of Public Transportation - A Case of Shanghai." Transportation Letters: The International Journal of Transportation Research, 2022. DOI: 10.1080/19427867.2022.2093287.
2. Does the solving algorithm for the data-mining method involve any random numbers? If so, how do the authors deal with the randomness?
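As a hypothetical illustration of this randomness concern: tree ensembles such as Random Forest draw bootstrap samples and random feature subsets, so repeated runs can differ unless the seed is fixed. The data and hyperparameters below are synthetic and illustrative, not the authors' setup.

```python
# Hypothetical sketch: fixing random_state makes a Random Forest's
# training fully reproducible; leaving the seed free generally does not.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 3))
y = X.sum(axis=1) + rng.normal(0, 0.5, 100)

# Two runs with the same fixed seed produce identical predictions.
a = RandomForestRegressor(n_estimators=20, random_state=7).fit(X, y).predict(X[:5])
b = RandomForestRegressor(n_estimators=20, random_state=7).fit(X, y).predict(X[:5])
assert np.allclose(a, b)

# A different seed bootstraps different samples, so predictions generally differ.
c = RandomForestRegressor(n_estimators=20, random_state=8).fit(X, y).predict(X[:5])
print(np.allclose(a, c))
```

Reporting results averaged over several seeds, or fixing and documenting the seed, are two common ways to address this in a revision.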
3. The Bagging ensemble made of the three techniques achieves a better result than any single technique. Have the authors considered using more single techniques to build the ensemble method?
4. I suggest the authors use computation time as an additional evaluation indicator to assess the relative advantages and disadvantages of these algorithms.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 5 Report
To improve the accuracy of model predictions, this paper proposes a hybrid model based on decision tree techniques and story points as a method for estimating effort and cost in agile software development. The paper trains the models using 10-fold cross-validation and compares the model results using relative errors, which is practical and realistic. However, there is insufficient experimental data in this paper for the multi-model combination part, and this study only performs a simple multi-model overlay, which is not significantly different from previous research methods. Overall, it is recommended that this paper needs to be revised and resubmitted.
Specific suggestions for revision are as follows.
1. The entire paper focuses on how combining different models helps to improve model accuracy. The paper, however, makes no mention of the hardware environment required to run the models, and the experimental section does not describe the equipment in detail.
2. The related work on effort estimation and agile software development is not presented in a way that corresponds to the three techniques studied in this paper; instead, it is a simple chronological accumulation of literature.
3. The background introduction only briefly presents the methods and principles of Decision Tree, Random Forest, and AdaBoost, without a summarizing comparison.
4. The project strategy flowchart in Figure 4 could be explained further in Materials and Methods, and the summary should highlight the paper's innovation points and contributions.
5. The relevant factors are only briefly listed in Figure 5 of Materials and Methods. The logic of the diagram and its inherent connection to the methods in the paper deserve careful consideration.
6. The experimental running process described at line 257 should be reflected in the experimental section, and the experimental procedures should be described more methodically and clearly.
7. To conduct a more comprehensive experimental comparison and highlight the advanced nature of the models in this paper, the comparison experiments in the experiments should be divided into two categories: single-model experiments and multi-model combination experiments.
8. Although the problem of over-fitting in the Decision Tree is mentioned several times throughout the experiments, no preliminary solution is provided later in the paper.
9. The paper lacks experiments on real project data and a summary of its innovation points; it offers only a simple integration of three models.
Based on the above suggestions, it is considered that this paper needs to be revised and resubmitted.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Thank you for your work.
Reviewer 4 Report
The authors have dealt with all my concerns.
Reviewer 5 Report
The authors carefully revised the paper according to the comments listed in this reviewer's reports. The concerns have been answered via the authors' efforts, and the quality of this paper was improved.