Preliminary Estimation for Software Development Projects Empowered with a Method of Recommending Optimal Duration and Team Composition

Abstract: In the early software development stages, the aim of estimation is to obtain a rough understanding of the timeline and resources required to implement a potential project. The current study is devoted to a method of preliminary estimation applicable at the beginning of the software development life cycle when the level of uncertainty is high. The authors’ concepts of the estimation life cycle, the estimable items breakdown structure, and a system of working-time balance equations in conjunction with an agile-fashioned sizing approach are used. To minimize the experts’ working time spent on preliminary estimation, the authors applied a decision support procedure based on integer programming and the analytic hierarchy process. The method’s outcomes are not definitive enough to make commitments; instead, they are supposed to be used for communication with project stakeholders or as inputs for the subsequent estimation stages. For practical usage of the preliminary estimation method, a semistructured business process is proposed.


Introduction
From the business standpoint, predictability is an essential aspect of software development. In other words, before starting a project, it is important to understand how many resources and how much time the implementation will require. This is why estimation is an integral part of the software development life cycle. Following the cone-of-uncertainty concept [1], estimation is envisioned as a multistage process starting from an early phase of project ideation and finishing along with the implementation completion. At each project life cycle stage, estimation has certain goals, requires specific methods, and produces different outcomes.
Most well-known and practically proven estimation methods were developed in the second half of the 20th century. As a result, applying those methods to modern software development projects is difficult or even hardly possible. Methods such as COCOMO II [2] are quite complex and require special training. Another widely used method, PERT [3,4], is relatively easy but does not take into account the agile nature of modern software development projects. Agile-oriented methods like planning poker, T-shirt sizing, affinity grouping, and their variations are mostly applicable by agile teams at the implementation stage [5][6][7][8]. A common downside of the existing methods (except COCOMO II) is that they do not cover the whole software development life cycle starting from project ideation and ending with production, including maintenance and support.
To fill this gap, the authors propose the concept of the estimation life cycle (ELC), tailoring estimation activities to the software development life cycle (SDLC) stages. According to the ELC, early project stages are accompanied by introductory, preliminary, and intermediate [9] estimations (listed in the order of increasing levels of detail, reliability, and accuracy). What these estimations have in common is that their outputs are not definitive enough to take on obligations. In contrast, the following step, precise estimation, aims at providing outcomes that can be used for making commitments. Precise estimation takes place before the implementation start, when the analysis and design are completed. At the implementation stage, the estimates are compared with actual efforts, and the project team is responsible for keeping the remaining estimates up to date. After completing implementation, the ELC prescribes finalizing feedback on estimates.
The current study is devoted to a method of preliminary estimation. The method's objective is to roughly understand the resources and time required to implement a project, minimizing the efforts spent on the estimation itself. Usually, this type of estimation is applicable in the initial steps of SDLC. The main outputs of the preliminary estimation are project scope (represented as an estimable-item breakdown structure, defined below), team composition, and project duration. Due to the high level of uncertainty, these outcomes are not supposed to be used for making commitments; instead, they will be elaborated on in the following stages: the intermediate and precise estimations [9,10].
The method of preliminary estimation proposed in this study is based on a simplified version of a system of working-time balance equations [9,10]. This simplification (in comparison with the intermediate estimation) aims, first of all, at minimizing the efforts spent on the estimation, intentionally sacrificing its accuracy. Additionally, the method is empowered with a decision support procedure based on a multiobjective optimization with the purpose of providing the involved experts with several estimate alternatives. For practical usage, the method of preliminary estimation involves a semistructured knowledge-intensive business process [11,12], predefining steps to be performed and, at the same time, leaving enough space for flexibility and creativity.
The rest of this paper is structured as follows: the research background is given in Section 2; a literature review is in Section 3; the main results, including the semistructured business process, the concept of an estimable-item breakdown structure, the sizing approach, the system of working-time balance equations, the approach to compose a project team, the project duration estimation, and the multiobjective decision support procedure, are presented in Section 4; the challenges of the project scope decomposition, development specializations, development working time coefficient, and validity of the proposed method are discussed in Section 5; conclusions are in Section 6.

Background
The current study is a logical continuation of the authors' past works: the preliminary estimation method is based on their estimation framework [9,10]. This section is devoted to the basic terms and concepts introduced previously, in particular, the concept of the estimation life cycle, the structure of software developer working time, and the "normalization" approach to measure development efforts.

Concept of Estimation Life Cycle
A common mistake is to consider estimation as a one-time activity that takes place before the project start. As a rule, such a mistake leads to a low-quality estimate that, in turn, causes the unpredictability of project resources and deadlines. The key to avoiding this issue is to organize the estimation as a multistep process taking place at each SDLC stage. Depending on the project life cycle stage, the estimation targets different goals, requires various methods, and satisfies certain accuracies. In the current paper, the estimate accuracy means closeness of the estimated values to the corresponding actual values; the accuracy defined in this way can be measured only after receiving the actual values, i.e., during the project implementation or even after its completion.
In Figure 1, the concept of the estimation life cycle (ELC) proposed by the authors is visualized. Being based on the cone-of-uncertainty idea [1], the concept represents estimation as an integral part of SDLC. The estimation starts from the so-called ideation phase (which is a very early stage of a project), when the level of uncertainty in the understanding of different aspects of the project is the highest. As a rule, the estimates at this stage are represented as wide ranges that express only a rough order of the required resources and time. In the project ideation stage, introductory and preliminary estimates are undertaken. The main difference between these successive steps is that introductory estimates are mostly based on similarity with past projects and do not need a project scope analysis. In turn, the preliminary estimation requires the project scope analysis. Such estimate types are informative rather than binding and are not recommended for commitments (e.g., when signing a contract). The estimation goal at the project ideation stage is to roughly understand the project scope and resources required for the project's implementation while minimizing the efforts to prepare the estimate itself.
As the level of uncertainty decreases, it becomes possible to provide more accurate estimates using the outcomes from the previous stages as inputs. Higher accuracy of intermediate estimates is mostly achieved through the elaboration of the project scope as well as distinguishing the project implementation phases. In order to reduce the efforts spent on the estimation itself, the corresponding methods do not consider dependencies among the project tasks. Although their definitiveness is higher, intermediate estimates are still not reliable enough to be used for commitments. Further information about intermediate estimation can be found in [9,10].
After a certain amount of analysis and design, providing a precise estimate and a project release plan becomes possible. Unlike in the previous stages, the precise estimate's accuracy is sufficient for making commitments. In this regard, precise estimation is similar to what is called the definitive estimate [13]. Since the efforts required to prepare such an estimate are greater in comparison to the previous stages, it is worth undertaking only when the project implementation is highly likely to start.
In the project implementation phase, it is important to constantly monitor progress, comparing the actual efforts with the estimate and reacting proactively to undesirable deviations from the release plan. To avoid missing deadlines, regular re-estimation of the remaining project tasks is necessary. As this part of the project life cycle is likely to follow an agile methodology, the agile team is the main source of feedback on the estimate, and the team is responsible for adjusting the estimate according to the actual project state.
After project completion, it is quite useful to finalize the feedback on the estimates, comparing them with the actual working time as well as analyzing lessons learned. The collected feedback is valuable for future project estimation and planning.
The main idea of splitting estimation into several stages is that the estimate accuracy becomes higher as the level of uncertainty decreases. The stages are sequential: the outcomes of the previous stage are inputs for the next one. Therefore, performing estimation successively by following the ELC ensures higher quality and less total effort in comparison with the case where, for example, the previous stages are skipped and only the precise estimation is performed before the project implementation start.

Structure of Software Developer Working Time
One of the core concepts of the authors' estimation framework [9,10] is the structure of software developer working time. The project working time (PWT), W, means total working time spent by a software engineer working on a project. PWT consists of the following three parts: M, the working time spent on project scope implementation; G, the working time spent on general project activities such as daily meetings, sprint plannings, etc.; N, nonworking time including idle time as well as days off, sick leaves, vacations.
In turn, the project scope implementation time, M, is split into two parts: development working time (DWT), D, and supplementary development activities, A. D is the time that a software engineer spends on coding activities, and A is the time spent on supplementary activities such as writing unit tests, defect fixing, team collaboration, etc.
The nonworking time, N, also consists of two parts. The first is idle time, I, when a software engineer does not do actual work due to, for example, lack of project tasks. The second is leave time, O, which includes planned vacations, unplanned days off, and sick leave.
Hence, the structure of software developer working time is expressed as follows:
$W = M + G + N = (D + A) + G + (I + O)$ (1)
The preliminary estimation proposed in the current paper is based on a simplified version of the structure of software developer working time including only the PWT and DWT, omitting the other variables (Equation (8) in Section 4).
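The decomposition described above can be illustrated with a minimal sketch. It assumes that the components simply add up (M = D + A, N = I + O, W = M + G + N), as the definitions in this subsection suggest; the class name, field names, and the hour values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkingTime:
    """Working-time components of one engineer, all in hours (illustrative)."""
    development: float    # D: coding activities
    supplementary: float  # A: unit tests, defect fixing, collaboration
    general: float        # G: daily meetings, sprint planning, etc.
    idle: float           # I: no actual work, e.g., lack of project tasks
    leave: float          # O: vacations, days off, sick leave

    @property
    def scope_implementation(self) -> float:
        """M = D + A."""
        return self.development + self.supplementary

    @property
    def nonworking(self) -> float:
        """N = I + O."""
        return self.idle + self.leave

    @property
    def project_working_time(self) -> float:
        """W = M + G + N (assumed additive structure)."""
        return self.scope_implementation + self.general + self.nonworking


# Hypothetical month of one engineer.
month = WorkingTime(development=90, supplementary=30, general=16, idle=8, leave=24)
print(month.project_working_time)  # 168
```

In this made-up month, only 90 of the 168 hours are development working time, which is the kind of gap the simplified version later absorbs into a single coefficient.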

Measurement of Development Efforts
The existing approaches to measurement (or sizing) of development efforts can be grouped into three main categories [14,15]. The most commonly used category is based on the idea of function points [16]. Such units as object points [2], use-case points [17], story points [5][6][7][8], etc., are derived from the concept of function points. Another well-known development effort measurement unit comprises logical lines of code (the so-called SLOC metric) used by COCOMO [18] and COCOMO II [2]. The third category is working time-based (e.g., used in PERT [3,4]), expressing development efforts in man-hours, man-days, man-months, etc. The main downside of the approaches from the first and second categories is the difficulty in converting them to time-based units, which usually requires some historical data. To eliminate this issue, the authors' estimation framework operates with time-based units to measure development efforts. The theoretical basis of this is given in the current section below.
An "average" software developer (ASD) is a software developer who possesses a high enough competency level to act as a team player and implement project tasks of acceptable quality without permanent supervision by more experienced colleagues; and, at the same time, such a developer has room for improvement in terms of efficiency and complexity of the work performed. A middle software engineer is the position which is closest to the ASD. The development effort "normalization" approach follows from the concept of the ASD.
The normalized development working time (NDWT), U, is the DWT spent on implementing certain project tasks by an ASD who is 100% involved (i.e., 8 h per working day) in a project. If D is the DWT spent by a developer who is different from the ASD, then the following expression holds:
$U = \rho D$ (2)
where ρ is the productivity coefficient (PC). Assuming that the development productivity is influenced by the competency level and the project involvement,
$\rho = \alpha \eta$ (3)
where α is the competency-level productivity coefficient (CLPC) and η is the involvement productivity coefficient (IPC). For an ASD, α = 1; 0 < α < 1 for a developer of a competency level lower than the ASD, and α > 1 for a competency level higher than the ASD. In the case of 100% of project involvement, η = 1; for partial involvement or overtime, 0 < η < 1. The IPC and CLPC parameters are used in Section 4 to define the project team composition.
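As an illustration of the normalization idea, the sketch below converts a developer's DWT into NDWT. It assumes that NDWT is the DWT scaled by the productivity coefficient and that this coefficient is the product of the CLPC and the IPC; the function names and the sample values of α are hypothetical.

```python
def productivity_coefficient(alpha: float, eta: float) -> float:
    """rho = alpha * eta (assumed product of CLPC and IPC)."""
    if alpha <= 0 or eta <= 0:
        raise ValueError("alpha and eta must be positive")
    return alpha * eta


def normalized_dwt(dwt_hours: float, alpha: float, eta: float = 1.0) -> float:
    """U = rho * D: DWT expressed in hours of an 'average' developer (ASD)."""
    return productivity_coefficient(alpha, eta) * dwt_hours


# Hypothetical developers, each spending 80 h of DWT at full involvement.
print(normalized_dwt(80, alpha=1.25))  # above the ASD level: 100.0
print(normalized_dwt(80, alpha=1.0))   # the ASD itself: 80.0
print(normalized_dwt(80, alpha=0.5))   # below the ASD level: 40.0
```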
The normalized development capacity (NDC), C, is the maximum possible value of NDWT that can be spent on the project tasks implementation:
$C = \max_{U \in \mathcal{U}} U = \max_{D \in \mathcal{D}} \rho D$ (4)
where $\mathcal{U}$ and $\mathcal{D}$ are the sets of all possible values of NDWT and DWT, respectively. In particular, the maximization in (4) is aimed at minimizing idle time I and leave time O.
For the preliminary estimation, NDC is defined by Expressions (12) and (13) in Section 4.
The normalized development estimate (NDE), E, of a certain portion of the project scope is a forecasted DWT assuming that the work is performed by an ASD. Usage of the NDE for preliminary estimation is explained in Section 4, in particular, for sizing of estimable items.
In the authors' estimation framework, the NDC and NDE occupy one of the key places, being "common denominators" for measuring development efforts. Importantly, these parameters determine project duration (Inequalities (14) and (15) in Section 4). It is worth noting that the described "normalization" approach has been expanded to include the project involvement definition, introducing a new term: normalized development full-time equivalent (Equation (7) in Section 4).

Literature Review
The preliminary estimation presented in the current paper is close to such well-known terms as "rough order of magnitude" (ROM) [13,19] and more informal ones, "guesstimate" or "ballpark figures", that are usually used to express an approximation of future project efforts, schedule, and budget. The method proposed by the authors is supposed to be applied under the following circumstances: (a) an early project life cycle stage when the level of uncertainty is high; (b) limited time to prepare an estimate; (c) acceptability of low estimate accuracy; (d) an estimate that is not recommended for making commitments; (e) unknown dependencies among project tasks. In the current section, an overview is provided of the existing methods that can be applied under the conditions mentioned above.
An estimation by analogy [20][21][22][23] allows one to understand the approximate efforts and duration based on similar past projects. Such methods can either utilize formal approaches using similarity metrics [24] and a database with historical projects or rely more on experts' opinion. A significant drawback of those methods is the lack of project scope analysis; the main advantage is the speed of estimation. In the case of an existing solid database of past projects, analogy-based methods can be empowered with machine learning and AI techniques [23,25,26].
One of the most mature of the existing estimation methodologies, COCOMO II [2], supports the concept of tailoring estimation models to the project life cycle stages. In the early prototyping phase, the Application Composition model is applicable. This model is based on the idea of measuring the size of software in so-called object points, also referred to as application points (which, in turn, derive from the concept of a function point [16]). The application-points approach involves identifying objects of a software product (e.g., screens, reports, database tables, etc.) and defining their complexity levels, which are used to calculate application points that are translated to the efforts and schedule. In the subsequent early design phase, the COCOMO II Early Design model is suggested. The usage of this model requires trained estimation experts, availability of historical data, and calibration of the model parameters. Its sizing is also challenging, as it requires the calculation of so-called logical lines of source code (i.e., the SLOC metric) or function points. Whilst both Application Composition and Early Design models are applicable in the initial SDLC stages, the second one seems to be more difficult to use in practice.
In this regard, it is also worth mentioning the approach based on use-case points [17][27][28][29][30][31]. Its strength is the usage of UML use-case models to analyze project functionality: their relatively simple visual notation represents both the users (actors) and the functionality (use cases) of a software system, which is quite handy in the early project life cycle stages. However, the calculation formulas proposed in the original publication [17] most probably will require some adjustments before application to a specific project.
Widespread use of agile methodologies has led to the creation of agile-fashion estimation techniques. Usually, such techniques are used by project teams during the project implementation phase. However, some of their elements can also be applied in the early project life cycle phases before the development start, even before forming an agile team. One such method is affinity grouping [5][6][7][8]: tasks are grouped into clusters by their similarity (e.g., estimated efforts or complexity), and then the clustered tasks are assigned estimates. Another well-known method is T-shirt sizing [5][6][7][8], where tasks are attributed to one of the predefined categories that usually correspond to the sizes of T-shirts: XS, S, M, L, XL, etc. Both methods can be used either by teams or by individual experts for sizing of estimable items in the preliminary estimation stage.
Another quite popular method is the so-called three-point estimation, where experts provide three values: optimistic, pessimistic, and most likely. Then, the final estimate is calculated as a weighted average of these three points. Such an approach is inherited from the PERT methodology [3,4]. To apply this method to preliminary estimation, it is required to prepare a work breakdown structure that, in turn, is estimated by experts.
Despite the considerable number of existing methods, none of them fully covers the requirements of the preliminary estimation provided at the beginning of the section. Furthermore, the majority of the methods suffer from the absence of a holistic vision of estimation as a multistep process inseparable from SDLC.
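For illustration, the weighted average used in three-point estimation can be sketched as follows. The sketch uses the classic PERT weights (1, 4, 1); the function name and the sample values are hypothetical and are not taken from the paper.

```python
def three_point_estimate(optimistic: float, most_likely: float, pessimistic: float) -> float:
    """Classic PERT weighted average: (O + 4M + P) / 6."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Hypothetical expert inputs for one work item, in man-days.
print(three_point_estimate(10, 16, 28))  # 17.0
```

The pessimistic tail pulls the result above the most likely value, which is the intended effect of the weighting.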

Preliminary Estimation as a Semistructured Business Process
Before starting a discussion of the preliminary estimation method in detail, it is worth analyzing its usage in practice. Since preliminary estimation heavily relies on the knowledge, experience, and creativity of the involved experts, it makes sense to keep it as a semistructured business process [11,12], leaving some level of freedom for the participants. In other words, such a process involves certain steps; however, there are no strict recommendations on the steps' order (i.e., some of the steps can be interchanged, and some of them can be repeated several times). The preliminary estimation semistructured process is represented in Figure 2. The process starts with understanding the essence of a project and getting familiar with the available requirements (step A). An important aspect of the process is identifying the project scenarios (step B). A project scenario is understood as a hypothetical way of implementing the project. Depending on the circumstances, the criteria of scenario identification might be different, for example, development efforts, scope of work, implementation technologies, architecture design, etc. The "backbone" of the process consists of three steps: D, E, and F. In these steps, the estimation outcomes are produced. The rest of Section 4 covers step E and, partially, step D. Importantly, steps D, E, and F are performed for each of the identified scenarios (it is worth noting that optimistic and pessimistic estimates represent an estimate range for a single scenario, not two different scenarios). The final step G is aimed at communication of the estimates to the concerned parties (e.g., to a potential client).
The process involves the following participant roles: a project manager, a technical expert, and a business analyst. The primary responsibility areas of each role are shown in Figure 2. However, it is worth noting that regardless of the primary responsible role, other roles are also supposed to contribute to a certain process step (e.g., a business analyst is responsible for the project scope decomposition; however, a technical expert can also contribute to this). In practice, a person can combine the duties of more than one role (e.g., a technical expert can also perform the tasks of a business analyst). Or, in the opposite case, the business analyst role can be covered by two people: a business analyst and a business domain expert (or a subject matter expert).

Estimable-Item Breakdown Structure
To provide reliable estimates, it is necessary to have some representation of the project scope, the object of estimation. Usually, the project scope is decomposed into a treelike construction called a work breakdown structure (WBS) [32] or into one of its subtypes (e.g., a component-based work breakdown structure, CBWBS [33]). In practice, such a breakdown structure is obtained by combining several decomposition approaches (e.g., work packages, work items, epics, features, components, use cases, etc.). In order to incorporate a project scope breakdown into the authors' estimation framework, the terms "estimable item" (EI) and "estimable-item breakdown structure" (EIBS) are introduced.
An estimable item (EI), x, is a representation of a project scope portion which can be sized (i.e., assigned a normalized development estimate, NDE [9]), decomposed into child estimable items, and analyzed in terms of assumptions, dependencies, risks, etc. An estimable item x possesses a set of attributes; the value of an attribute is denoted in square brackets: x[attribute name]. For example, x[parent] is the parent of estimable item x; x[risks] is a set of risks associated with x. Especially important is x[NDE]. An item is called a leaf estimable item (LEI) when it does not have any child items. An estimable-item breakdown structure (EIBS) X is a tree (in terms of graph theory) with estimable items as the vertices, parent-child relationships between the estimable items as the edges, and the root item representing the scope of the whole project.
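The tree described above can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the class and method names are hypothetical, the two first-level item names are borrowed from the RTBPM-E example, and the child items and the sample risk are made up for demonstration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(eq=False)
class EstimableItem:
    """A vertex of an EIBS; attributes are read as item["attribute name"]."""
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)
    parent: EstimableItem | None = field(default=None, repr=False)
    children: list[EstimableItem] = field(default_factory=list)

    def add_child(self, name: str, **attributes: Any) -> EstimableItem:
        """Decompose this item by attaching a new child item."""
        child = EstimableItem(name, dict(attributes), parent=self)
        self.children.append(child)
        return child

    def __getitem__(self, attribute: str) -> Any:
        return self.attributes[attribute]

    @property
    def is_leaf(self) -> bool:
        """A leaf estimable item (LEI) has no child items."""
        return not self.children

    def leaves(self) -> Iterator[EstimableItem]:
        """Depth-first traversal yielding only the LEIs."""
        if self.is_leaf:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


# The root item represents the scope of the whole project.
root = EstimableItem("RTBPM-E")
root.add_child("Data processing pipeline", risks=["unknown data sources"])  # hypothetical risk
security = root.add_child("Security and user management")
security.add_child("Authentication")  # hypothetical decomposition
security.add_child("Authorization")

print([item.name for item in root.leaves()])
# ['Data processing pipeline', 'Authentication', 'Authorization']
```

Keeping attributes in a dictionary mirrors the x[attribute name] notation and allows new attribute types to be attached without changing the class.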
To show how the preliminary estimation is applied, a software product named "Realtime Business Process Monitoring for Estimation" (RTBPM-E) [34,35] is used here and below. In Table 1 and in Figure 3, an EIBS is provided for RTBPM-E.

Table 1. EIBS for RTBPM-E (ID: name. Description).
Visualization of a process model. Graph-based visualization of a process model on a web page.
IE-4: Alerting on project estimation process. Alerts highlighting issues or risks during estimation of a particular project. Alerts are visualized on the graphic user interface and sent via email.
IE-5: Data processing pipeline. Integration with the data sources and implementation of a data processing pipeline [34].
IE-6: Security and user management. Typical security-related functionality: (a) authentication, (b) authorization, (c) user management, etc.
IE-7: Administration and configuration. Admin dashboard and configuration of the system.
IE-8: Project infrastructure. Setting up project infrastructure including structuring of the source code and continuous integration.

An important part of EIBS creation is assigning attributes to items. Such an attribute is a piece of information associated with an EI. In Table 2, attribute types are listed. They, in the authors' opinion, correspond to the most frequently analyzed aspects of the project scope.
Considering the limit on the preparation time and the high level of uncertainty, as well as not-so-strict accuracy requirements, an EIBS created during the preliminary estimation stage is not expected to be highly detailed. Even identification of the first-level items might be enough to provide a preliminary estimate.

Sizing of Estimable Items
As already mentioned in Section 2, the authors' estimation framework uses NDE as a measure of development efforts. NDE is a time-based unit expressing the amount of work in man-hours, man-days, etc. Such time-based measuring ensures seamless translation of the estimated efforts into the project schedule. However, it is worth highlighting that the NDE itself is defined in a way that makes it independent of both the project schedule and the team composition.
In essence, sizing is aimed at assigning an NDE to each EI. For the preliminary estimation, a two-step sizing approach is proposed: (a) assign each LEI such attributes as an estimable item point (EIP) and an estimable item uncertainty (EIU); (b) then, obtain optimistic and pessimistic NDEs from the corresponding EIP and EIU.
Let x ∈ X be an LEI belonging to EIBS X. The estimable item point (EIP) of x, x[EIP], is a positive number representing the relative measure of development efforts required to implement x. In turn, the estimable item uncertainty (EIU), x[EIU], is a non-negative dimensionless number expressing how much is unknown with regard to x; in other words, the bigger the x[EIU], the less definitive the x[NDE] and vice versa. It is worth emphasizing that EIPs and EIUs are supposed to be designated for LEIs only, not for CEIs (items that have child items).
EIPs and EIUs are based on experts' judgment. In Figure 4, an example is shown of EIP and EIU estimation based on the idea of affinity grouping [5]: the LEIs are placed on a coordinate plane where the horizontal axis corresponds to the size (EIP), and the vertical axis defines the level of uncertainty (EIU). One of the strengths of the described approach is its visual representation of the project scope on a two-dimensional plane, allowing a relative comparison of EIs' sizes and uncertainties.
After estimating EIPs and EIUs, it is necessary to transform them into NDEs. Define the relationship between NDE and EIP as follows:
$x[\mathrm{NDE}] = p \cdot x[\mathrm{EIP}]$ (5)
where p > 0 is the NDE corresponding to one EIP. In turn, optimistic and pessimistic NDEs are related to the EIU as follows:
$x[\mathrm{NDE}_{opt}] = (1 - u_1 x[\mathrm{EIU}]) \, x[\mathrm{NDE}], \quad x[\mathrm{NDE}_{psm}] = (1 + u_2 x[\mathrm{EIU}]) \, x[\mathrm{NDE}]$ (6)
where x[NDE_opt] and x[NDE_psm] are the optimistic and pessimistic NDEs, respectively; 0 ≤ u_1 < 1 and u_2 ≥ 0 are the coefficients defining deviation of the optimistic and pessimistic estimates from the basic NDE. Values of parameters p, u_1, u_2 can be either based on experts' judgment or defined statistically from past projects. In Table 3, an example of applying the above approach to RTBPM-E is presented; for calculations, the following values of the parameters were chosen by the authors: p = 168 man-hours per EIP, u_1 = 0.1, u_2 = 0.2. The optimistic and pessimistic estimates form the range from −20.3% to +40.7% (relative to the basic NDE), which, from the authors' perspective, is acceptable for the preliminary estimation.
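The two-step sizing can be sketched in code. The sketch relies on two assumptions that should be checked against the original formulas: the basic NDE is p times the EIP, and the optimistic and pessimistic NDEs deviate from the basic NDE linearly in the EIU (by factors 1 − u_1·EIU and 1 + u_2·EIU). The function name and the sample (EIP, EIU) pairs are hypothetical; the default parameter values are those quoted above.

```python
def size_item(eip: float, eiu: float, p: float = 168.0,
              u1: float = 0.1, u2: float = 0.2) -> tuple[float, float, float]:
    """Return (optimistic, basic, pessimistic) NDE in man-hours for one LEI.

    Assumed model: basic = p * EIP,
    optimistic = basic * (1 - u1 * EIU), pessimistic = basic * (1 + u2 * EIU).
    """
    if eip <= 0 or eiu < 0:
        raise ValueError("EIP must be positive and EIU non-negative")
    if u1 * eiu >= 1:
        raise ValueError("u1 * EIU must stay below 1 to keep the estimate positive")
    basic = p * eip
    return basic * (1 - u1 * eiu), basic, basic * (1 + u2 * eiu)


# Hypothetical leaf items as (EIP, EIU) pairs placed by experts.
items = [(1, 1), (2, 2), (3, 3)]
sized = [size_item(eip, eiu) for eip, eiu in items]
total_opt, total_basic, total_psm = (sum(column) for column in zip(*sized))

print(round(total_opt, 1), round(total_basic, 1), round(total_psm, 1))
print(f"{(total_opt / total_basic - 1):+.1%} .. {(total_psm / total_basic - 1):+.1%}")
```

Because the deviation grows with the EIU, large and uncertain items dominate the width of the resulting range.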

Project Team Composition
Along with the NDE discussed in the previous section, project team composition is one of the key ingredients of an estimate. As can be seen in the sections below, varying the project team composition allows one to obtain estimates with different project durations and costs.
Project team composition means a set of project team member roles, T, and the attributes associated with each role (e.g., a full-time equivalent, FTE). A project team includes roles in two main categories: development, T_D ⊆ T (software engineers), and nondevelopment, T_ND ⊆ T (e.g., project managers, test engineers, etc.). The main difference between these categories is that the efforts spent by team members in development roles are estimated in the NDE, while the efforts of the nondevelopment roles are not included in the NDE.
In order to match FTEs of development roles with the NDE, let us extend the estimation framework with a new term, the normalized development full-time equivalent (ND-FTE), ψ:
$\psi = \rho \varphi$ (7)
where ρ is the productivity coefficient (PC) defined in [9]; ϕ is the corresponding FTE. Using the introduced term, a team composition with nondifferentiated specializations applicable to RTBPM-E is provided in Table 4. In most cases, the simplest type of development team composition, with nondifferentiated specializations, fulfills the preliminary estimation needs. However, in situations where highlighting development specializations is quite important, development teams with differentiated or even mixed specializations can also be applicable at the preliminary estimation stage. Further information about the team composition types is in Section 5.
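A team composition with nondifferentiated specializations can be sketched as a list of development roles with their ND-FTEs. The sketch assumes that the ND-FTE of a role is its FTE scaled by the productivity coefficient (ψ = ρϕ with ρ = αη); the role names and all numeric values are hypothetical and do not reproduce Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DevelopmentRole:
    name: str
    alpha: float  # CLPC: competency-level productivity coefficient
    eta: float    # IPC: involvement productivity coefficient
    fte: float    # phi: full-time equivalent allocated to the role

    @property
    def nd_fte(self) -> float:
        """psi = rho * phi, with rho = alpha * eta (assumed form)."""
        return self.alpha * self.eta * self.fte


# Hypothetical development team.
team = [
    DevelopmentRole("Senior software engineer", alpha=1.5, eta=1.0, fte=1.0),
    DevelopmentRole("Middle software engineer", alpha=1.0, eta=1.0, fte=2.0),
    DevelopmentRole("Junior software engineer", alpha=0.5, eta=1.0, fte=1.0),
]

total_nd_fte = sum(role.nd_fte for role in team)
print(total_nd_fte)  # 4.0
```

In this example, four FTEs happen to be equivalent to four "average" developers because the senior and junior coefficients offset each other.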

System of Working-Time Balance Equations
The idea of a system of working-time balance equations was introduced in the authors' past works [9,10]. Its purpose is to define the relationships between the key estimate ingredients such as the structure of software developer working time, project team composition, project duration, and NDE.
For the preliminary estimation, a simplified version of the system of working-time balance equations is used. To achieve the simplification, we assume the following:
1. The project timeline is not split into sprints or phases.
2. The project team does not change throughout the whole project.
3. There is no differentiation of the development specializations.
4. There is a linear relationship between the project working time, W, and the development working time, D (in contrast to (1), where that relationship is based on the structure of software developer working time):
$W = \xi D$ (8)
where ξ > 1 is the development working time coefficient (DWTC) (again, for simplicity, it is assumed that ξ does not depend on project role m ∈ T_D). Further information about the DWTC is provided in Section 5.
Therefore, the system of working-time balance equations for the preliminary estimation is given by (9), where T is the set of project roles;

Estimation of Normalized Development Capacity
One of the key characteristics of a project team is the normalized development capacity (NDC), which is the maximum possible NDWT that can be spent on the project scope implementation (Equation (4) in Section 2). For the preliminary estimation, the NDC is given by expression (10), where $U_m$ is a set of all possible values of the NDWT. Taking into account (2), the NDC is then expressed by (11), where $D_m$ is a set of all possible values of the DWT for development role $m \in T_D$. Expressions (12) and (13) then follow from (9). In practice, (12) and (13) can be used to solve an inverse problem: finding the amount of project scope that a particular project team is capable of implementing if the project duration $L$ is known.
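The inverse problem mentioned above can be sketched in a few lines. The sketch rests on an assumption that is not confirmed by the text available here: that the NDC takes the form $C = (L / \xi) \sum_{m \in T_D} \rho_m \phi_m$, with the duration and the NDE expressed in compatible units (for example, months and normalized person-months). The function name and the numbers are hypothetical.

```python
def normalized_development_capacity(duration: float, dwtc: float,
                                    nd_fte_total: float) -> float:
    """Assumed form of the NDC: C = L / xi * sum(rho_m * phi_m).

    duration     -- project duration L (e.g., months)
    dwtc         -- development working time coefficient xi (> 1)
    nd_fte_total -- total ND-FTE of the development team
    """
    if dwtc <= 1:
        raise ValueError("DWTC is expected to be greater than 1")
    if duration < 0 or nd_fte_total < 0:
        raise ValueError("duration and ND-FTE must be non-negative")
    return duration * nd_fte_total / dwtc


# Inverse problem: how much scope fits into a known duration?
# Illustrative inputs: L = 8 months, xi = 1.6, ND-FTE = 5.5.
capacity = normalized_development_capacity(8.0, 1.6, 5.5)
```

Under these illustrative inputs, the team could absorb a scope of up to 27.5 normalized units; any project with an NDE above that value would not fit into eight months with this team.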

Estimation of Project Duration
In the case of the preliminary estimation, the main criterion for estimating the project duration is that the project team must have enough NDC, $C$, to implement the project scope, measured as NDE, $E$: $C \ge E$ (14). Therefore, the project duration, $L$, is estimated with inequality (15). An example of estimating the optimistic and pessimistic durations for RTBPM-E is presented in Table 5: to implement the project scope from Table 3 utilizing the project team defined in Table 4, it will take from $L_{opt} = 6.5$ to $L_{psm} = 11.5$ months. Due to the high level of uncertainty, the estimated duration range is wide, which is expected for the preliminary estimation. The duration estimation based on (15) does not guarantee high accuracy; instead, it allows a rough evaluation of the project duration using relatively simple calculations. Narrowing down the estimate range will be undertaken in the next estimation stages.
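A minimal sketch of the duration estimate is given below. It assumes that the capacity criterion (NDC not less than NDE) resolves to $L \ge \xi E / \sum_{m \in T_D} \rho_m \phi_m$ and that the result is rounded up to a planning step of half a month. Both the formula shape and the rounding step are assumptions, and the NDE values are hypothetical rather than those of Tables 3-5.

```python
import math


def min_duration(nde: float, dwtc: float, nd_fte_total: float,
                 step: float = 0.5) -> float:
    """Smallest duration (in multiples of `step`) with NDC >= NDE.

    Assumes C = L / xi * ND-FTE, hence L >= xi * E / ND-FTE.
    """
    if nde < 0 or dwtc <= 1 or nd_fte_total <= 0 or step <= 0:
        raise ValueError("invalid estimation inputs")
    exact = dwtc * nde / nd_fte_total
    # Small tolerance guards against floating-point overshoot before rounding up.
    return math.ceil(exact / step - 1e-9) * step


# Hypothetical optimistic and pessimistic NDEs for the same team.
l_optimistic = min_duration(nde=20.0, dwtc=1.6, nd_fte_total=5.5)
l_pessimistic = min_duration(nde=36.0, dwtc=1.6, nd_fte_total=5.5)
```

Running the same function on the optimistic and the pessimistic NDE yields the two ends of the duration range, which mirrors how a range is reported in the preliminary estimate.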

Optimization of Project Duration and Team Composition
As can be seen above, the preliminary estimation operates with three main ingredients: project scope, team composition, and project duration. Given that the project scope is fixed (i.e., the NDE does not vary), the other two components are interdependent: changes in the team composition imply different project durations and, vice versa, different project durations require different team compositions. Manual selection of the best combination of these two ingredients requires time spent on calculations. To make this more efficient, a multiobjective optimization with objective (16) is proposed, where $T$ is the set of project team roles; $m \in T$ is a particular role belonging to the team; $r_m \in [0, 1]$ is the normalized hourly rate of role $m \in T$; $W_m$ is the PWT of role $m \in T$; $\phi_m$ is the FTE of role $m \in T$; $L$ is the project duration; $C$ is the NDC of the development team $T_D \subset T$; $E$ is the NDE; $I$ is the development team idle time. It is worth noting that applying normalization to the cost-related variables brings the following benefits: (a) avoiding disclosure of commercially sensitive information; (b) avoiding excessively large values of the objective function; (c) currency-independent calculations with subsequent conversion of the normalized costs to a required currency. One of the main constraints to be satisfied is (14): the chosen team composition and project duration have to allow the implementation of the project scope estimated as $E$. It is also worth applying restrictions (20) on the project duration. The other group of constraints, (21), applies to the project role FTEs, where $T^* \subseteq T$ is a subteam of project team $T$.
For example, a team has to include at least one middle software engineer: $1 \le \sum_{m \in T_{mid}} \phi_m$ (where $T_{mid} \subseteq T$ is a subset of middle software engineers); or, the size of the whole team, $T$, must not exceed 25 FTEs: $\sum_{m \in T} \phi_m \le 25$. The interrelation of FTEs for different project roles is expressed with constraints of type (22), where $T_1 \subseteq T$ and $T_2 \subseteq T$ are subteams of $T$, and $\gamma_1 > 0$ and $\gamma_2 > 0$ are constants. For example, one project manager cannot lead more than 15 team members. Let us substitute the real decision variables $\phi_m$ with the corresponding integer variables $f_m$: $\phi_m = \mu_m f_m$ (23), where $f_m \in \mathbb{N}_0 = \{0, 1, 2, \ldots\}$ is a whole number of minimum FTEs for role $m \in T$; $\mu_m > 0$ is the minimal possible step of FTE change for role $m \in T$. Also, let us vary the project duration within range (20), which gives the values $L_k$, $k = 1, \ldots, p$ (24). Therefore, for the values $L_k$, $k = 1, \ldots, p$, a sequence of integer programming problems with objective (16), constraints (14), (21), (22), and decision variables (23) is obtained. Solving these optimization tasks produces a sequence of $p$ alternatives. Then, the alternatives are ranked using the analytic hierarchy process (AHP) [36].
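The idea of solving one integer problem per candidate duration can be illustrated without a solver. The sketch below is not the authors' Pyomo/Xpress model: it swaps in exhaustive enumeration over integer FTE steps, which is only practical for tiny instances. The objective (normalized cost, with idle capacity as a tie-breaker), the capacity formula $C = (L / \xi) \sum \rho_m \phi_m$, the roles, the rates, and all limits are assumptions chosen for illustration.

```python
from itertools import product

# name: (normalized rate r_m, productivity rho_m or None for a
#        non-development role, FTE step mu_m, maximum number of steps)
ROLES = {
    "pm":     (0.9, None, 0.5, 2),
    "junior": (0.4, 0.5, 1.0, 4),
    "middle": (0.7, 1.0, 1.0, 4),
    "senior": (1.0, 1.5, 1.0, 3),
}
DWTC = 1.6              # hypothetical xi
NDE = 20.0              # hypothetical project scope E
MAX_TEAM_FTE = 25.0     # whole team must not exceed 25 FTEs
MAX_DEVS_PER_PM = 15.0  # one PM leads at most 15 developers


def best_team(duration):
    """Return ((cost, idle), fte_by_role) for the cheapest feasible team, or None."""
    names = list(ROLES)
    best = None
    for steps in product(*(range(ROLES[n][3] + 1) for n in names)):
        fte = {n: f * ROLES[n][2] for n, f in zip(names, steps)}  # phi = mu * f
        dev = {n: v for n, v in fte.items() if ROLES[n][1] is not None}
        capacity = duration / DWTC * sum(ROLES[n][1] * v for n, v in dev.items())
        feasible = (
            capacity >= NDE                                     # enough NDC
            and fte["middle"] >= 1                              # at least one middle
            and sum(dev.values()) <= MAX_DEVS_PER_PM * fte["pm"]
            and sum(fte.values()) <= MAX_TEAM_FTE
        )
        if not feasible:
            continue
        cost = duration * sum(ROLES[n][0] * v for n, v in fte.items())
        key = (cost, capacity - NDE)  # cost first, idle capacity second
        if best is None or key < best[0]:
            best = (key, fte)
    return best


# One alternative per candidate duration: 4.0, 4.5, ..., 16.0 months.
durations = [4.0 + 0.5 * k for k in range(25)]
alternatives = {L: best_team(L) for L in durations}
```

Each entry of `alternatives` plays the role of one alternative that would subsequently be ranked. For realistic team sizes, a proper integer programming solver is the appropriate tool, since the enumeration grows exponentially with the number of roles.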
In application to the RTBPM-E example, let us solve a sequence of integer programming problems, varying the project duration from $[L]_{min} = 4$ months to $[L]_{max} = 16$ months with a step of 0.5 month for both the optimistic and pessimistic estimates. Then, the obtained alternatives are ranked with AHP according to the criteria from Table 6. The top alternatives are provided in Tables 7 and 8 for the optimistic and pessimistic estimates, respectively. As a result, for the optimistic estimate, alternative 1 is chosen (as recommended by the AHP ranking). However, alternative 2 is selected for the pessimistic estimate (in this regard, it is worth emphasizing that the AHP-based alternative ranking is just a decision support tool, while the final conclusion is made by the experts). Table 9 provides the project team composition corresponding to the chosen alternatives. The calculations in the current section were performed with a Python script using the following libraries: (a) Pyomo v.6.5.0 (https://www.pyomo.org/, accessed on 11 May 2023) as an optimization model builder; (b) FICO Xpress v.9.1.0 under the community license (https://www.fico.com/, accessed on 11 May 2023) as an optimization task solver; (c) ahpy v.2.0.0 (https://github.com/PhilipGriffith/AHPy, accessed on 11 May 2023) for the AHP-based ranking of the alternatives.
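The AHP-based ranking step can be sketched in plain Python. This is not the ahpy API and not the authors' script: priorities are approximated with the row geometric mean method instead of the principal eigenvector, the three criteria are hypothetical stand-ins for those of Table 6, and the alternatives' values are invented for illustration.

```python
import math


def ahp_priorities(matrix):
    """Approximate AHP priorities via normalized row geometric means."""
    geo_means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def ratio_matrix(values):
    """Pairwise matrix for a lower-is-better criterion: a_ij = x_j / x_i."""
    return [[vj / vi for vj in values] for vi in values]


# Hypothetical criteria and expert pairwise judgments (Saaty scale).
criteria = ["cost", "duration", "idle_time"]
criteria_matrix = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights = ahp_priorities(criteria_matrix)

# Hypothetical alternatives: (normalized cost, duration in months, idle time).
alternatives = {
    "A": (30.0, 8.0, 1.5),
    "B": (33.0, 6.5, 0.5),
    "C": (28.0, 10.0, 3.0),
}

names = list(alternatives)
scores = dict.fromkeys(names, 0.0)
for k, weight in enumerate(weights):
    local = ahp_priorities(ratio_matrix([alternatives[n][k] for n in names]))
    for name, priority in zip(names, local):
        scores[name] += weight * priority

ranking = sorted(names, key=scores.get, reverse=True)
```

The resulting `ranking` is only a recommendation; as noted above, the experts remain free to select a lower-ranked alternative.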
A comparison of the manual (Tables 4 and 5) and optimized (Tables 7-9) estimates is given in Tables 10 and 11. On the one hand, the optimized estimate slightly increases the ND-FTE; on the other hand, it outperforms the manual estimate by reducing the project duration and cost. The proposed decision support tool set reduces the experts' time spent on deciding on the team composition and the project duration. However, it requires a specific software implementation and calibration of the parameters.

Project Scope Decomposition
Building an EIBS might be one of the most challenging and time-consuming parts of a preliminary estimation. This is mostly caused by high uncertainty and limited time. Despite this, it is necessary to break down the project scope at a high level without diving into details but, at the same time, covering all of the functionality of the future project.
Working with an EIBS includes three main steps: (a) decomposition; (b) analysis; (c) sizing. Each of them involves processing a large amount of information under time restrictions. In this regard, the following mistakes are possible: (a) missing significant parts of the project functionality; (b) blowing up the scope by including unnecessary features; (c) incorrect structuring of the project scope, which can be difficult to elaborate on during the next estimation stages.
The following recommendations can help with the challenges mentioned above:
1. Use a project scope visualization that makes the perception and analysis of large amounts of information easier.
2. Cover the unknown with assumptions and risks.
3. Add placeholder EIs to cover the unknown parts of the functionality.
4. Use EIBS templates based on historical estimates and projects.
Methods of building an EIBS still require further research and testing in practice.

Development Specializations
Depending on how specializations are distinguished, a development team belongs to one of the following three types. The simplest is a team with nondifferentiated specializations. It is worth noting that this does not mean that the developers on such a team do not have specializations during the project implementation phase; instead, it means that at certain estimation stages, those specializations are simply not defined. For the preliminary estimation, this kind of team is used for simplicity. In the case of a development team with differentiated specializations, each development role belongs to one of the specializations. Developers belonging to this type of team can perform tasks of a single specialization only (not two or more). The third type is a development team with mixed specializations, where a particular software engineer can perform tasks belonging to several specializations. Such a situation takes place, for example, when a so-called full-stack developer performs both front-end and back-end tasks. In the case of nondifferentiated specializations, NDE and NDC are single numbers, while for the other two types they are defined as vectors, each component of which corresponds to a particular specialization [9,10].

Development Working Time Coefficient
The current section aims at disclosing the nature of the development working time coefficient (DWTC).
For intermediate estimation [9], the relationship between PWT and DWT reflects the structure of developer working time and is given by (25), where $W_D$ is the development project working time; $T_D$ is the set of development project roles; $K$ is the set of project phases; $Q$ is the set of development specializations; and the remaining terms denote, respectively, the project working time (PWT), the development working time (DWT), the supplementary development activities, the general project activities, the leave time, and the idle time for development role $m \in T_D$ in project phase $i \in K$ (further information about these variables is provided in [9]).
The DWTC from (8) is introduced in order to decrease complexity by reducing the number of variables in the system of working-time balance equations. Therefore, for the preliminary estimation, the development project working time, $W_D$, is expressed as follows: $W_D = \xi \sum_{m \in T_D} D_m$, where $\xi$ is the development working time coefficient (DWTC); $D_m$ is the development working time (DWT) of role $m \in T_D$. Following the concept of the ELC, a preliminary estimate is supposed to be "wider" than the corresponding intermediate estimate (given that the NDE is the same for both estimates); on the other hand, $W_D$ must not be so large that it gives a wrong impression of the project duration and budget. This is expressed by (27), where $q > 1$ is the coefficient defining the upper boundary of the PWT, $W_D$, obtained as the outcome of preliminary estimation, relative to the corresponding PWT resulting from the intermediate estimation.
Summarizing the above, the DWTC depends on what is defined by (25): (a) the structure of software developer working time; (b) the project release plan (splitting into project phases); (c) the development team composition (roles, development specializations, involvement in different project phases). The coefficient's boundaries are limited according to (27).
To define the DWTC, it is suggested that one rely on the data from historical projects implemented in a particular company as well as on experts' judgment. For practical usage, it is recommended that a decision tree be created that allows one to obtain a value of the DWTC based on the specifics of a particular project.
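One way to turn historical data into admissible DWTC values can be sketched as follows. The sketch assumes that the boundary condition has the shape $\tilde{W}_D \le W_D \le q \tilde{W}_D$, where $\tilde{W}_D$ (a notation introduced only for this example) is the PWT of the intermediate estimate, and that $W_D = \xi \sum_m D_m$. Intersecting the per-project intervals is an illustrative design choice, and the historical figures are hypothetical.

```python
def dwtc_bounds(intermediate_pwt: float, total_dwt: float, q: float):
    """Interval of xi for one project under the assumed bounds.

    W_int <= xi * sum(D_m) <= q * W_int  =>  W_int/sum(D) <= xi <= q*W_int/sum(D)
    """
    if q <= 1 or intermediate_pwt <= 0 or total_dwt <= 0:
        raise ValueError("invalid calibration inputs")
    lower = intermediate_pwt / total_dwt
    return lower, q * lower


def calibrate_dwtc(history, q: float):
    """Intersect the xi intervals of several historical projects."""
    bounds = [dwtc_bounds(w, d, q) for w, d in history]
    low = max(b[0] for b in bounds)
    high = min(b[1] for b in bounds)
    if low > high:
        raise ValueError("no single DWTC satisfies all historical projects")
    return low, high


# Hypothetical history: (intermediate PWT, total DWT) per past project.
history = [(160.0, 100.0), (150.0, 100.0), (170.0, 100.0)]
xi_low, xi_high = calibrate_dwtc(history, q=1.3)
```

When the intersection is empty, the projects are too dissimilar to share one coefficient, which is the kind of situation where a decision tree keyed on project specifics would be more appropriate than a single value.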

Alternative Approach to Define Development Working Time Coefficient
In the current section, an alternative way to simplify the system of working-time balance equations is presented. Let us keep Assumptions 1-3 from Section 4.5, replacing the fourth assumption with a linear relationship between the PWT of the development team, $W_D = \sum_{m \in T_D} W_m$, and the NDE, $E$: $W_D = \kappa E$ (28), where $\kappa > 1$ is an alternatively defined DWTC (the original definition is introduced in (8)). This implies the system of working-time balance equations (29), the expression for the NDC (30), and the estimated project duration (31). As can be noticed, (30) and (31) are quite similar to (13) and (15), respectively. The only difference is that (30) and (31) rely on the development FTE, $\sum_{m \in T_D} \phi_m$, while (13) and (15) are based on the ND-FTE (which, in turn, takes into account the productivity coefficients $\rho_m$ of each development role $m \in T_D$). From this standpoint, (28) results in even simpler estimation formulas in comparison to the ones following from (8).
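The difference between the two simplifications can be made concrete with a small comparison. The sketch assumes that the $\kappa$-based duration bound has the shape $L \ge \kappa E / \sum_{m \in T_D} \phi_m$ and the $\xi$-based one the shape $L \ge \xi E / \sum_{m \in T_D} \rho_m \phi_m$; both shapes and all numbers are assumptions made for illustration.

```python
def duration_with_kappa(nde: float, kappa: float, dev_fte_total: float) -> float:
    """Assumed alternative bound: L >= kappa * E / sum(phi_m)."""
    if kappa <= 1 or dev_fte_total <= 0:
        raise ValueError("invalid inputs")
    return kappa * nde / dev_fte_total


def duration_with_xi(nde: float, xi: float, nd_fte_total: float) -> float:
    """Assumed original bound: L >= xi * E / sum(rho_m * phi_m)."""
    if xi <= 1 or nd_fte_total <= 0:
        raise ValueError("invalid inputs")
    return xi * nde / nd_fte_total


# Hypothetical team: 6 development FTEs that amount to 5.5 ND-FTE.
l_kappa = duration_with_kappa(nde=20.0, kappa=1.8, dev_fte_total=6.0)
l_xi = duration_with_xi(nde=20.0, xi=1.6, nd_fte_total=5.5)
```

The $\kappa$-based variant needs only head counts, so differences in seniority must be absorbed into $\kappa$ itself, whereas the $\xi$-based variant keeps them explicit through the productivity coefficients.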
To choose between the two approaches for a particular case, it is recommended that one take into account such factors as the necessity of considering the PCs and the availability of historical data to define the DWTC (in particular, (8) requires more granular historical project data than (28)).

Threats to Validity
The conditions under which usage of the proposed preliminary estimation is not recommended are discussed in the current section.
The preliminary estimation is supposed to be applicable in the early project stages, when the level of "unknown" is quite high. However, its usage is not recommended when the level of uncertainty is so high that it is not possible to define the project scope (i.e., build an EIBS) even at a high level without going into details. In such a case, an introductory estimation based on project similarity might be more suitable.
In situations where it is difficult to define the DWTC, usage of preliminary estimation is not desirable. This might happen, for example, in the case of completely new implementation technologies or project teams whose performance is difficult to forecast. As a solution, intermediate estimation [9,10] can be applied. Intermediate estimation does not use the DWTC; instead, it requires more detailed analysis and operates with more parameters.
Another case when preliminary estimation is not applicable is the situation when the composition of a project team significantly changes along the project timeline. Intermediate estimation [9,10] is more suitable under this condition, since team composition changes are built into this method.
As can be seen from (18) and (19), the preliminary estimation defines the project duration depending on the NDE of the project scope and the NDC of the project team. Obviously, in situations where there are strict dependencies between project tasks or dependencies on other projects, the project duration formulas of both the preliminary and the intermediate estimation are not going to work. Using approaches based on the critical path method (CPM) and the project evaluation and review technique (PERT) [3,4,37,38] might be a solution in such a case.
Although the ELC (Figure 1) defines preliminary estimation as one of the successive steps that are complementary to the SDLC stages, under the circumstances described above, it is suggested that one skip the preliminary estimation, replacing it with other methods.

Conclusions
The preliminary estimation proposed in the current study is an integral part of the authors' estimation framework [9,10]. According to the concept of the estimation life cycle (ELC), the preliminary estimation can be either the first stage or a logical continuation of the predecessor stage, the introductory estimation. In the second case, apart from an approximation of a timeline and budget, the introductory stage passes along the problem statement understanding, information about a business domain, and identified similar projects or products. Then, at the subsequent stage, the intermediate estimation, the inputs from the previous step are made more definitive by (a) reducing the number of project scenarios (ideally, to one main scenario); (b) detailing an EIBS; (c) adding specializations to a development team; and (d) splitting a project timeline into phases.
The key advantage of the proposed preliminary estimation is that it belongs to the estimation framework covering the whole SDLC. Except for COCOMO II [2], none of the existing methods is aimed at supporting the entire software project life cycle. Despite its high level of maturity, the main obstacle to using COCOMO II is its complexity; preliminary estimation clearly has an advantage in this aspect. Use-case point analysis [17,27-30], which is also applicable in the early project stages, is as easy to use as the preliminary estimation; however, in the authors' opinion, the formulas aimed at calculating use-case points and translating them to time-based units require a certain rethinking. An adjusted version of use-case point analysis might be a useful extension of the preliminary estimation sizing approach. The agile estimation methods [5-8] are supposed to be applied at the implementation phase by an agile team when the project scope is decomposed to the user story level. Furthermore, those methods suffer from the lack of a tool set to translate story points to time-based units. Therefore, agile estimation methods (as they are originally defined) are hardly applicable at the early project stages. Similarity-based estimation [20-23] can be either used in the introductory estimation stage or applied in conjunction with the preliminary estimation in order to verify the results.
One of the most time-consuming and challenging parts of preliminary estimation is the project scope decomposition, i.e., building an EIBS. Usually, for this purpose, an expert (a business analyst or a domain expert) has to process a large amount of information within a limited time frame. Some ideas on how to overcome such difficulties are given in Section 5. However, this matter is a separate topic that still requires further research.
The development working time coefficient (DWTC) plays an important role in preliminary estimation. Despite the considerations in Section 5, the recommendations on how to choose a DWTC value for a particular project are still not definitive enough. Past project estimates are supposed to be used to define the coefficient's values. If such historical data are insufficient, DWTC values can instead be based on experts' judgment. To fully cover DWTC-related questions, further research is needed.
From the practical implementation standpoint, preliminary estimation is a semistructured business process that, on the one hand, states what has to be done, and, on the other hand, intentionally leaves certain space for the involved experts' creativity.
The proposed method of preliminary estimation is simple enough to be easily implemented with a spreadsheet application such as MS Excel or Google Sheets. However, to ensure the use of the decision support tool set and to take full advantage of the estimation life cycle (ELC) concept, the method's integration into a full-scale estimation information technology is envisioned [34], including other estimation methods with the ability to accumulate historical data (which, in particular, will be beneficial for the use of AI-based techniques). Apart from estimation methods, the envisioned information technology will include a process-mining module aimed at monitoring and improving the estimation processes [35].

Figure 1. Concept of estimation life cycle.

Figure 2. Preliminary estimation as a semistructured business process.

Figure 4. Defining EIPs and EIUs for the RTBPM-E estimable items.
$T_D \subseteq T$ is the subset of development roles; $W_m$ is the project working time (PWT) of role $m \in T$; $D_m$ is the development working time (DWT) of role $m \in T_D$; $\xi > 1$ is the development working time coefficient (DWTC); $\phi_m$ is the full-time equivalent (FTE) of role $m \in T$; $L$ is the duration of the project; $E$ is the normalized development estimate (NDE) of the entire project; $\rho_m$ is the productivity coefficient (PC) of development role $m \in T_D$.

Table 1. Table-based representation of EIBS for RTBPM-E.

| Identifier | Title | Description |
|---|---|---|
| RTBPM-E | Real-time business process monitoring for estimation | The whole scope of the RTBPM-E project. |
| EI-1 | Table visualization of estimation process logs | Table-based visual representation of logs collected from performed estimation processes, including standard table features such as filtering and grouping. |
| EI-2 | Etalon model of the estimation process | Read-only visualization of the etalon model based on the 4-stage estimation process: (a) introductory estimate, (b) preliminary estimate, (c) intermediate estimate, and (d) precise estimate. |

Table 2. Most common types of estimable item attributes.

| Attribute | Abbreviation | Description |
|---|---|---|
| Nonfunctional requirements | NFR | Nonfunctional requirements associated with an EI. |
| Constraints | CON | Constraints associated with an EI. |
| Architecture considerations | ARCH | Architecture ideas, patterns, tactics. |
| Out of scope | OOS | Explicitly stated what is out of the project scope. |
| Normalized development estimate | NDE | NDE associated with an EI. |
| Questions | Q | Questions associated with an EI. |

Table 3. NDEs of the RTBPM-E estimable items.

Table 5. Duration estimation for the RTBPM-E project.
* The DWTC value is based on the authors' expert judgment.

Table 7. Top alternatives for the RTBPM-E optimistic estimate.

Table 8. Top alternatives for the RTBPM-E pessimistic estimate.

Table 10. Comparison of the manual and optimized optimistic estimates for RTBPM-E.

Table 11. Comparison of the manual and optimized pessimistic estimates for RTBPM-E.