Efficacy of the Localized Aviation MOS Program in Ceiling Flight Category Forecasts

(1) Background: Flying in instrument meteorological conditions (IMC) carries an elevated risk of fatal outcome for general aviation (GA) pilots. For the typical GA flight, aerodrome-specific forecasts (Terminal Aerodrome Forecast (TAF), Localized Aviation Model Output Statistics Program (LAMP)) assist the airman in pre-determining whether a flight can be safely undertaken. While LAMP forecasts are more prevalent at GA-frequented aerodromes, the Federal Aviation Administration (FAA) recommends that this tool be used as supplementary to the TAF only. Herein, the predictive accuracy of LAMP for ceiling flight categories of visual flight rules (VFR) and instrument flight rules (IFR) was determined. (2) Methods: LAMP accuracy was evaluated for the period of July–December 2018 using aviation-specific probability of detection (PODA), false alarm ratio (FARA) and critical success scores (CSSA). Statistical differences were determined using Chi-Square tests. (3) Results: LAMP forecasts (n = 823) across 39 states were accrued. LAMP PODA for VFR (0.67) and IFR (0.78) exceeded (p < 0.031) the corresponding TAF scores (0.57 and 0.56). For VFR, the LAMP showed a non-significant (p = 0.243) higher FARA (0.25) than the TAF (0.19). For IFR forecasts, the LAMP FARA was lower (p < 0.001) (0.48 and 0.81, respectively). LAMP CSSA scores exceeded the TAF for VFR (p = 0.012) and IFR forecasts (p < 0.001). (4) Conclusion: These findings support the greater integration of LAMP into pre-flight weather briefings.


Introduction
General aviation, composed mainly of light (<12,500 lbs.), single engine, piston-powered airplanes [1], is defined as all civil aviation excepting revenue-generating passenger and freight transport such as air carriers. Unfortunately, general aviation shows an inferior safety record when compared to commercial operations, and accounted for 97% of all civil aviation accidents between 2010 and 2015 [2].
One of the greatest challenges to general aviation safety is the operation of aircraft in instrument meteorological conditions (IMC), i.e., in the absence of external visual cues, such as in clouds, and in particular where ceilings are low. Under such conditions an airman must be able to fly the aircraft solely by reference to instruments. While general aviation pilots who are instrument flight rule (IFR)-certificated are trained to this effect, the majority (72%) of general aviation airmen do not carry this rating [3]. Pilots in the latter group who encounter IMC are prone to spatial disorientation, often leading to a loss of control [4]. Such mishaps carry a 2- to 9-fold higher risk of a fatal outcome compared with accidents unrelated to weather [5][6][7]. Put another way, while only 9% of general aviation accidents occur in IMC, they account for 25% of fatalities [8]. Accordingly, these airmen are prohibited from operating under such conditions. Furthermore, flight operations restricted to visual flight rules (VFR), defined [9] in part by a ceiling of greater than 3000 ft. above ground level (AGL), are recommended. This altitude is partly determined by man-made structures, some of which extend higher than 2000 ft. AGL. It should be emphasized that general aviation airmen who are IFR-certificated are not immune to the hazards of IMC. Deficient proficiency in flying by reference to instruments represents the major cause of fatal accidents in this challenging environment [10]. To operate safely, IFR-rated pilots should eschew low IFR (LIFR) weather (ceiling <500 ft. AGL) [9].
To determine whether a flight can be safely completed with respect to the aforementioned weather and IFR-certification or lack thereof, airmen are mandated to undertake a pre-flight weather briefing for any flight away from the vicinity of the airport [11]. To this end, a variety of aviation-specific weather tools informing current and forecast conditions are available, i.e., surface analyses and synopses, Meteorological Terminal Aviation Routine weather reports (METARs) and two aerodrome-specific forecast tools: Terminal Aerodrome Forecasts (TAFs) [12] and the Localized Aviation Model Output Statistics Program (LAMP) [13,14]. TAFs (covering an area within 5 statute miles of the airport) are generated by National Weather Service (NWS) Weather Forecast Office meteorologists every six hours [12] whereas LAMPs, issued hourly, are entirely automated [15]. Considering that the typical general aviation leisure flight is <100 nautical miles in distance, aerodrome-specific forecasts are of particular utility especially when such forecasts along the route of the flight [16], in addition to the departure and destination aerodromes, are included in the pre-flight weather briefing.
Unfortunately, of the approximately 5100 civilian aerodromes (also referred to herein as stations) in the USA [17], TAFs are issued for only approximately 750 [12], with a bias towards larger airports used by commercial services and less so by general aviation [17]. In contrast, the LAMP, a relatively new gridded forecasting tool only integrated into the NWS in 2010 [18], represents a potential alternative to the TAF by nature of its wider prevalence at those airports more frequented by general aviation. As of 2018, 1853 airports are served by the LAMP [19]. The LAMP represents an automated (human input-independent) forecast blended from the current observation, output from three advective models and the Global Forecast System Model Output Statistics (GFS MOS) [15]. However, the Federal Aviation Administration (FAA) presently cautions airmen [20] to employ LAMP forecasts only as supplementary to the TAF (presumably issued from neighboring airports) rather than as a stand-alone alternative. The reasons for this might be related to the sparsity of studies addressing the accuracy of this forecast tool and the absence of human oversight in the forecast process.
Considering the greater number of general aviation-frequented aerodromes served by LAMP forecasts (relative to TAFs), the objective herein was to determine the predictive accuracy of this forecast tool for the VFR (>3000 ft.) and IFR (500-1000 ft.) ceiling flight categories.

Results
The overarching goal herein was to determine if LAMP forecasts, which are more common for aerodromes frequented by general aviation, pose an effective alternative to the TAF for ceiling flight category forecasts. A total of 823 LAMP forecasts across 39 states was used over the five-month (July-December 2018) study period with an average of five events per day. Of these, 317 and 506 were generated for the warm (July-September) and cool (October-December) periods, respectively.

VFR Forecast Accuracy by the LAMP
The pre-flight weather briefing is crucial for the VFR-only pilot, indicating whether a flight can be undertaken safely, i.e., with ceilings >3000 ft. AGL [9], as >90% of accidents involving IMC encounters have a fatal outcome [21]. Consequently, we first determined the accuracy of the LAMP in forecasting VFR conditions over the entire study duration. For comparative purposes, the accuracy of the LAMP in predicting VFR was compared to that based on persistence and the meteorologist-generated TAF.
LAMP forecasts for VFR conditions showed an aviation-specific probability of detection (PODA) of 0.67, a value substantially higher than the 0.43 based on persistence, as seen in Figure 1. In comparing the LAMP with the corresponding TAF, the PODA for VFR forecasts was higher for the former when compared to the latter (0.67 and 0.57, respectively), a difference which was statistically significant (p = 0.031). A corollary is that the "miss fraction", where ceilings were verified as 3000 ft. AGL or lower, for the LAMP forecast was lower (p = 0.031) than the corresponding TAF. The latter observation is important from a safety perspective for the non-IFR rated airman who may rapidly experience spatial disorientation upon an IMC encounter, often with a fatal consequence [21].
Figure 1. PODA/miss fraction for the VFR flight category (ceiling >3000 feet (ft.) AGL). The Localized Aviation MOS Program (LAMP) forecast was that generated at the hour of the Terminal Aerodrome Forecast (TAF) issue. Persistence data are based on Automated Surface Observation System (ASOS)-derived ceiling data at the time of the TAF issue. A "Miss" refers to a forecast in which verified ceilings were 3000 ft. or lower. n, count of events. Statistical differences in proportions were tested using a Pearson Chi-Square (2-sided) test of a 2 × 2 contingency table.

Accuracy of LAMP Forecasts for IFR
Although IFR-rated general aviation pilots are trained to fly solely by reference to the instruments, maintaining this skillset has always posed a challenge [22]. This lack of proficiency has been identified as a major cause of accidents involving IFR-rated general aviation airmen operating in IMC [10]. Consequently, forecasts discriminating IFR (ceiling 500-1000 ft. AGL) and the more challenging LIFR (ceiling <500 ft. AGL) flight categories are of importance for the IFR-rated pilot's decision-making as to whether or not an IMC flight should be undertaken.
Accordingly, the ability of the LAMP forecast to accurately segregate these two flight categories was determined. Since, from a safety-perspective, an event in which a verified ceiling was higher-than-forecast (yielding a marginal VFR (MVFR) or VFR flight category) presents no safety hazard, such forecast flight categories were aggregated with those which accurately predicted IFR and were recorded as "hits." Over the five-month study period, of 427 forecasts for IFR, the LAMP identified an IFR (or MVFR/VFR) flight category with a PODA of 0.78, as seen in Figure 2. While this accuracy was not statistically higher (p = 0.106) than that for persistence (0.71), it was superior (p < 0.001) to that of the TAF (PODA = 0.56). Again, as a corollary, the LAMP showed disproportionately fewer (22% and 44%, respectively) "misses" (i.e., verified LIFR where IFR or better were forecast) compared with the TAF (p < 0.001).

Figure 2. PODA/miss fraction for the IFR flight category (ceiling 500-1000 ft. AGL). The "Miss Fraction" group refers to an IFR flight category forecast which was validated as Low IFR (LIFR) (ceilings <500 ft. AGL). The LAMP forecast was generated at the hour of the TAF issue. Persistence data are based on ASOS-derived ceiling data at the time of TAF issue. n, event count. Statistical differences in proportions were tested as per Figure 1.

False Alarm Rates and Critical Success Scores for LAMP Forecasts
Whilst the aforementioned analytical method used to evaluate LAMP accuracy is pertinent to real-world operational decision-making by general aviation airmen, it suffers from one shortcoming. Specifically, it excludes events (VFR and IFR in the current study) which were not forecast but did occur (false alarms). A tool which excessively forecasts worse-than-actual conditions may undermine the credibility of the device and ultimately lead to pilots disregarding such forecasts (in common vernacular: "crying wolf" too often). To address this shortcoming, we used both aviation-specific false alarm rates (FARA) and aviation-specific critical success scores (CSSA, also known as threat scores) to evaluate the LAMP for forecasts of VFR and IFR conditions over the five-month data collection period.
The LAMP showed a modestly higher FARA (0.25) than persistence (0.19) and the TAF (0.19) for VFR, as seen in Figure 3, although this difference was not statistically significant (p = 0.243). However, for IFR forecasts, as seen in Figure 3, the LAMP FARA (0.48) was statistically lower (p < 0.001) than that of the TAF (0.81) and persistence (0.82). As for CSSA, the LAMP exceeded the TAF for both VFR (p = 0.012) and IFR forecasts (p < 0.001), as seen in Figures 4A and 4B, respectively. Collectively, these data suggest that, in comparison with these two other forecast models, the LAMP does not excessively predict worse-than-actual conditions.


LAMP Forecasts for Warm and Cool Periods
Due to seasonal differences between warm and cool periods (e.g., ceilings related to convective vs. non-convective clouds), we considered the possibility that more robust LAMP forecasting accuracy in one period could mask an inefficacy for the other period. To address this possibility, LAMP forecasts for VFR and IFR were segregated into warm (July-September) and cool (October-December) periods [23].
For both time periods, the PODA for VFR forecasts by LAMP, as seen in Figures 5 and 6, was statistically higher than that for persistence (p = 0.006 and <0.001 for warm and cool periods, respectively). However, while the LAMP also showed superiority (p = 0.043) over the TAF in the PODA for VFR forecasts (0.68 and 0.55, respectively) for the warm period, as seen in Figure 5, these two tools were comparable (p = 0.323) for the cool period shown in Figure 6 (0.65 and 0.58, respectively).
Figure 5. PODA for the VFR flight category (ceiling >3000 ft. AGL) is shown for data collected over the warm period (July-September 2018). The LAMP forecast was that generated at the hour of the TAF issue. Persistence data are per the ASOS-derived ceiling data at the time of the TAF issue. n, event count. Statistical differences used a Pearson Chi-Square (2-sided) test per Figure 1.

Figure 6. Probability of detection/miss fraction for the VFR flight category (ceiling >3000 ft. AGL) is shown for data collected over the cool period (October-December 2018). The procedure is as described in Figure 5.
A similar analysis was performed for LAMP forecasts of IFR conditions in the warm and cool months. For the warm period, as seen in Figure 7, LAMP forecasts for the IFR flight category showed a high PODA score (0.84), although this was not superior (p = 0.548) to that based on persistence alone (0.80). However, the PODA score (0.84) for LAMP forecasts was statistically higher (p < 0.001) than the corresponding TAF value (0.59). These findings were paralleled for the cool months, as shown in Figure 8. The LAMP was superior to the TAF in forecasting the IFR flight category (PODA scores of 0.75 and 0.55, respectively), a difference which was strongly statistically significant (p < 0.001). While higher than that for persistence (0.66), the difference between this tool and the LAMP forecast was not statistically significant (p = 0.122).
Atmosphere 2019, 10, x FOR PEER REVIEW 7 of 12
Figure 7. Forecast accuracy for the IFR flight category (ceiling 500-1000 ft. AGL) is shown as "PODA" using data from the warm (July-September 2018) months. The "Miss Fraction" group describes an IFR flight category forecast which was validated as LIFR (ceilings <500 ft. AGL). The LAMP forecast was generated at the hour of the TAF issue. Persistence data are based on ASOS-derived ceiling data at the time of TAF issue. n, event count. Statistical differences in proportions were tested as described in Figure 1.
Figure 8. Forecast accuracy for the IFR flight category (ceiling 500-1000 ft. AGL) is shown as "PODA" using data collected over the cool (October-December 2018) months. Details are as per Figure 7.
Taken together, these data suggest that the overall efficacy of the LAMP in forecasting VFR and IFR conditions is, at least, comparable for both warm and cool periods.

Discussion
The current study demonstrates the LAMP to be at least comparable (and in some instances superior) to the TAF in forecasting accuracy for VFR and IFR conditions. These findings are of immense operational importance regarding pre-flight decision-making by the VFR-only and IFR-rated general aviation airman as to whether an operation should, or should not, be undertaken. Moreover, these findings advocate the use of this forecast tool in the pre-flight weather briefing as a stand-alone tool. This is especially the case for aerodromes for which a TAF is not issued. LAMP data could possibly be integrated into the graphical area forecast (which represents a pictorial rather than textual description of the weather) as well. Currently, the FAA only recommends [20] the use of the LAMP as supplementary to TAF data.
Whilst an earlier study [23] researched LAMP forecast accuracy, it differed from ours in several respects. First and foremost, a single flight category, IFR (the ceiling employed was below 1000 ft.), was investigated. This had two consequences; specifically, the operational needs of the (i) VFR-only pilot who should restrict flights to ceilings >3000 ft. and (ii) the IFR-rated pilot, often deficient in instrument skills [9,10] and who should avoid LIFR (<500 ft.) operations [22], were not addressed. Additionally, the prior research [23] was undertaken before re-development of the ceiling and sky cover algorithm in 2012 [13]. Finally, by including data from all US stations [23], including those in areas with low temporal (seasonal to diurnal) variability, verification data may have led to positive bias for the earlier study.
From an operational perspective for a general aviation pilot, the LAMP has an additional advantage over the TAF in the hourly issuance of the former compared with every six hours for the latter. While the current study synchronized the analyses of these two forecast tools, in "real-world" operations the general aviation pilot is more likely to undertake a weather briefing several hours after the TAF issue. In contrast, the LAMP forecast, which is updated hourly, is more likely to take into account the most recently verified data as this model is partly based on current conditions [15]. That said, one disadvantage of the LAMP forecast was a statistically non-significant trend towards a higher false alarm rate for VFR operations (MVFR or lower forecast but where the VFR flight category prevailed). Such false calls have the potential to undermine the credibility of this tool, leading some VFR-only pilots to disregard forecasts of marginal weather conditions.
The finding that the LAMP was at least comparable to the TAF forecasts regarding the PODA for VFR and IFR ceiling-based flight categories was somewhat surprising since the former is entirely automated whilst the latter is generated by a trained meteorologist who may draw on several sources of weather data (including the LAMP) as well as experience. It should be noted, however, that it is at the discretion of each NWS Forecast Office meteorologist as to whether LAMP data are employed to generate the TAF (personal communication with Lance Wood, NWS Forecast Office). Since the LAMP forecast tool is relatively new [18] and validation studies sparse, it may be that NWS Weather Forecast Office meteorologists have been reluctant to make use of this tool. If so, such reservations combined with the fact that the geographical area covered by each NWS Weather Forecast Office is extensive (122 NWS offices cover the entire USA [12]) may offset any advantages (e.g., experience) of human input.
The superiority of automation over the human interface is not at all new in aviation. For example, fly-by-wire aircraft, in which computer algorithms guard against the aircraft departing its flight envelope, have evolved and become well accepted in commercial aviation [24].
The current study was not without limitations. First, while the standard definition of flight category is based on both ceiling and visibility [9], only the former was used in the current study. Nevertheless, weather-related general aviation accidents with a fatal outcome are often due to spatial disorientation following an inadvertent encounter with low ceilings [10,25,26]. A second limitation was that, in focusing the study on geographical areas likely to experience marginal weather conditions on a given day based largely on synoptically-driven features (LCL heights and/or frontal regions), aerodromes affected by their own micro-climates may have escaped evaluation. Nevertheless, our strategy was warranted to avoid a positive bias associated with stations located in areas of low temporal (seasonal or diurnal) weather variability.
The current study argues for a greater integration of LAMP forecasts into the pre-flight weather briefing than currently advocated by the FAA [20], which suggests that such data be used as supplementary to the TAFs only. Nevertheless, considering the imprecise nature of weather forecasting and that the LAMP and TAFs are for geographically discrete areas (i.e., an aerodrome), airmen should always avail themselves of all data applicable to a planned flight, in particular the recent graphical area forecast, even for the short distance (<100 nautical miles) operations common to general aviation.

Selection of Aerodromes
Over a five-month period (12 July-17 December 2018), geographical areas of the contiguous USA favoring low cloud ceilings were identified daily using two approaches: (i) the Skew-T-derived lifting condensation level (LCL) [27] from 0000Z radiosonde launches and (ii) consultation of the surface prognosis chart [28] valid at 1800Z. The latter, issued by the NOAA Aviation Weather Center at 0935Z, provides a forecast of surface pressure systems and fronts for a two-day period [12]. Contoured areas corresponding to the 500-m LCL height (Skew-T) and low-pressure frontal systems (surface prognosis charts) guided the selection of areas for LAMP evaluation. Aerodromes located in the aforementioned regions were then manually selected for that day (one per state) with the caveat that only aerodromes for which both TAFs and LAMPs were issued were used.

Comparison of Forecast Tools
For a comparison of the LAMP forecast with others, a block of time was chosen daily for each station based on the corresponding TAF [12,28]. This time block constituted a time frame in the TAF validity period starting at any time element (irrespective of the type, e.g., From (FM), Temporary (TEMPO) group) and ending at the subsequent time element [12]. No priority was accorded to any specific time element. The initial time element selected was <4 h into the valid TAF period; this strategy reflected the common general aviation flight scenario in which a pre-flight weather briefing is undertaken 0-4 h prior to departure with a flight duration thereafter of 1-2 h. For comparative purposes and since LAMPs and TAFs are issued hourly and every 6 h, respectively, the LAMP forecast [29] assessed was that generated at the hour concurrent with the TAF issue time. Weather persistence was used as a baseline and the flight category was based on the ceiling as of the time of the TAF issue.
Weather forecasts and observations were collected once daily over five months to include warm (July-September 2018) and cool (October-December 2018) periods [23]. The selection of stations for cool months was restricted to those employed in the warm month analysis.

Flight Categories and Forecast Tool Accuracy
Ceiling height (where ceiling is defined as either broken or overcast) was used to define flight categories: VFR >3000 ft.; marginal VFR (MVFR) >1000-3000 ft.; IFR 500-1000 ft.; low IFR (LIFR) <500 ft. as described elsewhere [9]. All altitudes are AGL. For each forecast tool, the flight category assigned each day was based on the lowest ceiling over the aforementioned block time described at the beginning of the "Comparison of Forecast Tools" section. Flight categories for LAMP [29], TAF and persistence forecasts [28] were validated using Automated Surface Observation System (ASOS) data [30].
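The category thresholds above can be sketched in code. The following is a minimal illustration only, not the procedure used to process the study data: the function names, the (cover, height) layer representation and the treatment of a report with no broken or overcast layer as VFR are assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple

# Sky-cover codes counted as a ceiling, per the definition above (broken or overcast).
CEILING_COVERS = {"BKN", "OVC"}

Layer = Tuple[str, int]  # (sky cover code, layer height in ft AGL); hypothetical layout


def lowest_ceiling(layers: Iterable[Layer]) -> Optional[int]:
    """Return the lowest broken/overcast layer height, or None if there is no ceiling."""
    heights = [height for cover, height in layers if cover in CEILING_COVERS]
    return min(heights) if heights else None


def flight_category(ceiling_ft: Optional[int]) -> str:
    """Map a ceiling (ft AGL) to a flight category using the thresholds in the text."""
    if ceiling_ft is None or ceiling_ft > 3000:  # assumption: no ceiling counts as VFR
        return "VFR"
    if ceiling_ft > 1000:   # MVFR: >1000 to 3000 ft
        return "MVFR"
    if ceiling_ft >= 500:   # IFR: 500 to 1000 ft
        return "IFR"
    return "LIFR"           # LIFR: <500 ft


def block_category(reports: Iterable[List[Layer]]) -> str:
    """Category for a time block, based on the lowest ceiling across all reports in it."""
    ceilings = [c for c in (lowest_ceiling(r) for r in reports) if c is not None]
    return flight_category(min(ceilings) if ceilings else None)
```

For instance, `block_category([[("SCT", 800), ("BKN", 2500)], [("OVC", 900)]])` evaluates to `"IFR"`, since the scattered layer at 800 ft. is not a ceiling and the lowest ceiling in the block is 900 ft.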
To determine the efficacy of the forecast tools, two 2 × 2 contingency tables, as seen in Table 1, were created for use with modified (as described below) probability of detection, false alarm ratio and critical success scores; one for a VFR forecast and the other for an IFR forecast.
For the VFR flight category, a forecast was considered a "hit" when VFR conditions were forecast and VFR conditions were observed, shown in Table 1 cell (a), while a forecast was considered a "miss", as seen in Table 1 cell (b), when VFR conditions were forecast and conditions other than VFR (MVFR/IFR/LIFR) were observed. If other than VFR conditions were forecast, and VFR was observed, as seen in Table 1 cell (c), this was considered a "false alarm." For the case of IFR conditions, a forecast was considered a "hit" if IFR conditions were forecast and IFR or better (VFR/MVFR) conditions were observed, as seen in Table 1 cell (a). A "miss" was recorded if LIFR conditions were observed, as shown in Table 1 cell (b). If LIFR conditions were forecast and IFR or better (VFR/MVFR) was observed, this was considered a "false alarm", as seen in Table 1 cell (c). The rationale for this approach is that an LIFR situation puts the general aviation instrument-rated pilot at risk while VFR/MVFR does not.

Table 1. A 2 × 2 contingency table for the determination of probability of detection, false alarm ratio and critical success scores. VFR conditions: "hit" if VFR forecast and observed (a), "miss" if VFR forecast and other than VFR (MVFR/IFR/LIFR) observed (b), "false alarm" if other than VFR forecast but VFR observed (c). IFR conditions: "hit" if IFR forecast and IFR or better (VFR/MVFR) observed (a), "miss" if LIFR observed (b), "false alarm" if LIFR forecast and IFR or better (VFR/MVFR) observed (c).

                    Observed
                    Yes    No
Forecast    Yes     a      b
            No      c      d

Using these definitions, the forecast parameters for the aviation-specific POD, FAR, and CSS, hereby denoted as PODA, FARA, and CSSA (respectively), are defined below.
While this approach does not address the meteorological fidelity of the forecast, operationally it serves to distinguish between a safe and a potentially hazardous flight for the general aviation pilot in real-world operations.
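As an illustration of how the cells of Table 1 feed these scores, the sketch below assigns forecast/observed pairs to cells and computes the three parameters. It assumes the conventional forms of the scores applied to the cells as labeled in Table 1, i.e., PODA = a/(a + b), FARA = c/(a + c) and CSSA = a/(a + b + c); these forms, the function names and the handling of forecasts outside the IFR/LIFR categories in the IFR table are assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

IFR_OR_BETTER = {"IFR", "MVFR", "VFR"}


def vfr_cell(forecast: str, observed: str) -> str:
    """Table 1 cell for the VFR table: a = hit, b = miss, c = false alarm."""
    if forecast == "VFR":
        return "a" if observed == "VFR" else "b"
    return "c" if observed == "VFR" else "d"


def ifr_cell(forecast: str, observed: str) -> Optional[str]:
    """Table 1 cell for the IFR table; observed IFR or better counts as safe."""
    if forecast == "IFR":
        return "a" if observed in IFR_OR_BETTER else "b"
    if forecast == "LIFR":
        return "c" if observed in IFR_OR_BETTER else "d"
    return None  # assumption: MVFR/VFR forecasts are not part of the IFR table


def tally(pairs: Iterable[Tuple[str, str]],
          cell_fn: Callable[[str, str], Optional[str]]) -> Counter:
    """Count (forecast, observed) pairs per contingency-table cell."""
    cells = (cell_fn(forecast, observed) for forecast, observed in pairs)
    return Counter(cell for cell in cells if cell is not None)


def aviation_scores(a: int, b: int, c: int) -> Dict[str, float]:
    """Assumed forms: POD_A = a/(a+b), FAR_A = c/(a+c), CSS_A = a/(a+b+c)."""
    nan = float("nan")
    return {
        "POD_A": a / (a + b) if (a + b) else nan,
        "FAR_A": c / (a + c) if (a + c) else nan,
        "CSS_A": a / (a + b + c) if (a + b + c) else nan,
    }
```

Under these assumed forms, the miss fraction reported alongside the PODA is simply b/(a + b) = 1 − PODA, which is consistent with the paired values reported in the Results (e.g., a PODA of 0.78 alongside 22% misses).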

Statistical Analysis
Proportion testing in conjunction with a Pearson Chi-Square (two-sided) test was used to determine where there were statistical differences [31]. The contribution of individual cells in proportion tests was determined using standardized residuals (Z-scores) in post-hoc testing. All statistical analyses were performed using SPSS (v24) software.
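For readers without access to SPSS, the test can be sketched in a few lines. The example below is an illustration under stated assumptions rather than a reproduction of the SPSS procedure: it computes the uncorrected Pearson Chi-Square statistic for a 2 × 2 table (e.g., rows for two forecast tools, columns for hits and misses), the two-sided p-value for one degree of freedom, and Pearson residuals, (observed − expected)/√expected, as one common form of standardized residual. The function name and the table layout are hypothetical.

```python
import math
from typing import List, Tuple


def pearson_chi_square_2x2(
    table: List[List[float]],
) -> Tuple[float, float, List[List[float]]]:
    """Uncorrected Pearson chi-square for a 2 x 2 table of counts.

    Returns (chi2, p_value, residuals). No continuity correction is applied,
    and all row and column totals are assumed to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    # Expected counts under independence of rows and columns.
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]

    chi2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Pearson residual per cell: (observed - expected) / sqrt(expected).
    residuals = [
        [(table[i][j] - expected[i][j]) / math.sqrt(expected[i][j]) for j in range(2)]
        for i in range(2)
    ]
    return chi2, p_value, residuals
```

A call such as `pearson_chi_square_2x2([[hits_lamp, misses_lamp], [hits_taf, misses_taf]])`, with the counts taken from the corresponding contingency tables, would return the statistic, the p-value and the per-cell residuals for that comparison.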