What is the definition of a good forecast? Is a forecast that predicts temperatures within three degrees of reality during benign weather better than a forecast with an error of five degrees during rapidly changing weather? Murphy (1993) provided insight into this question by focusing on the consistency, quality, and value of a forecast. He conveyed that quality refers to the correspondence between forecasts and observations. One such relationship, known as relative correspondence, has a broad definition but generally implies that some type of benchmark is used to "grade" a forecast according to performance measures.
Grading can be based on the difficulty of the forecast situation. Can a specific forecast parameter such as temperature be rated or assigned a level of difficulty beyond the casual words "tough" or "easy"?
This paper addresses forecast difficulty and quantifies it by developing an index to rate the quality and difficulty of a temperature forecast. Unlike most other methods of temperature verification, this index is constructed from spatial and temporal temperature trends without relying on external measures such as model data or climatology. The index can then be used as an objective benchmark for measuring the relative difficulty of a forecast.
The High Plains of the United States often provides significant forecasting challenges. These difficulties include large diurnal fluctuations in temperature, occasional significant temperature gradients over short distances, and large temperature variations from day to day. The National Weather Service Office (WSO) in Goodland is located in the heart of this often challenging area.
Parallel challenges extend to the realm of verification. The development of a forecast difficulty index began in June 1997 as an effort to provide additional insight for the WSO Goodland local verification program. Experiments using a wide array of formulas and parameters were conducted in search of an index of temperature difficulty. These experiments led to the conclusion that a measure of temperature forecasting difficulty should include both temporal and spatial variations. For the purpose of this formula, temporal variation is defined as a three-day temperature range for either highs or lows at a given point for a given period. An additional refinement was made in January 1999, when the simple temperature range (R_1) was replaced by the total day-to-day temperature change (R). (Table 1 gives an example of this computation.) This modification allowed the index to differentiate between periods of slowly changing temperatures and periods of abruptly changing temperatures that nevertheless span the same overall range. Spatial variation is defined as the temperature difference over a given area at a given time (D).
When temperatures do not deviate much from day to day or from point to point, they are "easy" to forecast. A period in which either or both of these variations is large substantially increases the difficulty of the forecast. These variations are computed over a running three-day period centered on the forecast (or current) day. For example, in Table 1 the total day-to-day change (R) equals 17°F. Observed low temperatures were recorded in like manner.
Table 1. Example computation of the three-day range (R_1) and the total day-to-day change (R) for observed highs.

DAY   OBSERVED HIGH   FORECAST HIGH   3-DAY RANGE (R_1)   TOTAL CHANGE (R)
 1         45
 2         57               54               12                  17
 3         52
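As a sketch, the two range measures above can be computed as follows (the helper names are illustrative, not from the original verification program):

```python
def three_day_range(temps):
    """Simple three-day range R_1: maximum minus minimum of the three values."""
    return max(temps) - min(temps)

def total_change(temps):
    """Total day-to-day change R: sum of absolute successive differences."""
    return sum(abs(b - a) for a, b in zip(temps, temps[1:]))

highs = [45, 57, 52]           # observed highs for days 1-3 (Table 1)
print(three_day_range(highs))  # 12
print(total_change(highs))     # 17
```

The second measure distinguishes a slow 12-degree drift from a 17-degree round trip over the same three days, which is the refinement described above.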
The last term used in the index is the difference (D) between the highest observed maximum (minimum) and the lowest observed maximum (minimum) across the forecast area for a given day. Table 2 shows an example of how the parameters were calculated. Sample values are given for a three-day period, with the parameters evaluated for the middle day (Day 2) across three cities: Goodland (GLD), Hill City (HLC), and McCook (MCK).
Table 2. Sample parameter calculations for a three-day period. Temperatures are in °F; forecast errors are in parentheses.

                    DAY 1                DAY 2                DAY 3
PARAMETER      GLD   HLC   MCK     GLD    HLC    MCK     GLD   HLC   MCK
Observed        45    48    47      54     52     54      55    57    62
Forecast                           51(3)  50(2)  50(4)
MOS Forecast                       48(6)  49(3)  51(3)
R                                   10      9     15
D                   3                      2                   7
Fcst Median                              50.00
Fcst Average                             50.33
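The D and R values in the table above can be reproduced with a short sketch (the data layout is only one possible choice):

```python
# Observed highs by day for GLD, HLC, MCK (Table 2)
observed = {
    1: [45, 48, 47],
    2: [54, 52, 54],
    3: [55, 57, 62],
}

# D: spread of observed temperatures across the forecast area for each day
D = {day: max(t) - min(t) for day, t in observed.items()}
print(D)  # {1: 3, 2: 2, 3: 7}

# R per site: total day-to-day change over the three days
by_site = list(zip(*(observed[d] for d in (1, 2, 3))))
R = [sum(abs(b - a) for a, b in zip(s, s[1:])) for s in by_site]
print(R)  # [10, 9, 15] for GLD, HLC, MCK
```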
Almost two years of forecasts (over 30,000 individual temperature site forecasts) from the Goodland Weather Office were analyzed and verified against observations. The index was then applied to all forecasts using the above technique. Table 3 displays an example of the calculations for a forecast area with three sites and how the total difference is computed (in this case for high temperatures on Day 2):
Table 3. Example of computing R and D for a three-site forecast area (high temperatures, Day 2).

SITE           GLD   HLC   MCK    D
HIGH: DAY 1     45    44    50
HIGH: DAY 2     52    47    55    8
HIGH: DAY 3     48    49    45
RANGE (R)       11     5    15
Table 4 shows the monthly correlations computed over 1998 using the parameters D and R. When an entire year of data was analyzed, a simple average of R and D (NDX) most closely mirrored the observed mean absolute forecast error (MAE) among the tested forms of the formula: when the index was high (low), the MAE was high (low). The correlations ranged from +0.71 to +0.92, as shown in Table 4.

[Table 4: 1998 monthly correlations of NDX with MAE for highs and lows, forecast periods 1 through 4.]
Although this formula measures relative forecast difficulty, forecaster performance must also be considered in order to use the index for verification. That is, the index must be calibrated to the MAE of a given reference period so that comparisons can be made between the index and individual forecasts. This indexing period should span at least one year to obtain valid constants. Once the index is calibrated to the MAE of the reference period, improvement over the index is measured against that calibration period, and a station can determine whether improvement was made in any subsequent period.
Calibration begins by evaluating R with D set to zero over an entire year (or more). Computing R for the whole year yields the benchmark coefficient R_C, which calibrates the period of evaluation. For the Goodland forecast office, for example, the average R summed over the six sites was 77.30°F for the year, or 12.9°F per site per forecast period. The average MAE for the six sites over the four forecast periods was 3.79°F (90.96°F summed over the 24 site-periods).
R_C = OBS ERROR (MAE) / OBS RANGE (R)
    = 3.79 / 12.9
    = 0.293
A station may use WFO Goodland's R_C value of 1.17 before deriving its own value. Using this example, the calibrated index (CNDX) for period one would be:
CNDX = (1/2n) * (R*R_C + n*D), where n = 3 sites
     = (1/6) * [(11 + 15 + 5)(0.293) + (3)(8)]
     = (1/6) * (9.08 + 24)
     = 5.5
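A minimal sketch of this calibrated index computation, using the three-site values above (`cndx` is a hypothetical helper name):

```python
def cndx(site_ranges, D, r_c):
    """Calibrated difficulty index: (1/2n) * (sum(R) * R_C + n * D)."""
    n = len(site_ranges)
    return (sum(site_ranges) * r_c + n * D) / (2 * n)

# Three-site Goodland example: R values 11, 15, 5; D = 8; R_C = 0.293
print(round(cndx([11, 15, 5], D=8, r_c=0.293), 2))  # 5.51 (the paper rounds to 5.5)
```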
The final step in completing the index is the computation of forecast period constants. This computation of coefficients allows the index to account for increased difficulty due to increased separation between the forecast issue time and the forecast valid period. An index for each period of the forecast can then be used.
These separate coefficients for periods one through four were derived by taking the average improvement from the fourth to the first period (both highs and lows) for all forecasts during 1998:
Table 5. Forecast period constants derived from the 1998 MAE.

PERIOD                       1          2          3          4
MAE (°F)                   3.13       3.41       3.84       4.09
FORECAST PERIOD CONSTANT   C_1=.826   C_2=.900   C_3=1.01   C_4=1.08
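The published constants appear to equal each period's MAE divided by the overall annual MAE of 3.79°F quoted earlier; a sketch under that assumption:

```python
period_mae = [3.13, 3.41, 3.84, 4.09]  # MAE for periods 1-4 (Table 5)
overall_mae = 3.79                     # annual MAE from the calibration step

# Assumed relationship: constant = period MAE / overall MAE
constants = [round(m / overall_mae, 3) for m in period_mae]
print(constants)  # [0.826, 0.9, 1.013, 1.079], close to the published values
```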
The final form of the equation, the Final Calibrated Forecast Temperature Difficulty Index (FCNDX) with forecast period constants, is defined as

FCNDX = C_N * [(1/2n) * (R*R_C + n*D)]

where:
C_N   Forecast period constants (Table 5)
N     Forecast period (here N = 1 to 4)
R_C   Range constant (GLD 1998 = 1.176)
R     Six-site cumulative range (total day-to-day change in T_MAX or T_MIN)
n     Number of forecast (CCF) sites
D     Temperature difference across the forecast area
For example, consider computing the final calibrated index (FCNDX) for Period 4 at a forecast office with eight sites forecasting for five periods: first compute R and D, then calibrate by finding R_C after a year of data has been analyzed. Given R_C = 0.293; R_1 = 11; R_2 = 15; R_3 = 5; R_4 = 10; R_5 = 8; R_6 = 7; R_7 = 13; R_8 = 9; and D = 6, FCNDX for Period 4 is:
FCNDX = C_4 * [(1/2n) * (R*R_C + n*D)]
      = (1.08) * ((1/16) * [(11+15+5+10+8+7+13+9)(0.293) + (8)(6)])
      = (1.08) * ((1/16) * [(78)(0.293) + 48])
      = 4.78
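The same arithmetic can be sketched as (`fcndx` is an illustrative name):

```python
def fcndx(c_n, site_ranges, D, r_c):
    """Final calibrated index: C_N * (1/2n) * (sum(R) * R_C + n * D)."""
    n = len(site_ranges)
    return c_n * (sum(site_ranges) * r_c + n * D) / (2 * n)

# Eight-site, Period 4 example from the text (C_4 = 1.08)
ranges = [11, 15, 5, 10, 8, 7, 13, 9]
print(round(fcndx(1.08, ranges, D=6, r_c=0.293), 2))  # 4.78
```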
The index may be set up for any area that has at least three forecast points. Constants can be derived (as described in the previous sections) to determine the CNDX. Multiple benchmarks may be established as additional years of data become available. The CNDX can be used to evaluate either individual or station performance, and it provides an additional method of temperature verification using a difficulty index computed from trends in observed temperature data. The index can be a useful tool for analyzing trends in forecaster improvement as well as overall station proficiency without a dependence on model performance. Tables 7 and 8 show an example of the entire index computation:
Table 7. Forecasts, absolute errors, and MAE for four forecasters (one per period) verified against observed highs.

PERIOD          4TH     3RD     2ND     1ST     ACTUAL
FORECASTER       A       B       C       D       HIGH
SITE 1          97      100     102     104      101
SITE 2          97      102     101     105      100
SITE 3          96      100     101     104      100
SITE 4          95      102     100     103      101
SITE 1 ERROR     4       1       1       3
SITE 2 ERROR     3       2       1       4
SITE 3 ERROR     4       0       1       4
SITE 4 ERROR     6       1       1       2
AVG MAE        4.25    1.00    1.00    3.25
Table 8. Observed highs and ranges used to calibrate the index.

OBSERVED   DAY 1   DAY 2   DAY 3    R
SITE 1      100     101      97     5
SITE 2      102     100      98     4
SITE 3      102     100      97     5
SITE 4       91     101      99    12
TOTAL R                            26
In this example, the total forecaster MAE was 2.37°F over four sites and one period. Before the index can be computed, it must be calibrated. For this example only, R_C was computed from the limited data above to illustrate the procedure; at least a year of range data is needed for a valid calibration. The divisor 6.5 is the average R per site (26/4).
R_C = 2.37 / 6.5
    = 0.365
CNDX = (1/2n) * (R*R_C + n*D), with n = 4 and D = 1
     = (1/8) * ((26)(0.365) + (4)(1))
     = (1/8) * (13.5)
     = 1.68
Table 9. Forecaster performance relative to the calibrated index. (Improvement is negative when the MAE exceeds the index.)

PERIOD                      4          3          2          1
FORECASTER                  A          B          C          D
AVG MAE                   4.25       1.00       1.00       3.25
C_N                       1.08       1.01       0.900      0.826
CNDX                      1.68       1.68       1.68       1.68
INDEX FOR FORECAST        1.81       1.69       1.51       1.38
IMPROVEMENT OVER INDEX  -134.80%    40.82%     34.00%    -138.5%
Table 9 shows that forecasters B and C had the same MAE, but forecaster B had a greater improvement over the index. In the same way, forecaster A performed better relative to the index than forecaster D, even though A's MAE was over four degrees.
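The full worked example can be reproduced in a few lines; small differences from the published values come from rounding, and improvement is negative when the MAE exceeds the index. A sketch (assuming D = 1, as implied by the CNDX computation):

```python
# Calibration from Table 8 (illustration only; a valid calibration needs a year of data)
total_R, n_sites, mae_all = 26, 4, 2.37
r_c = mae_all / (total_R / n_sites)                   # 2.37 / 6.5, about 0.365
D = 1                                                 # assumed spatial difference
cndx = (total_R * r_c + n_sites * D) / (2 * n_sites)  # about 1.68

# Per-forecaster comparison (Table 9)
mae = {"A": 4.25, "B": 1.00, "C": 1.00, "D": 3.25}
constants = {"A": 1.08, "B": 1.01, "C": 0.900, "D": 0.826}

for f in "ABCD":
    index = constants[f] * cndx                        # difficulty index for this forecast
    improvement = 100 * (index - mae[f]) / index       # negative when MAE exceeds index
    print(f, round(index, 2), round(improvement, 1))
```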
Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281-293.