TEMPERATURE VERIFICATION: DEVELOPING AND APPLYING
A FORECAST DIFFICULTY INDEX

 

Michael A. Skipper
National Weather Service Office
Goodland, Kansas

 

I. INTRODUCTION

What is the definition of a good forecast? Is a forecast that predicts temperatures within three degrees of reality during benign weather better than a forecast with an error of five degrees during rapidly changing weather? Murphy (1993) provided insight into this question by focusing on the consistency, quality, and value of a forecast. He conveyed that quality refers to the correspondence between forecasts and observations. One such relationship, known as relative correspondence, has a broad definition but generally implies that some type of benchmark is used to "grade" a forecast according to performance measures.

Grading can be based on the difficulty of the forecast situation. Can a specific forecast parameter such as temperature be rated or assigned a level of difficulty beyond the casual words "tough" or "easy"?

This paper addresses forecast difficulty and quantifies it by developing an index to rate the quality and difficulty of a temperature forecast. Unlike most other methods of temperature verification, this index is constructed using spatial and temporal temperature trends without relying on external measures such as model data or climatology. The index can then be used as an objective benchmark for measuring the relative difficulty of a forecast.

II. TEMPERATURE DIFFICULTY INDEX (TDX) BACKGROUND

The High Plains of the United States often present significant forecasting challenges. These challenges include large diurnal fluctuations in temperature, occasional significant temperature gradients over short distances, and large temperature variations from day to day. The National Weather Service Office (WSO) in Goodland is located in the heart of this often challenging area.

Parallel challenges extend to the realm of verification. The development of a forecast difficulty index began in June 1997 as an effort to provide additional insight for the WSO Goodland local verification program. Experiments using a wide array of formulas and parameters were conducted in search of an index of temperature difficulty. The results of these experiments led to the conclusion that a measure of temperature forecasting difficulty should include both temporal and spatial variations. For the purpose of this formula, temporal variation is defined as a three-day temperature range for either highs or lows at a given point for a given period. An additional refinement was made in January 1999, when the temperature range (R1) was replaced by the total day-to-day temperature change (R). (Table 1 gives an example of this computation.) This modification allowed the index to differentiate between periods of slowly changing temperatures and periods of abruptly changing temperatures that happen to have the same three-day range. Spatial variation is defined as the temperature difference over a given area at a given time (D).

When temperatures do not deviate much from day to day or from point to point, they are "easy" to forecast. A period where either one or both of these variations is large substantially increases the difficulty of the forecast. Both measures are computed over a running three-day period centered on the forecast (or current) day. For example, in Table 1 the three-day range (R1) equals 12°F, while the total day-to-day change (R) equals 17°F. Observed low temperatures are treated in the same manner.

 

TABLE 1
Example of Range Computation

 

DAY   OBSERVED HIGH   FORECAST HIGH   3-DAY RANGE (R1)   DAY-TO-DAY CHANGE (R)
 1         45              --                --                   --
 2         57              54                12                   17
 3         52              --                --                   --
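As an illustration, the following short Python sketch (not part of the original verification program; the function names are chosen here for clarity) computes both temporal measures for the Table 1 highs:

def three_day_range(temps):
    """R1: the highest minus the lowest temperature over the three-day window."""
    return max(temps) - min(temps)

def total_day_to_day_change(temps):
    """R: the sum of the absolute day-to-day changes over the window."""
    return sum(abs(b - a) for a, b in zip(temps, temps[1:]))

highs = [45, 57, 52]                      # observed highs, Days 1-3 of Table 1
print(three_day_range(highs))             # 12  (R1)
print(total_day_to_day_change(highs))     # 17  (R)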

The last term used in the index is the difference (D) between the highest observed maximum (minimum) and the lowest observed maximum (minimum) across the forecast area for a given day. Table 2 shows an example of how the parameters are calculated. Sample values are given for a three-day period, and the parameters are evaluated for the middle day (Day 2) across three cities: Goodland (GLD), Hill City (HLC), and McCook (MCK).

 

TABLE 2
Example of Parameter Calculation
(Temperatures in °F; forecast error shown in parentheses)

PARAMETER         DAY 1              DAY 2                DAY 3
                GLD  HLC  MCK    GLD    HLC    MCK     GLD  HLC  MCK
R                --   --   --     10      9     15      --   --   --
D                     3                  2                   7
Observed         45   48   47     54     52     54      55   57   62
Forecast         --   --   --   51(3)  50(2)  50(4)     --   --   --
MOS Forecast     --   --   --   48(6)  49(3)  51(3)     --   --   --
Fcst Median      --   --   --          50.00            --   --   --
Fcst Average     --   --   --          50.33            --   --   --

 

III. USING THE TDX

Almost two years of forecasts (over 30,000 individual temperature site forecasts) from the Goodland Weather Office were analyzed and verified against observations. The index was then applied to all forecasts using the above technique. Table 3 displays an example of the calculations for a forecast area with three sites and shows how the per-site range (R) and the areal difference (D) are determined (in this case for high temperatures on day 2):

 

TABLE 3
Computing an example of "R" and "D" for day 2

 

SITE          GLD   HLC   MCK     D
HIGH: DAY 1    45    44    50    --
HIGH: DAY 2    52    47    55     8
HIGH: DAY 3    48    49    45    --
RANGE (R)      11     5    15    --
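The same computation can be expressed as a brief Python sketch (illustrative only; the variable names are not from the verification program), reproducing the Table 3 values:

# Observed highs for Days 1-3 at each site, as in Table 3.
highs = {"GLD": [45, 52, 48], "HLC": [44, 47, 49], "MCK": [50, 55, 45]}

# Per-site R: total day-to-day change over the three-day window.
r_by_site = {s: sum(abs(b - a) for a, b in zip(t, t[1:])) for s, t in highs.items()}

# D for day 2: warmest observed high minus coolest observed high across the area.
day2 = [t[1] for t in highs.values()]
d = max(day2) - min(day2)

print(r_by_site)                 # {'GLD': 11, 'HLC': 5, 'MCK': 15}
print(sum(r_by_site.values()))   # 31, the cumulative R used in the later CNDX example
print(d)                         # 8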

Table 4 shows the monthly correlation between the index and forecaster error, computed over 1998 using the parameters "R" and "D". When an entire year of data was analyzed, a simple average of R and D (NDX) most closely mirrored the observed mean absolute error (MAE) of the forecasts when compared with the other tested forms of the formula: when the index was high (low), the MAE was high (low). The correlation ranged from +0.71 to +0.92, as shown in Table 4.

 

TABLE 4
Correlation Between Monthly Average Error and Index 1998

 

          1ST PERIOD   2ND PERIOD   3RD PERIOD   4TH PERIOD
HIGHS       +0.84        +0.81        +0.92        +0.91
LOWS        +0.71        +0.84        +0.72        +0.81
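As a sketch of how such correlations could be computed, the following Python fragment applies a Pearson correlation to monthly index and MAE series; the twelve values shown are placeholders rather than the 1998 Goodland data:

import numpy as np

# Placeholder monthly series (NOT the 1998 Goodland data), used only to show
# the correlation computation summarized in Table 4.
monthly_ndx = [5.1, 4.8, 6.2, 5.5, 4.9, 4.2, 3.8, 4.3, 4.6, 5.3, 6.0, 6.4]
monthly_mae = [3.6, 3.3, 4.1, 3.9, 3.4, 3.2, 2.9, 3.0, 3.4, 3.6, 4.0, 4.3]

# Pearson correlation between the monthly index and the monthly mean absolute error
r = np.corrcoef(monthly_ndx, monthly_mae)[0, 1]
print(round(r, 2))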

 

IV. CALIBRATING THE INDEX

Although this formula measures relative forecast difficulty, the performance of the forecaster must also be considered in order to use the index for verification. That is, the index needs to be calibrated to the MAE of a given reference period so comparisons can be made between the index and the individual forecast.

This "indexing" period needs to be at least one year to obtain valid constants. By doing this comparison, the index is calibrated to a MAE over the given period of performance. Comparisons can then be made between the index and the MAE. In this case, improvement over the index is based on the calibration period (a station can determine if improvement is made for any given subsequent period).

R can be evaluated with D set to zero (for the entire year or more). The first step in this process is to compute R for the entire year, resulting in the benchmark coefficient RC, which calibrates the period of evaluation. For example, at the Goodland forecast office the average six-site cumulative R was 77.30°F for the year, or about 12.9°F per site per forecast period. The average MAE for the six sites over the four forecast periods was 3.79°F (90.96°F total).

 RC = OBS ERROR (MAE) / OBS RANGE (R)
    = 3.79 / 12.9
    = 0.293
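In code form, the calibration step might look like the following Python sketch (the variable names are illustrative; the values are those quoted above):

annual_avg_mae = 3.79          # °F, averaged over six sites and four forecast periods
avg_r_per_site = 77.30 / 6     # ≈ 12.9 °F per site per forecast period

rc = annual_avg_mae / avg_r_per_site
print(round(rc, 3))            # ≈ 0.294 (quoted as 0.293 in the text after rounding R to 12.9)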

A station may use WFO Goodland's RC value of 1.17 before deriving its own value. Using this example, the calibrated index (CNDX) for period one would be:

 

 CNDX = (1/2n) * (R*RC + n*D),   where n = 3 sites
      = (1/6) * [(11 + 15 + 5)(0.293) + (3)(8)]
      = (1/6) * (9 + 24)
      = 5.5
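A compact Python sketch of the CNDX computation, assuming the simple function signature shown here (not taken from the paper), reproduces the period-one example:

def cndx(site_ranges, d, rc):
    """CNDX = (1 / 2n) * (sum of site ranges * RC + n * D) for n forecast sites."""
    n = len(site_ranges)
    return (sum(site_ranges) * rc + n * d) / (2 * n)

# Period-one example from the text: three sites with R = 11, 15, 5; D = 8; RC = 0.293
print(round(cndx([11, 15, 5], d=8, rc=0.293), 1))   # 5.5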

V. FINAL FORMULA

The final step in completing the index is the computation of forecast period constants. This computation of coefficients allows the index to account for increased difficulty due to increased separation between the forecast issue time and the forecast valid period. An index for each period of the forecast can then be used.

These separate coefficients for periods one through four were derived by taking the average improvement from the fourth to the first period (both highs and lows) for all forecasts during 1998:

 

TABLE 5
GLD Forecast Period Constants Derived from Mean Absolute Error (MAE), 1998

 

PERIOD                       1          2          3          4
MAE (°F)                   3.13       3.41       3.84       4.09
FORECAST PERIOD CONSTANT  C1=.826    C2=.900    C3=1.01    C4=1.08
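The Table 5 constants are consistent with dividing each period's MAE by the overall 1998 MAE of 3.79°F; the short Python check below illustrates this (it is an inference from the published numbers, not a formula stated explicitly in the text):

period_mae = {1: 3.13, 2: 3.41, 3: 3.84, 4: 4.09}   # from Table 5
overall_mae = 3.79                                   # 1998 six-site, four-period average

constants = {p: round(m / overall_mae, 3) for p, m in period_mae.items()}
print(constants)   # {1: 0.826, 2: 0.9, 3: 1.013, 4: 1.079}, i.e. .826, .900, 1.01, 1.08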

The final form of the equation, the Final Calibrated Forecast Temperature Difficulty Index (FCNDX) with forecast period constants is defined as

 

FCNDX = CN * [(1/2n) * (R*RC + n*D)]

 

where:
  CN = forecast period constant for period N (Table 5)
  N  = forecast period number (N = 1 to 4)
  RC = range constant (GLD 1998 = 1.176)
  R  = cumulative range, the sum of the per-site day-to-day changes (for TMAX or TMIN)
  n  = number of forecast (CCF) sites
  D  = temperature difference across the forecast area
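For reference, the formula can also be written as a small Python function; the name fcndx and its argument list are illustrative rather than part of the paper:

def fcndx(period_constant, site_ranges, d, rc):
    """Final Calibrated Forecast Temperature Difficulty Index (FCNDX).

    period_constant -- CN from Table 5 (or a locally derived equivalent)
    site_ranges     -- per-site total day-to-day change R over the three-day window
    d               -- warmest-minus-coolest observed temperature across the area
    rc              -- range constant from the calibration year
    """
    n = len(site_ranges)
    return period_constant * (sum(site_ranges) * rc + n * d) / (2 * n)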

For example, consider computing the final calibrated index (FCNDX) for Period 4 at a forecast office with eight sites forecasting for five periods. First compute R and D, then calibrate by finding RC after a year of data has been analyzed. FCNDX for Period 4 would be (given RC = 0.293; R1 = 11; R2 = 15; R3 = 5; R4 = 10; R5 = 8; R6 = 7; R7 = 13; R8 = 9; D = 6):

FCNDX = C4 * [(1/2n) * (R*RC + n*D)]
      = (1.08) * (1/16) * [(11 + 15 + 5 + 10 + 8 + 7 + 13 + 9)(0.293) + (8)(6)]
      = (1.08) * (1/16) * [(78)(0.293) + 48]
      = 4.78
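Using the fcndx sketch above, the same eight-site example can be checked in two lines:

ranges = [11, 15, 5, 10, 8, 7, 13, 9]                  # R1 through R8
print(round(fcndx(1.08, ranges, d=6, rc=0.293), 2))    # 4.78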

 

VI. APPLICATIONS AND CONCLUSIONS

The index may be set up for any area that has at least three forecast points. Constants can be derived (as described in the previous sections) to determine the CNDX. Multiple benchmarks may be set up as additional years of data become available. The CNDX can be used to evaluate either individual or station performance. CNDX also provides an additional method of temperature verification using a difficulty index computed from trends in observed temperature data. The index can be a useful tool in analyzing trends of forecaster improvement as well as overall station proficiency without a dependence on model performance. Tables 7 and 8 show an example of the entire index computing process:

 

TABLE 7

 

PERIOD          4TH        3RD        2ND        1ST      ACTUAL
                FCSTR A    FCSTR B    FCSTR C    FCSTR D    HIGH
SITE 1            97        100        102        104       101
SITE 2            97        102        101        105       100
SITE 3            96        100        101        104       100
SITE 4            95        102        100        103       101
SITE 1 ERROR       4          1          1          3        --
SITE 2 ERROR       3          2          1          4        --
SITE 3 ERROR       4          0          1          4        --
SITE 4 ERROR       6          1          1          2        --
AVG MAE          4.25       1.00       1.00       3.25       --

 

TABLE 8

 

OBSERVED   DAY 1   DAY 2   DAY 3    R
SITE 1      100     101      97     5
SITE 2      102     100      98     4
SITE 3      102     100      97     5
SITE 4       91     101      99    12
TOTAL R      --      --      --    26

In this example, the overall forecaster MAE was 2.37°F, with four sites and a single verification period. Before the index can be applied, it must be calibrated. For this example only, RC was computed from the limited data above to show how the computation works; in practice, at least a year of range data is needed for a valid calibration.

RC   = 2.37 / 6.5       (6.5°F is the average R per site: 26/4)
     = 0.365

CNDX = (1/8) * [(26)(0.365) + (4)(1)]     (n = 4 sites; D = 1, the day-2 spread)
     = (1/8) * (13.5)
     = 1.68

 

TABLE 9

 

PERIOD                      4          3          2          1
FORECASTER                  A          B          C          D
AVG MAE                   4.25       1.00       1.00       3.25
CN                        1.08       1.01       0.900      0.826
CNDX                      1.68       1.68       1.68       1.68
INDEX FOR FORECAST        1.81       1.69       1.51       1.38
IMPROVEMENT OVER INDEX  -134.80%    40.82%     34.00%    -138.5%

Table 9 shows forecasters B and C had the same MAE, but forecaster B had a greater improvement over the index. In the same way, forecaster A was better relative to the index than forecaster D, even though the MAE of A was over 4 degrees.
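The entire Tables 7-9 workflow can be condensed into a short Python sketch. The improvement measure is assumed here to be (index - MAE) / index, which closely matches the Table 9 values for forecasters A and B; the remaining small differences appear to reflect rounding in the published intermediate values:

# Calibrate RC from the observed ranges, compute CNDX, scale by the period
# constants, and compare each forecaster's MAE with the resulting index.
observed = [[100, 101, 97], [102, 100, 98], [102, 100, 97], [91, 101, 99]]   # Table 8
site_r = [sum(abs(b - a) for a, b in zip(t, t[1:])) for t in observed]        # [5, 4, 5, 12]
n = len(observed)                                                             # 4 sites
d = max(t[1] for t in observed) - min(t[1] for t in observed)                 # day-2 spread = 1

rc = 2.37 / (sum(site_r) / n)                 # MAE / average R per site = 2.37 / 6.5
cndx = (sum(site_r) * rc + n * d) / (2 * n)   # ≈ 1.68

period_constants = {4: 1.08, 3: 1.01, 2: 0.900, 1: 0.826}   # CN from Table 5
mae = {4: 4.25, 3: 1.00, 2: 1.00, 1: 3.25}                  # forecasters A-D, Table 7

for period, cn in period_constants.items():
    index = cn * cndx
    improvement = (index - mae[period]) / index * 100.0
    print(f"Period {period}: index {index:.2f}, improvement over index {improvement:+.1f}%")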

 

VII. REFERENCES

Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281-293.

