September 1996


An Analysis of Sea-Level Cyclone Errors in the MRF: Forecast Approaches and Limitations

Bruce B. Smith
National Weather Service Forecast Office
Gaylord, Michigan


The skill of numerical weather prediction (NWP) has increased steadily at the National Centers for Environmental Prediction (NCEP). The useful limit of predictive skill, as judged by 500-mb anomaly correlation coefficients, has shown significant improvement during recent decades (Figure 2 of Bonner 1989).

In an effort to evaluate the performance of operational NWP models, numerous verification studies have been conducted (Silberberg and Bosart 1982, Sanders 1987, Grumm and Siebers 1990, Smith and Mullen 1993). These studies serve at least two important purposes: 1) they provide forecasters with an understanding of model tendencies, enabling them to make intelligent decisions when selecting the model or models of the day, and 2) they inform those involved with the development and maintenance of NWP models as to the nature of model errors.

In contrast to the many studies of short-range models (0 to 48 hr), relatively few NWP verification studies have focused on output provided by medium range models. In this study, the performance of the Medium Range Forecast (MRF) model will be examined. In particular, the skill of the MRF in predicting the central MSLP, location, and 1000 to 500-mb thicknesses of surface low pressure systems will be evaluated for the 72-, 96-, and 120-hr forecast projections (days 3, 4, and 5). The MRF is run operationally once a day at NCEP, and is part of the Global Data Assimilation System (GDAS).


This study includes MRF data from portions of two winter seasons: 1 November 1993 to 1 March 1994, and 1 November 1994 to 1 February 1995. All available 00-hr initialization, 72-hr, 96-hr, and 120-hr surface forecast maps were examined. The study domain covered the area from 20°N northward to 70°N, and from 160°W eastward to 40°W. This region covers the North American continent and adjacent oceans.


A cyclone was included in the study if it possessed a closed isobar on any of the following maps: the 00-hr initialization, or the 72-hr, 96-hr, or 120-hr MRF surface forecast maps. Errors in forecast central pressure, position, and 1000 to 500-mb thickness were then computed. The forecast error was defined to be the forecast quantity minus the observed one. For example, a positive (negative) central pressure error of two millibars corresponded to underdeepening (overdeepening), with the predicted pressure being two millibars higher (lower) than the observed pressure. Similarly, a positive (negative) 1000 to 500-mb thickness error corresponded to a warm (cool) bias. For all error calculations, the 00-hr initialization panel from the AVN was used as ground truth for verification.
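The sign convention above (forecast minus observed) can be illustrated with a minimal Python sketch. The function names and the three sample pressures are hypothetical and are not taken from the study's data set; the sketch only shows how a mean (signed) error and an average absolute error differ.

```python
from statistics import mean


def signed_error(forecast, observed):
    """Forecast minus observed: positive means the forecast value was too high."""
    return forecast - observed


def mean_error(forecasts, observations):
    """Mean signed error (systematic bias), e.g. MPE for central pressure."""
    return mean(signed_error(f, o) for f, o in zip(forecasts, observations))


def mean_absolute_error(forecasts, observations):
    """Average error magnitude regardless of sign, e.g. APE for central pressure."""
    return mean(abs(signed_error(f, o)) for f, o in zip(forecasts, observations))


# Hypothetical central pressures (mb) for three cyclones; illustration only.
forecast_mslp = [996.0, 1002.0, 988.0]
analyzed_mslp = [998.0, 1000.0, 992.0]

mpe = mean_error(forecast_mslp, analyzed_mslp)           # negative: overdeepening on average
ape = mean_absolute_error(forecast_mslp, analyzed_mslp)  # typical error magnitude
print(f"MPE = {mpe:+.2f} mb, APE = {ape:.2f} mb")
```

Because errors of opposite sign cancel in the mean error but not in the absolute error, a small MPE can coexist with a large APE.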



Mean pressure errors (MPE), average absolute pressure errors (APE), average absolute displacement errors (ADE), and average 1000 to 500-mb thickness errors (ATE) for the entire study domain are shown in Table 1A. This data set consisted of nearly 1000 individual cyclones. Model errors for just the Great Lakes region are shown in Table 1B. This region was defined as the area from 40°N northward to 50°N, and from 90°W eastward to 75°W. Only 50 areas of low pressure were initialized in this area by the AVN at 00 UTC during the study period.


Table 1A shows a tendency for the MRF to have overdeepened surface low pressure systems at all forecast projections (i.e., systems were forecast too strong). MPEs ranged from about 1/3 mb (-0.34 mb) at 72 hr to nearly 1 mb (-0.90 mb) at 120 hr. APEs revealed that the MRF was in error by an average magnitude of about 5.1 mb at 72 hr, 6.4 mb at 96 hr, and 7.8 mb at 120 hr. The fact that MPE magnitudes (systematic errors) were very small compared to the APEs (random errors) suggests that the MRF was not able to correctly anticipate (or time) the deepening or filling trends of many cyclones.




Table 1A

Mean Pressure Error (MPE), Average Absolute Pressure Error (APE), Average Absolute Displacement Error (ADE), and Average 1000 to 500 mb Thickness Error (ATE) for the entire study domain (about 1000 cyclones).


            72 hr    96 hr    120 hr
MPE (mb)    -0.34    -0.46    -0.90
APE (mb)     5.09     6.44     7.82
ADE (km)      506      634      785
ATE (m)     -18.5    -24.1    -16.7



Table 1B

Same as Table 1A above, except for only the Great Lakes region (about 50 cyclones).


            72 hr    96 hr    120 hr
MPE (mb)    +0.27    +0.06    -1.45
APE (mb)     5.61     6.27     7.24
ADE (km)      478      580      858
ATE (m)     -10.2     -1.3     +5.5

The ADEs shown in Table 1A are very large, and reveal some of the inherent limitations associated with medium range forecasts. These errors steadily increase from just over 500 km at 72 hr to nearly 800 km at 120 hr.


To put an ADE of nearly 800 km into operational perspective, consider an area of low pressure moving to the east at 50 km/hr. An ADE of this magnitude implies that for a low that actually verifies over Ann Arbor, Michigan, an average 120-hr forecast from 5 days earlier would have placed the low perhaps to the west over Des Moines, to the north over Kapuskasing, Ontario, to the east over New York City, or possibly to the south over Chattanooga. Obviously, a random error of this magnitude could have a tremendous impact on extended range forecast accuracy. This finding presents a dilemma for operational forecasters. While NWP continues to steadily improve, the model error statistics presented here clearly indicate that we have some distance to go. Consequently, there is a strong need for forecasters to acknowledge the limitations associated with NWP output, and to resist the temptation of putting extreme detail into longer range forecasts. The model errors shown in Table 1A simply do not justify the practice.
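The scale of this error can be checked with a short Python sketch. It rests on several assumptions that are not from the study: displacement is measured as a great-circle (haversine) distance on a spherical Earth, the city coordinates are approximate, and the timing figure treats the whole displacement error as lying along the track of a low moving at 50 km/hr.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def timing_error_hr(displacement_km, speed_km_per_hr):
    """Hours of timing error if the displacement lay entirely along the track."""
    return displacement_km / speed_km_per_hr


# Approximate coordinates (degrees north, degrees east); assumed for illustration.
ann_arbor = (42.28, -83.74)
des_moines = (41.59, -93.62)

separation_km = great_circle_km(*ann_arbor, *des_moines)
along_track_hr = timing_error_hr(785.0, 50.0)  # 120-hr ADE from Table 1A

print(f"Ann Arbor to Des Moines: about {separation_km:.0f} km")
print(f"785 km at 50 km/hr: about {along_track_hr:.1f} hr")
```

Under these assumptions, the 120-hr ADE is comparable to the Ann Arbor to Des Moines distance and corresponds to a timing error of more than half a day.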

Also shown in Table 1A are average 1000 to 500 mb thickness errors. These ATEs show a tendency for thicknesses at the center of cyclones to have been forecast too low. This cool bias was approximately 19 m at 72 hr, 24 m at 96 hr, and 17 m at 120 hr. Given that this is a cool bias, it does not appear to be associated with the tendency for the MRF to also overdeepen systems (i.e., forecasting systems too strong would have lowered the 1000 mb surface, therefore causing a greater thickness and a net warm bias). Consequently, this small cool bias could be attributed to positional errors and/or a bias in MRF forecasts of 500 mb height.
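The sign argument in the preceding paragraph can be made concrete with a minimal sketch. Thickness is the 500 mb height minus the 1000 mb height, so the thickness error is the difference of the two height errors. The conversion of roughly 8 m of 1000 mb height per millibar of sea-level pressure is an assumed rule of thumb, not a value from the study, and the function names are hypothetical.

```python
# Assumed rule of thumb: near sea level, 1 mb of pressure corresponds to
# roughly 8 m of height. This constant is an assumption, not a study value.
METERS_PER_MB = 8.0


def z1000_error_from_pressure_error(pressure_error_mb):
    """Approximate 1000 mb height error (m) implied by a central pressure error (mb)."""
    return METERS_PER_MB * pressure_error_mb


def thickness_error(z500_error_m, z1000_error_m):
    """Thickness = Z500 - Z1000, so the errors subtract in the same way."""
    return z500_error_m - z1000_error_m


# 72-hr MPE from Table 1A: the MRF overdeepened by 0.34 mb on average.
z1000_err = z1000_error_from_pressure_error(-0.34)

# With a perfect 500 mb height forecast, overdeepening alone gives a warm bias.
pressure_only_ate = thickness_error(0.0, z1000_err)

print(f"Z1000 error: {z1000_err:+.1f} m; thickness error from pressure alone: {pressure_only_ate:+.1f} m")
```

Under these assumptions the pressure bias alone would produce a warm bias of only a few meters, opposite in sign to the observed 72-hr ATE of -18.5 m, which is consistent with attributing the cool bias to other sources.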


Across the Great Lakes region, Table 1B reveals a tendency for 72-hr and 96-hr forecasts of central pressure to be slightly too high. In contrast, 120-hr forecasts tended to be too deep by about 1.5 mb (-1.45 mb). APEs were generally comparable in size to those found across the entire domain, and ranged from about 5.6 mb at 72 hr to 7.2 mb at 120 hr. Positional errors (ADEs) over the Great Lakes region ranged from 478 km at 72 hr to over 850 km at 120 hr. This 120-hr ADE was larger than that found over the entire study domain. This may be because systems over the Great Lakes region tend to be weaker, and consequently more difficult to forecast, than those in other parts of the study domain, such as the Gulf of Alaska and the western North Atlantic.


ATEs over the Great Lakes region were much smaller than those over the entire study domain. A cool bias was noted at both 72 hr and 96 hr (about -10 m and -1 m, respectively). At 120 hr, a slight warm bias was found (about +6 m). The character of MPEs and ATEs over the Great Lakes region suggests that forecast central pressure errors may be partially contributing to the 1000 to 500 mb thickness errors, especially at 120 hr.


Figures 1a, 1b, and 1c show spatial distributions of MPE for 72-hr, 96-hr, and 120-hr forecasts, respectively. MPE magnitudes clearly increase as forecast projection increases (note the local -10 mb error over northwest Canada at 120 hr). Several geographic biases appear consistently on each of these figures: the MRF tended to overdeepen systems across much of western and central Canada, the northern U.S. Rockies, and parts of the Ohio Valley and Lower Great Lakes. The tendency to not deepen low pressure systems enough was less pronounced, and was found across the Central and Southern Plains, Florida, and parts of the western North Atlantic.


Spatial distributions of ATE for 72-hr, 96-hr, and 120-hr forecasts are shown in Figures 2a, 2b, and 2c, respectively. Over much of the study domain, note the strong tendency for 1000 to 500 mb thicknesses to be forecast too cool. This bias was strongest along and to the lee of the Rockies, and over the western North Atlantic. Local error magnitudes of 100 m were noted in these areas at 96 hr and 120 hr. The only area where a warm bias consistently appeared was across eastern Canada, especially at 120 hr.




Forecasters employ numerous operational techniques when using model guidance. Some of these techniques consist of simply averaging different sources of NWP output. Using medium range forecasts as an example, this might involve averaging the MRF with the UKMET (United Kingdom Meteorological Office) and the ECMWF (European Centre for Medium-Range Weather Forecasts) models. Two other techniques sometimes used are: 1) weighted ensemble or lagged average forecasts, and 2) predictions of forecast skill based on model run-to-run consistency. In the next two sections, the utility of these forecast methods will be examined.


A. Lagged Averaging


The method of lagged average forecasts (LAF) is based on the notion that a forecast made using the "best" initial atmospheric state (presumably the most recent) is not necessarily the "best" forecast (Dalcher et al., 1988). A LAF ensemble approach, therefore, might take into account not only the latest operational run of a particular model, but also previous runs of the same model started one or more days earlier for the same verification time.


Figure 1. Distribution of the mean pressure error in (a) 72-hr, (b) 96-hr, and (c) 120-hr MRF forecasts. Contours every 2 mb; negative values are dashed; the zero contour is thicker.


Figure 2. Distribution of the mean 1000 to 500-mb thickness error in (a) 72-hr, (b) 96-hr, and (c) 120-hr MRF forecasts. Contours every 2 dm; negative values are dashed; the zero contour is thicker.


For this study, blended forecasts consisting of the most recent and previous MRF runs were computed for the same verification time, and for each forecast area of low pressure. The resulting average forecasts were then compared to the most recent 72-hr forecast, to see whether an improvement in skill could be achieved. Three blended LAF ensembles were computed: 1) an equally weighted forecast consisting of the 72-hr forecast from the most recent MRF run, the 96-hr forecast from the run one day earlier, and the 120-hr forecast from the run two days earlier; 2) an equally weighted forecast consisting of the 72-hr forecast from the most recent MRF run and the 96-hr forecast from the run one day earlier; and 3) a forecast consisting of two parts of the 72-hr forecast from the most recent MRF run and one part of the 96-hr forecast from the run one day earlier.
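The three blends described above amount to weighted averages of forecasts valid at the same time. The following minimal Python sketch applies them to a single scalar quantity (central pressure). The names and the sample pressures are hypothetical. How the study averaged cyclone positions is not stated in the text, so only the scalar case is shown.

```python
# Weights by forecast projection (hr) for the most recent 72-hr forecast alone
# and for the three lagged average forecasts (LAFs) described in the text.
LAF_WEIGHTS = {
    "72 hr": {72: 1.0},
    "#1 (72/96/120)": {72: 1 / 3, 96: 1 / 3, 120: 1 / 3},
    "#2 (72/96)": {72: 1 / 2, 96: 1 / 2},
    "#3 (72/72/96)": {72: 2 / 3, 96: 1 / 3},
}


def blend(forecast_by_projection, weights):
    """Weighted average of forecasts that all verify at the same time."""
    return sum(w * forecast_by_projection[hr] for hr, w in weights.items())


# Hypothetical central pressures (mb) for one cyclone from three successive
# MRF runs, all valid at the same verification time; illustration only.
central_pressure = {72: 990.0, 96: 996.0, 120: 1002.0}

blended = {name: blend(central_pressure, w) for name, w in LAF_WEIGHTS.items()}
for name, value in blended.items():
    print(f"{name:>15}: {value:.1f} mb")
```

Each set of weights sums to one, so a blend always falls between the most extreme member forecasts. Blend #3 stays closest to the most recent run.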


Table 2 shows MPEs, APEs, and ADEs for each of the blended LAF ensembles described above (#1-72/96/120, #2-72/96, and #3-72/72/96, respectively). For comparison, the 72-hr MRF forecast errors are also shown. These 72-hr errors differ from those shown previously in Table 1A due to analysis differences (only cyclones possessing a closed isobar on all maps were included). Table 2 reveals that none of the LAF ensembles exhibited more skill, on average, than the 72-hr MRF forecast by itself (the MPE, APE, and ADE for the 72-hr forecast are smaller in magnitude than those of each blended forecast). In spite of this finding, it is interesting to note that as the ensembles became more heavily weighted toward the 72-hr forecast, average errors steadily decreased. These results suggest that operational forecasters are generally better off accepting the 72-hr forecast, rather than trying to blend it with previous runs of the MRF.



Table 2

Mean Pressure Errors (MPEs), Average Absolute Pressure Errors (APEs), and Average Absolute Displacement Errors (ADEs) for the most recent 72-hr forecast, and for three lagged average forecasts (LAFs). A detailed description of these LAFs can be found in the text.


            72 hr    #1       #2       #3
MPE (mb)    -1.14    -1.54    -1.43    -1.39
APE (mb)     4.72     5.14     4.99     4.92
ADE (km)      476      517      488      479

B. Run-to-Run Consistency


Another technique operational forecasters commonly employ when evaluating NWP output is run to run consistency for a particular model. This technique proposes that a forecaster can have added confidence with today's MRF output if it is sufficiently "similar" to yesterday's output.


In this study, the most recent 72-hr MRF forecast was compared to the previous 96-hr forecast, valid for the same time. For each individual area of low pressure, forecast-to-forecast differences in central pressure and position were computed. Error statistics were then stratified by an appropriate (though somewhat arbitrary) run-to-run difference. Table 3 shows MPEs, APEs, and ADEs as a function of run-to-run differences in central pressure (Section A) and position (Section B). For central pressure, a forecast was considered "similar" to the previous MRF forecast if it differed by at most 4 mb. For position, a forecast was considered "similar" to the previous MRF forecast if it differed by at most 400 km.
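The stratification described above can be sketched in Python as follows. The thresholds (4 mb and 400 km) come from the text. The record layout, the field names, and the four sample cases are hypothetical and serve only to show the mechanics.

```python
from statistics import mean

PRESSURE_THRESHOLD_MB = 4.0    # "similar" if runs differ by at most 4 mb
POSITION_THRESHOLD_KM = 400.0  # "similar" if runs differ by at most 400 km


def stratify(cases, key, threshold):
    """Split cases into (similar, dissimilar) by the size of a run-to-run difference."""
    similar = [c for c in cases if abs(c[key]) <= threshold]
    dissimilar = [c for c in cases if abs(c[key]) > threshold]
    return similar, dissimilar


def summarize(cases):
    """MPE, APE, and ADE of the 72-hr forecast for a group of cases (None if empty)."""
    if not cases:
        return None
    return {
        "n": len(cases),
        "MPE": mean(c["pressure_error_mb"] for c in cases),
        "APE": mean(abs(c["pressure_error_mb"]) for c in cases),
        "ADE": mean(c["displacement_km"] for c in cases),
    }


# Hypothetical cases: run-to-run differences (72-hr minus previous 96-hr forecast)
# and the verified errors of the 72-hr forecast. Illustration only.
cases = [
    {"run_diff_mb": 1.5, "run_diff_km": 150.0, "pressure_error_mb": -2.0, "displacement_km": 300.0},
    {"run_diff_mb": -3.0, "run_diff_km": 520.0, "pressure_error_mb": 4.0, "displacement_km": 500.0},
    {"run_diff_mb": 6.5, "run_diff_km": 380.0, "pressure_error_mb": -7.0, "displacement_km": 450.0},
    {"run_diff_mb": -8.0, "run_diff_km": 900.0, "pressure_error_mb": 5.0, "displacement_km": 700.0},
]

by_pressure = [summarize(g) for g in stratify(cases, "run_diff_mb", PRESSURE_THRESHOLD_MB)]
by_position = [summarize(g) for g in stratify(cases, "run_diff_km", POSITION_THRESHOLD_KM)]
print("pressure:", by_pressure)
print("position:", by_position)
```

The absolute value of the run-to-run difference is used so that a forecast counts as "similar" whether the newer run is deeper or shallower than the previous one.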


Table 3 suggests that there is some operational utility in the evaluation of run-to-run model differences. For example, Table 3A indicates that when 72-hr central pressure forecasts were within 4 mb of the 96-hr forecast from the previous run, central pressure forecasts tended to be more skillful than when differences exceeded 4 mb. The improvement in MPE was about 1/2 mb, while the APE decreased by about 3/4 mb. The ADE did not appear to be a function of run-to-run pressure forecast differences.


When run-to-run cyclone position forecasts were within 400 km of one another, Table 3B shows that there was a tendency for the MRF to be more skillful. When the position forecasts were "similar", the average absolute pressure error was less than 4.4 mb (versus nearly 5.6 mb when run-to-run forecasts were not "similar"). The same tendency was noted in position forecasts. When run-to-run forecast differences were at most 400 km, the ADE was around 450 km (versus more than 525 km when the run-to-run difference exceeded 400 km).


Table 3

Mean Pressure Errors (MPEs), Average Absolute Pressure Errors (APEs), and Average Absolute Displacement Errors (ADEs), as a function of run-to-run model difference in pressure (Section A) and position (Section B). The most recent 72-hr forecast is shown for comparison.


A. Pressure Difference    72 hr    ≤4 mb      >4 mb
MPE (mb)                  -1.14    -0.89      -1.42
APE (mb)                   4.72     4.37       5.10
ADE (km)                    476      476        476

B. Position Difference    72 hr    ≤400 km    >400 km
MPE (mb)                  -1.14    -1.19      -1.04
APE (mb)                   4.72     4.34       5.59
ADE (km)                    476      454        527

These error statistics, stratified by run-to-run model difference, suggest that forecasters can have somewhat more confidence in a particular MRF forecast if it is similar to the previous run of the MRF. Though the error differences were relatively small, they suggest that further study in this area may reveal geographic regions and/or flow regimes in which this operational approach is most useful.




A preliminary analysis of 72-hr, 96-hr, and 120-hr errors occurring in MRF forecasts of surface low pressure during the winters of 1993-94 and 1994-95 has been completed. The primary findings are as follows:



One of the most important results of this study is the notion that 3 to 5 day public forecasts should not be too detailed. This is consistent with the findings of Livingston and Schaefer (1990), who cautioned against "trying to force a lot of detail" into extended forecasts when the skill of the model does not support such action. The magnitudes of the average absolute pressure and position errors found in this study clearly support this conclusion.


It is hoped that additional MRF verification studies, perhaps consisting of several winter seasons, will be able to isolate geographic regions and/or flow regimes that tend to be associated with consistent model output characteristics. Only after these characteristics have been documented and understood by forecasters will we be able to make more intelligent and realistic use of the MRF.



Bonner, W. D., 1989: NMC overview: recent progress and future plans. Wea. Forecasting, 4, 275-285.

Dalcher, A., E. Kalnay, and R.N. Hoffman, 1988: Medium range lagged average forecasts. Mon. Wea. Rev., 116, 402-416.

Grumm, R.H., and A.L. Siebers, 1990: Systematic model forecast errors of surface cyclones in the NGM and AVN, January 1990. Wea. Forecasting, 5, 672-682.

Kanamitsu, M., 1989: Description of the NMC Global Data Assimilation and Forecast System. Wea. Forecasting, 4, 335-342.

Livingston, R.L., and J.T. Schaefer, 1990: On medium-range guidance and the 3-5 day extended forecast. Wea. Forecasting, 5, 361-376.

Sanders, F., 1987: Skill of NMC operational models in prediction of explosive cyclogenesis. Wea. Forecasting, 2, 322-336.

Silberberg, S.R., and L.F. Bosart, 1982: An analysis of systematic cyclone errors in the NMC LFM-II model during the 1978-1979 cool season. Mon. Wea. Rev., 110, 254-271.

Smith, B.B., and S.L. Mullen, 1993: An evaluation of sea level cyclone forecasts produced by NMC's Nested-Grid model and Global Spectral model. Wea. Forecasting, 8, 37-56.