Problems with creating and using models

By Kevin Roche

The Institute for Health Metrics and Evaluation (IHME) at the University of Washington has put together a variety of coronavirus data and projections, which have come to be widely relied upon by policymakers.  I am not going to go into detail on how their forecast model has wildly overstated what has actually occurred to date; others have done that very well.  But as I tried over the weekend to make sense of what the model and the other information on the site were communicating, I saw more than how dependent any model is on the assumptions put into it, and how badly this model and others miss on those assumptions.  I also saw how poor the communication is about how the model was built, what its limitations are, and what uses it might properly be put to.

The original purpose of the site was to help guide decision-makers on the health resources available to combat coronavirus disease, particularly hospital beds, ICU beds and ventilators, compared to the likely number of cases.  The first part of the work is relatively easy; you can be pretty accurate about the existing supply of health resources.  One shortcoming is the lack of any estimate of additional capacity that could be quickly created, as has occurred in New York City, for example.  But that side of the comparison of resources to need is pretty straightforward.

They further assume that certain “social distancing” measures are in place; as of today’s update, they say those measures last through August 4.  These social distancing measures are lumped into four buckets: school closures, stay-at-home orders, non-essential business closures and travel limitations.  They claim to weight the effect of these measures and have promised a technical paper describing this.  What would be really useful is a separate analysis of the supposed benefit of each of these measures, to better inform policymakers.  How many lives does each save on its own?  But let’s assume their categorization of social distancing measures, and of when they were implemented, is pretty accurate.

Now note that in the technical paper explaining the model, the authors said “Even with social distancing measures enacted and sustained, the peak demand for hospital services due to the COVID-19 pandemic is likely going to exceed capacity substantially.”   This has not happened anywhere, including New York.  Why should anyone be giving any credence to this model?

So where did they go so wrong?  Where all these models run into big trouble is in estimating the number of cases, the number of cases serious enough to require hospitalization or ICU use, and the length of hospitalization; and of course, people are always interested in deaths.  All of these are tied to assumptions, and at this point they are largely just assumptions, about such basic things as the percent of exposed people who become infected, the percent of infected people who become symptomatic, the percent of symptomatic people who need hospitalization, and so on.  Anyone producing a model of coronavirus disease should state those assumptions up front, along with why they chose them.  While the parameters can be adjusted to produce a range of estimates, model authors tend to promote certain scenarios or let decision-makers pick one.  The IHME site says it is using “observed death rates” to make its forecasts of deaths.  I cannot find any more precise information anywhere on the site, not even which death rate they mean: a percent of serious cases, of infections, or of the population.  In fact, the only place you can figure out anything about how this model works is in a technical paper.
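To see why those assumptions matter so much, here is a minimal sketch of how such a chain of percentages multiplies through.  Every fraction and the population size below are hypothetical, chosen only for illustration; they are not the values IHME or any other group used, and `expected_hospitalizations` is a made-up helper, not part of any model’s code.

```python
# Hypothetical illustration only: none of these parameters come from
# the IHME model or any published model.

def expected_hospitalizations(population, p_exposed, p_infected_given_exposed,
                              p_symptomatic_given_infected,
                              p_hospital_given_symptomatic):
    """Multiply a chain of assumed conditional fractions through a population."""
    return (population
            * p_exposed
            * p_infected_given_exposed
            * p_symptomatic_given_infected
            * p_hospital_given_symptomatic)

population = 1_000_000  # hypothetical population

# Two sets of assumptions; no single fraction differs by more than 2x.
low = expected_hospitalizations(population, 0.5, 0.5, 0.5, 0.10)
high = expected_hospitalizations(population, 0.8, 0.8, 0.8, 0.20)

print(f"low assumptions:  {low:,.0f} hospitalizations")
print(f"high assumptions: {high:,.0f} hospitalizations")
print(f"ratio high/low:   {high / low:.1f}x")
```

In this illustration no single assumption moves by more than a factor of two, yet the two estimates end up roughly eight times apart.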

On March 30 the IHME model forecast 2,411 deaths in North Carolina before August 4.
Monday’s model moves peak up to April 15. Projected death toll is revised downward to 496.
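Those two figures alone show how big the revision was.  A short calculation, using only the North Carolina numbers quoted above:

```python
# Figures from the two North Carolina forecasts quoted above.
march_30_forecast = 2_411
revised_forecast = 496

cut = march_30_forecast - revised_forecast
pct_cut = 100 * cut / march_30_forecast
ratio = march_30_forecast / revised_forecast

print(f"reduction: {cut:,} deaths ({pct_cut:.1f}% lower)")
print(f"the March 30 figure was {ratio:.1f}x the revised one")
```

So the revised projection is only about a fifth of the March 30 one.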

What is apparent, depending on the date you look at the site, is that the deaths they project keep coming down dramatically.  It could be that this is due to the impact of social distancing, but isn’t it equally likely that the initial over-estimates, which also assumed social distancing, resulted from initial assumptions that were just plain wrong?  The technical paper says that other models assume 25% to 75% of the population will be infected and that, based on reported case-fatality rates, there will be millions of deaths in the US, presumably without mitigation.  I cannot find anywhere this group’s own projection of what happens with no mitigation, or with mitigation other than the extreme social distancing tactics they describe.  That is critical information: you need to know the lives you are saving to figure out whether it is worth the economic cost.

Although they don’t tell us how many lives would be lost without mitigation, the model does purport to tell us how many will die with mitigation in place, but even that rests on a dubious method.  What they decided to do was model the observed death rate curves, because they thought those directly reflect both transmission of the virus and case fatality rates.  But death rate curves early in an epidemic tell you little about either infection rates in the general population or case fatality rates.  How many times do we have to say that you must know the denominator if you are expressing anything as a percent of cases?  We don’t know the number of cases.  They made similarly poor assumptions about rates and length of hospital and ICU use.  And then, of course, they fit a curve to all the junk assumptions they made.  Now they are having to adjust bad projections built on bad assumptions as real-world data comes in.
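The denominator problem is easy to show with made-up numbers.  In this sketch the death count, the confirmed cases and the estimated infections are all hypothetical; the point is only that the same deaths produce very different “fatality rates” depending on what you divide by.

```python
# Hypothetical numbers only; none come from IHME or from real data.
deaths = 100
confirmed_cases = 2_000        # only people who were tested and counted
estimated_infections = 20_000  # if, say, 9 in 10 infections go uncounted

fatality_vs_confirmed = deaths / confirmed_cases
fatality_vs_infections = deaths / estimated_infections

print(f"deaths / confirmed cases:      {fatality_vs_confirmed:.1%}")
print(f"deaths / estimated infections: {fatality_vs_infections:.1%}")
```

A tenfold uncertainty in the denominator becomes a tenfold uncertainty in the rate, which is exactly the situation when we don’t know cases.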

Now the really important thing to note is that this model assumes the virus magically disappears due to social distancing and we have absolutely no new cases.  Even this unlikely state of affairs, the FAQs note, persists only if social distancing, i.e. an economic shutdown, continues indefinitely.  Since they assume that only 3% of the population is infected in the first wave, there is no “herd immunity”, and the social distancing measures can’t be lifted without, presumably, cases and deaths on the order of their original projection, but extrapolated to the whole population.  So would this group seriously recommend that we just keep the economy shut down indefinitely, or that we lift the social distancing lockdown?
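The herd-immunity point can be made concrete with the textbook threshold for a simple, evenly mixed population, 1 - 1/R0, where R0 is the number of people each infected person infects when nobody is immune.  The 3% figure is the model’s first-wave assumption cited above; the R0 values are hypothetical inputs, and the sketch assumes, as the herd-immunity argument does, that people who were infected are immune afterward.

```python
# Assumption: the first-wave infected share is the model's 3% figure,
# and everyone infected is treated as immune afterward.
infected_first_wave = 0.03
still_susceptible = 1 - infected_first_wave

# Textbook herd-immunity threshold 1 - 1/R0 for a homogeneously mixing
# population; the R0 values are hypothetical, for illustration only.
thresholds = {r0: 1 - 1 / r0 for r0 in (2.0, 2.5, 3.0)}

for r0, threshold in thresholds.items():
    print(f"R0={r0}: threshold {threshold:.0%}, "
          f"immune after first wave {infected_first_wave:.0%}, "
          f"still susceptible {still_susceptible:.0%}")
```

Under any of these assumed R0 values, 3% is far below the threshold, which is why lifting the measures would leave most of the population still susceptible.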

Since this model was widely cited by policymakers, and since it projected cases beyond what the health system could handle, the authors bear a heavy responsibility for the economic damage being inflicted on the country.  They failed to prominently warn about the limitations of modeling, and about the particular limitations in this case due to completely inadequate data for generating assumptions and parameters.  At the same time, they have lulled people into thinking that the death numbers they projected, which are all the public seems to fixate on, were all we would see.  They have not been clear enough that they were projecting only the first wave.  They have not been clear enough that, at least according to their models, avoiding hundreds of thousands of deaths would require keeping the economic lockdown in place indefinitely.  And they have not explained why cases would completely disappear even with social distancing, as their model assumes.  All in all, a terrible job of modeling and communication.