Because of the atrocious abuses Americans have endured under the “authority” of what some computer model says, there has been a spate of reactions against models and modeling as such. Inasmuch as simulation and modeling were my specialties before I retired, I feel an obligation to present some actual reasoning about modeling: how it can be used constructively, and what happens when it’s abused as it has been over this Wuhan virus nonsense.
First: What is a model? Have you thought about that at all? Or have you merely regarded it as a source of “knowledge” about some particular phenomenon?
A model is emphatically not a source of knowledge. It may incorporate actual knowledge, but there are no guarantees. All a model can do is emit predictions: specifically, how a delimited system will behave if the assumptions built into the model are correct.
The construction of a model is a mathematical exercise:
- Make some assumptions about how the system of interest works: i.e., its causal mechanisms.
- Express those assumptions as behavior over time, in mathematical form.
- Implement that behavior in a computer program that simulates the passage of time.
That’s only slightly simplified. The complexity inheres in the formulation of the assumptions about the system’s causal mechanisms.
Here’s a really simple case of modeling: Imagine an organism – say, a lily pad – that resides in a pond of fixed size. Let’s assume, whether or not we have a good reason for it, that the lily pad grows. Assume further that we know the rate of growth and how it relates to the lily pad’s current size. For the sake of this example, let’s set the rate of growth at 100% per day – i.e., the lily pad doubles in size with each passing day.
To complete the model, we must set some initial conditions:
- The initial size of the lily pad (i.e., on “modeling day 0”);
- The size of the pond.
Let’s set those at 1 square foot for the lily pad and 1024 square feet for the pond. (See what I did there?) Now we run the model.
Why, lookee here: on day 10 the lily pad has covered the entire pond! Amazing! And on day 11 it’s covered twice the area of the pond! And on day 12 it’s covered four times that area! And…and…and…
If you’re still awaiting the punch line, you’ve read past it. A lily pad requires the pond beneath it to live, much less to grow. It cannot expand beyond the pond’s borders. So after day 10 the model is producing fantasies about “The Lily Pond That Ate Minnesota.”
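For readers who'd like to watch the fantasy unfold, here is a minimal sketch of the lily-pad model in Python. The function names are mine, invented for illustration, and the "capped" variant rests on one added assumption that is not part of the model described above: that the pad simply stops growing when it reaches the pond's edge. A real lily pad would presumably slow down well before that, so treat the cap as a crude patch, not as botany.

```python
def naive_lily_pad(initial_area, growth_rate, days):
    """Area on each day 0..days, assuming growth compounds without limit."""
    return [initial_area * (1 + growth_rate) ** day for day in range(days + 1)]


def capped_lily_pad(initial_area, growth_rate, pond_area, days):
    """Same model, plus the ASSUMPTION that the pad cannot exceed the pond."""
    return [min(area, pond_area)
            for area in naive_lily_pad(initial_area, growth_rate, days)]


POND_AREA = 1024.0   # square feet
INITIAL_AREA = 1.0   # square feet on modeling day 0
GROWTH_RATE = 1.0    # 100% per day, i.e. doubling

naive = naive_lily_pad(INITIAL_AREA, GROWTH_RATE, 12)
capped = capped_lily_pad(INITIAL_AREA, GROWTH_RATE, POND_AREA, 12)

# The first day on which the naive model leaves its domain of applicability.
first_bad_day = next(day for day, area in enumerate(naive) if area > POND_AREA)

for day, (n, c) in enumerate(zip(naive, capped)):
    flag = "  <-- bigger than the pond" if n > POND_AREA else ""
    print(f"day {day:2d}: naive {n:6.0f} sq ft, capped {c:6.0f} sq ft{flag}")
```

Both versions agree through day 10. From day 11 on, the only thing separating them is an assumption, which is exactly where the trouble with any model lives.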
Perhaps the assumptions are too simple, too dismissive of unmentioned considerations. Even if we have the lily pad’s initial growth rate correct, there are other factors we shouldn’t neglect. The pond might not be of a regular shape. The pad might not start from a regular shape that it maintains as it grows. And there’s that little matter of the fixed-size pond to cope with. Clearly, this model will not tell us anything about a real-world situation.
Everything depends on whether the model incorporates as its operating assumptions the real behavior of a real lily pad on a real pond. Other commentators have called this the “spherical cow of uniform density” fallacy, in reference to some other model I can’t even try to imagine without erupting in uncontrollable laughter.
So of what use is a model? Of what use could a model possibly be?
- Models can help us explore the consequences of our assumptions about physical systems.
- They can prepare data sets to be used in confirming or disproving scientific hypotheses.
- And when the model incorporates accurate assumptions about the behavior of a system:
  - It can generate data about what we can expect from that system…
  - …but only as long as the model’s assumptions continue to apply!
For there is this about all knowledge, regardless of its specific character: it will possess a domain of applicability: a range of conditions outside which it no longer applies. Even the best models, built solely upon well-confirmed knowledge about the systems they simulate, can only predict accurately in the near term. The passage of time tends to pull the model outside its domain of applicability, just as the physical limits to the pond destroyed the applicability of the lily-pad model above.
A responsible expert will not present a model – not even one that has generated accurate predictions in the past – as a source of reliable information about the future. He certainly won’t present it as “proof” of anything. Every sentence he utters will contain the most important of all words: if. Ironically, he might await the data reality “generates” to find the limits on his model’s domain of applicability.
Knowledge is about causal mechanisms and the power to predict outcomes. Therefore, in the nature of things, a model cannot generate knowledge…only test data.
Have a final humorous example of the constraints applicable in modeling. A mathematician who fancied himself a demography expert was powerfully impressed by the reproductive rate of the peoples of China. He tried to use it in a classroom setting as a real-life example of an infinite series, thus:
“Imagine,” said the professor, “that you could form the whole population of China – about one billion persons – into a column of marchers, four abreast, each row six feet from the one before it, and march it off the edge of a cliff at walking speed – about 3 miles per hour. How long would it take for the column to come to an end?”
After a number of students proposed answers, the professor waved them all aside. “It would never come to an end! The Chinese reproduce so fast that the back of the column would be refreshed at least as fast as the front eliminated itself!”
The class was impressed by this for an interval. Then a student at the back raised his hand.
“Yes, Mr. Smith?” the professor said.
“But Professor,” Smith said plaintively, “how could that be? They’d be marching.”
Have a nice day.
I'd never heard that variation of the marching Chinese story. Funny, and a little like the story about the King with No Clothes. It takes a seemingly naive comment to puncture a confident pundit.
The thing about the models is that they only work with simplified parameters. Their predictive value is fairly good, unless the model gets complicated. Introduce a single new variable, and the model may collapse entirely. Or, worse, feed the model bad data, and its usefulness as a predictor is utterly gone.
For example, a possibly effective treatment for C-19 - Quick, ridicule the Bad Orange Man for his stupid suggestion that an inexpensive drug might have some effectiveness for off-protocol use. Make dire assumptions that ALL of the side effects will kick in, killing MORE people than C-19. Even though most of those side effects happen over long term use (my knowledge is recent - I was prescribed the med in Feb. for RA - hasn't killed me yet, and seems to be effective for that use).
Instead, promote a vaccine that does not, as yet, exist. Or, even better, one that is actually made in China, and quite expensive, to boot.
Anything to push back against the horrible possibility of people returning to their normal life.