33 Comments
Mar 2 · Liked by Michael Weissman

Hi -- could you link to my report at https://ermsta.com/covid_debate instead of the link you use? Thanks.

I have written a response to some of the things you have written which may clarify a few points: https://ermsta.com/posts/20240301

Cheers

Author · Mar 2 · edited Mar 3

Most of my substantive responses would be fairly long, so I'll defer them except one. You say I must have some conspiracy theory because I discount the claim that SC2 showed up abruptly at HSM without a trail of other cases, in effect distrusting the WHO report. My conclusion is "Many of the early cases were associated with the Huanan market, but a similar number of cases were associated with other markets and some were not associated with any markets…. No firm conclusion therefore about the role of the Huanan Market can be drawn." Those words are from the WHO-CDC report! George Gao, head of the Chinese CDC, was even more explicit: "At first, we assumed the seafood market might have the virus, but now the market is more like a victim. The novel coronavirus had existed long before." Gao also explained to the BBC that it was a mistake to focus the search on the north side of the Yangtze, i.e. that there was a big ascertainment bias.

On your new description of Bayes: Oy! (Or JFC, depending on religious choices.) It's just getting worse and worse, clearly not just careless phrasing. We can argue about which version of the CDC reports to trust, or neither, but mathematics has some definite formal structure.

Bayes is perfectly happy to work with likelihood ratios for densities of continuous observed quantities. There is no need for an "alpha" and one is rarely used. The distinction between ratios of conditional probabilities of specific observations under a variety of hypotheses and the probabilities of a range of extreme values "outside" an observed value of some statistic relative to some null is fundamental. It usually has major numerical importance. E.g. Bayes intros often point out that for a normal distribution a two-sided p-value of 0.05 corresponds to a likelihood ratio of at most e^(1.96²/2) ≈ 6.8, much weaker evidence than most people trained in school-style frequentist stats would guess.
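Here is a minimal numerical check of that bound (a sketch assuming a unit-variance normal model, with the alternative's mean placed at the observed value, the most favorable case):

```python
# Minimal sketch: maximum likelihood ratio corresponding to a two-sided
# p-value of 0.05, assuming a unit-variance normal model.
import numpy as np
from scipy.stats import norm

p = 0.05
z = norm.ppf(1 - p / 2)          # two-sided critical value, ~1.96
# The most favorable alternative puts its mean exactly at the observed x = z,
# so the density ratio alternative/null is maximized there:
lr_max = norm.pdf(z, loc=z) / norm.pdf(z, loc=0)
print(z, lr_max, np.exp(z**2 / 2))   # ~1.96, ~6.8, ~6.8
```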


We seem to be talking past each other re: Bayes. You say that the likelihood *ratio* is different from a p-value -- certainly I agree. I am saying the conditional probability itself, not the ratio, corresponds to a p-value.

If you define O to be the observation that p <= alpha, then necessarily P(O | A) = alpha, since under the null hypothesis A the p-value is uniformly distributed on [0, 1]; and if you permit yourself to choose alpha post facto, then you can choose alpha = p, so that P(O | A) = p. Thus it is only the obstacle that you can't choose alpha after observing p that stops you from saying that P(O | A) equals the p-value.
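A toy simulation of this point (the two-sided z-test and the particular alpha are my arbitrary choices):

```python
# Sketch: under the null A, p-values are uniform on [0, 1],
# so P(p <= alpha | A) = alpha for any alpha chosen in advance.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha = 0.17                          # arbitrary illustrative threshold
z = rng.standard_normal(100_000)      # test statistics drawn under the null
pvals = 2 * norm.sf(np.abs(z))        # two-sided p-values
print((pvals <= alpha).mean())        # ~0.17, i.e. alpha
```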

You are welcome to make a different choice for observation (perhaps one not involving an alpha), in which case you will get a different result.

Whatever you are doing to get a likelihood ratio is something very different, because in standard null hypothesis testing everything is conditioned on the null hypothesis A, and P(O | not-A) is completely inaccessible, so it is impossible to compute a likelihood ratio. To get a likelihood ratio you must have left hypothesis testing behind and be doing something else, e.g. introducing a prior distribution conditioned on not-A. Feel free to explain the process by which you are getting that result, but it is certainly unrelated to what I was talking about.

If you dispute that P(O | A) = alpha for the definition of O given above, then we have an actual mathematical disagreement.

Author · Mar 3 · edited Mar 3

I scarcely know what you're even talking about. You seem to think that Bayesian reasoning is a little tweak on null hypothesis significance testing, and keep dragging constructs from that over-used niche frequentist technique into the more general discussion.

P-values require a statistic that's uniform on [0,1] under the null, chosen so that under some perhaps loosely defined alternative the statistic tends toward an extreme value, usually 0. For a given null and observation, the p-value depends on how consistent with the null a set of hypothetical observations "more different" from the null would be. One of the things that always bugs Bayesians is that the p-value involves not just the hypothesis and the observation but also a bunch of non-observed "more extreme" results lumped in with the observation, with "extreme" often having a fairly intuitive definition but, especially for two-sided tests, often needing an arbitrary choice (log scale? linear? ...).

None of that comes up at all for Bayes. One has some hypotheses well-enough defined to give conditional probabilities P(obs|hyp_i). If the observation is of a continuous variable, replace P with a density rho. No hypothesis is called "null". With each observation one updates the ratios of probabilities of the different hypotheses in proportion to the ratio of the probabilities (or densities) of the outcome conditional on the different hypotheses. Treating an actual observation as part of a bigger pot of outcomes "as extreme or more extreme", and using an arbitrary alpha cutoff, is entirely illogical in this context. It gives incorrect results. A Bayesian doesn't have to worry whether it's the linear value or the log value or whatever that's more extreme, because nothing is involved except the likelihoods of the actual observations. Normalization of the probabilities after an update runs over the set of hypotheses, not the set of hypothetical observations.
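A minimal sketch of that update rule, with made-up hypotheses and a made-up observation:

```python
# Sketch: a Bayesian update uses only the likelihood (density) of the actual
# observation under each hypothesis; normalization runs over hypotheses,
# never over unobserved "more extreme" outcomes.
from scipy.stats import norm

obs = 1.3                                          # the single actual observation
models = {"H1": norm(0, 1), "H2": norm(2, 1)}      # made-up hypotheses
prior = {"H1": 0.5, "H2": 0.5}

unnorm = {h: prior[h] * m.pdf(obs) for h, m in models.items()}  # no alpha, no tails
total = sum(unnorm.values())                        # normalize over the hypotheses
posterior = {h: v / total for h, v in unnorm.items()}
print(posterior)
```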

"To get a likelihood ratio you must have left hypothesis testing behind and are doing something else, eg introducing a prior distribution ..." You're damn right I've left null hypothesis testing behind. Good riddance! However, likelihood ratios have nothing to do with priors. Only at the end, if you want to convert your likelihood ratios to posterior odds or probabilities do the priors of the competing hypotheses come in.

I should mention a caveat. Models for calculating conditional probabilities are often not cut-and-dried. They can involve some uncertain nuisance parameters. In those cases one needs to do a hierarchical Bayes calculation over the prior distribution of the nuisance parameters to obtain the likelihood. Those nuisance-parameter priors generally have nothing to do with the priors for the competing hypotheses. Each likelihood is calculated entirely within the conditional framework of assuming its hypothesis is correct, ignoring whether that ultimately will turn out to be probable or not.
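A minimal sketch of such a marginalization (all models and the nuisance prior here are made up for illustration):

```python
# Sketch of the caveat above: with an uncertain nuisance parameter theta,
# the likelihood under each hypothesis is
#   P(obs | hyp) = integral of P(obs | theta, hyp) * prior(theta) d theta,
# computed entirely within that hypothesis. Made-up models throughout.
from scipy.stats import norm
from scipy.integrate import quad

obs = 1.3

def likelihood(mu, theta_prior):
    # Marginalize over an uncertain scale parameter theta (the nuisance).
    # Nothing here involves the prior odds of the competing hypotheses.
    integrand = lambda theta: norm.pdf(obs, mu, theta) * theta_prior.pdf(theta)
    val, _ = quad(integrand, 1e-3, 10)
    return val

theta_prior = norm(1.0, 0.2)   # made-up nuisance prior (effectively truncated at 0)
print(likelihood(0.0, theta_prior) / likelihood(2.0, theta_prior))
```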

My wife was famous as an iconic statistics teacher of tens of thousands of undergrads. Her workbooks, lectures, homework, etc. were models of clarity in teaching mostly frequentist stats. She loved the challenge of getting students to understand the convoluted logic of p-values. There was a brief, well-taught informal section on Bayes.

But I always had a nagging sense that maybe doing such a good job on teaching about p-values would undermine the students' ability to think in more general ways. Like over-training an immune system on a narrow epitope. Or teaching quantum mechanics as if the energy eigenstates were "the states" rather than "a convenient basis set". This exchange is re-awakening that unease.


So long as we agree we are talking about different things, that is sufficient. I do not know why you went forward so confidently with saying my math was wrong when now you just say you don't get my point, which is fine by me since it was not an important point.

> You seem to think that Bayesian reasoning is a little tweak on null hypothesis significance testing

The cartoon version is that Bayesian reasoning uses three things: P(O | A), P(O | not-A), and a prior P(A). Hypothesis testing uses only the first, P(O | A). Yes, you can make things more sophisticated with continuous variables and uncertainties and such, but the target audience of my primer was people who had never used Bayesian reasoning or hypothesis testing before, so I stuck to the very simplest form of each framework.

Re. lumping in "more extreme" results. Certainly in this respect a Bayesian framework is clearer and easier to understand, but so long as there are only two competing hypotheses of interest there is no ambiguity or arbitrariness with hypothesis testing. The choice that maximizes the statistical power is to use the likelihood ratio as the test statistic (or any monotonic function thereof). Of course if you can actually explicitly calculate likelihood ratios you might as well just do a Bayesian analysis and not bother with hypothesis testing -- my point is just that, in principle, there is nothing arbitrary about how to make that choice.
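A toy check of this point, with two assumed simple hypotheses chosen so that the likelihood ratio is not monotone in the raw statistic:

```python
# Sketch: with two simple hypotheses, thresholding the likelihood ratio
# at a fixed size alpha is at least as powerful as any other statistic
# (Neyman-Pearson). Both hypotheses here are made up for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, alpha = 200_000, 0.05
x0 = rng.normal(0, 1, n)                 # draws under H0: N(0, 1)
x1 = rng.normal(1, 2, n)                 # draws under H1: N(1, 2)

def power(stat):
    cut = np.quantile(stat(x0), 1 - alpha)   # size-alpha threshold under H0
    return (stat(x1) > cut).mean()

llr = lambda x: norm.logpdf(x, 1, 2) - norm.logpdf(x, 0, 1)   # log likelihood ratio
print(power(llr), power(lambda x: x))    # the LR statistic beats raw x here
```

The raw statistic misses the left tail, where H1's larger variance puts extra mass; the likelihood-ratio statistic catches both tails automatically.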

Author · Mar 4 · edited Mar 7

In saying that I did not know what you were talking about, I did not mean that as a positive thing. This sentence is just flatly false mathematically: "Bayesian reasoning uses three things: P(O | A), P(O | not-A), and a prior P(A). Hypothesis testing uses only the first, P(O | A)." The P(O | A) used in NHST is a completely different thing than the one used in Bayes. [typo fixed thanks to Eric] You had written in your explanation that Bayes requires a yes-no observation, which no one who had ever used it would have written.

I'm not sure that your mathematical misunderstanding of the technique has any relevance to the numbers you ended up using, so it may not matter for most readers interested in the bottom line.

By a strange coincidence (they do happen occasionally) I just found out last night that exactly the misunderstanding that appeared in your description of Bayesian reasoning played a key role in the Pekar et al. paper, one of the main works cited to support the zoonotic claim. (This is more fundamental than the 3 major coding errors, etc.)

In an explicitly Bayesian calculation they want to compare P(topology | n=1) with P(topology | n=2), where n refers to how many successful propagating spillovers occurred and the topology describes a feature of the descendant tree: 2 major polytomies separated at the base by 2 nt. It turns out what they did instead was to replace the observed topology with a larger collection including more extreme ones that weren't observed, i.e. ones with more than a 2 nt difference. Now n=2 tends to produce a bigger nt gap than n=1, so this method added in fictional outcomes that agreed with n=2 and not with n=1, artificially boosting the likelihood of their preferred hypothesis. Adding in results more extreme the other way (0 or 1 nt) would have had the opposite effect. Either type of addition is mathematically just wrong. Adding in fictional outcomes is intrinsic to NHST, but it gives seriously incorrect results for Bayes. You haven't given "the simplest form" of Bayes but rather a more complicated form that's wrong.
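A toy numerical version of the problem, with made-up Poisson models standing in for the simulated nt-gap distributions (these are not Pekar's numbers):

```python
# Sketch: tail probabilities ("2 nt or more") vs point probabilities
# ("exactly 2 nt") give different Bayes factors; the tail version is
# biased toward whichever hypothesis has the fatter tail.
from scipy.stats import poisson

mu1, mu2 = 0.5, 2.0      # made-up mean nt gaps under n=1 and n=2 spillovers
obs = 2                  # the observed 2-nt separation

bf_point = poisson.pmf(obs, mu2) / poisson.pmf(obs, mu1)        # actual outcome
bf_tail = poisson.sf(obs - 1, mu2) / poisson.sf(obs - 1, mu1)   # NHST-style: 2 nt or more
print(bf_point, bf_tail)   # ~3.6 vs ~6.6 with these made-up numbers
```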

Final results of Nod's simulations aren't in yet, but it seems that together with correcting other errors, which reduced the Bayes likelihood factor from 60 to ~3, this will reduce it to less than one.

My gut feeling is that the substantive question at stake is less important than the techniques. Zoonosis happens. Lab leaks happen. That remains true regardless of which this one was, although I think the case for lab is overwhelming. Logic, math, general methods of reasoning, honesty, realistic evaluation of data,... those things are very much at stake in this whole debate. For them there is no symmetry between the sides.

Mar 6 · Liked by Michael Weissman

Re. Pekar, the mathematics of Bayesian analysis are equally valid whether you define the observation O to be "at least 2 nt difference", "exactly 2 nt different", or "a specific 2 nt are different" -- the choice is not mathematical in nature. That said, I am inclined to agree with you that the middle choice is most appropriate for this application.

While I suspect the conclusions of Pekar are probably correct, I have very low confidence in this. You will find in my covid origins report that I have specific criticisms of their model and did not give it any weight in my analysis.

Mar 6 · edited Mar 6 · Liked by Michael Weissman

> The P(O | A) used in hypothesis testing is a completely different thing than the one used in NHST.

(I assume the first reference to hypothesis testing was supposed to be Bayesian reasoning.) They are different if you made different choices for O (and A); they are the same if you made the same choice for O. *shrug*

> You had written in your explanation that Bayes require a yes-no observation, which no one who had ever used it would have written.

I see, yes, that was incorrectly stated in the absolutist way I wrote it; I should have written "I need" instead of "we need" but was using the inclusive first person.

You can work with continuous variables but I used discrete yes-no observations as I found that easier to explain, and I was fresh off of my covid origins report where the participants had only used discrete variables. If O is continuous then P(O | A) becomes the differential of the p-value, i.e., P(O | A) is a probability density whose cdf is the p-value. Perhaps that would have been a better exposition, I don't know.
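A small numerical check of that relationship, for an assumed one-sided z-test:

```python
# Sketch of the continuous case: the null density of the test statistic is
# (minus) the derivative of the p-value curve, here for a one-sided z-test.
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 601)
pval = norm.sf(x)                    # one-sided p-value as a function of x
dens = -np.gradient(pval, x)         # differentiate: recover the null density
print(np.max(np.abs(dens - norm.pdf(x))))   # ~0, up to finite-difference error
```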

> It turns out what they did instead was to replace the observed topology with a larger collection including more extreme ones that weren't observed

I see how someone could easily make such a mistake, and knowing that you believe I did this by accident elucidates much of our conversation. Your previous comments make much more sense to me now in light of this.

In the context of converting NHST to Bayesian computation, I am *deliberately choosing O to include more extreme observations* because the idea of such a conversion is to make use of the results of NHST. (This is explicit in statements like "define O to be the observation that p <= alpha".) When I am illustrating what NHST and Bayes have in common, I use a choice for O that makes them more similar. If you use a different choice for O they will be less similar.

My goal was to draw attention to the fundamental differences between the frameworks: Bayes is symmetric between the two (or more) hypotheses, and has a prior. How to convert between p-values and likelihoods was a distraction from this purpose, so I did not go into detail, and used definitions that made them equal to each other. Unfortunately this omission led to our confusion as you had different definitions in mind.

I hope this has cleared up the misunderstanding. Certainly I feel I now understand your comments.

Author

To clarify: since an outcome is often so detailed that no simulation could reasonably be expected to give the exact outcome under any of the hypotheses, it's often necessary to generalize it somewhat in order to get conditional probabilities in reasonable computation time, e.g. to include any 2 nt difference with one C-->T and one other, rather than specifying the particular sites of the mutations. So in that sense one usually has to include outcomes beyond the one observed. What one needs to avoid is extending the category more than necessary, and especially extending it in a way that systematically favors some hypotheses over others.
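A schematic sketch of that procedure, with a toy simulator standing in for the real epidemic/phylogenetic one (all distributions and numbers are hypothetical):

```python
# Sketch: estimate P(outcome category | hypothesis) by Monte Carlo, applying
# the SAME minimally generalized category under every hypothesis.
import numpy as np

def simulate(rng, mu):
    # Toy stand-in for a full simulation: draw the number of nt differences
    # and how many of them are C->T transitions.
    n_diff = rng.poisson(mu)
    n_CtoT = rng.binomial(n_diff, 0.5)
    return n_diff, n_CtoT

def in_category(n_diff, n_CtoT):
    # "Exactly 2 nt apart, one C->T and one other", without fixing the sites.
    return n_diff == 2 and n_CtoT == 1

def likelihood(mu, n_runs=100_000, seed=0):
    rng = np.random.default_rng(seed)
    return sum(in_category(*simulate(rng, mu)) for _ in range(n_runs)) / n_runs

print(likelihood(2.0) / likelihood(0.5))   # Monte Carlo Bayes factor estimate
```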

Author · Mar 2 · edited Mar 2

Eric -- I've changed the link and just printed out your analysis. I'll respond pretty soon; it's a welcome escape from TurboTax nightmares.

I did see that you were skeptical of the whole Bayesian approach. I agree that it's not a great shield against various biases, just a way of making everything as open as possible. I don't think there's any other rational game in town, so that's why I start by looking at Bayes factors.

One thing I noted right away: you say that Worobey accounts for the distance contrast between linked and unlinked cases. Worobey points out that linked cases fall into two categories, shoppers and workers. I don't think there's any indication that shoppers would fail to be linked. One needs a second step, some reason why one linked category would disproportionately give rise to unlinked subsequent cases, in order to account for how that heterogeneity within the linked turns into a contrast with the unlinked. Not impossible, although with uncertain sign. It's not a guaranteed first-order predictable effect, in contrast to simple collider bias.

On MahJong etc.: my point was not that we know any of that stuff with great reliability. It's that the official accounts and unofficial accounts form a big mess. My logit from that mess is 0.00. I could understand someone discarding Pekar (which is looking worse and worse by the hour as Nizzaneela runs his sims) but still saying that a market cluster tends to favor ZW. It cannot favor ZW by a factor greater than the inverse of the probability that officials leaned hard on the available data. No way that's 10,000. Maybe 2?

Mar 3 · edited Mar 3

Thanks for changing the link, and good luck with TurboTax.

> Worobey points out that linked cases fall into two categories, shoppers and workers. I don't think there's any indication that shoppers would fail to be linked.

In the WHO report, you see that of the 55 December cases with a known link to HSM, only 3 are described as "buyer", i.e. "Community residents who purchase food for their families in the market". There is also 1 "visitor" (i.e. "Looking for someone in the market, without purchasing") and other categories, but the majority with a known link to the market are vendors. Unless you believe ~5% (3/55) or fewer of cases acquired at HSM were among local community shoppers, this is sufficient to support Worobey's explanation for the disparity. (Or maybe 3/54 if we think "visitor" is ambiguous.)

Author

Agreed, this looks like an important issue. Do you have a link to the relevant part of the WHO report?

Given the type of investigation conducted, it would seem odd not to have asked all or almost all of the other patients whether they had been to the market. WHO specifically concluded that many cases were not linked to HSM, which would be quite a strange conclusion if they didn't bother to ask.

Author · Mar 3 · edited Mar 3

Here's the breakdown from Appendix E4 of the WHO report:

Vendor: 30
Purchaser: 12
Passer-by: 5
Buyer: 3
Deliveryman: 2
Visitors: 1
Indirect exposure (contact of the Huanan Market exposed population): 2

For the unlinked cases they specifically say "denied case contact history, as well as history of exposure to Huanan Market." In other words, every case was asked if they'd been to any of the markets.

An ordinary community shopper has no reason to deny that they shopped there. So there is no reason to think that any of the identified unlinked cases were HSM customers who chose to deny having shopped there. There could be many cases that shopped there and were not linked because they were never identified as cases and therefore not queried by the team. By definition, they are not part of the data set.


Still no reference to the WIV publishing RaTG13 in your Bayesian analysis? Was hoping for your feedback since you helped write it!

https://jimhaslam.substack.com/p/the-bayesian-boys-thought-they-were

Author · Feb 4 · edited Mar 3

P.S. Starting to read your new blog, I should thank you because it had references to one or two earlier Bayesian analyses that I had missed. I'll include them in a minor update once I've had a chance to read them carefully.

I'm not sure what "it" stands for in "since you helped write it!" If it's this blog, I wrote the whole thing. If it's any of the others I didn't help write them. Did Silver or somebody borrow an idea from me at some point? That would be good.

Author

I'm sure that's relevant to sorting out the detailed story. It's hard for me to see exactly what it means. Getting into uncertain details would undercut the force of my main analysis about LL vs ZW. I understand that you're especially interested in how much of the work was done in the US vs. how much in China. The only part of that question that's relevant to my argument is that Wuhan labs were involved enough at some point for Wuhan to be a likely leak location.


Did you delete my comment?

Author

No, you put it on old v4. I answered it there. v5 was done as a new post because the changes in the organization of the priors and the what/where/when factors were more than just tweaks. You could re-comment here and I can just cut and paste my reply, if you'd like.


Thanks for the response. Sorry for my confusion:

> "There's dozens of leading experts who have spoken out in public."

This gets circular, as it depends on how you decide who is a "leading" expert. But I think you're kind of missing part of my point. One aspect could be about the application of "authority": Which experts make what claims. My sense is that among those who do the most closely-related, and most closely connected work, the scientific consensus leans towards a market-mediated spillover. Of course I could be wrong about that. And there could be corruption/self-interest or groupthink in play. Those considerations can't just be dismissed. But if I'm right, that's an important component for me, for assessing probabilities. I'm not suggesting it would be dispositive, but relevant.

> "just as the Proximal Origin authors did in private."

This is highly disputed: just how what they said in private should be used to inform the discussion. Needless to say, I suspect you and I will interpret it differently. But more to my point, either way it requires mind-probing, and I don't think bad-faith assumptions are very probabilistically reliable. If you want to know what someone meant in ambiguous statements (or at least statements that can be logically explained in more than one way) in a private conversation, IMO the best way to go about that is to ask them what they meant.

> "It's not just the obvious ones- Ebright, Kinney, Chan, Nickels,.....but also ones who will only speak privately. If I had to name the most outstanding scientist of those who have done a lot of highly domain-relevant work, it would be Bloom."

I agree that Bloom can function as a kind of touchstone here.

> "It's pretty clear from his writings that he thinks the zoo arguments are crap and leans toward a lab explanation. "

My reading is that he's pretty agnostic on the origins, if not necessarily on how various arguments have been made or justified. Further, I have more recently found this to be problematic.

https://x.com/joshlbrooks/status/1743622521889464347?s=20

> "those most deeply involved in the general WIV-linked research group, Baric has maintained a discreet silence. I'm not saying authority counts for nothing but given that scientific communities have similar social dynamics to other communities, authority is a weak putty to fill in gaps not directly accessible to evidence and logic. "

When you have big gaps and not great ways to fill them, weak putty can be less sub-optimal than mind-probing as a heuristic.

> "Here's a true story. Two personable and well-spoken researchers from prestigious universities were writing up a Bayesian analysis similar in form to mine, although not using uncertainties. Their odds favored zoo, largely because of overwhelming priors (10^5) and naive belief that Worobey was strong evidence. They haven't published, perhaps in part because we had a long discussion, but more explicitly for a completely different reason. They said they could not get a single relevant scientist to speak with them. They had legal papers drawn up by their institutions guaranteeing confidentiality. Not good enough to loosen tongues. This is not normal science."

This is a problematic logic chain, IMO. Reverse engineering of this sort is fraught. There can be many reasons why these things happen. There are many degrees of freedom here and many garden paths.

Regardless, to some extent a lab-mediated spillover theory anchors onto a conspiracy of one size or another, and not just on the part of the CCP. It anchors on scientists lying and being misleading, including Lipkin. Without dismissing the possibility of a conspiracy, positing one requires an assessment of probabilities, and any assessment of probabilities that excludes the probability of a conspiracy is obviously incomplete, IMO.

That's all outside of your assessment of scientific probabilities, which I am unable to evaluate.

Author

All the social/psychological speculation is, as you know, extremely subjective. If I had to pick the solidest part, it would just be that China is hiding all the data, which makes no sense if the labs weren't involved. But I don't use that, because then you get into the whole psychological-judgment swamp. On the scientific probabilities, please don't just give up on the evaluation. It's all open source and pretty simple, if crude, reasoning. You might enjoy it. There's something deeply liberating, if often depressing, about just sitting down and scrounging through the literature, doing some calculations, etc., and accepting the results.

I opened that twitter link you gave. Do you realize how extraordinarily naive this remark you made sounds?

"Joshua Brooks @joshlbrooks Jan 6

I like to hope that bloom isn't just a politicized contrarian lab leak type but an actual scientist"

You really don't have *any* familiarity with the relevant science or scientists! So starting from scratch is more work but maybe even more liberating.

And thanks for commenting. More discussions like this are needed.


Michael -

" If i had to pick the solidest part it would just be that China is hiding all the data. which makes no sense if the labs weren't involved."

That doesn't seem very solid to me. China has very solid reasons to try to protect its animal-trade market, and to hide the reality that unsafe conditions it claimed to be preventing did in fact exist.

That they hid information seems non-controversial. Everyone seems to agree on that. Reverse engineering and mind-probing their reason for doing that seems more to me like confirmation bias than anything else.

Author

So how much do I use that mind-probing speculation?

From it I obtain logit = +0.00. Should I leave that out?

You want to include mind-probing of the opposite sign. OK, I'll include -0.00 as well.


Too much mathese for me. You said that China hiding data to hide lab involvement was the most solid part of your speculation. It doesn't seem solid at all to me. I see just as much reason, if not more, to think they'd want to hide data to hide market involvement.

My point there isn't to say we should heavily weight either speculation, but that both are based on a weak epistemics of mind-probing. We don't know why they hid information. There could be a world of reasons about which we don't even have an inkling. All we know really about such speculation is that it's highly vulnerable to confirmation bias.

Author

I know this is a bit obnoxious but it's too much of a set-up for quoting Galileo for me to resist.

"In Sarsi I seem to discern the firm belief that in philosophizing one must support oneself upon the opinion of some celebrated author, as if our minds ought to remain completely sterile and barren unless wedded to the reasoning of some other person. Possibly he thinks that philosophy is a book of fiction by some writer, like the Iliad or Orlando Furioso, productions in which the least important thing is whether what is written there is true. Well, Sarsi, that is not how matters stand. Philosophy is written in this grand book, the universe, which stands continually open to our gaze. But the book cannot be understood unless one first learns to comprehend the language and read the letters in which it is composed. It is written in the language of mathematics, and its characters are triangles, circles, and other geometric figures without which it is humanly impossible to understand a single word of it; without these, one wanders about in a dark labyrinth."
