92 Comments
Quantum Quokka's avatar

This post is the most compelling assessment I've seen of the situation to date. Thank you for writing this up and sharing your analysis.

I've seen the full arguments from both sides of the Rootclaim debate. Regardless of how people think the debate played out with the information available at the time, it seems undeniable that studies and key information have emerged since then which point overwhelmingly to the wet market having extremely low odds of being the origin of the virus, as your analysis points out. That would dramatically lower the probability assigned to the most important point in support of zoonosis. Many commenters don't seem to be aware of how recent some of the key pieces of evidence are (some as recent as March 2024); these either were not available or may not have been fully understood at the time of the debate.

For those interested, here's a brief list of some recent information that updated me towards lab leak. This is for the sake of explaining my thoughts to others, but is in no way all-encompassing. Michael does a far superior job of explaining these in great depth.

- Study published March 5th, 2024 finding intermediate sequences between Lineage A and B (https://doi.org/10.1093/ve/veae020). This research shows that Lineage B very likely came from Lineage A. All cases in the market were Lineage B, but none were Lineage A. In short, the research shows that a single spillover is much more likely than a double-spillover Zoonotic event. The double-spillover theory is a foundational argument of the ZW theory that Peter Miller and others use. This is a massive blow to the probability that the wet market was the origin of the virus, to the point where it now seems extremely *unlikely* that the wet market was the origin.

- Wuhan's share of the wildlife trade is significantly smaller than its share of the population, which significantly lowers the probability of a ZW origin in the Bayesian calculations that Peter Miller and others use.

- Although the DEFUSE proposal leaked in 2021, more recent drafts were discovered in 2024 which contain what appears to be damning evidence. New information included their approach using restriction enzymes (BsaI/BsmBI) that ultimately matched precisely with what Bruttel et al. (2022) found as the assembly process that would create exactly this virus, years before this DEFUSE draft leak was even public. Michael describes the degree of how unlikely this would be if the origin was Zoonotic. The DEFUSE budget leak confirms that they were purchasing these enzymes. Additionally, the new documents contained draft comments that were not available in the original leaked proposal. Among many other things, the comments show that the research work was actually planned to be done at the WIV at BSL-2 levels for cost reduction, but they edited the final document to "BSL-3" because they thought "US researchers will likely freak out" if they knew this research was being done in lower safety BSL-2 labs. The researchers seemed to think the distinction didn't matter for their research and that it was bureaucratic red tape slowing them down, so they fudged the proposal to hide this. Considering BSL-2 labs are not sufficiently designed to contain airborne disease (whereas BSL-3 labs are), this does not seem to be an insignificant point in this whole debate.

Peter Miller claimed that because the DEFUSE proposal was rejected, the proposed work never happened (or at least had an exceedingly low likelihood of occurring), which led to the proposal essentially being dismissed as evidence, at least when it came to evaluating the arguments and probabilities. However, it is unequivocally the case that many research scientists conduct research long before they apply for the grant for that research. It is also often the case that they apply for the grant while in the middle of conducting research. This is confirmed by countless research scientists online and it's just how research often is done due to the difficulty and delays of receiving funding. Peter's conclusion that the DEFUSE research did not happen because it was rejected is so at odds with how scientists conduct research that at best I would consider Peter ignorant of this point, and at worst it would seem Peter intentionally manipulated this point and tried to get the DEFUSE proposal dismissed as evidence because it would dramatically undermine his argument and potentially change the entire outcome of the debate.

There's more, but I'll wrap this up for now. My key takeaway is that Michael's approach and evidence seem more updated, complete, and compelling than anything else available online on the topic to date. It seems especially relevant and a better analysis than either side of the Rootclaim debate, and I believe more people should be aware of this post and read it in full before drawing conclusions.

Michael Weissman's avatar

Yes, nice summary of the newer evidence.

I think that the HSM market story has taken serious hits, negating the importance of the badly flawed Worobey/Pekar papers. If you'd asked 5 years ago, people warning of zoonosis wouldn't have put an especially high fraction of their priors on market spillovers, since there are also farms, transport, etc. So I think the big factors assigned by some to HSM are nonsense but removing them doesn't do much to reduce the overall ZW priors.

The biggest change has been from finding the DEFUSE restriction enzyme plans. Before, most people (including me) had not considered that pattern as usable evidence because it was unclear how much ex post facto choice of what to look for was involved, i.e. the old multiple comparisons problem. After Emily Kopp published the DEFUSE plans that were stunningly close to the pattern Bruttel et al. had noticed, that factor became worth taking very seriously.

Écorché's avatar

Thank you so much for this work.

I have one major suggestion, which is more discussion of how to interpret the numbers coming out. In particular, I was bothered by this analysis for a long time because the conclusion seemed too certain given the number of missing pieces. Intuitively, it seems like there should be some factor in there which has no information about the conclusion but vastly increases the uncertainty. Some or possibly all of this comes from me wanting to interpret the final numbers as representing degrees of certainty instead of subjective probabilities, but I doubt I'm the only one with that confusion.

I think this could be discussed in a few places - in the introduction, at the end when combining priors with evidence, and once at the point where you discuss the vast gap between RaTG13 and SC2. Any origin theory requires some sequences in this gap, and finding them could decide the case either way depending on whether they show up in wildlife or in a lab notebook. Maybe there's a way to evaluate counterfactual evidence formally but just discussing the significance of the missing evidence would help. For everybody's sanity there should be a clear distinction between "100:1 odds it was a lab leak" and "available evidence favors lab leak by 100:1".

There is a second small issue relating to the Bruttel et al paper and the seeming confirmation from the DEFUSE draft. I think it was Alina Chan who pointed out that the BsaI/BsmBI construction showed up in an earlier Baric paper, and that Bruttel et al probably read that paper. I didn't see any followup on that. Your wording seems to suggest that Bruttel et al predicted the choice of enzymes from looking at the genome, but that may be too strong. You didn't use it as evidence but it might be worth tweaking the text.

And a really small point: I would take the part about "quantifying friggin likely" out of the title. Most people won't get the joke, and you are so consistent about taking the high road in the rest of the document.

I can't thank you enough for this work: I really hope that you keep on this. Having it in a decent journal would be a huge step forward.

Michael Weissman's avatar

Good points. Taken in reverse:

I'm sluggishly starting (with some possible coauthors) to convert this to peer-review-ready format. There will be no "friggin'" in the title. Meanwhile, I already gave the P.O. authors all the good lines because they spoke in such clear ordinary English.

I guess I should mention the preceding Baric paper. It is relevant to P(Bruttel guessing the right enzymes|ZW), but that doesn't enter directly in my calculations. To the extent that it affects P(pattern|LL), which does enter, it increases it and raises the odds. I'd thought about using it but was lazy and at some point the odds get extreme.

That brings me to the big philosophical point. I do discuss it in a narrow sense at the specific point about the sequence gap. "Although fully knowing what sequences were in Wuhan labs would be almost equivalent to answering the origins question, our current estimate of what’s there would mostly just be based on the other evidence leaning toward LL, ZL, or ZW, augmented a bit by a highly subjective sense of how forthright people are likely to be. We don’t want to either double-count our other evidence or introduce especially subjective terms."

I think this "what about missing evidence?" problem comes up in every Bayesian estimate. I've worried about it a lot, as have others. It sort of nags at the back of the mind. One can usually imagine various types of evidence that would, if available, outweigh everything that we have our hands on. If you tried to change odds based on pure ignorance of the non-evidence, all odds would approach 1/1. That would not only be wrong, it would be logically inconsistent. E.g. it would give odds (A or B) vs. C = 1 if A and B are lumped, but 2/1 if viewed as separate outcomes.
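The lumping inconsistency described above can be checked with a few lines of arithmetic. This is only a minimal sketch: the function name `odds_for` and the "equal weight per listed outcome" rule are illustrative assumptions standing in for "odds based on pure ignorance", not anything taken from the blog's calculation.

```python
from fractions import Fraction

def odds_for(groups, target):
    """Odds that the truth lies in `target` when every group in `groups`
    is treated as one equally likely outcome (a 'pure ignorance' rule)."""
    weight = Fraction(1, len(groups))
    # Add up the weight of every listed outcome that falls inside the target.
    p = sum(weight for g in groups if set(g) <= set(target))
    return p / (1 - p)

lumped = [("A", "B"), ("C",)]      # A and B described as a single outcome
split = [("A",), ("B",), ("C",)]   # the same hypotheses, listed separately

print(odds_for(lumped, ("A", "B")))  # odds of (A or B) vs. C when lumped: 1
print(odds_for(split, ("A", "B")))   # odds of (A or B) vs. C when split: 2
```

The same question gets two different answers depending only on how the outcomes are labelled, which is the incoherence being pointed out.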

So how to deal with the sense that somehow the odds based on what we know can't be as extreme as what they seem to be? For people calling election results or horse races, they have enough cases to allow calibration. Do the ones they call 2/1 give about 2/1 in results? Etc. For pandemics, we fortunately don't have enough to calibrate.

If I were reading this blog and not writing it, I'd just take it in attenuated form. Once people start seeing things one way, whether by priors or by the first evidence they see, it starts to get hard to see other sides. So I think a reasonable reader who doesn't want to spend a huge amount of time tracking down individual factors would tend to discount the odds beyond the hierarchical and robust discounts I've used. But I can't think of a logically coherent way of saying that the odds aren't what they are because someday we may know them better.

Écorché's avatar

Thanks for the explanation - discounting the odds was my first approach, and when I re-read the document looking specifically for this question I did see that you gestured towards it in the section about the sequence gap. Maybe your pedagogy is working as intended.

I guess what I'm thinking of is to estimate odds for a counterfactual like "nearby ancestral genome including FCS shows up in a wildlife sample around Wuhan" which most people would consider dispositive, and see if the method agrees. This seems like it would be a useful sanity check if it doesn't taint the analysis.

Happily, for a peer-reviewed version you can assume more expert readers.

Michael Weissman's avatar

I did sort of quickly run through a negative control when I saw someone claim that a paper using very different (non-Bayesian) methods might have indicated a possible lab origin of MERS. I think the method here comes down heavily for MERS being natural.

If a close ancestor with an FCS etc had shown up anywhere, even Yunnan, then I wouldn't even bother looking at LL. Given the possible routes to Wuhan, ZL would still be very much in the running.

Mikko Talvensaari's avatar

Re: conditioning on pandemic

I found it hard to accept the general implication of your reasoning, that a pandemic-causing feature must always be discounted as evidence. You seem to be thinking about the likelihoods as P(evidence|H_i, pandemic). But if the posteriors are P(H_i|evidence, pandemic), as they should be, the likelihoods are not conditional on the pandemic. With evidence E and conditioning event C,

P(H₀|E,C) = P(E,C|H₀) P(H₀) / P(E,C).

The likelihood is P(E,C|H₀), not P(E|C,H₀)!

P(E,C|H₁)/P(E,C|H₀) = [P(E|C,H₁)/P(E|C,H₀)] × [P(C|H₁)/P(C|H₀)],

or

P(E,C|H₁)/P(E,C|H₀) = [P(C|E,H₁)/P(C|E,H₀)] × [P(E|H₁)/P(E|H₀)].

The latter form shows that the unconditional likelihoods need to be multiplied with p(pandemic| evidence, H_i), the probability of pandemic given evidence and hypothesis.
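The two factorizations above can be confirmed numerically on a toy joint distribution. This is a sketch under stated assumptions: the probabilities in `joint` are invented purely to exercise the identities and carry no empirical meaning, and the helper names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities P(E=e, C=c | H), keyed by (e, c), 1 = true.
# These numbers are made up for illustration only.
joint = {
    "H1": {(1, 1): F(6, 100), (1, 0): F(4, 100), (0, 1): F(2, 100), (0, 0): F(88, 100)},
    "H0": {(1, 1): F(1, 100), (1, 0): F(1, 100), (0, 1): F(3, 100), (0, 0): F(95, 100)},
}

def p_E(h):            # P(E | H)
    return joint[h][(1, 1)] + joint[h][(1, 0)]

def p_C(h):            # P(C | H)
    return joint[h][(1, 1)] + joint[h][(0, 1)]

def p_E_given_C(h):    # P(E | C, H)
    return joint[h][(1, 1)] / p_C(h)

def p_C_given_E(h):    # P(C | E, H)
    return joint[h][(1, 1)] / p_E(h)

# Joint likelihood ratio, computed directly and via each factorization.
direct = joint["H1"][(1, 1)] / joint["H0"][(1, 1)]
via_E_given_C = (p_E_given_C("H1") / p_E_given_C("H0")) * (p_C("H1") / p_C("H0"))
via_C_given_E = (p_C_given_E("H1") / p_C_given_E("H0")) * (p_E("H1") / p_E("H0"))

print(direct, via_E_given_C, via_C_given_E)            # all three agree
print(p_E("H1") / p_E("H0"))                           # unconditional ratio for E
print(p_E_given_C("H1") / p_E_given_C("H0"))           # ratio for E after conditioning on C
```

In this toy table the ratio for E alone differs from the ratio for E given C, so which term is dropped or kept matters. The joint ratio is the same under either factorization.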

Michael Weissman's avatar

Let's look at a couple of limiting cases. Say that people are frequently picking up SCs but they almost always fizzle because they lack an FCS and no FCS-free one can propagate in people. Then a very rare FCS gets inserted by accident and that one doesn't fizzle. The presence of the FCS in the pandemic SC is possible and required in the natural scenario so it provides no additional evidence on the source. This is basically the story that some zoonosis types tell.

Another limit would be that there are no FCS in natural SCs because there is no natural mechanism by which they can be inserted. Then the presence of the FCS would be strong evidence of a non-natural origin.

The real case seems likely to be closer to the first limit though not actually in that limit since some SCs without FCSs have propagated as respiratory viruses outside of bats, and perhaps in some bats as well. So I think it provides weak evidence against a natural origin. I don't use it out of conservatism, but perhaps should change that to including a factor of 2 or so.
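The two limits above can be put into a toy two-type model. This is a rough sketch, not part of the blog's calculation: the function `p_fcs_given_pandemic`, its parameters, and every number are hypothetical. The in-between values were picked by hand so that the result lands near one half, which would correspond to roughly a factor of 2 if a lab-made pandemic virus were assumed to carry an FCS with probability near 1. They are not estimates.

```python
def p_fcs_given_pandemic(f_fcs, q_with, q_without):
    """P(FCS | pandemic, natural origin) in a toy two-type spillover model.

    f_fcs     : fraction of natural spillover viruses that carry an FCS
    q_with    : chance that a spillover WITH an FCS becomes a pandemic
    q_without : chance that a spillover WITHOUT an FCS becomes a pandemic
    """
    with_fcs = f_fcs * q_with
    without_fcs = (1.0 - f_fcs) * q_without
    return with_fcs / (with_fcs + without_fcs)

# All inputs below are invented for illustration.
cases = {
    # FCS is very rare, but FCS-free viruses always fizzle.
    "first limit": p_fcs_given_pandemic(f_fcs=1e-6, q_with=0.1, q_without=0.0),
    # No natural route to an FCS at all.
    "second limit": p_fcs_given_pandemic(f_fcs=0.0, q_with=0.1, q_without=0.01),
    # FCS rare, FCS-free viruses occasionally propagate.
    "in between": p_fcs_given_pandemic(f_fcs=1e-3, q_with=0.1, q_without=1e-4),
}

for label, p in cases.items():
    print(f"{label}: P(FCS | pandemic, natural) = {p:.3f}")
```

In the first limit the FCS is certain given a natural pandemic, so it carries no information about the source. In the second it is impossible, so seeing it would be decisive.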

Martin's avatar

Sorry, I commented on an earlier version a few moments ago. Let’s try again:

I commented earlier that I think a lower bound on the prior likelihood ratio is missing. I think P0(LL)/P0(ZW)<1/100,000 is unjustifiable. But these absurd cases have a considerable weight in the t-distribution. My calculation is that the odds change from 300/1 to 1200/1 when the lower limit of P0(LL)/P0(ZW) is 1/100,000. (The effect on Gaussian is small; no effect on uniform.) Just out of curiosity: Is there a reason why you decided not to mention this? Here is the R code I used:

# Monte Carlo integration
# prior log likelihood LL over ZW; -4.2 ± 2.3
m <- -4.2              # mean prior distribution
s <- 2.3               # sd prior distribution
v <- s^2               # variance prior distribution
N <- 1e6L              # nb. of draws from prior distribution
df <- 3L               # degrees of freedom student-t distribution
min_L0 <- 1 / 100000   # min prior likelihood factor (LL over ZW) in case of winsorizing
L_obs <- 1300000       # likelihood factor of observations (LL over ZW)

# function to calculate mean odds based on simulated log prior likelihood factors
fct_odds <- function(lnL0, wins_TF, text) {
  # lnL0: vector of randomly drawn log prior likelihood factors
  # wins_TF: TRUE if winsorizing (min_L0 applied), FALSE otherwise
  # text: print which distribution was used (Gaussian, Student-t, or Uniform)
  L0 <- exp(lnL0)                                     # vector of prior likelihood factors
  if (wins_TF) L0 <- ifelse(L0 < min_L0, min_L0, L0)  # winsorizing
  odds <- L_obs * L0                                  # vector of combined likelihood factors (LL over ZW)
  probs <- odds / (odds + 1)                          # vector of probabilities LL
  mean_prob <- mean(probs)                            # mean probability LL
  print(text); print(mean_prob / (1 - mean_prob))
}

# draw from Gaussian
lnL0 <- rnorm(N, mean = m, sd = s)
fct_odds(lnL0, FALSE, "Gaussian: ")
fct_odds(lnL0, TRUE, "Gaussian winsorized: ")

# draw from Student-t (rescaled so its standard deviation equals s)
lnL0 <- rt(N, df = df) * sqrt(v * (df-2)/df) + m
fct_odds(lnL0, FALSE, "Student-t: ")
fct_odds(lnL0, TRUE, "Student-t winsorized: ")

# draw from Uniform (same mean and sd as above)
lnL0 <- runif(N, min = m - s * sqrt(3), max = m + s * sqrt(3))
fct_odds(lnL0, FALSE, "Uniform: ")
fct_odds(lnL0, TRUE, "Uniform winsorized: ")

Michael Weissman's avatar

Thanks for the comment with code.

Your argument makes sense- that most of the posterior ZW odds come from the far tails of the fat prior distribution, and those tails are unrealistic. I'm trying to keep the calculation on the conservative side, remembering that in all sorts of estimates (e.g. of physical constants) error bars turn out to be larger than initially believed. In effect, the broad distribution on priors also serves as a way of allowing for some major screw-up of the likelihoods.

@capitolsheila's avatar

Wow. Quite the piece!

FWIW, only DARPA has denied funding DEFUSE. Never seen a public DOD-wide denial about funding or not funding DEFUSE or something similar. Few more agencies and subs that should/could be asked.

You write: “The site mentioned in DEFUSE for adding an FCS to a coronavirus, UNC, is smaller and uses highly enhanced BSL-3 protocols. After DEFUSE was not funded, switching this part of the work to WIV, where there was already expertise in the methods, would have been easy. …Notes from DEFUSE investigators have recently been released describing plans to actually conduct much of the research described as planned for BSL-3 at UNC instead in Wuhan, where BSL-2 was often used. While the chance of a spillover occurring at UNC isn’t zero, it’s much lower than for WIV. Thus

P(Wuhan|LL, coronavirus with FCS, etc.) = ~1.”

Let’s circle back when UNC and the USG let a few documents and secret chimera sequences out. Not sure how you do the math with the US secrets, but they don’t add up. Not all covid secrets are in Wuhan. Will leave it at that. Thanks for your work here!

Michael Weissman's avatar

Sure, the level of analysis here isn't designed to distinguish between chimeric work done in Wuhan and initial chimeric work done at UNC with the product then shipped to Wuhan for further work. The small difference in P(Wuhan pandemic|LL) between these two versions just isn't important for this type of calculation.

On the other hand, for people more interested in assigning specific institutional responsibility the difference between those accounts does matter. For my purposes the extent of US involvement is relevant more for understanding the muffled response of US intelligence agencies.

GJ Bonte's avatar

https://x.com/john_bumblebee/status/1810335076623712265.

I'm reading the part about the furin cleavage site now. Of course there is the story about the sodium channel ENaC, but there are two other hypotheses. There is the possibility that the furin cleavage site in the Feline Enteric Coronavirus inspired researchers, although I'm not very convinced of this possibility. But it is certain from Baric's testimony for the Subcommittee that it was these viruses that made him think about the function of the FCS. So it's not that strange to think that the Chinese had knowledge about these viruses too.

I think this one is much more convincing. https://zenodo.org/records/10203207. This study describes a Nuclear Localisation Sequence, PRRARSV, that is in SARS-CoV-2. No other coronavirus in the family of betacoronaviruses has this sequence: only a MERS clone after thirty cycles of serial passaging has the same sequence. The study comes from a group of virologists at the University of Iowa, but the first author of the study did his PhD in virology at the Wuhan Institute of Virology from 2005 to 2010.

This would be an explanation for choosing the 'suboptimal' FCS sequence. If they had chosen the optimal sequence RARR, the sequence would not function as an NLS. Maybe this detail could be of help in determining the odds that this virus was genetically engineered.

https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2023.1073789/full. This proves the function of this sequence as being real.

Michael Weissman's avatar

I removed what seemed to have been a duplicate of this Comment.

I agree that the FIPV FCS and the MERS FCS are highly relevant to what WIV and its collaborators might have chosen, especially since zoonati have claimed that the proline would not have been used by a lab. I've been hesitant to add even more details to an over-long blog, but maybe could put something in the relevant Appendix.

GJ Bonte's avatar

That's perfectly fine (about the duplicate). I might have hit the post button twice. It indeed is a very long blog, but very, very useful. I read most of the studies that you referenced, but still there are some things that I didn't notice, after two years of reading and writing. Just wanted to let you know about the MERS-clone, because that's the only possibility that is somewhat watertight, especially about the proline in the lead, of which Baric stated that he never would have chosen this variant.

Jim Haslam's avatar

Vero cells (mentioned 3x) delete the SARS2 furin cleavage site, and Covid has 0 to do with humanized mice (mentioned 5x).

The WIV doesn't use type IIS restriction sites, found in SARS2, but you mentioned over 20x? All of this was covered in Baric's testimony:

https://vitamindstopscovid.info/07-origins/2024-01-22-Baric-testimony-refs.pdf

Michael Weissman's avatar

The FCS slows viral replication in VERO compared to FCS-free virus, but it still replicates well. (See fig 1C of https://pmc.ncbi.nlm.nih.gov/articles/PMC8175039/.) So a few rounds of replication in VERO is not a problem. That's enough to multiply counts by about a million.

I'm not sure what your basis is for claiming that humanized mice are irrelevant, though they may turn out to be.

WIV researchers were involved in several experiments using type IIS restriction enzymes. There is no reason at all to think that they couldn't have used them.

Is it possible that LL occurred in Wuhan with a viral strain originally made by collaborators in the US? Sure, although that's a more complicated picture. At any rate, that distinction is not one relevant to the LL vs. ZW odds.

Michael Weissman's avatar

Thanks for the links to many relevant pieces. The substacks' style is a bit disorganized stream-of-consciousness but your core point does emerge: that some of the best tools for making the SC2 chimera were more available in the US than in China. I'll take a while to sort through and absorb the work more thoroughly. On a few-minute skim I did notice some questionable points.

Your claim that mice don't transmit respiratory diseases looks completely at odds with the extensive literature: https://www.ncbi.nlm.nih.gov/books/NBK235137/.

The several cases you mention in which mouse lab accidents didn't get people sick don't seem, so far as I can tell, to have involved humanized mice. At any rate that's a small N.

You quote Peter Miller on rapid selection for N501Y without noting that he incorrectly implied that was found in humanized mice rather than ordinary mice.

A quick thought- A virus that grows well in some lab animal but doesn't transmit well between them without some external help might be just the thing to lure experimenters into too much confidence that it would transmit poorly in humans.

Anyway, I hope to read more thoroughly over the next few days.

Jim Haslam's avatar

What's disjointed is you linking to a book on mice and claiming it has something to do with Covid. So let's play the Twitter game here, out of sight of others, provide 1 paper showing SARS2 infecting hACE2 mice?

Michael Weissman's avatar

The mouse book simply points out that your claim about transmission based on the "Mice don't sneeze" quote was false.

You're correct that in mentioning HAE cells and humanized mice, I've just followed the convention of the literature. For the purposes of this analysis, any cell culture or in vivo host in which SC2 can be cultured without rapid evolutionary change is equivalent. Your focus is on the US vs. China origin of the virus. That enters into my calculation only at one point, P(Wuhan|LL). If the sequence was studied first in US BSL-4 labs before being sent to Wuhan, then P(Wuhan|LL) is slightly reduced, not enough to show up in calculations of this sort. I have mentioned that possibility since early versions of the blog. So while that question is interesting for some purposes, it's not important for my calculations. Meanwhile, I've bought a Kindle version of your book to try to sort through which of your intriguing claims have some solid evidence.

Jim Haslam's avatar

if you replace hACE2 with Syrian hamsters:

https://pmc.ncbi.nlm.nih.gov/articles/PMC7523093/

a lot more will make sense, since they were models in self-spreading vaccines

https://usrtk.org/covid-19-origins/colorado-state-university-documents-on-bat-pathogen-research/

zach hensel's avatar

Here's a report of a likely COVID-19 patient who likely infected his doctor but was never confirmed as a COVID-19 case: the patient (translated) "went to the South China Seafood Market for the last time on December 3, but he continued to come into contact with live animals from the market. He began to have a fever in early December. After treatment, he was discharged from the hospital on December 30, but he still had intermittent coughing symptoms. He was hospitalized in our department on January 17 because of palpitations and dizziness."

https://finance.sina.cn/2020-01-28/detail-iihnzahk6649439.d.html

Elsewhere the patient's doctor notes that this case wasn't confirmed (atypical symptoms and no nucleic acid test), but it is a likely case because the doctor was infected and this was his only known exposure.

Regarding "traces of A were found only on one glove, with additional mutations indicating that it was not from an early case" (1) of course it is not likely to be from an extremely early case; it's a sample from Jan 1 and the one additional mutation (and one mixed read) is well within expectations from estimated late November spillover of lineage A, (2) there's another sample with a lineage A read (two duplicate reads, actually), (3) other lineage A sequences are geographically associated with Huanan market e.g. the first patient in Sichuan province -- where there isn't market ascertainment bias -- lived about 1 km away from Huanan market.

Another early patient was reportedly handling animal carcasses in the market. Another early patient was in the poultry business at a nearby market. One of the first cases in Huanggang worked in both Huanan market and in a Huanggang market. Two patients at Zhongnan hospital were identified in the first case search independent of later market-link diagnostic criteria; they were linked to Huanan market.

Furthermore, healthcare worker infections spread from Huanan market, this is described in the scientific literature -- https://www.sciencedirect.com/science/article/pii/S0195670121000463 -- and in news reports e.g. https://weekly.caixin.com/m/2020-02-01/101510145_all.html

"'Our Union Hospital is located in the Hankou area, not far from the South China Seafood Market next to the Hankou Railway Station, but the Houhu Campus of Wuhan Central Hospital, Wuhan Red Cross Hospital and Hubei Xinhua Hospital are closer to the seafood market, so they received patients earlier. The first batch of patients came to them for treatment with cold or pneumonia symptoms.' Zhao Lei, chief physician of the infectious department of Wuhan Union Hospital, recalled. From the map, the Houhu Campus of the Central Hospital, Wuhan Red Cross Hospital (hereinafter referred to as the Red Cross Hospital) and Xinhua Hospital form a triangle, with the South China Seafood Market located in the center."

Ascertainment bias doesn't explain this; Huanan market centrality does.

Michael Weissman's avatar

Zach- Thanks for joining in the discussion.

As for the late A fragment found in HSM, the point is that it conveys no information as to where the much earlier initial spillover happened. There were late traces all over the place by then, including another market. That's what CCDC head Gao realized.

I hate to quote a paper that Ridley co-authored (with van der Merwe) but this is a pretty compact account. "Liu and colleagues showed that virus RNA was found at just one wildlife stall out of nine, compared with two out of eight vegetable stalls, five out of 36 livestock stalls, six out of 56 seafood stalls, eight out of 37 poultry stalls, 13 out of 73 aquatic product stalls and 16 out of 87 cold-chain product stalls: so the virus was [?] least associated with wildlife stalls and no wildlife vendor tested positive[18]."

[18] Liu, W.J., Liu, P., Lei, W. et al. 2023. Surveillance of SARS-CoV-2 at the Huanan Seafood Market. Nature https://doi.org/10.1038/s41586-023-06043-2.

This does not exclude the possibility that somehow the virus got into humans via HSM while leaving no statistical evidence in the nucleic acid swabs, the case histories, the sequenced cases, or the wildlife carcasses. It's just atypical and thus improbable.

zach hensel's avatar

First, Ridley opens the introduction to his article saying: " I teamed up with the molecular biologist Alina Chan to write Viral, our book about the search for evidence on both sides of that question. I remained unsure what happened at that stage. Then in the autumn of 2021 more startling evidence emerged to support the lab leak. I now think that is by far the most likely explanation."

That's a lie. The new evidence was DEFUSE and DEFUSE was described perfectly accurately in the first edition of Viral.

So, unsurprisingly, Ridley is misleading you with this very common lab leak meme.

First, you can check out the confidence intervals on those numbers here: https://www.nature.com/articles/s41586-023-06043-2/figures/6

Second, there were two other wildlife stalls that aren't recognized by Liu et al as wildlife stalls with positive samples. Stall 8-25 and the stairs adjacent to the wildlife stall below the mahjong room and next to the toilets that DRASTIC members don't like to talk about.

Third, the statistical association claims suffer from the same error as Bloom's paper -- an error Bloom presumably agrees with since he responded to criticism of it by changing the subject to something else entirely. The earliest sampling was premised on proximity to very recent suspected COVID-19 cases. Animals were gone for at least a couple days at this point. Human cases were rising up until the day the market was closed.

Your skepticism about statistical ineptitude and sampling bias seems to only be applied in one direction.

The 1 in 30 positive samples in other markets was collected 5 or so doubling times later. There's no need to point to that to show SARS-CoV-2 spreading in markets, because there are earlier COVID-19 cases associated with other Wuhan markets. One for someone who worked in the poultry trade and one for the person in Huanggang I mentioned who purchased at Huanan market to sell in Huanggang. Suffice to say, spread through the network of markets early is unsurprising.

What's surprising is that it took so long (until early January) to identify healthcare-acquired infections. This isn't what you'd expect if SARS-CoV-2 were everywhere and just happened to be also in Huanan market. You list a couple irrelevant examples of later outbreaks in the food industry -- these are irrelevant because they come after the rest of the economy dramatically reduced its density; something the food industry can't really do while keeping food affordable -- check out how uncommon that is compared to healthcare outbreaks for first introductions of SARS-CoV-2 around the world.

Unfortunately the experiment of what it looks like when SARS-CoV-2 is introduced by a single human to a city that wasn't really prepared was run many times around the world. As far as I know, Wuhan is unique in having those outbreaks centered on the largest live animal market in town, having the first cluster associated with such a market, having the first healthcare outbreaks be in hospitals closest to such a market, etc. Yet there are live animal markets all over the world.

Edit: I figured you'd be more interested in finding the missing link with that chef in the wildlife industry... changing the subject to demanding "statistical evidence" in measurements where you wouldn't expect to find it is disappointing (fraction of wildlife stalls? spillover likely didn't happen from every stall; no reason to expect every raccoon dog, for example, to be infected when every human in the market wasn't infected... very different from the market sampled in Shenzhen in 2003 in ways that make sense given differences between the markets and between the viruses).

Michael Weissman's avatar

Those confidence intervals are not on the raw numbers they gave, which are just raw observations. They're on inferred percents in a hypothetical larger sample, based on the small size of the actual sample. So your description is wrong.

Were those other positive samples reported in some publication? Are the stalls live-wildlife stalls that were not reported in Worobey or Liu? Sounds strange though not impossible.

Bloom showed clearly that of the 5 actual animal coronaviruses detected, the 4 that were present in fairly large amounts all were associated with the known host species. SC2 was not associated positively with any of the proposed hosts.

As of this date, none of those proposed hosts have even been shown to be capable of sustainably propagating the original SC2. Raccoon dogs could barely propagate the more transmissible D614G variant by direct physical contact, not via aerosol to others in nearby cages.

You argue that once SC2 was known to be around, markets would represent a larger fraction of subsequent outbreaks than earlier because activities were more limited. Maybe, although in the first outbreak, before H2H transmission was known, there was a stronger bias toward focussing detection on a wet market based on the SC1 experience than there was later. Either way, we're talking about a smallish adjustment to P(HSM|other spillover location), nothing close to the Bayes factor of ~5000 that e.g. Scott Alexander and others obtain from the HSM location data. Without those huge factors, even their net odds end up strongly favoring LL.

Michael Weissman's avatar

On where outbreaks occur

"Most SARS or MERS super-spreaders were very symptomatic, the super-spreading occurred in hospital settings and frequently the individual died. In contrast, COVID-19 super-spreaders often had very mild disease and most COVID-19 super-spreading happened in community settings."

https://www.sciencedirect.com/science/article/pii/S1047279723000583

zach hensel's avatar

Why'd you skip over Ridley lying about the relative timing of Viral and DEFUSE, by the way? Kinda relevant he's a liar if you are citing him as an authority!

Michael Weissman's avatar

Ridley and van der Merwe had a convenient summary of the Liu positivity results. If you reread my analysis, Ridley is never cited for any of the factors. I think in this case he's usually been right, but given his record on climate I wouldn't count on that.

zach hensel's avatar

What's wrong? Those are confidence intervals relevant, for example, to the wildlife stall that they knew about and did not sample.

Yes, the other positive samples are reported in Liu and also in our paper on market sampling.

You should take a closer look at Bloom (2024) before assuming that it says what people say that it does. What you're talking about is rather flawed -- e.g. the canine coronaviruses are probably enteric viruses that result in horrible diarrhea for dogs, raccoon dogs, and foxes -- and the correlation comes from a single sample from a machine used to remove their fur. There's no way to tell if the viruses infected a dog or raccoon dog or both in that stall (probably not a fox since it wasn't detected there). The two canine coronavirus isolates in Bloom's Fig 3 are from raccoon dogs, but the virus (or viruses) is also similar to viruses from dog samples.

The rabbit coronavirus is correlated with rabbit, marmot, porcupine, dog (more than rabbit), bamboo rat.

That (also enteric) bamboo rat coronavirus is also correlated for a few other species.

But, anyway, the point was that this whole discussion in Bloom (2024) was a non sequitur to avoid discussing the problem of Bloom (2023) that we discussed in our paper: the nonsensical correlations in Liu et al (2022 preprint) and Bloom (2023) made perfect sense because of the sampling strategy. All that's left that's significant was correlation between SARS-CoV-2 and whether or not a sample was collected in one stall on the day most wildlife stall sampling was done. The animal species disproportionately found in that stall have significant correlations; the animal species that weren't (raccoon dog, human, etc) don't have them.

So it's either a person working in the wildlife trade with an unascertained infection or an animal.

The D614G thing is silly. Raccoon dogs can get infected and transmit; there's no evidence this is possible for SARS, either, but it happened. SARS-CoV-2 without D614G lasted long into 2021 and was responsible for the earliest outbreaks in the USA, for example.

My argument on food-industry outbreaks isn't a "maybe" -- it's a "definitely" -- just read the news from the first half of 2020 from most of the world.

Go look for pre-pandemic photos and video of Huanan market. It's just not a crowded place. It looks significantly less crowded than I'd guess the median workplace/school is in Wuhan, and that's a biased selection of videos taken when tourists were around to take videos and it was probably more crowded than normal!

Michael Weissman's avatar

I have a call coming in and for now will respond only to one point, more later. You wrote that those case numbers were "misleading" because of the "confidence intervals". That was BS. The case numbers were simply the reported numbers. Any statistically competent person understands that small samples have statistical fluctuations.
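
To make the small-sample point concrete, here is a minimal sketch of a Wilson score interval for a proportion. It is only an illustration: the counts passed in below are hypothetical (1 of 30 simply echoes the "1 in 30" figure mentioned earlier in this thread, and 10 of 300 is made up), and the function name `wilson_interval` is mine, not anything from Liu et al. or Worobey et al. The raw count is what was observed; the interval describes the underlying rate that count is compatible with.

```python
import math

def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    positives: number of positive samples observed (hypothetical input)
    n:         total number of samples (hypothetical input)
    z:         normal quantile; 1.96 corresponds to roughly 95% coverage
    Returns (low, high) bounds on the underlying proportion.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = positives / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Same observed fraction, very different sample sizes (both illustrative).
for positives, n in [(1, 30), (10, 300)]:
    low, high = wilson_interval(positives, n)
    print(f"{positives}/{n}: observed {positives / n:.3f}, "
          f"interval {low:.3f} to {high:.3f}")
```

The observed fraction is the same in both calls, but the interval for the smaller sample is much wider, which is all that "small samples have statistical fluctuations" means here.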

zach hensel's avatar

Yes I'm statistically competent enough to know it's an insignificant statistic and logically competent enough to know that it's unreasonable to expect a greater association with wildlife stalls than other types of stalls given the nature of the wildlife trade and the viruses. It's reasonable to expect a strong association with a stall or two that had potential intermediate species if those stalls are sampled in time, though.

"No wildlife vendor tested positive" is pretty funny given the several positive samples at a couple wildlife vendors. Of course there were positive wildlife vendors and/or animals.

GJ Bonte's avatar

Can't share the 'summary' of your blog. It's too long... :)

GJ Bonte's avatar

"These convey information complementary to the centroids because they emphasize the most clustered points rather than the more distance ones. In their supplementary material there is also a KDE map for the linked cases."

Couldn't find in the supplement...

Michael Weissman's avatar

Aha. I didn't think that Walker had to prepare the KDE map. It does turn out Worobey et al. made it, included it on github (https://github.com/sars-cov-2-origins/huanan-market/blob/main/maps/geojson/who_cases_dec-2019.linked.KDE.contours.geojson) but left it out of the supplementary materials.

GJ Bonte's avatar

How curious... the supplement is very, very long, with many illustrations of the process of localising the early cases. But just this one figure, the one figure that could cast doubt on their hypothesis, is missing. I am certainly willing to look objectively at any study - although there is particularly little left of the study by Worobey et al after the extensive criticisms of several scientists - but this is quite extraordinary.

It is a shame, though, that this constantly updated Bayesian analysis has not received more attention, except from insiders. I understand that it is a difficult piece for people who do not know Bayes' Theorem. It was not an easy piece for me either, and I only now really understand where its principle is, whereas years ago I thought I understood it. And that too only because I got help with it from a professor of probability calculus. And yet this should be published in a mainstream or popular science journal somewhere.

A final question is whether the scenario of Line 2 by Steven Quay would add anything. This is because it can actually explain a lot of aspects very well, without getting into all sorts of twists and turns, especially including the early spread outside China of early variants of the virus. The only problem I have is that even this scenario need not directly point to a lab escape. A zoonotic disease could also spread in this way. The only thing then is that Line 2 also has the WIV and a military hospital, where apparently the first four samples were collected.

https://www.academia.edu/82244361/Where_Did_the_2019_Coronavirus_Pandemic_Begin_and_How_Did_it_Spread_The_Peoples_Liberation_Army_Hospital_in_Wuhan_China_and_Line_2_of_the_Wuhan_Metro_System_Are_Compelling_Answers

Michael Weissman's avatar

On publication, I checked informally with Ann of Appl Stats about preparing a shorter, more formal version. An editor said, accurately, that it wasn't quite suitable because they like to publish innovative techniques and mine were conventional. If somebody comes up with a likely journal, I may rouse enough energy (or find an energetic coauthor) to make that tighter version. It would remove the introductory apologetics, the conventional Bayes intro, the remarks on the evolution of the versions, ... It would also convert the refs to standard form rather than just links.

I've never been much for writing long papers, and dealing with some health issues reinforces that tendency.

GJ Bonte's avatar

Well, so much for science. So they weren't interested in a publication with a statistical technique that has proven its value. Curious.

I'm going to ask around a bit. But I'm certainly no scientist, so I'm afraid you would know better than me where this could be published.

Michael Weissman's avatar

I think AoAS was sincere. Their theme is innovative techniques, which I don't have.

Michael Weissman's avatar

Thanks for that link. First impressions: The 4 PLA-hospital patients with records starting on Dec. 10, 2019 are really interesting. On first reading, I'm dubious about the statistics on the Line-2 associated cases. Rejecting a toy null is a bad procedure regardless of who does it. Also, the connection of the international cases with Line 2 is kind of silly. You'd get a similar international pattern regardless of how the infected people got to the airport.

GJ Bonte's avatar

That is exactly my doubt. It is a very attractive hypothesis to explain how the virus spread across Wuhan - see also the New York analysis - but it is no more than that. A zoonotic disease would spread in exactly the same way, although the fact that the PLA hospital and the WIV are on the Line 2 route would increase the likelihood of it being a lab leak. On the other hand, it does not argue against the market hypothesis, because a market outbreak too could spread through Line 2. So I'm not sure how to assess this piece: probably only as a possible hypothesis.

The link to the airport is in the fact that Line 2 is the only subway line that ends there. But that indeed does not rule out the possibility that the virus could have gotten there by other means. It is just a very fast and direct route, from a situation that is very ideal to spread a virus: the subway.

Michael Weissman's avatar

Yipes. I can't either. The points are there but not the KDE map. Dan Walker must have prepared that from the points. I'll revise right away.

GJ Bonte's avatar

Thanks. Surely this is very special. Omitting the KDE from the cases that have a direct relationship with the Huanan market. Not including in the actual article nor in the supplement. Just don't bring it up and don't mention it. Surely this suggests that the authors had a particular goal with the publication and deliberately just ignored the details that did not fit with that. Because not only is the KDE quite different from that of all cases and those with no relation to the market, but it also does not center on the market. I cannot escape the impression that the authors did not have pure intentions here.

Michael Weissman's avatar

If you look closely at the KDEs in their paper, you see the "all" noticeably more spread out than the "unlinked", but the effect isn't all that striking because there are only a few "linked" included in "all". I made a .ppt for a talk with those maps superimposed to make it clear, but then found Walker's version that separates out the "linked" rather than hiding them in the "all".

GJ Bonte's avatar

"Wu’s data also show that over time SC2 variants drifted away from the initial pattern by picking up extra sites, as one would expect if the initial form was not a product of random evolution. Unlike other sequence features, there’s no non-random fitness associated with this pattern under ZW."

This seems like a very good argument by Wu, but of course it is not. The "drift" of a reverse-genetic system would occur with every possible clone if it were so massively distributed around the world. Because even if it were a reverse-genetic system, it is still an RNA virus, which mutates regularly. That's really a very different situation than using a reverse-genetics system in the laboratory. Even when creating a clone, the genome sequence is checked afterwards to make sure no significant mutations occurred during cloning. And Wu himself has been trained in the WIV, and knows full well that this is so. But this is really a complete fallacy.

Michael Weissman's avatar

I'm not sure I follow your point. What I'm saying (contra Wu) is that the drift toward a more typical RE site pattern is just what you'd expect if the initial pattern had been selected. Since there's no non-random natural selection for the pattern, that supports the existence of a lab selection. I can't really reproduce whatever Wu's reasoning was supposed to be. I think you agree but am not sure if you're adding another point.

GJ Bonte's avatar

Now I understand. Thanks.

GJ Bonte's avatar

A few other remarks: there is even stronger evidence, whereby the genome was not only found by PCR, but even with sequencing.

Amendola A, Bianchi S, Gori M, et al. Evidence of SARS-CoV-2 RNA in an Oropharyngeal Swab Specimen, Milan, Italy, Early December 2019. Emerg Infect Dis. 2021;27(2):648-650. doi:10.3201/eid2702.204632.

This one is about a four-year-old child who developed a cough and a runny nose on 21 November 2019, also in Italy. PCR and sequencing confirmed SARS-CoV-2.

Amendola A, Canuti M, Bianchi S, et al. Molecular evidence for SARS-CoV-2 in samples collected from patients with morbilliform eruptions since late 2019 in Lombardy, northern Italy. Environ Res. 2022;215(Pt 1):113979. doi:10.1016/j.envres.2022.113979.

The second one is a study from a reference laboratory for measles and rubella. The possible rash of Covid-19 looks very much like the rash of these infections. The first positive sample is from a four-month-old baby, who later tested positive for IgG and IgM. These could have been false positives, but the complete picture tells us that this is very probably a real case.

About the sequences that were gathered after 2016, this study from 2022 tells us more about this.

Wu Z, Han Y, Wang Y, et al. A comprehensive survey of bat sarbecoviruses across China in relation to the origins of SARS-CoV and SARS-CoV-2. Natl Sci Rev. 2022;10(6):nwac213. Published 2022 Oct 11. doi:10.1093/nsr/nwac213

But the problem is whether this study can be trusted. Holmes as well as the English virologist Alice Hughes - who did a lot of fieldwork in China - say it is highly unlikely that they didn't find any other SARS-CoV-2 related coronavirus, not even in the copper mine in Mojiang. And we know that the Chinese from early 2020 strongly censored all scientific publications on SARS-CoV-2 and Covid-19.

"Although the possibility of artifacts cannot be excluded, if La Rosa et al’s found wastewater evidence that SC2 started to show up substantially in northern Italy by December 18th 2019 and Fongaro et al. found similar evidence for November 27 in Brazil. If either of these seemingly careful reports is correct the spillover would have had to have been more in the range estimated by most of the papers rather than the late estimate of the 2022 Pekar paper'

Michael Weissman's avatar

Useful. Among other things, it implies I'd better go back over my grammar.

GJ Bonte's avatar

Forget my last remark. Now I see how you got to the number of 19. These are all zoonotic diseases, although some of them are well known for ages. I'm not sure how this would impact your calculations, but I think it lowers the odds for the ZW, because the period is very long.

GJ Bonte's avatar

Another remark:

I checked the reference and the table. It indeed lists 19 pathogens, but they are not all new emerging diseases. The text remarks that roughly half of them are. The other ones are pathogens that we already knew, and for which it is not clear that they are new emerging pathogens. For instance, tuberculosis has been known for ages. That would change the odds, wouldn't it?

"Since then, the number of notifiable diseases has increased from 15 to 41 (in 2011), with emerging zoonoses accounting for over 50% (Table 1 )"

"Before looking numerically at the LL probabilities, let’s look at the competing ZW background. A tabulation from 2014 of important pathogens emerging in China since the 1950’s lists 19 different ones, including one sarbecovirus—SC1, the original SARS. From that one can roughly estimate that the probability of a significant new pathogen in any year, e.g. 2019, would be

P_0(2019, ZW) ≈ 1/3."

Michael Weissman's avatar

Yes. I bent over backwards to give a big ZW prior, part of being conservative. I should have been more explicit about that point.

But in the end it doesn't matter because the one SARS becomes a higher fraction. It basically comes down to one SARS coronavirus in several decades of comparable risk.

I'll try to patch up that explanation later today or tomorrow. But any effect on the priors will be cancelled by P(sarbecovirus|ZW) giving the same net odds.

I'd just been lazy about writing that up and appreciate your careful reading!
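
A rough sketch of why the cancellation works, in case it helps. The numbers are assumptions for illustration only: I take the tabulation to span about 60 years (the 1950s to 2014), so that 19 pathogens gives roughly the 1/3 per year quoted above, and I take "roughly half" of the 19 to mean about 10 truly new pathogens. The function name `zw_sarbecovirus_rate` is just a label for this example.

```python
from fractions import Fraction

YEARS = 60      # assumed span of the tabulation (1950s to 2014); not an exact figure
N_SARBECO = 1   # one sarbecovirus (SC1) in the list, as stated in the text

def zw_sarbecovirus_rate(n_pathogens, years=YEARS, n_sarbeco=N_SARBECO):
    """Return (prior, conditional, joint) for a given count of emerging pathogens.

    prior:       P0(year, ZW), chance of any significant new pathogen in a year
    conditional: P(sarbecovirus | ZW), fraction of those that are sarbecoviruses
    joint:       their product, the yearly chance of a new zoonotic sarbecovirus
    """
    prior = Fraction(n_pathogens, years)
    conditional = Fraction(n_sarbeco, n_pathogens)
    return prior, conditional, prior * conditional

# 19 = every pathogen in the table; 10 = only the truly new ones (assumed count).
for n in (19, 10):
    prior, conditional, joint = zw_sarbecovirus_rate(n)
    print(f"n={n}: prior={prior}, P(sarbecovirus|ZW)={conditional}, joint={joint}")
```

Cutting the list from 19 to about 10 lowers the prior but raises P(sarbecovirus|ZW) by the same factor, so the product stays at one sarbecovirus per span of years either way.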

GJ Bonte's avatar

Thanks! When I'm done with the section from the book, could I post it here, so you can see whether it is an adequate sort of abstract?

Michael Weissman's avatar

sure

GJ Bonte's avatar

Sorry! Already found the other ones in your piece!

GJ Bonte's avatar

"There gave been about a dozen more...'

Tiny error I saw when reading again. I thought that in the earlier versions there was a literature list with the other Bayes analyses that were done, or am I mistaken?

Michael Weissman's avatar

I go over them in Appendix 1, with links. As more came in it got a bit too detailed to summarize more in the main text.

GJ Bonte's avatar

Thanks... As your analysis is the most comprehensive and the most recently updated, I'm going to use it for the final part of my book - a form of final conclusion, as it were, on how to integrate all the evidence and all the events. That will still require some study, I'm afraid, but I think it will be a nice conclusion.
