An Inconvenient Probability V4.5
Robust Bayesian analysis of Covid origins, now using better corrections for uncertainties
The latest version, with major updates, is here.
[This version is substantially changed from V3.2 by partially replacing its crude robust Bayesian method with a method that reduces potential paradoxical effects of including weakly informative factors. The only sentence in italic bold face remains: The method is explicitly ready for correction based on improved reasoning or new evidence.]
1/13/2023: Big edits are now patched in, correcting two embarrassing errors and making one of three anticipated tweaks. I’ll try to make a new version that incorporates the changes more coherently and efficiently.
Errors fixed:
1. In using the timing likelihood factor for DEFUSE vs. zoonosis, I forgot that the prior of a specific DEFUSE-like leak is only a fraction of the prior for a general lab leak when looking over a longer period. This should wipe out most of that timing factor. Dumb!
2. In giving the prior for zoonosis, I lumped together all novel pathogens, forgetting to restrict to sarbecoviruses. Dumb again! That will give a factor more or less cancelling the change from (1).
Tweaks that had been announced: The non-existence of FCS’s outside bats gives a reduced likelihood for a zoonotic FCS in humans even though the absence within bats doesn’t.
To be done:
Wuhan’s especially low fraction of the mammalian wildlife trade, much lower than the population fraction, will lower the likelihood of the market-based version of the zoonotic story.
Another upcoming tweak: Some phylogenetic evidence that SARS CoV-1 may have come straight from bats suggests tweaking the no-intermediate-host factor upward for the market version but reducing its prior downward. More importantly, it suggests more attention should be paid to the direct-from-bats routes, both research-related and non-research-related.
These issues are a reminder that the optimum number of authors for a complicated paper is greater than one, although less than 29.
Introduction
Early on in the Covid pandemic, I took a preliminary look at the relative probabilities that SARS-CoV-2 (SC2) came from some sort of lab leak vs. more traditional direct zoonotic paths. Although zoonotic origins have been more common historically, the start in Wuhan where the suspect lab work was concentrated left the two possibilities with comparable probabilities. That result seemed convenient because it left strong motivation both to increase surveillance against future zoonosis and to stringently regulate dangerous lab work. Since then there have been major changes in circumstances that change both the balance of the evidence and the balance of the consequences of looking at the evidence. I think these warrant taking another look.
The origins discourse has become increasingly polarized with different opinions often tied to a package of other views. To avoid having some readers turn away out of aversion to some views that have become associated with the suspicion of a lab leak, it may help to first clarify why I think the question is important and to name some of the claims that I’m not making before plunging into the more specific analysis of probabilities.
I’m now more concerned about dangerous work being accelerated rather than useful work being over-regulated. This is not a specifically Chinese problem. Work with dangerous new viruses is planned or underway in Madison and the Netherlands, with some questionable work in Boston. I’m not suggesting that the US or other western governments should officially say that they think SC2 started from a Wuhan lab. That would make it harder to work with China on the crucial issues of global warming and even on international pathogen safety regulation itself. (I just signed a letter “urging renewal of US-China Protocol on Scientific and Technological Cooperation”.) I’m definitely not endorsing the cruel opposition to strenuous public health measures that seems to have become associated with skepticism about the zoonotic account.
In what follows I will try to objectively calculate the odds that SC2 came from a lab, in the hope that will be useful to anyone thinking about future research policy. The underlying motivation for this effort has been eloquently described by David Relman, who has also provided a nice non-quantitative outline of the general types of evidence supporting different SC2 origins hypotheses.
Looking forward, what we really care about is estimating risks. We already know from experience that the risks of zoonotic pandemics are significant. Are the risks of some types of lab work comparably significant? We shall use some prior estimates of those risks in passing on the way to answering a more concrete question– does it look like SC2 came from a lab? If the answer is “probably not”, then it’s at least possible that the estimates of significant lab risk may have been overstated, although the evidence for that would be skimpy. If the answer is “probably yes” that indicates that the prior estimates of significant risk should not have been ignored.
The method used may also help readers to evaluate other important issues without relying too much on group loyalties. The method is explicitly ready for correction based on improved reasoning or new evidence. Our method will be robust Bayesian analysis, a systematic way of updating beliefs without putting too much weight on either one’s prior beliefs or the new evidence. Bayesian analysis does not insist that any single piece of evidence be “dispositive” or fit into any rigid qualitative verbal category. Some hypotheses can start with a big subjective head-start, but none are granted categorical qualitative superiority as “null hypotheses” and none have to carry a qualitatively distinct “burden of proof”. Each piece of evidence gets some quantitative weight based on its consistency with competing hypotheses.
In practice there are subjective judgments not only about the prior probabilities of different hypotheses but also about the proper weights to place on different pieces of evidence. I will use hierarchical Bayes techniques to take into account the uncertainty in the impacts of different pieces of evidence and “Robust Bayesian analysis” to allow for the uncertainty in the priors.
I am aware of eight more-or-less-published Bayesian analyses of SC2 origins, other than my own very preliminary inconclusive one. Five of the six that attempt to be comprehensive come to conclusions similar to the one I shall reach here. On 1/2/2024, however, I became aware of a Bayesian spreadsheet with brief verbal comments that concluded at that time that the odds were about 3/1 favoring zoonotic origins. After I pointed out a logical error and an omission, it was modified but still somewhat favors zoonosis. All are discussed in Appendix 1.
I will focus on comparing the probability that SC2 originated in wildlife vs. the probability that it originated in work similar to that described in the 2018 DEFUSE grant proposal submitted to the US Defense Advanced Research Project Agency from institutions that included the University of North Carolina (UNC), the National University of Singapore, and the EcoHealth Alliance as well as the Wuhan Institute of Virology (WIV). (For brevity I’ll just refer to this proposal as DEFUSE.) Although DEFUSE was not funded by DARPA, anyone who has run a grant-supported research lab knows that work on yet-to-be-funded projects routinely continues except when it requires major new expenses such as purchasing large equipment items.
I will not discuss any claims about bioweapons research. It is not exactly likely that a secret military project would request funding from DARPA for work shared between UNC and WIV.
My analysis will not make use of the rumors of roadblocks around WIV, cell-phone use gaps, sick WIV researchers, disappearances of researchers, etc. That sort of evidence might someday be important but at this point I can’t sort it out from the haze of politically motivated reports. Mumbled inconclusive evidence-free executive summaries from various agencies are even less useful. I will discuss in passing two very recent U.S. government funding decisions that could potentially provide weak evidence concerning the actual probabilities as seen by those with inside information. The biological and geographic data are much more suited to reliable analysis.
The main technical portions will be unpleasantly long-winded since for a highly contentious question it’s necessary to supply supporting arguments. Although parts may look abstract to non-mathematical readers, all the arguments will be accessible and transparent, in contrast to the opaque complex modeling used in some well-known papers. For the key scientific points I will provide standard references. At some points I bolster some arguments with vivid quotes from key advocates of the zoonotic hypothesis, providing convenient links to secondary sources. The quotes may also be obtained from searchable .pdf’s of slack and email correspondence.
The outline is to
1. Give a short non-technical preview.
2. Introduce the robust Bayesian method of estimating probabilities, along with some notation.
3. Discuss a reasonable rough consensus starting point for the estimation, i.e. the “priors” with an update based on the pandemic starting in Wuhan.
4. Discuss whether the main papers that have claimed to demonstrate a zoonotic origin via the wildlife trade should lead us to update our odds estimate.
5. Update the odds estimate using a variety of other evidence.
6. Present brief thoughts about implications for future actions.
Preview
I will denote three general competitive hypotheses:
ZW: zoonotic source transmitted via wildlife to people, suspected via a wet-market.
ZL: zoonotic source transmitted to people via lab activities sampling, transporting or otherwise handling viruses.
LL: a laboratory-modified source, leaked in some lab mishap.
The viral signatures of ZW and ZL would be similar, so the ratio of their probabilities would be estimated from knowledge of intermediate wildlife hosts, of the lab practices in handling viral samples, and detailed locations of initial cases. Demaneuf and De Maistre wrote up a Bayesian discussion of that issue in 2020, before the DEFUSE proposal for modifying coronaviruses was publicly known. They concluded that the probability of ZW, i.e. P(ZW), and the probability of ZL, i.e. P(ZL), were about equal.
Much of their analysis, particularly of prior probabilities, is close to the arguments I use here, but written more gracefully and with more thorough documentation. They use a different way of accounting for uncertainties than I do, but unlike some other estimates their method is transparent and rational. Nevertheless, here I’ll focus on comparing the probability P(ZW) to that of the LL lab account, P(LL), because sequence data point to a lab involvement in generating the viral sequence, so that P(ZL) will itself be somewhat smaller than P(LL). (I’ve added Appendix 5 to discuss the ZL probability.)
Ratios of probabilities such as P(LL)/P(ZW) are called odds. It’s easier to think in terms of odds for most of the argument because the rule for updating odds to take into account new evidence is a bit simpler than the rule for updating probabilities.
I’ll start with odds that heavily favor ZW, historically the common origin of most new epidemics. Then I’ll update using several important facts. The most immediately obvious are the location and timing of the initial outbreak. Wuhan is the location of a major research lab that had not long before the outbreak submitted a grant proposal that included modifying bat coronaviruses in the way later found in SC2. That location and timing could also have occurred by accidental coincidences for ZW, but we shall see that it’s not hard to approximately convert the coincidences to factors objectively increasing the odds of LL. Here’s a beginning non-technical explanation of how the odds get updated.
I’ll start with a consensus view, that the prior guess would be that P(LL) is much less than P(ZW). That corresponds to the standard idea that you would call ZW the null hypothesis, i.e. the boring first guess. Rather than treat the null as qualitatively sacred I’ll just leave it as initially quantitatively more probable by a crudely estimated factor.
Now we get to the simple part that has often been either dismissed or over-emphasized. Both P(ZW) and P(LL) come from sums of tiny probabilities for each individual person. P(LL) comes mostly from a sum over individuals in Wuhan. P(ZW) comes from a sum over a much larger set of individuals spread over China and southeast Asia. Since we know with confidence that this pandemic started in Wuhan, restricting the sum of individual probabilities to people around Wuhan doesn’t reduce the chances for LL much but eliminates most of the contributions to the chances for ZW. Wuhan has less than 1% of China’s population, so ~99% of the paths to ZW are crossed off. That means we need to increase whatever P(LL)/P(ZW) odds we started with by about a factor of 100.
Further updates following the same logic come from other data. The timing, about 1.5 years after the DEFUSE proposal was submitted, lands in the narrow window in which the lab work could have started the outbreak. A similar natural outbreak could have happened at any time over several decades. As with location, that timing could be a coincidence, but as with location it will shift the odds toward the hypothesis for which it’s less of an unpredictable coincidence. Another update will come from a special genetic sequence that codes for the furin cleavage site (FCS) where the UNC-WIV-EHA DEFUSE proposal suggested adding a tiny piece of protein sequence to a natural coronavirus sequence. The tiny extra part of SC2’s spike protein, the FCS that is absent in its wild relatives, has nucleotide coding that is rare for related natural viruses but is fairly typical for the most relevant known designed sequences, the mRNA vaccines. We can again make an approximate numerical estimate of how much more coincidental the coding seems for a natural origin than for a lab origin.
Even if we start with a generously high but plausible preference for ZW, once the evidence-based updates are done we’ll have P(LL) much larger than P(ZW). P(ZW) will shrink to about 1%, and is saved from shrinking much further only by allowance for uncertainties.
This openly crude and approximate form of argument may alarm readers who are not accustomed to the Fermi-style calculations routinely used by physicists. In this sort of calculation one doesn’t worry much about minor distinctions between similar factors, e.g. 8 and 12, because the arguments are not generally that precise. Sometimes the large uncertainties in such a calculation render the conclusion useless, but this turns out not to be one of those cases.
Methods
The standard logical procedure to calculate the odds, P(LL)/P(ZW), is to combine some rough prior sense of the odds with judgments of how consistent new pieces of evidence are with the LL and ZW hypotheses. Bayes’ Theorem provides the rule for how to do this. (See e.g. this introduction.)
One starts with some roughly estimated odds based on prior knowledge:
P0(LL)/P0(ZW). Then one updates the odds based on new observations. The probabilities that you would see those observations if the hypothesis (LL or ZW) were true are denoted P(observations|LL) and P(observations|ZW), called the “likelihoods” of LL and ZW. Assuming these likelihoods are themselves known, Bayes’ Theorem tells us the new “posterior” odds are
P(LL)/P(ZW) = (P0(LL)/P0(ZW))*(P(observations|LL)/P(observations|ZW)).
In practice, it’s hard to reason about all the observations lumped together, so we break them up into more or less independent pieces and do the odds update using the product of the likelihood ratios for those pieces.
P(LL)/P(ZW) = (P0(LL)/P0(ZW))*(P(obs1|LL)/P(obs1|ZW))*(P(obs2|LL)/P(obs2|ZW))*…*(P(obsn|LL)/P(obsn|ZW))
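For readers who like to see the bookkeeping made explicit, here is a minimal sketch in Python of this update rule. The numbers in it are placeholders, not the estimates derived below.

```python
def update_odds(prior_odds, likelihood_ratios):
    """Bayes' rule in odds form: posterior odds = prior odds times the
    product of the likelihood ratios P(obs_i|LL)/P(obs_i|ZW)."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

# Placeholder numbers only: prior odds of 0.01 and two pieces of
# evidence with likelihood ratios 100 and 20 give posterior odds of 20.
print(update_odds(0.01, [100, 20]))  # -> 20.0
```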
At this point it’s necessary to recognize that not only the prior odds
P0(LL)/P0(ZW) but also the likelihoods involve some subjective estimates. In order to obtain a convincing answer we need to include some range of plausible values for each likelihood ratio. As we shall see, inclusion of the uncertainties is important because realistic recognition of the uncertainties will tend to pull the final odds back from an extreme value towards one.
Once our odds become products of factors of which more than one have some range of possible values, our expected value for the product is no longer equal to the product of the expected values. Since the expected value of a sum is just the sum of the expected values it’s convenient to convert the product to a sum by taking the logarithms of all the factors.
ln(P(LL)/P(ZW)) = ln(P0(LL)/P0(ZW)) + ln(P(obs1|LL)/P(obs1|ZW)) + … + ln(P(obsn|LL)/P(obsn|ZW)) = logit0 + logit1 + … + logitn
where “logit” is used for brevity.
At each stage I will include a crude estimate Li of the log of the likelihood ratio and an estimate of its uncertainty expressed as a standard error, si. Factors with large uncertainty contribute less information than do ones with similar Li but smaller uncertainty, so that in calculating the net likelihood ratio the terms with large si need to be down-weighted. The likelihood weighting technique used here is described in Appendix 3. The results are not sensitive to the details because I do not use likelihood ratios with large si. The down-weighted results are the logits used.
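The exact weighting rule is given in Appendix 3. As a rough illustration only, a simple stand-in rule that shrinks each logit toward zero by half its squared standard error approximately reproduces the down-weighted values quoted below; this sketch is an assumption, not the Appendix 3 procedure.

```python
import math

def downweight(logit, se):
    """Illustrative shrinkage toward zero: subtract half the squared
    standard error from the magnitude of the logit. A stand-in for the
    Appendix 3 rule, chosen to roughly match the values quoted below."""
    return math.copysign(max(abs(logit) - se**2 / 2, 0.0), logit)

print(round(downweight(4.6, 0.7), 1))  # -> 4.4, as quoted for the location factor
print(round(downweight(3.0, 0.7), 1))  # -> 2.8, as quoted for the timing factor
print(round(downweight(1.3, 0.6), 1))  # -> 1.1 (the text quotes 1.2)
```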
Once the net likelihood factor is estimated, taking uncertainties into account, we still have a distribution of plausible prior odds. This can also be treated by assuming
a probability distribution around the point estimate of the log odds. The final odds will be obtained from integrating the net probabilities, including the net likelihood factor, over that distribution. This distribution is wide enough to make its form, not just its standard deviation, potentially important for the result. The treatments of the priors and the likelihoods look superficially similar, but are not equivalent. Uncertainty in the likelihoods leads to discounting the likelihood ratios but not to discounting the priors. Uncertainty in the priors leads to discounting both. Thus observations with uncertain implications leave the priors untouched but highly uncertain priors can make fairly large likelihood ratios irrelevant. Since in this case the priors tend toward ZW but the likelihoods tend more strongly toward LL the inclusion of each type of uncertainty will substantially reduce the net odds favoring LL.
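Here is a minimal numerical sketch of that integration, assuming for illustration a normal distribution on the prior logit (as just noted, the actual form of the distribution matters, so this choice is itself an assumption):

```python
import numpy as np

def final_odds(prior_logit, prior_se, net_likelihood_logit, n=100_000):
    """Average the posterior *probability* over a distribution of prior
    log-odds, then convert back to odds."""
    rng = np.random.default_rng(0)
    logit0 = rng.normal(prior_logit, prior_se, n)
    posterior_prob = 1.0 / (1.0 + np.exp(-(logit0 + net_likelihood_logit)))
    p = posterior_prob.mean()
    return p / (1 - p)

# Illustrative numbers only: a prior logit of -4.6 +/- 2.3 plus a net
# likelihood logit of +8 gives final odds of roughly 7, far below the
# naive point estimate exp(-4.6 + 8) = ~30.
print(final_odds(-4.6, 2.3, 8.0))
```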
Along the way we shall see several observed features that perhaps should give important likelihood factors that I think tend to favor LL but for which there’s substantial uncertainty and thus little net effect. I will just drop these to avoid cluttering the argument with unimportant factors. I will include some small factors when the sign of their logit is unambiguous, e.g. a factor from the lack of any detection of a wildlife host. I will not omit any factors that I think would favor ZW. The peculiarity of some features under ZW will not be used to penalize ZW’s odds if those features are likely to have a notable selective advantage, since we would not be discussing a virus that lacked major selective advantages regardless of whether those arose via peculiar accidents.
My analysis differs from some others in one important respect. The others treat “DEFUSE” as an observation and try to estimate something like P(DEFUSE|LL)/P(DEFUSE|ZW). I don’t see how to do that. Instead I treat DEFUSE as a way to define a particular branch of the LL hypothesis. Since it’s narrower than generic LL, it should be easier to find observations that don’t fit it, i.e. have low likelihoods. The flip side of that is that observed features that do fit it give higher likelihoods than they would for generic LL. Picking a reasonable prior for a leak of something stemming from DEFUSE-style work given the existence of the proposal will require looking more at prior estimates of lab leak probabilities rather than at the sparse history of previous pandemics, a revision that will be clarified in the next version.
The quantitative arguments
Prior odds
Let’s start with the fuzzy generic prior odds. The book Pandora’s Gamble amply documents that pathogen lab leaks are common, including in the US, and a recent summary documents over 300 lab-acquired infections and 16 lab pathogen escapes over a two-decade period. These were almost always caught before the diseases spread.
In my lifetime, starting in 1949, there have been seven other significant (>10k dead) worldwide pandemics. At least one pandemic (1977 A/H1N1) came from some accident in dealing with viral material. So if we just wanted to base our priors on that, we’d say, very crudely,
P0(LL or ZL)/P0(ZW) = 1/7 = ~0.1.
In 2006, the World Health Organization warned that the most likely source of a new outbreak of the original SARS would be a lab leak, confirming that the danger of lab leaks was large according to consensus expert opinion. There’s an important caveat, however. So far as we know, all of the past epidemics that came from labs (e.g. 1967 Marburg viral disease in Europe, 1979 anthrax in Sverdlovsk, 1977 influenza A/H1N1) were caused by natural pathogens. That’s not surprising, since until recently nobody was doing much pathogen modification in labs. The main modern method was only patented in 2006 by Ralph Baric, who was to have done the chimeric work on bat coronaviruses under the DEFUSE proposal. Without lab modification, only ZW and ZL would be viable hypotheses.
We know, however, that lots of modifications are underway now in many labs. As early as 2012, Klotz and Sylvester had warned of the dangers in a Bulletin of the Atomic Scientists article. In the same year Anthony Fauci conceded the possibility that such research might cause an “unlikely but conceivable turn of events …which leads to an outbreak and ultimately triggers a pandemic”.
The dangers were perceived as substantial enough for the Obama administration to at least nominally ban funding research involving dangerous gain-of-function modifications of pathogens. When that ban was lifted under Trump in 2017, Marc Lipsitch and Carl Bergstrom raised alarms. Lipsitch wrote: “[I] worry that human error could lead to the accidental release of a virus that has been enhanced in the lab so that it is more deadly or more contagious than it already is. There have already been accidents involving pathogens. For example, in 2014, dozens of workers at a U.S. Centers for Disease Control and Prevention lab were accidentally exposed to anthrax that was improperly handled.” Bergstrom tweeted a similar warning. Ironically, Peter Daszak, head of the EcoHealth Alliance, who became extremely dismissive of the lab leak possibility after Covid hit, gave a talk in 2017 warning of the “accidental &/or intentional release of laboratory-enhanced variants”. It is hard to see how such warnings would make sense if expert opinion held that the recent probability of a dangerous lab leak of a novel virus was negligible. For at least the last decade the prior probability P0(LL) of escape of a modified pathogen has not been negligible.
Several papers have been published on lab-modified viruses, e.g. one including authors from WIV and UNC that demonstrated potential for modified bat coronaviruses to become dangerous to humans: “Using the SARS-CoV reverse genetics system, we generated and characterized a chimeric virus expressing the spike of bat coronavirus SHC014 in a mouse-adapted SARS-CoV backbone…. We synthetically re-derived an infectious full-length SHC014 recombinant virus and demonstrate robust viral replication both in vitro and in vivo.” This paper prompted a 2015 response in Nature in which S. Wain-Hobson warned “If the virus escaped, nobody could predict the trajectory” and R. Ebright agreed “The only impact of this work is the creation, in a lab, of a new, non-natural risk.” Even in the research paper itself the authors called attention to the perceived dangers: "Scientific review panels may deem similar studies building chimeric viruses based on circulating strains too risky to pursue.”
At least one paper specifically described adding an FCS to a SARS-CoV virus. The 2018 DEFUSE proposal from WIV included plans for just such modifications of coronaviruses. Even K. G. Andersen, the lead author of the first key paper (“Proximal Origins”) claiming to show that LL was implausible, initially thought “…that the lab escape version of this is so friggin’ likely to have happened because they were already doing this type of work and the molecular data is [sic] fully consistent with that scenario.” That view is inconsistent with claims that the prior P0(LL) was extremely small, although it neither quantifies “friggin’ likely” nor establishes how much of “friggin’ likely” would be attributed to priors and how much to molecular data whose analysis may have since changed.
Should our prior probability of a pandemic from a new lab virus be raised or lowered compared to our old empirical probability of a lab-origin pandemic in light of the new prevalence of modern research in which pathogens are modified? On the one hand, only some of the viruses studied in labs are new, so the probability that a leak would be of something new is less than the net probability of any leak. On the other hand, more lab work is being done than in the past, raising the overall leak probability. Furthermore, we are interested only in the probability of a major pandemic-causing leak, and that is going to be higher for new viruses than for old ones since there’s some population immunity for the old ones.
I think the prior odds P0(LL or ZL)/P0(ZW) should be at least about the same as the old empirical ~0.1, but want to be conservative here in order not to lose reasonable readers who disagree before we get to the core evidence. So let’s just make a crude but quite conservative estimate for starters:
P0(LL)/P0(ZW) = ~0.01.
We shall see that it is in the range others consider reasonable. (It is at the upper end of J. Seymour’s range, but he provides no rationale for his choices, which seem incompatible with the historical record.) (Following Demaneuf and De Maistre, we should also have P0(ZL)/P0(ZW) = ~0.01.)
Although each subsequent likelihood ratio adjustment has its own uncertainty, the uncertainty of these prior odds will be the most important one. Let’s estimate the uncertainty in the prior odds as about a factor of 10 either way. It will be convenient to describe factors and their uncertainties in terms of the natural logs of the odds, i.e. logits, to make the error bars roughly symmetric. Our prior is then equivalent to
L0 = -4.6 ± 2.3.
where the ±2.3, equivalent to the factor of 10, is meant to roughly show the standard error in estimating the logit. A standard error of 2.3 allows and even requires that errors outside the ±2.3 range are possible, although not very probable. “4.6” is not meant to convey false precision, just to translate a rough estimate (the factor of 100) into convenient units.
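In code, the unit conversion is just a natural logarithm:

```python
import math
print(math.log(0.01))  # -> ~-4.6: prior odds of ~1/100 as a logit
print(math.log(10))    # -> ~2.3: the factor-of-10 uncertainty as a standard error
```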
We’ll return to check how well that prior agrees with expert opinion after updating to include knowing that the pandemic started in Wuhan. The reason is that some expert opinions expressed after the pandemic started already integrated the priors with that knowledge.
Where: Starting in Wuhan
Now let’s take the first, most obvious piece of evidence—the pandemic started in Wuhan. Even without any formal Bayesian notation it’s easy to understand why the Wuhan origin shifts the odds heavily toward LL or ZL. The probability of ZW comes from a sum over people and wet markets spread over China, Laos, etc. A recent paper working entirely within the ZW framework argues that SC2 is a fairly recent chimera of known relatives living in or near southern Yunnan, and that transmission via bats is essentially local on the relevant time scale. Wuhan is sufficiently remote from those locations that WIV has used Wuhan residents as negative controls for the presence of antibodies to SARS-related viruses. Thus Wuhan residents are not particularly likely to pick up infections of this sort from wildlife.
About 0.7% of the population of China lives in Wuhan. It is sometimes claimed that only urban centers have much chance of sustaining a viral spillover, but since China is mostly urban Wuhan has only ~1.0% of the urban population. Thus knowing that the pandemic started in Wuhan, of all places, gives a population-based P(Wuhan|ZW) < 0.01.
Wuhan has fewer wet markets than expected from its population. Wuhan had only 17 wet market shops in four markets. Overall, China has about 44,000 wet markets according to one source and about 4600 according to another. I do not know the reason for the discrepancy, perhaps a count of shops vs. markets, but even using the most extreme numbers Wuhan would have less than 0.4% of the wet market shops, probably less than 0.1%. Focusing on wildlife traded in the markets, Wuhan becomes an even tinier fraction of the total spillover risk. The total mammalian trade in all the Wuhan markets was running under 10,000 animals/year. The total Chinese trade in fur mammals alone was running at about 95,000,000 animals/year (“皮兽数量… 9500 万”). Thus if the ZW hypothesis is assumed to entail a market spillover, our P(Wuhan|ZW) would be much less than 0.01, perhaps 100 times lower. This tiny fraction of the wildlife trade found in Wuhan means that the specific market version of ZW has much steeper odds to overcome than non-market ZW accounts would have. It will help to keep this in mind as we see further evidence that the specific market spillover hypothesis runs into other major difficulties. (In future revisions of this already too-long article, I should explicitly disambiguate a market ZWM from all other ZW, ZWO, since the likelihood of ZWM has taken a much harder hit than the likelihood of ZWO.)
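Putting the counts just cited side by side (rough numbers, taken directly from the sources quoted above):

```python
# Wuhan's share of possible market-spillover paths, from the cited counts:
shop_frac_high = 17 / 4_600              # vs. the lower national market count
shop_frac_low = 17 / 44_000              # vs. the higher national count
mammal_trade_frac = 10_000 / 95_000_000  # Wuhan mammal sales vs. fur trade alone
print(f"{shop_frac_high:.2%}, {shop_frac_low:.3%}, {mammal_trade_frac:.4%}")
# -> 0.37%, 0.039%, 0.0105%
```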
What is P(Wuhan|LL)? Calculating the likelihood for that observation will be the first point at which we need to use the DEFUSE proposal to narrowly define our LL hypothesis. That raises the question of whether DEFUSE should only get some of the prior LL probability or if knowing about DEFUSE should cause that specific version of LL to get some add-on priors comparable to the general background priors for the specific lab involved. Given that DEFUSE-style products would be unusually transmissible, perhaps the priors should be above the background level. I’m just using the background level here, but that could inflate or deflate the best estimate of the DEFUSE-specific priors by a factor of two or so. Since the priors are already assigned large uncertainty this extra uncertainty would make little difference.
We know that WIV’s DEFUSE specifically described planned coronavirus modifications including FCS insertion, modification of a feature called an N-linked glycan, and major modifications of the receptor binding domain compared to known natural strains available in labs, all features later found in SC2. (Note that for the purposes of calculating P(Wuhan|LL) whether similar features could have arisen naturally is precisely irrelevant mathematically. “|LL)” means “conditional on LL”.) We know there were U.S. State Department cables warning specifically that bat coronavirus work in Wuhan faced safety challenges. We know that the DEFUSE proposal claimed WIV had more than 180 relevant coronavirus sequences, apparently including many unpublished ones. Although there are other cities where some coronavirus work is going on, if someone with this prior knowledge heard that a lab leak had started a pandemic of a coronavirus with an FCS etc., they would have been pretty sure that the location was Wuhan:
P(Wuhan|LL, DEFUSE, FCS,…) is not a lot less than 1.
This first likelihood ratio, from Wuhan being the starting location, is then:
P(Wuhan|LL)/P(Wuhan|ZW) = ~100
We can estimate the standard error in that estimate as about a factor of 2.
Our logit estimate will then be updated by
L1 = +4.6 ±0.7, giving logit1 = 4.4 after down-weighting for uncertainty.
Again, these numbers are not meant to convey false precision.
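The running tally so far, in logit units:

```python
prior_logit = -4.6   # L0, the prior odds of ~1/100
wuhan_logit = 4.4    # logit1, the down-weighted location factor
print(prior_logit + wuhan_logit)  # -> ~-0.2: roughly even odds at this point
```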
Priors adjusted for Wuhan start
At this point of the analysis the combined logit is ~0, i.e. the chances are about equal. Let’s check that our odds are reasonable at this point, based on the combination of priors and that the outbreak started in Wuhan. Demaneuf and De Maistre looked in detail at past evidence for various scenarios of natural and lab-related outbreaks. Without considering sequence features beyond that the virus is SARS-related they conservatively estimate that the lab-related to non-lab-related odds for an outbreak in Wuhan, P(ZL|Wuhan)/P(ZW|Wuhan), are about one-to-one, similar to the odds we use for P(LL|Wuhan)/P(ZW|Wuhan) using the knowledge that DEFUSE-style work was planned.
Now let’s double-check our starting point that the chances for ZW and LL in Wuhan were comparable. One serious pre-Covid paper estimated the chance of a human-transmissible leak at 0.3%/year for each lab. Another careful pre-Covid analysis of experiences of labs using good but not extreme biosafety practices (“BSL3”) estimated that the yearly chance of a major human-transmissible leak was in the range of 0.01% to 0.1% per lab. For a large lab doing much of its work at a much lower safety level (BSL2) the chances would be higher, easily >0.1%/year. According to the lead coronavirus researcher at WIV, Shi Zhengli, “coronavirus research in our laboratory is conducted in BSL-2 or BSL-3 laboratories.” For comparison, newly important zoonotic diseases have been identified in China at a rate of about 0.4/year. With Wuhan having about 0.7% of China’s population, not located near coronavirus hotspots, its local rate should be less than 0.3%/year. Once again, we have roughly even odds for a lab origin or a natural origin for a new pandemic starting in Wuhan.
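The arithmetic behind that comparison, using the rates quoted above:

```python
lab_leak_rate = 1e-3     # >0.1%/year for a large lab doing much BSL-2 work
china_spillovers = 0.4   # newly important zoonoses identified per year in China
wuhan_pop_share = 0.007  # Wuhan's ~0.7% share of China's population
wuhan_natural_rate = china_spillovers * wuhan_pop_share
print(round(wuhan_natural_rate, 4))  # -> 0.0028/year, i.e. <0.3%/year,
                                     # comparable to the lab-leak rate
```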
One advantage of these two checks is that they are insensitive to knowledge about events in other cities. If other cities have risky coronavirus work, that raises the overall prior odds of LL but lowers the Wuhan-location update factor. Those effects on the updated odds would cancel.
Finally, let’s triple-check by looking at the impressions of the lead author of Proximal Origins. Andersen wrote his colleagues on 2/2/2020 “Natural selection and accidental release are both plausible scenarios explaining the data - and a priori should be equally weighed as possible explanations. The presence of furin a posteriori moves me slightly more towards accidental release, …” Based on general priors plus knowledge of the Wuhan origin and before taking into consideration more detailed data such as the FCS, Andersen thought the probabilities were about equal, which is just the result we have reached at the same point.
Two of our checks on the priors used attempts to predict yearly rates of spillovers from ongoing lab work based on prior knowledge of lab events. To the extent that those estimates are reliable our whole exercise here is only of historical interest, since those estimates tell us directly to take lab risks very seriously regardless of the source of this particular pandemic. For practical purposes, we are only interested in estimating the source of this one pandemic in order to check the credibility of those warnings.
What and When: Timing of Sarbecovirus outbreak
The timing of the outbreak (late 2019) is obviously consistent with an origin in work described in the 2018 DEFUSE proposal for modifying viruses to make something close to SC2. The outbreak occurred in the ~1-year window possible for LL under DEFUSE. How long a window would have been possible under ZW? China started encouraging the wildlife trade in around 1980, but with tightened regulation after the 2003 SARS CoV-1 epidemic was traced to the wildlife trade. By analogy with the location coincidence, one might then estimate a likelihood ratio P(2019|LL,DEFUSE,FCS…)/P(2019|ZW,FCS…) of about 40. (Here and below I omit “DEFUSE” from ZW conditional probabilities, since it is not relevant to them.) We don’t know quite how the intensity and safety regulation of the wildlife trade has varied over time during those decades. If the post-2003 regulation was successful, then the likelihood ratio should be increased. If it was unsuccessful and the wildlife trade drifted up without enhanced caution then the likelihood ratio should be lowered. To be reasonably conservative I’ll use P(2019|LL,DEFUSE)/P(2019|ZW) = ~20, uncertain to about a factor of 2.
By extending our time frame to include several decades we’ve picked up an important objective likelihood ratio, roughly 20 or more. A DEFUSE-style LL would be expected only in 2019 while ZW could happen in any year for at least a few decades. In stepping back to include that broader period, however, we need to remember that DEFUSE is only a subset of LL. Over that longer period the prior probability of a DEFUSE leak is reduced compared to the overall LL probability by the same sort of timing argument. LL picks up some probability each year since ~2010 but DEFUSE only picks up some in 2019, i.e. P0(DEFUSE, Wuhan, 2019) = ~P0(LL, Wuhan, 2019)/10. This reduction in net prior probability when we include only DEFUSE approximately cancels the timing likelihood factor, although not quite, since the LL probability doesn’t extend as far back in time as the ZW probability does. Although I suspect that the DEFUSE BSL-2 work with highly transmissible virus should get a larger share of the prior probability than the mere time ratio would indicate, I’ll defer making that more subjective argument.
Just as I initially neglected to reduce the LL priors when restricting to LL’s DEFUSE-style subset, I also neglected in earlier versions to reduce the ZW priors when restricting to sarbecovirus pathogens. The emerging diseases in China include all sorts of pathogens, with only one of the 19 emerging or re-emerging pathogens listed in a 2014 paper for the period since the 1950’s being a sarbecovirus. This is not surprising, because none of the major prior pandemics was known to be caused by a coronavirus. Thus P(coronavirus|DEFUSE leak) = 1 but P(coronavirus|general zoonosis) = ~1/19.
For the time being, for convenience, I’ll just keep the old update obtained when I ignored both that timing reduces the priors for DEFUSE compared to LL and that the conditional probability of a sarbecovirus given generic ZW is much less than one.
L2 = +3.0 ±0.7, giving logit2 = 2.8 after down-weighting for uncertainty.
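As a rough check that keeping the old update is harmless for now, the two neglected corrections approximately cancel each other (a sketch using the round numbers above):

```python
import math
timing_lr = 20            # P(2019|LL,DEFUSE)/P(2019|ZW), uncertain to ~2x
defuse_prior_frac = 1/10  # DEFUSE-specific prior as a fraction of generic LL
sarbeco_lr = 19           # P(coronavirus|DEFUSE leak)=1 vs. ~1/19 for generic zoonosis
net = timing_lr * defuse_prior_frac * sarbeco_lr
print(net, round(math.log(net), 1))  # -> 38, 3.6: within a factor of 2 of the old +3.0
```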
In future revisions I’ll integrate (where, when, what) into one likelihood factor, since although they are somewhat independent under ZW (except for where-what), under LL they are closely linked by DEFUSE.
The key papers arguing for zoonosis
Proximal Origins
Now let’s look at the three main papers on which claims that the evidence points to ZW rest. The first is the Proximal Origins paper, whose valid point was that ZW was at least possible. Its initially submitted version concluded logically that therefore other accounts were “not necessary”. That conclusion is implicit in all the Bayesian analyses, which neither assume nor conclude that P(ZW)=0.
The final version of Proximal Origins changed that conclusion under pressure from the journal to the illogical claim that therefore accounts other than ZW were “implausible”. To the extent that the paper had an argument for LL being implausible it was based on the assumptions that a lab would pick a computationally estimated maximally human-specialized receptor binding domain rather than just a very well-adapted human receptor binding domain and that seamless modern methods of sequence modifications would not have been used. Neither assumption made sense, invalidating the conclusion. Defense Department analysts Chretien and Cutlip already noted in May 2020: “The arguments that Andersen et al. use to support a natural-origin scenario for SARS CoV-2 are not based on scientific analysis, but on unwarranted assumptions.” The later release of the DEFUSE proposal further clarified that the precise lab modifications that Proximal Origins argued against were not ones that WIV had been planning. The DEFUSE proposal described adding some “human-specific” proteolytic site, not a special computationally optimized one. The particular “RRAR” amino acid sequence for the FCS that Proximal Origins argued would not have been used was identical to that of a coronavirus FCS previously studied at WIV. Thus Proximal Origins contains nothing that would lead us to update our odds in either direction.
As further confirmation, we now know that even weeks after Proximal Origins was published its lead author did not have confidence in its conclusions or even believe its key arguments. On 4/16/2020 Andersen wrote his coauthors: “I'm still not fully convinced that no culture was involved. If culture was involved, then the prior completely changes …What concerns me here are some of the comments by Shi in the SciAm article (“I had to check the lab”, etc.) and the fact that the furin site is being messed with in vitro. … no obvious signs of engineering anywhere, but that furin site could still have been inserted via gibson assembly (and clearly creating the reverse genetic system isn't hard -the Germans managed to do exactly that for SARS-CoV-2 in less than a month.”
Phylogeny and location: Pekar et al. and Worobey et al.
The next papers involve phylogenetic data and intra-city location data. Readers should be forewarned that the likelihood factors for their combination do not factorize into separate contributions. The reason is that the locations data were used to support one particular version of the ZW hypothesis and the phylogenetic data make that particular version implausible although on their own they would say little to disfavor the general ZW hypothesis.
Pekar et al. argued based on computer simulations of a simplified model of how the infection would spread that the presence of two lineages (A and B) differing by two point mutations in the nucleic acid sequence without intermediate cases was unlikely if all human cases descended from a single most recent common ancestor (MRCA) that was in some human. They claimed (incorrectly) to obtain Bayesian odds of ~60 favoring a picture in which the MRCA was in another animal shortly before two separate spillovers to humans. There is no obvious reason why having an MRCA in some other animal a few transmission cycles before two spillovers to humans would say much about whether the other animal was a standard humanized mouse in a lab or an unspecified wildlife animal in a market. For example, multiple workers were exposed to Marburg fever in the lab and the Sverdlovsk anthrax cases included multiple strains. In the most relevant case, SARS spilled over in “four distinct events at the same laboratory in Beijing.” DEFUSE itself described planned work with quasi-species, collections of closely related strains, rather than purified strains. Thus further discussion of the Pekar et al. model seems irrelevant to our question, but I’ll include a brief discussion in Appendix 2 about some of the major technical problems of the paper. (If there were evidence for multiple spillovers, that might tend to reduce the likelihood of the difficult direct bat-to-human route regardless of whether or not that involved research, as discussed in Appendix 5.)
Let’s step back from complicated, assumption-laden modeling that seems irrelevant to our ZW vs. LL comparison to look at what the lineage data seem to say prima facie. (Jesse Bloom and Trevor Bedford wrote a convenient introductory discussion.) Lineage A shares with related natural viruses the two nucleotides that differ from B. Thus lineage A was the better candidate for being ancestral, as Pekar et al. acknowledged. Pekar et al. describe 23 distinct reversions out of 654 distinct substitutions in the early evolution of SC2. The chance that when two lineages are separated by two mutations (2 nucleotides, “2nt”) both those mutations would be reversions is then roughly (23/654)^2 = 0.00124 = ~1/800. (A more detailed discussion of the probability of reversions, giving a lower value, is included in Appendix 2.) At this point that conclusion tells us nothing about P(LL)/P(ZW), but it will become important when integrated with information about locations of early cases and early viral traces.
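For concreteness, the arithmetic:

```python
reversion_frac = 23 / 654            # distinct reversions / distinct substitutions
p_both_revert = reversion_frac ** 2  # chance both separating mutations are reversions
print(p_both_revert)                 # -> ~0.00124, i.e. roughly 1/800
```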
Lineage A was almost entirely absent from the main suspected site of the wildlife spillover, the Huanan Seafood Market (HSM). Although many traces of B were found in HSM, traces of A were found only on one glove, with additional mutations indicating that it was not from an early case. The early cases with known market linkage were of lineage B, not A. Thus the sequence data indicate that lineage A was quite unlikely to have originated at HSM. This conclusion applies whether or not the spillover that led to lineage A was the only one or whether there was a separate spillover to lineage B.
Both Kumar et al. and Bloom have analyzed the phylogenetic data, concluding that neither A nor B was the MRCA, which they argue differed from B by 3nt shared with wild relatives, not 2nt. The MRCA was probably present in Oct. 2019, with the first spillover case likely to have occurred weeks earlier. Bloom finds more early lineage A (and other sequences closer to their suspected MRCA) at multiple locations away from the market, including other parts of Wuhan, other parts of China, and other countries. The phylogeny data thus seem inconsistent with HSM being the only spillover site, since lineages closer to the ancestral relatives were spreading widely before the less-ancestral lineage showed up at HSM.
The point of the Pekar et al. paper seems to have been that the absence of traces of early lineage A in HSM does not rule out the possibility that HSM was a spillover site for lineage B, since lineage A could have spilled over elsewhere. That possibility does not require that the probability of having had just one spillover is very small, just that the probability of having had more than one is not very small. Thus although the many errors in the Pekar et al. paper, discussed in Appendix 2, invalidate its conclusion that a unique spillover was highly improbable, the lineage results are still reasonably compatible with a multi-spillover picture including one at HSM.
We’ve looked at whether the sequences found in the HSM were reasonably compatible with that being the only spillover site (they weren’t) but we haven’t made the equivalent test for WIV. Depending on what sequences were there, one could end up with a Bayes factor either favoring ZW or LL. Unfortunately we have little information. In Sept. 2019 WIV started removing public access to its sequence collection, finishing early in the pandemic. Publication of newly gathered sequences seems to have abruptly stopped with those gathered in 2016, at least according to the data I’ve been provided. (If someone knows of updates that would be helpful.) Y. Deigin discusses further omissions from public disclosure of what sequences were known as well as of when and where they were obtained.
Some people consider the lack of evidence for a close match of a WIV sequence to SC2 as indicating that SC2 was unlikely to come from WIV. Others have said it’s just from reflexive bureaucratic secrecy with no particular implications. Others have read the missing-data situation as indicating a systematic cover-up of some embarrassing sequence data. Support for the latter interpretation may be found in a note dated 4/28/2020 from Daszak, a leader on the DEFUSE proposal: “ …it’s extremely important that we don’t have these sequences as part of our PREDICT release to Genbank…. having them as part of PREDICT will being [sic] very unwelcome attention…” The official explanation of why the data base was taken offline was that it was being hacked. It seems to me that it would have been easy and inexpensive to make copies on some standard read-only media and distribute these to many dozen labs and libraries around the world. That would have made the information available without allowing hacking by anything short of a massive worldwide conspiracy. An evaluation of the likelihoods under ZW, ZL, and LL of the removals of various sorts of data from Wuhan and the inconsistencies between various statements of prominent virologists would be an interesting project for a social scientist, but not one I will use to update here.
Although fully knowing what sequences were in Wuhan labs would be almost equivalent to answering the origins question, our estimate of what’s there would mostly just be based on the other evidence leaning toward LL, ZL, or ZW, augmented a bit by a highly subjective sense of how forthright people are likely to be. To summarize in more formal-looking language that some readers have indicated they prefer:
ln(P(no known backbone sequence, multiple types of hidden data|LL)/P(no known backbone sequence, multiple types of hidden data|ZW)) = ~0 ± big uncertainty.
No update is justified.
In combining the lineage and case location data we can simplify a bit by using one point on which there is unanimity: if there were more than one spillover either all or none were lab-related. Is there evidence that lineage B spilled over to humans at HSM? If so, that would support a market-based ZW despite the otherwise low odds for a Wuhan market account due to Wuhan’s extremely small fraction of China’s wildlife trade.
The widely publicized paper by Worobey et al. used case location data to argue that HSM was not just a superspreading location but also the location of at least one spillover to humans. Worobey et al. argue that since there were hundreds of plausible superspreading locations it would require a remarkable coincidence, with probability ~1/400, for a possible spillover site, HSM, to be the first ascertained spreading site unless it were the actual spillover site. While the argument sounds reasonable, one can get a preliminary empirical feel for how much of a coincidence that would be by looking at the first notable ascertained outbreak in Beijing some 56 days after initial cases were controlled. It occurred at the Xinfadi wet market, which could not have been the site of the months-earlier spillover. In Singapore, the “biggest Covid-19 community cluster” was found at the Jurong seafood market. Apparently first ascertainment of spread of a pre-existing human virus is not so unlikely to be located at a wet market.
The case data Worobey et al. used omitted about 35% of the clinically reported known cases, probably ones that were not PCR-confirmed. Omission of cases can be a serious problem for an analysis based on spatial correlations. (Proximal Origins author Ian Lipkin described the Worobey et al. analysis as "… based on unverifiable data sets…") The collection of clinically reported cases and of ones then PCR-confirmed already was biased because proximity and ties to HSM were used as criteria for detecting cases in the first place. I now include in Appendix 2 a rather rigorous argument that Worobey et al. themselves present evidence demonstrating that proximity-based case ascertainment bias must have been too large to allow proximity-based inference about the origins.
Even aside from the severe ascertainment bias, re-analysis using standard spatial statistical methods by experts in such techniques showed that the statistics used could not identify HSM as the starting location. In addition to the more technical re-sampling statistical analysis, the re-analysis made the obvious point that in a modern city infections do not spread symmetrically in a short-range local pattern but follow other routes, e.g. commuter lines. A paper that Worobey et al. cite specifically shows extremely anisotropic movements around Wuhan.
Worobey et al. do not cite a single relevant instance in which the sort of case-location data analysis they used identified the source of an epidemic. In the closest historical analogy I can think of, John Snow’s famous 1854 map-based identification of a water pump as a cholera source, people from infected households had walked from their houses to the pump. Even for Snow the most convincing evidence for water-borne disease causation was not spatial distribution of a cluster around the pump, subject to multiple confounders, but rather correlation with the pseudo-random spatially mixed distribution of water from two companies, only one of which was polluted. Unfortunately an analog of one of his most convincing pieces of evidence, reduction of the disease cluster round the pump right after its handle was removed, is not available for SC2. To the extent that such a temporal correlation is available, we have seen that it points toward LL, unfortunately due to the timing of the onset rather than the timing of a reduction.
A report from the WHO and the Chinese CDC looking at the case location data concluded “Many of the early cases were associated with the Huanan market, but a similar number of cases were associated with other markets and some were not associated with any markets….No firm conclusion therefore about the role of the Huanan Market can be drawn.” That agrees with an extensive analysis by Demaneuf detailing the serious obstacles to inferring a spillover location from the sparse and non-random case location data.
Worobey et al. include a map of locations of requests to the Weibo web site for assistance with Covid-like disease, which provides a way of looking at the location distribution within Wuhan without selective omission of cases. The earliest Weibo map they present shows a tight cluster near to but not centered on HSM. Instead it clusters tightly around a Wuhan CDC site (not part of WIV) more than 3 km to the southeast, where BSL2 viral work was done. Just before the time of the first officially recorded cases the CDC opened a new site within 300m of HSM, indistinguishable from the HSM site via the sorts of case location data used in Worobey et al.
More relevant to the question of the original spillover, the paper that provided the Weibo map also had a map of Weibo data prior to 1/18/2020. By far the largest cluster of early reports in this data set is close to the WIV on the south side of the Yangtze, as shown in this version of that map from a Senate report that includes WIV and HSM locations. Such maps cannot reliably point to the spillover site.
Worobey et al. present another argument: that the distribution of SC2 RNA within HSM pointed to a spillover from some wildlife there. If correct, that argument would be more directly relevant to whether a spillover occurred at HSM than are the locations of cases after Covid became more widespread.
The positive SC2 RNA reads did tend to cluster in the general vicinity of some of the HSM wildlife stalls, even after correcting for the biased sampling that focused on that area. That area, however, is also where bathrooms and a Mah Jong room are located, both likely spreading sites. Demaneuf documents extensive evidence that the early market cases were largely of old folks who frequented the stuffy little crowded Mah Jong room. A finer-grained map using the Worobey data showed the hot spot to be centered on the bathroom/Mah Jong spot, not the nearby wildlife stalls.
In a short-lived coda, there were many press stories that SC2 RNA found in a stall with DNA of a raccoon dog showed that species to be the intermediate host. The presence of wildlife in the market was not news– it is implicit already in our priors. The question was whether there was some particular connection between that wildlife and SC2. When Bloom went over the actual data for the individual samples, he found that particular sample had almost undetectable SC2 RNA, far less than many others. Overall, sample-by-sample SC2 RNA correlated negatively with the presence of DNA from possible non-human hosts. Actual wildlife-infecting viruses, in contrast, correlated strongly positively with the corresponding DNA. A newer analysis by Bloom again indicates no support for an HSM wildlife spillover.
Thus the internal SC2 RNA data make it unlikely that wildlife had any direct connection with SC2 spread in HSM. As the head of China’s CDC concluded, “At first, we assumed the seafood market might have the virus, but now the market is more like a victim. The novel coronavirus had existed long before”. That is consistent with the prior likelihood of Wuhan being the location of a market spillover already being far less than 1% because Wuhan markets sold less than 0.01% of the Chinese mammalian wildlife trade. Nonetheless, to be conservative I will not include a Bayes factor disfavoring the general ZW hypothesis at this point, since markets are not the only path by which viruses can spillover. To express that thought in formal language again:
ln(P(market cases but negative internal correlation|LL)/P(market cases but negative internal correlation|ZW)) = ~0 ± a bit.
Summary of Key Zoonotic Papers
Before going on to discuss other likelihood factors it may help to look back at the three papers just discussed. Regardless of whether the estimates I’m about to give of likelihood factors hold up well (one has already changed a lot thanks to discussions), the most solid conclusion is that the key papers on which the standard zoonotic story rests are extremely shaky. See Appendix 2 for more details.
Intermediate hosts
The failure to find any positive statistical association of SC2 RNA with any plausible intermediate host in the HSM points to a larger issue. For both the important recently spilled-over human coronaviruses, SARS-CoV-1 and MERS, intermediate wildlife hosts were found. In contrast, no wildlife intermediary has been found anywhere for SC2 despite intense searches. According to the Lancet Commission “Despite the testing of more than 80000 samples from a range of wild and farm animal species in China collected between 2015 and March, 2020, no cases of SARS-CoV-2 infection have been identified.”
Intermediate hosts were found for 3 of the 4 other recently identified human betacoronaviruses, with the missing one (HCoV-HKU1) causing a relatively minor disease that provoked relatively little attention. China could hardly have been expected to find the intermediate host for HCoV-HKU1, since retrospective evidence shows it existed in Brazil and elsewhere years before it was first described in Hong Kong. I’ve found no indication of a search for intermediate hosts at any of those locations. A broader review of human coronaviruses finds that intermediate hosts have been identified for 5 of the 7 described, not counting SC2.
Given the enormous attention paid to SC2, I think the probability of not finding any intermediate host under the ZW hypothesis would be less than for the other coronaviruses, but we can conservatively estimate the log probability from what is consistent with the observations for the other coronaviruses. I calculate the expected value of ln(P(no wildlife host found|ZW)) assuming a uniform prior on the probability of non-observation. Although the identification of intermediate hosts for the two most relevant cases produces the most negative expected ln(P(no wildlife host found|ZW)), that estimate has large uncertainty due to the very small sample. The larger samples give less negative values for ln(P(no wildlife host found|ZW)) but with reduced uncertainty. (See Appendix 3.)
Of course, P(no wildlife host found|LL) = 1. Thus, based on the absence of the intermediate host evidence expected under ZW, our probabilities should be updated by a modest likelihood ratio of ~4, corresponding to:
L3 = +1.3 ±0.6 giving logit3 = 1.2.
There is an interesting caveat. Wenzel has argued based on SARS-CoV-1 phylogeny that the market animals believed to be SC1 intermediate hosts were just secondary hosts picking up infections from humans. He argues that SC1 directly infected humans from some bat. I don’t know enough to evaluate the strength of those arguments, but if correct they would slightly increase the likelihood of failing to find an intermediate host if there were one. A bigger effect on our calculations would be to shift some of the priors away from any market story toward a direct bat story. In Appendix 5 I take a cursory look at direct bat accounts.
To be symmetrical, one should also consider whether there are any traces of an intermediate host of the type that might be found under the LL hypothesis, i.e. either cell cultures or humanized mice of the sort that would be used in the type of work proposed in DEFUSE. SC2 sequences did show up in irrelevant Antarctic samples, contaminated with standard lab Vero and hamster culture cells, in data from the Sangon sequencing lab, which DEFUSE had named as a sequencing lab it would use. DEFUSE had specifically described planning to use Vero cells. The Vero and hamster mitochondrial sequences show a peculiar complementarity, suggesting the sort of cell fusion that can be induced by viral infections. Human sequences are also present. The Antarctic samples were gathered in Dec. 2019, but the contaminating lab culture samples might have been gathered later since the sequencing was done in Jan. 2020.
Three mutations that differ from the initial SC2 sequence but are shared with related wild viruses were detected in these samples, out of just 14 nt of difference from lineage B. Most strikingly, these three are just the ones that Kumar et al. assigned to the MRCA. That not only supports the Kumar et al. phylogeny but also shows that these lab samples either contained the MRCA or multiple strains that included the MRCA nucleotides. (See Appendix 2 for details.) Unfortunately the sequences are fragmentary, so it is not known if a complete MRCA sequence was present.
Comments from prominent virologists, including Bloom, Andersen, and Crits-Christoph, discuss possible interpretations of the data. One possibility is that the range of mutations represents an ancestral quasi-species in cell culture, of which only one or a few variants then made it through the spillover. Another is that all the SC2 RNA was obtained from multiple patients sampled in a brief time window after the pandemic was detected, and then cultured in the lab before the lab samples were sent in. Either interpretation is reasonably plausible, and the second is compatible with ZW. Thus although some have cited the Sangon observation as strong evidence for LL, it doesn’t let us update the odds with much confidence.
Pre-adaptation
Several other simple properties of SC2 would be expected under DEFUSE-style LL but have been widely noted as surprising under ZW. One feature is that the ACE2 binding site worked better for humans than for bats, even before having a chance to evolve in people. As a Nature paper noted “Conspicuously, we found that the binding of the SARS-CoV-2 S protein was higher for human ACE2 than any other species we tested, with the ACE2 binding energy order, from highest to lowest being: human > dog > monkey > hamster > ferret > cat > tiger > bat > civet > horse > cow > snake > mouse.” The binding to human ACE2 is also substantially stronger than to raccoon dog ACE2. Such strong human binding would also be expected after serial respiratory passage through lab mice with humanized ACE2.
The initial protein evolution in humans was much slower than for SARS-CoV-1, with about a factor of 5 lower ratio of non-synonymous to synonymous mutations. The FCS region of the original SC2 also evolved little when grown in human cell cultures. The contrast with the behavior of SARS-CoV-1, whose natural origin is established, strongly suggests that SC2 had already had a chance to adapt to a human cell environment, such as the human airway epithelial cells whose planned use was described in DEFUSE. One of the most prominent advocates of the ZW account, Proximal Origins coauthor Eddie Holmes, in a communication with the others on 2/10/2020 noted this contrast with SARS-CoV-1: “It is indeed striking that this virus is so closely related to SARS yet is behaving so differently. Seems to have been pre-adapted for human spread since the get go.”
One might speculate that the slow early evolution in humans was due to some special generalized cross-species infectivity of SC2. That possibility was checked in detail by comparison with early evolution in minks after spillover from humans. The finding was again a sharp contrast between the apparent pre-adaptation for humans and the rapid evolution after spillovers to minks: “[SC2’s] apparent neutral evolution during the early pandemic….contrasts with the preceding SARS-CoV epidemics….Strong positive selection in the mink SARS-CoV-2 implies that the virus may not be preadapted to a wide range of hosts.”
These combined initial adaptation features, each expected for a DEFUSE-style LL but surprising for a ZW origin like that of SARS-CoV-1, should shift the odds further toward LL. Unlike some other updates, they do not easily lend themselves to semi-quantitative form but I think it is hard to see why such features would strike even expert advocates of ZW as anomalous if they were nearly as consistent with ZW as they obviously are with LL. I think that another likelihood factor
P(adaptive features|LL)/P(adaptive features|ZW) = ~3 would be conservative. I will use a small standard error only to indicate that much smaller values are implausible, not to imply that much larger values are implausible.
L4 = ~+1.1 ±0.5 giving logit4 = 1.0.
Pre-adaptation combined with intermediate hosts
In treating P(adaptive features|ZW) and P(no wildlife host found|ZW) as independent factors I have made an approximation that overestimates the likelihood of ZW. A virus that circulates extensively in some post-bat wildlife has a chance to evolve from bat intestinal oral-fecal propagation to the very different respiratory propagation mode found in humans, civets, etc. That possibility, however, is nearly ruled out by the failure to find any proximal wildlife host. Even more surprising, no experiment has shown that any early strain of SC2 is even able to sustainably propagate in raccoon dogs or any other candidate host.
Spillover from sparse wildlife hosts is possible, but that would imply little chance for evolution since leaving bats. The combined data are then less compatible with ZW than would be calculated from a simple product of separate adaptation and host factors. This tension between the limited chances for post-bat pre-human evolution and the apparent pre-adaptation was a topic of discussion among Proximal Origins authors on 2/3/2020. Holmes wrote “No way the selection could occur in the market. Too low a density of mammals: really just small groups of 3-4 in cages.” Garry replied “That is what I thought as well…”. Holmes summed up: “Bottom line is that the Wuhan virus is beautifully adapted to human transmission but we have no trace of that evolutionary history in nature.”
Since then several bat sarbecoviruses, dubbed BANAL, have been reported to be found in Laos. Some have good human ACE2 binding although none have been found to have an FCS. Although the closest sequence of these to SC2 still differs by ~1000 nt, too much to change in the relevant time window, their existence raises the possibility that a fairly well-adapted ancestral virus could exist. As I discuss in Appendix 5, this could lead to a zoonotic account without tension between the lack of intermediate hosts and the good pre-adaptation because intermediate hosts would not be necessary, but in such an account ZL would be more probable than ZW.
The FCS and its neighbors
Most LL advocates have argued that the mere fact that SC2 has an FCS is strong evidence for LL since no close relative of SC2 has an FCS and DEFUSE proposed adding an FCS. As we have seen, even the lead author of Proximal Origins thought the FCS was at least some evidence favoring LL. Nevertheless, the argument that simply having an FCS gives a major factor is exaggerated, since it would only apply to some generic randomly picked relative. SC2 is not randomly picked. We are only discussing SC2 because it caused a pandemic. So far as we know having an FCS may be common in the subset of hypothetical related viruses that are capable of causing a human pandemic. In other words even though P(FCS|ZW) is much less than 1 for some generic sarbecovirus P(FCS|ZW, pandemic) need not be. One needs to be cautious in using fitness-enhancing features such as the FCS in likelihood calculations. (See Appendix 4 for a consolidated discussion of how the FCS data are used here.)
Although it is not appropriate to use the non-existence of FCS’s in bat sarbecoviruses to estimate P(FCS|pandemic, ZW), the lack of an FCS in any non-bat sarbecovirus seems to provide evidence that, even though an FCS can enhance fitness in respiratory infections, it’s just hard for sarbecoviruses to acquire one. The FCS of SC2 clearly has provided major evolutionary advantages for transmission in other species, yet there are no other known FCS-containing sarbecoviruses in any of the 27 non-bat species known to host sarbecoviruses. The long period of bat interactions with a range of other non-bat mammals has not produced a spillover of a persistent FCS-containing virus even though it has produced many successful spillovers. One might expect each such successful non-bat sarbecovirus to have a higher probability of having an FCS than would a recently spilled-over human sarbecovirus, even one that would later go on to a successful career as a pandemic-causer, since these non-bat viruses have had more chances to pick up an FCS, especially by template switching with host DNA. There should be a factor disfavoring ZW based on this empirical lack of sarbecovirus FCS’s even in the face of selection pressure, but for now I want to be cautious, since the presence of fairly many FCS’s in the broader category of betacoronaviruses suggests that one should not draw too strong an inference from their absence in sarbecoviruses.
Of the 27 species known to host sarbecoviruses, only 13 have yielded 10 or more sequences. We should probably ignore the less-sampled other species. One might then estimate that a sarbecovirus that succeeds in establishing itself in a non-bat species has a probability of picking up an FCS of less than ~1/13. The formal calculation of Appendix 3 would give ln(P(FCS|ZW)) = -3.25 for zero hits on 13 tries.
Are those other species relevant? I think so but am not sure. Let’s say that their relevance is somewhere between none and complete. Crudely using a uniform distribution on [-3.25, 0] for ln(P) we get
L5 = -1.6 ±0.9 giving logit5 = -1.3.
Since the presence of the FCS struck everyone from Andersen to Baltimore as suggesting lab modification, I think this update factor is reasonably cautious.
The specific contents of the FCS, however, do provide evidence. Focusing on the internal details of the FCS site is not cherry-picking statistical oddities from a large range of possibilities, since it is specifically the tiny FCS insertion that seems so peculiar for this type of virus and so predictable for DEFUSE-style synthesis. One of the Proximal Origins authors, Robert Garry, initially reacted: " I really can't think of a plausible natural scenario where you get from the bat virus or one very similar to it to [SC2] where you insert exactly 4 amino acids 12 nucleotide that all have to be added at the exact same time to gain this function -- that and you don't change any other amino acid in S2? I just can't figure out how this gets accomplished in nature. Do the alignment of the spikes at the amino acid level -- it's stunning. Of course in the lab it would be easy to generate the perfect 12 base insert that you wanted.” One particular detail of the FCS (codon usage, discussed below) initially struck David Baltimore as a “smoking gun” for LL, although he later moderated that claim.
As we saw in our introduction of the methods, rather than categorizing each unusual feature as either a smoking gun or mere coincidence, Bayesian analysis assigns each feature a quantitative odds update factor. Events that are unusual under some hypothesis do not rule out that hypothesis but they do constitute evidence against it if the events are more likely under a competing hypothesis. Our task here is to try to turn the qualitative surprise into a rough quantitative likelihood ratio.
The feature that struck Baltimore is that the SC2 FCS has two adjacent arginines (Arg’s), each coded for by the nucleotide codon CGG. CGG is the least common of the 6 Arg codons in all related natural viruses. CGG is only used for ~2.6% of the Arg’s in the rest of SC2. None of the other 40 Arg’s on the spike protein use CGG. If we treat the two codons as approximately independent we get P(CGGCGG|ZW) = 0.026^2 = ~0.0007. One can check the independence assumption for generic sarbecovirus codons using Arg pairs in closely related viruses, finding that there are zero CGGCGG’s out of over 3000 ArgArg’s, indicating at best no tendency for CGG’s to pair and perhaps a tendency not to. In a broader set of relatives, the fraction of ArgArg pairs coded CGGCGG ranges from 0 outside Africa and Asia to 1/10790 in Asia to 1/5493 in Africa.
The probability of finding a CGGCGG in some generic ArgArg pair thus turns out to be very low compared to an estimate of the probability for a synthetic sequence, to be discussed below. The most favorable ZW likelihood then follows a different path, a possibility of which I was initially unaware but which a pseudonymous twitter user pointed out to me. The pattern that Garry noted could be typical for a lab insertion but could also occur by a one-step natural insertion of the whole 12 nt piece. Such large insertions are not very common, but when they do occur they have different codon frequencies than the rest of the virus, since an insertion can be read in a different frame than the source, can be reversed in direction, and has different nucleotide frequencies. Fortunately, an initial tabulation of the fraction of ArgArg’s that would be coded CGGCGG in such random long insertions has just been made, giving 0.0158, much larger than the 0.0007 calculated from the rest of the sequence. Since the appearance of the extra 12 nt piece already strongly suggested that it was a long insert, there is no need to reduce the 0.0158 much to allow for other possible evolutionary paths. We have ln(0.0158) = -4.1, with small uncertainty compared to our upcoming estimate of the corresponding term for the LL account.
We need to compare that with an estimate of P(CGGCGG|LL). Here the argument will be less direct than for P(CGGCGG|ZW), because we don’t have a good extensive comparison set of lab insertions similar to that hypothesized for FCS under ZW. Since we will have to refine our estimate of P(CGGCGG|LL) using synthetic sequences other than viral inserts, it’s important to consider how the optimization criteria vary for different synthetic purposes and how that might affect codon use.
If the LL codon choice were purely random, we’d have P(CGGCGG|LL) = 1/36. When sequences are synthesized for use in hosts, however, they are typically “codon optimized”, using the more common host codons, such as CGG in humans, even more frequently than they are found in the host. CGG codes for 20% of human Arg. Thus a reasonable first minimum estimate of P(CGGCGG|LL) would be 0.2^2 = 0.04. More likely, since the two rarer codons would generally not be used, a good low estimate would be (1/4)^2 = ~0.06.
I found two convenient, relevant examples of how often CGG would be used in modern RNA synthesis for human hosts, specifically stretches coding for portions of the SC2 spike protein used in the Pfizer and Moderna vaccines. Both mRNA vaccines and viral genomes need to be stable in the host organism and to work well at hijacking the host machinery to generate the proteins for which they code, so there’s quite a bit of overlap in the criteria used in choosing codons.
Unlike vaccine mRNA, however, viral RNA also needs to replicate well and to pack well into the viral particle. For our purposes, looking at just a few nt on an insert that already disrupts the previous RNA structure, packing is probably irrelevant. Is there any indication that CGG is thought to be a particularly poor replicator in humans, in which case we should lower our estimate of P(CGGCGG|LL) compared to what’s found in mRNA vaccines? In the years since SC2 started, almost all strains retain CGGCGG, although some synonymous mutations to CGUCGG are now present. Thus there is no indication that a viral sequence designer would have any special reason to avoid CGG for reproductive reasons, so the vaccine coding can give us a rough idea of how likely a CGGCGG choice would be for a synthetic viral sequence.
CGG is used far more often in the Pfizer and Moderna vaccines than in the natural viruses: “The designers of both vaccines considered CGG as the optimal codon in the CGN codon family and recoded almost all CGN codons to CGG.” 19 of 41 Arg codons in Pfizer are CGG, as are 39 of 42 in Moderna. The designers were not inspired to use CGG by its appearance in the FCS on the target protein, since none of the other 40 Arg’s on that protein use CGG. Deigin has pointed out another reason that a researcher inserting coding for ArgArg might specifically choose CGGCGG— it provides a marker for a standard, easy, restriction enzyme test allowing the researcher to know if that insertion is still present or has been lost, an important consideration since FCS’s tend to get lost in cell culture. (AGGCGG would also code for ArgArg and work for the marker.) On the other hand, although both designers were fond of CGG, neither used CGGCGG for the ArgArg pair, indicating that they had some reason to avoid it, perhaps connected to occasional translational errors that might be particularly important to avoid in vaccines although less important for viral fitness. The P(CGGCGG|LL) factor here may go up or down if I can track down why the vaccine designers chose not to use CGGCGG.
The amino acid sequence of the SC2 FCS is identical to a familiar human amino acid sequence that would be a good candidate for use in a furin cleavage site promoting infectivity. In that human FCS sequence the ArgArg pair is coded CGUCGA, which would become CGGCGG either under the choice CGN—>CGG usually used by vaccine coders or to implement the standard tracing procedure described by Deigin.
In the one example of which I’m aware in which a collaborator of the WIV group added a 12nt code for an FCS to produce a viral protein via a plasmid (reminiscent of the 12nt addition in SC2) they only used CGG for one of its three Arg’s. Other plasmid primers from WIV use high fractions of CGG, including CGGCGG dimers, but again these are for plasmid work and thus subject to substantially different optimization criteria.
We can check that we have not missed some important argument that CGG would be disfavored in a lab by reading Andersen’s extensive argument that CGG did not indicate LL. While it presents detailed non-statistical scenarios of how CGG might possibly arise naturally, it makes no mention of any reason why CGG might be disfavored in a lab.
Given the strong indications that CGG is a popular codon for use in synthetic sequences for human hosts, I’ll assume that the purely random 1/36 is the absolute minimum estimate of P(CGGCGG|LL). We’ve seen a couple of plausible though not compelling accounts of why CGGCGG might specifically be chosen. The absolute maximum estimate is of course 1.0. We can then use the geometric mean between those limits as our consensus estimate, 1/6. Using a uniform prior on the log we get ln(P(CGGCGG|LL))= -1.8 ±1.1. Combining with our estimate for ZW gives
L6 = 4.1 - 1.8 = +2.3 ±1.1 giving logit6 = 1.9
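For readers who want to check the bookkeeping, here is the arithmetic of this factor in one place as a small Python sketch. The inputs are the rounded values from the text, and the down-weighting step is the uniform-distribution formula given in Appendix 3, so the outputs match the quoted ~2.3 and ~1.9 up to rounding.

    import numpy as np

    ln_p_zw = np.log(0.0158)                  # long-insertion codon statistics, ~ -4.1
    lo, hi = np.log(1.0 / 36.0), np.log(1.0)  # assumed bounds on ln(P(CGGCGG|LL))
    ln_p_ll = 0.5 * (lo + hi)                 # log of the geometric mean of the limits, ~ -1.8
    s = (hi - lo) / np.sqrt(12.0)             # sd of the uniform distribution, ~ 1.0
    L6 = ln_p_ll - ln_p_zw                    # ~ +2.3

    # Down-weighting with a uniform nuisance distribution (see Appendix 3):
    a = s * np.sqrt(3.0)
    num = np.log((1 + np.exp(L6 + a)) / (1 + np.exp(L6 - a)))
    den = np.log((1 + np.exp(-L6 + a)) / (1 + np.exp(-L6 - a)))
    print(round(float(L6), 2), round(float(np.log(num / den)), 2))  # ~2.3 and ~1.9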
This is far less important than the result I had initially used based on whole-sequence codon frequencies. Statistically alert readers will be suspicious on seeing still another “2.3”, but this time that’s just how it came out.
The DEFUSE proposal mentions plans to modify the N-linked glycans of a natural backbone. The fitness of these glycans depends strongly on the host environment, and SC2 is missing one that is found in its relatives. Further work would be needed to estimate how much that should change the likelihood ratios. It is particularly relevant for the direct bat-to-human route, since that route would require two features (the FCS and the modified N-linked glycan) that are unfit in bats.
New Government Funding Decisions
I have not used official statements of various government agencies so far, primarily because in any country agencies have many motivations other than simply telling the public what they know. They presumably do know some things, however, beyond the public record, and that knowledge can be reflected in their concrete actions. With due allowance for other political motivations, government actions can give some evidence beyond the direct public record.
Two major U.S. agency funding decisions have come out since the first version of this piece. In one, funding for a large USAID program to sample wild viruses internationally was eliminated over concerns about “the relative risks and impact of our programming (including biosafety…)”. Since that program did not directly involve viral modifications, its cancellation reflects more on ZL risks than on LL risks. Now the Department of Health and Human Services (HHS) has banned WIV from receiving funding on the grounds that “WIV conducted an experiment that violated the terms of the grant regarding viral activity, which possibly did lead or could lead to health issues or other unacceptable outcomes.” Despite the delicate language, the concern about possible “unacceptable outcomes” is clear. The detailed account of HHS/WIV interactions makes it clear that WIV’s secrecy about their viral work was intense enough for them to give up a significant funding source, a stronger indication of motivations than merely shutting down some public information. If these funding decisions had been made by political factions committed to an LL account, they would have no significance. Since the current administration has no such commitment, they seem to be good indications that non-public information is consistent with lab-related accounts. I’ll refrain from using them to update our odds for now, since it could be too soon to be confident about what they indicate.
Summing up
Although the point estimate of the likelihood factor ignoring uncertainties would be ~1,100,000, down-weighting due to uncertainties reduces the likelihood factor to ~290,000. Combining that with the point estimate of the prior logit would give rather extreme odds, P(LL)/P(ZW) = ~2900. Integration over the range of plausible priors will bring those odds down substantially. The reason is not hard to see. If our point estimate of the logit, corresponding to P(LL) = ~99.97%, is low, raising it picks up almost no extra P(LL) because it’s already almost 100%. If on the other hand we were to lower our logit point estimate, there is plenty of room for P(LL) to go down.
Let’s estimate how the uncertainty in the log of the prior odds reduces the net odds by trying some probability distributions for that log, fixing the standard deviation at 2.3. Integration (See Appendix 3) for distributions that are Gaussian, fat-tailed 3-degree-of-freedom-t, and uniform gives
Odds = 250/1, 140/1, and 430/1
respectively. The form of the distribution is thus not especially important here. These odds estimates are near the middle of the previous attempts at comprehensive quantitative Bayesian estimates, described in Appendix 1, which gave ~30/1, ~500/1, and 1000/1, although one newer estimate gave ~1/2.
I think roughly 200/1 is conservative because I was fairly conservative about each factor, left out some other factors that might tend to support LL, and allowed reasonable standard errors to further down-weight the likelihood factors. Nevertheless, people tend to underestimate uncertainties, so a reader might well suspect the standard errors should be larger. To get a feel for how robust the results are, we can ask what happens if we double the logs of all the uncertainty-based likelihood discounts, shift the priors to center at 1/1000 rather than 1/100, and use the fat-tailed distribution. The odds would become 20/1. The bottom line is just that LL looks a lot more probable than ZW, with some room for argument about exactly how much more probable.
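For concreteness, here is a minimal Python sketch of the integration just described, following the Appendix 3 procedure of averaging probabilities rather than odds. The inputs are my rounded values (net ln likelihood ratio ln(290,000), prior log odds centered on ln(1/100) with standard deviation 2.3), so the exact outputs shift a bit with the rounding, but they land in the ballpark of the ~250, ~140, and ~430 quoted above.

    import numpy as np
    from scipy import stats
    from scipy.special import expit

    L_net = np.log(290_000.0)           # net log likelihood ratio after down-weighting
    mu, sd = np.log(1.0 / 100.0), 2.3   # center and sd of the prior log odds

    def net_odds(dist, n=2_000_000):
        q = dist.rvs(size=n, random_state=0)   # sampled prior log odds
        p = expit(L_net + q)                   # posterior P(LL) for each sample
        return p.mean() / (1.0 - p).mean()     # average the probabilities, not the odds

    for name, dist in [
        ("Gaussian", stats.norm(mu, sd)),
        ("3-dof t", stats.t(df=3, loc=mu, scale=sd / np.sqrt(3.0))),  # scaled so sd = 2.3
        ("uniform", stats.uniform(mu - sd * np.sqrt(3.0), 2 * sd * np.sqrt(3.0))),
    ]:
        print(name, round(net_odds(dist)))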
Retrospective on methods
How then could so many serious scientists have concluded that P(ZW) is bigger than P(LL) or even that P(ZW) is much bigger than P(LL)? There was of course a great deal of intensely motivated reasoning, as the recently published internal communications among key players vividly illustrate. For those just following the literature in the usual way, the impression left by the titles and abstracts of major publications suggested that ZW had been confirmed, although we’ve seen that the arguments in the key publications disintegrate or even reverse under scrutiny. When major errors were found in the key papers, the authors resisted making even mathematically necessary corrections, in contrast to what I’ve tried to do here.
There has also been a familiar methodology problem among the larger community that accepted the conventional conclusion. Although simple Bayesian reasoning is often taught in beginning statistics classes, many scientists have never used it and fall back on dichotomous verbal reasoning. The initially more probable story, ZW in this case, is given qualitatively favored status as the “null hypothesis”. Each individual piece of evidence is then tested to see if it provides very strong evidence against the null. If the evidence fails to meet some high threshold, then the null is not rejected. It is a common error to then think that the null has been confirmed, rather than that its probability has been reduced by the new evidence. After a few rounds of this categorical reasoning, one can think that the null has been repeatedly confirmed rather than that a likelihood ratio strongly favoring the opposite conclusion has been found.
What should be done?
Despite prior probabilities favoring zoonosis we have seen that after evidence-based updating the odds strongly favor a lab leak origin. Thus it was wrong to dismiss prior warnings of lab risks. How might that inform our actions?
Blaming China is about the most counterproductive possible reaction. The lead Proximal Origin author, Andersen, alluded to the dangers of such blame when on 2/1/2020 he asked his colleagues: “Destroy the world based on sequence data. Yay or nay?” We’ve now seen what the sequence data say but we don’t want to destroy the world— just the opposite. We need to regulate pathogen research in ways that avoid the most dangerous work while expanding work needed to develop vaccines and therapies. No new ideas are needed for the guidelines, since in 2018 Lipsitch already outlined exactly the sort needed to achieve those goals. Meanwhile, paying attention to lab risks cannot be an excuse to ignore ongoing zoonotic risks, since even if this pandemic probably came from a lab we know that others have been zoonotic.
Reflection
None of the three clear existential threats to humanity– global warming, new pathogens, and nuclear war– can be addressed without science. I think that some public trust in science is a necessary though not sufficient condition for successful defenses against those threats. For example, public awareness of the scientific conclusion that SC2 mainly spreads by aerosols and of the value of indoor air filtering would have limited and still could limit the disease burden. When scientists are not candid about what we know we undermine the necessary public trust.
Appendix 1: Previous Bayesian analyses
Tuntable has written a summary of a variety of lines of evidence, for the most part parallel to the argument I’ve presented. The spirit is Bayesian but few of the pieces of evidence are assigned quantitative values. The conclusion is similar to the more explicitly Bayesian analyses.
Demaneuf and De Maistre’s Bayesian analysis, written before either DEFUSE or the WIV sampling in Laos was known and omitting sequence considerations, provides a useful introduction to the form of the arguments, as well as detailed analyses of the priors. Readers who find something confusing about the basic reasoning may find their “rebuttal of common misunderstandings” particularly useful.
A brief Bayesian analysis by J. Seymour only considering priors and geographical factors (like my early one) came out in Jan. 2021. It considers a range of possible values obtaining estimates of lab leak probability ranging from 0.05% to 91%. The biggest difference from my current analysis is that Seymour uses no biological data, but he also mostly uses lower priors, without empirical explanation.
The first fairly comprehensive Bayesian analysis that took geographical, biological, and social factors into account came out in 2020 from “rootclaim”, a project led by Saar Wilf. It concluded that some lab event is about thirty times as likely as a pure zoonotic wildlife scenario. That analysis contains a wealth of useful references and discussion but is a bit out of date and uses an obscure method of accounting for uncertainties in the factors.
An extraordinarily detailed analysis from early 2021 by S. Quay concluded that the probability of a lab leak origin was 99.8%, i.e. 500 times as likely as pure zoonosis. (I had forgotten hearing of Quay’s paper until after I finished the core analysis of this paper, so the detailed analyses are independent.) Although there is overlap with my analysis, Quay’s mathematical treatment does not follow a systematic logical system, as Andrew Gelman noted.
Louis Nemzer tweeted an analysis on 10/28/2021 that used straight Bayesian methods rather than robust Bayes, i.e. did not include uncertainties on the factors. This analysis is particularly compact and easy to follow. It includes priors that are somewhat less favorable to LL than mine, no factor for the timing coincidence, a large factor that I don’t use for the existence of the FCS, and a larger factor for the CGGCGG. He does not include factors for non-observation of hosts or for pre-adaptation. Nemzer ends up with 1000/1 odds favoring LL. Those odds are similar to those I get before averaging over the distribution of plausible priors.
An anonymous twitter user posted a brief Bayesian evaluation on 6/20/2022 with considerable overlap with mine, also concluding that a lab leak was much more probable than competing hypotheses. They used the presence of the FCS in a way that I think is not justified, and they do not use some other details of the genomic sequence that I find to be important.
In Nov. 2022 Alex Washburne posted a well-written Bayesian analysis that includes several pieces of useful auxiliary information (e.g. alternate funding sources for the work) that I do not cover here. He does not provide a numerical summary, but implies odds stronger than I obtain. As in most other analyses, he uses the existence of the FCS as evidence in a way that I argue fails to condition on the existence of a pandemic. My timing coincidence factor captures the rarity of FCS occurrences to the extent that I consider valid.
Uniquely, Washburne considers the pattern of segments that would be defined by cutting with the restriction enzymes BsaI/BsmBI that WIV used in previous work. He shows that ten synthetically assembled coronaviruses show a predictable restriction enzyme segment pattern, with only 5-8 segments and with the maximum segment length being about 8 knt. These features make sense because using more segments in assembly is of course harder, and commercial segment generators show major price increases for segments longer than 8 knt. SC2 lands right in the middle of the synthetic range with 6 segments, the longest being just under 8 knt. Of the related natural sequences they show, only 2 out of 42 land in the synthetic range, although the 42 sequences look like they only represent about 22 independent types. Thus at first glance it appears P(BsaI/BsmBI segment pattern|ZW) = ~0.1. Whether that likelihood remains about the same when conditioning on knowing the close relatives of SC2 is currently under discussion. P(pattern|LL) is also unclear, since although all synthetic sequences use similar numbers and maximum lengths of segments, not all leave traces of the restriction sites in the final sequence. Also, the probability that WIV would stick with the BsaI/BsmBI restriction enzyme pair may be fairly large but must be a bit less than one. Furthermore, the priors for LL are only partly for overall synthesis, with the rest being for simple FCS addition. Thus although P(BsaI/BsmBI segment pattern|LL)/P(BsaI/BsmBI segment pattern|ZW) is probably noticeably bigger than 1, it seems best not to use it unless those issues are clarified a bit more.
David Johnston has posted a Bayesian spreadsheet with odds currently (1/6/2023) favoring zoonotic origins. When first posted it omitted any LL-favoring factor for missing hosts on the grounds that the disease could have come straight from bats but inconsistently also included a ZW-favoring factor due to proximity to HSM non-bat hosts. The CGGCGG factor was mentioned but not included. Repairing these two flaws would have raised the odds to moderately favoring LL, by a little more than 5/1 using his conservative estimates. The newly posted version, however, includes several other changes each favoring ZW, so that the estimated probability of ZW only drops from 0.7467896541 to 0.6781080632.
For now I will mention only a few of the problems I see with Johnston’s analysis. One of the new changes was to reduce an FCS factor from 5.0 to 2.5, with no explanation, thus minimizing changes in the bottom line. A factor of 1.5 favoring ZW is now included based on the lack of an explicit statement in FOIA documents acknowledging a lab leak. I think that is a peculiar way to interpret statements such as Andersen’s “that the lab escape version of this is so friggin’ likely to have happened because they were already doing this type of work and the molecular data is [sic] fully consistent with that scenario” and Daszak’s ” having [the sequences] as part of PREDICT will being [sic] very unwelcome attention…” as well as other statements along the same lines. Johnston includes a factor of 8 from HSM/Worobey effects, including a factor of 2.5 favoring ZW due to “proximity to wild animals” inside HSM. Given that SC2, unlike other coronaviruses, failed to show any association with potential host species, that seems like another peculiar read. The contrast of seeing those data as positive evidence of association while seeing the FOIAs as negative evidence for lab problems is striking. There are numerous other issues, including some double-counting, but these give a flavor for why Johnston’s results differ from the others.
Another anonymous twitter user has posted a handy Bayes calculator that readers can use to make their own estimates. It is suited only for straight Bayes calculations. In order to realistically allow for uncertainty in the factors users will need to try various combinations of plausible values and then take a weighted average of the resulting probabilities, not of the resulting odds, to get their best odds estimate. Averaging the odds themselves gives an over-estimate.
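A toy example, with made-up numbers, of why probabilities rather than odds should be averaged:

    # Two equally plausible parameter choices giving P(LL) = 0.9 and P(LL) = 0.5.
    p = [0.9, 0.5]
    mean_p = sum(p) / len(p)                     # 0.7
    print(mean_p / (1 - mean_p))                 # ~2.3, the sensible net odds
    print(sum(x / (1 - x) for x in p) / len(p))  # (9 + 1)/2 = 5, an over-estimate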
Appendix 2: Worobey et al. and Pekar et al.
Worobey
Arguments about how much weight to put on anecdotal evidence for strong proximity-based ascertainment bias are unlikely to be persuasive. I have noticed, however, that the Worobey et al. paper includes internal evidence that shows conclusively that there was major proximity ascertainment bias. The argument that follows is my only original contribution to the origins dispute.
Let’s consider two hypotheses, W and M. W is that all the cases ultimately come from the HSM and that fact accounts for the observed clustering near the HSM, with no major ascertainment bias. M is that the proximity ascertainment bias is too large to allow inference about the original source from the location data. These hypotheses have opposite implications for the correlation between detected linkage to HSM and distance from HSM.
For hypothesis W there is some typical distance from HSM to a linked case. An unlinked case must come from a linkable case (typically not observed) via some additional transmission steps in which the traceability is lost. The mean-square distance (MSD) to HSM of the unlinked cases would then be the sum of the MSD of the linked cases and the MSD of the remaining steps in which traceability was lost. Barring some peculiar contrivance, the unlinked cases are then on average farther away from HSM than the linked cases. The linkage-distance correlation is negative.
For hypothesis M case observation can arise either via linkage or proximity. Some cases are found by following links, others by scrutiny near HSM. Observation is a causal collider between linkage and proximity: linkage→observation←proximity. Within the observed stratum of cases collider stratification bias then gives a negative correlation between linkage and proximity, i.e. a positive linkage-distance correlation.
The relevant observational results are reported clearly by Worobey et al.: “(ii) cases linked directly to the Huanan market (median distance 5.74 km…), and (iii) cases with no evidence of a direct link to the Huanan market (median distance 4.00 km…)…. The cases with no known link to the market on average resided closer to the market than the cases with links to the market (P = 0.029).” The statistical significance of the deviation from the W hypothesis is even stronger than the “P = 0.029” would indicate, since that calculation was for the hypothesis of no difference, but W implies a noticeable difference of the opposite sign. Thus the W hypothesis is strongly disconfirmed by the data of Worobey et al. The sign of the linkage-distance correlation instead agrees with the M hypothesis, that there is substantial proximity-based detection bias.
Worobey et al. do propose a distinction between types of linked cases that could in principle lead to a difference in distances between linked and unlinked cases. They point out that cases linked because the patient (or a linkable contact) worked at HSM would typically be farther away than ones linked via shopping at HSM. They do not directly explain how that heterogeneity within the linked cases would give a distance contrast between linked and unlinked cases. In order for that difference to show up as the observed effect one would need to add another assumption: that it would be harder to trace secondary connections to the nearby shoppers. No explanation is given for why such an effect would be expected or what sign it would be expected to have. One might expect that among people near the market it would be easier to find contacts with neighbors with linked cases. A second-order explanation along these lines remains in the realm of possibility, although it is more speculative than the simple first-order observational collider bias.
The existence of major proximity detection bias should not be taken to imply that there is no actual clustering of the unlinked cases. It just means we have no reliable way of knowing if there is.
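The collider effect at the heart of hypothesis M is easy to reproduce in a toy simulation. In the Python sketch below every number is invented for illustration: case locations and linkage are generated independently, and the only distance effect is proximity-biased detection, yet among the observed cases the unlinked ones end up closer to the market, just as in the Worobey et al. data.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    dist = rng.exponential(5.0, n)   # case distance from HSM in km (arbitrary scale)
    linked = rng.random(n) < 0.3     # linkage assigned independently of distance
    p_detect = np.exp(-dist / 3.0)   # proximity-biased chance of being noticed
    observed = linked | (rng.random(n) < p_detect)  # the collider: linkage OR proximity

    print("median distance, observed linked:  ", round(np.median(dist[linked & observed]), 2))
    print("median distance, observed unlinked:", round(np.median(dist[~linked & observed]), 2))
    # The unlinked-but-observed cases sit much closer to the market even though
    # no true clustering near the market was built into the model.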
Pekar
Although as we’ve seen the question of whether there were two spillovers or one has little or no general direct relevance to LL vs. ZW, the absence of the more ancestral lineage A from HSM fits poorly with the HSM version of ZW unless there were multiple spillovers. Thus despite its limited relevance a good deal of attention has been paid to the argument of Pekar et al. that there were two spillovers.
Pekar et al. use a Bayesian calculation to infer the probability that there were two spillovers rather than one based on the later phylogenetic pattern. Although the Pekar et al. phylogeny contradicts analyses based on more complete data, it’s worth looking at it in detail just to get a further feel for the reliability of major work in this field. We’ve seen that calculating Bayesian odds involves both picking priors and calculating likelihood ratios. The priors used by Pekar et al. are peculiar even on superficial inspection, though not by a huge factor, while the likelihood ratios used have multiple serious mathematical errors.
The Bayesian analysis of Pekar et al. calculates conditional probabilities of different observations for the N=1 and N=2 hypotheses, with more specific results required for N=1. Specifically, only the N=1 hypothesis is required to give “the mutation separation and relative clade size”. That imbalance is certain to bias the results toward N=2, but by how much is not known.
Three pubpeer analyses find multiple errors in the code used to calculate the likelihood ratio. One error seems to be due to a simple copy-paste mistake. The next is somewhat more conceptual, an incorrect normalization of the likelihoods. Together those two “combined corrections reduce the Bayes factors from ~60 to less than 5.” The third is a double-counting error: “Removing the duplicated likelihoods reduces the Bayes factors by a further ~12%.” A numerical correction for these three coding errors has belatedly been included in the Science paper, although without changing any verbal results or acknowledging that the errors were discovered by a pubpeer contributor. In order to verbally accommodate the reduction in the Bayes factor from ~60 to ~4.3, the revised version drops the minimum cutoff for “significance” from 10 to 3.2. The full story is recounted by Demaneuf.
Oddly, after much complicated error-prone model-dependent analysis of the likelihood ratio for two spillovers vs. one spillover, the prior odds were just arbitrarily assigned to be 1.0. (See page 13 of the Supplement to Pekar et al.) In effect the prior probabilities used for N, the number of successful spillovers, were P(1) = 1/2, P(2) = 1/2, P(3) = 0, P(4) = 0, etc. Let’s assume, pretty realistically, a Poisson distribution for N with expectation value x. There is no value of x, nor is there any probability distribution of x, that leads to the set of prior probabilities used by Pekar et al. Thus it looks like a post-hoc attempt to inflate the prior probability of N=2.
We don’t know x but it can’t be very small because then no spillovers would have been found or very big because then even more than two would have been found. A standard non-informative form for the prior probability density function of x is 1/x. Its integral diverges weakly but that divergence will not affect the odds. We can then easily integrate the Poisson probabilities over x to get the prior odds, P(N=2)/P(N=1) = 1/2. These conventional non-informative priors would reduce the resulting posterior Bayes odds from ~4.3 to ~2.2. It is peculiar that the paper did not use such a conventional exercise to obtain the prior odds without post-hoc adjustment.
Since we require N>0 to observe a pandemic, the probability density of x for observed cases excludes the N=0 cases, leaving a distribution of the form (1-e^-x)/x, eliminating the small-x divergence. This does not change the odds since those excluded N=0 cases did not contribute to the odds. Extension of this method to higher N gives a very weakly divergent sum of probabilities that stays finite if truncated, e.g. at N = the population of Wuhan.
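These prior odds can be checked with one-line integrals; here is a sketch using numerical quadrature. The 1/x prior is improper, but its products with the Poisson probabilities converge for N >= 1, so the small cutoff below is only a numerical guard.

    from math import exp, factorial
    from scipy.integrate import quad

    def prior_weight(N, eps=1e-9):
        # integral of Poisson(N|x) * (1/x) dx = integral of x**(N-1) * e**(-x) / N! dx
        return quad(lambda x: x ** (N - 1) * exp(-x) / factorial(N), eps, 200.0)[0]

    print(prior_weight(2) / prior_weight(1))  # ~0.5, i.e. prior odds P(N=2)/P(N=1) = 1/2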
The pseudonymous author who posted the major coding errors (now acknowledged) on pubpeer now reports that there is another small error in the same direction. Even without correcting for that error, rerunning the code reduced the likelihood ratio of the main estimate to 3.5, which would give odds of 1.8 when combined with conventional priors. 1.8 is less than the revised significance cutoff of 3.2. Of course, it would always be possible to choose a lower cutoff if needed. That 1.8 still includes no correction for the fundamental imbalance in the conditional likelihood conditions used for N=1 and N=2.
The remaining small effect (of unknown direction) depends critically on the quality of the epidemiological model. Fundamental problems with the model have been noted. The simplifications used in the model have been described as strongly inappropriate for SC2, with arbitrary and likely unrealistic probability distributions of spreading events. As S. Zhao et al. wrote re Pekar et al.: “In the coalescent process of their simulations, they assumed that viruses spread and evolve without population structure, which is inconsistent with viral epidemic processes with extensive clustered infections, founder effects, and sampling bias”. Omitting the blotchy population structure of transmission and blotchy ascertainment probability in location and time severely loosens the connection between the model and real data. Since having a non-ascertained early phase in a distinct population (other host species) introduces just the sort of realistic elements that were left out of the over-simplified model, it is no surprise that the model fit could be improved by inclusion of these features through the hypothesized multiple spillover. These model problems affect not only the topology odds but also the estimated time to the MRCA.
Whether a properly done version of the Bayesian modeling exercise in Pekar et al. would leave P(N=2) or P(N=1) larger is not clear, although it is clear that N=2 could not be strongly favored. Since whether N=1 favors LL or ZW is also unknown, this conclusion would not lead us to change our P(LL)/P(ZW) odds even if the phylogeny account had not been contradicted by ones based on more complete data.
With regard to the MRCA, Pekar et al. point out that most reversionary mutations (regardless of which MRCA is picked) are of the C→T form. 19 distinct such mutations were found. Only 4 other distinct reversionary mutations from lineage B were found, or 3 from lineage A. The C→T mutations often showed up more than once–I count 41 occurrences in their Figure 1. Non-reversionary mutations also seem to sometimes occur more than once, so the total number is probably more like 800 (too hard to read in the figure) than the 654 distinct ones. Since lineage A differs from B by a C→T and one other mutation (coincidentally T→C), the likelihood factor favoring A being ancestral to B over the reverse is ~20*200 = ~4000, substantially more than we found without distinguishing between different reversions. Thus following Pekar et al. in distinguishing between C→T and other reversions strengthens the case that a pure B spillover is highly improbable.
One sequence listed differs from B by only 4 reversions, 2 C→T’s and 2 others. The probability that if B were the MRCA one sequence of the ~800 early ones shown would show a difference of that sort is in the ballpark of
800*(4 choose 2)*(41/800)^2*(4/800)^2 = ~3*10^-4. Even allowing for the post-hoc choice of features, i.e. some potential multiple comparisons, this is a low probability. If lineage A were the MRCA, the corresponding probability would be
~800*(2 choose 1)*(41/800)*(4/800) = ~0.4, entirely unremarkable.
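For convenience, here is the reversion arithmetic of the last few paragraphs in one place, as a sketch using the rough counts read off Pekar et al.’s Figure 1:

    from math import comb

    c_t, other, total = 41 / 800, 4 / 800, 800   # C->T and other reversion rates

    # Likelihood factor favoring A ancestral to B over the reverse: the reverse
    # requires one C->T reversion (~1/20) and one other reversion (~1/200).
    print(round((1 / c_t) * (1 / other)))          # ~4000

    # P(some sequence among ~800 differs from a lineage-B MRCA by
    # 2 C->T reversions plus 2 other reversions):
    print(total * comb(4, 2) * c_t**2 * other**2)  # ~3*10^-4

    # The corresponding probability if lineage A is the MRCA (1 of each kind):
    print(total * comb(2, 1) * c_t * other)        # ~0.4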
Once again, including Pekar et al.’s emphasis on the role of C→T reversions makes a pure-B spillover even less plausible. Although their case for a high probability of two spillovers disintegrates under inspection, there’s no particular reason to say that two spillovers have a very low probability and thus no phylogenetic reason to discount the possibility of an HSM spillover much further. The tiny fraction of the wildlife trade that went through Wuhan gives a much larger factor without any subtle complications.
Pekar et al.’s reversion statistics also help make the Sangon sequences more interesting. The probability that of the 14 Sangon mutations (from lineage B) at least 2 would be C→T reversions and at least 1 another reversion is only ~1.2%. (If any are just misreads, that percentage drops further.) The probability that those 3 reversions would exactly match the Kumar MRCA rather than the other 20 known early reversions is of course lower. That match cannot be fully explained simply by saying that the earliest mutations determined Kumar’s choice of MRCA, since in order to avoid distortion from time-dependent ascertainment probability the Kumar modeling explicitly did not use the observation dates of early sequences. I’m not sure how much these considerations would weigh in deciding whether Sangon had a lab MRCA or a collection of very early clinical sample cultures. They certainly do not support the Pekar et al. phylogeny claims.
A recent talk by Worobey repeated the Pekar et al. errors and added an additional fundamental one. He described their calculated probability of two spillovers as having been 99.5%, i.e. 200/1 odds. Actually the paper itself, on which he was a coauthor, had originally given 60/1; his 200/1 apparently came from using a single likelihood rather than a ratio of likelihoods, a truly fundamental error. He then corrected those odds to 30/1, i.e. acknowledging the factor-of-6 coding error but sticking with the fundamental misunderstanding of how to get odds from likelihoods. He included no correction for either of the other two acknowledged coding errors, for the peculiar priors, or for the unbalanced outcome requirements for the two hypotheses. In the bulk of the talk, focusing on location data, no mention was made of the early Weibo map or other evidence undermining or even contradicting the argument for the HSM account.
This talk is important not for our odds calculation but rather for understanding the level of alleged science underlying the canonical account. Whatever may become of my odds estimates in the light of new evidence and new reasoning, the conclusion should hold that the key arguments on which the zoonotic view currently rests are shoddy at best.
Appendix 3: Calculations
The calculations here are not intended to imply unrealistic precision. They are meant simply to use defined logical algorithms to avoid unnecessarily adding even more subjective steps.
To estimate the expected ln(likelihood) and its variance for an event based on observing it M times out of N trials, I subjectively assume a uniform prior on the probability, x, for not finding a host when there actually is one, giving analytically solvable integrals:
For N=2, M=0 we get <ln(likelihood)> = -1.83 with standard error of 1.17.
For N=4, M=1 we get <ln(likelihood)> = -1.28 with standard error of 0.68.
For N=7, M=2 we get <ln(likelihood)> = -1.22 with standard error of 0.53.
(I just found out that this particular exercise is close to the first posthumously published example worked out by Bayes.)
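For readers who want to reproduce these integrals, here is a minimal Python sketch. It assumes that, with the uniform prior, the posterior for x after M events in N trials is Beta(M+1, N-M+1), whose ln moments are standard digamma identities.

    from scipy.special import digamma, polygamma

    def expected_ln_likelihood(N, M):
        a, b = M + 1, N - M + 1   # Beta posterior from a uniform prior
        mean = digamma(a) - digamma(a + b)                   # <ln x> under Beta(a, b)
        sd = (polygamma(1, a) - polygamma(1, a + b)) ** 0.5  # sd of ln x
        return mean, sd

    for N, M in [(2, 0), (4, 1), (7, 2), (13, 0)]:
        mean, sd = expected_ln_likelihood(N, M)
        print(f"N={N}, M={M}: <ln(likelihood)> = {mean:.2f} +/- {sd:.2f}")
    # The last line reproduces the ln(P(FCS|ZW)) = -3.25 used for the FCS factor.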
Likelihood weighting and final odds
At several steps we need to convert a distribution of logs of probability ratios to net odds. This is the key step in down-weighting likelihood ratios to take into account uncertainty and in combining the distribution of plausible priors with the net likelihood ratio to get the net odds.
For each likelihood ratio we can represent the uncertainties by a nuisance parameter qi with a prior probability density function f(qi) with mean zero and standard deviation si = Vi^(1/2), such that in a hypothetical perfectly specified model ln(P(obsi|LL)/P(obsi|ZW)) = Li + qi. These uncertainties are important because our logit is the log of the likelihood ratio obtained by averaging probabilities over the distribution, which is not the average of the log of the likelihood ratio, Li.
We want a simple weighting function that captures the key qualitative features, going to 1 when Vi = 0, to zero for large Vi, always contributing to the net likelihood with the same sign as Li but never contributing more than Li. The weighting procedure used here will be to calculate the expected odds obtained from the distribution of likelihood ratios starting with prior odds of 1.0, i.e. logiti = ln(<Pi>/<1-Pi>), where Pi = 1/(1 + e^-(Li+qi)) and <…> denotes an average over f(qi).
There is no reason to think that this weighting procedure is optimal for a general case, but it’s adequate for the fairly small corrections needed here in our crude model.
For small Vi the result is only weakly sensitive to the form of the distribution of qi. To lowest order in Vi the effect becomes logiti = Li - 0.5*Vi*tanh(Li/2). For large Vi that lowest-order approximation overstates the correction, but the result can be obtained directly from the integrals if a form is assumed for f(qi), e.g. Gaussian or uniform. For a uniform distribution there’s an analytic expression,
logiti = ln( ln((1 + e^(Li + si*3^0.5))/(1 + e^(Li - si*3^0.5))) / ln((1 + e^(-Li + si*3^0.5))/(1 + e^(-Li - si*3^0.5))) ). For the CGGCGG factor I used the same uniform distribution described in the text.
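Here is a Python sketch of this weighting step, applied to the factors estimated in the main text; both the lowest-order form and the exact uniform-distribution form land near the quoted logits (1.2, 1.0, -1.3, 1.9).

    import numpy as np

    def logit_lowest_order(L, s):
        return L - 0.5 * s**2 * np.tanh(L / 2.0)

    def logit_uniform(L, s):
        a = s * np.sqrt(3.0)   # half-width of a zero-mean uniform distribution with sd s
        num = np.log((1 + np.exp(L + a)) / (1 + np.exp(L - a)))
        den = np.log((1 + np.exp(-L + a)) / (1 + np.exp(-L - a)))
        return np.log(num / den)

    for name, L, s in [("L3", 1.3, 0.6), ("L4", 1.1, 0.5),
                       ("L5", -1.6, 0.9), ("L6", 2.3, 1.1)]:
        print(name, round(float(logit_lowest_order(L, s)), 2),
              round(float(logit_uniform(L, s)), 2))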
For the last step we combine the prior distribution with the log of the net likelihood ratio, L, the sum of the likelihood logits, to obtain the odds.
To reiterate, our method of taking uncertainties into account reduces the odds favoring LL, both for uncertainties in likelihoods and for uncertainty in the priors.
Appendix 4: FCS uses
The FCS appears at several points in the argument, so it may help to clarify in what ways it is used and in what ways it isn’t used.
Although some have argued that having an FCS is very unlikely for this type of coronavirus, that low likelihood may not apply once one remembers the precondition that we wouldn’t be discussing this virus if there weren’t a pandemic, for which the FCS may be nearly necessary. I’ve now modified that somewhat, because many non-bat species host sarbecoviruses without any FCS even though having an FCS seems evolutionarily advantageous for the respiratory versions outside bats.
Wuhan is not the only place where pathogen research is done, so a priori it would be an exaggeration to say P(Wuhan|LL, pandemic) = ~1. However, the DEFUSE proposal to add an FCS to coronaviruses, along with the other proposed DEFUSE features found, strongly indicates that if SC2 originated from a lab, it would be one doing the DEFUSE-proposed work. The site mentioned in DEFUSE for adding an FCS to a coronavirus, UNC, is smaller and uses highly enhanced BSL-3 protocols. After DEFUSE was not funded, switching this part of the work to WIV, where there was already expertise in the methods, would have been easy. A note from a lead investigator, Peter Daszak, to the NIH about earlier work had assured them in 2016 that “UNC has no oversight over the chimera work, all of which will be conducted at the Wuhan Institute of Virology.” Recently released notes from DEFUSE investigators describe plans to conduct much of the research nominally slated for BSL-3 at UNC instead at BSL-2 in Wuhan. While the chance of a spillover occurring at UNC isn't zero, it's much lower than for WIV. Thus
P(Wuhan|LL, coronavirus with FCS, etc.) = ~1.
The detailed contents of the FCS, the CGGCGG sequence, provide one significant piece of evidence used, since P(CGGCGG|LL) >> P(CGGCGG|ZW).
Deigin points out that the FCS in SC2 occurs exactly at the S1/S2 junction, an obvious place for a DEFUSE-style insertion. A recently released early draft of DEFUSE (before compression to meet space limits) specifically mentions the S1/S2 boundary as a target for a cleavage-site insertion, by sequence location number rather than by name. Since that is an evolutionarily advantageous location, it might provide only a small update factor favoring LL, which I don't use.
The S2 neighborhood of the FCS, differing from related viruses only by synonymous mutations, has been cited as evidence for LL because it looks peculiar under ZW but not under LL, as in the Garry quote above. The initial post-spillover strains lacked a mutation called D614G that becomes advantageous specifically to compensate for some effects of the FCS. D614G arose quickly to predominate in multiple lines of SC2 as it spread in humans. The combination of the FCS coding, the lack of amino acid changes in S2, and the initial absence of D614G all indicate that the outbreak started not very long after the FCS was inserted, whether naturally or in a lab.
The picture of a quick route to human spillover after FCS insertion is easily consistent with LL. It fits well with only a particular subset of the zoonotic hypothesis.
Appendix 5: Research spillover of a zoonotic virus
So far I have just ignored the ZL account of a virus that formed naturally but successfully spilled over into humans via research activities. Since the likelihood factor favoring lab modification from the CGGCGG coding is smaller than I initially thought, it’s worth having a quick look at the P(ZL)/P(ZW) odds as well.
Several of the features that we have noted could fit together in a zoonotic picture qualitatively different from the bat → wildlife → market → human version usually considered. The evidence described in Appendix 4 requires that there was only a short interval between the FCS insertion and the spillover, perfectly consistent with LL but perhaps also with a particular zoonotic account. The reports of Laotian BANAL bat sarbecoviruses with good human ACE2 binding but lacking an FCS suggest a way of getting good preadaptation while skipping intermediate wildlife hosts altogether. A person in Laos could have become directly infected with a BANAL-related bat virus that contained a small trace of FCS variants, too little to detect in consensus sequencing tests, before those variants were lost due to their lack of fitness in bats. With some luck, the virus might survive long enough for those few FCS-containing virions to become the main strain in the human host. The disintegration of the evidence for an HSM spillover would not be surprising in this zoonotic story, since HSM would have no initial role to play.
The direct-from-bat accounts (whether natural or via research) require threading an especially narrow path to spillover: an FCS insert and a properly ablated N-linked glycan would need to appear almost simultaneously in a virus that already happened to have an RBD well-adapted to humans. The absence of any FCS-containing sarbecoviruses in any species indicates that such coincidences would be very rare even after allowing for post-selection for successful replication outside bats. I do not think that coincidence is nearly as probable as the LL one, that there was a leak from DEFUSE-like work, perhaps done using a BANAL-related pre-FCS backbone. It would be worth further investigating the occasional successful spillovers from bats to respiratory infections in other species to quantify how narrow that path is.
If a direct spillover from a bat did nonetheless occur, the remaining issue would then be how that infection got from Laos or nearby to Wuhan without leaving a trace. This is where ZL accounts become most relevant.
For P(ZL)/P(ZW) odds we can start with Demaneuf and De Maistre’s conservative estimate, predating DEFUSE and some other relevant evidence:
P(ZL|Wuhan)/P(ZW|Wuhan) = ~1.2.
The paper also includes a “base” estimate, reflecting the authors’ best estimated factors rather than especially conservative ones, and a “de minimis” estimate using the most extreme estimated factors, giving 4/1 and 1/15, respectively. The conservative estimate assumes that the work was being conducted at BSL-3. Since much was actually done at BSL-2, it should be increased by some significant factor, I’d guess more than x2, as in the authors’ less conservative estimate.
The continued absence of any detected intermediate host, including any human hosts, between the possible spillover and Wuhan plays about the same role in enhancing the odds for ZL vs. ZW as it does for LL. ZL could provide a simple one-step route for the virus to get from a possible spillover source in or near Laos to Wuhan, since in Aug. 2019 WIV and Daszak submitted a publication describing the partial sequence of a bat coronavirus they had gathered in Laos. A researcher could have been infected while gathering samples or after bringing samples back to Wuhan.
P(no host|ZL)/P(no host|ZW) = ~4.
Timing may be less coincidental for ZL than for LL, since work on bat coronaviruses had been going on at WIV for at least several years. That Aug. 2019 date for publishing a Laotian sequence may, however, suggest that some timing coincidence factor could also favor ZL over ZW.
Pre-adaptation and the CGGCGG coding seem approximately irrelevant to these P(ZL)/P(ZW) odds. The straight Bayes odds (not integrating over uncertainty in parameters) would then be something like 10 or more, e.g. the conservative base of 1.2 times the BSL-2 factor of at least 2 times the no-host factor of 4 gives ~10. Allowing for uncertainties would pull that back part way toward 1. ZL seems more probable than ZW even though some of the factors pointing toward LL are not relevant to it.
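For concreteness, here is a rough numerical sketch of that combination using the same uniform-distribution down-weighting as in Appendix 3; the uncertainty values below are hypothetical placeholders, not estimates from the text.

    import numpy as np

    def weighted_logit(L, s):
        # Same expected-odds weighting as in Appendix 3 (uniform f(q)).
        h = s * np.sqrt(3.0)
        num = np.log((1 + np.exp(L + h)) / (1 + np.exp(L - h)))
        den = np.log((1 + np.exp(-L + h)) / (1 + np.exp(-L - h)))
        return np.log(num / den)

    factors = [1.2, 2.0, 4.0]  # conservative base, BSL-2 correction, no-host factor
    print(np.prod(factors))    # straight Bayes odds, ~10
    s_hyp = [0.5, 0.7, 0.7]    # hypothetical uncertainties on each ln(factor)
    L_net = sum(weighted_logit(np.log(f), s) for f, s in zip(factors, s_hyp))
    print(np.exp(L_net))       # down-weighted odds, pulled part way back toward 1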