Reposting one of @Daoyu15's tweet threads, archived before the account was suspended, with observations about SARS-CoV-2 mutagenesis and the CGG codon. (1/26)


The CGG codon is the best translated Arg codon in humans. It is a poorly translated one in bats and/or civets. In addition, CoV genomes change nucleotides by transitions, primarily through the action of APOBEC and ADAR, but with 2 “G”s, ADAR can no longer edit it. (2/26)


APOBEC can change the C into a U, but the codon won’t code for arginine any more. In addition, antisense APOBEC will turn a CGG into a CAG, which, like TGG, doesn’t code for Arg any more. There exists no transition or deletion pathway that can safely remove the CGG (3/26)


without causing collateral damage to the rest of the S1-S2 sequence. It is stuck there: any detrimental selection will suppress the virus as a whole without actually allowing changes to the CGG-CGG sequence. This is intended to be a self-limiting mechanism as it is both (4/26)


awkward, restricts pathogenesis and cannot be wiped away by ordinary selection alone. The RdRp cannot bring itself to delete fewer than 15 nt in the S1-S2 junction, which prevents the sequence from getting deleted trivially. GpC is not an editing site of APOBEC. (5/26)
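
The codon-level claims above (C->U gives TGG, antisense editing gives CAG, neither codes for Arg) can be checked with a small enumeration. This is only a sketch, not part of the original thread: it uses the standard genetic code in the DNA alphabet, the function name `single_substitutions` is made up for illustration, and it looks at one isolated CGG codon, so it knows nothing about APOBEC/ADAR sequence-context preferences or RNA structure.

```python
from itertools import product

# Standard genetic code, DNA alphabet, in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("C", "T"), ("T", "C"), ("A", "G"), ("G", "A")}


def single_substitutions(codon):
    """Yield (mutant, kind, amino_acid, keeps_cpg) for every one-base change."""
    for pos, new in product(range(3), BASES):
        old = codon[pos]
        if new == old:
            continue
        mutant = codon[:pos] + new + codon[pos + 1:]
        kind = "transition" if (old, new) in TRANSITIONS else "transversion"
        # keeps_cpg only looks inside the codon, not at neighbouring codons.
        yield mutant, kind, CODON_TABLE[mutant], "CG" in mutant


rows = list(single_substitutions("CGG"))
for mutant, kind, aa, keeps_cpg in rows:
    print(f"CGG -> {mutant}  {kind:12}  {aa}  CpG kept: {keeps_cpg}")
```

In this enumeration, the only two transitions that remove the codon's CpG are CGG->TGG (W) and CGG->CAG (Q), which matches the thread. The remaining Arg-preserving single changes either keep the CpG (third position) or are a transversion (CGG->AGG).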


CpG site that will not change the codon to another Arg codon in either the sense strand or the antisense strand, nor does it have any UpA site at all. None of the CpGs can be edited while still coding for Arg, and there was no UpA site, nor will there be any even after RNA (6/26)


editing by APOBEC. TCGGCGGG->TCAGCGGG or TCGGCAGG->TCAGCAGG for APOBEC in sense strand TCAGTGGG and TTGGCAGG for APOBEC on sense and antisense strand; TCAGCGGG or TCGGCAGG->TCAGCAGG. No editing pathway generates a UpA, and all editing results in creation of Q or W in (7/26)


the protease recognition sequence which will completely disrupt binding to not only furin but also to TMPRSS2, abolishing cleavage. As both pathways are highly detrimental to immediate viral fitness, and there exists no further pathway for RNA editing for re-acquisition (8/26)


of Arg after the editing of the CpG in the TCGGCGGG sequence (no editing can happen at all once the CpG has been changed to a non-Arg codon), and therefore no advantageous pathway can exist even after a long time post-CpG editing of the FCS CGGCGG by APOBEC; this is (9/26)


considered to be the main driver for CpG depletion in CoVs. As the immediate depletion of the FCS CpG by APOBEC is disadvantageous to the transmissibility of SARS-CoV-2, and there exists no pathway for synonymous changes to the critical-for-respiratory-transmission RRAR (10/26)


utilizing the CpG-depleting driver, which is RNA editing by APOBEC, this site is immediately neutral due to the immense immediate barrier against editing and change during transmission at this stage of the pandemic—the selection pressure in the human population is still (11/26)


too weak to drive its emergence. https://t.co/aDq5xC8oLk In fact, the S1-S2 sequence is an RNA mutational coldspot in coronaviruses, which means the same nucleotide can be conserved for sequences up to 6% divergence or more. These sites are endpoint sites of RNA editing, (12/26)

onlinelibrary.wiley.com/doi/10.1002/jm…


and are simply too neutral to be changed even by random chance for a very, very long time. The RNA secondary structure here also blocks many of the RNA editing enzymes and factors that change RNA ex-vivo, making sequences here reluctant to change unless absolutely (13/26)


necessary (e.g. with selection pressures that are stronger than TLR in a naive host. Immune hosts are likely required). Sequences with 90% similarity still had exactly the same QTQTN nucleotides. The only change was the Q, which changed from CAG to CAA in SARS-CoV-2. (14/26)


CpG depletion in CoV genomes in live hosts is due to APOBEC, which deaminates the C in a CpG in the presence of ZAP. ZAP is required to actually deplete a CpG motif, which means deamination in CoVs is almost exclusively done on CpG in a short scale. Since any change that (15/26)


gets rid of the CpG (deamination of the C by APOBEC, which removes binding of ZAP on the motif) in this manner results in nonsynonymous changes in the sequence that destroy the furin cleavage motif, the selection for less ZAP binding is outweighed by the effect of (16/26)


nonsynonymous changes caused by the action of APOBEC, which only changes a C into a U, which will not create a synonymous change on a CGG codon if used to remove the CpG motif. This functions as a barrier against rapid mutation of the CGG in the CGG codon (as the only 2 (17/26)


possible products are much worse than CGG (TGG for W and CAG for Q), and cannot be transmitted), necessitating a multi-step process to deplete the CpG while still keeping the Arg. This process is sufficiently rare and it does not confer sufficient (18/26)


advantage to drive exponential emergence of a non-CGG variant, which makes the mutation non-permanent (non-selected for).

nature.com/articles/s4146…


of emergence, just like every single mutation ever found before mass vaccination and the emergence of those “variants of concern”. Even today, the selection pressure for a synonymous change on the CGG codon is still too low for it to emerge and achieve any significant (20/26)


fraction of the total number of isolates. The rate of mutation without selection in coronaviruses is https://t.co/vPuzLXjswj 0.80–2.38 × 10^-3 nucleotide substitutions per site per year. The CGG dS sites constitute 2 possible sites of change per codon. (21/26)

ncbi.nlm.nih.gov/pmc/articles/P…


This gives about a 1/120 chance that the site would have been hit by a dS at random during a year of circulation. Now, focus on the orange bar for mutations. The rate of non-G->U transversions is about 8 times lower than that of the C->U, U->C, A->G, G->A and G->U .. (22/26)


combined. The former is caused by the background error rate of the RdRp, which can operate on the CGG-CGG. The latter is caused by RNA editing, which cannot operate on CGG-CGG. Multiplying 120 by 8 gives you 960, which means that after an entire year, it should be (23/26)


expected that each of the CGG codons should have a 1/960 chance of being mutated in each isolate. This is in fact slightly lower than the actual average chance of a FCS CGG dS change in an isolate, which is 1/689.65 for a (24/26)


conservation level of 99.87% and 99.84% for the first and second CGG codon, respectively. This indicates near-neutral or slightly positive selection, not negative or purifying selection. (25/26)
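
The arithmetic in the last few tweets can be reproduced as a quick check. This is a sketch that takes the thread's own figures as given (the "about 1/120" yearly chance and the 8-fold factor are not rederived here). The variable names are illustrative only, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

# Figures quoted in the thread, taken as given rather than rederived.
random_hit_denominator = 120  # "about 1/120" chance of a random dS hit per year
editing_fold = 8              # transversions are "about 8 times lower" than editing-type changes

# Expected per-isolate chance of a CGG change after one year: 1 / (120 * 8).
expected = Fraction(1, random_hit_denominator * editing_fold)

# Observed conservation of the first and second FCS CGG codons.
conservation = [Fraction("0.9987"), Fraction("0.9984")]
observed = sum(1 - c for c in conservation) / len(conservation)

print(f"expected: 1/{1 / expected}")
print(f"observed: 1/{float(1 / observed):.2f}")
print(f"observed / expected: {float(observed / expected):.2f}")
```

The average of 0.13% and 0.16% is 0.145%, or about 1 in 689.66 (the thread truncates this to 689.65), which is roughly 1.39 times the expected 1/960.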


This means that without a founder effect (being too new to found anything), it is extremely unlikely that we will be able to see a dominant dS here anytime soon. (26/26) This is a repost of an original thread by @Daoyu15

