
Thursday, 16 February 2023

On the XBF recombinant and its sublineages. Thoughts on the Spike P621S mutation.

 https://github.com/cov-lineages/pango-designation/issues/1651

Description
Sub-lineage of: XBF
Earliest sequence: 2023-1-10, Australia, New South Wales — EPI_ISL_16584770
Most recent sequence: 2023-2-5, Australia, South Australia — EPI_ISL_16902443
Countries circulating: Australia (15), Singapore (2)
Number of Sequences: 17
GISAID Query: Spike_F486P, Spike_P621S, NSP1_K120N
CovSpectrum Query: Nextcladepangolineage:XBF* & [5-of: T3442C, T9931C, C12970T, C13255T, C22000T, C23423T, T25039C]
Substitutions on top of XBF:
Spike: P621S
Nucleotide: T3442C, T9931C, C12970T, C13255T, C22000T, C23423T, T25039C
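The `[5-of: ...]` clause in the CovSpectrum query above can be read as a threshold match. The following is a minimal sketch, assuming "5-of" means "at least five of the listed mutations"; the function name and the sample mutation lists are hypothetical and only illustrate why a threshold tolerates sequences with a couple of dropouts.

```python
# Nucleotide substitutions listed in the proposal's CovSpectrum query.
DEFINING = {"T3442C", "T9931C", "C12970T", "C13255T",
            "C22000T", "C23423T", "T25039C"}


def matches_k_of(sample_mutations, defining=DEFINING, k=5):
    """Return True if the sample carries at least k of the defining mutations.

    Assumption: "5-of" is an at-least threshold, not an exact count.
    """
    return len(set(sample_mutations) & set(defining)) >= k


# Hypothetical samples, invented for illustration only.
full = ["T3442C", "T9931C", "C12970T", "C13255T",
        "C22000T", "C23423T", "T25039C"]               # 7 of 7
partial = ["T3442C", "T9931C", "C12970T",
           "C13255T", "C23423T"]                        # 5 of 7, e.g. two dropouts
other = ["C24378T", "G3728T", "C23423T", "A23989G"]     # 1 of 7

for name, muts in [("full", full), ("partial", partial), ("other", other)]:
    print(name, matches_k_of(muts))
```

A threshold lower than the full count keeps sequences with missing coverage at one or two sites, at the cost of a slightly looser definition.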

USHER Tree
https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/subtreeAuspice1_genome_37cb4_a1b030.json

 Evidence
For reasons unknown, S:P621S has become a mutational hotspot of late, particularly in lineages and individual sequences with other notable mutations. It seems meaningful, though I have no idea what is behind it. This lineage only recently appeared but seems to be growing quickly. Whether that is due to chance or an isolated outbreak or because it possesses some advantage isn't yet clear, but I think it bears watching. There are an unusual number of synonymous mutations in this lineage, all C->T or T->C.
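Two of the statements above can be checked with simple arithmetic: that every listed nucleotide substitution is a C/T transition, and that C23423T falls on the first base of spike codon 621. The sketch below assumes the S gene starts at nucleotide 21563 of the Wuhan-Hu-1 reference (NC_045512.2); the helper names are hypothetical. It does not test whether the other substitutions are synonymous, which would need the reference sequence itself.

```python
SPIKE_START = 21563  # first nucleotide of the S gene in Wuhan-Hu-1 (assumed reference)

SUBSTITUTIONS = ["T3442C", "T9931C", "C12970T", "C13255T",
                 "C22000T", "C23423T", "T25039C"]


def parse(sub):
    """Split 'C23423T' into (ref_base, position, alt_base)."""
    return sub[0], int(sub[1:-1]), sub[-1]


def is_ct_transition(sub):
    """True if the substitution is C->T or T->C."""
    ref, _, alt = parse(sub)
    return {ref, alt} == {"C", "T"}


def spike_codon(pos):
    """Return (codon number, position within codon), both 1-based."""
    offset = pos - SPIKE_START
    return offset // 3 + 1, offset % 3 + 1


print(all(is_ct_transition(s) for s in SUBSTITUTIONS))  # True
print(spike_codon(23423))                               # (621, 1)

# Slice of the standard genetic code: every CCN codon is proline and
# every TCN codon is serine, so a first-position C->T in a proline
# codon gives serine whatever the third base is.
CODONS = {"CC" + n: "P" for n in "ACGT"}
CODONS.update({"TC" + n: "S" for n in "ACGT"})
for third in "ACGT":
    before, after = "CC" + third, "TC" + third
    print(before, CODONS[before], "->", after, CODONS[after])
```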

Genomes

 

FedeGueli

I think two new ones have been added from England! It seems really fast, unfortunately.

Maybe unrelated. @ryhisner, could you check them: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_3be10_a668b0.json?c=userOrOld&label=id:node_8273022

Oh, wow, nice spot, Fede! It looks like there are actually six sequences from England altogether, though four of them are not yet on GISAID. Usher puts the English sequences on a separate branch from the Australian ones, but I'm almost certain they really belong together. The giveaway is that both branches have the synonymous mutation C23423T. Hard to believe that's a coincidence. The four artifactual "reversions" (T3796C, T3927C, T4586C, T5183C) that appear on every uploaded XBF sequence (but not on the other sequences on the tree) might be mucking things up somehow.

Those two branches have C23423T (S:P621S, non-synonymous) in common, but they each have several other mutations after XBF with no overlap aside from C23423T. The branch with most of the uploaded sequences is

XBF > C12970T > T9931C > T3442C,C22000T > C13255T,C23423T,T25039C

and the branch with the two sequences from England with C23423T (S:P621S) is

XBF > C24378T > G3728T > C23423T,A23989G
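The comparison of the two branch paths can be reproduced by flattening each path into a set of mutations and intersecting them. This is a small sketch using the two paths quoted above; the variable and function names are hypothetical.

```python
# Branch paths after XBF, copied from the thread.
MAIN_BRANCH = "C12970T > T9931C > T3442C,C22000T > C13255T,C23423T,T25039C"
ENGLAND_BRANCH = "C24378T > G3728T > C23423T,A23989G"


def path_mutations(path):
    """Flatten an 'A > B,C > D' path into a set of mutation strings."""
    return {m.strip() for node in path.split(">") for m in node.split(",")}


shared = path_mutations(MAIN_BRANCH) & path_mutations(ENGLAND_BRANCH)
print(sorted(shared))  # ['C23423T']
```

The only overlap is C23423T, which is why the tree treats the two branches as independent acquisitions of S:P621S.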

(The four artifactual "reversions" (T3796C, T3927C, T4586C, T5183C) that appear on every uploaded XBF sequence (but not on the other sequences on the tree) might be mucking things up somehow).

Sorry about the reversions. Those are true for XBF (because that part of it comes from BA.5.2 not BA.2.75), but are masked out of the entire BA.2.75 branch of the UShER tree (where XBF is placed) because false reversions were an awful problem with BA.2.75 sequences. The masking on the BA.2.75 branch (but not in uploaded sequences) mucks up the details of how sequences are placed in the web interface relative to existing sequences in the tree, but should not affect which existing branches the sequences are placed on within XBF (because all of the BA.2.75 branch has those sites masked).
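The effect described here can be illustrated with a toy comparison: the tree has the four sites masked, while an uploaded sequence still reports them. This sketch is not how UShER implements masking and does not follow the schema of its masking specification; the function names and the example node and upload are hypothetical.

```python
# Sites of the four reversions named in the thread.
MASKED_SITES = {3796, 3927, 4586, 5183}


def site(mutation):
    """Return the genome position of a mutation string such as 'T3796C'."""
    return int(mutation[1:-1])


def apply_mask(mutations, masked=MASKED_SITES):
    """Drop every mutation that falls on a masked site."""
    return {m for m in mutations if site(m) not in masked}


# Hypothetical tree node whose masked sites carry no mutations.
tree_node = {"C12970T", "T9931C"}
# Hypothetical uploaded sequence: same branch, sites not masked.
uploaded = {"C12970T", "T9931C", "T3796C", "T3927C", "T4586C", "T5183C"}

# Unmasked, the upload shows four extra mutations relative to the node.
print(sorted(uploaded - tree_node, key=site))
# With the same mask applied, the upload matches the node exactly.
print(apply_mask(uploaded) == tree_node)  # True
```

Because every sequence on the masked branch lacks information at those sites, the extra mutations look the same against all candidate placements and so do not favour one branch within XBF over another.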

 

BTW there is a new, hopefully more readable, source of info about branch-specific masking in the UShER tree -- instead of a bash script that just performed the masking, there is now a YAML specification of the masking (and a separate script that performs the masking according to the spec):

https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/utils/otto/sarscov2phylo/branchSpecificMask.yml

Label: recombinant sublineage

-----

S:P621S may be about immune escape.

Initially I had some doubts about it, since P621 is not in the main epitope. However, a closer look reveals that the growth of P621S or P621H follows a pattern similar to that of E554K or E554V.
[Figure: E554K]

-----

@AngieHinrichs, thanks for the insight on the tree! My assumption was that the South Australian sequences in particular are so shabby as to make branches that belong together appear as if they don't belong together. Possibly the tree is different now, but when I last checked, it showed four independent acquisitions of ORF1ab:N4899I on this lineage's branch as well as at least two independent acquisitions of ORF1a:L3715N and N:L139F, which seems unlikely. But since South Australia apparently doesn't show where there is missing coverage in their sequences, putting those branches together would probably require positing multiple reversions, which is even more unlikely.

Are there any efforts to standardize certain ways of reporting sequences? I imagine it would make things a lot clearer—and make your job a lot easier—if all labs indicated where their sequences lack coverage instead of reverting to the reference genome in those places, producing multitudes of artifactual reversions. The public health lab in the state of Utah in the USA is probably the worst offender in this regard. Sometimes I wish there was a way to screen out sequences from certain countries (Pakistan, Turkey, Chile, etc) or regions (Utah, Louisiana, Kerala) from GISAID/Covspectrum searches and Usher trees.
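The difference between the two reporting practices can be shown on a toy genome. In this sketch the 10-base reference, the "true" genome, the lineage mutations and the missing-coverage region are all invented for illustration, and the function names are hypothetical. It compares a consensus that writes N where coverage is missing with one that falls back to the reference base.

```python
REFERENCE = "ACGTACGTAC"    # toy reference, positions 1..10
TRUE_GENOME = "ACTTACGAAC"  # toy sample carrying G3T and T8A
LINEAGE_MUTATIONS = ["G3T", "T8A"]
MISSING = {7, 8, 9}         # positions with no sequencing coverage


def build_consensus(genome, missing, reference, fill_with_reference):
    """Build a consensus, writing 'N' or the reference base where coverage is missing."""
    out = []
    for pos, base in enumerate(genome, start=1):
        if pos in missing:
            out.append(reference[pos - 1] if fill_with_reference else "N")
        else:
            out.append(base)
    return "".join(out)


def lineage_site_status(consensus, lineage_mutations):
    """Classify each expected lineage mutation in a consensus sequence."""
    status = {}
    for mut in lineage_mutations:
        ref, pos, alt = mut[0], int(mut[1:-1]), mut[-1]
        base = consensus[pos - 1]
        if base == alt:
            status[mut] = "present"
        elif base == "N":
            status[mut] = "no coverage"
        elif base == ref:
            status[mut] = "apparent reversion"
        else:
            status[mut] = "other"
    return status


with_n = build_consensus(TRUE_GENOME, MISSING, REFERENCE, fill_with_reference=False)
with_ref = build_consensus(TRUE_GENOME, MISSING, REFERENCE, fill_with_reference=True)

print(with_n, lineage_site_status(with_n, LINEAGE_MUTATIONS))
print(with_ref, lineage_site_status(with_ref, LINEAGE_MUTATIONS))
```

With N the missing site is reported as unknown; with the reference fallback the same sample appears to have reverted T8A, which is the kind of artifactual reversion discussed above.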

 

 

 
