This has been considered before: there is an article online from January of last year. It studies the properties of the C-terminal region of a single monomer in isolation, and the region turns out to be IDP-like to a very high degree, i.e., under physiological conditions it may form any structure together with its interaction partner protein. So it holds many resources for working to the advantage of the viral life cycle. I am quoting it fairly extensively.
https://www.sciencedirect.com/science/article/pii/S0042682221002300
Microsecond simulations and CD spectroscopy reveals the intrinsically disordered nature of SARS-CoV-2 spike-C-terminal cytoplasmic tail (residues 1242–1273) in isolation
Highlights
Available structures of the SARS-CoV-2 Spike protein show missing electron density in its C-terminal region.
Microsecond simulations of the C-terminal region (residues 1235–1273 & 1242–1273) show its intrinsically disordered behavior.
Experimental validation using CD spectroscopy suggests it to be an Intrinsically Disordered Protein Region (IDPR).
In the presence of molecular crowders like Dextran-70 and PEG 8000, it is also observed to be an IDPR.
Abstract
All available SARS-CoV-2 spike protein crystal and cryo-EM structures have shown missing electron densities for the cytosolic C-terminal region (CTR). Generally, missing electron densities point towards the intrinsically disordered nature of a protein region (IDPR). This curiosity has led us to investigate the cytosolic CTR of the SARS-CoV-2 spike glycoprotein in isolation. Based on the predictions we used, the spike CTR spans residues 1235–1273 or 1242–1273. We have therefore examined the structural conformation of the cytosolic region and its dynamics through computer simulations up to the microsecond timescale using the OPLS and CHARMM forcefields. The simulations revealed an unstructured conformation of the cytosolic region. Further, we validated our computational observations with circular dichroism (CD) spectroscopy-based experiments and found its signature spectrum at 198 nm. We believe that our findings will help in understanding the structure-function relationship of the spike protein's cytosolic region.
Abbreviations
CD Circular Dichroism
cryo-EM Cryo-electron microscopy
CTR C-terminal Region
DTT Dithiothreitol
IDPR Intrinsically disordered protein region
MD Molecular Dynamics
PEG Polyethylene glycol
REMD Replica-Exchange Molecular Dynamics
SDS Sodium Dodecyl Sulfate
TFE 2,2,2-Trifluoroethanol
Keywords
Spike
Cytosolic domain
Conformational dynamics
Secondary structure
SARS-CoV-2
1. Introduction
The importance of the coronavirus spike protein is apparent from its surface-exposed location, which makes it a prime target, after viral infection, for cell-mediated and humoral immune responses as well as for artificially designed vaccines and antiviral therapeutics. The SARS-CoV-2 homotrimeric spike glycoprotein consists of an extracellular unit anchored in the viral membrane by a transmembrane (TM) domain, and a cytoplasmic domain (Walls et al., 2020). It is produced in the endoplasmic reticulum (ER) as a monomeric 1273-amino-acid protein, shortly after which it trimerizes to facilitate transport to the Golgi complex (Duan et al., 2020; Walls et al., 2020). Moreover, the N-linked high-mannose oligosaccharide side chains that are added to the spike monomer in the ER are further modified in the Golgi compartments (Duan et al., 2020).
Spike is one of the most extensively studied proteins of the SARS-CoV-2 proteome. According to the UniProt database, approximately two hundred structures have been reported so far using X-ray crystallography and cryo-electron microscopy. However, these structures either cover the S1 subunit of spike but lack the transmembrane and cytoplasmic C-terminal regions of the S2 subunit, or show missing electron density in the cytoplasmic region.
The distal S1 subunit (residues 14–685) contains an N-terminal domain, a C-terminal domain, and two subdomains (Fig. 1). The C-terminal domain of S1, the receptor-binding domain (RBD), has a receptor-binding motif (RBM) that interacts with human angiotensin-converting enzyme 2 (ACE2), the chief target receptor of SARS-CoV-2 on human cells (Lan et al., 2020). The RBM is present as an extended loop insertion that binds to the bottom side of the small lobe of the ACE2 receptor. The S2 subunit (residues 686–1273) has a hydrophobic fusion peptide, two heptad repeats, a transmembrane domain, and a cytoplasmic C-terminal tail (Fig. 1).
Fig. 1.
Domain architecture of the spike glycoprotein: depiction of available structures in open and closed states, the transmembrane domain, and the cytoplasmic C-terminal tail. The boundaries of all domains have been defined based on the prediction of the transmembrane region of the spike protein by CCTOP, a consensus-based predictor. As per the CCTOP prediction, the transmembrane region of spike lies within residues 1216–1241, and so the cytoplasmic region of spike used in this study comprises residues 1242–1273.
So far, the cytoplasmic domain of the spike protein remains the least explored region despite such extensive research during the pandemic. It is of particular importance because it contains a conserved ER retrieval signal (KKXX) (Lontok et al., 2004). In the SARS-CoV and SARS-CoV-2 spike proteins, a novel dibasic KLHYT (KXHXX) motif present at the extreme end of the C-terminus plays a crucial role in subcellular localization (Giri et al., 2020; McBride et al., 2007; Sadasivan et al., 2017). Deletions in the cytoplasmic domain of coronavirus spike have also been implicated in viral infection in recent reports (Bosch et al., 2005; Dieterle et al., 2020; Ou et al., 2020; Ujike et al., 2016). SARS-CoV and SARS-CoV-2 spike proteins lacking the last ∼20 residues displayed increased infectivity of single-cycle vesicular stomatitis virus (VSV)–S pseudotypes (Dieterle et al., 2020; Ou et al., 2020). In contrast, short truncations of the cytoplasmic domain of the Mouse Hepatitis Virus (MHV) spike protein (Δ12 and Δ25) had a limited effect on viral infectivity, while a longer truncation of 35 residues interfered with both virus–host cell membrane fusion and assembly. Importantly, the cytoplasmic domain has also been shown to interact with the membrane (M) protein inside host cells (Bosch et al., 2005). In our previous report, the predictor MoRFchibi identified a MoRF (Molecular Recognition Feature) region in the cytoplasmic tail (residues 1265–1273) (Giri et al., 2020). MoRF regions in proteins are disorder-based binding regions that contribute to binding DNA, RNA, and other proteins. The same report also found this region to contain many DNA- and RNA-binding residues (Giri et al., 2020).
Despite the availability of several structures of the spike protein from advanced techniques like cryo-EM, the structure of the cytoplasmic domain is not yet clear because of its ‘missing electron density’. Intrinsically disordered proteins generally show such missing electron density and lack a well-defined three-dimensional structure (Uversky, 2020). Additionally, the consensus-based disorder prediction by MobiDB has shown this region to be disordered (Piovesan et al., 2021). Considering these arguments, we aimed to characterize the cytoplasmic domain of the SARS-CoV-2 spike protein in more detail. To this end, we computationally analyzed its dynamics using molecular dynamics (MD) simulations of up to 1 microsecond (μs) and validated the results with CD spectroscopy-based experiments. The outcomes of this report will help to understand this domain's structure and function and provide knowledge for exploring the interaction of the spike protein with other viral and host proteins.
2. Material and methods
3.1. Transmembrane region analysis
The transmembrane region and the disorder-prone regions have also been analyzed with sequence-based methods. In terms of subcellular localization, the spike protein spans the extracellular, transmembrane, and cytoplasmic regions (Cai et al., 2020). Based on a sequence alignment of the SARS-CoV and SARS-CoV-2 proteins, the spike proteins of the two viruses share approximately 77% similarity (Giri et al., 2020). The C-terminal region shows high similarity and conserved stretches, while the N-terminal region has highly variable residues.
Based on the multiple predictors used in this study, the spike protein's transmembrane region lies within residues 1213–1246 (Fig. 2). The consensus-based server CCTOP predicted the transmembrane region to span residues 1216–1241; this prediction is more reliable because it compares and uses previously available experimental information on related proteins. Therefore, the cytoplasmic region is selected as amino acids 1242–1273 (sequence:
NH2-SCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT-COOH).
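The residue numbering used in the rest of the article can be cross-checked against this sequence. The Python sketch below is an illustrative check, not part of the authors' workflow, and its helper names (`residue_map`, `segment`) are hypothetical. It derives the start of the cytoplasmic region from the CCTOP boundary and slices out the motifs mentioned in the text by spike residue number.

```python
# Hypothetical helpers for mapping the quoted peptide onto spike numbering.
CTR_SEQ = "SCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT"  # residues 1242-1273, as quoted
CCTOP_TM = (1216, 1241)                       # CCTOP transmembrane prediction
SPIKE_LENGTH = 1273                           # full-length spike


def residue_map(seq: str, first: int) -> dict[int, str]:
    """Map each spike residue number to its one-letter amino acid code."""
    return {first + i: aa for i, aa in enumerate(seq)}


def segment(numbering: dict[int, str], start: int, end: int) -> str:
    """Return residues start..end (inclusive) as a string."""
    return "".join(numbering[n] for n in range(start, end + 1))


ctr_start = CCTOP_TM[1] + 1             # first residue after the TM helix
ctr_len = SPIKE_LENGTH - ctr_start + 1  # residues up to the C-terminus
numbering = residue_map(CTR_SEQ, ctr_start)

print(ctr_start, ctr_len, len(CTR_SEQ))  # 1242 32 32
print(segment(numbering, 1262, 1266))    # EPVLK (PSIPRED beta-strand)
print(segment(numbering, 1265, 1268))    # LKGV (PEP-FOLD helix)
print(segment(numbering, 1269, 1273))    # KLHYT (C-terminal KXHXX motif)
print(segment(numbering, 1256, 1257))    # FD
```

The same map can be reused to check any other residue range quoted in the text.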
Fig. 2. Transmembrane region prediction from five web servers: A. CCTOP, B. PSIPRED, C. SPLIT, D. TMHMM, and E. TMPred. F. Table showing the predicted transmembrane residues. All predictors work standalone except CCTOP, which works on a consensus of multiple predictors. Considering this, the cytoplasmic region of spike is chosen as residues 1242–1273, since CCTOP predicted residues 1216–1241 to form the transmembrane region.
3.1.1. Disorder prediction
In our recent study, we identified the disordered and disorder-based binding regions in SARS-CoV-2, where the cytoplasmic domain at the C-terminus of the spike protein was found to be disordered (Giri et al., 2020). Here, we again analyzed disorder in the selected cytoplasmic region using multiple predictors, including the PONDR family, IUPred2A (redox state), and PrDOS. Out of six predictors, three predictors from the PONDR family predicted it to be highly disordered, PrDOS predicted it to be moderately disordered, and PONDR FIT predicted it to be the least disordered. Additionally, IUPred2A was used with its redox-state calculation function because of the high number of cysteine residues in the peptide (Fig. 3). According to these calculations, the redox minus state (where all cysteines are replaced by serine) showed high disorder propensity, while the redox plus state showed the lowest propensity.
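At the sequence level, the redox minus state described above is a cysteine-to-serine substitution. The minimal Python sketch below shows only that substitution; it is not IUPred2A's own code, whose redox-state scoring is more involved, and the function name `redox_minus` is hypothetical. It also makes explicit how cysteine-rich the peptide is.

```python
CTR_SEQ = "SCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT"  # spike residues 1242-1273


def redox_minus(seq: str) -> str:
    """Replace every cysteine (C) with serine (S), as in the 'redox minus' state."""
    return seq.replace("C", "S")


n_cys = CTR_SEQ.count("C")
print(n_cys, "of", len(CTR_SEQ), "residues are cysteines")  # 6 of 32 residues are cysteines
print(redox_minus(CTR_SEQ))  # SSLKGSSSSGSSSKFDEDDSEPVLKGVKLHYT
```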
Fig. 4. Structure models of full-length spike and its C-terminal cytoplasmic domain (residues 1242–1273): A. Full-length spike protein model using AlphaFold2, B. Structure modelled with the PEP-FOLD web server, visualized in Maestro, and C. Secondary structure analysis using the PSIPRED web server.
3.1.3. Simulation with OPLS 2005
In the last three decades, many advances have been made in MD simulation forcefields and hardware to match experimental events. Long MD simulations of up to microseconds or milliseconds are incredibly insightful for studying structural conformations at the nanoscale. We have recently explored various regions of different SARS-CoV-2 proteins through computational simulations and experimental techniques, with results that correlate very well (Gadhave et al., 2020a, 2020b, 2020a). This study performed 1 μs MD simulations of the C-terminal cytoplasmic domain of the spike protein (residues 1242–1273) to understand its dynamic nature. The model obtained through PEP-FOLD structure modelling contains one small helix at the C-terminus, formed by residues 1265LKGV1268 (Fig. 4B). According to the 2struc web server (Klose et al., 2010), the helix accounts for approximately 12.5% of the total secondary structure, while the rest of the region consists of turns and extended coils. The secondary structure predicted by the PSIPRED web server (Buchan and Jones, 2019) for the spike C-terminal tail contains a five-residue β-strand, 1262EPVLK1266 (Fig. 4C).
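The 12.5% figure is consistent with four helical residues (1265–1268) out of the 32-residue peptide. A small Python sketch of this per-residue bookkeeping follows. The assignment string is hypothetical: only LKGV is marked helical and every other residue is lumped together as turn or coil, rather than reproducing the 2struc output itself.

```python
CTR_SEQ = "SCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT"  # spike residues 1242-1273
FIRST = 1242
HELIX = range(1265, 1269)  # residues 1265-1268 (LKGV), helical in the PEP-FOLD model

# Hypothetical per-residue assignment: 'H' = helix, '-' = turn or coil
ss = "".join("H" if FIRST + i in HELIX else "-" for i in range(len(CTR_SEQ)))


def fraction(ss_string: str, code: str) -> float:
    """Fraction of residues carrying a given secondary-structure code."""
    return ss_string.count(code) / len(ss_string)


print(ss)                 # 23 '-', then 'HHHH', then 5 '-'
print(fraction(ss, "H"))  # 0.125
```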
After analyzing the disorder propensity and secondary structure composition, we performed a rigorous simulation of the cytoplasmic region (residues 1242–1273) to understand its atomic movement and structural integrity. A total of 1 μs of simulation was run after 50,000 steps of steepest-descent energy minimization. The spike C-terminal cytoplasmic region remained unstructured throughout the simulation. In the mean distance analysis, the peptide showed large deviations of up to 7.5 Å and did not attain any stable state (Fig. 5A). As shown in Fig. 5B, the mean fluctuation of residues lies within the range of 1.6–6.4 Å. The intramolecular hydrogen bond analysis shows a highly fluctuating trend, indicating that the residues adopt no stable helical or β-sheet conformation (Fig. 5E). The secondary structure timeline (Fig. 5C & D) also reveals the disordered state of the spike C-terminal cytoplasmic region during the 1 μs simulation (none of the frames captured α-helices or β-sheets), which is further depicted in the snapshot of the 1 μs frame in Fig. 5F and in the trajectory movie up to 1 μs (supplementary movie 1).
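Deviation plots like Fig. 5A and per-residue fluctuation plots like Fig. 5B are conventionally computed as the RMSD to a reference frame and the RMSF about the average structure. The excerpt does not say which tool produced these plots, so the following is only a minimal NumPy sketch of the standard definitions. It assumes a trajectory array of shape (frames, atoms, 3) in Å and uses Kabsch superposition onto the first frame; the function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Optimally rotate and translate `mobile` (N x 3) onto `ref` (N x 3)."""
    mob_c = mobile - mobile.mean(axis=0)
    ref_c = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)  # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))     # d = -1 would signal a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + ref.mean(axis=0)


def rmsd_and_rmsf(traj: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """traj: (n_frames, n_atoms, 3) coordinates in Å, e.g. one C-alpha per residue.

    Returns the per-frame RMSD to frame 0 and the per-atom RMSF about
    the mean structure, both after superposition onto frame 0.
    """
    ref = traj[0]
    aligned = np.array([kabsch_superpose(frame, ref) for frame in traj])
    rmsd = np.sqrt(((aligned - ref) ** 2).sum(axis=2).mean(axis=1))
    mean_structure = aligned.mean(axis=0)
    rmsf = np.sqrt(((aligned - mean_structure) ** 2).sum(axis=2).mean(axis=0))
    return rmsd, rmsf
```

Superposing before measuring removes overall tumbling of the peptide in the box, so the remaining RMSD and RMSF reflect internal conformational change only. Dedicated MD analysis packages compute the same quantities; the sketch only makes the arithmetic explicit.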
We have also modelled the cytosolic part of the Spike protein from residues 1235 to 1273, as defined in the UniProt database and by two predictors used in this study (TMPred: 1216–1235, and TMHMM: 1214–1236). In the modelled structure, residues 1237–1245 of the cytosolic region showed helical propensity. Using the OPLS 2005 forcefield parameters described above, an all-atom explicit-solvent MD simulation was carried out for 1 μs. As shown by the trajectory analysis in Supplementary Fig. 1, the cytosolic region was largely unstructured after 1 μs, apart from a small two-residue β-strand, 1256FD1257. The upward trend of the RMSD values illustrates highly deviating atomic positions, and the fluctuating RMSF shows changes in the structural properties of residues (Supplementary Figs. 1A and 1B). Also, the decreasing number of hydrogen bonds demonstrates the breaking of helices in the structure (Supplementary Fig. 1C). The time-dependent secondary structure element analysis shows that a total of 15% secondary structure was formed, mainly α-helix with a small percentage of β-strands (Supplementary Figs. 1D and 1E; red: α-helix, blue: β-strands). After large structural transitions, the last frame of the simulation shows a small two-residue β-strand, with the other regions disordered (Supplementary Fig. 1F). Snapshots taken every 100 ns up to 1 μs show the structural transitions in the Spike cytoplasmic region (Supplementary Fig. 2).
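Percentages like the 15% secondary structure quoted above are typically obtained by counting residues assigned to helix or strand in each frame and averaging over the trajectory. The Python sketch below assumes DSSP-style one-letter codes ('H' helix, 'E' strand, anything else counted as non-regular). It uses a made-up four-frame toy trajectory, not the authors' data, and the function name `ss_fractions` is hypothetical.

```python
def ss_fractions(frames: list[str], codes: tuple[str, ...] = ("H", "E")):
    """frames: one per-residue secondary-structure string per frame (equal lengths).

    Returns the fraction of residues with any of `codes` in each frame and
    the fraction over the whole trajectory.
    """
    n_res = len(frames[0])
    counts = [sum(ch in codes for ch in frame) for frame in frames]
    per_frame = [c / n_res for c in counts]
    overall = sum(counts) / (n_res * len(frames))
    return per_frame, overall


# Toy 10-residue, 4-frame trajectory (illustrative only)
toy = ["HHHH------", "HH--------", "--------EE", "----------"]
per_frame, overall = ss_fractions(toy)
print(per_frame)                     # [0.4, 0.2, 0.2, 0.0]
print(overall)                       # 0.2
print(ss_fractions(toy, ("H",))[1])  # 0.15
```

Passing a single code, as in the last line, separates the helix contribution from the strand contribution in the same way the colored timeline does.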
Fig. 6.
Representative structures of the top 10 clusters from the 1 μs simulation trajectory. The size of each cluster, i.e., the number of trajectory frames it contains based on RMSD calculated against the reference structure (frame 1), is shown. The protein backbones of all frames are shown superimposed. The protein structures are colored from red (N-terminus) to blue (C-terminus).
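The excerpt does not state which clustering algorithm produced Fig. 6. As one common option, swapped in here purely for illustration, a GROMOS-style greedy cutoff clustering on a pairwise RMSD matrix can be sketched in Python with NumPy. The cutoff, the toy matrix, and the function name `gromos_cluster` are hypothetical.

```python
import numpy as np


def gromos_cluster(rmsd_matrix: np.ndarray, cutoff: float) -> list[tuple[int, list[int]]]:
    """Greedy cutoff clustering in the GROMOS style.

    rmsd_matrix: symmetric (n_frames x n_frames) pairwise RMSD, e.g. in Å.
    Repeatedly picks the frame with the most neighbours within `cutoff`,
    makes it a cluster centre, and removes it and its neighbours.
    Returns (centre_frame, member_frames) pairs in order of discovery,
    which is largest first.
    """
    remaining = set(range(rmsd_matrix.shape[0]))
    clusters = []
    while remaining:
        idx = sorted(remaining)
        close = rmsd_matrix[np.ix_(idx, idx)] <= cutoff  # neighbour mask among remaining frames
        best = int(np.argmax(close.sum(axis=1)))          # frame with most neighbours
        members = [idx[j] for j in np.flatnonzero(close[best])]
        clusters.append((idx[best], members))
        remaining -= set(members)
    return clusters


# Toy example: frames 0-2 are similar, frames 3-4 form a second group
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1])
toy_rmsd = np.abs(positions[:, None] - positions[None, :])
print(gromos_cluster(toy_rmsd, cutoff=0.5))  # [(0, [0, 1, 2]), (3, [3, 4])]
```

Because each frame's distance to itself is zero, every centre is a member of its own cluster, so the loop always removes at least one frame and terminates.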
-----. However, we also tried to obtain a synthesized peptide of spike residues 1235–1273, but this was not feasible because of its multiple cysteine residues.
Further, it was of utmost importance to validate the MD simulation outcomes with experimental techniques. The water-soluble peptide of spike residues 1242–1273 at 25 μM concentration exhibits a prominent negative peak at approximately 198 nm in the far-UV CD spectrum, which is characteristic of an unstructured protein. In fact, we also checked the secondary structure in the presence of the reducing agent DTT; the peptide again remained disordered, with significant negative ellipticity. Further, in the presence of the helix-inducing solvent TFE, the peptide adopts a helical structure. However, SDS micelles surrounding the peptide produce only small changes in its structure, which may signify its inability to gain structure. Also, in the presence of sucrose, the CD spectrum of the peptide corresponds to a disordered conformation. Under the influence of crowding agents like Dextran-70 and PEG (8000), the disordered structure is conserved, indicating that no intra-chain forces act between the residues. Based on this combination of facts, we interpret the spike C-terminal cytosolic tail (residues 1242–1273) as an intrinsically disordered region. Generally, an IDPR can gain structure upon interacting with its binding partner or under physiological conditions (Wright and Dyson, 1999). In its unstructured state, it may function as a MoRF to bind COPI-coated transport vesicles, which localize the Spike protein to the ER. As described earlier, the C-terminal domain of the Spike protein has been reported to interact with other structural proteins like M, an interaction that is highly likely to occur in its disordered form with an extended radius.
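The excerpt describes the ellipticity qualitatively but does not show how the spectra were normalized. Far-UV CD data are commonly reported as mean residue ellipticity, $[\theta]_{MRE} = \theta_{obs}/(10 \cdot n \cdot c \cdot l)$ with $\theta_{obs}$ in mdeg, $c$ in mol/L and $l$ in cm. The Python/NumPy sketch below applies that standard conversion to a purely synthetic spectrum (not the measured data), using the 25 μM concentration and 32-residue length given above and a hypothetical 1 mm path length, and then locates the band minimum.

```python
import numpy as np


def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert observed ellipticity (mdeg) to mean residue ellipticity.

    [theta]_MRE = theta_obs / (10 * n * c * l), in deg cm^2 dmol^-1,
    with c in mol/L and l in cm (some conventions use n - 1 peptide bonds).
    """
    return theta_mdeg / (10.0 * n_residues * conc_molar * path_cm)


CONC = 25e-6  # 25 uM peptide, as stated in the text
N_RES = 32    # spike residues 1242-1273
PATH = 0.1    # hypothetical 1 mm cuvette; not given in the excerpt

# Purely synthetic spectrum with one negative band centred at 198 nm
wavelengths = np.arange(190, 261)  # nm, 1 nm steps
theta_obs = -10.0 * np.exp(-(((wavelengths - 198) / 6.0) ** 2))
mre = mean_residue_ellipticity(theta_obs, CONC, PATH, N_RES)

print(wavelengths[np.argmin(mre)])   # 198
print(int(round(float(mre.min()))))  # -12500
```

Normalizing per residue and per mole makes spectra recorded at different peptide concentrations or cuvette lengths directly comparable, which matters when comparing buffer, DTT, TFE, SDS and crowder conditions.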
5. Conclusion
The cytoplasmic region of the SARS-CoV-2 spike glycoprotein had not been studied yet. Given its great importance for the functioning of the spike protein, its structure and dynamics have been investigated here. Advances in computational power and substantial improvements in forcefields have empowered structural biology. Newly developed algorithms and their user-friendly approach allow the outcomes to be correlated with experimental observations. In this article, we identified the transmembrane region of the spike protein by employing different web predictors. This clarified the amino acid composition of the cytoplasmic domain. Further, the secondary structure and disorder predisposition analyses showed it to be highly disordered. We have demonstrated the structural conformation of the cytoplasmic domain (residues 1242–1273) of the spike protein at a microsecond timescale using computational simulations. As revealed, this domain is purely unstructured or disordered after 1 μs and did not gain any structural conformation throughout the simulation period. Experimental outcomes also confirm the intrinsically disordered state of the cytoplasmic domain of spike. The intrinsically disordered nature of the peptide is retained in the presence of macromolecular crowders. Based on our previous study (Giri et al., 2020), the cytoplasmic tail of the spike glycoprotein has molecular recognition features, which need to be explored further. The disordered nature of the cytosolic region may have implications for interactions with other viral proteins during virion assembly, as well as with host proteins and transport vesicles during localization to the ER. In this study, the multiple conformations observed during the simulation add to even more interesting speculations.