1. The challenge of distinguishing physiological interfaces from crystal packing contacts
Much of our knowledge of protein quaternary structure has been derived from X-ray crystallography, which reveals the spatial arrangement of protein chains (Perutz et al. 1960). Indeed, to date, over 100,000 crystallographic structures of proteins have been deposited in the Protein Data Bank (PDB) (Berman, 2000; Velankar et al., 2016). However, one caveat of X-ray crystallography is that it requires protein molecules to be arranged in a regular array to form a crystal lattice. In this lattice, some protein-protein contacts may be part of a protein's quaternary structure, whereas others result only from crystal formation and are called crystal contacts. The figure below illustrates this concept: two dimer assemblies are observed in the protein's crystal lattice, and identifying which of the two is physiological is challenging.
2. Inferring physiological relevance from quaternary structure conservation
In the figure below, we illustrate how conservation of quaternary structure (QS) across homologs points to physiologically relevant interfaces. Tyvelose epimerase is a tetrameric enzyme in Salmonella typhi (PDB code 1ORR). A similar tetramer is found in Arabidopsis thaliana (PDB code 1I2B, r.m.s. deviation = 3.55 Å), although the sequences of these two tetramers share only 22% identity. Such conservation suggests that both tetramers are biologically relevant. This information enables subsequent correction of entries showing identical sequence but different QS (e.g., PDB code 1I24).
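The r.m.s. deviation quoted above measures how far apart equivalent atoms are once two assemblies have been superposed. As a minimal sketch of that final step only, the hypothetical helper below computes the r.m.s. deviation between two lists of paired 3D coordinates that are assumed to be already superposed and listed in the same order. It is not the QSalign implementation, and it performs neither the superposition nor the residue pairing.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superposed 3D points.

    coords_a and coords_b are sequences of (x, y, z) tuples in the same order.
    This hypothetical helper does not superpose the structures itself.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances over all equivalent atom pairs
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

The result is in the same unit as the input coordinates, which is Å for coordinates taken from PDB files.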
3. The process of QSalign
The user submits a query structure. Additional assemblies are identified using PISA. The resulting assemblies are each superposed with candidate QSs identified by sequence homology, based on Pfam domain similarity and sequence similarity. Ultimately, we display information on QS geometry conservation and a table providing representative QSs that share sequence homology with the query.
4. How to interpret the results
The result page of QSalignWeb describes the prediction made from the superposition with homologous QSs. The superposition of the two QSs on which the prediction is based is shown on the right-hand side. A table lists the non-redundant QSs found by the search whose sequences are similar to the query. In this list, the closest homolog with a high-confidence QS is highlighted in green and represents the structure we judge best for homology modeling.
5. Extrapolating QS by homology
In the homolog results table, we predict the query sequence to adopt a particular QS based on the level of sequence identity. This prediction is based on the data below, which were published in Levy et al. 2008.
Histogram showing the conservation of QS as a function of protein sequence similarity. A non-redundant set of structure pairs was derived for each range of sequence identity considered (see methods). Red bars indicate the fraction of pairs with the same QS, as defined by the internal symmetry of the complex. Orange bars also represent the fraction of pairs with the same QS, except that a single pair is considered per protein family. For sequence identities above 90%, conservation is nearly 100%. The conservation then decreases progressively, reaching 70% in the range of 30-40% sequence identity. At sequence identities below 30%, where there is only structural similarity, the conservation drops to ~50%. For these proteins, the conservation may be underestimated due to potential errors in the quaternary structure assignments (we did not systematically curate the matches below the 30% identity threshold, as described in more detail in the Methods section). Thus, above 30% sequence identity, QS is well conserved. This result can be integrated with other levels of structural conservation, such as domain-domain geometry, for structural homology modelling of the cellular machinery.
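To make the caption easier to apply, the sketch below encodes only the figures it states explicitly as a lookup from sequence identity to the approximate fraction of pairs with a conserved QS. The function name and return convention are hypothetical and are not part of QSalignWeb. Because the caption gives no values between 40% and 90% identity, the function returns None there rather than interpolating.

```python
from typing import Optional


def approx_qs_conservation(seq_identity_pct: float) -> Optional[float]:
    """Approximate fraction of homolog pairs sharing the same QS.

    Values are the rounded figures quoted in the caption (Levy et al. 2008).
    Returns None for 40-90% identity, where the caption says only that
    conservation decreases progressively and gives no number.
    """
    if not 0.0 <= seq_identity_pct <= 100.0:
        raise ValueError("sequence identity must be between 0 and 100")
    if seq_identity_pct > 90.0:
        return 1.0  # "nearly 100%"
    if seq_identity_pct < 30.0:
        return 0.5  # "~50%", possibly underestimated per the caption
    if seq_identity_pct < 40.0:
        return 0.7  # 30-40% identity range
    return None  # 40-90%: not quantified in the caption
```

For example, a homolog at 35% identity maps to roughly 0.7, whereas one at 22% identity, like the tyvelose epimerase pair discussed earlier, maps to roughly 0.5 and calls for additional evidence such as a conserved superposition.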