S8kPred

Protein Secondary Structure Prediction using 8,000 Tripeptide Propensities & Evolutionary Information

Submit a Sequence

FASTA Sequence Input

Paste a FASTA-formatted sequence or upload a .fasta / .fa file

Cite: Kumar, Mayank & Rathore, R. S. (2026). S8Kpred: a Novel Approach to Protein Secondary Structure Prediction Using 8,000 Tripeptide Propensities. Peptide Science 118(3): e70029. https://doi.org/10.1002/pep2.70029

About S8kPred

S8kPred is a recently developed, high-accuracy method for predicting protein secondary structure (Kumar & Rathore, 2026). It combines the conformational propensities of 8,000 tripeptide variants, which represent all possible immediate neighbouring environments of the 20 standard amino acid residues (20 × 20 × 20 = 8,000), with evolutionary information in the form of position-specific scoring matrices (PSSMs) and machine learning (ML) techniques. The method predicts both three-state (Q3: α-helix, β-sheet, and loop) and eight-state (Q8: α-helix, 3₁₀-helix, and π-helix; β-sheet and β-bulge; turn, bend, and loop) secondary structure classes. It draws on a substantially larger labelled dataset, approximately 400 times richer than that of traditional residue-based approaches, which enables improved contextual learning and more reliable predictions. Benchmark evaluations on standard datasets such as CB513 and CASP indicate that S8kPred achieves accuracies of up to 93% for Q3 prediction and 88% for Q8 prediction. These results highlight S8kPred as a promising advancement in the field of protein secondary structure prediction.
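
The counts quoted above can be checked with a short sketch. It enumerates the 8,000 tripeptides and reduces an eight-state string to three states. The one-letter state codes and the Q8-to-Q3 grouping are assumptions based on the common DSSP-style convention and on the class list above; they are not taken from the S8kPred source code. Reading the 400-fold figure as the ratio 8,000 / 20 is likewise only one plausible interpretation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Every ordered triple: left neighbour, central residue, right neighbour.
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]

# Assumed DSSP-style letters and grouping (hypothetical for S8kPred):
# helices -> H, sheet-like states -> E, turn/bend/loop -> C.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
}


def reduce_to_q3(q8_string):
    """Map an eight-state string to three states, residue by residue."""
    return "".join(Q8_TO_Q3[state] for state in q8_string)


print(len(TRIPEPTIDES))                      # 8000
print(len(TRIPEPTIDES) // len(AMINO_ACIDS))  # 400
print(reduce_to_q3("HHHGGTTEEBSC"))          # HHHHHCCEEECC
```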

Standalone Application

The S8kPred Python package is also available as a standalone tool for local execution and can be installed with pip install s8kpred. The source code, documentation, and updates are available through the S8kPred GitHub repository, enabling users to integrate the predictor into custom bioinformatics pipelines and large-scale analyses. In addition, the datasets associated with the study are publicly accessible through Zenodo, ensuring reproducibility and facilitating further research and benchmarking.

Secondary Structure Prediction and Algorithmic Developments

Secondary structure prediction is a crucial component of protein tertiary structure prediction. Accurate prediction of secondary structural elements significantly narrows the conformational search space, thereby facilitating tertiary structure modeling and peptide design.

Early work in protein secondary structure prediction was pioneered by P. Y. Chou and G. D. Fasman in the 1970s. They introduced one of the first empirical approaches based on amino acid propensities, that is, statistically derived tendencies of residues to form α-helices, β-sheets, or loops (Chou & Fasman, 1974a, 1974b). Their work quantified these preferences as propensity values, which formed the foundation of early prediction algorithms with accuracies in the range of 50–60%.
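
A minimal sketch of a propensity calculation in this spirit is given below. It uses the usual ratio form, P(residue, state) = (fraction of that residue found in the state) / (fraction of all residues found in the state), so values above 1 indicate a preference for the state. The function name and the toy input are illustrative assumptions, not data or code from the original studies.

```python
from collections import Counter


def residue_propensities(sequences, structures):
    """Return {(residue, state): propensity} from aligned sequence/state strings."""
    pair_counts = Counter()
    residue_counts = Counter()
    state_counts = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for aa, state in zip(seq, ss):
            pair_counts[(aa, state)] += 1
            residue_counts[aa] += 1
            state_counts[state] += 1
    total = sum(residue_counts.values())
    return {
        (aa, state): (pair_counts[(aa, state)] / residue_counts[aa])
        / (state_counts[state] / total)
        for aa in residue_counts
        for state in state_counts
    }


# Toy input, made up for illustration only.
props = residue_propensities(["AAGG"], ["HHHC"])
print(round(props[("A", "H")], 3))  # 1.333
print(props[("G", "C")])            # 2.0
```

In this toy example glycine is found in the coil state twice as often as the average residue, so its coil propensity is 2.0, while alanine never occurs in coil and scores 0.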

This approach was soon followed by the GOR method developed by J. Garnier, D. J. Osguthorpe, and B. Robson, which improved prediction accuracy by incorporating information theory and local residue context (Garnier et al., 1978), increasing accuracy to around 65%.

During the late 1980s and 1990s, the integration of evolutionary information through multiple sequence alignments led to significant improvements. The PHD (Profile network from Heidelberg) method of Burkhard Rost and Chris Sander combined neural networks with such evolutionary information and raised prediction accuracy to approximately 72% (Rost & Sander, 1993). Along similar lines, DSC (Discrimination of Protein Secondary Structure Class), which employs linear statistical methods (King & Sternberg, 1996), and nearest-neighbor-based approaches such as NNSSP and SOPMA achieved accuracies of around 70%.

These advancements paved the way for more sophisticated models such as JPred (Cuff & Barton, 1999) and PSIPRED, developed by David T. Jones (Jones, 1999). Both methods utilized position-specific scoring matrices (PSSMs) to refine predictions, achieving accuracies of approximately 76–78%.

Machine learning played a crucial role in further improvements, as demonstrated in Rosetta + NN (Meiler & Baker, 2003) and JPred 3 (Cole et al., 2008), which leveraged structural and evolutionary information to reach accuracies of around 80%. More recent developments, such as PSIPRED V4 and Porter 4 (Mirabello & Pollastri, 2013; Torrisi et al., 2019), integrated deep learning and recurrent neural networks, increasing prediction accuracy to approximately 82–84%.

The latest advancements, including SPOT-1D (Singh et al., 2021), employ deep learning and large-scale evolutionary data, pushing accuracy beyond 85%, representing the current state of the art in secondary structure prediction. Very recent approaches have also incorporated physicochemical parameters and advanced architectures. For instance, the distillation-improved TCN-BiLSTM-MHA model (Zhao et al., 2024), CNN-based methods trained using the Subsampled Hessian Newton (SHN) approach (Chatzimiltis et al., 2025), and pre-trained protein language model (PLM)-based frameworks such as DeepPredict (Alanazi et al., 2025) have been proposed, all reporting high prediction accuracy.

Recent breakthroughs such as AlphaFold have further transformed the field by employing end-to-end learning approaches that directly predict three-dimensional protein structures from amino acid sequences, thereby blurring the traditional boundary between secondary and tertiary structure prediction (Jumper et al., 2021). These developments reflect a broader shift from rule-based and statistical methods to highly accurate, data-driven predictive systems.

Limitations of Secondary Structure Prediction Methods and Hard-to-Crack Cases

Despite these advances, the accuracy of secondary structure prediction methods remains influenced by factors such as limitations in training data, context dependence, and the dynamic nature of protein conformations. Most conventional predictors, whether statistical or based on machine learning (ML) or deep learning (DL), struggle in scenarios where sequence–structure relationships are weak, ambiguous, or condition-dependent.

For example, intrinsically disordered proteins (IDPs) and proteins undergoing conformational transitions often lack a single stable secondary structure, making prediction inherently uncertain or even biologically meaningless when considered in a static framework (Wright & Dyson, 1999; Dunker et al., 2001; Kumar & Rathore, 2026). Interestingly, inaccuracies in secondary structure prediction have themselves been proposed as indicators of protein fold switching (Mishra et al., 2019). Similarly, sequences containing non-standard or post-translationally modified residues introduce noise due to their poor representation in training datasets. Disulfide bond–rich proteins may also adopt conformations strongly influenced by covalent constraints that are not easily inferred from sequence alone. Foundational studies such as Kabsch and Sander (1983) and Rost and Sander (1993) highlight the dependence of prediction accuracy on well-defined structural patterns, which are often absent in such cases.

Additional challenging scenarios include fold singletons or orphan folds, which lack homologous sequences and thus limit the effectiveness of evolutionary information. Specialized structures such as transmembrane β-barrels present further difficulties due to their unique environmental constraints and underrepresentation in training datasets. Likewise, amyloid-forming proteins form β-sheets in a context-dependent manner during aggregation rather than as isolated sequences, which complicates accurate prediction. Proteins such as proinsulin, which undergo post-translational processing, and domain-swapped proteins, in which structural elements are exchanged between chains, further illustrate cases in which native secondary structure cannot be reliably inferred from the primary sequence alone.

Prediction performance is also affected by biases in training datasets, particularly the overrepresentation of regular secondary structures, especially α-helices, which can lead to systematic overprediction of helical content. In addition, these methods suffer from class imbalance issues, which become more pronounced in fine-grained classification schemes such as Q8-state prediction (Kumar & Rathore, 2026). Despite significant advances, secondary structure prediction methods remain limited in their ability to handle dynamic, rare, or non-canonical protein features.
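
The effect of class imbalance on a headline accuracy figure can be illustrated with a small sketch that reports overall per-residue accuracy alongside per-class recall. The function name and the toy strings are assumptions made for illustration; they are not benchmark data.

```python
from collections import Counter


def accuracy_and_recall(true_ss, pred_ss):
    """Return (overall accuracy, {state: recall}) for two equal-length state strings."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have equal length")
    if not true_ss:
        raise ValueError("empty input")
    correct = Counter()
    total = Counter()
    for true_state, pred_state in zip(true_ss, pred_ss):
        total[true_state] += 1
        if true_state == pred_state:
            correct[true_state] += 1
    overall = sum(correct.values()) / len(true_ss)
    recall = {state: correct[state] / total[state] for state in total}
    return overall, recall


# Toy example: a predictor that labels every residue as alpha-helix (H)
# and never predicts the rarer 3-10 helix (G).
overall, recall = accuracy_and_recall("HHHHHHHHGG", "HHHHHHHHHH")
print(overall)  # 0.8
print(recall)   # {'H': 1.0, 'G': 0.0}
```

Here the overall accuracy is 80% even though the rare class is never recovered, which is why per-class measures are informative for Q8-style evaluations.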

S8kPred Dataset Overview

Other Utilities

S8kPred offers two additional tools that complement sequence-based secondary structure prediction. They are useful when you need a cross-method consensus or want to assign secondary structure directly from an experimental 3D structure.

Consensus Secondary Structure Prediction

A unified prediction portal that runs multiple state-of-the-art tools (PSIPRED, Porter 5, S4pred, and S8kPred) simultaneously on your FASTA sequence and combines their outputs into a single consensus prediction. By aggregating predictions across algorithms with different architectures and evolutionary strategies, consensus analysis typically reduces individual method biases and improves overall reliability, especially for ambiguous or borderline residues. Select any two or more tools to generate a meaningful consensus result.
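
How the portal combines the individual outputs is not described here. One simple possibility, shown purely as an assumption, is a per-residue majority vote with a fallback state when the top vote is tied. The function name, the tie rule, and the example strings below are hypothetical.

```python
from collections import Counter


def consensus(predictions, tie_symbol="C"):
    """Per-residue majority vote over {tool name: state string}.

    Ties for the top vote fall back to tie_symbol (an assumed rule).
    """
    if len(predictions) < 2:
        raise ValueError("at least two predictions are needed for a consensus")
    if len({len(p) for p in predictions.values()}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions.values()):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(tie_symbol)
        else:
            result.append(ranked[0][0])
    return "".join(result)


# Made-up outputs from three hypothetical tools for a six-residue sequence.
votes = {"tool_a": "HHHEEC", "tool_b": "HHCEEC", "tool_c": "HCCECC"}
print(consensus(votes))  # HHCEEC
```

With only two tools, any disagreement is a tie, so the result at that position depends entirely on the fallback rule; this is one reason a larger selection of tools gives a more informative consensus.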

Secondary Structure Assignment (DSSP & STRIDE)

Unlike prediction from sequence, this tool assigns secondary structure directly from a known 3D protein structure. Upload a PDB or mmCIF file, or enter a PDB ID, and the tool will run DSSP (Dictionary of Secondary Structure of Proteins) and/or STRIDE (Structural Identification) to derive the secondary structure of every residue from backbone hydrogen-bond geometry and φ/ψ dihedral angles. This is ideal for benchmarking predictions against experimentally resolved structures or for annotating PDB entries.
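
The φ/ψ angles mentioned above are ordinary four-atom dihedral angles: φ of residue i is defined by the atoms C(i−1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below computes a signed dihedral from four Cartesian coordinates. It is a geometric illustration only, not the DSSP or STRIDE algorithm, both of which also rely on hydrogen-bond criteria; the function names and test coordinates are made up.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees (cis = 0, trans = +/-180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the first three points are fixed, the fourth is rotated
# about the central bond.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))  # 0.0
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

For a real structure, the four points would be the backbone atom coordinates read from the PDB or mmCIF file in the order given above.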


References

Kumar, M., & Rathore, R. S. (2026). S8Kpred: A novel approach to protein secondary structure prediction using 8,000 tripeptide propensities. Peptide Science, 118(3), e70029. https://doi.org/10.1002/pep2.70029

Chatzimiltis, S., Agathocleous, M., Promponas, V. J., & Christodoulou, C. (2025). Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings. Computational and Structural Biotechnology Journal, 27, 243–251.

Chou, P. Y., & Fasman, G. D. (1974a). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry, 13, 211–222.

Chou, P. Y., & Fasman, G. D. (1974b). Prediction of protein conformation. Biochemistry, 13, 222–245.

Cole, C., Barber, J. D., & Barton, G. J. (2008). The JPred 3 secondary structure prediction server. Nucleic Acids Research, 36, W197–W201.

Cuff, J. A., & Barton, G. J. (1999). Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins: Structure, Function, and Genetics, 34, 508–519.

Garnier, J., Osguthorpe, D. J., & Robson, B. (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. Journal of Molecular Biology, 120, 97–120.

Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology, 292, 195–202.

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589.

Kabsch, W., & Sander, C. (1983). Dictionary of protein secondary structure. Biopolymers, 22(12), 2577–2637.

King, R. D., & Sternberg, M. J. E. (1996). Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Science, 5, 2298–2310.

Meiler, J., & Baker, D. (2003). Coupled prediction of protein secondary and tertiary structure. Proceedings of the National Academy of Sciences, 100, 12105–12110.

Mirabello, C., & Pollastri, G. (2013). Porter, PaleAle 4.0: High-accuracy prediction of protein secondary structure and relative solvent accessibility. Bioinformatics, 29, 2056–2058.

Mishra, S., Looger, L. L., & Porter, L. L. (2019). Inaccurate Secondary Structure Predictions Often Indicate Protein Fold Switching. Protein Science, 28, 1487–1493.

Rost, B., & Sander, C. (1993). Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology, 232, 584–599.

Singh, J., Litfin, T., Paliwal, K., Singh, J., Hanumanthappa, A. K., & Zhou, Y. (2021). SPOT-1D-Single: Improving single-sequence-based prediction of protein secondary structure using deep learning. Bioinformatics, 37, 3464–3472.

Torrisi, M., Kaleel, M., & Pollastri, G. (2019). Deeper profiles and deep learning for protein secondary structure prediction (Porter 5). Scientific Reports, 9, 12374.

Wright, P. E., & Dyson, H. J. (1999). Intrinsically unstructured proteins: Re-assessing the protein structure–function paradigm. Journal of Molecular Biology, 293(2), 321–331.

Dunker, A. K., Brown, C. J., Lawson, J. D., Iakoucheva, L. M., & Obradović, Z. (2001). Intrinsic disorder and protein function. Biochemistry, 40(21), 6573–6582.

Zhao, L., Li, J., Zhan, W., Jiang, X., & Zhang, B. (2024). Prediction of protein secondary structure by the improved TCN-BiLSTM-MHA model with knowledge distillation. Scientific Reports, 14, 16488.

Alanazi, W., Meng, D., & Pollastri, G. (2025). DeepPredict: A state-of-the-art web server for protein secondary structure and relative solvent accessibility prediction. Frontiers in Bioinformatics, 5, 1607402.