SIFT on dbSNP
SIFTing the Human Variation Databases
This page contains supplemental information to "Accounting for Human Polymorphisms Predicted to Affect Protein Function" (Genome Research 12:436-446).
Submit your own substitutions for SIFT prediction.
SIFT home page (source code and other goodies).
Download Predictions on Databases
- SNPs from NCBI dbSNP database (Build 129, predictions generated: 2008).
- Substitutions annotated to be involved in disease according to Swiss-Prot (2002).
Predictions made for 5218 substitutions; 69% (3626/5218) predicted to affect protein function.
- SNPs from Whitehead Institute as described in their paper (2002).
Predictions made for 115 SNPs; 81% (93/115) predicted to be tolerated.
- Variants in dbSNP Build 95 (May 15, 2001)
Predictions made for 3084 variants; 75% (2327/3084) predicted to be tolerated.
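Each percentage above is a rounded ratio of the counts in parentheses. A minimal Python sketch that recomputes them, assuming round-half-up to the nearest whole percent (the page does not state its rounding rule):

```python
# Counts copied from the list above: (count in the stated category, total predicted on).
DATASETS = {
    "Swiss-Prot disease substitutions, affect function": (3626, 5218),
    "Whitehead Institute SNPs, tolerated": (93, 115),
    "dbSNP Build 95 variants, tolerated": (2327, 3084),
}


def percent(part: int, total: int) -> int:
    """Return part/total as a whole-number percentage, rounding halves up.

    Integer arithmetic avoids floating-point surprises:
    floor(100 * part / total + 0.5) == (200 * part + total) // (2 * total).
    """
    return (200 * part + total) // (2 * total)


for name, (part, total) in DATASETS.items():
    print(f"{name}: {percent(part, total)}% ({part}/{total})")
```

The recomputed values (69%, 81%, 75%) agree with the figures quoted in the list.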
About the files you are downloading (after decompressing and extracting them):
*.seq contains the reference protein sequence used for prediction.
*.alignedfasta contains the alignment used for SIFT prediction. SIFT selected
the sequences from a search of SWISS-PROT/TrEMBL, using a median conservation
cutoff of 2.75.
*.prediction contains the predictions for a protein sequence. Only predictions
at positions with a median conservation <= 3.25 were included.
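The *.seq and *.alignedfasta files can be loaded with a small FASTA reader. The sketch below assumes both files use standard FASTA layout (a '>' header line followed by one or more sequence lines), which the extensions suggest but this page does not spell out; `read_fasta` and `is_rectangular` are hypothetical helpers, not part of the SIFT distribution.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Assumes standard FASTA: a '>' header line followed by sequence lines.
    Gap characters ('-') in aligned files are kept unchanged.
    A repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def is_rectangular(alignment: Dict[str, str]) -> bool:
    """True if every aligned sequence has the same number of columns."""
    return len({len(seq) for seq in alignment.values()}) <= 1


example = """>query
MKV-LA
>homolog_1
MRVQ
LA
"""
alignment = read_fasta(example.splitlines())
print(alignment)  # {'query': 'MKV-LA', 'homolog_1': 'MRVQLA'}
```

To read an extracted file instead of the inline example, pass an open handle, e.g. `with open(path) as handle: read_fasta(handle)`.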
A line in XP_001290.subst reads:
P1395L INTOLERANT 0.03 3.17 2.06 2.862 1.200 15 28
1st field: Substitution from P at position 1395 to L.
2nd field: Predicted to be intolerant.
3rd field: SIFT score of 0.03.
4th field: Conservation score at this position was 3.17.
5th field: Conservation of the neighboring positions (two on either side of the
substitution) is 2.06.
6th field: The sequences that have an amino acid represented at this position have
a median conservation of 2.862. This is the value compared against the median
conservation cutoff, so it should be at most 3.25, our cutoff.
7th field: Ignore.
8th field: Number of sequences that have an amino acid represented at this
position (15 in this example).
9th field: Total number of sequences in the alignment (28 in this example).
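The nine fields described above can be parsed into a record. This is a minimal Python sketch that assumes each line has exactly nine whitespace-separated fields and that field 1 is always a one-letter amino acid, a position, and a one-letter amino acid; `SiftPrediction`, `parse_prediction_line`, and `passes_cutoff` are hypothetical names, and the 3.25 cutoff is the one stated for the *.prediction files.

```python
from dataclasses import dataclass

MEDIAN_CONSERVATION_CUTOFF = 3.25  # cutoff stated for the *.prediction files


@dataclass
class SiftPrediction:
    ref_aa: str                   # field 1: original amino acid
    position: int                 # field 1: position in the reference protein
    alt_aa: str                   # field 1: substituted amino acid
    prediction: str               # field 2: e.g. INTOLERANT
    score: float                  # field 3: SIFT score
    conservation: float           # field 4: conservation at this position
    neighbor_conservation: float  # field 5: two positions on either side
    median_conservation: float    # field 6: compared against the cutoff
    seqs_at_position: int         # field 8: sequences with an amino acid here
    total_seqs: int               # field 9: sequences in the alignment


def parse_prediction_line(line: str) -> SiftPrediction:
    """Split one prediction line into its fields (field 7 is ignored)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    subst = fields[0]
    return SiftPrediction(
        ref_aa=subst[0],
        position=int(subst[1:-1]),
        alt_aa=subst[-1],
        prediction=fields[1],
        score=float(fields[2]),
        conservation=float(fields[3]),
        neighbor_conservation=float(fields[4]),
        median_conservation=float(fields[5]),
        seqs_at_position=int(fields[7]),
        total_seqs=int(fields[8]),
    )


def passes_cutoff(p: SiftPrediction) -> bool:
    """True if the median conservation is within the stated cutoff."""
    return p.median_conservation <= MEDIAN_CONSERVATION_CUTOFF


p = parse_prediction_line("P1395L INTOLERANT 0.03 3.17 2.06 2.862 1.200 15 28")
print(p.ref_aa, p.position, p.alt_aa, p.prediction, passes_cutoff(p))
# P 1395 L INTOLERANT True
```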
Variants detected from sequencing error (6 could be real polymorphisms)
Biased nsSNPs predicted to be damaging, from the Sunyaev et al. paper
Removing sequences more than 90%, 95%, or 99% identical to the query gives similar results
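Removing sequences that are nearly identical to the query can be sketched as follows. Percent identity has several common definitions; this sketch assumes it is computed only over aligned columns where neither sequence has a gap, which may differ from the definition used in the analysis above. Both functions are hypothetical helpers.

```python
from typing import Dict


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumption: only columns where neither sequence has a gap are compared.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x.upper() == y.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def drop_near_identical(query: str, others: Dict[str, str],
                        max_identity: float) -> Dict[str, str]:
    """Keep only sequences at most max_identity percent identical to the query."""
    return {
        name: seq
        for name, seq in others.items()
        if percent_identity(query, seq) <= max_identity
    }


query = "MKVLAG"
homologs = {"a": "MKVLAG", "b": "MKVLAA", "c": "MR-LAA"}
for cutoff in (90, 95, 99):
    kept = drop_near_identical(query, homologs, cutoff)
    print(cutoff, sorted(kept))
```

Calling `drop_near_identical` with `max_identity` set to 90, 95, and 99 yields the three reduced sequence sets such a comparison needs.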