Protein-DNA Explorer

Center for Studies in Physics and Biology

Welcome to the Protein-DNA Explorer website

HOMOLOGY MODELING OF DNA-BINDING PROTEINS: You may upload your protein sequences, or choose protein sequence IDs from a collection of organisms, and find matches to protein-DNA structures sorted by the interface alignment score.
ASSIGNING SEQUENCE MOTIFS TO PROTEIN-DNA STRUCTURES: You may upload alignments of binding sites or position-specific weight matrices and find the protein-DNA structures that correspond most closely to the input motif.

Important tips

Protein-DNA Explorer is a web server that allows users to carry out homology modeling of protein-DNA complexes. The user can submit one or several protein sequences in order to:
1. Detect DNA-binding domains by running HMMER on a collection of protein families from the Pfam database.
2. Find structural homologs for each DNA-binding domain detected in the query sequences. The homologs are sorted by the interface alignment score.
3. Inspect protein sequence alignments and view amino acid mutations at the DNA-binding interface using the Jmol molecular viewer.
4. Predict position-specific weight matrices for protein-DNA structures chosen as templates. These predictions can be used to guide motif searches in bioinformatics algorithms such as PhyloGibbs.
5. Find orthologous proteins in other species on the basis of their interface alignment score with the query protein.
The user can also upload an alignment of binding sites or a weight matrix for which the identity of the DNA binding protein is not known, and search for matches in the database of structure-based position-specific weight matrices. This allows the user to associate sequence motifs of unknown origin with transcription factors, generating experimentally testable hypotheses about inputs to gene regulation.
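For readers preparing a weight matrix from their own alignment of binding sites, the following is a minimal sketch of one common way to build a position-specific weight matrix as log-odds scores. The function names, the pseudocount of 1, and the uniform background frequency of 0.25 are illustrative assumptions; they are not taken from the server, which may use a different convention or input format.

```python
import math
from collections import Counter

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from equal-length aligned sites.

    Assumptions (not from the server): a flat pseudocount per base and a
    uniform background frequency.
    """
    if not sites:
        raise ValueError("at least one binding site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")

    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        # Count how often each base occurs in this alignment column.
        counts = Counter(site[pos].upper() for site in sites)
        column = {}
        for base in BASES:
            freq = (counts[base] + pseudocount) / total
            column[base] = math.log2(freq / background)
        matrix.append(column)
    return matrix


def score(matrix, sequence):
    """Sum the per-position weights of a sequence of matching length."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the matrix width")
    return sum(col[base] for col, base in zip(matrix, sequence.upper()))


example_sites = ["ACGT", "ACGA", "ACGT"]
example_matrix = weight_matrix(example_sites)
```

A sequence that resembles the aligned sites scores higher than one that does not, for example `score(example_matrix, "ACGT")` compared with `score(example_matrix, "TTTT")`.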
Data analysis is organized by projects (folders) that can contain multiple input files. A (project, input file) pair uniquely identifies a data set and its associated results. If the same (project, input file) pair is analyzed in multiple runs, the most recent results overwrite the earlier ones.
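The overwrite behavior can be pictured as a mapping keyed by the pair of project and input file. The sketch below is only an illustration of that rule; the `ResultStore` class and its method names are hypothetical and say nothing about how the server stores results internally.

```python
class ResultStore:
    """Toy model: results are keyed by (project, input_file)."""

    def __init__(self):
        self._results = {}

    def save(self, project, input_file, results):
        # A later run on the same pair replaces the earlier results.
        self._results[(project, input_file)] = results

    def load(self, project, input_file):
        return self._results[(project, input_file)]


store = ResultStore()
store.save("yeast_tfs", "batch1.fasta", "results of run 1")
store.save("yeast_tfs", "batch1.fasta", "results of run 2")
store.save("yeast_tfs", "batch2.fasta", "results of another input file")
```

To keep the results of an earlier run, use a different project or input file name for the new run.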
Do not run more than one request at a time, and do not upload more than 5 protein sequences at once. Either may lead to corrupted results and/or server timeouts.
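Users with longer lists of sequences can split them into groups before uploading. The helper below is a small sketch of that bookkeeping; the function name `batches` and the placeholder sequence names are illustrative, and the limit of 5 per upload is our reading of the tip above. Each group should be submitted only after the previous request has finished.

```python
def batches(sequences, max_per_request=5):
    """Yield consecutive groups of at most max_per_request sequences."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(sequences), max_per_request):
        yield sequences[start:start + max_per_request]


# Placeholder names standing in for 12 protein sequences.
sequence_names = [f"seq{i}" for i in range(1, 13)]
groups = list(batches(sequence_names))
```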
Using protein sequence IDs is more efficient than uploading sequences because HMMER domain matches are precomputed for the former. All protein sequences with precomputed domain matches are available for inspection and download on the help page.

People

Eric Siggia, Alexandre Morozov, Yupu Liang.

References

Morozov, A.V. and Siggia, E.D. Connecting protein structure with predictions of regulatory sites. Proc. Natl. Acad. Sci. USA 2007 Apr 24; 104(17): 7068-7073.
Foat, B.C., Morozov, A.V. and Bussemaker, H.J. Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE. Bioinformatics 2006 Jul 15; 22(14): e141-149.
Morozov, A.V., Havranek, J.J., Baker, D. and Siggia, E.D. Protein-DNA binding specificity predictions with structural models. Nucleic Acids Res. 2005 Oct 24; 33(18): 5781-5798.

Supporting data for the homology modeling protein-DNA paper (Morozov et al.):

Comments and Bug reports

To send comments or report problems with the web server, please email Yupu Liang or Alexandre Morozov.

To send comments or report problems with the algorithm, please email Alexandre Morozov.

Disclaimer

All software used on this website is available to academic users free of charge.