FFAS03: About
The Burnham Institute
Godzik Lab


The FFAS Server: Fold & Function Assignment System

Overview | Algorithm | Input | Services | Databases | Updates | User accounts | FAQ | Common mistakes | News

Overview
The FFAS03 server provides an interface to the third generation of FFAS, a profile-profile alignment and fold recognition algorithm. Profile-profile alignments use the information present in sequences of homologous proteins to amplify the sequence conservation pattern that defines a family, which allows detection of remote homologies beyond the reach of other sequence comparison methods. The input to the FFAS03 server is a protein sequence provided by the user. A profile is generated from this sequence and then compared to several databases of sequence profiles of proteins and domains from public databases such as PDB, COG, Pfam, and SCOP. The FFAS03 template databases are updated with the latest structural and sequence information (see Databases). In addition, the FFAS03 server provides access to comparative modeling tools.

Algorithm
Step 1: Generate a multiple sequence alignment using PSI-BLAST. Five iterations of PSI-BLAST are performed against the sequence pool (NR85S) database of protein sequences (see Databases).
Step 2: Create a profile using sequences found by PSI-BLAST. Weights are assigned to sequences based on their uniqueness.
Step 3: Calculate alignment score. FFAS aligns profiles using a standard local-local dynamic programming algorithm. The value of the comparison score between positions n and m from the two profiles is calculated as a vector*matrix*vector product which includes the n-th column from the first profile, substitution matrix BLOSUM62, and the m-th column from the second profile. The alignment score is then calculated using dynamic programming.
Step 4: Calculate FFAS score. The alignment score is translated into the final FFAS score by comparing it with the distribution of scores obtained for pairs of unrelated proteins.
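
The scoring described in Steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FFAS implementation: the three-letter alphabet and toy substitution matrix stand in for the 20 amino acids and BLOSUM62, the linear gap penalty is hypothetical, and the z-score in normalized_score is only one plausible way to compare a raw score with a background distribution (the exact transformation used by FFAS is not described on this page).

```python
from statistics import mean, pstdev

# Toy alphabet and substitution matrix (assumption: a stand-in for the
# 20x20 BLOSUM62 matrix named in Step 3).
ALPHABET = "ACD"
SUBST = [
    [4, -1, -2],
    [-1, 5, -3],
    [-2, -3, 6],
]


def column_score(col_a, col_b, subst=SUBST):
    """Vector * matrix * vector product of two profile columns."""
    return sum(
        col_a[i] * subst[i][j] * col_b[j]
        for i in range(len(col_a))
        for j in range(len(col_b))
    )


def local_alignment_score(profile_a, profile_b, gap=2.0):
    """Best local alignment score by dynamic programming.

    Uses a Smith-Waterman recursion with a linear gap penalty
    (the gap model is an assumption made for this sketch).
    """
    rows, cols = len(profile_a), len(profile_b)
    h = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for n in range(1, rows + 1):
        for m in range(1, cols + 1):
            match = h[n - 1][m - 1] + column_score(profile_a[n - 1], profile_b[m - 1])
            h[n][m] = max(0.0, match, h[n - 1][m] - gap, h[n][m - 1] - gap)
            best = max(best, h[n][m])
    return best


def normalized_score(raw, background):
    """z-score of a raw score against scores of unrelated pairs (assumed form)."""
    spread = pstdev(background)
    return (raw - mean(background)) / spread if spread else 0.0


def one_hot(sequence):
    """Turn a plain sequence into a trivial profile, one column per residue."""
    return [[1.0 if letter == symbol else 0.0 for symbol in ALPHABET]
            for letter in sequence]
```

With these toy values, aligning the one-hot profile of "ACD" against itself gives a raw score of 15.0 (4 + 5 + 6 along the diagonal), while "AAA" against "DDD" gives 0.0 because every column pair scores negatively and a local alignment never has to include them.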

Input
The input to the FFAS03 server is one or more amino-acid sequences in FASTA format. Up to 3 protein sequences may be submitted per job. Adding a meaningful FASTA description line is recommended. The server accepts sequences of 25 to 2,000 residues. However, the algorithm is optimized for protein sequences of 50 to 500 residues that contain one or two domains. Sequences longer than 1,000 residues and/or expected to contain multiple protein domains should be split into shorter fragments.
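
The limits above can be checked locally before submitting a job. The following is a minimal sketch, not code used by the server: the function names and the wording of the messages are hypothetical, and only the numeric limits come from the description above.

```python
MAX_SEQUENCES = 3                  # sequences per job
MIN_LENGTH, MAX_LENGTH = 25, 2000  # accepted sequence lengths
OPTIMAL_MIN, OPTIMAL_MAX = 50, 500 # range the algorithm is optimized for
SPLIT_ABOVE = 1000                 # longer sequences should be split


def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text.

    Lines that appear before the first '>' header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text):
    """Return (errors, warnings) for a FASTA submission."""
    errors, warnings = [], []
    records = parse_fasta(text)
    if not records:
        errors.append("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        errors.append(f"{len(records)} sequences submitted, limit is {MAX_SEQUENCES}")
    for description, sequence in records:
        label = description or "(no description)"
        length = len(sequence)
        if not description:
            warnings.append("a meaningful description line is recommended")
        if length < MIN_LENGTH or length > MAX_LENGTH:
            errors.append(f"{label}: length {length} outside {MIN_LENGTH}-{MAX_LENGTH}")
        elif length > SPLIT_ABOVE:
            warnings.append(f"{label}: length {length}, consider splitting into fragments")
        elif not OPTIMAL_MIN <= length <= OPTIMAL_MAX:
            warnings.append(f"{label}: length {length} outside optimized range")
    return errors, warnings
```

A single 100-residue sequence with a description line passes with no errors and no warnings; a 1,500-residue sequence is accepted but produces a warning suggesting that it be split.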

Services
The FFAS03 server supports two services: 1) database search, available through the new search link, and 2) pairwise alignment, available through the pairwise alignment link. The first collects homologs from one or more databases of protein profiles; the second calculates an alignment and evaluates the sequence similarity between two protein sequences. In addition to homologs detected in the profile database(s) by the FFAS method itself, the FFAS03 server also displays homologs detected by the PDB-BLAST and BLAST methods. All homologs collected by PSI-BLAST from the sequence pool (NR85S) database (used to calculate FFAS profiles) can also be displayed. See the glossary for a list of methods used by the FFAS03 server.

Databases
Two types of databases are utilized by the FFAS03 server:

1) Sequence-pool database (referred to as 'nr85s') - a large database of protein sequences that is used to calculate protein profiles. This database is searched automatically with every sequence submitted to the FFAS03 server. The results of these searches are available through psi-nr85 links. The following sets of protein sequences are included in the Sequence-pool database:


The above sets of sequences have been clustered with the CD-HIT program to remove redundancy. The NR database has been clustered at 85% sequence identity and all metagenomic samples have been clustered at 60% sequence identity. Regions of low complexity have been masked with the SEG program.
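
The idea behind redundancy removal at a given identity threshold can be illustrated with a greedy clustering sketch. This is not CD-HIT itself: the position-by-position identity measure below is a crude stand-in for the alignment-based identity a real tool computes, and the function names are hypothetical. What it shares with the approach described above is the general scheme of processing sequences from longest to shortest and keeping one representative per cluster.

```python
def positional_identity(a, b):
    """Fraction of identical residues, compared position by position.

    Assumption: an ungapped comparison normalized by the shorter length,
    used here only as a simple substitute for a real alignment.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def greedy_cluster(sequences, threshold):
    """Group sequences into clusters; the first member is the representative."""
    clusters = []
    # Longest sequences are considered first, so they become representatives.
    for seq in sorted(sequences, key=len, reverse=True):
        for cluster in clusters:
            if positional_identity(cluster[0], seq) >= threshold:
                cluster.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append([seq])
    return clusters


def representatives(sequences, threshold):
    """Non-redundant set: one sequence per cluster."""
    return [cluster[0] for cluster in greedy_cluster(sequences, threshold)]
```

Calling representatives(sequences, 0.85) corresponds to the 85% threshold used for the NR database, and a threshold of 0.60 to the one used for metagenomic samples.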

2) Template databases - contain FFAS03 sequence profiles calculated for the PDB, SCOP, Pfam, and COG databases. These databases are available for user queries. The database of profiles of proteins targeted by the Joint Center for Structural Genomics is also available for searches. Searchable databases of profiles were recently added for virulence factors from the Virulence Factors Database, for human proteins with disease-related mutations extracted from the HUMSAVAR resource of SWISSPROT, and for proteins of Bacteroides thetaiotaomicron, a dominant member of the human intestinal microflora. The results of comparisons between these databases of sequence profiles can be displayed through the precalculated results link. To learn more about these sets of profiles, please use links to original resources:


NOTE: These resources should be referenced if they are accessed via links from results of FFAS searches and used in your research.

Updates
Sequence-pool Database: Newly sequenced homologs improve the sensitivity of profile-profile comparison, so as of April 2009 the sequence-pool database is updated monthly. Since full updates of the pre-calculated results and template databases are less frequent, the profiles of user queries now include sequences from the new sequence pool, while the searchable databases of profiles are still based on the old sequence pool. This is not expected to create problems in users' results; however, the results of new searches may depend on the date of submission and may differ from pre-calculated results obtained for the same sequence. The search form gives users the option to use the old version of the sequence-pool database if they prefer results consistent with the pre-calculated results.

Template Database: Due to lack of funds and personnel, the FFAS03 server is supported through volunteer work and is not updated regularly. The last full update was performed in April 2008. It included all databases used by the server and the all-to-all comparisons available through the pre-calculated results link. In addition, the PDB database is updated every Wednesday. Please note that the incremental update of the PDB database does not update the corresponding entries in the pre-calculated results or in users' results. Users need to resubmit their queries in order to get updated results.

User accounts
User accounts on the FFAS03 server are password-protected lists of results intended to organize work on different projects and protect confidential data. To create a new account, click on the login/register link and provide a login (the name of the account) and a password. Existing accounts are accessed from the same page.

Once a user is logged in, all results of the searches performed by that user are automatically stored in her/his account. However, pairwise alignments are calculated on the fly and are not stored in any account on the server.

Frequently Asked Questions

Common mistakes

News


Hot Paper and Paper of the Day:
Hot Paper from Godzik Lab     Ying Zhang, Ines Thiele, Dana Weekes, Zhanwen Li, Lukasz Jaroszewski, Krzysztof Ginalski, Ashley Deacon, John Wooley, Scott Lesley, Ian Wilson, Bernhard Palsson, Andrei Osterman, Adam Godzik. Three-Dimensional Structural View of the Central Metabolic Network of Thermotoga maritima. Science. 2009 Sep 18;325(5947):1544-9.
Paper of the Day     Zhi D, Krishna SS, Cao H, Pevzner P, Godzik A. Representing and comparing protein structures as paths in three-dimensional space. BMC Bioinformatics. 2006 Oct 20;7:460.


FFAS is supported by the NIH grant R01-GM087218-01
207173 jobs submitted since Sep. 25, 2005
Comments and questions to: webmaster