FASTA format can represent either a single sequence or several sequences in one file. A series of single-sequence records, concatenated, forms a multi-sequence file. The best source for a description of the FASTA / Pearson format is the documentation for the FASTA suite of programs. It can be downloaded with any free distribution of FASTA (see fasta20.doc, fastaVN.doc or fastaVN.me, where VN is the version number).
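To make the idea of a multi-sequence file concrete, here is a minimal sketch that splits FASTA text into separate records. The helper name parse_fasta and the sample data are made up for illustration, and the sketch assumes every record starts with a ">" line.

```perl
use strict;
use warnings;

# Hypothetical helper: split FASTA text into records.
# Returns a list of hash references with keys 'header' and 'sequence'.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;            # skip blank lines
        if ($line =~ /^>(.*)/) {             # a ">" line starts a new record
            push @records, { header => $1, sequence => '' };
        }
        elsif (@records) {                   # sequence line of the current record
            $records[-1]{sequence} .= $line;
        }
    }
    return @records;
}

# Made-up two-record input, only for demonstration
my $text = ">seq1 first example\nACGTAC\nGTTA\n>seq2 second example\nMKVL\n";
for my $rec (parse_fasta($text)) {
    printf "%s: %d residues\n", $rec->{header}, length($rec->{sequence});
}
```

Each ">" line opens a new record, and the lines that follow are appended to that record until the next ">" line appears.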
A sequence in FASTA format is represented as a series of lines that should be no longer than 120 characters and usually do not exceed 80 characters. This was probably to allow software to preallocate fixed-size line buffers: at the time, the majority of users relied on DEC VT (or compatible) terminals, which could display 80 or 132 characters per line. Most people preferred the larger font of the 80-character mode, so it became the recommended practice to use 80 characters or fewer (often 70) per line in FASTA.
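The line-length convention can be sketched in a few lines of Perl. The helper name wrap_sequence and its default width of 70 are assumptions chosen for this illustration; any width up to the limits mentioned above would work the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: break a sequence into lines of at most $width characters.
# Call it in list context to get the individual lines.
sub wrap_sequence {
    my ($seq, $width) = @_;
    $width = 70 unless defined $width;       # assumed default, below the 80-column limit
    my @lines = $seq =~ /.{1,$width}/g;      # each match is one chunk; the last may be shorter
    return @lines;
}

my $seq = 'ACGT' x 40;                       # made-up sequence of 160 characters
print "$_\n" for wrap_sequence($seq, 70);    # prints lines of 70, 70 and 20 characters
```

The regular expression takes as many characters as it can, up to the width, on every match, so no counter variable is needed.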
The first line in a FASTA file begins with either a ">" (greater-than) symbol or a ";" (semicolon) and was originally taken as a comment. Subsequent lines beginning with a semicolon are ignored by the software. Since the first line was the only comment normally used, it quickly came to hold a brief description of the sequence, often beginning with a unique library accession number. Over time it became routine to always use ">" on the first line and not to use ";" comments (which would otherwise be ignored).
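A program reading FASTA therefore has to tell three kinds of lines apart. The following is only a sketch: the helper names line_type and split_header are hypothetical, and split_header assumes the identifier is the first whitespace-delimited word after ">".

```perl
use strict;
use warnings;

# Hypothetical helper: classify one line of a FASTA file.
# Returns 'header' for ">" lines, 'comment' for ";" lines, 'sequence' otherwise.
sub line_type {
    my ($line) = @_;
    return 'header'  if $line =~ /^>/;
    return 'comment' if $line =~ /^;/;
    return 'sequence';
}

# Hypothetical helper: split a ">" line into an identifier and a description.
# Assumption: the identifier is the first word after ">".
sub split_header {
    my ($line) = @_;
    my ($id, $desc) = $line =~ /^>\s*(\S+)\s*(.*)$/;
    return ($id, $desc);
}

# Made-up lines, only for demonstration
for my $line ('>seq1 example protein', '; an old-style comment', 'MKVLAA') {
    printf "%-8s %s\n", line_type($line), $line;
}
```

A reader built on this would keep 'header' and 'sequence' lines and skip 'comment' lines.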
After the initial line (used to uniquely describe the sequence) comes the actual sequence itself, in the standard single-letter code. Anything other than a valid code is ignored (including spaces, tabs, asterisks, etc.). Originally it was also common to end the sequence with an "*" (asterisk) character (by analogy with PIR-formatted sequences) and, for the same reason, to leave a blank line between the description and the sequence.
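Ignoring everything that is not a valid code can be sketched as a single substitution. The helper name clean_sequence is hypothetical, and the sketch makes the simplifying assumption that the valid codes are exactly the ASCII letters; a stricter reader could check against the actual nucleotide or amino-acid alphabet.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every character that is not a letter.
# Assumption: valid single-letter codes are the ASCII letters A-Z and a-z.
sub clean_sequence {
    my ($raw) = @_;
    (my $clean = $raw) =~ s/[^A-Za-z]//g;    # removes spaces, tabs, "*", digits, newlines
    return $clean;
}

print clean_sequence("MFSG PSG\tHAH*\n"), "\n";   # prints MFSGPSGHAH
```

This also removes a trailing "*" left over from the older PIR-style convention.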
For example, this is the protein sequence of the Son of sevenless (SOS) protein, which is involved in a signalling pathway in the fruit fly:
>gi|18110536|ref|NP_476597.2| Son of sevenless [Drosophila melanogaster]
MFSGPSGHAHTISYGGGIGLGTGGGGGSGGSGSGSQGGGGGIGIGGGGVAGLQDCDGYDFTKCENAARWR
GLFTPSLKKVLEQVHPRVTAKEDALLYVEKLCLRLLAMLCAKPLPHSVQDVEEKVNKSFPAPIDQWALNE
AKEVINSKKRKSVLPTEKVHTLLQKDVLQYKIDSSVSAFLVAVLEYISADILKMAGDYVIKIAHCEITKE
DIEVVMNADRVLMDMLNQSEAHILPSPLSLPAQRASATYEETVKELIHDEKQYQRDLHMIIRVFREELVK
IVSDPRELEPIFSNIMDIYEVTVTLLGSLEDVIEMSQEQSAPCVGSCFEELAEAEEFDVYKKYAYDVTSQ
ASRDALNNLLSKPGASSLTTAGHGFRDAVKYYLPKLLLVPICHAFVYFDYIKHLKDLSSSQDDIESFEQV
QGLLHPLHCDLEKVMASLSKERQVPVSGRVRRQLAIERTRELQMKVEHWEDKDVGQNCNEFIREDSLSKL
GSGKRIWSERKVFLFDGLMVLCKANTKKQTPSAGATAYDYRLKEKYFMRRVDINDRPDSDDLKNSFELAP
RMQPPIVLTAKNAQHKHDWMADLLMVITKSMLDRHLDSILQDIERKHPLRMPSPEIYKFAVPDSGDNIVL
EERESAGVPMIKGATLCKLIERLTYHIYADPTFVRTFLTTYRYFCSPQQLLQLLVERFNIPDPSLVYQDT
GTAGAGGMGGVGGDKEHKNSHREDWKRYRKEYVQPVQFRVLNVLRHWVDHHFYDFEKDPMLLEKLLNFLE
HVNGKSMRKWVDSVLKIVQRKNEQEKSNKKIVYAYGHDPPPIEHHLSVPNDEITLLTLHPLELARQLTLL
EFEMYKNVKPSELVGSPWTKKDKEVKSPNLLKIMKHTTNVTRWIEKSITEAENYEERLAIMQRAIEVMMV
MLELNNFNGILSIVAAMGTASVYRLRWTFQGLPERYRKFLEECRELSDDHLKKYQERLRSINPPCVPFFG
RYLTNILHLEEGNPDLLANTELINFSKRRKVAEIIGEIQQYQNQPYCLNEESTIRQFFEQLDPFNGLSDK
QMSDYLYNESLRIEPRGCKTVPKFPRKWPHIPLKSPGIKPRRQNQTNSSSKLSNSTSSVAAAAAASSTAT
SIATASAPSLHASSIMDAPTAAAANAGSGTLAGEQSPQHNPHAFSVFAPVIIPERNTSS
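The description line of this example packs several identifiers between "|" characters. As a rough sketch, it can be taken apart as follows; the helper name parse_piped_header is hypothetical, and the code assumes the fields alternate between a database tag and an identifier, as they appear to in this one example.

```perl
use strict;
use warnings;

# Hypothetical helper: take apart a "|"-delimited description line.
# Assumption: the first word holds alternating tag|value fields.
# Returns a hash reference of identifiers, the description, and the organism.
sub parse_piped_header {
    my ($line) = @_;
    (my $rest = $line) =~ s/^>//;                      # drop the leading ">"
    my ($ids, $desc) = split /\s+/, $rest, 2;          # first word, then the rest
    $desc = '' unless defined $desc;
    my @fields = split /\|/, $ids;                     # trailing empty fields are dropped
    my %id;
    while (@fields >= 2) {
        my ($tag, $value) = splice(@fields, 0, 2);
        $id{$tag} = $value;
    }
    my ($organism) = $desc =~ /\[([^\]]+)\]\s*$/;      # text inside the final [...]
    return (\%id, $desc, $organism);
}

my $header = '>gi|18110536|ref|NP_476597.2| Son of sevenless [Drosophila melanogaster]';
my ($id, $desc, $organism) = parse_piped_header($header);
print "gi:       $id->{gi}\n";
print "ref:      $id->{ref}\n";
print "organism: $organism\n";
```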
Let's look at a Perl script that reads a string from a file (the string represents a nucleotide and/or protein sequence) and prints it in FASTA format.
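#!/usr/bin/perl
use strict;
use warnings;

# Input file name from the first command-line argument
my $file = shift @ARGV;
die "Usage: $0 <sequence_file>\n" unless defined $file;

# Open the file, read all lines into one array, then close it
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
my @file_aray = <$fh>;
close($fh);

# Remove line endings and join everything into one string
my @temp;
foreach my $line (@file_aray)
{
    chomp $line;
    push @temp, $line;
}
my $sequence = join('', @temp);

# Print the description line, then the sequence, 60 characters per line
print ">Sequence1\n";
my $counter = 0;
my @seqaray = split(//, $sequence);
for (my $i = 0; $i < @seqaray; $i++)
{
    print $seqaray[$i];
    $counter++;
    if ($counter == 60)
    {
        print "\n";
        $counter = 0;
    }
}
print "\n" if $counter > 0;    # finish the last, shorter line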