This Perl script takes every line of the input file as a single DNA sequence and creates a new file with every sequence in FASTA format, under a FASTA header defined by the user.
The script is called "fasta_seq.pl" and can be downloaded by clicking the link.
Here is a short tutorial on how to use it.
Imagine that we have a file called "seq" that contains one DNA sequence per line, and we do not want to separate the sequences and name each one manually.
NOTE: This script will only run in Linux/UNIX environments because it depends on the commands "pr", "sed", "tr" and "fold" (sorry to all the Windows users).
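Before running the script, a quick way to confirm that these four commands are installed is the small POSIX shell check below. The function name `missing_deps` is only an illustration and is not part of fasta_seq.pl; it prints one line for each command it cannot find and nothing when all of them are available.

```shell
# missing_deps CMD...  (hypothetical helper, not part of fasta_seq.pl)
# Print "missing: NAME" for every given command that is not on the PATH.
# Prints nothing when every command is available.
missing_deps() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
  done
}

# Check the four commands named in the note above.
missing_deps pr sed tr fold
```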
STEP 1 <- Open a Terminal and go into the folder that contains the input file.
STEP 2 <- Run the Perl script with:
perl fasta_seq.pl
EXPLANATION OF THE COMMAND:
- perl <- tells the Terminal to run the script with the Perl interpreter
- fasta_seq.pl <- the name of the Perl script
STEP 3 <- Type the name of the input file that contains the DNA sequences (in this tutorial, our file is called "seq").
STEP 4 <- Type the FASTA header for each sequence (it is not necessary to include the ">" symbol).
In this tutorial, I want the FASTA header to be "just a dna sequence".
STEP 5 <- Type the width of the sequences (how many nucleotides per line).
In this tutorial, I want each line to hold 45 nucleotides.
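To see what the width setting means, here is a sketch using `fold`, one of the commands the script depends on, on a hypothetical 100-nucleotide sequence. Whether fasta_seq.pl calls `fold` in exactly this way is an assumption. With a width of 45, the sequence is split into lines of 45, 45 and 10 nucleotides.

```shell
# Hypothetical 100-nt example sequence: "ACGT" repeated 25 times.
dna=$(awk 'BEGIN { for (i = 0; i < 25; i++) printf "ACGT" }')

# Wrap it to 45 nucleotides per line, the width chosen in STEP 5.
printf '%s\n' "$dna" | fold -w 45
```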
STEP 6 <- Type the complete name of the output file (in this tutorial, I want my results to be written to "output_sequences.fa").
STEP 7 <- Finally, enjoy the results ;)
Now we can go into our input folder and take a look at the output file.
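From the Terminal, one way to check the result (using the file names from this tutorial) is to view the first records with `head -n 6 output_sequences.fa` and then compare the number of FASTA headers with the number of input sequences. The helper below is a hypothetical sketch, not part of the script, and it assumes that blank input lines are not counted as sequences.

```shell
# check_counts INPUT FASTA  (hypothetical helper, not part of fasta_seq.pl)
# Compare the number of non-empty lines in INPUT with the number of
# ">" header lines in FASTA; returns non-zero when they differ.
check_counts() {
  n_in=$(grep -c . "$1")       # non-empty lines in the input file
  n_out=$(grep -c '^>' "$2")   # header lines in the FASTA file
  echo "input sequences: $n_in, FASTA records: $n_out"
  [ "$n_in" -eq "$n_out" ]
}
```

For this tutorial it would be run as `check_counts seq output_sequences.fa`.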
Do you see it? We now have an output file that contains every sequence of the input file (the script treats every line as an independent DNA string), each under a FASTA header named by ourselves and ready in FASTA format.
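For readers who want to see the whole transformation in one place, here is a rough shell approximation of the output fasta_seq.pl produces. It is not the author's code: it uses only `tr` (to drop Windows carriage returns) and `fold` (to wrap lines) rather than all four commands, and it assumes that every record gets the same user-supplied header and that blank lines are skipped. The name `to_fasta` is hypothetical.

```shell
# to_fasta INPUT HEADER WIDTH OUTPUT  (hypothetical helper, not fasta_seq.pl)
# Every non-empty line of INPUT becomes one FASTA record: a ">HEADER" line
# followed by the sequence wrapped to WIDTH characters per line.
to_fasta() {
  input=$1
  header=$2
  width=$3
  output=$4
  # tr removes carriage returns left by files saved on Windows.
  tr -d '\r' < "$input" |
    while IFS= read -r line || [ -n "$line" ]; do
      [ -n "$line" ] || continue      # skip blank lines (assumption)
      printf '>%s\n' "$header"
      printf '%s\n' "$line" | fold -w "$width"
    done > "$output"
}
```

With the answers used in this tutorial, the equivalent call would be `to_fasta seq "just a dna sequence" 45 output_sequences.fa`.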
Benjamin