Hi, this script parses motifs from MEME output files (meme.txt) and prints each motif to a separate FASTA-formatted file.
The script parses every *.txt MEME output file in the folder where you run it, so a whole folder is processed in one go.
You can download the script here: MEME2fasta.sh
NOTES: It works on Debian and Debian-based Linux systems; I have not yet tested it on other Linux distributions.
To run the script:
STEP 1 <- Make the file executable:
$ chmod +x MEME2fasta.sh
STEP 2 <- Run the program (you can either copy it into your bin path or run it locally):
# From the bin folder:
# Go to the folder that contains the target meme.txt output files and then:
$ MEME2fasta.sh
# From the local folder (which contains the script and the target meme.txt files):
$ ./MEME2fasta.sh
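Since the script has only been tested on Debian-based systems, it may help to first check that the external tools it calls are on your PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper and not part of MEME2fasta.sh. Note that finding a `rename` command is not enough on its own: the script passes Perl-style `s/.../.../` expressions to it, which the util-linux `rename` shipped by some other distributions does not accept.

```shell
#!/bin/bash
# Hypothetical helper (not part of MEME2fasta.sh): print the names of the
# given commands that cannot be found on PATH, separated by spaces.
missing_tools() {
    local tool missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Drop the leading space left by the concatenation above.
    echo "${missing# }"
}

# The external commands that MEME2fasta.sh calls.
missing=$(missing_tools sed csplit cut grep pr tr fold rename)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing"
else
    echo "All required tools found"
fi
```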
SHORT TUTORIAL
INPUT FOLDER AND INPUT FILES:
OUTPUT FOLDER AND OUTPUT FILES:
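As a rough illustration of how one input file turns into one file per motif, the sketch below runs the first two steps of the script (the `sed` extraction and the `csplit` split) in a temporary folder. The input `demo.txt` is a made-up, heavily simplified stand-in for a MEME report, so its column layout does not match real MEME output; only the "BL MOTIF" and "//" markers matter here.

```shell
#!/bin/bash
# Sketch only: demo.txt is invented for illustration, not real MEME output.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > demo.txt <<'EOF'
Some header text
BL MOTIF 1 width=4 seqs=2
seqA ( 3) ACGT 1
seqB ( 9) ACGA 1
//
Text between motifs
BL MOTIF 2 width=4 seqs=1
seqC ( 5) TTGA 1
//
Some footer text
EOF

# Step 1: keep only the lines from each "BL MOTIF" line through the next "//".
sed -n '/BL MOTIF/,/\/\//p' demo.txt > demo.txt.sed

# Step 2: start a new output file at every line beginning with "BL MOTIF".
# -z drops the empty piece that precedes the first match, -s silences the byte counts.
csplit -s -z --prefix=demo.txt.sed- --suffix-format='%02d.csplit' \
    demo.txt.sed '/^BL MOTIF/' '{*}'

# Count the per-motif files that were produced.
motif_files=$(ls demo.txt.sed-*.csplit | wc -l | tr -d ' ')
echo "motif blocks found: $motif_files"
```

With the two motif blocks in this invented input, the sketch should report two per-motif files; the real script then trims each one to its sequences and writes it out as FASTA.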
Code:
#!/bin/bash
# MEME2fasta.sh
#
# I used this script to parse the DNA sequences obtained
# from each motif of MEME output files "meme.txt"
# to generate a single FASTA file per motif.
# Finally I used the FASTA files to build PWMs
# Author: Benjamin Tovar
# Date: 11 July 2011
###########################################################
# Parse the data between the lines "BL MOTIF" and "//"
# to retrieve the DNA sequences that define each motif
##########################################################
for meme_file in *.txt
do
    # Quote the file names so that names containing spaces still work
    sed -n '/BL MOTIF/,/\/\//p' "$meme_file" > "$meme_file.sed"
done
##########################################################
# Split every DNA motif into a separate file with the
# "*.csplit" extension
##########################################################
for sed_file in *.sed
do
    # -z removes empty output files, -s suppresses the byte counts
    csplit -z -s --prefix="$sed_file-" --suffix-format="%02d.csplit" "$sed_file" '/^BL MOTIF/' '{*}'
done
##########################################################
# Parse the DNA sequences from each *.csplit file
##########################################################
for csplit_file in *.csplit
do
    # cut -c34-150 <- keeps characters 34 to 150 of each line
    # grep -v '^$' <- deletes blank lines
    # sed 's/1//g' <- deletes every "1" character from the line
    cut -c34-150 "$csplit_file" | grep -v '^$' | sed 's/1//g' > "$csplit_file.cut"
done
##########################################################
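# Generate FASTA files
##########################################################
for cut_file in *.cut
do
    # pr -n:3 -t -T   <- numbers each line, with ":" after the number
    # sed 's/^[ ]*/>/' <- replaces the leading spaces with ">"
    # tr ":" "\n"      <- puts the sequence on the line after its header
    # fold -w 100      <- wraps lines at 100 characters
    pr -n:3 -t -T "$cut_file" | sed 's/^[ ]*/>/' | tr ":" "\n" | fold -w 100 > "$cut_file.fa"
done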
# Remove the intermediate files (one rm call, no pipes needed):
rm -f -- *.sed *.csplit *.cut
# Rename the FASTA files (dots escaped so they match literally)
rename -f 's/\.csplit\.cut\.fa$/.fa/' *.fa
rename -f 's/\.txt\.sed//' *.fa
exit
# Benjamin Tovar
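The two `rename` calls at the end of the script are the part most likely to behave differently outside Debian, because they expect the Perl-based `rename`. If that command is not available, the same renaming can be done with plain bash parameter expansion and `mv`. The following is a sketch under that assumption; `strip_intermediate_suffixes` is a hypothetical name and the function is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper, not part of MEME2fasta.sh: rename files such as
# "<name>.txt.sed-NN.csplit.cut.fa" to "<name>-NN.fa" using only bash and mv.
strip_intermediate_suffixes() {
    local f new
    for f in *.csplit.cut.fa; do
        [ -e "$f" ] || continue          # the glob matched nothing
        new="${f%.csplit.cut.fa}.fa"     # drop the trailing ".csplit.cut"
        new="${new/.txt.sed/}"           # drop the first ".txt.sed"
        mv -f -- "$f" "$new"
    done
}
```

To try it, call `strip_intermediate_suffixes` in the folder that holds the generated files, in place of the two `rename` lines.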