Python program that takes a FASTA file, runs BLAST with XML output, and returns the IDs of all hit sequences

OK. The assignment is to write a program that takes a FASTA file as an input argument and prints the hit_id of every sequence BLAST finds, using three functions:

- run_blast: accepts the FASTA file and optional input for BLAST, then uses subprocess to execute BLAST with its output in XML format;
- read_blast: accepts that BLAST XML file and returns the IDs of the sequences;
- main: contains the main processes the script will execute.
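
A minimal sketch of how run_blast could look, assuming the BLAST+ blastn executable is on the PATH and a database named nt is reachable, as in the original blastn call. The helper build_blast_command, the blast_options and db parameters, and the rule of naming the report after the input file are all hypothetical choices for illustration, not part of the assignment. Passing the FASTA path straight to -query lets BLAST read the file itself, so run_blast does not have to pull the sequence line out by hand.

```python
import os
import subprocess


def build_blast_command(fasta_filename, xml_filename, db="nt", blast_options=None):
    """Assemble the blastn argument list (hypothetical helper); -outfmt 5 means XML."""
    cmd = ["blastn", "-query", fasta_filename, "-db", db,
           "-outfmt", "5", "-out", xml_filename]
    if blast_options:
        # Extra BLAST flags supplied by the caller, e.g. ["-evalue", "1e-10"]
        cmd.extend(blast_options)
    return cmd


def run_blast(fasta_filename, blast_options=None, db="nt"):
    """Run blastn on a FASTA file and return the path of the XML report."""
    # Assumed naming rule: RBP1a.fa -> RBP1a.xml
    xml_filename = os.path.splitext(fasta_filename)[0] + ".xml"
    cmd = build_blast_command(fasta_filename, xml_filename, db, blast_options)
    # check=True raises CalledProcessError if blastn exits with a non-zero status
    subprocess.run(cmd, check=True)
    return xml_filename
```

The optional BLAST input from the assignment maps onto blast_options, for example ["-evalue", "1e-10"], or ["-remote"] if nt is not installed locally and the search should go to NCBI's servers.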

After 4 hours, all I can come up with is this:

import subprocess
from Bio.Blast import NCBIXML

def run_blast(fasta_filename, blastoutput):
    with open(fasta_filename, 'r') as f:  # open the FASTA file for reading
        lines = f.read()                  # read the whole file into one string
    col = lines.split("\n")               # split the text into individual lines
    sequence = col[1]                     # second line = sequence (single-line FASTA only)
    return sequence                       # return the raw sequence string

fasta_filename = 'RBP1a.fa'  # input FASTA file
blastoutput = 'RBP1a.xml'    # BLAST report file
# run blastn against the nt database; -outfmt 5 writes the report as XML
subprocess.call(['blastn', '-query', fasta_filename, '-db', 'nt',
                 '-outfmt', '5', '-out', blastoutput])
with open(blastoutput) as handle:
    record = NCBIXML.read(handle)  # parse the single BLAST record in the XML file

def read_blast(blastoutput):
    # I don't even know what to put in order to extract the sequences from the XML file.
    # Something about the order isn't correct here.
    pass
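
For the empty read_blast, one option is the standard library's xml.etree.ElementTree, swapped in here in place of Biopython's NCBIXML so the sketch has no third-party dependency. In a BLAST -outfmt 5 report each hit's ID sits in a Hit_id element under Iteration_hits/Hit, so collecting every Hit_id in the tree gives the IDs for all queries. This assumes the file follows that standard BLAST XML layout.

```python
import xml.etree.ElementTree as ET


def read_blast(xml_source):
    """Return the Hit_id text of every hit in a BLAST XML (-outfmt 5) report.

    xml_source may be a file path or an open file object. Hit_id elements
    live under BlastOutput_iterations/Iteration/Iteration_hits/Hit, so
    iterating over the whole tree collects the hits for every query.
    """
    root = ET.parse(xml_source).getroot()
    return [hit_id.text for hit_id in root.iter("Hit_id")]
```

With Biopython instead, the equivalent loop goes over the records from NCBIXML.parse(handle) and reads alignment.hit_id from each record's alignments. main then only needs to call run_blast on the command-line argument, pass the resulting XML path to read_blast, and print each ID; keeping the subprocess call and the parsing inside those functions, rather than at module level, keeps the steps in the order the assignment describes.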
