
Python Program Accepting FASTA And BLAST, Outputting In XML Then Returning IDs Of All Sequences

OK. The assignment was to write a program that accepts a FASTA file as an input argument and prints the hit_id of every sequence found, using three functions: run_blast (accepts the FASTA file and optional input for BLAST, then uses subprocess to execute BLAST with the output in XML format), read_blast (accepts the BLAST XML file from above and returns the IDs of the sequences), and main (contains the main steps the script will execute). After 4 hours, all I can come up with is this:

    import subprocess
    from Bio.Blast import NCBIXML

    def run_blast(fasta_filename,blastoutput):
        f=open(fasta_filename,'r')  ###opening the file with reading access
        lines=f.read()              ###lines equates to reading
        col=lines.split("\n")       ###splits first and second lines
        sequence=col[1]             ###gives this second column a name
        return sequence             ###returns the answer of the second line of the FASTA

    fasta_filename='RBP1a.fa'  ###defining file names
    blastoutput='RBP1a.xml'
    subprocess.call(['blastn','-query',"fasta_filename",'-db','nt','-outfmt','5','-out','blastoutput'])  ###blast output in xml form

    from Bio.Blast import NCBIXML
    record=NCBIXML.read(open("blastoutput"))

    def read_blast(blastoutput):
        #####I don't even know what to put in order to extract the sequences from the XML file.

Something about the order isn't correct here. ...
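For reference, here is a minimal sketch of the three-function layout the assignment describes. It is not a definitive solution and it rests on several assumptions. BLAST+ (`blastn`) and a local `nt` database are assumed to be installed, as in the post. The helper `build_blast_command`, the `db` and `program` parameters, and the `.xml` output naming are hypothetical, added for illustration. To keep the sketch free of third-party dependencies, `read_blast` uses the standard library's `xml.etree.ElementTree` to collect the `<Hit_id>` elements of the BLAST XML (`-outfmt 5`) rather than `Bio.Blast.NCBIXML`. With Biopython, the equivalent would be to loop over `NCBIXML.parse(handle)` and read `alignment.hit_id` from each record's `alignments`. The attempt in the post passes the literal strings `"fasta_filename"` and `'blastoutput'` to BLAST; the sketch passes the variables themselves.

```python
import subprocess
import xml.etree.ElementTree as ET


def build_blast_command(fasta_filename, blast_output, db="nt", program="blastn"):
    """Return the argument list for a BLAST+ run that writes XML (-outfmt 5).

    Hypothetical helper: it exists only so the command can be inspected
    without actually running BLAST.
    """
    # Pass the variables, not quoted names such as "fasta_filename".
    return [program, "-query", fasta_filename, "-db", db,
            "-outfmt", "5", "-out", blast_output]


def run_blast(fasta_filename, blast_output, db="nt", program="blastn"):
    """Run BLAST through subprocess and return the path of the XML it wrote.

    Assumes the BLAST+ executable is on PATH and the database is available.
    """
    command = build_blast_command(fasta_filename, blast_output, db, program)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return blast_output


def read_blast(blast_output):
    """Return the text of every <Hit_id> element in a BLAST XML file."""
    tree = ET.parse(blast_output)
    return [element.text for element in tree.iter("Hit_id")]


def main(fasta_filename):
    """Run BLAST on the FASTA file, then print one hit ID per line."""
    blast_output = fasta_filename + ".xml"  # hypothetical naming choice
    run_blast(fasta_filename, blast_output)
    for hit_id in read_blast(blast_output):
        print(hit_id)
```

To use this as a script, one would call `main(sys.argv[1])` under the usual `if __name__ == "__main__":` guard. The ordering matters: function definitions come first, and the BLAST run and the XML parsing happen only inside `main`, after the output file exists.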
