OK. The assignment was to write a program that takes a FASTA file as an input argument and prints the hit_id of every sequence found, using three functions: run_blast (accepts the FASTA file and optional BLAST parameters, then uses subprocess to run BLAST with the output in XML format), read_blast (accepts the BLAST XML file produced above and returns the IDs of the sequences), and main (contains the main steps the script will execute).
After 4 hours, all I can come up with is this:
import subprocess
from Bio.Blast import NCBIXML
def run_blast(fasta_filename, blastoutput):
    f = open(fasta_filename, 'r')  # opening the file with reading access
    lines = f.read()  # lines equates to reading
    col = lines.split("\n")  # splits first and second lines
    sequence = col[1]  # gives this second column a name
    return sequence  # returns the answer of the second line of the FASTA
fasta_filename = 'RBP1a.fa'  # defining file names
blastoutput = 'RBP1a.xml'
subprocess.call(['blastn', '-query', "fasta_filename", '-db', 'nt', '-outfmt', '5', '-out', 'blastoutput'])  # blast output in xml form
from Bio.Blast import NCBIXML
record = NCBIXML.read(open("blastoutput"))
def read_blast(blastoutput):
    # I don't even know what to put here in order to extract the sequence IDs from the XML file. Something about the order isn't correct here.
...
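
For comparison, here is a minimal sketch of how the three functions from the assignment might fit together. It rests on several assumptions: blastn is on the PATH, a local nt database is configured, and the XML is read with the standard library's xml.etree.ElementTree as a stand-in for NCBIXML so the sketch has no third-party dependency. The db, extra_args, and argv parameters and the usage message are hypothetical names chosen for illustration. Note that the file names are passed as variables; quoting them as "fasta_filename" and 'blastoutput' hands BLAST those literal strings instead.

```python
import os
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_blast(fasta_filename, blast_output, db="nt", extra_args=()):
    """Run blastn on a FASTA file and write the results as XML."""
    command = [
        "blastn",
        "-query", fasta_filename,  # the variable, not the string "fasta_filename"
        "-db", db,                 # assumes this database is available locally
        "-outfmt", "5",            # 5 selects BLAST XML output
        "-out", blast_output,
    ]
    command.extend(extra_args)     # optional extra BLAST arguments
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return blast_output


def read_blast(blast_output):
    """Return the text of every <Hit_id> element in a BLAST XML file."""
    tree = ET.parse(blast_output)
    return [hit.text for hit in tree.iter("Hit_id")]


def main(argv=None):
    """Run BLAST on the FASTA file given as the first argument, print hit IDs."""
    args = sys.argv[1:] if argv is None else argv
    if not args:
        print("usage: script.py FASTA_FILE [extra blastn arguments]")
        return 1
    fasta_filename = args[0]
    # Derive the output name from the input name, e.g. RBP1a.fa -> RBP1a.xml
    blast_output = os.path.splitext(fasta_filename)[0] + ".xml"
    run_blast(fasta_filename, blast_output, extra_args=args[1:])
    for hit_id in read_blast(blast_output):
        print(hit_id)
    return 0


if __name__ == "__main__":
    main()
```

The order matters: definitions first, then main calls run_blast before read_blast, so the XML file exists by the time it is parsed. If Biopython is preferred, read_blast could instead loop over NCBIXML.parse(handle) and collect alignment.hit_id from each record's alignments.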