Dear community,
I have an XML file containing BLAST results for 50,000 genes, with 10 hits per gene. I want to extract the BLAST results for a set of genes of interest from that file and keep the output in XML format. The output file should contain the complete BLAST XML record for each gene of interest.
I tried the Python script shared by juefish in this post; however, I only get part of the BLAST XML result for a single gene, rather than the results for every gene in my list. I quote juefish's Python script here to make the discussion easier.
#!/usr/bin/env python
import sys
import os
import sets
import Bio
from sets import Set
from Bio.Blast import NCBIXML
# Usage.
if len(sys.argv) < 2:
    print ""
    print "This program extracts blast results from an xml file given a list of query sequences"
    print "Usage: %s -list file1 -xml file2 > outfile" % sys.argv[0]
    print "-list: list of sequence names"
    print "-xml: blast xml output file"
    print ""
    sys.exit()
# Parse args.
for i in range(len(sys.argv)):
    if sys.argv[i] == "-list":
        infile1 = sys.argv[i+1]
    elif sys.argv[i] == "-xml":
        infile2 = sys.argv[i+1]
fls = [infile1,infile2]
results_handle = open(fls[1], "r")
fin1 = open(fls[0],"r")
geneContigs = Set([])
#establish list of names of queries to extract from xml file
for line in fin1:
    temp=line.lstrip('>').split() ...
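
Separately from the quoted script (which is cut off above), here is a rough sketch of the filtering I am after, written for Python 3 with only the standard library. It is not juefish's method. It assumes the file follows the usual BLAST XML layout, with one <Iteration> element per query inside <BlastOutput_iterations> and the query name as the first word of <Iteration_query-def>. The function names read_wanted_names and filter_blast_xml are made up for illustration. It loads the whole file into memory, so a very large file may need a streaming approach instead, and ElementTree does not write the original DOCTYPE line back out.

```python
import xml.etree.ElementTree as ET


def read_wanted_names(list_path):
    """Read one query name per line, ignoring a leading '>' and any description."""
    wanted = set()
    with open(list_path) as handle:
        for line in handle:
            fields = line.lstrip(">").split()
            if fields:  # skip blank lines
                wanted.add(fields[0])
    return wanted


def filter_blast_xml(xml_in, xml_out, wanted):
    """Write a copy of xml_in that keeps only the iterations for wanted queries.

    Assumes the standard BLAST XML layout; returns the number of queries kept.
    """
    tree = ET.parse(xml_in)
    container = tree.getroot().find("BlastOutput_iterations")
    if container is None:
        raise ValueError("no <BlastOutput_iterations> element found")

    kept = 0
    # Iterate over a copy of the child list because elements are removed in the loop.
    for iteration in list(container.findall("Iteration")):
        query_def = iteration.findtext("Iteration_query-def", default="")
        fields = query_def.split()
        name = fields[0] if fields else ""
        if name in wanted:
            kept += 1  # the whole <Iteration>, including all hits, stays in place
        else:
            container.remove(iteration)

    tree.write(xml_out, encoding="utf-8", xml_declaration=True)
    return kept
```

With hypothetical file names, this would be called as filter_blast_xml("blast.xml", "subset.xml", read_wanted_names("genes.txt")). Because each matching <Iteration> is left untouched, every hit and HSP for that gene is carried over and the header of the original file is preserved.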