Hello all. I am very, very new to Python/Biopython and am currently stuck.
I am running standalone BLAST from Bash. I have about 40k non-human sequences that I am BLASTing against the human genome, with the output in XML format. The output includes GI and RefSeq accession numbers. I believe NCBI can be queried for various pieces of information, including official gene symbols and Entrez IDs. For each record in the BLAST output, I would like to use the hit ID (accession number) to query NCBI for the official gene symbol and/or Entrez ID. For each record, my output should contain the original BLAST query, the hit ID, the gene symbol, and the Entrez ID.
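One possible route for the accession-to-gene step is NCBI's E-utilities. ELink from the nucleotide database (`nuccore`) to `gene` returns the linked Entrez Gene ID, and ESummary on that Gene record returns the official symbol in its `name` field. Below is a minimal sketch that uses only the Python standard library. It assumes the hit is identified by its GI number (as in `gi|<number>|ref|<accession>|` hit IDs). The email address and `tool` value are placeholders to replace. Biopython's `Bio.Entrez` module (`Entrez.elink`, `Entrez.esummary`) wraps the same endpoints, so the sketch can be adapted to it once you are more comfortable with Biopython.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Dict, List

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
# Placeholder: NCBI asks E-utilities users to identify themselves with a real address.
EMAIL = "you@example.org"


def _get_json(endpoint: str, **params: str) -> dict:
    """Call one E-utilities endpoint and decode its JSON reply (network call)."""
    params.update(retmode="json", email=EMAIL, tool="blast_gene_lookup")  # tool name is hypothetical
    url = EUTILS + endpoint + "?" + urllib.parse.urlencode(params)
    time.sleep(0.34)  # keep under NCBI's limit of 3 requests/second without an API key
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def gene_ids_from_elink(payload: dict) -> List[str]:
    """Extract linked Gene IDs from an ELink (nuccore -> gene) JSON response."""
    ids: List[str] = []
    for linkset in payload.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("dbto") == "gene":
                ids.extend(str(x) for x in linksetdb.get("links", []))
    return ids


def symbols_from_esummary(payload: dict) -> Dict[str, str]:
    """Map Gene ID -> official gene symbol from an ESummary (db=gene) JSON response."""
    result = payload.get("result", {})
    return {uid: result[uid].get("name", "") for uid in result.get("uids", []) if uid in result}


def lookup_gene(gi: str) -> Dict[str, str]:
    """Return {entrez_gene_id: symbol} for one nucleotide GI; empty if no Gene link exists."""
    links = _get_json("elink.fcgi", dbfrom="nuccore", db="gene", id=gi)
    gene_ids = gene_ids_from_elink(links)
    if not gene_ids:
        return {}
    summary = _get_json("esummary.fcgi", db="gene", id=",".join(gene_ids))
    return symbols_from_esummary(summary)
```

The parsing functions are separate from the network calls so they can be checked offline. With 40k hits, one pair of web requests per hit will be slow. NCBI also distributes a `gene2refseq` table on its Gene FTP site that maps RefSeq accessions to Gene IDs. A local join against that table may be a faster alternative.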
I have been working through the various Biopython resources and have managed to parse my BLAST output down to just the query and hit ID for each record. I did this in Bash rather than Biopython, since I am very new to Biopython and haven't yet taken the time to learn its parser. What I don't know is how to take those hit IDs and use them to query NCBI for gene symbols/Entrez IDs. Any help in the right direction would be appreciated.
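The query/hit extraction that is currently done in Bash can also be done in Python. The sketch below streams BLAST XML (`-outfmt 5`) with the standard-library `ElementTree.iterparse`, so a large file is processed one query (`<Iteration>`) at a time. It assumes the classic BLAST XML layout (`Iteration` / `Hit` elements). It also assumes that `Hit_id` may or may not carry a GI in the form `gi|<number>|ref|<accession>|`. When no GI is present, the last field is `None`. Biopython's `Bio.Blast.NCBIXML.parse` reads the same files into record objects. This version avoids the dependency.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def gi_from_hit_id(hit_id: str) -> Optional[str]:
    """Return the GI from a pipe-delimited ID like 'gi|<number>|ref|<acc>|', else None."""
    parts = hit_id.split("|")
    if len(parts) >= 2 and parts[0] == "gi":
        return parts[1]
    return None


def iter_query_hits(source) -> Iterator[Tuple[str, str, str, Optional[str]]]:
    """Yield (query_def, hit_id, hit_accession, gi) for every hit in a BLAST XML file.

    `source` is a file name or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Iteration":
            continue  # wait until a whole query block has been read
        query = elem.findtext("Iteration_query-def", default="")
        for hit in elem.iter("Hit"):
            hit_id = hit.findtext("Hit_id", default="")
            accession = hit.findtext("Hit_accession", default="")
            yield query, hit_id, accession, gi_from_hit_id(hit_id)
        elem.clear()  # release this query's hits to keep memory flat on large files
```

Each tuple's GI (when present) can be passed to a lookup such as the hypothetical `lookup_gene` helper from the previous sketch. The result can then be written out tab-separated as query, hit ID, gene symbol, and Entrez ID.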