Hi all
I am building a pipeline for taxonomically identifying BLAST results at each taxonomic level, using a bespoke reference dataset. I want it to be easily reproducible, which it is, apart from one ugly manual step where I have to paste a list of taxon IDs into this page:
http://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi
then retrieve the output and carry on.
Does anyone know if I can get the code that is used? Clearly it's calling a CGI script from somewhere, but I can't find it on the NCBI FTP site. If not, is there an equivalent? Technically, I could pull apart names.dmp and nodes.dmp myself, but there is already an NCBI tool, so I'm loath to do that.
Thanks
N.B. The function is turning:
515482
515474
into
515482 | Nitzschia dubiiformis | 515482 2857 33852 33851 33850 33849 2836 33634 2759 131567
515474 | Cocconeis stauroneiformis | 515474 216715 216714 186023 33850 33849 2836 33634 2759 131567
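For reference, the names.dmp/nodes.dmp fallback I mentioned would look roughly like the sketch below. It is only a sketch, not the NCBI code: it assumes the standard taxdump layout (fields separated by tab-pipe-tab, tax_id and parent tax_id as the first two columns of nodes.dmp, and the name class in the fourth column of names.dmp). The function names are my own, and it stops before the root node (taxon 1) to match the output above.

```python
def split_dmp(line):
    """Split one taxdump line into fields (separator is tab, pipe, tab)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the trailing field terminator
    return line.split("\t|\t")


def load_parents(lines):
    """Map each tax_id to its parent tax_id, from nodes.dmp lines."""
    parents = {}
    for line in lines:
        fields = split_dmp(line)
        parents[fields[0]] = fields[1]
    return parents


def load_names(lines):
    """Map each tax_id to its scientific name, from names.dmp lines."""
    names = {}
    for line in lines:
        fields = split_dmp(line)
        if fields[3] == "scientific name":
            names[fields[0]] = fields[1]
    return names


def lineage(taxid, parents):
    """Walk from taxid up towards the root, excluding the root itself."""
    path = []
    # The root is its own parent in nodes.dmp, which ends the walk.
    while taxid in parents and parents[taxid] != taxid:
        path.append(taxid)
        taxid = parents[taxid]
    return path


def format_record(taxid, parents, names):
    """Build one 'taxid | name | lineage' line."""
    name = names.get(taxid, "unknown")
    return f"{taxid} | {name} | {' '.join(lineage(taxid, parents))}"
```

To use it, I would open nodes.dmp and names.dmp once, pass the file objects to load_parents and load_names, and then call format_record for each taxon ID in my list. After the initial load, each ID is just a few dictionary lookups.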
EDIT: For the record, I have ~30,000 taxIDs to retrieve, and gawbul's script, while elegant, won't complete. I'm looking at Frederic's now for its multi-core support.