Hi,
I would like to get a feel for how many sequences related to my virus of interest are available on GenBank. I am thinking about using a recursive BLAST approach: start with one genome, run BLASTn with it as the query, and add every hit above a certain cutoff to a list. Then BLASTn the next sequence in the list, and keep going until no new unique sequences turn up.
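To make the idea concrete, here is a rough sketch of the loop I have in mind, in Python. The `search` callable is a placeholder I made up for whatever actually runs BLASTn and returns the accessions above the cutoff, the toy dictionary only stands in for real BLAST results, and `max_queries` is an arbitrary safety cap I added so the loop cannot run forever.

```python
from collections import deque
from typing import Callable, Iterable, Set


def iterative_search(
    seed: str,
    search: Callable[[str], Iterable[str]],
    max_queries: int = 1000,
) -> Set[str]:
    """Collect every accession reachable from `seed` by repeated searches.

    `search` is a hypothetical function: given one accession, it returns
    the accessions of all hits that pass the chosen cutoff.
    """
    seen = {seed}          # every accession found so far
    queue = deque([seed])  # accessions still waiting to be used as a query
    queries = 0

    while queue and queries < max_queries:
        query = queue.popleft()
        queries += 1
        for hit in search(query):
            if hit not in seen:  # only queue sequences we have not seen yet
                seen.add(hit)
                queue.append(hit)
    return seen


# Toy stand-in for BLASTn results (made-up accessions, not real data).
toy_hits = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
    "E": ["F"],  # unrelated cluster, never reached from "A"
}

found = iterative_search("A", lambda acc: toy_hits.get(acc, []))
print(sorted(found))  # ['A', 'B', 'C', 'D']
```

Keeping a `seen` set means each sequence is used as a query at most once, so the loop stops on its own once no new hits appear.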
Does this sound reasonable? I would welcome any suggestions or improvements.
thanks