I am having trouble getting BLAST to give me "correct" results.
I am trying to retrieve as many hits as possible with an e-value better than 1. I query the database with a sequence that should have several thousand hits in it. However, at best, using tblastn, I get roughly 1000 matches (~250 independent hits). I am at a loss to understand what is wrong with my command:
tblastn -query protein.fasta -db nucl.blastdb -out results.tblastnout -evalue 1 -outfmt 7 -num_descriptions 100000
I get several matches within the same hit sequence, all with e-values better than 1. What confuses me is that even for the worst hit, the best e-value (sorry if this is confusing ;) is nowhere near the -evalue limit; it is usually lower than 1E-60. Clearly, even counting the redundant matches within each hit, I am far from reaching the limit of 100000 that I asked for.
So I have three questions:
- Is it possible to only list one (the best) "match" per "hit"?
- Any idea why I do not get a larger number of descriptions, considering that I expect to have close to 30000 positives in my database?
- Any comments/suggestions to improve my search?
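For the first question, one possible workaround is to post-process the tabular output. The sketch below is only an illustration: it assumes the default 12-column layout of -outfmt 7 (query ID in column 1, subject ID in column 2, bit score in column 12) and GNU sort. The function name best_hsp_per_subject is made up for this example. The comment at the top also notes something that may bear on the second question, namely that tabular formats take their hit limit from -max_target_seqs rather than -num_descriptions. I have not verified that this explains the ~250 hits.

```shell
# Possibly relevant to the hit limit (an assumption, not verified on this data):
# with tabular output (-outfmt 6/7) BLAST+ uses -max_target_seqs, e.g.
#   tblastn -query protein.fasta -db nucl.blastdb -out results.tblastnout \
#           -evalue 1 -outfmt 7 -max_target_seqs 100000

# Hypothetical helper: keep only the highest-scoring match (HSP) for each
# query/subject pair in a tabular BLAST report with the default 12 columns.
best_hsp_per_subject() {
    tab=$(printf '\t')
    # 1. Drop the '#' comment lines that -outfmt 7 adds.
    # 2. Sort by query, then subject, then bit score (column 12), best first.
    # 3. Keep the first line seen for each query/subject pair.
    grep -v '^#' "$1" \
        | LC_ALL=C sort -t "$tab" -k1,1 -k2,2 -k12,12gr \
        | awk -F '\t' '!seen[$1 FS $2]++'
}

# Example usage:
#   best_hsp_per_subject results.tblastnout > best_hits.tsv
```

Recent BLAST+ releases also have a -max_hsps option that limits the number of matches reported per subject sequence, which might make the post-processing unnecessary; whether it is available depends on the installed version.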