
Missing BLAST hits - max_hsps and max_target_seqs?

Hi, I have a single file that contains 10,000 query sequences, each 300 bp long. The subject is a single chromosome that I have imported into a nucleotide database ("makeblastdb -dbtype nucl"). Due to the nature of the data, I expect virtually all (>99%) of the query sequences to find a strong match, but I only want to return the TOP match per query sequence.

I assumed that "-max_hsps" and "-max_target_seqs" would be the way to achieve that, but I am not getting the expected results. If I use "-max_hsps 1 -max_target_seqs 1", I get 322 (unique) hits. If I use "-max_hsps 1" by itself, I get the same 322 hits. If I use "-max_target_seqs 1", I get an enormous number of hits (which I could reduce by filtering on e-value, but that's not really the point: I just want the top hit). If I use neither parameter, I get a similarly enormous number of results.

It feels as though there is an error in BLAST where it is simply not searching the vast majority of the query sequences. I know there was a bugfix a couple of versions back that fixed something similar. Has anyone come across this? Can anyone think of anything obvious that I might be doing wrong? I'm using BLAST 2.2.29 on a Mac Mini running Darwin 13.3.0.

EDIT: Just in case it's not clear, I am hoping to have up to (but probably slightly fewer than) 10,000 results in my output file (one per query sequence). I am using "-outfmt 10", which writes CSV. ...
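For illustration, one way to get a single row per query without relying on either option is to run the search unrestricted and reduce the CSV afterwards. The sketch below is only that, a sketch: it assumes the default 12-column layout of "-outfmt 10", treats "top hit" as the row with the highest bit score, and uses made-up demo rows. The function name `top_hit_per_query` is hypothetical and not part of BLAST.

```python
import csv
import io

# Default column order of BLAST+ "-outfmt 10" (comma-separated) output.
# Assumption: no custom column list was passed to -outfmt.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hit_per_query(handle):
    """Return {qseqid: row}, keeping the highest-bitscore row for each query."""
    best = {}
    for fields in csv.reader(handle):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, fields))
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        # Keep the first row seen on ties, replace only on a strictly better score.
        if current is None or score > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Made-up rows standing in for a real results file.
demo = io.StringIO(
    "q1,chr1,98.00,300,6,0,1,300,1001,1300,1e-150,520\n"
    "q1,chr1,85.00,120,18,0,10,129,50001,50120,2e-30,130\n"
    "q2,chr1,100.00,300,0,0,1,300,7001,7300,1e-160,555\n"
)
hits = top_hit_per_query(demo)
for qseqid, row in hits.items():
    print(qseqid, row["sstart"], row["send"], row["bitscore"])
```

With a real results file, the `io.StringIO` demo would be replaced by `open("results.csv", newline="")`. Counting the keys of the returned dictionary also shows directly how many of the 10,000 queries produced any hit at all.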
