I've read a paper demonstrating an implementation of sequence alignment, and the authors indicate that to achieve substantial speed gains they must compromise on query size and output fidelity relative to BLASTn.
From S. Datta, P. Beeraka, and R. Sass, "RC-BLASTn: Implementation and Evaluation of the BLASTn Scan Function", in Proc. FCCM, 2009, pp. 88-95:
"While one could argue that BLAST is a heuristic and 100% compatibility with NCBI BLAST is unnecessary, it is difficult to convince biologists....Thus, in addition to just “better performance” there are still significant challenges including complete compatibility with NCBI BLAST and arbitrary sized queries."
A paper from another group expresses similar sentiments:
"Of the many versions of BLAST, NCBI BLAST [11] has become a de facto standard. Public access is possible either through download of code or directly through the large web-accessible server at NCBI. This standardization motivates the design criteria for accelerated BLAST codes: users expect not only that performance be significantly upgraded, but also that outputs match exactly those given by the original system."
My question is: is this really that important? My understanding is that the initial search in BLASTn is not exhaustive, so providing an ...