I'm trying to find homologs of a set of proteins using BLASTp. I'm working with custom databases.
I'm using evalue of 0.00001 as threshold.
I would like to filter queries having hits with >90% identities. Because BLASTp output is based on HSPs, I cannot filter by %identities/query, only by HSP.
I would like to know how to do this and also if I'm following a reasonable strategy.
Thanks, Bernardo
# Here is an alignment example: qcovs=100 but qcovhsp lower.
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs qcovhsp
HPNK_01698 HAPS_0519 81.88 596 75 5 630 1225 615 1177 0.0 889 100 49
HPNK_01698 HAPS_0519 49.17 301 115 8 84 366 201 481 2e-56 214 100 23
HPNK_01698 HAPS_0519 53.64 261 61 6 436 684 616 828 6e-49 191 100 20
HPNK_01698 HAPS_0519 46.61 251 62 3 332 510 584 834 6e-46 181 100 15
HPNK_01698 HAPS_0519 53.27 214 79 4 1 194 1 213 1e-45 180 100 16
HPNK_01698 HAPS_0519 55.96 218 60 8 550 764 643 827 1e-40 164 100 18
HPNK_01698 HAPS_0519 51.56 225 61 7 516 731 642 827 1e-38 157 100 18
HPNK_01698 HAPS_0519 49.57 230 77 6 484 ...
↧