Quantcast
Channel: Post Feed
Viewing all articles
Browse latest Browse all 41826

Best Hits From Blast Tabular Output Of Multiple-Queries

$
0
0
Hello. I have been trying to get the best hits out of blast output files. I know that -v option or maxtargetseqs in standalone blast gives the best ones. But I already have a huge output of multiple queries which have to be sorted according to best cut-off values. I am not using XML parser as my output is already tabular. Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| Escherichia coli DH1 DNA,... 100.00 280 0 0 1 280 3402569 3402290 4e-140 506 Locus3035v2rpkm3.64 gi|87248160|gb|DQ311188.1| Bombyx mori hydroxymethylglutaryl-Co... 98.94 94 1 0 1 94 523 616 4e-37 165 Locus303v1rpkm4.98 gi|346706507|dbj|AK380484.1| Bombyx mori mRNA clone: ftes50G24 100.00 159 0 0 1 159 4395 4553 1e-74 288 This is the tabular output I have. I have more than one criterion to take. I am looking for hits such that my the alignment length should be high but no compromise on the sequence identity. E-value should be less than 1e-06, less gaps and mismatches. I tried to introduce a score such that - score = (alignmentlength) / ( 101- %identity) s = L / (101 - I%) I used 101 as if 100 is taken there can be undefined values if completely identical. But I am missing the point for alignment length here. How can I select such that all these criterion are followed to give best hits in the following order of preference -
  1. Alignment length
  2. %identity
  3. less gaps and mismatches
  4. e-value 1e-06
Regards, Rohit ...

Viewing all articles
Browse latest Browse all 41826

Trending Articles