I'm working on software that lets users BLAST an entire proteome against a manually curated database in order to quantify the composition of that organism's proteins relative to our database. Usually, multiple organisms are analyzed this way; the results are then contrasted between organisms, and relationships are hypothesized.
However, this seems like rather shallow data to me. I'd like to find a more quantitative metric, so that I can say with some confidence that there is a good relationship between the query and hit proteins. I've started by counting the number of hydrophilic residues in each protein, since that is an important property in my database. However, I'm having a difficult time scoring the results using both the E-value from BLAST and this count.
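To make the calculation concrete, here is a minimal Python sketch of the kind of per-hit comparison described above. It is only an illustration under stated assumptions: the set of residues treated as hydrophilic (`HYDROPHILIC`), the function names, and the choice to normalize counts by sequence length are all hypothetical and should be replaced with whatever definition the database actually uses.

```python
import math

# ASSUMPTION: polar and charged residues are treated as hydrophilic.
# Swap in your own definition (e.g. a hydropathy-scale cutoff) as needed.
HYDROPHILIC = frozenset("RKDEQNHSTY")


def hydrophilic_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are in HYDROPHILIC."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for aa in seq if aa in HYDROPHILIC) / len(seq)


def compare_pair(query_seq: str, hit_seq: str, evalue: float) -> dict:
    """Report hydrophilic content and the E-value side by side for one hit.

    The values are kept as separate fields rather than merged into a
    single score, so no particular weighting is implied.
    """
    q = hydrophilic_fraction(query_seq)
    h = hydrophilic_fraction(hit_seq)
    # BLAST prints 0.0 for extremely small E-values, so cap the transform.
    neg_log_e = 300.0 if evalue <= 0 else min(300.0, -math.log10(evalue))
    return {
        "query_frac": q,
        "hit_frac": h,
        "frac_diff": abs(q - h),
        "neg_log10_e": neg_log_e,
    }
```

Using a length-normalized fraction rather than a raw count keeps long and short proteins comparable, and reporting the hydrophilic difference next to the transformed E-value leaves open how (or whether) the two should be combined.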
Can anyone refer me to papers that have done something similar, or suggest methods for further analysis?