I'm going over some slides about BLAST from an introductory bioinformatics course, and I don't understand how the words for the look-up table are scored. The query sequence is split into words of three letters. In the example, the query sequence is QLNFSAGW, so the words are QLN, LNF, NFS, etc. The slides then give the following scores for the words (using a BLOSUM table):
| Words from sequence | Query words (score) |
|---|---|
| QLN | QLN = 11, QMD = 9, HLN = 8, etc. |
| LNF | LNF = 9, LBF = 8, LBT = 8, etc. |
However, in the BLOSUM62 table Q-Q = 5, L-L = 4 and N-N = 6, so why does QLN score 11 points and not 15? The same goes for QMD: Q-Q = 5, L-M = 2 and N-D = 1, so why is QMD 9 and not 8? How are the word scores calculated?
The NCBI handbook only mentions that it makes a look-up table of all the words, not how the words are scored.
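Assuming the usual convention that the score of a word pair is the sum of the substitution scores at each of its aligned positions, the hand calculation above can be checked with a small Python sketch. Only the five BLOSUM62 entries quoted in the question are included. The names `BLOSUM62_SUBSET`, `pair_score` and `word_score` are hypothetical and not part of any BLAST API. For the full matrix, Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")` could be used instead.

```python
# Hypothetical subset: only the BLOSUM62 entries quoted in the question.
# A real tool would use the full 20x20 (or larger) matrix.
BLOSUM62_SUBSET = {
    ("Q", "Q"): 5,
    ("L", "L"): 4,
    ("N", "N"): 6,
    ("L", "M"): 2,
    ("N", "D"): 1,
}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score; BLOSUM matrices are symmetric, so try both orders."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(query_word: str, other_word: str) -> int:
    """Score two equal-length words as the sum of position-by-position substitution scores."""
    if len(query_word) != len(other_word):
        raise ValueError("words must have the same length")
    return sum(pair_score(a, b) for a, b in zip(query_word, other_word))


print(word_score("QLN", "QLN"))  # 5 + 4 + 6 = 15
print(word_score("QLN", "QMD"))  # 5 + 2 + 1 = 8
```

With these BLOSUM62 values, the sums come out to 15 and 8, matching the hand calculation rather than the 11 and 9 on the slides. The summing rule alone therefore does not explain the discrepancy; one possibility is that the slides used a different scoring matrix.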
BLAST works by first making a look-up table of all the “words” (short subsequences; for proteins the default word length is three letters) and “neighboring words”, i.e., words that are similar to the words in the query sequence.
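The first part of that step splits the query into overlapping three-letter words and records where each one occurs. Here is a minimal Python sketch. The function name `build_word_table` is hypothetical, and the sketch covers only the exact query words. Generating the neighboring words would additionally require a full substitution matrix and a score threshold (often called T), and both are left out here.

```python
from __future__ import annotations

from collections import defaultdict


def build_word_table(query: str, word_size: int = 3) -> dict[str, list[int]]:
    """Map each overlapping word of the query to the 0-based positions where it starts."""
    table: dict[str, list[int]] = defaultdict(list)
    # A sequence of length n contains n - word_size + 1 overlapping words.
    for start in range(len(query) - word_size + 1):
        table[query[start:start + word_size]].append(start)
    return dict(table)


table = build_word_table("QLNFSAGW")
print(list(table))  # ['QLN', 'LNF', 'NFS', 'FSA', 'SAG', 'AGW']
```

Storing positions rather than a plain set of words means a database hit on any word can be traced back to its location in the query, which is what the later extension step needs.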