I have a table of BLAST results with ~5 hits per query protein:
Protein | Class |
ProtA | 1 |
ProtA | 1 |
ProtA | 1 |
ProtA | 0 |
ProtA | 1 |
ProtB | 1 |
ProtB | 1 |
ProtB | 0 |
ProtB | 0 |
ProtB | 1 |
I would like to convert this into a feature-vector matrix with one row per protein and one column per hit, like this:
Protein | Class1 | Class2 | Class3 | Class4 | Class5 |
ProtA | 1 | 1 | 1 | 0 | 1 |
ProtB | 1 | 1 | 0 | 0 | 1 |
Can someone suggest an efficient way to do this? The file contains ~2300k hits.
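For reference, here is a minimal sketch of the reshape I have in mind, in plain Python. It assumes the file is pipe-delimited exactly as shown above, that the order of hits within each protein decides which ClassN column a value lands in, and that proteins with fewer than 5 hits should be padded. The function name `pivot_hits`, the `n_hits` parameter and the "NA" fill value are made up for illustration.

```python
def pivot_hits(lines, n_hits=5, fill="NA"):
    """Collect the Class values of each protein, in input order, into one row."""
    rows = {}
    for line in lines:
        # "ProtA | 1 |" -> ["ProtA", "1"]
        fields = [f.strip() for f in line.strip().strip("|").split("|")]
        if len(fields) != 2 or fields[0] == "Protein":
            continue  # skip the header and blank or malformed lines
        protein, cls = fields
        rows.setdefault(protein, []).append(cls)
    # Pad short rows with the fill value and truncate long ones to n_hits columns.
    return {p: (v + [fill] * n_hits)[:n_hits] for p, v in rows.items()}


sample = """Protein | Class |
ProtA | 1 |
ProtA | 1 |
ProtA | 1 |
ProtA | 0 |
ProtA | 1 |
ProtB | 1 |
ProtB | 1 |
ProtB | 0 |
ProtB | 0 |
ProtB | 1 |
""".splitlines()

matrix = pivot_hits(sample)

# Print the result in the same pipe-delimited layout as the desired output.
print("Protein | " + " | ".join(f"Class{i}" for i in range(1, 6)) + " |")
for protein, values in matrix.items():
    print(f"{protein} | " + " | ".join(values) + " |")
```

The function accepts any iterable of lines, so an open file handle can be passed in directly and the file is read in a single pass. It keeps one short list per protein in memory, which should be manageable for a few million hits, but I have not measured it on the full file.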