Hello,
I am working with short sequence reads (>1,000,000 reads) from a metagenome and a metatranscriptome. I searched them with a BLAST-like program called RAPSearch 2, which produces a BLAST-style tabular output like this:
# Fields (separated by tab): query_id, subject_id, %_identity, alignment_length, mismatch, gap, query_start, query_end, subject_start, subject_end, log_evalue, bit score
HWI-ST... 438753.AZC_3721 60.7843 51 20 0 1 51 213 263 -9.38 68.17
and an alignment format like this:
HWI-ST... vs 438753.AZC_3721 bits=68.17 log(E-value)=-9.38 identity=60.7843% aln-len=51 mismatch=20 gap-openings=0 nFrame=0
Query: 1 EGRLASLLTDVAAGRLAPLYNYMKDLPAMEGTPAPFLPRRYIERMLGSSSS 51
EGRL +L D A+G L PLYN+M DLP + GTP PFLP+ Y+ R LG SSS
Sbjct: 213 EGRLDQVLHDAASGTLEPLYNFMNDLPGIGGTPVPFLPKTYVSRTLGLSSS 263
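For reference, reading the tabular output could be sketched in Python roughly as follows. This is only a minimal sketch: it assumes the file is tab-separated with the twelve fields in the order given by the header line, that comment lines start with "#", and that the subject_id always has the form taxID.protein name. The function names and field keys are made up for illustration.

```python
from typing import Dict, Iterator, TextIO, Tuple

# Column names taken from the "# Fields" header line of the tabular output.
FIELDS = [
    "query_id", "subject_id", "identity", "alignment_length",
    "mismatch", "gap", "query_start", "query_end",
    "subject_start", "subject_end", "log_evalue", "bit_score",
]


def read_hits(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per hit line, skipping comment and blank lines."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        values = line.rstrip("\r\n").split("\t")
        yield dict(zip(FIELDS, values))


def split_subject_id(subject_id: str) -> Tuple[str, str]:
    """Split 'taxID.protein name' at the first dot (assumed layout)."""
    tax_id, protein_name = subject_id.split(".", 1)
    return tax_id, protein_name
```

All values are kept as strings here; they could be converted with int() or float() where numbers are needed.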
What I want to do is combine the information in this output with mapping files, so that I end up with the cross-referenced information in a tab-delimited file. More precisely, my output (from a search against the eggNOG database) contains:
the protein name (plus the basic output such as e-value, alignment length, ...)
I also downloaded mapping files, which contain:
1) nog name taxID.protein name (this is the actual subject_id from the output files!)
2) cog name protein name
3) nog name function
4) nog name description
5) species name tax id
6) cog name functional category
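To make the kind of lookup concrete, here is a rough Python sketch of joining the hits to two of the mapping files. It rests on assumptions that may not hold for the real files: each mapping file is taken to be tab-separated with exactly two columns in the order listed above, each protein is taken to belong to a single NOG (if a key occurs more than once, the last line wins), and missing entries are written as "NA". The function names are hypothetical.

```python
from typing import Dict, TextIO


def load_two_column_map(handle: TextIO, key_col: int = 0,
                        value_col: int = 1) -> Dict[str, str]:
    """Read a two-column, tab-separated mapping file into a dict."""
    mapping = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        mapping[cols[key_col]] = cols[value_col]
    return mapping


def annotate_hits(hits: TextIO, protein_to_nog: Dict[str, str],
                  nog_to_function: Dict[str, str], out: TextIO) -> None:
    """Append NOG name and function to every hit line (tab-delimited)."""
    for line in hits:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        subject_id = cols[1]  # second column is 'taxID.protein name'
        nog = protein_to_nog.get(subject_id, "NA")
        function = nog_to_function.get(nog, "NA")
        out.write("\t".join(cols + [nog, function]) + "\n")
```

The "nog name, taxID.protein name" file would be loaded with key_col=1 and value_col=0 so that it can be looked up by subject_id, and the "nog name, function" file with the defaults. Holding the mappings in dictionaries means each of the >1,000,000 hit lines needs only two lookups, and the hits file itself is streamed line by line.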
I would like ...