Channel: Post Feed
Viewing all articles
Browse latest Browse all 41826

Taxonomy Of Blast Hits

Say I have 200k genomic contigs with some (unknown) bacterial contamination. I blasted all of them (blastn vs nr), got tabulated output, and passed the unique accession numbers (ca. 5k) to Batch Entrez. Since neither my target genome nor the contaminating bacteria have been sequenced, I got a shotgun spread of results (3000 Eukaryota, 2000 Bacteria, a few viruses).

Now for the tricky part. What I need is sequence_identifier + taxonomic_id(s) + main_tax_group, something along the lines of:

A000001 573 Bacteria

Apart from writing a script that stores the sequence and taxonomy info in, say, MySQL and then goes through the BLAST top-hit output, are there any tools (Taverna workflows?) that can do it for me?

Re Pierre: The primary input is the text BLAST output of:

blastcl3 -p blastn -m 9 -e 0.00001 -b 1 -i frag01 -o out_blastn_frag01

I grepped and awked the hit accession numbers from the second column. The resulting text file (one accession number per line) was fed to Batch Entrez. As far as I can tell, there is no way of selecting output in the form:

A000001 573 Bacteria

The most parsable output seems to be TinySeq XML, but then I would download full bacterial genomes / eukaryotic chromosomes' worth of sequence, which I do not need at this stage. Ideally, instead of the two extremes (E. coli K12 + Bacteria), getting the whole taxonomic path:
cellular organisms; Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia; Escherichia coli
will be preferr ...
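
For illustration, the scripted route described above (top hit per contig, then taxid, main group and full path) could look roughly like the sketch below. It is not an existing tool, and it rests on several assumptions: the taxonomy comes from local copies of the NCBI taxonomy dump files nodes.dmp and names.dmp, the accession-to-taxid mapping (acc2taxid) is a plain dict built separately, and the second column of the tabular BLAST output holds a bare accession number. All function names are made up for this example.

```python
# Sketch only: function names and the acc2taxid dict are hypothetical.
MAIN_GROUPS = {"Bacteria", "Archaea", "Eukaryota", "Viruses"}

def top_hits(blast_lines):
    """Yield (query, subject) for the first hit of each query in tabular output."""
    seen = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):   # skip -m 9 comment lines
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        query, subject = fields[0], fields[1]
        if query not in seen:                  # hits are listed best first
            seen.add(query)
            yield query, subject

def load_nodes(lines):
    """Parse nodes.dmp-style lines into {taxid: parent_taxid}."""
    parents = {}
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 2 and parts[0]:
            parents[int(parts[0])] = int(parts[1])
    return parents

def load_names(lines):
    """Parse names.dmp-style lines into {taxid: scientific name}."""
    names = {}
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 4 and parts[3] == "scientific name":
            names[int(parts[0])] = parts[1]
    return names

def lineage(taxid, parents, names):
    """Return the path from just below the root down to taxid, as names."""
    path = []
    # The root node is its own parent, so the walk stops there (root excluded).
    while taxid in parents and parents[taxid] != taxid:
        path.append(names.get(taxid, str(taxid)))
        taxid = parents[taxid]
    return path[::-1]

def annotate(blast_lines, acc2taxid, parents, names):
    """Return (query, accession, taxid, main_group, path) per top hit."""
    rows = []
    for query, subject in top_hits(blast_lines):
        taxid = acc2taxid.get(subject)
        if taxid is None:                      # accession missing from mapping
            rows.append((query, subject, None, "unknown", ""))
            continue
        path = lineage(taxid, parents, names)
        group = next((n for n in path if n in MAIN_GROUPS), "unknown")
        rows.append((query, subject, taxid, group, "; ".join(path)))
    return rows
```

With the dump files opened as ordinary text files, annotate(open("out_blastn_frag01"), acc2taxid, load_nodes(open("nodes.dmp")), load_names(open("names.dmp"))) would return one row per contig with a hit. Walking parent links in memory avoids both the MySQL step and downloading any sequence data, at the cost of holding the taxonomy tables in RAM.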
