makeblastdb problem for bacterial genomes

Hi,

I am trying to make a BLAST database for my metatranscriptomic data.

I downloaded the entire bacterial_genome and bacterial_draft folders from the NCBI FTP site, then merged all the faa sequences into one big file, all.fasta, which contains all the protein sequences from these two folders. It's huge: 11 GB.

I am building a protein (prot) database because I want to run blastx with my data against it.

My command is: 

module load blast+/2.2.30
makeblastdb -in Bacteria_all.fasta -out Bacterial_all_blastDB -dbtype prot -parse_seqids

The problem is that this big FASTA file contains redundant entries, so the job fails with this error:

 

BLAST Database creation error: Error: Duplicate seq_ids are found:
REF|YP_001740126.1|

I checked the data and found:

$ grep "YP_001740126.1|" Bacteria_genome_all_faa.fasta
>gi|218960351|ref|YP_001740126.1| chromosomal replication initiation protein [Candidatus Cloacimonas acidaminovorans str. Evry]
>gi|218960351|ref|YP_001740126.1| chromosomal replication initiation protein [Candidatus Cloacamonas acidaminovorans]

Does anybody know of a method or tool to solve this problem? Or would you suggest downloading some other prebuilt database for my purpose? Thank you very much!
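
One possible workaround, given here only as a minimal sketch, is to drop repeated records before running makeblastdb. It assumes that two records with the same identifier (the first whitespace-delimited token of the header, as in the grep output above) carry the same sequence, so keeping only the first one loses nothing; that assumption has not been checked against the full file. The function name dedup_fasta and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Keep only the first record for each FASTA identifier.
# The identifier is the first whitespace-delimited token of the header line,
# e.g. ">gi|218960351|ref|YP_001740126.1|".
# Assumption: records sharing an identifier are true duplicates.
dedup_fasta() {
    awk '
        /^>/ {
            keep = !($1 in seen)   # true only the first time this ID appears
            seen[$1] = 1
        }
        keep                       # print header and sequence lines of kept records
    ' "$1"
}

# Hypothetical usage with the input file from the makeblastdb command:
# dedup_fasta Bacteria_all.fasta > Bacteria_all.dedup.fasta
```

The awk array holds every identifier in memory, so for an 11 GB protein file this may need a few gigabytes of RAM. The deduplicated file could then be passed to the same makeblastdb command in place of the original.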

