
NCBI RefSeq Viral Genomes

I want to create a custom BLAST database containing all viruses available in RefSeq, but I don't know which source files to use. My starting point is: ftp://ftp.ncbi.nih.gov/genomes/Viruses/

From my research, I concluded that I probably need the all.fna.tar.gz file, since it supposedly contains the nucleotide sequences of all viruses in RefSeq. However, it turned out that some genomes are missing from it. For example, Bluetongue_virus_uid14938 has no entry in this archive, BUT it does have a directory, with the corresponding files, in the all.gbk.tar.gz archive.
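One way to check how widespread these gaps are is to compare the two unpacked archives. The sketch below is a minimal Python example, assuming both archives have been extracted into separate directories (the paths are hypothetical) and that each genome sits in its own subdirectory, named like Bluetongue_virus_uid14938, holding its .fna or .gbk files. It lists the genomes that have a .gbk file but no .fna file.

```python
from pathlib import Path


def genomes_with(root: Path, suffix: str) -> set[str]:
    """Names of genome directories under root that contain at least one file with this suffix.

    Assumes (not confirmed by NCBI docs) that each genome has its own directory,
    e.g. 'Bluetongue_virus_uid14938', holding its sequence files.
    """
    return {p.parent.name for p in root.rglob(f"*{suffix}")}


def missing_from_fna(fna_root: Path, gbk_root: Path) -> list[str]:
    """Genome directories that have a .gbk file but no .fna file, sorted by name."""
    return sorted(genomes_with(gbk_root, ".gbk") - genomes_with(fna_root, ".fna"))
```

For example, `missing_from_fna(Path("all_fna"), Path("all_gbk"))` would return the names of such genome directories; an empty list would suggest the .fna archive is complete relative to the .gbk one.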

So my question is: which archive (file types) should I use to build the most complete database of the viruses in RefSeq? Should I use the .fna/.ffn files, simply concatenate them, and pass the result to makeblastdb, OR should I parse the .gbk files myself and create FASTA files from them, which basically means extracting the sequence from each .gbk file and rebuilding the header?
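The second option, converting the .gbk files to FASTA, can be sketched with a small standard-library parser. This is a minimal illustration, not a full GenBank reader: it assumes each record ends with `//`, builds the header from the VERSION (accession.version) and DEFINITION lines, reads the sequence from the ORIGIN section, and skips records that have no ORIGIN sequence. Biopython's `SeqIO.convert(in_file, "genbank", out_file, "fasta")` would be a more robust alternative if that dependency is acceptable.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, TextIO


def parse_genbank(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (accession.version, definition, sequence) for each record in a GenBank flat file."""
    version, definition, seq_lines = "", "", []
    in_definition = in_origin = False
    for line in handle:
        if line.startswith("//"):
            # End of record: emit what was collected and reset state.
            yield version, definition, "".join(seq_lines).upper()
            version, definition, seq_lines = "", "", []
            in_definition = in_origin = False
        elif in_origin:
            # Sequence lines look like "        1 gttaaaatta ..."; keep only the letters.
            seq_lines.append(re.sub(r"[^A-Za-z]", "", line))
        elif in_definition and line.startswith(" " * 12):
            # Continuation lines of a multi-line DEFINITION are indented by 12 spaces.
            definition += " " + line.strip()
        else:
            in_definition = False
            if line.startswith("DEFINITION"):
                definition = line[12:].strip()  # keywords are padded to 12 columns
                in_definition = True
            elif line.startswith("VERSION"):
                version = line.split()[1]  # e.g. "NC_001606.1"
            elif line.startswith("ORIGIN"):
                in_origin = True


def gbk_to_fasta(gbk_paths: Iterable[Path], out: TextIO, width: int = 70) -> int:
    """Write every record from the given .gbk files as FASTA; return the number written."""
    count = 0
    for path in gbk_paths:
        with open(path, encoding="utf-8") as handle:
            for version, definition, seq in parse_genbank(handle):
                if not seq:
                    continue  # e.g. records without an ORIGIN section
                out.write(f">{version} {definition}\n")
                for i in range(0, len(seq), width):
                    out.write(seq[i:i + width] + "\n")
                count += 1
    return count
```

With the archive unpacked into a hypothetical `all_gbk` directory, something like `with open("viral_refseq.fna", "w") as out: gbk_to_fasta(sorted(Path("all_gbk").rglob("*.gbk")), out)` produces a single FASTA file that can then be passed to `makeblastdb -in viral_refseq.fna -dbtype nucl -out viral_refseq`.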

