
Vertebrate Subset Nr Database? Build My Own?

I think I have the answer to my own question, but I'm somewhat new to bioinformatics, and I want to make sure my strategy is sound (and that there is no easier solution to my problem). We need to search against the vertebrate subset of the nr database, so I've been tasked with finding or building one. So far I can't find a prebuilt one, so I'm planning to build it as follows.
  1. Get the nr database (I believe I have the one from NCBI or Ensembl).
  2. Get the taxonomy database from NCBI.
  3. Traverse the taxonomy db and extract the taxids of all descendants of the parent vertebrate node (the parent appears to be taxid 7742, Vertebrata, according to names.dmp in taxdump.tar.gz; the parent/child links themselves are in nodes.dmp) (see ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt).
  4. Use the list of vertebrate taxids from the previous step to extract the GIs of proteins from gi_taxid_prot.dmp.gz (see ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid.readme).
  5. Use the list of vertebrate GIs from the previous step with the blastdb_aliastool to build an aliased blastdb of just vertebrates. Something along the lines of this: blastdb_aliastool -gilist vertebrate_gis.txt -db nr -out nr_vertebrates -title nr_vertebrates
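A rough Python sketch of steps 3 and 4, in case it helps make the plan concrete. It assumes nodes.dmp uses the tab-pipe-tab field separator described in taxdump_readme.txt (tax_id first, parent tax_id second) and that gi_taxid_prot.dmp.gz has two whitespace-separated columns (GI, then taxid). The function names and the output file name are just illustrative choices, not anything provided by NCBI.

```python
import gzip
from collections import defaultdict, deque


def load_children(nodes_path):
    """Map each parent taxid to its child taxids, read from nodes.dmp."""
    children = defaultdict(list)
    with open(nodes_path) as handle:
        for line in handle:
            # Assumed layout: fields separated by "\t|\t", with tax_id
            # first and parent tax_id second.
            fields = line.split("\t|\t")
            taxid, parent = int(fields[0]), int(fields[1])
            if taxid != parent:  # the root node lists itself as its parent
                children[parent].append(taxid)
    return children


def descendants(children, root):
    """Return root plus every taxid below it (breadth-first traversal)."""
    seen = {root}
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def write_gi_list(gi_taxid_path, taxids, out_path):
    """Write one GI per line for every GI whose taxid is in taxids."""
    opener = gzip.open if gi_taxid_path.endswith(".gz") else open
    kept = 0
    with opener(gi_taxid_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            gi, taxid = line.split()
            if int(taxid) in taxids:
                dst.write(gi + "\n")
                kept += 1
    return kept


# Hypothetical usage, with the downloaded files in the working directory:
#   vertebrates = descendants(load_children("nodes.dmp"), 7742)
#   write_gi_list("gi_taxid_prot.dmp.gz", vertebrates, "vertebrate_gis.txt")
```

The GI file is streamed line by line, so only the set of vertebrate taxids has to fit in memory. The resulting vertebrate_gis.txt is the file passed to -gilist in step 5.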
Am I on the right track? Am I reinventing the wheel? Does the vertebrate gi_list or verte ...
