Standalone Blast 2 Short Sequences

March 2, 2012, 6:30 pm

Hi. I'm new to BLAST. I have downloaded BLAST+ and installed it on a Windows machine.

I want to use BLAST to align short nucleotide sequences (<400 bases) in set A against reads of similar length in set B. Sets A and B contain only tens of reads each.

I see that I can use formatdb (makeblastdb in BLAST+) on one set of reads (A) and then use BLAST+ to match my other set of reads (B) against it.

This is acceptable, but I see that NCBI hosts a tool where you can BLAST two sequences against each other. Is this possible with standalone BLAST without building a database?

Not sure it matters, but I'm doing this on Windows, probably with a Python wrapper.
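BLAST+ can compare a query FASTA file against a subject FASTA file directly with the -subject option, so no database-building step is needed for sets this small. A minimal Python sketch of such a wrapper, assuming the BLAST+ bin directory is on PATH; the file names A.fasta, B.fasta and A_vs_B.tsv are placeholders:

```python
import subprocess
from typing import List


def blast_two_files_cmd(query_fasta: str, subject_fasta: str,
                        out_path: str, evalue: float = 10.0) -> List[str]:
    """Build a blastn command that compares every query sequence with every
    subject sequence directly, using -subject instead of a formatted database."""
    return [
        "blastn",
        "-task", "blastn",          # more sensitive than the default megablast task
        "-query", query_fasta,
        "-subject", subject_fasta,  # plain FASTA file, no makeblastdb needed
        "-evalue", str(evalue),
        "-outfmt", "6",             # tab-separated output, easy to parse
        "-out", out_path,
    ]


def run_blast(cmd: List[str]) -> None:
    # Assumes blastn(.exe) is on PATH; raises CalledProcessError on failure
    subprocess.run(cmd, check=True)
```

Usage: run_blast(blast_two_files_cmd("A.fasta", "B.fasta", "A_vs_B.tsv")). Passing the arguments as a list avoids quoting problems with Windows paths.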

How to build a local Pfam database?

October 3, 2016, 11:00 am

Hi everyone, I would like to build a local database (on my server) to search for conserved domains using the Pfam database. I think it's possible to run RPS-BLAST through NCBI BLAST+. Could someone tell me which Pfam file I should download to build this database on my server?

PSSM format for command-line PSIBLAST

January 24, 2019, 9:27 pm

I am trying to run PSI-BLAST from the command line, but I keep getting the following error:

BLAST query/options error: Unsupported format for PSSM

I am getting the PSSM by downloading it from this page: https://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi?mode=Position&cd=cd00209. I have already created a protein sequence database with makeblastdb, and I think the problem is in the PSSM. I just can't find enough information on this. Any help would be appreciated. Thanks in advance.
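psiblast -in_pssm reads a PSI-BLAST checkpoint file, i.e. the ASN.1 PSSM that psiblast itself writes with -out_pssm; a matrix downloaded as a table from the CDD viewer is presumably not in that format. One workaround, sketched under the assumption that you have a representative query sequence for the domain, is to let psiblast build the PSSM and then reuse the checkpoint. The query file, database name and output paths are placeholders:

```python
from typing import List


def build_pssm_cmd(query: str, db: str, pssm_out: str,
                   iterations: int = 3) -> List[str]:
    # Iterate PSI-BLAST from a query sequence and save the PSSM as a checkpoint file
    return ["psiblast", "-query", query, "-db", db,
            "-num_iterations", str(iterations), "-out_pssm", pssm_out]


def search_from_pssm_cmd(pssm_in: str, db: str, out: str) -> List[str]:
    # Start a new search from the saved checkpoint; -in_pssm is used instead of -query
    return ["psiblast", "-in_pssm", pssm_in, "-db", db, "-out", out]
```

Usage: run build_pssm_cmd("domain.fasta", "mydb", "domain.pssm") once (for example with subprocess.run), then search_from_pssm_cmd("domain.pssm", "mydb", "hits.txt") for later searches.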

OID not found

December 4, 2015, 2:27 pm

Hello

I am running standalone BLAST version 2.2.31 locally on my computer. I wanted to run BLASTx against the nr database. For this, I downloaded the database and configured the system according to the manual.

But it is giving this error: "OID not found"

Can anybody tell me what is wrong with the commands?
Thanks
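The commands themselves are not shown, so the cause cannot be pinned down from the question. One thing that may be worth ruling out (an assumption, not a diagnosis) is an incomplete or mixed download, where the alias file nr.pal lists volumes whose files are missing. This sketch reads the DBLIST line(s) of a protein alias file and reports which of the three classic volume files (.phr, .pin, .psq) are absent; the directory and the choice of nr.pal are hypothetical, and newer database versions write additional file types that it does not check:

```python
import os
from typing import List, Set

# Classic protein volume files; newer database versions add more file types
CORE_EXTS = (".phr", ".pin", ".psq")


def missing_volume_files(alias_text: str, existing: Set[str]) -> List[str]:
    """List core files that are absent for each volume named on the DBLIST
    line(s) of a BLAST protein alias file such as nr.pal."""
    volumes: List[str] = []
    for line in alias_text.splitlines():
        if line.startswith("DBLIST"):
            # Volume names may be quoted in some alias files
            volumes.extend(v.strip('"') for v in line.split()[1:])
    return [vol + ext for vol in volumes for ext in CORE_EXTS
            if vol + ext not in existing]


def check_db_dir(db_dir: str, alias_name: str = "nr.pal") -> List[str]:
    # Compare the alias file's volume list with the actual directory listing
    with open(os.path.join(db_dir, alias_name)) as fh:
        text = fh.read()
    return missing_volume_files(text, set(os.listdir(db_dir)))
```

Usage: print(check_db_dir("/path/to/blastdb")); an empty list means every listed volume has its three core files.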

Blast plus command line filter !

May 20, 2015, 1:58 pm

Hello All,

I am running BLAST against the "NR" database, which is taking a very long time, and I am only interested in searching specific species. Is there a BLAST option to restrict the search to specific species? I am using BLAST+. In the older BLAST there is the command blastall -u "organism name". Is there any similar option in BLAST+, other than creating a separate database from "NR"?

Thanks
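Newer BLAST+ releases that work with version 5 databases can restrict a search by NCBI taxonomy ID with -taxids (a comma-separated list) or -taxidlist (a file of taxids); BLAST+ releases from the time of this question offered -gilist instead. A sketch that builds such a command; the program, query file, output name and taxids are placeholders:

```python
from typing import Iterable, List


def taxon_restricted_cmd(query: str, taxids: Iterable[int], out: str,
                         db: str = "nr", program: str = "blastp") -> List[str]:
    """Build a BLAST+ command limited to the given NCBI taxonomy IDs.
    Requires a BLAST+ release and database version with -taxids support."""
    taxid_arg = ",".join(str(t) for t in taxids)  # e.g. "9606,10090"
    return [program, "-query", query, "-db", db,
            "-taxids", taxid_arg, "-outfmt", "6", "-out", out]
```

Usage: taxon_restricted_cmd("proteins.fasta", [9606, 10090], "hits.tsv") yields the argument list for subprocess.run.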

BLAST: How much of the query is aligned?

November 8, 2014, 11:16 pm

At first, I thought this question would be answered by the "qcovs" field, but a glance at the results proved that this isn't the case. To begin with, each qcovs value relates not to the original query, but to a smaller query partitioned from it. And I don't even know what this number actually means for those mini-queries. "Query Coverage Per Subject" is what the manual says, but apparently it is used in a different sense from what I would normally understand.

Second, "length" is supposed to be "length of alignment," but I'm not sure what that means, either. It's neither the length of the mini-query (qend-qstart+1) nor that of the corresponding subject, although there's a strong correlation between the three.

My purpose is to see whether the genome assembler succeeded in putting together a conserved gene of interest. As a measure of how well each original (unpartitioned) gene query is assembled, I'm thinking of either:

max([set of "nident" from all mini-queries based on the same original query]) / original query length

or

max([set of "length" from all mini-queries based on the same original query]) / original query length

Which one, if any, is the right approach? Please feel free to suggest your own, although I would appreciate an explanation of what I got wrong. An elucidation of "qcov" and "length" would be nice, too. Thank you. ...
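In BLAST+ tabular output, "length" is the alignment length including gap columns, which is why it need not equal qend-qstart+1. If the quantity of interest is the fraction of the original gene covered by any hit, one option (a definition chosen for this sketch, not what qcovs computes) is to map every HSP back onto original-query coordinates and take the union of those intervals, so overlapping HSPs are not counted twice. The mapping offsets are hypothetical and depend on how the queries were partitioned:

```python
from typing import Iterable, List, Tuple


def merged_coverage(intervals: Iterable[Tuple[int, int]], query_len: int) -> float:
    """Fraction of the original query covered by the union of HSP intervals.

    Intervals are 1-based inclusive (start, end) pairs on the original query;
    start > end (e.g. minus-frame blastx hits) is normalised. Overlapping or
    adjacent HSPs are merged so no position is counted twice."""
    spans = sorted((min(a, b), max(a, b)) for a, b in intervals)
    merged: List[Tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous span: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    covered = sum(end - start + 1 for start, end in merged)
    return covered / query_len
```

Usage: for a mini-query that starts at position offset of the original gene, pass (offset + qstart - 1, offset + qend - 1) for each of its HSPs.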

DIAMOND blast imported into MEGAN6 has no taxonomic assignment

November 23, 2016, 10:34 am

Dear friends, hi (I'm not a native English speaker). I have used DIAMOND (https://ab.inf.uni-tuebingen.de/software/diamond) to create a .daa file after running blastx on my transcriptome.fasta assembly against the NCBI nr database with this command:

diamond blastx -d nr -q '/home/Trinity_pathless.fasta' -o diamond-Trinity-daa -p 22 -f 100 --evalue 0.000001 --sensitive

Then I imported it into the MEGAN6 Community Edition. I have tried both approaches: (1) direct import followed by creating a MEGAN6 "RMA" file, and (2) using the Meganizer tool, according to the last lines of the MEGAN manual (https://ab.inf.uni-tuebingen.de/software/megan6/getting-started). But the result has no taxonomic data! Please help me with this, and thank you in advance.

NOTE: It seems that the MEGAN6 manual does not offer any guidance about BLAST taxonomy parameters.

NOTE2: If you are aware of any MEGAN6 troubleshooting groups, please let me know.

Psiblast Warning: Composition-Based Score Adjustment

December 6, 2011, 12:13 pm

Warning: lcl|Query_1 1DCH:A|PDBID|CHAIN|SEQUENCE: Warning: Composition-based score adjustment conditioned on sequence properties and unconditional composition-based score adjustment is not supported with PSSMs, resetting to default value of standard composition-based statistics

Can someone help me understand what this warning message means? The results I get look fine, so until now I have just ignored it. But I'm slowly starting to wonder whether I'm making a mistake.

I run PSI-BLAST with the following parameters:

./psiblast -query 1cbyA.seq -db ./blastdb -matrix PAM30 -num_iterations 5 -num_alignments 3000 -num_descriptions 3000 -gapopen 32767 -gapextend 32767 -evalue 10 -outfmt 7 -out ./output.txt
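The warning says that the conditional and unconditional compositional score-matrix adjustment modes are not supported with PSSMs, so psiblast resets to standard composition-based statistics for the PSSM-based rounds. Requesting standard composition-based statistics explicitly with -comp_based_stats 1 should presumably avoid the message; that is an assumption worth checking against your own results, since it also changes the first (sequence-based) round. A small helper that sets the option on an existing argument list:

```python
from typing import List


def with_standard_cbs(args: List[str]) -> List[str]:
    """Return a copy of a psiblast argument list with -comp_based_stats set to 1
    (standard composition-based statistics), replacing any existing value."""
    out = list(args)
    if "-comp_based_stats" in out:
        out[out.index("-comp_based_stats") + 1] = "1"
    else:
        out += ["-comp_based_stats", "1"]
    return out
```

Usage: with_standard_cbs("./psiblast -query 1cbyA.seq -db ./blastdb -evalue 10 -outfmt 7 -out ./output.txt".split()) gives the same command with -comp_based_stats 1 appended.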

blastn source code

February 11, 2016, 2:59 am

Hello,

Is the blastn source code available anywhere except NCBI?

Thank you,

Alaa

Vertebrate Subset Nr Database? Build My Own?

March 17, 2011, 1:15 am

I think I have the answer to my own question, but I'm somewhat new to bioinformatics, and I want to make sure my strategy is sound (and that there are no easier solutions to my problem). We need to search against the vertebrate subset of the nr database. Hence I've been tasked with finding or building a vertebrate subset of the nr database. So far, I can't find one. So to build one, I'm planning on doing the following.

Get the nr database (I believe I have the one from ncbi or ensembl).
Get the taxonomy database from ncbi.
Somehow traverse the taxonomy db, and extract the taxids of all descendants of the parent vertebrate node (it looks like the parent vertebrate node is taxid 7742, from names.dmp in taxdump.tar.gz) (see ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt).
Use the list of vertebrate taxids from the previous step to extract the GIs of proteins from gi_taxid_prot.dmp.gz (see ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid.readme).
Use the list of vertebrate GIs from the previous step with the blastdb_aliastool to build an aliased blastdb of just vertebrates. Something along the lines of this: blastdb_aliastool -gilist vertebrate_gis.txt -db nr -out nr_vertebrates -title nr_vertebrates

Am I on the right track? Am I reinventing the wheel? Does the vertebrate gi_list or verte ...
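Step 3 of the plan (collecting every taxid below the vertebrate node, 7742) is a plain tree traversal. In taxdump's nodes.dmp, fields are separated by "\t|\t", the first field is the tax_id and the second is its parent tax_id. A minimal sketch that builds a parent-to-children map and walks it breadth-first; it only covers this step, not the GI extraction:

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set


def parse_nodes(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Build a parent -> children map from NCBI taxdump nodes.dmp lines."""
    children: Dict[int, List[int]] = defaultdict(list)
    for line in lines:
        fields = line.split("\t|\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        taxid, parent = int(fields[0]), int(fields[1])
        if taxid != parent:  # the root (taxid 1) is listed as its own parent
            children[parent].append(taxid)
    return children


def descendants(children: Dict[int, List[int]], root: int) -> Set[int]:
    """All taxids in the subtree under root (root included), breadth-first."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Usage: with open("nodes.dmp") as fh: vertebrate_taxids = descendants(parse_nodes(fh), 7742). The resulting set can then be used to filter gi_taxid_prot.dmp.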

Finding variants for a particular gene/protein

November 18, 2015, 4:29 am

Hello!

I was wondering whether anyone has input on finding variants that represent the total variation observed for a particular gene across species.

One idea I have in mind is to run BLAST searches with certain interesting sequences as queries. I would then filter the output based on some basic rules, for example e-value and maximal length difference. I hope this would yield a reasonably large set of all variants available in the database, on which clustering could be performed. The problem is that I do not know which database to use in this case, for either protein or DNA sequences.

Are the refseq_ databases the way to go, or perhaps the non-redundant/nucleotide-collection database?

I have a vague idea of how to further restrict my searches by taxonomy, but I'm not quite there yet, since which database to use is still unclear.

I would deeply appreciate your input!

Thank you!
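The filtering step described above (an e-value cutoff plus a maximum length difference) can be applied to tabular BLAST output if query and subject lengths are requested as columns, for example -outfmt "6 qseqid sseqid evalue qlen slen". A sketch assuming exactly that column order; the cutoffs are placeholders, not recommendations:

```python
from typing import Iterable, Iterator, List

# Assumed column order, matching -outfmt "6 qseqid sseqid evalue qlen slen"
FIELDS = ["qseqid", "sseqid", "evalue", "qlen", "slen"]


def filter_variants(rows: Iterable[List[str]], max_evalue: float,
                    max_len_diff: float) -> Iterator[str]:
    """Yield subject IDs whose hit passes the e-value cutoff and whose length
    differs from the query length by at most max_len_diff (a fraction)."""
    seen = set()
    for row in rows:
        rec = dict(zip(FIELDS, row))
        qlen, slen = int(rec["qlen"]), int(rec["slen"])
        if float(rec["evalue"]) > max_evalue:
            continue
        if abs(slen - qlen) / qlen > max_len_diff:
            continue
        if rec["sseqid"] not in seen:  # report each subject once
            seen.add(rec["sseqid"])
            yield rec["sseqid"]
```

Usage: with open("hits.tsv") as fh: ids = list(filter_variants((l.rstrip("\n").split("\t") for l in fh), 1e-10, 0.2)). The kept IDs could then be fetched and clustered.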

parsing too large blast result with bioperl OR other methods?

September 10, 2014, 6:42 pm

Hi all, recently I have been working with a bunch of genes to design appropriate primers. However, it is still hard for me to obtain homology information for the primers. For example, I need to design a primer pair for one exon of a gene. I first get all possible primers of a predefined length, e.g. 18-30 bp, and then use blastall -p blastn (or megablast) with -e 1 -W 8 to determine whether the primers have homologous sequences. However, for >10000 primers, the BLAST output file was larger than 200 MB, which takes a long time to parse with the Bio::SearchIO module and sometimes even runs out of memory. Moreover, BLASTing primer sequences of 18-30 bp is risky, because shorter sequences sometimes fail for unknown reasons.

Another method is to BLAST the whole exon regions with the parameters -e 0.1 -W 11; however, this generates huge output, and it takes a long time to parse the BLAST file and determine whether the primer region falls into a homologous part. So far, I have not found a good way to solve this problem. If anyone has experienced this issue, could you please tell me how you handled it? Thanks.

Update (2014.9.2): Although we could first identify the nucleotides belonging to repeat regions using RepeatMasker and then use the -F parameter in BLAST to ignore these regions, those repeat regions sometimes do not share many homologous sequences. This is the method I can find for now, but it is not perfect. Hope someone could provid ...
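One way to avoid holding a 200 MB report in memory is to ask BLAST+ for tabular output (for example blastn -task blastn-short -outfmt 6, where blastn-short is the BLAST+ task intended for queries shorter than about 50 bases) and stream it line by line instead of building full Bio::SearchIO objects. A sketch that counts, per primer, the hits above identity and length thresholds; the thresholds are placeholders, and the column positions follow the default -outfmt 6 layout (qseqid sseqid pident length ...):

```python
from collections import Counter
from typing import Iterable


def count_hits_per_query(lines: Iterable[str], min_pident: float = 90.0,
                         min_length: int = 15) -> Counter:
    """Stream BLAST tabular (-outfmt 6) lines and count, per query, the hits
    with at least min_pident identity over at least min_length columns.
    Only one line is held in memory at a time."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, length = fields[0], float(fields[2]), int(fields[3])
        if pident >= min_pident and length >= min_length:
            counts[qseqid] += 1
    return counts
```

Usage: with open("primers_vs_genome.tsv") as fh: counts = count_hits_per_query(fh). Primers with more than one qualifying hit (the intended target) are candidates for removal.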

Database Error When Running Blastn

February 5, 2014, 6:04 am

Hello, as initially stated here, I am experiencing problems when I try to run the blastn command blastn -query myquery -db mydatabase.fasta.nhr.

What I get is:

BLAST Database error: No alias or index file found for nucleotide database [mydatabase.fasta.nhr] in search path [/home/user/Desktop::]

The database files were generated using makeblastdb -in mydatabase.fasta -dbtype nucl, which outputs three files; pointing -db at any of them gives the same error as above.

I understand that somehow I should add an external "alias or index" file in the same folder; maybe the solution requires blastdb_aliastool, which is described in the BLAST command line user manual, but at this point I would appreciate some hints. Thanks in advance.
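The -db option takes the database name, which by default is the -in file name given to makeblastdb (here mydatabase.fasta), not one of the .nhr/.nin/.nsq files it writes. A tiny helper that maps a database file path back to that name; the extension set covers only the classic files and alias files:

```python
import os

# Classic makeblastdb index files plus alias files, nucleotide and protein
INDEX_EXTS = {".nhr", ".nin", ".nsq", ".phr", ".pin", ".psq", ".nal", ".pal"}


def db_name_from_file(path: str) -> str:
    """Map a BLAST database file path to the name that -db expects."""
    root, ext = os.path.splitext(path)
    return root if ext in INDEX_EXTS else path
```

Usage: blastn -query myquery -db followed by db_name_from_file("mydatabase.fasta.nhr"), i.e. -db mydatabase.fasta.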

What is the fastest way to extract a sequence ID from huge multiple FASTA file based on given sequence?

June 12, 2015, 9:31 am

I have a file containing millions of FASTA protein sequences from more than 2000 species. I'm looking for an efficient way (faster than BLAST) to retrieve a protein's ID for a given amino-acid sequence. I know that blastdbcmd can pull out an individual sequence record from a BLAST database given a sequence identifier, but it doesn't work for queries by sequence.

Do you know any tools that skip the "alignment building step" and allow for fast retrieval of a FASTA record based on its sequence?
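If the goal is exact, full-length matches only, no alignment is needed: index every sequence by a hash of its residues once, and each lookup becomes a dictionary access. A minimal sketch (MD5 is used only as a compact key; normalising case and dropping a trailing "*" are assumptions about the data). It does not find partial or approximate matches:

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def _digest(seq: str) -> str:
    # Normalise case and strip a trailing stop '*' before hashing (assumption)
    return hashlib.md5(seq.upper().rstrip("*").encode()).hexdigest()


def build_index(fasta_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map an MD5 digest of each sequence to the IDs that carry it."""
    index: Dict[str, List[str]] = defaultdict(list)
    seq_id, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                index[_digest("".join(chunks))].append(seq_id)
            seq_id, chunks = line[1:].split()[0], []  # ID = first word of header
        elif line:
            chunks.append(line)
    if seq_id is not None:  # flush the last record
        index[_digest("".join(chunks))].append(seq_id)
    return index


def lookup(index: Dict[str, List[str]], seq: str) -> List[str]:
    # Returns every ID with an identical sequence, or [] if none
    return index.get(_digest(seq), [])
```

Usage: with open("proteins.fasta") as fh: idx = build_index(fh), then lookup(idx, "MKV...") for each query. Storing digests rather than full sequences keeps memory lower for millions of records.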

After BLAST against uniprot.fasta, how can I get an output file that includes the full sequence header of every hit protein?

December 1, 2015, 9:44 pm

hey guys, I have downloaded uniprot.fasta, and now I want to BLAST my transcripts against these protein sequences.

uniprot.fasta file format:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot.fasta
>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL

My query FASTA file format:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more truncated_cd-hit-est-Trinity_CD_and_CK.fasta
>TR1|c0_g1_i1
TAAGAGGTAAGAAAGCTAGAAAAGAGGAAATATTTTTAATAAAAATAATAAAACTTAATA
ATATAATAATAAGTATCTTTTTATAATATTATAATAAATAAAATAAGGTAGAAATTATAT
AAATTTATAAGAAAGTAATATTCTTATAATAAGAATTAACTTTTATTAATATTAAACTAG
CTAAAGTAAAAATATAAATTTAAAAAAAAGATAATAATAATAAAGATTTTAAAAAATA

And I ran BLAST:

blastx -db uniprot_sprot.fasta -query truncated_cd-hit-est-Trinity_CD_and_CK.fasta -out uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular -evalue 1e-5 -num_threads 3 -num_alignments 1 -outfmt 6

The output file I got:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular
TR4|c0_g1_i1 sp|Q9WVJ0|KCNH3_MOUSE 76.54 81 19 0 243 1 2 82 8e-40 144
TR21|c0_g1_i1 sp|Q99315|YG31B_YEAST 34.09 88 ...
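BLAST+ can put the subject's full definition line into tabular output with the stitle format specifier, for example -outfmt "6 qseqid sseqid stitle evalue". Alternatively, the existing -outfmt 6 file can be joined back to the UniProt headers, since its second column (sseqid, e.g. sp|Q9WVJ0|KCNH3_MOUSE) matches the first word of each header. A sketch of that join, assuming the IDs match exactly as in the output shown:

```python
from typing import Dict, Iterable, Iterator


def header_map(fasta_lines: Iterable[str]) -> Dict[str, str]:
    """Map the first word of each FASTA header (e.g. 'sp|Q6GZX4|001R_FRG3G')
    to the full header line without the leading '>'."""
    headers: Dict[str, str] = {}
    for line in fasta_lines:
        if line.startswith(">"):
            full = line[1:].rstrip("\n")
            headers[full.split()[0]] = full
    return headers


def annotate(tabular_lines: Iterable[str], headers: Dict[str, str]) -> Iterator[str]:
    """Append the full subject header as an extra column to -outfmt 6 lines."""
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        fields.append(headers.get(fields[1], "NA"))  # column 2 is sseqid
        yield "\t".join(fields)
```

Usage: with open("uniprot_sprot.fasta") as fa: h = header_map(fa), then write each line of annotate(open(tabular_file), h) to a new file.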

What Is A Good Web Front End For (Blast) Homology Search?

November 29, 2010, 7:39 am

One fairly common task is to make some sequences available for BLAST search over the web. So far, this has meant installing the NCBI web front end to BLAST, but it is something like 15 years old, and it shows. Although we have some local enhancements that allow users to click on sequence names to download them, etc., I would really like a more flexible replacement. Surely something more advanced exists? Something equally easy to download and set up?

Edit: To be more specific, I would like a user friendly (i.e. simple) interface to do sequence search (not necessarily BLAST) against our own sequence data. Features I'd like to see might be

visualization of the alignments (better than the rudimentary NCBI imagemap)
links to target sequences (or to a genome browser or similar)
hit selection/filtering based on e.g. taxonomy
free software with a sensible (GPL, BSD, MIT..) license
simultaneous search against multiple databases
automatic database discovery (no more perl config_setup.pl)
automatic restriction of available alignment algorithms based on sequence types (no more blastp on nucleotide sequences)

(Feel free to suggest additions to the list in the comments below!)

How to export taxonomies from MEGAN (input is BLAST table)

October 6, 2016, 2:21 pm

Hi, I am new to MEGAN. My goal is to annotate a FASTA file. I have done a BLAST search of the sequences in my FASTA file against the NCBI Reference Sequence database, and I imported the BLAST output table, together with the NCBI reference sequence database taxa map (which includes NCBI accession numbers and taxa info), into MEGAN. It is easy to see which species were found in my FASTA file with MEGAN, but I just cannot find a way to export the taxonomy information (after the LCA step) together with my FASTA seq_id. Could any of you tell me how I can export taxa together with my FASTA seq_id?

Any Local Gui-Guided Blast Tools?

May 5, 2011, 1:48 pm

Hi All,

I am currently looking into local, GUI-guided BLAST tools. I stumbled upon BlastStation2 and was wondering whether anyone has experience with it. Are there any similar tools available that I might have missed? It makes no difference whether they are free or commercial.

I really appreciate your help.

Thanks,

Beeth

How is an alignment with multiple HSPs evaluated in blastn and blastp?

July 28, 2017, 11:39 am

When blastn finishes searching, it reports many alignments (hits, i.e. query-subject matches), and many of these alignments consist of multiple HSPs. So when it picks the top alignments, how does it evaluate an alignment's score? I know how each HSP is scored and how the score is used to compute the e-value. But how do these individual HSPs contribute to the alignment's score? If there is no notion of a score for an alignment, how does BLAST decide which alignments to keep and report? I was going with the idea that every alignment is judged by its highest-scoring HSP (or lowest e-value HSP). But when I run a split-database query, I find that some alignments with a fairly high-scoring HSP do not get picked when the search is run on the whole database. Are sum statistics in play when evaluating alignments?
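One factor that can make split-database and whole-database runs disagree is that e-values depend on the size of the database searched. In the Karlin-Altschul statistics used for HSP e-values, E = K * m * n * exp(-lambda * S), where m and n are the effective query and database lengths, so the same HSP score gets an e-value roughly proportional to the database length, and a hit that passes an -evalue cutoff on one volume can fail it on the whole database. The -dbsize option sets the effective database length, which may help make split runs comparable. The K and lambda values below are made-up illustrative numbers, and real BLAST also applies edge-effect corrections to m and n:

```python
import math


def karlin_altschul_evalue(score: float, m: float, n: float,
                           K: float, lam: float) -> float:
    """E = K * m * n * exp(-lambda * S): expected number of chance HSPs
    scoring at least S for effective query length m and database length n."""
    return K * m * n * math.exp(-lam * score)
```

Usage: karlin_altschul_evalue(40, 300, 1e8, 0.1, 0.3) is ten times karlin_altschul_evalue(40, 300, 1e7, 0.1, 0.3), i.e. a tenfold larger database gives a tenfold larger e-value for the same raw score.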

blastn linux command specify database

November 26, 2015, 5:43 pm

I'm trying to run a local BLAST search on Linux, and I know the general format of the command: blastx -query seqs.fasta -db nr -out output.blast.txt. However, I don't know the short names for all the databases I want to search, e.g. "ena_fun". How can I find these names?

Thanks
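BLAST+ itself can list the databases found in a directory with blastdbcmd -list followed by the directory path, and update_blastdb.pl --showall lists the prebuilt databases NCBI distributes. As a rough offline alternative, names can be inferred from file names: the name passed to -db is the alias (.pal/.nal) or index (.pin/.nin) file name without its extension, and numbered volumes such as nr.00 belong to the alias nr. A minimal sketch under those assumptions:

```python
import os
import re
from typing import Iterable, List

ALIAS_EXTS = {".pal", ".nal"}
INDEX_EXTS = {".pin", ".nin"}
VOLUME_RE = re.compile(r"^(?P<base>.+)\.\d+$")  # e.g. "nr.00" -> base "nr"


def local_db_names(filenames: Iterable[str]) -> List[str]:
    """Infer the names usable with -db from a directory listing."""
    files = list(filenames)
    aliases = {os.path.splitext(f)[0] for f in files
               if os.path.splitext(f)[1] in ALIAS_EXTS}
    names = set(aliases)
    for f in files:
        root, ext = os.path.splitext(f)
        if ext not in INDEX_EXTS:
            continue
        m = VOLUME_RE.match(root)
        if m and m.group("base") in aliases:
            continue  # volume of a multi-volume database; the alias name is used
        names.add(root)
    return sorted(names)
```

Usage: print(local_db_names(os.listdir("/path/to/blastdb"))), where the directory is whatever holds your downloaded databases.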
