Channel: Post Feed
Viewing all 41825 articles

Help With Rps-Blast With New Blast

Hi

I want to run RPS-BLAST (rpstblastn) with the new BLAST 2.2.25, but first I need to format the database. I think for the older BLAST version you use formatrpsdb, which creates additional files including .aux, but I don't know what to use for this new version.

I tried this

makeblastdb -in Pfam-A.fasta -title Pfam -logfile pfam

But it only creates the .psq, .pin and .phr files, and when I run rpstblastn it says that it cannot find the .aux file.

I hope someone can help me!!! Thanks!


Creating A Dna Sequence Database Locally For Blastplus

Hi,

I have a fasta file containing cDNA sequences and I would like to create a blastable database for blastplus, (using blastn) that I have installed locally. I am, however, unsuccessful up to now as I end up creating a PROTEIN database, although my file obviously contains DNA sequences. Here is the command I use, following blastplus user_manual.pdf:

makeblastdb -in my_file.fasta -title my_db -out ~/path/blastplus/database/my_db

And here is the output produced:

Building a new DB, current time: 07/26/2010 11:00:28
New DB name:   /path/blastplus/database/my_db
New DB title:  my_db
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 1100 sequences in 0.0458999 seconds.

The user manual says that makeblastdb is supposed to recognise the data type automatically. Should it not recognize that I'm using DNA and not proteins?

Your suggestions are appreciated :)

Cheers
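
In the BLAST+ releases I am aware of, makeblastdb takes the molecule type from the -dbtype option (nucl or prot), and the output above suggests that this release assumed protein when the option was left out. A minimal sketch, assuming Python is available for scripting the call: the helper name makeblastdb_command is hypothetical and the file names are the placeholders from the post.

```python
import shlex


def makeblastdb_command(fasta_path, db_path, title, dbtype="nucl"):
    """Build a makeblastdb argument list with an explicit molecule type."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype,
            "-title", title, "-out", db_path]


# Placeholder names taken from the post; adjust the paths to your setup.
args = makeblastdb_command("my_file.fasta", "my_db", "my_db")

# Shell-safe rendering of the same command, handy for logging.
command_line = " ".join(shlex.quote(a) for a in args)
```

Where BLAST+ is installed, the list can be passed to subprocess.run(args, check=True); the "Sequence type" line in the output should then confirm which type was used.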

The Meaning Of The E In Blast Output

I have a question that's been floating around in my head for a while, and a colleague recently asked the same thing so I thought I should finally get to the bottom of it.

What does the 'e' in the output of a BLAST search refer to? Specifically, does the symbol 'e' refer to 'x10^' or does it refer to the mathematical constant 'e=2.72...'? I could see a justification for either: e as 'x10^' is a convenient notation, while 'e=2.72...' is actually used in the calculation of the e-value score itself.

This hasn't really mattered in assessing the relative quality of a hit, or determining the homology between sequences, but I thought I'd better find an answer for this question.

Thanks!

David
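
For what it is worth, the 'e' in values such as 2e-53 is ordinary exponent (scientific) notation, i.e. 2 x 10^-53, and not the constant e = 2.72... A small Python sketch of the difference; the example value is copied from tabular output quoted elsewhere in this feed, and the variable names are only illustrative.

```python
import math

# E-value string as it appears in BLAST output.
evalue_text = "2e-53"

# float() reads the same exponent notation: mantissa x 10^exponent.
evalue = float(evalue_text)

# Split the string by hand to make the two parts explicit.
mantissa_text, exponent_text = evalue_text.split("e")
by_hand = float(mantissa_text) * 10 ** int(exponent_text)

# Reading 'e' as Euler's number instead gives a very different value.
euler_reading = float(mantissa_text) * math.e ** int(exponent_text)
```

Here evalue and by_hand agree, while euler_reading is many orders of magnitude larger.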

Are Alignment Lengths Reported In Ncbi Blast+ Results Counting Nucleotides Or Amino Acids?

Very simply, what are the units of the alignment length reported by BLASTX and BLASTP? The units of length are not listed in the output column headings (-outfmt 7) and I have found nothing in the O'Reilly BLAST manual.

It is specifically BLASTX (translated nucleotide query vs protein database) and BLASTP (protein vs protein) for which I need to know this information.
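
As far as I understand the tabular output, the alignment length for BLASTX and BLASTP counts alignment columns in amino acids (gaps included), while the BLASTX query start and end stay in nucleotide coordinates. The sketch below is only a consistency check on one BLASTX row quoted further down this feed (-m 8 output), assuming the standard 12 tabular columns.

```python
# One BLASTX tabular row (12 standard columns), copied from a post in this feed.
row = ("scaffold16:1661-2239(+)\tgi|471236998|ref|YP_007641386.1|\t50.00\t122\t52\t3\t"
       "225\t578\t603\t719\t2e-53\t126")
fields = row.split("\t")

aln_length = int(fields[3])                   # reported alignment length
q_start, q_end = int(fields[6]), int(fields[7])
s_start, s_end = int(fields[8]), int(fields[9])

query_span_nt = abs(q_end - q_start) + 1      # query range in nucleotides
query_span_aa = query_span_nt // 3            # same range in codons
subject_span_aa = abs(s_end - s_start) + 1    # subject range in residues
```

The reported length (122) sits close to the 118 codons and 117 residues spanned, plus gap columns, and nowhere near the 354 nucleotides of the query range.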

Problems Using Formatdb And Fastacmd

Hello!!

I am having an extremely difficult time with formatdb and fastacmd from NCBI BLAST.

I am trying to make a db from a single mouse chromosome from NCBI using formatdb. Afterwards I try to obtain the sequence at defined positions from the db. I use the commands:

formatdb -p F -i 12.fa -o T -t C57Bl6_chr12 -n C57Bl6_chr12
fastacmd -d C57Bl6_chr12 -p T -L 1000000,10001000 -o test.txt

[fastacmd] ERROR: ERROR: Cannot initialize readdb for C57Bl6_chr12 database

It is very annoying that formatdb is not making the readdb for my file. Do you know what I can do about it?

Thanks!!

Strange Behaviour Of Bioperl'S Bio::Searchio When Parsing Xml Blast Output

Hello, I've noticed some strange behaviour when parsing BLAST .xml output files (-outfmt 5) using BioPerl's Bio::SearchIO library. I have a simple parser script that looks something like:

#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $in = Bio::SearchIO -> new (-format => 'blastxml', -file => "consensusSeqs.BLASTp.xml");
open (OUT, ">consensusSeqs.parse.OUT");
my $query_count = 1;
while( my $result = $in->next_result ) {
    print "Query count: ".$query_count."\n";
    print "Query name: ".$result->query_description."\n";
    print "Number of hits: ".$result->num_hits."\n";
    my $hit_count = 1;
    if (!defined $result->next_hit) {
        print OUT $result->query_description."\tNo hit\n";
    }
    while ( my $hit = $result->next_hit ) {
        print "\tHit count: ".$hit_count."\n";
        while ( my $hsp = $hit->next_hsp ) {
            my @a = split /\|/, $hit->name;
            my $hit_accession = $a[3];
            # print "Hit name: ".$hit->name."\n";
            print "\tHit accession: ".$hit_accession."\n";
            ## get some stats of hit
            my $percent_id = sprintf("%.2f", $hsp->percent_identity);
            my $percent_q_coverage = sprintf("%.2f", ((($hsp->length('query')/($result->query_length)))*100));
            my @b = split /[\[\]]/, $hit->description;
            my $organism = $b[1];
            (my $short_desc = $hit->description) =~ s/\[.*//;
            print ...

Create Blast Database From Query Result?

Possible Duplicate: make a custom BLAST library using the output of another blast result
Hello, I would like to create a BLAST database using as input a file generated after executing a BLAST query. I first query a fasta file which contains a nucleotide sequence using the blastn command, saving the result in a text file. Next, I try to create a database using the makeblastdb, specifying as input parameter the text file generated after executing the query. Unfortunately, I get the following error:
BLAST options error: Input format not supported (unknown format). Use -input_type to specify the input type being used.
I try to execute the query in the first step using the -outfmt argument, saving the query results in XML, text ASN.1, binary ASN.1. When I try to create the database specifying in the input_type the 'asn1_bin', 'asn1_txt' and 'xml' respectively, I get the following error:
Error: NCBI C++ Exception: "/am/ncbiapdata/release/blast/src/2.2.25/Linux64-Suse-icc/c++/ICC1010-ReleaseMT64--Linux64-Suse-icc/../src/serial/objistr.cpp", line 838: Error: ncbi::CObjectIStream::SkipFileHeader() - line 3: incompatible type BlastOutput<>Seq-entry ( at Seq-entry)
I would ...

blast -F F equivalent in blast+

In legacy blast there is an option -F, which turns filtering on or off. By default filtering is ON and it affects my results a lot, so I turn it off with -F F (false).

In blast+ I couldn't figure out the equivalent of -F F. In another post it was suggested to use "-dust yes"; I tried it and it does not work, there is still filtering going on.

So, what is the option in blast+ to turn off filtering?

 


Blast Database Sequences Length

Hi all

I am running BLAST queries using genomes as databases. I would like to know if it is possible to filter out database sequences by their length, so that a database sequence shorter than the query plus several kb is not used. If this is not possible, is there a workaround to obtain the lengths of the sequences in the output?

Thanks!
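
One possible workaround, sketched under the assumption that the database is built from a FASTA file you control: compute the sequence lengths up front and keep only the records that are long enough before formatting the database. The function names and the threshold are hypothetical, and the toy records only illustrate the behaviour.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(lines, min_length):
    """Return the records whose sequence has at least min_length residues."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]


# Toy input: one 4-residue record and one 12-residue record split over two lines.
example = [">short", "ACGT", ">long", "ACGTACGT", "ACGT"]
kept = filter_by_length(example, min_length=10)
```

With real data, min_length would be the query length plus the flanking size you need, and the kept records would be written back to a FASTA file before building the database.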

Almost Nothing Mapped Using Bwa Or Bowtie. But A Lot Of Mapping In Blast

I have paired-end Illumina samples with a read length of 100-150 bp. I tried to map them to the reference genome/transcriptome, but almost nothing mapped using BWA or Bowtie.

When I BLAST them against the nt reference using megablast, most reads map to the right species. What does this mean? How can I interpret this? And what can I do to improve short-read alignment using BWA or Bowtie?

I Use Blastdbcmd Command With Nr Database And It Gives Me The "Not Found Database" Error

This is the error

BLAST Database error: No alias or index file found for nucleotide database [nr] in search path [/home/aj3/Desktop/h/database::]

What should I do? Do I have to run any other commands before this one to make the database ready?

Blast Formatdb, Multiple Folders/Directories

I have downloaded viral genomes from the NCBI website. I want to use formatdb to create a BLAST database of these viral genomes. However, they are spread across nearly 300 folders, each containing the FASTA files for one genome or genome segment (named by NC_ ids). The formatdb documentation says that to format multiple files I must quote the list of input files. I do not have a list of these files, as they sit inside the folders. Is there any way to run formatdb on multiple folders/directories rather than on the individual FASTA files?

Any help much appreciated :-) Thanks
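
One way around listing the files by hand, sketched here as an assumption rather than a formatdb feature: walk the folders and concatenate every FASTA file into a single file, then format that one file. The function name and the accepted file extensions are hypothetical and would need adjusting to the actual downloads.

```python
import os


def gather_fasta(root_dir, combined_path, extensions=(".fa", ".fasta", ".fna")):
    """Concatenate every FASTA file found under root_dir into combined_path.

    combined_path should live outside root_dir so it is not picked up itself.
    Returns the number of files that were copied.
    """
    count = 0
    with open(combined_path, "w") as out:
        # Sort folders and files so the combined file has a stable order.
        for folder, _subdirs, files in sorted(os.walk(root_dir)):
            for name in sorted(files):
                if not name.lower().endswith(extensions):
                    continue
                with open(os.path.join(folder, name)) as handle:
                    text = handle.read()
                # Make sure the next header starts on its own line.
                if text and not text.endswith("\n"):
                    text += "\n"
                out.write(text)
                count += 1
    return count
```

The combined file can then be given to formatdb as a single input (-i combined.fa -p F for nucleotide sequences).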

Obtaining the top matches from blast

Hi,

I have downloaded the current version of standalone BLAST (ncbi-blast-2.2.29+) and I am trying to use blastn to find similarities among a group of nucleotide sequences that I have. However, I am interested in only the top 3 matches. I searched online and saw some posts that suggest using -K, but this does not work with the new version that I am using. I looked at the help document and tried -max_target_seqs and -num_alignments, but neither of them worked: the result still contains all the matches found by BLAST.

Does anyone know how to limit the results to, let's say, just the top 3 matches?

Thanks!
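
If the command-line options do not behave as expected, one fallback is to trim the results afterwards. The sketch below assumes tabular output (-outfmt 6 or 7) in which the hits for each query are already listed best-first, and keeps every row belonging to the first three distinct subjects per query. The function name top_hits is hypothetical.

```python
from collections import defaultdict


def top_hits(tabular_lines, max_subjects=3):
    """Keep rows for the first max_subjects distinct subjects of each query."""
    seen = defaultdict(list)  # query id -> subject ids in order of appearance
    kept = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 comment lines
        query, subject = line.split("\t")[:2]
        if subject not in seen[query]:
            if len(seen[query]) >= max_subjects:
                continue  # a further subject beyond the limit: drop the row
            seen[query].append(subject)
        kept.append(line.rstrip("\n"))
    return kept
```

Usage would be along the lines of top_hits(open("results.tsv")), where results.tsv is a placeholder for your own output file. Several HSPs against the same subject are all kept, since they belong to the same match.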

 

 

How To Get Status Update In Ncbi Standalone Blast?

For example, I am running standalone BLAST+ on thousands of EST sequences against the remote (NCBI) server. I do not get any status messages such as "sequence 15 of 100 is running". Is it possible to get a status message like that? Or is there another way, such as sending the sequences one after another using Perl scripts? Many thanks!

Blastx For A Million Metagenomic Sequences

I intend to use a similarity-based binning program like MEGAN, SOrt-ITEMS or CARMA to analyze the sequences in my metagenomic data set. For this, I first have to generate BLASTX output for my metagenomic sequences against a huge database such as nr or Pfam. I do not have large computing resources to run standalone BLAST for this. Any suggestions as to how I can obtain BLASTX output for a million sequences?


Ncbi-Blast+ Version 2.2.23 'Make' Error

I'm getting errors when I run 'make' for rmblast-1.2-ncbi-blast-2.2.23+-src. This is the version of RMBlast modified for use with RepeatMasker from their site. I had no problem installing it in Ubuntu on my laptop (64-bit), but am now struggling with it on my Desktop (32-bit, Ubuntu 11.10). The configure runs successfully, but here are the errors I get after running 'make':

/usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/include/dbapi/driver/impl/dbapi_driver_utils.hpp:234:32: error: reference ‘m_Conn’ cannot be declared ‘mutable’ [-fpermissive]
make[8]: *** [public.o] Error 1
make[8]: Leaving directory `/usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/GCC461-ReleaseMT/build/dbapi/driver'
make[8]: Entering directory `/usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/GCC461-ReleaseMT/build/dbapi/driver'
rm -f libdbapi_driver.a .dbapi_driver.dep .libdbapi_driver.a.stamp
if [ '/bin/bash /usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/scripts/common/impl/if_diff.sh "ln -f"' != '@:' -a -d /usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/GCC461-ReleaseMT/status -a -w /usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/GCC461-ReleaseMT/status -a /usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/src/dbapi/driver != . ]; then \
rm -f /usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/GCC461-ReleaseMT/lib/libdbapi_driver.a /usr/local/rmblast-1.2-ncbi-blast-2.2.23+-src/c++/GCC461-ReleaseMT/status/.dbapi_driver.dep \
/usr/local/rmblast-1.2-ncbi-blast-2.2.23 ...

Blast+ Error Code=12

Something really bizarre is going on. I ran 6 tblastn jobs independently using the following command line entry:

tblastn -query blastqueries14920.fasta -out blastqueries14920.fasta.out -db refseq_genomic -outfmt 5 -soft_masking True -show_gis -seg yes &

After repeating this 6 times for all my input files, a quick look at the directory showed the output files had been created as expected, with a size of 0 KB. In the top list, I saw all 6 of my tblastn jobs running happily at 100% CPU usage each. After about a day, 2 of them are no longer in the list of active processes, and yet all 6 output files are still empty. No error messages, in case you were wondering. So, any ideas on where the heck the results are, if there are any?

********UPDATE*****

I am running tblastn 2.2.25 against a local copy of the BLAST db refseq_genomic. The computer is a Mac Pro with 2 quad-core Intel Xeon processors and 16 GB of RAM. The blast+ that I downloaded was the disk image version; I don't know if that means that it is 32-bit only?

Okay, so I now know more about what's going on! The rest of my runs also died, and this time I was around to notice the full error:

set a breakpoint in malloc_error_break to debug
tblastn(32920) malloc: * mmap(size=1496113152) failed (error code=12) s ...

[Fastacmd] How To Retrieve Sequence From Blast Db

Let's say I have multifasta with protein sequences having internal IDs (integer)

>1234
MGKL...*

I build blast db using:

formatdb -i infile.fa -pF -n someDB

But then, I'm unable to retrieve sequence from db using simple protein id:

fastacmd -d someDB -s 1234

How should I define the FASTA header so that I can retrieve sequences easily?
I have noticed that formatdb assigns internal identifiers (an incrementing integer) to my sequences, and the original ID appears later:

>gnl|BL_ORD_ID|12 1234

Why is that?

I then defined headers as:

>gnl|dbname|1234

but with no effect. Do I have to define headers as >gi|1234 in order to be able to get sequence? Or is there any other way of retrieving sequences from blast db?

How To Merge Contiguous Blast Hsps! (-M 8 Tab)

Hi, guys! I performed blastx (-m 8) using a query file of many sequences. For each target sequence, the output contains many fragmentary but significant HSPs, which may or may not overlap. How can I merge those closely related HSPs into one by setting a flanking value (e.g. <300 bp) when they match different regions of the same subject? In the following figure, I want to transform the upper results into the lower ones: http://www.imagebam.com/image/0c13b2256142668

Thanks in advance!

Transform

scaffold16:1661-2239(+) gi|471236998|ref|YP_007641386.1| 50.00 122 52 3 **225 578** 603 719 2e-53 126
scaffold16:1661-2239(+) gi|471236998|ref|YP_007641386.1| 75.00 76 19 0 **1 228** 528 603 2e-53 108
scaffold16:1661-2239(+) gi|333951646|gb|AEG25349.1| 52.10 119 54 2 **225 578** 604 720 7e-53 124
scaffold16:1661-2239(+) gi|333951646|gb|AEG25349.1| 77.63 76 17 0 **1 228** 529 604 7e-53 109
scaffold28:2776872-2777385(-) gi|327335359|gb|AEA49877.1| 70.18 57 17 0 **173 343** 554 610 3e-30 90.5
scaffold28:2776872-2777385(-) gi|327335359|gb|AEA49877.1| 72.22 54 15 0 **1 162** 497 550 3e-30 67.0

To

scaffold16:1661-2239(+) gi|471236998|ref|YP_007641386.1| . . . . **1 578** . . 2e-53 **234**
scaffold16:1661-2239(+) ...
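
A possible way to do this merge, sketched in Python under a few assumptions: rows are whitespace-separated -m 8 lines with the 12 standard columns, HSPs are grouped by query and subject, and two HSPs are joined when the gap between their query ranges is at most max_gap. Summing the bit scores only mirrors the desired output shown above (126 + 108 = 234) and is not a statistically rigorous combined score. Subject coordinates are ignored, and merge_hsps is a hypothetical name.

```python
from collections import defaultdict


def merge_hsps(rows, max_gap=300):
    """Merge HSPs of the same query/subject pair whose query ranges overlap
    or lie within max_gap positions of each other."""
    groups = defaultdict(list)
    for row in rows:
        f = row.split()
        q_start, q_end = sorted((int(f[6]), int(f[7])))
        groups[(f[0], f[1])].append((q_start, q_end, float(f[10]), float(f[11])))

    merged = []
    for (query, subject), hsps in groups.items():
        hsps.sort()  # order by query start
        cur_start, cur_end, cur_evalue, cur_bits = hsps[0]
        for start, end, evalue, bits in hsps[1:]:
            if start - cur_end <= max_gap:
                # Close enough: extend the current block and pool the scores.
                cur_end = max(cur_end, end)
                cur_evalue = min(cur_evalue, evalue)
                cur_bits += bits
            else:
                merged.append((query, subject, cur_start, cur_end, cur_evalue, cur_bits))
                cur_start, cur_end, cur_evalue, cur_bits = start, end, evalue, bits
        merged.append((query, subject, cur_start, cur_end, cur_evalue, cur_bits))
    return merged


# Four of the rows from the post, with the ** highlight markers removed.
rows = [
    "scaffold16:1661-2239(+) gi|471236998|ref|YP_007641386.1| 50.00 122 52 3 225 578 603 719 2e-53 126",
    "scaffold16:1661-2239(+) gi|471236998|ref|YP_007641386.1| 75.00 76 19 0 1 228 528 603 2e-53 108",
    "scaffold28:2776872-2777385(-) gi|327335359|gb|AEA49877.1| 70.18 57 17 0 173 343 554 610 3e-30 90.5",
    "scaffold28:2776872-2777385(-) gi|327335359|gb|AEA49877.1| 72.22 54 15 0 1 162 497 550 3e-30 67.0",
]
merged = merge_hsps(rows)
```

For the first pair this gives the query range 1-578 with a pooled bit score of 234, which matches the target row in the post.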

How To Parse Psiblast Results Using Biopython And Blast-2.2.24+?

I'm trying to run a PSIBlast program which selects certain sequences out at every round before it does the next iteration. For this I need the Round attribute shown in http://biopython.org/DIST/docs/tutorial/Tutorial.html#fig:psiblastrecord.

The biopython tutorial says: "In Biopython, the parsers return Record objects, either Blast or PSIBlast depending on what you are parsing." However, I can only access attributes found in the normal Blast record class: http://biopython.org/DIST/docs/tutorial/Tutorial.html#fig:blastrecord.

from Bio.Blast.Applications import NcbipsiblastCommandline
from Bio import SeqIO
from Bio.Blast import NCBIXML

File = "KNATM"

def psiBlast(File):
    fastaFile = open("BLAST-"+File+".txt","r")
    my_blast_db = r"C:\Niek\Test2.2.17\TAIR9_pep_20090619.fasta"
    my_blast_file = '"C:\\Niek\\Evolution MiP\\BLAST-'+File+'.txt"'
    my_blast_exe = r"C:\Niek\blast-2.2.24+\bin\psiblast.exe"
    E_VALUE_TRESH = "10"
    for seq_record in SeqIO.parse("BLAST-"+File+".txt", "fasta"):
        global cline
        tempFile = open("tempFile.fasta","w")
        tempFile.write(">"+str(seq_record.id)+"\n"+str(seq_record.seq)+"\n")
        tempFile.close()
        cline = NcbipsiblastCommandline(cmd = my_blast_exe, db = my_blast_db, \
            query = ...