shortcut to bin for execution of blast commands

January 21, 2015, 3:22 am

Every time I need to run a BLAST command, be it makeblastdb, tblastn, blastn, and so on, I have to go back to the bin directory in the installation folder.

Is there any way to set things up so that I can run these commands from any location?
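
A common approach, sketched here under assumptions, is to put the BLAST bin directory on the PATH so the commands resolve from any working directory. The install location below is hypothetical; in a shell the same effect usually comes from extending PATH in the startup file.

```python
import os
import shutil

def prepend_to_path(bin_dir, env=None):
    """Return a copy of env with bin_dir placed first on PATH."""
    env = dict(os.environ if env is None else env)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in parts:
        parts.insert(0, bin_dir)
    env["PATH"] = os.pathsep.join(parts)
    return env

# Hypothetical install location; replace with the real bin directory.
BLAST_BIN = os.path.join(os.path.expanduser("~"), "ncbi-blast", "bin")

env = prepend_to_path(BLAST_BIN)
# shutil.which returns None when the program is not found on that PATH.
print(shutil.which("blastn", path=env["PATH"]))
```

The returned mapping can be passed as `env=` to `subprocess.run` so that child processes see the extended PATH.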

↧

Bioperl - Parsing a tblastn report for gene presence/absence using bioperl

April 27, 2015, 7:04 am

Hi, any advice is appreciated! I am trying to write a Perl program that uses BioPerl to parse a tblastn report of protein query sequences against a nucleotide genome sequence. As cut-offs I am using an aligned length of 80% of the query and a fraction identical of 80% relative to the query. The program prints '1' next to the gene name if the gene is present and '0' if it is absent. I also use the tiling methods, because the nucleotide sequences are made up of draft contigs. However, when I run a positive control (a tblastn of the protein sequences against the same genome they originated from), many genes are reported as absent (the output is "gene" 0). Any help with this problem? Here is the program:

#!/usr/bin/perl
use strict;
use Bio::SearchIO;
use Bio::Search::Tiling::MapTiling;

my $in = new Bio::SearchIO(-format => 'blast', -file => $ARGV[0]);
while( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        my $tiling = Bio::Search::Tiling::MapTiling->new($hit);
        if( $tiling->frac_identical('query') >= 0.8 && (100 / $result->query_length) * $tiling->length('query') >= 80) {
            print $result->query_name, " 1", "\n";
        } else {
            print $result->query_name, " 0", "\n";
        }
    }
}

Sample output ...
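
One thing that may be worth checking, offered as a sketch rather than a diagnosis: the program above prints one line per hit, so a gene with one good hit and several weak ones would appear as both 1 and 0, and a gene with no hits would never be printed. The Python below only illustrates deciding presence once per query; the tuples stand in for values such as the fraction identical and the tiled query coverage, and are made up.

```python
def presence_by_query(hits, all_queries, min_identity=0.8, min_coverage=0.8):
    """Map each query name to 1 if any hit passes both cut-offs, else 0."""
    present = {name: 0 for name in all_queries}
    for query, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            present[query] = 1
    return present

# Made-up values: (query name, fraction identical, fraction of query covered)
hits = [
    ("geneA", 0.95, 0.99),
    ("geneA", 0.40, 0.20),
    ("geneB", 0.70, 0.90),
]
for name, flag in presence_by_query(hits, ["geneA", "geneB", "geneC"]).items():
    print(name, flag)
```

Starting every query at 0 means genes without any hit still get a line in the report.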

↧

Forum: Looking For Help To Understand Bioinformatics Papers On Blast, Yasara And Huntington'S Disease

March 13, 2013, 8:59 am

Hello,

I'm a student and I have to write a paper about bioinformatics. I've been working on it for a while and I'm almost done. I only have to write about BLAST, YASARA and Huntington's disease as approached by bioinformatics research. I'm having some difficulties with those three chapters. If anyone would like to help me, please send a message.

Thanks in advance.

↧

Downloading And Maintaining A Local, Blast-Able Nr Database

March 29, 2011, 7:41 pm

I am planning to set up and maintain a local copy of NR and other NCBI databases for running in-house BLAST searches. I would also like my local copies to stay in sync with NCBI through regular updates. NCBI suggests using update_blastdb.pl (http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl) to download the latest versions of all the pre-formatted databases. Does anyone have experience with this script to share? Are there alternative solutions? I will appreciate everyone's feedback. Thanks, Anjan

↧

How To Get Ungapped Sequences From Blast Output?

September 16, 2011, 8:19 am

I am interested in getting ungapped sequence hits (hsps_no_gap) from BLAST output. I tried using this:

result_handle = NCBIWWW.qblast("blastp", "nr", record.format("fasta"),expect=10,"hsps_no_gap")
blast_records=NCBIXML.read(result_handle)

but there are still gaps in the sequences.

I would appreciate some hints on how to get ungapped sequences in fasta format from blast output.
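
A minimal sketch, assuming the aligned strings have already been read from the parsed records (in Biopython's `NCBIXML` records these are the `query` and `sbjct` attributes of each HSP): gaps in the alignment are `-` characters, so one way to obtain ungapped FASTA is to strip them after parsing. The record name and sequence below are invented.

```python
import textwrap

def ungap(aligned):
    """Remove alignment gap characters from an aligned sequence string."""
    return aligned.replace("-", "")

def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped lines."""
    body = "\n".join(textwrap.wrap(sequence, width))
    return ">{}\n{}\n".format(name, body)

# Invented aligned subject string, as it might appear in hsp.sbjct
aligned_sbjct = "MKT-AYIAK--QRQISFVK"
print(to_fasta("hit1_hsp1", ungap(aligned_sbjct)), end="")
```

This removes gaps from the reported alignment; it does not make the search itself ungapped.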

↧

Blastn Segmentation Fault

December 7, 2011, 9:08 pm

Hi,

I am running a BLASTN of about 150 sequences against a genome that is 2.2 gigabases long. A few of my queries are actually full length BAC end sequences running to around 150,000 bases. I expect to find huge, contiguous hits for some BACs in the genome. Here's the command I use -

blastn 80BACs.fasta -db mygenome -out 80BACsBLAST -outfmt 10 -num_threads 8 -evalue 10e-3 -index_name mygenomeMBI

Around 10 minutes after it starts running, the program halts after producing a segmentation fault. I did a 'ulimit -s unlimited' to set the stack size to unlimited, but to no avail. I also went easy on the number of threads in subsequent trials, setting num_threads to 5 and subsequently, 2 - but that didn't help either.

I am using the binaries from rmblast-1.2-ncbi-blast-2.2.23+. I had earlier run a smaller query dataset against the same genome, which worked fine; that BLAST completed in half a day. I am convinced this issue is due to some very long query sequences. I'd greatly appreciate any help in this regard!

Thanks,

Srihari
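
One workaround sometimes tried for very long queries, sketched here without any claim that it addresses this particular crash, is to split each long sequence into overlapping windows and BLAST the pieces separately. The window size and overlap below are arbitrary assumptions.

```python
def split_with_overlap(sequence, size=10000, overlap=500):
    """Yield (start, chunk) pairs; start is the 0-based offset in sequence."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    start = 0
    while start < len(sequence):
        yield start, sequence[start:start + size]
        if start + size >= len(sequence):
            break
        start += step

# Toy sequence standing in for a 150,000-base BAC end sequence
toy = "ACGT" * 10
for offset, chunk in split_with_overlap(toy, size=16, overlap=4):
    print(offset, chunk)
```

Keeping the offset with each chunk makes it possible to map hit coordinates back onto the original sequence.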

↧

How Scores are given in Substitution matrix like PAM and BLOSUM?

October 6, 2014, 10:25 am

Hi everyone,

While aligning two protein sequences, an amino acid of the query sequence is aligned with an amino acid of the database sequence. If both match, a score is given. If there is a substitution (for example, leucine for isoleucine), how is the score for this alignment assigned?

Or, more simply, can anyone explain how a substitution matrix is used to compare two sequences and align them according to the substitution scores?
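
As an illustration of how a substitution matrix is used (not of how PAM or BLOSUM values are derived): each aligned pair of residues is looked up in the matrix and the values are summed. The few entries below are taken from BLOSUM62; gap handling is left out, which is a simplification.

```python
# A small excerpt of BLOSUM62 scores for three residues (symmetric matrix).
BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("I", "I"): 4, ("V", "V"): 4,
    ("L", "I"): 2, ("I", "V"): 3, ("L", "V"): 1,
}

def pair_score(a, b, matrix=BLOSUM62_EXCERPT):
    """Look up a residue pair in either order."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]

def alignment_score(query, subject, matrix=BLOSUM62_EXCERPT):
    """Sum pair scores over an ungapped alignment of equal length."""
    if len(query) != len(subject):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b, matrix) for a, b in zip(query, subject))

print(pair_score("L", "L"))           # identical residues
print(pair_score("L", "I"))           # conservative substitution
print(alignment_score("LIV", "IIV"))  # sum over three aligned pairs
```

A leucine aligned with an isoleucine still scores positively in this matrix, only lower than an exact match, which is how chemically similar substitutions are rewarded.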

↧

No hits found

December 19, 2014, 10:33 pm

Bioinformatics noob here,

I'm working with a firefly database. I have downloaded the BLAST executables and made a local database with the makeblastdb command. To test the database I ran blastn with a luciferase gene as the query, which I know is in the database because I have BLASTed it before in a different program, and I get **No hits found**. I have tried this with several different luciferase sequences, and each time I get **No hits found**.

Does anyone have any idea as to why this is happening? Do I have to change any search parameters?

↧

How To Filter Blast Results

May 21, 2012, 9:39 pm

Hello everyone,

I have short nucleotide sequences. For further analysis I ran them through the blastn program at NCBI with default parameters against the ESTs of the desired organism, and I got multiple hits. How can I filter them, and which parameter should I consider for filtering? Can I simply take the EST with the lowest e-value?

thanks! jyoti

↧

List record in a local BLAST database

April 7, 2015, 8:23 am

Is there a way to get a FASTA file containing all the records in a local BLAST database?

I've created a local database using makeblastdb with a fasta file, but was wondering if there is any way to reverse this process?

Thanks!
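
A sketch of one route, assuming the BLAST+ `blastdbcmd` program is installed and on the PATH: its `-entry all` option writes every sequence in a database, in FASTA format by default. The database and output names below are placeholders.

```python
import shutil
import subprocess

def dump_command(db_name, out_fasta):
    """Build the blastdbcmd call that writes all records of db_name."""
    return ["blastdbcmd", "-db", db_name, "-entry", "all", "-out", out_fasta]

cmd = dump_command("mydb", "mydb_all.fasta")  # placeholder names
print(" ".join(cmd))

# Only run when blastdbcmd is available and the placeholders were replaced.
RUN = False
if RUN and shutil.which("blastdbcmd"):
    subprocess.run(cmd, check=True)
```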

↧

Opening A Fasta File In Windows

March 6, 2012, 2:02 am

Hi all,

I am a beginner with BLAST+ and I am using Windows. My aim for now is to download the nr protein sequences in FASTA format and format them using makeblastdb, then extract the first 1000 characters from the nr file as a separate file (say qa.fasta) and query it against the whole database.

I downloaded the nr database in FASTA format from this link:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz (are these the original FASTA files?)

Then I ran the makeblastdb command like this:

makeblastdb -in nr -dbtype prot -out outnr

This resulted in the nr file being split into different parts, nr.00 to nr.03. (Is this normal?)

Now I need help extracting the first 1000 characters from the nr file. How do I open a FASTA file in Windows? How do I proceed?
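
A minimal sketch for the last step, assuming Python is available on the Windows machine: a FASTA file is plain text, so the first 1000 characters can be copied to a new file without opening the whole file in an editor. The file names follow the post (`nr`, `qa.fasta`); note that a cut at a fixed character count may end in the middle of a record.

```python
import os

def copy_head(src_path, dst_path, n_chars=1000):
    """Copy the first n_chars characters of a text file to dst_path."""
    with open(src_path, "r") as src:
        head = src.read(n_chars)  # reads only the start of the file
    with open(dst_path, "w") as dst:
        dst.write(head)
    return len(head)

# Runs only when a file named "nr" exists in the current directory.
if os.path.exists("nr"):
    print(copy_head("nr", "qa.fasta"))
```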

↧

Output blast and time execution

June 18, 2014, 1:55 am

Hi,

I have a simple question. Why does the XML output format in blastn take a lot longer than the tabular output?

There is a huge difference between the two output formats, using the same data and database.

Thanks a lot.

↧

Problems Using Formatdb And Fastacmd

April 9, 2012, 4:00 pm

Hello!!

I am having an extremely difficult time with formatdb and fastacmd from NCBI BLAST.

I am trying to make a database from a single mouse chromosome from NCBI using formatdb. Afterwards I try to obtain the sequence at defined positions from the database. I use the commands:

formatdb -p F -i 12.fa -o T -t C57Bl6_chr12 -n C57Bl6_chr12
fastacmd -d C57Bl6_chr12 -p T -L 1000000,10001000 -o test.txt

[fastacmd] ERROR: ERROR: Cannot initialize readdb for C57Bl6_chr12 database

It is very annoying that formatdb is not making the readdb for my file. Do you know what I can do about it?

Thanks!!

↧

Problem Running Blast Jobs.

April 11, 2011, 5:44 pm

I started to run blast, locally on my machine, on 4 files with 1323, 210, 501, 166 fasta sequences each.

For all jobs except the first one blast returned an error:

blastall(7004) malloc: *** mmap(size=2097152) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

The first job is currently still running as I'm writing this.

The exact blastall command used is as follows:

blastall -p blastn -d /Users/anjan/blastdb/nt -i <input.filename.fasta>  -o RPMBBP10.fasta.blast.out -m 8 -b 10000.

The version of BLAST is 2.2.17. I'm running this on a Mac server with two quad-core Intel processors and 16 GB of RAM. Does anyone have any idea what the source of the problem is?

↧

Question concerning a for loop in R with strsplit to generate amino acid abundance visuals

December 2, 2014, 12:40 am

Hello Biostars,

I'd like to preface this post by saying that I am new to R and bioinformatics coding, and that I'd really appreciate some input from this knowledgeable community. My goal for the code posted below is to generate pie charts that show amino acid abundance per protein from BLAST results. I loaded a CSV file from UniProt, converted it to a matrix, and wrote the code below. I keep getting the error: In AAs[i] = table(strsplit(BLAST_AA_seqs[i], "", useBytes = TRUE)) : number of items to replace is not a multiple of replacement length. Column 8 is the output column that contains the amino acid sequences. Thanks in advance!

mydata=read.table("C:/Users/du0/Desktop/Downloads/CDPKbeta_BLAST_results.csv", header=TRUE,sep=",")
mydata=as.matrix(mydata)

AAs=c()
BLAST_AA_seqs=c()
for(i in 1:nrow(mydata)){
print(i)
BLAST_AA_seqs[i]=mydata[i,8]
AAs[i]=table(strsplit(BLAST_AA_seqs[i],"", useBytes=TRUE))
pie(AAs, col=rainbow(length(AAs)), main="Residue abundance")
}

↧

Protein Sequence Alignment Using Blast

February 1, 2014, 10:59 am

Hi all, I am trying to run BLAST on protein sequences from two organisms. I downloaded the FASTA files from NCBI. I am trying to iterate over the list of sequences in each FASTA file and align every sequence in one file with every sequence in the other. I want to run this with local BLAST but I am getting an error. I would greatly appreciate some suggestions.

'''from Bio.Blast.Applications import NcbiblastxCommandline
help(NcbiblastxCommandline)'''

from Bio.Blast.Applications import NcbiblastpCommandline
from StringIO import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
from Bio.Blast import NCBIWWW
import cStringIO

def BlastSeq():
    SC_Fasta = open("sc.fsa","r")
    HS_Fasta = open("hsap.fsa","r")
    blastp = "C:\\Program Files\\NCBI\\blast-2.2.29+\\bin\\blastp"

    record1 = list(SeqIO.parse(SC_Fasta,"fasta"))
    for r1 in record1:
        r1.id
        r1.seq


    record2 = list(SeqIO.parse(HS_Fasta,"fasta"))
    for r2 in record2:
        r2.id
        r2.seq

    for r1 in record1:
        for r2 in record2:
            output = NcbiblastpCommandline(blastp,query= r1.seq, subject=r2.seq, outfmt=5)()[0]
            blast_result_record = NCBIXML.read(StringIO(output))


def main():
    BlastSeq()

main()

Error: Bio.Application.ApplicationError: Command 'C:\Program Files\NCBI\blast-2.2.29+\bin\blastp -outfmt 5 -query    MVKLTSIAAGVAAIAATASATTTLAQS ...
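
A possible direction, sketched under assumptions: the `-query` and `-subject` options of `blastp` take file names rather than raw sequences, so each pair can be written to temporary FASTA files first. The sketch uses `subprocess` instead of the Biopython wrapper and assumes `blastp` is on the PATH; the record IDs and sequences are placeholders.

```python
import os
import shutil
import subprocess
import tempfile

def write_fasta(path, seq_id, sequence):
    """Write a single-record FASTA file."""
    with open(path, "w") as handle:
        handle.write(">{}\n{}\n".format(seq_id, sequence))

def pairwise_command(query_path, subject_path, blastp="blastp"):
    """Build a blastp call comparing two FASTA files, with XML output."""
    return [blastp, "-query", query_path, "-subject", subject_path, "-outfmt", "5"]

with tempfile.TemporaryDirectory() as tmp:
    q = os.path.join(tmp, "query.fasta")
    s = os.path.join(tmp, "subject.fasta")
    write_fasta(q, "sc_1", "MVKLTSIAAGVAAIAATASA")  # placeholder sequence
    write_fasta(s, "hs_1", "MVKLTAIAAGVAAIAATASA")  # placeholder sequence
    cmd = pairwise_command(q, s)
    if shutil.which("blastp"):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.returncode, result.stdout[:200])
    else:
        print("blastp not found; command would be:", " ".join(cmd))
```

Passing the command as a list also avoids quoting problems with paths that contain spaces, such as `C:\Program Files`.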

↧

using ClustalW output to BLASTn search against the nucleotide (nr/nt) collections

May 2, 2015, 3:25 pm

Hello there,

Could someone please tell me how I can use the multiple alignment produced by ClustalW in a BLASTn search against the nucleotide (nr/nt) collection?

Suppose I have this ClustalW output:

CLUSTAL 2.1 multiple sequence alignment


q_[1938_1979].1      ACCTTGAAGCAAGAAAGGGGAAGTGGAGACAAAACTGATTAA 42
t_[1512_1553].2      ACCAAAAAAGAAGAAGGAGGAGATGGAGAAAAAAAAGACAAA 42
                     ***   **  ***** * ***  ****** ****  **  **

Now I want to use this in a BLASTn search against the nucleotide (nr/nt) collection.

How can I do this?
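
BLAST takes sequences rather than alignments as queries, so one option, sketched here as an assumption about what is wanted, is to pull each aligned sequence out of the ClustalW output, drop any gap characters and save the result as FASTA for use as a BLASTn query. The parser below handles only the simple layout shown in the post.

```python
def clustal_to_sequences(text):
    """Collect ungapped sequences by name from ClustalW alignment text."""
    sequences = {}
    for line in text.splitlines():
        # Skip the header, blank lines, and consensus lines (leading space).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        name, chunk = parts[0], parts[1]
        sequences[name] = sequences.get(name, "") + chunk.replace("-", "")
    return sequences

example = """CLUSTAL 2.1 multiple sequence alignment


q_[1938_1979].1      ACCTTGAAGCAAGAAAGGGGAAGTGGAGACAAAACTGATTAA 42
t_[1512_1553].2      ACCAAAAAAGAAGAAGGAGGAGATGGAGAAAAAAAAGACAAA 42
                     ***   **  ***** * ***  ****** ****  **  **
"""

for name, seq in clustal_to_sequences(example).items():
    print(">{}\n{}".format(name, seq))
```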

↧

Is Makeblastdb Creat 3 Files Like Formatdb + Blast Error

July 20, 2011, 12:54 pm

Hi everyone,

I have a big problem!

I installed BLAST+ for Windows.

I ran this:

makeblastdb.exe -in filename.fasta -dbtype nucl -out filename_BLASTdb

Apparently it created the database, because I got this message:

a -dbtype nucl -out 020_GC.LD_ED_19July_BLASTdb


Building a new DB, current time: 07/20/2011 09:47:50
New DB name:   020_GC.LD_ED_19July_BLASTdb
New DB title:  020_GC.LD_ED_19July.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 809 sequences in 0.289397 seconds.

Right?

But no files were created, so when I try to run BLAST:

blastn.exe     -db filename_BLASTdb -word_size 7 -query mature.fasta -out filename.blastsmallRNA     -evalue 10

I get an error I cannot understand:

Error: (106.18)
NCBI C++ Exception:
    Error: (CArgException::eSynopsis) Too many positional arguments (1), the offending value: filename.fasta-db
    Error: (CArgException::eSynopsis) Application's initialization failed

Could you please help?

Thanks

↧

Where can I download the blast version that includes formatdb ?

November 5, 2014, 12:50 pm

I am trying to use a pipeline that uses the formatdb command of the old BLAST. This command has been replaced by makeblastdb in BLAST+. I could not find an archive from which to download the old version of BLAST. I am using CentOS 6.5.

Here is the program, if anybody is interested in looking at the source code:

https://github.com/chiulab/surpi/blob/master/plot_reads_to_gi.sh

↧

Blast output query

July 25, 2014, 11:36 pm

I ran a local standalone BLAST of pre-miRNAs against a genome with tabular output (-m 8) and got the results. My next steps for refining the results include GC content analysis, RepeatMasker, etc. I am currently developing a Perl program to extract the part of the genome sequence that matched any pre-miRNA, using the columns of the tabular output. My logic includes:

1) Matching the supercontig name with a specific sequence block name in the genome file.

2) Extracting the matched area in between the sequence match and end points mentioned in the tabular file.

In the tabular output, there is query sequence match start and end point as well as subject sequence start and end point. Which should I be using as a start and end point for sequence extraction? Query sequence or subject sequence?

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_node93.html

I have read published papers on miRNA where they mention a sliding window of 70 or 100 nucleotides on either side of the match area. I presume that these researchers extract 70 nucleotides before the start of the match area as well as 70 nucleotides after the end of the match area. Am I right in presuming this and should I be doing the same thing?

Please help
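
For extracting the matched region from the genome, the subject coordinates are the ones that index into the genome sequence, since the genome is the subject (database) in this search. A minimal sketch, assuming 1-based inclusive coordinates as in the tabular output and that a subject start greater than the subject end indicates a minus-strand hit; the default flank of 70 follows the figure mentioned in the post and the toy sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def extract_hit(contig, sstart, send, flank=70):
    """Return the subject region plus flanks, oriented like the query."""
    low, high = min(sstart, send), max(sstart, send)
    begin = max(low - 1 - flank, 0)       # convert to 0-based, clamp at 0
    end = min(high + flank, len(contig))  # slice end is exclusive
    region = contig[begin:end]
    if sstart > send:                     # minus-strand hit
        region = region.translate(COMPLEMENT)[::-1]
    return region

toy_contig = "AAAACCCCGGGGTTTT"
print(extract_hit(toy_contig, 5, 8, flank=2))
print(extract_hit(toy_contig, 8, 5, flank=2))
```

Clamping at the contig ends keeps the extraction valid when a hit lies closer to an end than the flank length.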

↧
