Every time I have to run a BLAST program, be it makeblastdb, tblastn, or blastn, I have to go back to the bin directory in the installation folder.
Is there any way I can link the programs so that I can run them from any location?
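The usual approach is to add the BLAST+ bin directory to the PATH environment variable (in a shell startup file on Linux/macOS, or in the system environment variables on Windows). As a minimal sketch of the same idea from inside a Python script, the function below prepends a directory to PATH for the current process so that tools such as blastn can be found by name. The install location used here is hypothetical.

```python
import os
import shutil


def add_to_path(bin_dir):
    """Prepend bin_dir to PATH for this process and any subprocess it starts."""
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if bin_dir not in parts:
        os.environ["PATH"] = os.pathsep.join([bin_dir] + parts)
    return os.environ["PATH"]


if __name__ == "__main__":
    # Hypothetical install location; replace with the real bin directory.
    add_to_path("/opt/ncbi-blast/bin")
    # shutil.which returns the full path if blastn is discoverable, else None.
    print(shutil.which("blastn"))
```

This only affects the running script and its child processes; a permanent change still has to be made in the shell or system settings.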
Hello,
I'm a student and I have to write a paper about bioinformatics. I've been working on it for a while and I'm almost done. I only have to write about BLAST, YASARA, and Huntington's disease as approached by bioinformatics research. I'm having some difficulties with those three chapters. If anyone would like to help me, please send me a message.
Thanks in advance.
I am planning to set up and maintain a local version of the NR and other NCBI databases for running in-house BLAST searches. I would also like my local version of the databases to stay in sync with NCBI through regular updates. NCBI suggests using the update_blastdb.pl script (http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl) to download the latest versions of all the pre-formatted databases. Does anyone have experience to share on using this script? Are there alternative solutions? I would appreciate everyone's feedback. Thanks, Anjan
I am interested in getting ungapped sequence hits (hsps_no_gap) from BLAST output. I tried using this:
result_handle = NCBIWWW.qblast("blastp", "nr", record.format("fasta"),expect=10,"hsps_no_gap")
blast_records=NCBIXML.read(result_handle)
However, there are still gaps in the sequences.
I would appreciate some hints on how to get ungapped sequences in FASTA format from BLAST output.
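If the goal is only to obtain the matched sequences without gap characters, one option is to strip the "-" symbols from each aligned HSP string after parsing (in Biopython's NCBIXML records the aligned strings are hsp.query and hsp.sbjct). A minimal sketch, assuming the aligned strings have already been pulled out of the parsed record; the hit names and sequences below are made up. Note that this only removes gap symbols from the output and does not make BLAST perform an ungapped search.

```python
def ungap(aligned_seq, gap_char="-"):
    """Return the aligned sequence with all gap characters removed."""
    return aligned_seq.replace(gap_char, "")


def to_fasta(name, aligned_seq, width=60):
    """Format one ungapped sequence as a FASTA record with wrapped lines."""
    seq = ungap(aligned_seq)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example: (hit title, aligned subject string containing gaps).
    hits = [("hit_1", "MKV--LTAAG"), ("hit_2", "MK-ILTA-AG")]
    for name, aligned in hits:
        print(to_fasta(name, aligned), end="")
```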
Hi,
I am running a BLASTN of about 150 sequences against a genome that is 2.2 gigabases long. A few of my queries are full-length BAC sequences of around 150,000 bases. I expect to find huge, contiguous hits for some BACs in the genome. Here's the command I use:
blastn 80BACs.fasta -db mygenome -out 80BACsBLAST -outfmt 10 -num_threads 8 -evalue 10e-3 -index_name mygenomeMBI
Around 10 minutes after it starts running, the program halts with a segmentation fault. I ran 'ulimit -s unlimited' to set the stack size to unlimited, but to no avail. I also reduced the number of threads in subsequent trials, setting num_threads to 5 and then 2, but that didn't help either.
I am using the binaries from rmblast-1.2-ncbi-blast-2.2.23+. I had earlier run a smaller query dataset against the same genome, which worked fine; that BLAST completed in half a day. I am convinced this issue is due to some very long query sequences. I'd highly appreciate any help in this regard!
Thanks,
Srihari
Hi everyone,
While aligning two protein sequences, each amino acid of the query sequence is aligned with an amino acid of the database sequence. If both match, a score is given. If there is a substitution of one amino acid for another (e.g. leucine for isoleucine), how is the score for this alignment assigned?
Or, more simply, can anyone explain how a substitution matrix is used to compare two sequences and align them according to its scores?
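As a rough illustration of how a substitution matrix is used: each aligned pair of residues is looked up in the matrix and the values are summed. The sketch below hard-codes a handful of entries with their BLOSUM62 values (identical L/L and I/I pairs score 4, while the conservative leucine/isoleucine substitution scores 2). It only scores an ungapped alignment of two equal-length strings; it is not BLAST's actual implementation, which also handles gap penalties and statistics.

```python
# A few BLOSUM62 entries, keyed by unordered residue pair (the matrix is symmetric).
BLOSUM62_SUBSET = {
    frozenset("L"): 4, frozenset("I"): 4, frozenset("V"): 4, frozenset("D"): 6,
    frozenset("LI"): 2, frozenset("IV"): 3, frozenset("LV"): 1,
    frozenset("LD"): -4, frozenset("ID"): -3, frozenset("VD"): -3,
}


def pair_score(a, b):
    """Look up the substitution score for one aligned residue pair."""
    return BLOSUM62_SUBSET[frozenset((a, b))]


def alignment_score(query, subject):
    """Sum the pair scores over an ungapped alignment of equal-length strings."""
    if len(query) != len(subject):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(query, subject))


if __name__ == "__main__":
    print(pair_score("L", "L"))             # identical residues
    print(pair_score("L", "I"))             # conservative substitution
    print(alignment_score("LIVD", "IIVD"))  # sum over all four columns
```

Similar residues get small positive scores and dissimilar ones get negative scores, so an alignment with a leucine/isoleucine substitution still scores well overall.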
Bioinformatics noob here,
I'm working with a firefly database. I have downloaded the BLAST executables and made a local database with the makeblastdb command. To test the database, I ran blastn with a luciferase gene as the query, which I know is in the database because I have BLASTed it before in a different program, and I get **No hits found**. I have tried this with several different luciferase sequences, and each time I get **No hits found**.
Does anyone have any idea as to why this is happening? Do I have to change any search parameters?
Hello everyone,
I have short nucleotide sequences. For further analysis, I ran them through the blastn program at NCBI with default parameters against ESTs of the desired organism, and I got multiple hits. How can I filter them, and which parameter should I consider for filtering? Can I simply take the EST with the minimum E-value?
Thanks! Jyoti
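One common way to filter is to keep, for each query, the hit with the lowest E-value (breaking ties by the higher bit score) and to discard anything above a cutoff. A minimal sketch, assuming the results were saved as a tab-separated hit table in the standard 12-column layout; the cutoff value and the example rows are made up for illustration.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query_id: (subject_id, evalue, bitscore)}, keeping the best hit per query."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        current = best.get(query)
        # Lower E-value wins; on a tie, the higher bit score wins.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    rows = [
        "q1\tEST_A\t98.0\t50\t1\t0\t1\t50\t10\t59\t1e-20\t95.0",
        "q1\tEST_B\t90.0\t50\t5\t0\t1\t50\t200\t249\t1e-10\t70.0",
        "q2\tEST_C\t85.0\t40\t6\t0\t1\t40\t5\t44\t0.5\t30.0",
    ]
    print(best_hits(rows))
```

Percent identity and alignment length (columns 3 and 4) are other fields often used as additional filters, depending on what the downstream analysis needs.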
Is there a way to get a FASTA file containing all the records in a local BLAST database?
I've created a local database using makeblastdb with a FASTA file, but was wondering if there is any way to reverse this process.
Thanks!
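BLAST+ includes a blastdbcmd tool that can write every entry of a database back out as FASTA using -entry all together with -outfmt %f. A minimal sketch that builds and runs that call from Python, assuming blastdbcmd is on PATH; the database and output names are hypothetical, and the definition lines in the dump may not be byte-for-byte identical to the original FASTA file.

```python
import shutil
import subprocess


def build_dump_command(db_name, fasta_out):
    """Build the blastdbcmd call that writes every database entry as FASTA."""
    return ["blastdbcmd", "-db", db_name, "-entry", "all",
            "-outfmt", "%f", "-out", fasta_out]


def dump_database(db_name, fasta_out):
    """Run blastdbcmd if it is on PATH; return True when the command was run."""
    if shutil.which("blastdbcmd") is None:
        return False
    subprocess.run(build_dump_command(db_name, fasta_out), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical names; use the -out name that was given to makeblastdb.
    print(" ".join(build_dump_command("mydb", "mydb_dump.fasta")))
```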
Hi all,
I am a beginner with BLAST+, and I am using Windows. My aim for now is to download the nr protein sequences in FASTA format and format them using makeblastdb, then extract the first 1000 characters from the nr file as a separate file (say qa.fasta) and query it against the whole database.
I downloaded the nr database in FASTA format from this link:
ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz (Are these the original FASTA files?)
Then I used the makeblastdb command like this:
makeblastdb -in nr -dbtype prot -out outnr
This resulted in the nr file being split into different parts, nr.00 to nr.03. (Is this normal?)
Now I need help extracting the first 1000 characters from the nr file. How do I open a FASTA file in Windows, and how do I proceed?
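A file as large as nr is usually too big to open in a text editor, so one option is to read it line by line with a script. Cutting at exactly 1000 characters could split a record in the middle, so the sketch below copies the first few complete FASTA records instead. It assumes the file has already been decompressed; the function name and file names are hypothetical.

```python
def head_fasta(in_path, out_path, n_records=3):
    """Copy the first n_records FASTA records from in_path to out_path.

    The input is read one line at a time, so the whole file is never
    loaded into memory. Returns the number of records copied.
    """
    seen = 0
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                seen += 1
                if seen > n_records:
                    break
            if seen:  # ignore anything before the first header line
                dst.write(line)
    return min(seen, n_records)


if __name__ == "__main__":
    import os
    if os.path.exists("nr"):
        print(head_fasta("nr", "qa.fasta", n_records=3))
```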
Hi,
I have a simple question. Why does the XML output format in blastn take a lot longer than the tabular output?
There is a huge difference between the two output formats, using the same data and database.
Thanks a lot.
Hello!!
I am having an extremely difficult time with formatdb and fastacmd from NCBI BLAST.
I am trying to make a database from a single mouse chromosome from NCBI using formatdb. Afterwards, I try to obtain the sequence at defined positions from the database. I use the commands:
formatdb -p F -i 12.fa -o T -t C57Bl6_chr12 -n C57Bl6_chr12
fastacmd -d C57Bl6_chr12 -p T -L 1000000,10001000 -o test.txt
[fastacmd] ERROR: ERROR: Cannot initialize readdb for C57Bl6_chr12 database
It is very annoying that formatdb is not making the readdb for my file. Do you know what I can do about it?
Thanks!!
I started running BLAST locally on my machine on 4 files with 1323, 210, 501, and 166 FASTA sequences, respectively.
For all jobs except the first one, BLAST returned an error:
blastall(7004) malloc: *** mmap(size=2097152) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
The first job is currently still running as I'm writing this.
The exact blastall command used is as follows:
blastall -p blastn -d /Users/anjan/blastdb/nt -i <input.filename.fasta> -o RPMBBP10.fasta.blast.out -m 8 -b 10000
The version of BLAST is 2.2.17. I'm running this on a Mac server with two quad-core Intel processors and 16 GB of RAM. Does anyone have any idea what the source of the problem is?
Hello Biostars,
I'd like to preface this post by saying that I am new to R and bioinformatics coding, and that I'd really appreciate some input from this knowledgeable community. My goal for the code posted below is to generate pie charts that show amino acid abundance per protein from BLAST results. I uploaded a CSV file from UniProt, converted it to a matrix, and wrote the code below. I keep getting the error: "In AAs[i] = table(strsplit(BLAST_AA_seqs[i], "", useBytes = TRUE)) : number of items to replace is not a multiple of replacement length". Column 8 is the output column that contains the amino acid sequences. Thanks in advance!
mydata=read.table("C:/Users/du0/Desktop/Downloads/CDPKbeta_BLAST_results.csv", header=TRUE,sep=",")
mydata=as.matrix(mydata)
AAs=c()
BLAST_AA_seqs=c()
for(i in 1:nrow(mydata)){
  print(i)
  BLAST_AA_seqs[i]=mydata[i,8]
  AAs[i]=table(strsplit(BLAST_AA_seqs[i],"", useBytes=TRUE))
  pie(AAs, col=rainbow(length(AAs)), main="Residue abundance")
}
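The warning arises because table() returns a whole vector of counts (one per residue), which cannot be stored in the single element AAs[i]. The underlying idea is to count residues for one sequence at a time and keep each result as its own object. That idea is sketched here in Python rather than R, with made-up sequences:

```python
from collections import Counter


def residue_counts(sequence):
    """Count how often each residue letter occurs in one sequence."""
    return Counter(sequence.upper())


def residue_fractions(sequence):
    """Return {residue: fraction of the sequence}, the values a pie chart would show."""
    counts = residue_counts(sequence)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


if __name__ == "__main__":
    # Made-up sequences standing in for column 8 of the results table.
    for seq in ["MKKLAV", "AAGG"]:
        print(seq, residue_fractions(seq))
```

In R the equivalent change would be to hold each sequence's table in its own variable (or a list element) and pass that to pie() inside the loop.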
'''from Bio.Blast.Applications import NcbiblastxCommandline
help(NcbiblastxCommandline)'''
from Bio.Blast.Applications import NcbiblastpCommandline
from StringIO import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
from Bio.Blast import NCBIWWW
import cStringIO
def BlastSeq():
    SC_Fasta = open("sc.fsa","r")
    HS_Fasta = open("hsap.fsa","r")
    blastp = "C:\\Program Files\\NCBI\\blast-2.2.29+\\bin\\blastp"
    record1 = list(SeqIO.parse(SC_Fasta,"fasta"))
    for r1 in record1:
        r1.id
        r1.seq
    record2 = list(SeqIO.parse(HS_Fasta,"fasta"))
    for r2 in record2:
        r2.id
        r2.seq
    for r1 in record1:
        for r2 in record2:
            output = NcbiblastpCommandline(blastp,query= r1.seq, subject=r2.seq, outfmt=5)()[0]
            blast_result_record = NCBIXML.read(StringIO(output))
def main():
    BlastSeq()
main()
Error: Bio.Application.ApplicationError: Command 'C:\Program Files\NCBI\blast-2.2.29+\bin\blastp -outfmt 5 -query MVKLTSIAAGVAAIAATASATTTLAQS ...
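The -query and -subject options of blastp expect file names rather than raw sequences, which would explain why the sequence itself shows up in the failing command line. A minimal sketch of one workaround, assuming each sequence is first written to its own temporary FASTA file; the helper names are hypothetical, and plain strings stand in for the SeqRecord objects.

```python
import os
import tempfile


def write_fasta(seq_id, sequence, directory):
    """Write one sequence to its own FASTA file and return the file path."""
    path = os.path.join(directory, seq_id + ".fasta")
    with open(path, "w") as handle:
        handle.write(">" + seq_id + "\n" + str(sequence) + "\n")
    return path


def blastp_pair_command(query_path, subject_path, blastp="blastp"):
    """Build a blastp call comparing two FASTA files, with XML output (-outfmt 5)."""
    return [blastp, "-query", query_path, "-subject", subject_path, "-outfmt", "5"]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    # Made-up sequences; in the real loop these would come from r1 and r2.
    q = write_fasta("query1", "MKVLTAAG", workdir)
    s = write_fasta("subject1", "MKILTAAG", workdir)
    # Pass this list to subprocess.run(...) on a machine where blastp is installed.
    print(blastp_pair_command(q, s))
```

Passing the command as a list also sidesteps quoting problems with the space in "C:\Program Files".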
Hello there,
Could someone please tell me how I can use the multiple alignment produced by ClustalW in a BLASTn search against the nucleotide (nr/nt) collection?
Suppose I have this ClustalW output:
CLUSTAL 2.1 multiple sequence alignment

q_[1938_1979].1      ACCTTGAAGCAAGAAAGGGGAAGTGGAGACAAAACTGATTAA 42
t_[1512_1553].2      ACCAAAAAAGAAGAAGGAGGAGATGGAGAAAAAAAAGACAAA 42
                     ***   **  ***** * ***  ****** ****  **  **
Now I want to use this in a BLASTn search against the nucleotide (nr/nt) collection.
How can I do that?
Hi everyone,
I have a big problem!
I installed BLAST+ for Windows.
I ran this:
makeblastdb.exe -in filename.fasta -dbtype nucl -out filename_BLASTdb
It apparently creates the database, because I get this message:
a -dbtype nucl -out 020_GC.LD_ED_19July_BLASTdb
Building a new DB, current time: 07/20/2011 09:47:50
New DB name: 020_GC.LD_ED_19July_BLASTdb
New DB title: 020_GC.LD_ED_19July.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 809 sequences in 0.289397 seconds.
Right?
But no file is created, so when I try to BLAST
blastn.exe -db filename_BLASTdb -word_size 7 -query mature.fasta -out filename.blastsmallRNA -evalue 10
there is an error I cannot understand:
Error: (106.18)
NCBI C++ Exception:
Error: (CArgException::eSynopsis) Too many positional arguments (1), the offending value: filename.fasta-db
Error: (CArgException::eSynopsis) Application's initialization failed
Could you please help?
Thanks
I am trying to use a pipeline that uses the formatdb command from the old BLAST. This command has been replaced by makeblastdb in newer versions of BLAST+. I could not find an archive from which to download the old version of BLAST. I am using CentOS 6.5.
Here is the program, if anybody is interested in looking at the source code:
https://github.com/chiulab/surpi/blob/master/plot_reads_to_gi.sh
I ran a local standalone BLAST of pre-miRNAs against a genome with tabular output (-m 8) and got the results. My next steps for refining the results include GC content analysis, RepeatMasker, etc. I am currently developing a Perl program to extract the part of the sequence that matched any pre-miRNA, using the columns of the tabular output. My logic includes:
1) Matching the supercontig name with a specific sequence block name in the genome file.
2) Extracting the matched area between the match start and end points given in the tabular file.
In the tabular output, there are query start and end points as well as subject start and end points. Which should I use as the start and end points for sequence extraction: the query coordinates or the subject coordinates?
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_node93.html
I have read published papers on miRNA where they mention a sliding window of 70 or 100 nucleotides on either side of the match area. I presume that these researchers extract 70 nucleotides before the start of the match area as well as 70 nucleotides after the end of the match area. Am I right in presuming this and should I be doing the same thing?
Please help
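When the genome is the BLAST database, the subject coordinates are the ones that index into the genome sequence; the query coordinates refer to positions within the pre-miRNA. A minimal sketch of the extraction step, written in Python rather than Perl, assuming 1-based inclusive coordinates as in BLAST tabular output and a supercontig already loaded as a string. The flank size is a parameter (70 is used below only because the question mentions it), and minus-strand hits, where the subject start is larger than the subject end, are reverse-complemented.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_hit(subject_seq, sstart, send, flank=0):
    """Return the matched subject region plus `flank` bases on each side.

    sstart and send are 1-based inclusive, as in BLAST tabular output.
    If sstart > send the hit is on the minus strand and the region is
    returned reverse-complemented.
    """
    lo, hi = min(sstart, send), max(sstart, send)
    begin = max(lo - 1 - flank, 0)            # clamp at the start of the contig
    end = min(hi + flank, len(subject_seq))   # clamp at the end of the contig
    region = subject_seq[begin:end]
    if sstart > send:
        region = region.translate(COMPLEMENT)[::-1]
    return region


if __name__ == "__main__":
    contig = "AAACCCGGGTTT"  # made-up supercontig sequence
    print(extract_hit(contig, 4, 6))            # plus-strand hit, no flank
    print(extract_hit(contig, 12, 9))           # minus-strand hit
    print(extract_hit(contig, 4, 6, flank=70))  # flank clamped to the contig ends
```

Whether a 70 or 100 nucleotide flank is appropriate depends on the published method being followed, so the value is left to the caller.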