Channel: Post Feed
Viewing all 41826 articles
Browse latest View live

Blast output query

I ran a local standalone BLAST of pre-miRNAs against a genome, with tabular output (-m 8), and got the results. My next steps for refining the results include GC content analysis, RepeatMasker, etc. I am currently writing a Perl program to extract the part of the genome sequence that matched any pre-miRNA, using the columns of the tabular output. My logic is:

1) Match the supercontig name with the corresponding sequence block name in the genome file.

2) Extract the matched region between the match start and end points given in the tabular file.

The tabular output gives the start and end points of the match on the query sequence as well as on the subject sequence. Which should I use as the start and end points for sequence extraction: the query coordinates or the subject coordinates?
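For illustration, here is a minimal Python sketch of the two steps above (Python rather than Perl, only to keep the example short). It assumes the genome was the BLAST database, so the subject columns are the ones that carry genome coordinates, and that the genome has already been loaded into a dictionary. The names `parse_hit`, `extract_subject_region` and `genome` are hypothetical, not part of BLAST.

```python
# Column order of BLAST tabular output (-m 8 in legacy blastall).
FIELDS = ["query", "subject", "pct_identity", "aln_length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score"]

# Translation table used to complement DNA before reversing it.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_hit(line):
    """Split one tabular line into a dict with integer coordinates."""
    hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    for key in ("q_start", "q_end", "s_start", "s_end"):
        hit[key] = int(hit[key])
    return hit


def extract_subject_region(hit, genome):
    """Return the matched stretch of the subject sequence named in the hit.

    `genome` is a hypothetical dict {sequence_name: sequence_string}.
    Coordinates are 1-based and inclusive. s_start > s_end means the match
    is on the minus strand, so the slice is reverse-complemented.
    """
    seq = genome[hit["subject"]]
    lo, hi = sorted((hit["s_start"], hit["s_end"]))
    region = seq[lo - 1:hi]
    if hit["s_start"] > hit["s_end"]:
        region = region.translate(_COMPLEMENT)[::-1]
    return region
```

If the genome had been used as the query instead, the same slicing would apply to the q_start and q_end columns.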

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_node93.html

I have read published papers on miRNA where they mention a sliding window of 70 or 100 nucleotides on either side of the match area. I presume that these researchers extract 70 nucleotides before the start of the match area as well as 70 nucleotides after the end of the match area. Am I right in presuming this and should I be doing the same thing?
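Under the reading described above (a fixed number of nucleotides added on each side of the match), the window arithmetic could be sketched as follows. This only illustrates that interpretation, it does not confirm what the published pipelines do; `flanked_window` is a hypothetical helper and the default of 70 is taken from the figure quoted in the question.

```python
def flanked_window(s_start, s_end, seq_length, flank=70):
    """Return 1-based inclusive (start, end) of the match plus flanks.

    The window is clamped so it never runs off either end of the
    subject sequence. s_start may be larger than s_end (minus strand).
    """
    lo, hi = sorted((s_start, s_end))
    return max(1, lo - flank), min(seq_length, hi + flank)
```

For example, a match at 100-160 on a 1000 nt supercontig gives the window 30-230, while a match near a contig end is simply truncated at the contig boundary.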

Please help


BLAST options: max_target_seqs and num_alignments

Hi everyone,

Could anyone explain in a bit more detail the difference between the options max_target_seqs and num_alignments? If I only want the top hit, how should I set this in BLAST?

Thank you very much!
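Whatever the two options turn out to do, one way to get a single top hit per query is to post-filter tabular output. The sketch below assumes 12-column tabular output (-outfmt 6), where column 1 is the query ID and column 12 is the bit score; `best_hits` is a hypothetical helper, not a BLAST feature.

```python
def best_hits(tabular_lines):
    """Keep, for each query, the tabular row with the highest bit score."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        query, bit_score = cols[0], float(cols[11])
        # Strict ">" keeps the first row seen when scores tie.
        if query not in best or bit_score > best[query][1]:
            best[query] = (cols, bit_score)
    return {query: cols for query, (cols, _) in best.items()}
```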

Problems with BLAST db, maybe install issue?

Hey all, I'm having a bit of trouble with BLAST. I've run the same BLAST commands [literally] thousands of times. I recently had a bad drive and reformatted my Ubuntu server, installing the 14 server version (where previously I'd had 12). Whether I run blastp or blastn, and whatever files I try, I get a similar error message (below). I made the db with makeblastdb for all of the files in question before running BLAST, so those db files do exist. The blastn message says a bit more, though I'm not sure what to do about either message since the db files do exist. My best guess is an incompatibility with Ubuntu 14? Has anyone else had this issue and fixed it, or does anyone know what the problem is?

BLASTP:

    BLAST Database error: No alias or index file found for protein database [filename.faa.db] in search path [/home/username/pathtofiles/::]

BLASTN:

    BLAST Database error: No alias or index file found for nucleotide database [filename.fna.db] in search path [(unreachable)/path/startingafterhome/::]
    BLAST Database error: CSeqDBAtlas::MapMmap: While mapping file [/home/username/path/startingafterhome/filename.fna.db.nin] with 0 bytes allocated, caught exception: NCBI C++ Exception:     T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_80348_130.14.18.6_9008__PrepareRelease_Linux64-Centos_1433254587/c++/compilers/unix/../../src/corelib/ncbifile.cpp", line 5416: Error: ncbi:: ...

each protein with each protein

Dear colleagues,

I have a lot of proteins. What is the simplest way to compare each one of them with all the other proteins? I probably have to make a database and use a program like BLASTP, but how can I make sure that I've checked all the protein pairs I have? I feel that I need a "for" loop in Linux, but I haven't managed to complete the loop expression.

Please help me!

Many thanks!

N.
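A minimal sketch of the all-against-all idea, assuming all proteins sit in one FASTA file and that tabular output (-outfmt 6) is requested: the same file serves as both database and query, so no explicit loop over pairs is needed. The file names and the helpers `all_vs_all_commands` and `missing_pairs` are hypothetical. Note that a pair missing from the output means BLAST reported no alignment for it under the chosen thresholds, not that the pair was never compared.

```python
from itertools import combinations


def all_vs_all_commands(fasta, db_name="all_proteins", out="all_vs_all.tsv"):
    """Build (not run) the two BLAST+ command lines as argument lists."""
    make_db = ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name]
    search = ["blastp", "-query", fasta, "-db", db_name,
              "-outfmt", "6", "-out", out]
    return make_db, search


def missing_pairs(protein_ids, tabular_lines):
    """List unordered protein pairs that have no row in the tabular output."""
    seen = set()
    for line in tabular_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2:
            seen.add(frozenset(cols[:2]))  # A-B and B-A count as one pair
    return [pair for pair in combinations(sorted(protein_ids), 2)
            if frozenset(pair) not in seen]
```

The argument lists can be passed to `subprocess.run(...)` once BLAST+ is installed and on the PATH.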

How to get subset from NCBI databases using Accession/GI

Is there a (nice) way to extract entries from NCBI databases using an accession or GI number?

Or using the identifier I get from a local blastn, like "gi|46392154|gb|AY580535.1|"?

Basically, I want to get a subset of an NCBI db using a table of identifiers...
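One possible route, sketched under assumptions: pull the GI and accession out of the pipe-delimited identifier, write the accessions to a text file, and hand that file to `blastdbcmd` with `-entry_batch`. This presumes a local copy of the database that was formatted with sequence IDs indexed. The helper names and file names below are hypothetical.

```python
def parse_ncbi_id(identifier):
    """Return (gi, accession) from an ID like 'gi|46392154|gb|AY580535.1|'.

    A bare accession with no '|' is returned as (None, accession).
    """
    identifier = identifier.strip()
    if "|" not in identifier:
        return None, identifier or None
    fields = identifier.split("|")
    # Fields alternate between a tag (gi, gb, ref, ...) and its value.
    pairs = dict(zip(fields[0::2], fields[1::2]))
    gi = pairs.pop("gi", None)
    accession = next(iter(pairs.values()), None)
    return gi, accession


def blastdbcmd_batch_command(db, id_file, out_fasta):
    """Build (not run) a blastdbcmd call that fetches every ID in id_file."""
    return ["blastdbcmd", "-db", db, "-entry_batch", id_file, "-out", out_fasta]
```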

How to makeblastdb Uniprot's Taxonomic Divisions?

Hi,

I'm interested in creating a Blast database (makeblastdb) from Uniprot's Bacteria division.

I had to turn to the database release on FTP because downloading the website's query results was extremely slow.

So I went to Uniprot's downloads page:

http://www.uniprot.org/downloads

Then clicked 'Taxonomic divisions' (or in my case the FTP mirror that is closer to my country):

ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/

And downloaded these two files:

uniprot_sprot_bacteria.dat.gz

uniprot_trembl_bacteria.dat.gz

When I unzipped them, I realized these were not FASTA files.

I can't figure out what format they are in or how to create a BLAST database from them.

Any help would be greatly appreciated.
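The .dat files are in the UniProtKB flat-file (text) format: each entry has an ID line, one or more AC lines, an SQ line followed by indented sequence lines, and a closing "//" line. A minimal standard-library sketch of turning such records into FASTA could look like this; the function names and the choice of FASTA header are assumptions for illustration.

```python
def iter_uniprot_records(lines):
    """Yield (entry_name, primary_accession, sequence) per flat-file entry."""
    entry_name = accession = None
    seq_parts = []
    in_sequence = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("ID   "):
            entry_name = line.split()[1]
        elif line.startswith("AC   ") and accession is None:
            # The first accession on the first AC line is the primary one.
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("SQ   "):
            in_sequence = True
        elif line.startswith("//"):
            yield entry_name, accession, "".join(seq_parts)
            entry_name = accession = None
            seq_parts = []
            in_sequence = False
        elif in_sequence:
            seq_parts.append(line.replace(" ", ""))


def to_fasta(records, width=60):
    """Format records as FASTA text with '>accession|entry_name' headers."""
    chunks = []
    for entry_name, accession, seq in records:
        chunks.append(f">{accession}|{entry_name}")
        chunks.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(chunks) + "\n"
```

The compressed file can be read line by line with `gzip.open(path, "rt")`, and the resulting FASTA file can then be given to `makeblastdb -in ... -dbtype prot`. Biopython's `SeqIO.parse(handle, "swiss")` reads the same format if a library is preferred.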

Blast in metagenomics: should I download all NR database for this purpose?

Very basic question. I want to do a taxonomic analysis of Ion Torrent metagenomic data using MEGAN. I know that I must first BLAST the reads against a database (usually the NR database from GenBank). I presume this should be done locally so that the data are processed faster, which means I need to download the whole NR database from GenBank. Is that correct?

how to make BLAST output maximum length of alignment

Hi everyone,

I have a very simple question about how to operate BLAST. I have BLASTed a sequence against its reference genome, and the dot plot shows two inversions in certain regions:

![enter image description here][1]

Is it possible to ask BLAST to produce results that show the breakpoints when aligning the sequences? In this case, I want the output to be:

    strand plus/plus
    Query   1 ----------- 900,000
    Subject 1 ----------- 900,000

    strand plus/minus
    Query   900,001 ----------- 1,250,000
    Subject 900,001 ----------- 1,250,000

    strand plus/plus
    Query   1,250,001 ----------- 3,400,000
    Subject 1,250,001 ----------- 3,400,000

    strand plus/minus
    Query   3,400,001 ----------- 4,200,000
    Subject 3,400,001 ----------- 4,200,000

    strand plus/plus
    Query   4,200,000 ----------- 4,411,532
    Subject 3,400,001 ----------- 4,411,532

I don't know if this is possible with local alignment, but I imagine that if a dot plot like this can be created, it should be possible. So it would still be a local alignment, but with each alignment extended as far as the breakpoint. I don't care much about sequence identity, because I want to know where the breakpoints occur in the query sequence. I'd prefer a command-line solution if one is available. Thank you

[1]: https://image.ibb.co/dmrAPQ/hit_matrix.png
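As a rough sketch of how breakpoints might be approximated from ordinary local hits: sort the hits by query start and merge consecutive hits that share a strand, so that each change of strand marks a candidate breakpoint. It assumes blastn tabular coordinates, where a minus-strand hit is reported with subject start greater than subject end, and it does no filtering of short or repetitive hits, which real data would need. `strand_blocks` is a hypothetical helper.

```python
def strand_blocks(hits):
    """Merge consecutive same-strand hits into blocks.

    `hits` is an iterable of (q_start, q_end, s_start, s_end) tuples.
    Returns a list of dicts ordered along the query.
    """
    blocks = []
    for q_start, q_end, s_start, s_end in sorted(hits):
        strand = "plus/plus" if s_start <= s_end else "plus/minus"
        if blocks and blocks[-1]["strand"] == strand:
            # Same strand as the previous hit: extend the current block.
            block = blocks[-1]
            block["q_end"] = max(block["q_end"], q_end)
            block["s_min"] = min(block["s_min"], s_start, s_end)
            block["s_max"] = max(block["s_max"], s_start, s_end)
        else:
            # Strand changed: start a new block (candidate breakpoint).
            blocks.append({"strand": strand,
                           "q_start": q_start, "q_end": q_end,
                           "s_min": min(s_start, s_end),
                           "s_max": max(s_start, s_end)})
    return blocks
```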

local blast: makeblastdb: Abort trap 6

I am running standalone BLAST 2.6.0 on Mac OS 10.12.2 (1.6 GHz Intel Core i5, 8 GB 1600 MHz DDR3). Every time I try to build a BLAST database with the makeblastdb command, I get the following error:

    Building a new DB, current time: 01/24/2017 09:13:26
    New DB name: /Users/***/bin/blast/blastdb/Cat2.fasta
    New DB title: Cat2.fasta
    Sequence type: Nucleotide
    Keep MBits: T
    Maximum file size: 1000000000B
    Abort trap: 6

What can I do? My file is 3.4 Mb. I appreciate your help!

Base by base alignment from BLAST to reference

Hi All,

I have aligned a 5k sequence to a reference sequence longer than 5k. The BLAST alignment report does not start at the first position of the reference but at the base where the alignment starts. I want an alignment report in which the parts of the reference that do not match the consensus are also shown. For example (desired output):

    Query     ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ AGCTTGGCCC ATTGCATACG TTGTATCCAT
    Reference AAACGAGCTC TGCTTTTATA GGCGCCCACC AGCTTGGCCC ATTGCATACG TTGTATCCAT

    Query     ~~~~~~~~~~ ~~~~~~~~~
    Reference AATTAAGCGC GCAACGATG

BLAST only reports the bases that are aligned. Example:

    Query     AGCTTGGCCC ATTGCATACG TTGTATCCAT
    Reference AGCTTGGCCC ATTGCATACG TTGTATCCAT

I can only use **BLAST** to do this. How can I obtain such an alignment report using BLAST? Thanks in advance!!!

OrthoFinder running time

Hello, I am running OrthoFinder on 34 species, averaging 20,000-30,000 proteins each, except for 4 of them, which have ~60,000 genes. The BLAST all-v-all step took 11 days to finish, and it is now on the "Running OrthoFinder algorithm" step. However, I have no idea how long this step will take. Has anyone run OrthoFinder on a data set this large and can give me an estimate? I am running it on 7 nodes, each with 128 GB RAM and 20 cores.

PSI-BLAST output is empty

Hi guys, I'm new to bio and I'm trying to make a PSSM matrix with PSI-BLAST. This is the command I use:

    blastdb$ ../ncbi-blast-2.9.0+/bin/psiblast -query /home/sepehr/blastdb/Butyrylation/Butyrylation/P02294.seq -db nr -out $P02294.out -num_iterations 3 -out_ascii_pssm $Pxxx.pssm -inclusion_ethresh 0.001 -num_threads 4;

I get this warning:

    Warning: [psiblast] Query_1 P02294: Composition-based score adjustment conditioned on sequence properties and unconditional composition-based score adjustment is not supported with PSSMs, resetting to default value of standard composition-based statistics

I have changed the mode of all the BLAST programs (psiblast, blastp ... everything in the bin folder) to 777, but the run just creates an empty .out file and no .pssm matrix. Can anyone tell me how to fix it? I really appreciate any help...

reciprocal BLAST meaning

I am Anuj, a PhD student from the University of Oklahoma. I was trying to replicate your PNAS paper titled "Major diversification of voltage-gated K+ channels occurred in ancestral parahoxozoans". I wanted to ask a couple of questions about it, and maybe you could point me to resources or other people. My question is about the first paragraph of the methods section, which is:

"Cnidarian and Mnemiopsis Shaker and KCNQ family K+ channel genes described in this study were identified and compiled through comprehensive BLAST (46) searches of genome drafts, transcriptomes, and gene predictions of Mnemiopsis leidyi (41) and Nematostella vectensis (73), and for KCNQ only, Hydra magnipapillata (66), and Acropora digitifera (74) and Orbicella faveolata. Multiple bilaterian members of each channel type were used as query sequences, and reciprocal BLAST searches of identified sequences against bilaterian databases were used to classify the sequences before phylogenetic analysis. Most queries identified all voltage-gated K+, Na+, and Ca2+ channels, but reciprocal searches sorted target sequences by gene family and were used to refine gene predictions when necessary."

I have found the wording of this paragraph really confusing. Was protein prediction first performed on the Mnemiopsis, Nematostella, Hydra, Acropora, and Orbicella transcriptomes, and then the Shaker and KCNQ genes used as queries? Or were the identified proteins from the transcriptomes used as queries for the Shaker and KCNQ gen ...

Trying To Extract Blast Results Into Xml Outfile From Larger Blast Xml File

This is probably a fairly basic question, so I apologize in advance, but I can't seem to figure out how to output XML format using Biopython. Basically, I have a fairly large BLAST results file in XML format and I'm trying to extract a portion of that file using a list of specific queries I am interested in. I can find the queries in the larger file, but I can't seem to output them in XML format. Here is the script I am currently using:

    #!/usr/bin/env python
    import sys
    import os
    import sets
    import Bio
    from sets import Set
    from Bio.Blast import NCBIXML

    # Usage.
    if len(sys.argv) < 2:
        print ""
        print "This program extracts blast results from an xml file given a list of query sequences"
        print "Usage: %s -list file1 -xml file2 -out file3"
        print "-list: list of sequence names"
        print "-xml: fasta file"
        print "-out: outfile name"
        print ""
        sys.exit()

    # Parse args.
    for i in range(len(sys.argv)):
        if sys.argv[i] == "-list":
            infile1 = sys.argv[i+1]
        elif sys.argv[i] == "-xml":
            infile2 = sys.argv[i+1]
        elif sys.argv[i] == "-out":
            outfile = sys.argv[i+1]

    fls = [infile1,infile2,outfile]
    results_handle = open(fls[1], "r")
    fin1 = open(fls[0],"r")
    save_file = open(fls[2], "w")
    geneContigs = Set([])
    results_list = list()
    blast_records = NCBIXML.parse(results_handle)
    for line in fin1:
        temp=line.lstrip('>').split()
        geneContigs.add(temp[0])
    f ...
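An alternative that avoids re-serialising parsed records is to treat the BLAST XML as a plain XML tree and delete the `<Iteration>` elements that are not wanted. The sketch below uses Python 3 and the standard library only, and it is a different technique from the Biopython approach in the script above. It assumes the classic BLAST XML layout (`BlastOutput` > `BlastOutput_iterations` > `Iteration`, with the query name as the first word of `Iteration_query-def`); `filter_blast_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def filter_blast_xml(xml_text, wanted_queries):
    """Return BLAST XML text containing only the wanted query iterations."""
    root = ET.fromstring(xml_text)
    container = root.find("BlastOutput_iterations")
    if container is None:
        return ET.tostring(root, encoding="unicode")
    # Iterate over a copy, because elements are removed during the loop.
    for iteration in list(container):
        query_def = iteration.findtext("Iteration_query-def", default="")
        tokens = query_def.split()
        name = tokens[0] if tokens else ""
        if name not in wanted_queries:
            container.remove(iteration)
    return ET.tostring(root, encoding="unicode")
```

Serialising with ElementTree drops the original XML declaration and DOCTYPE line, so they would need to be written back by hand if a downstream tool requires them. For very large files, `ET.iterparse` would be a more memory-friendly starting point.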

Blast Settings For Short Sequences

I'm searching for short sequences in nt. By short, I mean 10-20 bases. When I run blastn, I get no results, regardless of my -evalue settings. Here's the test sequence:

    >ponzr
    CGCGGTAAAACACATTTG

And I run BLAST as follows:

    ./blastn -db nt -remote -query test2.seq -task "megablast" -out test2.out

With the default evalue settings, I should get a lot of hits, but I get none. I've verified this on the NCBI web service, but it informs me that adjustments have been made to the parameters to accommodate my short search. I want to make these adjustments from the command line, but I'm not sure where to start. Here's my output:

    BLASTN 2.2.26+

    Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.

    Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
    16,169,102 sequences; 41,381,280,968 total letters

    Query= ponzr1
    Length=18

    RID: XU4WV327016

    ***** No hits found *****

    Lambda K H
    1.33 0.621 1.12

    Gapped
    Lambda K H
    1.28 0.460 0.850

    Effective search space used: 123416233314

    Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
    Posted date: Jun 15, 2012 10:12 AM
    Number of letters in database: 41,381,280,968 ...
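For reference, blastn has a dedicated `blastn-short` task intended for short queries, alongside `megablast`. The sketch below only assembles a command line as a Python list. The specific values (word size 7, e-value 1000, DUST filtering off) are assumptions modelled on commonly used short-query settings, not a record of what the web service applied to this search, and `short_query_blastn_command` is a hypothetical helper.

```python
def short_query_blastn_command(query, db="nt", out="short_hits.out",
                               remote=True):
    """Build (not run) a blastn command line tuned for short queries."""
    cmd = ["blastn",
           "-task", "blastn-short",   # task designed for short queries
           "-query", query,
           "-db", db,
           "-word_size", "7",         # assumed value: small seed for short input
           "-evalue", "1000",         # assumed value: short matches score low
           "-dust", "no",             # assumed: do not mask low-complexity runs
           "-out", out]
    if remote:
        cmd.append("-remote")
    return cmd
```

The list can be passed to `subprocess.run(...)`, or joined with spaces to inspect the equivalent shell command.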

Mpiblast Vs Blast+

How up-to-date is mpiBLAST compared to BLAST+ from NCBI? I am not sure what the relationship between mpiBLAST and BLAST+ is: is mpiBLAST derived from BLAST+? BLAST+ seems to be multi-threaded, but I am really interested in running BLAST on a Linux cluster. Any ideas on the best way to go about this?

Thanks a lot in advance.

Ncbi Web-Blast With Huge Subject Sequence Provided By The User In Fasta File

Hello everyone,

I tried to BLAST my query sequences against my own subject sequences. I uploaded both the query and the subject sequences as FASTA files to the NCBI BLAST page, but it did not respond. I used a 64-bit computer with 16 GB RAM. I could not figure out the problem. Can anyone suggest something?

Thanks!

Concatenating fasta files for BLAST

I have two fasta files from WormBase.  One is the coding transcripts, the other is ncRNA.  When I BLAST (using NCBI Blast+) 44k 60-mers against just the transcripts, I get 37,216 hits.  When I concatenate the  ncRNA fasta file to the transcripts file (and do makeblastdb again) I get 33,572.  I know that several of these 60-mers hit both the transcripts and the ncRNA so I need to BLAST them together.   How do I do that without losing ~4k hits?
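One way to start narrowing this down is to list exactly which 60-mers have a hit in the first run but none in the second. The sketch assumes both searches were saved as tabular output with the query ID in the first column, and that "hits" here means queries with at least one reported alignment; both helper names are hypothetical.

```python
def queries_with_hits(tabular_lines):
    """Return the set of query IDs that appear in tabular BLAST output."""
    return {line.split("\t", 1)[0]
            for line in tabular_lines
            if line.strip() and not line.startswith("#")}


def lost_queries(before_lines, after_lines):
    """Queries that had a hit in the first run but none in the second."""
    return sorted(queries_with_hits(before_lines)
                  - queries_with_hits(after_lines))
```

Re-running just the lost queries against the combined database, and comparing their scores and e-values with the transcripts-only run, should show whether they fall below a threshold or are missing entirely.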

Query empty when performing psiblast

When I ran standalone PSI-BLAST, it gave me the following message: Warning:[psiblast] Query is empty!

compare two fasta files

I have two large files of FASTA sequences (~3000 each). I can't quite get things to work with the "blast two sequences" page: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome I was wondering if anyone else had solved a similar problem. I'm very new to bioinformatics and have very limited programming skills, so any advice would be awesome. Thanks.