Quantcast
Viewing all articles
Browse latest Browse all 41826

[Help]: Extracting Sequence From Fasta Using Blast Informations?

Hi there! This is my first post on your forum and I hope you'll be able to help me. This is my first intership in bioinformatics and I'm kind of learning everything by myself. I learnt how to use python 2 days ago. I'm actually working with a large dataset (200Go) of metagenomics sequences. My boss asked me to blast them against the 16SMicrobial NCBI database and then "copy/paste" in a new file the sequences with hits, assuming they are 16S related. Easier to say than to do when you blast about 413000 sequences. So I decided to try to make a script with Python that would do it. I also installed Biopython because it seemed to be a very good tool. My first idea is to use a script i found on the Biopython website. this script is used to retrieve sequences with no blast hits. This would at least help to separate sequences i don't from sequences i need. from Bio import SeqIO from Bio.Blast import NCBIXML q_dict = SeqIO.to_dict(SeqIO.parse(open("queries.fasta"), "fasta")) hits = [] for record in NCBIXML.parse(open("BLAST_RESULTS.xml")): # As of BLAST 2.2.19 the xml output for multiple-query blast searches # skips queries with no hits so we could just append the ID of every blast # record to our 'hit list'. It's possible that the NCBI will change this # behaviour in the future so let's make the appending conditional on there # being some hits (ie, a non-empty alignments list) recorded in the blast record if record.alignments: # The blast record's ...

Viewing all articles
Browse latest Browse all 41826

Trending Articles