
How to use the RECON package to identify repeat families from genomic sequences

I'm trying to learn RECON and am experimenting using chr22. My steps so far, roughly:

  1. Make a BLAST database from chr22.fa
  2. BLAST chr22.fa against its own database
  3. Run MSPCollect.pl (RECON provided script) to create an MSP file
  4. Run recon.pl on the MSP file and a list of sequence IDs

However, BLASTing a sequence against its own database either takes a prohibitively long time or returns 100% self hits. If I remove the self hits, I'm left with a set of alignments that RECON is happy to work with, but I had to write my own Python script to filter them out.
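A minimal sketch of that kind of self-hit filter is below. It assumes the BLAST hits are in the standard 12-column tabular format (`-outfmt 6` in BLAST+: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Whether MSPCollect.pl accepts that format is not confirmed here, and the function names are made up for illustration. A "self hit" is taken to mean a hit where the query and subject are the same sequence with identical coordinates, so genuine repeat copies elsewhere on chr22 are kept.

```python
def is_self_hit(fields):
    """Return True when a hit aligns a region to exactly itself.

    Assumes BLAST tabular columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    qseqid, sseqid = fields[0], fields[1]
    qstart, qend, sstart, send = (int(x) for x in fields[6:10])
    return qseqid == sseqid and qstart == sstart and qend == send


def filter_self_hits(lines):
    """Yield tabular BLAST lines that are not trivial self hits.

    Blank lines and '#' comment lines are dropped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if not is_self_hit(fields):
            yield line
```

To use it on files, open the BLAST output for reading and write each line yielded by `filter_self_hits(handle)` to a new file.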

By now this all feels very convoluted, so my question is: am I even close to doing the BLAST part correctly? The RECON home page says to avoid self hits for performance's sake, but I haven't been able to find out how. Are there any other glaring mistakes?

