Hi, I have a file that contains information like this:
Gene1 Gene2 sequence
FOXP1 ABL1 acacctctcaatgcagctttacagCtctacgtctcctccgagagccgctgtggagagttacgtcgaaatgtc
FUS ATF1 TATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGTTGCCATTGCCCCAAATGGAGCCTTACAGTTGGCAAGTCCAGGCACAGATGGAGTACA
FUS CREB3L1 ttgagtctgtggctgattacttcaagcagattgGctctctgcccccctccagccctgtcaggcccatg
FUS CREB3L1 TATGGACAGCAGGACCGTGGAGGCCGCGcCACGGCCATCTCCACCTCCCCACTCCTCACTGCCCCTCACAAA
These genes are known to be involved in translocations, and each sequence contains part of Gene1 and part of Gene2. For example, the sequence
acacctctcaatgcagctttacagCtctacgtct
belongs to FOXP1, and
cctccgagagccgctgtggagagttacgtcgaaatgtc
might belong to ABL1.
I have thousands of such sequences and I want to locate exactly where in each gene the translocation occurs, i.e., the genomic positions that each part of the sequence maps to. The output I am looking for is something like:
gene1 genomic_position translocation_occurs gene2 genomic_position translocation_occurs
FOXP1 chr6:98925342-99435345 chr6:98925380-98925414 ABL1 chr2:31688556-31804227 chr2:31688756-31688794
How can I get this information for thousands of such sequences?
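In case it helps to frame the problem, here is a minimal sketch of what I imagine the first step would be: turning the table into FASTA so that every sequence can be aligned to a reference genome with a tool that reports partial (split) hits, such as BLAT or BLAST. The function name `table_to_fasta`, the record ID format, and the assumption that the file is whitespace-separated with a single `Gene1 Gene2 sequence` header line are all mine, not an established workflow.

```python
def table_to_fasta(lines):
    """Convert 'Gene1 Gene2 sequence' rows into FASTA text.

    Assumes whitespace-separated rows with exactly three fields;
    the header row and malformed rows are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[:2] == ["Gene1", "Gene2"]:
            continue  # skip the header and anything that is not a data row
        gene1, gene2, seq = fields
        # A running index keeps IDs unique when a gene pair occurs more than once.
        record_id = f"{gene1}--{gene2}|{len(records) + 1}"
        # The sequence is written unchanged, so its original upper/lower case is kept.
        records.append(f">{record_id}\n{seq}\n")
    return "".join(records)


if __name__ == "__main__":
    rows = [
        "Gene1 Gene2 sequence",
        "FUS CREB3L1 ttgagtctgtggctgattacttcaagcagattgGctctctgcccccctccagccctgtcaggcccatg",
        "FUS CREB3L1 TATGGACAGCAGGACCGTGGAGGCCGCGcCACGGCCATCTCCACCTCCCCACTCCTCACTGCCCCTCACAAA",
    ]
    print(table_to_fasta(rows), end="")
```

Each FASTA record keeps both gene names in its ID, so the alignment hits could later be matched back to the expected Gene1 and Gene2 and the hit boundaries read off as candidate breakpoint coordinates.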