Dear all, I ran BLAST with ~200 gene sequences from a reference organism against a draft genome dataset from a related species (12 million short reads (sr), each 88 bp long). The following exemplifies the BLAST result.
----------length, strand +/-, and e-value omitted; some numbers are wrong---------------
sr-ID GeneID Iden. Start(sr) End(sr) Start(gene) End(gene)
Sxxx001 Gene1 100 1 36 2302 2334
Sxxx002 Gene1 98 1 75 313 348
Sxxx004 Gene1 100 3 43 481 519
Sxxx001 Gene2 100 8 78 2140 2172
Sxxx006 Gene2 97 2 88 280 312
Sxxx007 Gene3 100 1 56 862 897
Sxxx008 Gene3 100 6 78 2020 2055
Sxxx009 Gene3 100 5 77 3934 3972
I can extract each short read sequence from Start(sr) to End(sr). However, the next step is troublesome: I need to assemble the short reads into longer sequences according to the gene they hit. For example, I need to assemble Sxxx001, Sxxx002, and Sxxx004 according to Gene1, and Sxxx001 and Sxxx006 according to Gene2, etc.
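To make the grouping step concrete, here is a minimal Python sketch of what I have in mind. It only covers collecting the reads per gene and ordering them by where they hit the gene; it does not do the assembly itself. The function name `group_hits_by_gene` is made up for illustration, and it assumes the result is whitespace-separated with exactly the seven columns shown in the table above.

```python
from collections import defaultdict


def group_hits_by_gene(lines):
    """Group BLAST hits by gene and order them by start position on the gene.

    Assumes seven whitespace-separated columns per row, as in the example:
    sr-ID, GeneID, Iden., Start(sr), End(sr), Start(gene), End(gene).
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, the header row, and anything without 7 columns.
        if len(fields) != 7 or fields[0] == "sr-ID":
            continue
        read_id, gene_id = fields[0], fields[1]
        sr_start, sr_end, gene_start, gene_end = map(int, fields[3:7])
        groups[gene_id].append((gene_start, gene_end, read_id, sr_start, sr_end))
    # Sort each gene's hits by where they land on the gene.
    return {gene: sorted(hits) for gene, hits in groups.items()}


example = """\
sr-ID GeneID Iden. Start(sr) End(sr) Start(gene) End(gene)
Sxxx001 Gene1 100 1 36 2302 2334
Sxxx002 Gene1 98 1 75 313 348
Sxxx004 Gene1 100 3 43 481 519
Sxxx001 Gene2 100 8 78 2140 2172
Sxxx006 Gene2 97 2 88 280 312
""".splitlines()

for gene, hits in group_hits_by_gene(example).items():
    # Print the read IDs for each gene in gene-coordinate order.
    print(gene, [read_id for _, _, read_id, _, _ in hits])
```

The reads collected for each gene could then be written to one file per gene and passed to an assembler, but that part is exactly what I am unsure about.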
The aim of this analysis is to identify the homologous genes in the draft genome data. Could anyone describe a way to assemble sequences according to the BLAST result?
Thank you in advance!