Tabix your BED file to parse it quickly

This is the first time I realized that BED, GTF, GFF and VCF files can be indexed, so that the records in a given genomic region can be retrieved quickly instead of scanning the whole file. The story comes from my experience with Strelka2. Strelka2 was published recently, and I tried it for its claimed improvements in both accuracy and performance. I used Strelka1 before and found false-positive calls with it (see this issue). Although that problem is not fixed yet, I still want to try Strelka2 on my current data. I hope the issue will be fixed soon.
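The same bgzip-plus-tabix recipe covers the other formats mentioned above through tabix's built-in presets: gff (which also fits GTF, since both keep start and end in columns 4 and 5) and vcf. Below is a minimal sketch on hypothetical toy files, assuming bgzip and tabix are already installed (see the installation step below); it does nothing if they are not on PATH.

```shell
# Minimal sketch: toy.gff and toy.vcf are hypothetical example files.
# Assumes bgzip and tabix are on PATH; otherwise the indexing part is skipped.
set -euo pipefail
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'chr1\tsrc\tgene\t500\t900\t.\t+\t.\tID=g2\nchr1\tsrc\tgene\t100\t400\t.\t+\t.\tID=g1\n' > "$tmp/toy.gff"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t300\t.\tA\tG\t.\tPASS\t.\nchr1\t150\t.\tC\tT\t.\tPASS\t.\n' > "$tmp/toy.vcf"

if command -v bgzip >/dev/null && command -v tabix >/dev/null; then
  # GFF/GTF: sort by sequence name, then numerically by start (column 4)
  sort -k1,1 -k4,4n "$tmp/toy.gff" | bgzip -c > "$tmp/toy.gff.gz"
  tabix -p gff "$tmp/toy.gff.gz"

  # VCF: keep the header lines on top and sort only the data records
  { grep '^#' "$tmp/toy.vcf"; grep -v '^#' "$tmp/toy.vcf" | sort -k1,1 -k2,2n; } \
    | bgzip -c > "$tmp/toy.vcf.gz"
  tabix -p vcf "$tmp/toy.vcf.gz"

  ls "$tmp"/*.tbi
fi
```

For BED files, the preset is bed, as used in the indexing step below.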

Install tabix

A detailed description of tabix can be found here.

To install it, I used the conda package manager with the bioconda channel:

    conda install -c bioconda tabix
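After installing, a quick sketch to confirm that both bgzip (used in the sorting step) and tabix are on your PATH. It only relies on the command -v shell builtin and does not assume any particular package layout.

```shell
# Minimal sketch: report whether the tools used in this post are available.
missing=0
for tool in bgzip tabix; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found $tool at $(command -v "$tool")"
  else
    echo "missing $tool" >&2
    missing=$((missing + 1))
  fi
done
echo "$missing tool(s) missing"
```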

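Sort your BED file

Tabix requires the BED file to be sorted by chromosome and then by start position, and to be compressed with bgzip (plain gzip is not enough). Both can be done in one pipeline:

    sort -k 1,1 -k 2,2n -k 3,3n your.bed | bgzip -c > your.bed.gz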
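Why the n in -k 2,2n matters: a small sketch on a hypothetical three-line toy.bed, using only coreutils sort. Compared as text, a start of 1000 sorts before 200; the numeric key fixes that. sort -c can also check whether an existing file is already in the expected order before you compress it.

```shell
# Minimal sketch with a hypothetical toy.bed; needs only coreutils.
set -euo pipefail
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'chr2\t100\t200\nchr1\t1000\t1100\nchr1\t200\t300\n' > "$tmp/toy.bed"

# Plain text sort: "chr1 1000" lands before "chr1 200"
sort "$tmp/toy.bed"

# Chromosome as text, start and end numerically: the order tabix needs
sort -k 1,1 -k 2,2n -k 3,3n "$tmp/toy.bed" > "$tmp/toy.sorted.bed"
cat "$tmp/toy.sorted.bed"

# sort -c exits non-zero at the first out-of-order line
if sort -c -k 1,1 -k 2,2n -k 3,3n "$tmp/toy.bed" 2>/dev/null; then
  echo "toy.bed is already sorted"
else
  echo "toy.bed needs sorting"
fi
```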

Tabix it

Index the compressed BED file with the bed preset:

    tabix -p bed your.bed.gz

This creates the index file your.bed.gz.tbi next to your.bed.gz.
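Once the .tbi exists, tabix can pull out only the records that overlap a region. A minimal sketch on a hypothetical toy.bed, assuming bgzip and tabix are on PATH: region strings such as chr1:150-250 are 1-based and inclusive, while -p bed tells tabix that the BED start column is 0-based.

```shell
# Minimal sketch: toy.bed is hypothetical; assumes bgzip and tabix are on PATH.
set -euo pipefail
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'chr1\t200\t300\nchr1\t1000\t1100\nchr2\t100\t200\n' > "$tmp/toy.bed"

if command -v bgzip >/dev/null && command -v tabix >/dev/null; then
  bgzip -c "$tmp/toy.bed" > "$tmp/toy.bed.gz"
  tabix -p bed "$tmp/toy.bed.gz"    # writes toy.bed.gz.tbi next to the file
  tabix -l "$tmp/toy.bed.gz"        # list the sequence names in the index
  # Fetch only records overlapping chr1:150-250 (1-based, inclusive)
  hits=$(tabix "$tmp/toy.bed.gz" chr1:150-250)
  echo "$hits"
else
  echo "bgzip/tabix not found; skipping" >&2
  hits=""
fi
```

Here the query returns only the chr1 200 300 record; the other chr1 interval starts at 1000 and does not overlap.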

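Run Strelka2 for somatic variant calling

Strelka2's --callRegions option expects a bgzip-compressed, tabix-indexed BED file, which is why the steps above are needed: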

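        # read args: tumor (sample name), tumorbam, normalbam,
        # genome (reference FASTA), targetBed (e.g. your.bed.gz)

        # set path: STRELKA_INSTALL_PATH must point to your Strelka2 installation
        : "${STRELKA_INSTALL_PATH:?please set STRELKA_INSTALL_PATH}"

        # create the run directory, overwriting any previous run
        if [ -d "${tumor}_strelka" ]; then
          echo "${tumor}_strelka exists, overwriting anyway"
          rm -rf "${tumor}_strelka"
        fi
        mkdir "${tumor}_strelka"

        # run configuration
        ${STRELKA_INSTALL_PATH}/bin/ \
            --normalBam "$normalbam" \
            --tumorBam "$tumorbam" \
            --referenceFasta "$genome" \
            --callRegions "$targetBed" \
            --runDir "${tumor}_strelka"

        # run the workflow locally with 20 jobs
        ${tumor}_strelka/ -m local -j 20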
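Because Strelka2 expects the --callRegions file to be bgzip-compressed with a .tbi index beside it, a pre-flight check before the configuration step can save a failed run. The sketch below is a hypothetical helper, check_call_regions, not part of Strelka2 or tabix. It inspects the BGZF header bytes (gzip magic 1f 8b, compression method 08, flag 04 for the extra field, and the BC subfield ID at bytes 13 and 14) and checks that the .tbi file exists.

```shell
# Hypothetical helper: succeed only if $1 looks bgzip-compressed and has a .tbi.
check_call_regions() {
  bed=$1
  [ -f "$bed" ] || { echo "missing: $bed" >&2; return 1; }
  [ -f "$bed.tbi" ] || { echo "missing index: $bed.tbi (run tabix -p bed)" >&2; return 1; }
  # First 14 bytes as hex: 1f8b0804, 8 bytes of header fields, then 4243 ("BC")
  hex=$(head -c 14 "$bed" | od -An -tx1 | tr -d ' \n')
  case "$hex" in
    1f8b0804????????????????4243*) return 0 ;;
    *) echo "not bgzip-compressed: $bed (plain gzip is not enough)" >&2; return 1 ;;
  esac
}
```

In the script above it could be called as check_call_regions "$targetBed" || exit 1 just before the configuration step.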

