A Nextflow-based pipeline for comprehensive analyses of long non-coding RNAs from RNA-seq datasets🚀

September 2016 – September 2018
Guangzhou, Guangdong, China

Postdoc Fellow

Sun Yat-sen University

Working as a postdoc in cancer research

Selected Publications

High-throughput sequencing technology is rapidly becoming the standard method for measuring gene expression at the transcriptional level. One of the main goals of such work is to identify differentially expressed genes under two or more conditions. A number of computational tools, such as DESeq (Anders and Huber 2010) (updated as DESeq2 (Love, Huber et al. 2014)), edgeR (Robinson, McCarthy et al. 2010, Zhou, Lindsay et al. 2014), NOISeq (Tarazona, García-Alcalde et al. 2011), PoissonSeq (Li, Witten et al. 2011), SAMseq (samr) (Li and Tibshirani 2013) and Cuffdiff (Trapnell et al. 2013), have been developed for the analysis of differential gene expression from patterns in RNA-seq data. Most of these tools are implemented in the R language, which is commonly used for the analysis of high-dimensional expression data. However, a fairly high level of programming skill is required to apply these R tools to screen for differentially expressed genes, which greatly hinders their adoption since many biology researchers have little programming experience. Beyond this problem, because these tools lack an interactive interface, it is inconvenient to adjust the analytical parameters, even for advanced users. Moreover, since different packages generate inconsistent results, an interactive platform that combines these tools is necessary for obtaining more robust analysis results. To address the above issues, here we introduce the Interactive Differential Expression Analyzer (IDEA), a Shiny-based web application dedicated to the interactive identification of differentially expressed genes. IDEA was built as a user-friendly and highly interactive utility using the Shiny (RStudio Inc. 2014) package in R. Currently, five relevant R packages are integrated into IDEA. IDEA visualizes the results with a range of charts and tables and makes it easy to interact with the analysis as it proceeds.
bioRxiv, 2018

Recently, long noncoding RNAs (lncRNAs) have attracted widespread attention for their critical roles in diverse biological processes and their important implications in a variety of human diseases and cancers. Identification and profiling of lncRNAs is a fundamental step in advancing our knowledge of their function and regulatory mechanisms. However, RNA sequencing based lncRNA discovery is currently limited by the complicated operation and implementation of the tools involved. Therefore, we present a one-stop, multi-tool integrated pipeline called LncPipe, focused on characterizing lncRNAs from raw transcriptome sequencing data. The pipeline was developed on the popular workflow framework Nextflow and comprises four core procedures: read alignment, assembly, identification and quantification. It offers several distinctive features, such as a well-designed lncRNA annotation strategy, optimized computational efficiency, diversified classification and an interactive analysis report. LncPipe gives users additional control by allowing them to interrupt the pipeline, reset parameters from the command line, modify the main script directly and resume the analysis from a previous checkpoint.
Journal of genetics and genomics, 2018

Small cell carcinoma of the oesophagus (SCCE) is a deadly malignancy, yet its genetic characteristics remain unknown and treatment is commonly adopted from therapies for its histologically identical counterpart, small cell lung cancer (SCLC). We performed whole-exome sequencing to examine the genetic landscape of 55 patients with SCCE. We identified three novel significantly mutated genes (PDE3A, PTPRM and CBLN2) and mutations in five other well-known tumour-associated genes (TP53, RB1, NOTCH1, FAT1 and FBXW7). Notably, activation of the Wnt signaling pathway, including DVL3 amplification, was ubiquitously observed. Furthermore, comparative analysis revealed that SCCE tumours were more closely related to oesophageal squamous cell carcinoma than to oesophageal adenocarcinoma or SCLC, suggesting that current therapies for SCCE might be inappropriate. Functional characterization suggested that PDE3A acts as a novel oncogene. Moreover, pathway enrichment analysis revealed that genes involved in cell cycle, p53, Notch and Wnt signaling were frequently altered in SCCE. Our findings provide insights for a better understanding of SCCE pathogenesis and for better treatment strategies for patients with SCCE.
Cell research, 2018

Small ubiquitin-like modifiers (SUMOs) regulate a variety of cellular processes through two distinct mechanisms: covalent sumoylation and non-covalent SUMO interaction. The complexity of SUMO regulation has greatly hampered the large-scale identification of SUMO substrates and interaction partners at a proteome-wide level. In this work, we developed a new tool called GPS-SUMO for the prediction of both sumoylation sites and SUMO-interaction motifs (SIMs) in proteins. To achieve accurate performance, a new-generation group-based prediction system (GPS) algorithm integrated with a Particle Swarm Optimization approach was applied. By critical evaluation and comparison, GPS-SUMO was demonstrated to be substantially superior to other existing tools and methods. With the help of GPS-SUMO, it is now possible to further investigate the relationship between sumoylation and SUMO interaction processes. A web service of GPS-SUMO was implemented in PHP + JavaScript and is freely available at http://sumosp.biocuckoo.org.
Nucleic acids research, 2014

