Comparative Evaluation of Filtering Strategies in Differential Gene Expression Analysis of RNA Sequencing Data

Authors

  • Abdulazeez Giwa, Department of Zoology and Environmental Biology, Lagos State University, Lagos, Nigeria
  • Barakat Oladipupo, Department of Zoology and Environmental Biology, Lagos State University, Lagos, Nigeria
  • Oluwafunmito Ishola, Department of Zoology and Environmental Biology, Lagos State University, Lagos, Nigeria
  • Mubaraq Abdulrahmon, Department of Zoology and Environmental Biology, Lagos State University, Lagos, Nigeria
  • Zainab Abdulrahman-Giwa, Department of Zoology and Environmental Biology, Lagos State University, Lagos, Nigeria
  • Oluwadamilola Ogunmolu, Department of Zoology and Environmental Biology, Lagos State University, Lagos, Nigeria

DOI:

https://doi.org/10.56919/usci.2651.008

Keywords:

Transcriptomics, Differential Gene Expression, Filtering, RNA-Seq, Differentially expressed genes

Abstract

Differential gene expression (DGE) analysis identifies genes that are expressed at different levels between conditions, offering insight into the biological processes affected. DGE analysis of RNA sequencing (RNA-Seq) data usually includes a filtering step that removes genes with low expression from the count data matrix. This study assesses the impact of different filtering strategies on DGE analysis. RNA-Seq read counts from the GSE150706 (n = 72) and TARGET (Therapeutically Applicable Research to Generate Effective Treatments) neuroblastoma (n = 84) datasets were analyzed. DGE analysis was performed between the Pulled and Close-out groups in GSE150706 and between the MYCN-amplified and non-amplified groups in the TARGET neuroblastoma dataset. The effect of four filtering strategies (filterByExpr, count, minimal, and no filtering) was assessed on the count data matrix, the number of low-count genes, the number of differentially expressed genes (DEGs) identified, and the enrichment analysis results. An adjusted p-value < 0.05 was used as the significance threshold for both DGE analysis and enrichment analysis. For the GSE150706 dataset, 222, 288, 289, and 208 DEGs were identified from the filterByExpr, unfiltered, minimal, and count-filtered matrices, respectively; for the neuroblastoma dataset, the corresponding numbers were 1662, 2059, 2075, and 1579. With filterByExpr and count filtering, no genes were reported as outliers or low counts at the end of DGE analysis. The filtering strategy also influenced the enrichment analysis results. Filtering is therefore an important step in DGE analysis, with a substantial effect on DGE output and downstream analysis. The use of filterByExpr or count filtering is recommended for DGE analysis of RNA-Seq data.
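To make the four filtering strategies concrete, the following is a minimal Python sketch applied to a toy count matrix (genes in rows, samples in columns). The abstract does not state the thresholds used in the study, so every cutoff here (`min_total`, `min_count`, `min_samples`, and the defaults of `filter_by_expr_like`) is an assumption chosen for illustration. In particular, `filter_by_expr_like` is a hypothetical helper that only approximates the counts-per-million rule behind edgeR's `filterByExpr` function in R; it is not the edgeR implementation, and the toy library sizes are far smaller than real ones.

```python
import numpy as np


def no_filter(counts):
    """Baseline: keep every gene."""
    return np.ones(np.asarray(counts).shape[0], dtype=bool)


def minimal_filter(counts, min_total=10):
    """Keep genes whose total count over all samples reaches min_total.

    min_total=10 is an assumed value, not taken from the study.
    """
    return np.asarray(counts).sum(axis=1) >= min_total


def count_filter(counts, min_count=10, min_samples=3):
    """Keep genes with at least min_count reads in at least min_samples samples.

    Both thresholds are assumed values, not taken from the study.
    """
    return (np.asarray(counts) >= min_count).sum(axis=1) >= min_samples


def filter_by_expr_like(counts, groups, min_count=10, min_total_count=15,
                        large_n=10, min_prop=0.7):
    """Rough CPM-based rule in the spirit of edgeR's filterByExpr (approximation)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total reads per sample
    _, group_sizes = np.unique(groups, return_counts=True)
    min_n = group_sizes.min()  # size of the smallest group
    if min_n > large_n:
        min_n = large_n + (min_n - large_n) * min_prop
    # Translate the raw-count cutoff into CPM using the median library size.
    cpm_cutoff = min_count / np.median(lib_size) * 1e6
    cpm = counts / lib_size * 1e6
    keep_cpm = (cpm >= cpm_cutoff).sum(axis=1) >= min_n - 1e-14
    keep_total = counts.sum(axis=1) >= min_total_count - 1e-14
    return keep_cpm & keep_total


# Toy data: 5 genes x 6 samples, two groups of three samples each.
counts = np.array([
    [0, 0, 0, 0, 0, 0],              # never detected
    [1, 0, 2, 0, 1, 0],              # very low total count
    [3, 2, 1, 2, 3, 1],              # low in every sample
    [50, 60, 55, 0, 1, 0],           # expressed in one group only
    [100, 120, 90, 110, 95, 105],    # expressed in all samples
])
groups = ["A", "A", "A", "B", "B", "B"]

masks = {
    "none": no_filter(counts),
    "minimal": minimal_filter(counts),
    "count": count_filter(counts),
    "filterByExpr-like": filter_by_expr_like(counts, groups),
}
kept = {name: int(mask.sum()) for name, mask in masks.items()}
for name, n in kept.items():
    print(f"{name}: {n} of {counts.shape[0]} genes kept")
```

Each function returns a boolean mask, so the filtered matrix passed on to DGE analysis would be `counts[mask]`. The sketch shows only how the strategies differ in which genes they retain; it does not reproduce the DEG counts reported above.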


Published

2026-03-30

Section

Articles

How to Cite

Giwa, A., Oladipupo, B., Ishola, O., Abdulrahmon, M., Abdulrahman-Giwa, Z., & Ogunmolu, O. (2026). Comparative Evaluation of Filtering Strategies in Differential Gene Expression Analysis of RNA Sequencing Data. UMYU Scientifica, 5(1), 90-100. https://doi.org/10.56919/usci.2651.008
