Comparative Evaluation of Filtering Strategies in Differential Gene Expression Analysis of RNA Sequencing Data
DOI: https://doi.org/10.56919/usci.2651.008
Keywords: Transcriptomics, Differential Gene Expression, Filtering, RNA-Seq, Differentially expressed genes
Abstract
Differential gene expression (DGE) analysis identifies genes expressed at different levels between conditions, offering insight into the biological processes affected. RNA sequencing (RNA-Seq) DGE analysis usually includes a filtering step that removes lowly expressed genes from the count matrix. This study assesses the impact of different filtering strategies on DGE analysis. RNA-Seq read counts from the GSE150706 (n = 72) and TARGET (Therapeutically Applicable Research to Generate Effective Treatments) neuroblastoma (n = 84) datasets were analyzed. DGE analysis was performed between the Pulled and Close-out groups in GSE150706 and between the MYCN-amplified and non-amplified groups in the TARGET neuroblastoma dataset. The effect of four filtering strategies (filterByExpr, count, minimal, and no filtering) was assessed on the count matrix, the number of low-count genes, the number of differentially expressed genes (DEGs) identified, and the enrichment analysis results. An adjusted p-value < 0.05 was used as the significance threshold for both DGE and enrichment analysis. For GSE150706, 222, 288, 289, and 208 DEGs were identified from the filterByExpr-filtered, unfiltered, minimal-filtered, and count-filtered matrices, respectively; for the neuroblastoma dataset, the corresponding numbers were 1662, 2059, 2075, and 1579. The filterByExpr and count filtering strategies left no outliers and no low-count genes at the end of DGE analysis. The choice of filtering strategy also influenced the enrichment analysis results. Filtering is therefore an important step in DGE analysis, with a substantial impact on DGE output and downstream analyses. filterByExpr or count filtering is recommended for DGE analysis of RNA-Seq data.
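The abstract names four filtering strategies but does not define "count" or "minimal" filtering. The Python sketch below illustrates how such filters act on a genes × samples count matrix, under stated assumptions: "minimal" is taken to mean a total count of at least 10 across all samples, and "count" to mean at least 10 reads in as many samples as the smallest group. Both definitions are modeled on the pre-filtering examples in the DESeq2 vignette, not on the study's own code. `filter_by_expr_like` is a hypothetical re-expression of the default rule of edgeR's `filterByExpr` for the group-only case (a CPM cutoff equivalent to `min.count = 10` reads at the median library size, plus `min.total.count = 15`). All function names here are hypothetical; in a real analysis the R functions themselves should be used.

```python
import numpy as np


def no_filter(counts, group):
    """'No filtering': keep every gene."""
    return np.ones(counts.shape[0], dtype=bool)


def minimal_filter(counts, group, min_total=10):
    """ASSUMED 'minimal' filter: total count across all samples >= min_total."""
    return counts.sum(axis=1) >= min_total


def count_filter(counts, group, min_count=10):
    """ASSUMED 'count' filter: >= min_count reads in at least as many
    samples as the smallest experimental group."""
    _, group_sizes = np.unique(group, return_counts=True)
    return (counts >= min_count).sum(axis=1) >= group_sizes.min()


def filter_by_expr_like(counts, group, min_count=10, min_total_count=15,
                        large_n=10, min_prop=0.7):
    """Hypothetical Python version of edgeR's default filterByExpr rule
    (group supplied, no design matrix, no normalization factors)."""
    lib_size = counts.sum(axis=0)                      # reads per sample
    _, group_sizes = np.unique(group, return_counts=True)
    min_n = float(group_sizes.min())                   # smallest group size
    if min_n > large_n:                                # relax for large groups
        min_n = large_n + (min_n - large_n) * min_prop
    cpm = counts / lib_size * 1e6                      # counts per million
    cpm_cutoff = min_count / np.median(lib_size) * 1e6
    tol = 1e-14
    keep_cpm = (cpm >= cpm_cutoff).sum(axis=1) >= min_n - tol
    keep_total = counts.sum(axis=1) >= min_total_count - tol
    return keep_cpm & keep_total


STRATEGIES = {
    "none": no_filter,
    "minimal": minimal_filter,
    "count": count_filter,
    "filterByExpr": filter_by_expr_like,
}

if __name__ == "__main__":
    # Toy matrix (invented numbers): 4 genes x 4 samples, two groups of two.
    counts = np.array([[100, 120, 90, 110],
                       [0, 0, 0, 0],
                       [3, 4, 2, 5],
                       [12, 15, 0, 1]])
    group = np.array(["A", "A", "B", "B"])
    for name, fn in STRATEGIES.items():
        keep = fn(counts, group)
        # expected: none 4, minimal 3, count 2, filterByExpr 2
        print(f"{name}: {int(keep.sum())} of {counts.shape[0]} genes kept")
```

Each strategy returns a boolean mask over genes, so the same unfiltered matrix can be subset four ways and each subset passed to the same DGE tool. The thresholds are exposed as parameters because the abstract does not state the study's exact settings.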
License
Copyright (c) 2026 Abdulazeez Giwa, Barakat Oladipupo, Oluwafunmito Ishola, Mubaraq Abdulrahmon, Zainab Abdulrahman-Giwa, Oluwadamilola Ogunmolu (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
UMYU Scientifica recognizes the importance of protecting authors’ intellectual property while promoting the free exchange of scientific knowledge. The journal adopts a copyright-retention model that empowers authors to maintain ownership of their work while granting the journal rights necessary for publication and dissemination.
1. Copyright Ownership
Authors publishing with UMYU Scientifica retain full copyright and publishing rights to their work. By submitting a manuscript, authors agree to grant the journal a non-exclusive license to publish, reproduce, distribute, and archive the article in all forms and media for the purpose of scholarly communication.
2. Licensing Terms
All articles are published under the Creative Commons Attribution–NonCommercial (CC BY-NC) license.
This license permits others to:
- Share - copy and redistribute the material in any medium or format.
- Adapt - remix, transform, and build upon the material.
These permissions apply for non-commercial purposes only, provided that proper credit is given to the original author(s) and UMYU Scientifica as the source, a link to the license is provided, and any modifications are clearly indicated.
Commercial reuse or distribution of the content requires written permission from both the author and the editorial office.
3. Author Rights
Authors are free to:
- Deposit all versions of their manuscript (preprint, accepted version, and published version) in institutional, disciplinary, or public repositories without embargo.
- Use and distribute their published article for non-commercial scholarly purposes, including teaching, conference presentations, and research sharing.
- Include their work in future books, theses, or compilations, provided proper citation to the journal is made.
4. Publisher’s Rights
Upon publication, UMYU Scientifica retains the right to:
- Host, index, and disseminate the article through the journal’s website and partner databases.
- Archive the content in long-term preservation systems such as the PKP Preservation Network (PKP-PN) and the Umaru Musa Yar’adua University Institutional Repository.
5. Attribution and Citation
Users must give appropriate credit to the author(s), include a link to the article’s DOI or the journal webpage, and indicate if changes were made. Proper citation is required whenever the work is reused or referenced.
6. License Reference
For detailed terms of use, please refer to the Creative Commons Attribution–NonCommercial 4.0 International License (CC BY-NC 4.0):
https://creativecommons.org/licenses/by-nc/4.0/