This article introduces Iliad, a suite of automated Snakemake workflows for processing genomic data for downstream applications. Iliad's workflows offer optimized time and resource management comparable to that of other available workflows, but they generate analysis-ready variant call format (VCF) files from the most common data types using a single command. Iliad addresses the storage footprint of genomic data by keeping intermediate files only temporarily until the final VCF is generated. This file is ready for use in imputation, genome-wide association study (GWAS) pipelines, high-throughput population genetics studies, select gene candidate studies, and more. Iliad was developed to be portable, compatible, scalable, robust, and repeatable with a simple setup, so biologists who are less familiar with programming can manage their own big data with this open-source suite of workflows. As sequencing data availability grows, the ability of biologists to process it using stable, automated, and reproducible workflows is paramount because it significantly reduces the time needed to generate clean and reliable data. The Iliad suite of genomic data workflows was developed to provide users with seamless file transitions from raw genomic data to a quality-controlled VCF file for downstream applications. Iliad benefits from the efficiency of the Snakemake best practices framework coupled with Singularity and Docker containers for repeatability, portability, and ease of installation. The workflows start by downloading raw data of any type (FASTQ, CRAM, IDAT) and run straight through to the generation of a clean merged data file that can combine any user-preferred datasets using robust programs such as BWA, Samtools, and BCFtools. Users can customize and direct their workflow with one straightforward configuration file. Iliad is compatible with Linux, macOS, and Windows platforms and scalable from a local machine to a high-performance computing cluster. (Published Abstract Provided)
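The temporary-intermediate-file idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not Iliad's implementation: the stage names (`fastq`, `bam`, `vcf`), the `run_pipeline` and `demo` functions, and the placeholder file contents are all hypothetical, and real tools such as BWA or BCFtools are replaced by a stand-in that writes a text file. The sketch only shows the general pattern of deleting each intermediate as soon as the next stage has consumed it, so that only the final VCF remains on disk.

```python
import tempfile
from pathlib import Path

# Hypothetical stage names; the real Iliad rules and file layout may differ.
STAGES = ["fastq", "bam", "vcf"]


def run_pipeline(workdir: Path, sample: str) -> Path:
    """Run placeholder stages, deleting each intermediate once it is consumed."""
    previous = None
    for suffix in STAGES:
        current = workdir / f"{sample}.{suffix}"
        # Stand-in for a real tool invocation (alignment, variant calling, ...).
        source = previous.name if previous else "download"
        current.write_text(f"{suffix} derived from {source}\n")
        if previous is not None:
            # The upstream file is no longer needed, so free its disk space.
            previous.unlink()
        previous = current
    return previous


def demo():
    """Run the sketch in a scratch directory and report what is left on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        final = run_pipeline(workdir, "sample1")
        remaining = sorted(p.name for p in workdir.iterdir())
        return final.name, remaining


if __name__ == "__main__":
    print(demo())
```

In Snakemake itself, the same effect is usually obtained declaratively by wrapping a rule's output in `temp()`, which lets the workflow engine remove the file once all rules that depend on it have finished.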