Overview

TBtoFHIR is a Nextflow-based workflow for the analysis of Mycobacterium tuberculosis (TB) genomic data. It accepts raw sequencing reads (short-read or long-read) as well as pre-annotated VCF files. From these inputs it identifies drug resistance mutations using the World Health Organization (WHO) TB mutation database, classifies TB lineages, and generates a FHIR-compliant genomics bundle.
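At its core, the drug resistance step matches each sample variant against the WHO mutation annotation table. The sketch below illustrates that lookup only and is not the pipeline's own code. It assumes hypothetical tab-separated columns named gene, mutation, drug and confidence, and the real *_annotation_table.tsv.gz may use different headers. It also assumes that variants arrive as (gene, protein change) pairs. The two-row table is toy data.

```python
import csv
import gzip
import io
from collections import defaultdict


def open_annotation_table(path):
    """Open a gzip-compressed TSV (e.g. a *_annotation_table.tsv.gz file) as text."""
    return gzip.open(path, "rt", newline="")


def load_annotation_table(handle):
    """Index catalogue rows by (gene, mutation).

    Assumes hypothetical columns gene, mutation, drug, confidence; the real
    WHO annotation table may use different headers.
    """
    index = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        index[(row["gene"], row["mutation"])].append((row["drug"], row["confidence"]))
    return index


def call_resistance(variants, index):
    """Map each drug to the (gene, mutation, confidence) hits found in the sample."""
    hits = defaultdict(list)
    for gene, mutation in variants:
        # .get() avoids inserting empty entries into the index for unmatched variants.
        for drug, confidence in index.get((gene, mutation), []):
            hits[drug].append((gene, mutation, confidence))
    return dict(hits)


if __name__ == "__main__":
    # Toy two-row table, for illustration only.
    demo = io.StringIO(
        "gene\tmutation\tdrug\tconfidence\n"
        "rpoB\tp.Ser450Leu\tRIF\tAssoc w R\n"
        "katG\tp.Ser315Thr\tINH\tAssoc w R\n"
    )
    table = load_annotation_table(demo)
    print(call_resistance([("rpoB", "p.Ser450Leu"), ("gyrA", "p.Glu21Gln")], table))
```

Indexing by (gene, mutation) keeps each lookup constant-time. One catalogue entry can list several drugs, so each key maps to a list.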

Key Features

  • Multi-platform Support: Processes raw Illumina (short-read) and Nanopore (long-read) FASTQ files, as well as pre-annotated VCF files.

  • Drug Resistance Analysis: Identifies mutations associated with TB drug resistance based on WHO’s latest TB mutation database.

  • Lineage Classification: Identifies TB lineages based on barcode SNPs.

  • FHIR Compliance: Generates HL7 FHIR genomics resources and validates them with the HL7 FHIR validator.

  • Clinical Integration: Merges genomic results with patient, organization, and practitioner metadata.

  • Quality Control: Aggregates QC metrics into a MultiQC report.
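Lineage classification from barcode SNPs can be sketched as follows. This is a minimal illustration, not the logic of lineage_classifier.py. It makes three assumptions:

  • Each *_lineage.bed row holds the chromosome, 0-based start, end, lineage name and alternate allele.
  • Sample SNPs are given as (chromosome, 1-based position, alt) tuples.
  • A lineage is called when at least 50% of its barcode SNPs are present. This threshold is hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_barcode(handle):
    """Parse a BED-like barcode: chrom, start (0-based), end, lineage, alt (assumed layout)."""
    barcode = defaultdict(set)
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        chrom, _start, end, lineage, alt = fields[:5]
        # For a single-base BED interval, the end coordinate equals the 1-based position.
        barcode[lineage].add((chrom, int(end), alt.upper()))
    return dict(barcode)


def classify(sample_snps, barcode, min_fraction=0.5):
    """Return (lineage, fraction) pairs that pass min_fraction, deepest sub-lineage first."""
    calls = []
    for lineage, markers in barcode.items():
        fraction = len(markers & sample_snps) / len(markers)
        if fraction >= min_fraction:
            calls.append((lineage, fraction))
    # Deeper sub-lineages (e.g. lineage4.9 vs lineage4) have more dot-separated levels.
    calls.sort(key=lambda call: (call[0].count("."), call[1]), reverse=True)
    return calls


if __name__ == "__main__":
    # Toy three-marker barcode, for illustration only.
    bed = io.StringIO(
        "Chromosome\t1000\t1001\tlineage4\tG\n"
        "Chromosome\t2000\t2001\tlineage4.9\tT\n"
        "Chromosome\t3000\t3001\tlineage2\tA\n"
    )
    sample = {("Chromosome", 1001, "G"), ("Chromosome", 2001, "T")}
    print(classify(sample, load_barcode(bed)))
```

Sorting by depth means the most specific supported sub-lineage appears first. Its parent lineages follow as supporting calls.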

Key Outputs

  • TB lineage classification

  • FHIR genomics bundle (Observations, DiagnosticReport)

  • Clinical summary reports

  • Quality control metrics
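The FHIR genomics bundle groups Observations under a DiagnosticReport. The sketch below shows that shape as a minimal FHIR R4 "collection" Bundle. It is an assumption-laden illustration, not the output of annotated_to_fhir.py. Codes are given as plain text only, whereas a production bundle would normally use coded terms (e.g. LOINC) from the HL7 Genomics Reporting guidance. The sample ID, lineage and variant values are hypothetical.

```python
import json
import uuid


def _entry(resource):
    """Wrap a resource in a Bundle entry with a urn:uuid fullUrl so others can reference it."""
    return {"fullUrl": f"urn:uuid:{uuid.uuid4()}", "resource": resource}


def _observation(label, value):
    # Minimal R4 Observation: status and code are the required elements.
    return _entry({
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": label},
        "valueString": value,
    })


def build_bundle(sample_id, lineage, variants):
    """Assemble one lineage Observation, one Observation per variant, and a DiagnosticReport."""
    observations = [_observation("TB lineage", lineage)]
    for gene, mutation in variants:
        observations.append(_observation("Genetic variant", f"{gene} {mutation}"))
    report = _entry({
        "resourceType": "DiagnosticReport",
        "status": "final",
        "code": {"text": "TB genomic analysis"},
        "identifier": [{"value": sample_id}],
        # The report points at every Observation through its fullUrl.
        "result": [{"reference": e["fullUrl"]} for e in observations],
    })
    return {"resourceType": "Bundle", "type": "collection", "entry": observations + [report]}


if __name__ == "__main__":
    bundle = build_bundle("SAMPLE_01", "lineage4.9", [("rpoB", "p.Ser450Leu")])
    print(json.dumps(bundle, indent=2))
```

Using urn:uuid fullUrls lets resources inside a single bundle reference each other before any server has assigned them IDs.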

Directory Structure

tb-to-fhir-full
├── main.nf                             # Main workflow
├── nextflow.config                     # Configuration and parameters
├── workflows/
│   ├── illumina.nf                     # Illumina sub-workflow
│   ├── nanopore.nf                     # Nanopore sub-workflow
│   ├── vcf.nf                          # VCF sub-workflow
│   ├── lineage.nf                      # Lineage classification
│   ├── fhir.nf                         # FHIR variants generation
│   ├── validate_fhir.nf                # FHIR validation
│   ├── merge_clinical_data.nf          # Clinical metadata merge
│   ├── upload_fhir.nf                  # FHIR server upload
│   ├── report.nf                       # QC and sample report generation
│   └── utils.nf                        # Utility functions
├── scripts/
│   ├── annotated_to_fhir.py            # VCF-to-FHIR converter
│   ├── clinical_metadata_parser.py     # Patient/org/practitioner parser
│   ├── generate_sample_report.py       # Per-sample text report
│   ├── lineage_classifier.py           # SNP-barcode lineage classifier
│   ├── merge_clinical_fhir.py          # FHIR genomics + clinical data merger
│   ├── upload_fhir.py                  # FHIR uploader
│   ├── get_access_token.py             # Standalone token fetcher
│   └── get_versions.py                 # Software version collector
├── data/
│   ├── NGS/                            # Input FASTQ files
│   ├── VCF/                            # Input VCF files
│   ├── H37Rv.fasta                     # Reference genome
│   ├── repetitive_regions.bed          # Exclusion regions
│   ├── *_lineage.bed                   # Lineage barcode SNPs
│   ├── *_annotation_table.tsv.gz       # WHO mutation annotation table
│   ├── patient_clinical_metadata.csv   # Patient metadata
│   ├── organization_metadata.csv       # Organization metadata
│   └── practitioner_metadata.csv       # Practitioner metadata
└── tools/
    └── fhir-validator.jar              # HL7 FHIR validator
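The clinical merge step (clinical_metadata_parser.py and merge_clinical_fhir.py) links genomic results to a patient. The sketch below shows one way to do that and is not the scripts' actual behaviour. It assumes that patient_clinical_metadata.csv has hypothetical columns sample_id, patient_id, family_name, given_name, gender and birth_date. It also assumes that gender is already coded as male, female, other or unknown. The sketch adds a FHIR R4 Patient to the bundle and sets it as the subject of every Observation and DiagnosticReport.

```python
import csv
import io
import uuid


def load_patients(handle):
    """Index patient rows by sample ID (column names are assumptions)."""
    return {row["sample_id"]: row for row in csv.DictReader(handle)}


def to_patient(row):
    # Map one CSV row to a minimal FHIR R4 Patient; gender is assumed to be a valid FHIR code.
    return {
        "resourceType": "Patient",
        "identifier": [{"value": row["patient_id"]}],
        "name": [{"family": row["family_name"], "given": [row["given_name"]]}],
        "gender": row["gender"],
        "birthDate": row["birth_date"],
    }


def merge_patient(bundle, sample_id, patients):
    """Add the sample's Patient and set it as subject of Observations/DiagnosticReports.

    Modifies the bundle in place and returns it; unknown samples are left unchanged.
    """
    row = patients.get(sample_id)
    if row is None:
        return bundle
    patient_url = f"urn:uuid:{uuid.uuid4()}"
    for entry in bundle["entry"]:
        if entry["resource"]["resourceType"] in ("Observation", "DiagnosticReport"):
            entry["resource"]["subject"] = {"reference": patient_url}
    bundle["entry"].append({"fullUrl": patient_url, "resource": to_patient(row)})
    return bundle
```

Keying the metadata by sample ID mirrors how per-sample bundles would be joined to a single clinical CSV. Samples without a matching row pass through unchanged, so no reference is left dangling.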