DNANet Documentation
DNANet is a deep-learning framework for forensic DNA electropherogram (EPG)
analysis, developed at the Netherlands Forensic Institute (NFI). It provides
end-to-end pipelines for segmentation, classification, and reconstruction of
Short Tandem Repeat (STR) profiles from .hid files produced by capillary
electrophoresis instruments.
Key Features
- Modular architecture — Clean separation between domain models, data loading, neural networks, training logic, and evaluation.
- Multiple model types — U-Net segmentation, peak classification, autoencoders, and combined classifiers.
- Kit-agnostic — Strategy pattern supports PowerPlex Fusion 6C (PPF6C) and GlobalFiler kits; new kits can be added by implementing a scaling strategy.
- Hydra configuration — All parameters (model, training, data, logging) are composed from YAML config groups with full CLI override support.
- PyTorch Lightning — Training, evaluation, and cross-validation are orchestrated by Lightning, providing automatic GPU support, checkpointing, early stopping, and logging.
- Comprehensive evaluation — Pixel-level and allele-level metrics, allele calling strategies, and EPG visualisation.
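The kit-agnostic design mentioned above can be illustrated with a minimal strategy-pattern sketch. Everything in it is an assumption made for illustration: the names `ScalingStrategy`, `PPF6CScaling`, `GlobalFilerScaling`, `register_kit`, and `scale_for_kit` are hypothetical and are not taken from DNANet's code, and the scale factors are placeholders rather than real calibration values.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ScalingStrategy(ABC):
    """Hypothetical interface: one implementation per STR kit."""

    @abstractmethod
    def scale(self, signal: List[float]) -> List[float]:
        """Return the signal rescaled for this kit."""


class PPF6CScaling(ScalingStrategy):
    # Placeholder factor, not a real calibration value.
    def scale(self, signal: List[float]) -> List[float]:
        return [x * 0.5 for x in signal]


class GlobalFilerScaling(ScalingStrategy):
    # Placeholder factor, not a real calibration value.
    def scale(self, signal: List[float]) -> List[float]:
        return [x * 0.25 for x in signal]


# Registry mapping a kit name to its strategy (hypothetical).
_STRATEGIES: Dict[str, ScalingStrategy] = {
    "PPF6C": PPF6CScaling(),
    "GlobalFiler": GlobalFilerScaling(),
}


def register_kit(name: str, strategy: ScalingStrategy) -> None:
    """Add support for a new kit by supplying its scaling strategy."""
    _STRATEGIES[name] = strategy


def scale_for_kit(kit: str, signal: List[float]) -> List[float]:
    """Look up the kit's strategy and apply it to the signal."""
    try:
        strategy = _STRATEGIES[kit]
    except KeyError:
        raise ValueError(f"No scaling strategy registered for kit {kit!r}")
    return strategy.scale(signal)
```

In a design like this, calling code only depends on `scale_for_kit`, so supporting another kit means writing one new `ScalingStrategy` subclass and registering it, without touching the rest of the pipeline.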
Quick Start
# Install (editable, with dev dependencies)
pip install -e ".[dev]"
# Train a U-Net on the NFI R&D dataset
dnanet task=train data=dnanet_rd model=unet training=segmentation
# Train on the ProvedIt dataset
dnanet task=train data=provedit model=unet training=segmentation
# Evaluate a checkpoint
dnanet task=evaluate data=dnanet_rd model=unet checkpoint=outputs/.../checkpoints/best.ckpt
# 5-fold cross-validation
dnanet task=cross_validate data=dnanet_rd model=unet training=segmentation
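Each command above is a program name followed by `key=value` overrides that select one option per config group (`task`, `data`, `model`, `training`). The sketch below shows how such a command line breaks down into a selection mapping. It is a simplified stand-in written for illustration, not Hydra's actual override parser, and `parse_overrides` is a hypothetical helper name.

```python
import shlex
from typing import Dict


def parse_overrides(command: str) -> Dict[str, str]:
    """Split a command line into {group: option} pairs.

    Simplified illustration only: real Hydra overrides support a richer
    grammar than plain key=value tokens.
    """
    tokens = shlex.split(command)
    overrides: Dict[str, str] = {}
    for token in tokens[1:]:  # tokens[0] is the program name
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"Not a key=value override: {token!r}")
        overrides[key] = value
    return overrides


selection = parse_overrides(
    "dnanet task=train data=dnanet_rd model=unet training=segmentation"
)
for group, option in selection.items():
    print(f"{group}: {option}")
```

Reading the commands this way makes the differences between them easy to see: the ProvedIt run changes only the `data` selection, and the cross-validation run changes only `task`.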