Configuration Reference

DNANet uses Hydra for hierarchical configuration composition. The master config at `conf/config.yaml` merges five independent config groups (data, model, training, evaluation, logging) into a single resolved config object.

Master Config

# conf/config.yaml
defaults:
  - data: dnanet_rd
  - model: unet
  - training: segmentation
  - evaluation: segmentation
  - logging: mlflow
  - _self_

task: train              # One of: train, evaluate, cross_validate
seed: 42                 # Random seed for reproducibility
output_dir: outputs/${now:%Y-%m-%d_%H-%M-%S}
verbosity: INFO          # DEBUG, INFO, WARNING, ERROR
checkpoint: null         # Path to checkpoint (for evaluate/resume)
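The composition order in the defaults list above can be illustrated without Hydra. The following is a minimal sketch using plain dictionaries, not Hydra's actual implementation: the group contents and the `max_epochs` value are hypothetical stand-ins, and `deep_merge` is a helper introduced only for this example. It shows why placing `_self_` last lets the top-level keys of `conf/config.yaml` take precedence.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical, heavily trimmed stand-ins for two group files.
groups = {
    "data": {"name": "dnanet_rd", "kit": "PPF6C"},
    "training": {"type": "segmentation", "max_epochs": 50},
}

# Top-level keys from conf/config.yaml (the `_self_` entry).
self_config = {"task": "train", "seed": 42, "verbosity": "INFO", "checkpoint": None}

resolved: dict = {}
for group_name, group_config in groups.items():
    # Each group is mounted under its own key, e.g. `data.kit`.
    resolved = deep_merge(resolved, {group_name: group_config})

# `_self_` is last in the defaults list, so it is merged last.
resolved = deep_merge(resolved, self_config)

print(resolved["data"]["kit"], resolved["seed"])
```

In the real project, Hydra performs this merge and also resolves interpolations such as `${now:...}`, which this sketch does not attempt.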

Config Groups

Data (`conf/data/`)

Controls dataset loading. Each YAML file maps to a supported dataset.

| Key | Type | Description |
| --- | --- | --- |
| `name` | str | Dataset identifier |
| `kit` | str | Kit/scaling strategy (`PPF6C`, `GLOBALFILER`) |
| `dataset_strategy` | str | File handling strategy (`NFI_RND`, `PROVEDIT`) |
| `root` | str | Path to HID files directory |
| `annotations_path` | str/null | Path to annotation files |
| `hid_to_annotations_path` | str/null | CSV mapping HID filenames to annotations |
| `best_ladder_paths_csv` | str/null | CSV mapping samples to ladder files |
| `ladder_alleles_csv` | str/null | CSV with expected ladder alleles |
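The keys in the table above can be mirrored in a typed object for early validation. This is a hypothetical sketch: `DataConfig` is not a documented DNANet class, the `None` defaults for nullable keys are an assumption, and it treats the values listed in the table as the only valid ones, which may not hold. The `root` path and the pairing of `dnanet_rd` with `NFI_RND` are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataConfig:
    """Hypothetical typed mirror of the keys in a conf/data/*.yaml file."""

    name: str
    kit: str
    dataset_strategy: str
    root: str
    # Nullable keys (str/null in the table); None defaults are an assumption.
    annotations_path: Optional[str] = None
    hid_to_annotations_path: Optional[str] = None
    best_ladder_paths_csv: Optional[str] = None
    ladder_alleles_csv: Optional[str] = None

    def __post_init__(self) -> None:
        # Allowed values copied from the table; the real lists may be longer.
        if self.kit not in {"PPF6C", "GLOBALFILER"}:
            raise ValueError(f"unknown kit: {self.kit!r}")
        if self.dataset_strategy not in {"NFI_RND", "PROVEDIT"}:
            raise ValueError(f"unknown dataset_strategy: {self.dataset_strategy!r}")


# Placeholder values for illustration only.
cfg = DataConfig(
    name="dnanet_rd",
    kit="PPF6C",
    dataset_strategy="NFI_RND",
    root="data/hid",
)
```

Hydra supports dataclass-based structured configs, so a class along these lines could in principle be registered as a schema, though whether DNANet does so is not stated here.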

Available Data Configs

dnanet_rd — NFI R&D 2p/5p mixture dataset (PPF6C kit, 350 samples)

provedit — PROVEDIt court validation dataset (GlobalFiler kit, ~750 samples)

Model (`conf/model/`)

Defines the neural network architecture and loss function. Each block names a class in its `_target_` key, and Hydra instantiates that class with the remaining keys as constructor arguments.
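To make the `_target_` mechanism concrete, here is a simplified stand-in for `hydra.utils.instantiate`. It is a sketch only: it ignores nested targets, partial instantiation and positional arguments, and it uses a standard-library class in place of a DNANet model so that it runs without the project installed.

```python
import importlib
from typing import Any


def instantiate(node: dict) -> Any:
    """Simplified stand-in for hydra.utils.instantiate (flat configs only)."""
    kwargs = {key: value for key, value in node.items() if key != "_target_"}
    # Split "package.module.ClassName" into module path and attribute name.
    module_path, _, attr_name = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(**kwargs)


# datetime.timedelta stands in for a class such as dnanet.models.unet.UNet.
node = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}
obj = instantiate(node)
print(type(obj).__name__, obj.total_seconds())
```

The same pattern applies to the model configs in this section: `depth`, `num_filters` and similar keys would be passed as keyword arguments to the class named in `_target_`.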

`unet`: U-Net Segmentation

architecture:
  _target_: dnanet.models.unet.UNet
  depth: 4
  kernel_size: [3, 5]     # (height=dyes, width=signal)
  num_filters: 32
loss:
  _target_: dnanet.models.loss.DiceLoss

`autoencoder`: Convolutional Autoencoder

architecture:
  _target_: dnanet.models.autoencoder.PerDyeConv1dAutoencoder
  in_channels: 6
  input_length: 4096
  hidden_channels: 64
  depth: 4
  compression: 8
loss:
  _target_: torch.nn.MSELoss

Used to pretrain a compact profile representation. PeakNet later reuses the encoder output as global context.

`peak_classifier`: Peak Classification

architecture:
  _target_: dnanet.models.peak_classifier.PeakClassificationModel
  width: 120
  n_markers: 28
  embedding_dim: 8
  hidden_channels: [32, 64]
  pooling: flat
loss:
  _target_: torch.nn.CrossEntropyLoss

Used both as a standalone peak classifier and as the local branch inside PeakNet.

`peaknet`: Combined Classifier

architecture:
  _target_: dnanet.models.peaknet.CombinedClassifier
  autoencoder: ...
  peak_classifier: ...
  hidden_dims: [64, 32]
  combiner: mlp           # mlp, film, or attention
loss:
  _target_: torch.nn.CrossEntropyLoss

Combines local peak-window features from the peak classifier with full-profile context from the pretrained autoencoder's encoder.

Training (`conf/training/`)

Controls the training loop, optimizer, scheduler, and callbacks.

| Key | Type | Description |
| --- | --- | --- |
| `type` | str | Training task type: `segmentation`, `classification`, or `reconstruction` |
| `max_epochs` | int | Maximum number of training epochs |
| `batch_size` | int | Batch size |
| `learning_rate` | float | Initial learning rate |
| `weight_decay` | float | Weight decay coefficient (L2 regularization) |
| `val_fraction` | float | Fraction of data held out for validation (0.0–1.0) |
| `num_workers` | int | Number of DataLoader worker processes |
| `threshold` | float | Prediction threshold (segmentation only) |
| `n_folds` | int | Number of cross-validation folds |
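As an illustration of what `n_folds` controls, the sketch below partitions sample indices into folds so that every sample is used for validation exactly once. How DNANet actually builds its folds (shuffling, stratification, grouping by sample) is not documented here, so `kfold_indices` is a hypothetical helper showing the generic technique only.

```python
import random


def kfold_indices(
    n_samples: int, n_folds: int, seed: int
) -> list[tuple[list[int], list[int]]]:
    """Return (train, val) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Seeded shuffle so the same seed reproduces the same folds.
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


splits = kfold_indices(n_samples=10, n_folds=5, seed=42)
for fold_number, (train, val) in enumerate(splits):
    print(fold_number, len(train), len(val))
```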

Early Stopping

early_stopping:
  monitor: val/loss
  patience: 5
  min_delta: 0.01
  mode: min
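The interaction of `patience`, `min_delta` and `mode` can be sketched as follows. This assumes semantics similar to common early-stopping callbacks (a value counts as an improvement only if it beats the best value by more than `min_delta`, and training stops after `patience` consecutive checks without improvement); DNANet's exact behaviour may differ, and the loss history is made up for the example.

```python
class EarlyStopping:
    """Minimal early-stopping tracker; a sketch, not DNANet's callback."""

    def __init__(self, patience: int = 5, min_delta: float = 0.01, mode: str = "min") -> None:
        if mode not in ("min", "max"):
            raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
        self.patience = patience
        self.min_delta = min_delta
        # Flip the sign for "max" so that lower is always better internally.
        self.sign = 1.0 if mode == "min" else -1.0
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, value: float) -> bool:
        """Record one validation result; return True when training should stop."""
        score = self.sign * value
        if score < self.best - self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses; patience is shortened to 2 to keep the run small.
stopper = EarlyStopping(patience=2, min_delta=0.01, mode="min")
history = [0.50, 0.40, 0.395, 0.399, 0.30]
stopped_at = next(
    (epoch for epoch, loss in enumerate(history) if stopper.step(loss)), None
)
print(stopped_at)
```

Note that the drop from 0.40 to 0.395 does not reset the counter, because it is smaller than `min_delta`.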

Checkpointing

checkpoint:
  monitor: val/loss
  save_top_k: 1
  mode: min
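A rough sketch of what `save_top_k` and `mode` imply: after each validation run, only the best `save_top_k` checkpoints by the monitored metric are retained. The helper `update_top_k`, the file names and the loss values below are hypothetical, and the real callback would also delete the discarded files.

```python
def update_top_k(
    kept: list[tuple[float, str]],
    score: float,
    path: str,
    save_top_k: int = 1,
    mode: str = "min",
) -> list[tuple[float, str]]:
    """Return the checkpoints to keep after seeing a new (score, path) pair."""
    candidates = kept + [(score, path)]
    # Best first: ascending for "min", descending for "max".
    candidates.sort(key=lambda item: item[0], reverse=(mode == "max"))
    return candidates[:save_top_k]


kept: list[tuple[float, str]] = []
for epoch, val_loss in enumerate([0.50, 0.42, 0.45]):
    kept = update_top_k(kept, val_loss, f"epoch={epoch}.ckpt", save_top_k=1, mode="min")
print(kept)
```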

Scheduler

scheduler:
  _target_: torch.optim.lr_scheduler.ExponentialLR
  gamma: 0.8
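`ExponentialLR` multiplies the learning rate by `gamma` on every scheduler step. Assuming the scheduler is stepped once per epoch (not confirmed here) and using an example initial learning rate of 0.001, the schedule can be computed by hand:

```python
def exponential_lr(initial_lr: float, gamma: float, epoch: int) -> float:
    """Learning rate after `epoch` scheduler steps: initial_lr * gamma ** epoch."""
    return initial_lr * gamma ** epoch


# initial_lr=0.001 is an example value; gamma=0.8 comes from the config above.
schedule = [exponential_lr(0.001, 0.8, epoch) for epoch in range(4)]
for epoch, lr in enumerate(schedule):
    print(epoch, f"{lr:.6f}")
```

With `gamma: 0.8` the learning rate roughly halves every three epochs, since 0.8 cubed is 0.512.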

Evaluation (`conf/evaluation/`)

metrics:
  - pixel_precision
  - pixel_recall
  - pixel_f1_score
  - average_binary_iou
save_predictions: false
predictions_dir: predictions
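The pixel-level metrics listed above follow from true-positive, false-positive and false-negative counts on thresholded predictions. The sketch below uses flat Python lists and the standard definitions; it is an assumption that DNANet's metrics match them, the `>=` comparison and the 0.5 threshold are assumed, and only a foreground IoU is shown because the averaging behind `average_binary_iou` is not specified here.

```python
def binarize(probabilities: list[float], threshold: float = 0.5) -> list[int]:
    """Turn per-pixel probabilities into a 0/1 mask."""
    return [1 if p >= threshold else 0 for p in probabilities]


def pixel_metrics(predicted: list[int], target: list[int]) -> dict[str, float]:
    """Compute precision, recall, F1 and foreground IoU for binary masks."""
    tp = sum(1 for p, t in zip(predicted, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, target) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "pixel_precision": precision,
        "pixel_recall": recall,
        "pixel_f1_score": f1,
        "foreground_iou": iou,
    }


# Made-up probabilities and ground truth for six pixels.
probabilities = [0.9, 0.8, 0.6, 0.2, 0.1, 0.7]
target = [1, 1, 0, 0, 1, 1]
predicted = binarize(probabilities, threshold=0.5)
metrics = pixel_metrics(predicted, target)
print(metrics)
```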

Logging (`conf/logging/`)

mlflow — MLflow experiment tracking (default)

logger_type: mlflow
mlflow:
  tracking_uri: mlruns
  experiment_name: dnanet

tensorboard — TensorBoard logging

csvlogger — CSV file logging

CLI Override Examples

# Override nested keys with dot notation
dnanet training.learning_rate=0.001 training.max_epochs=100

# Override list values (quote the argument so the shell does not expand the brackets)
dnanet 'model.architecture.kernel_size=[3,7]'

# Multi-run (sweep)
dnanet -m training.learning_rate=0.001,0.0001,0.00001
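The effect of these overrides on the resolved config can be sketched in plain Python. This is an illustration of the idea, not Hydra's override parser: `apply_override` and `expand_sweep` are hypothetical helpers, the starting config values are made up, and the sweep expansion is naive in that it would split on commas inside a bracketed list.

```python
import ast
from itertools import product


def parse_value(text: str):
    """Parse a literal such as 0.001 or [3,7]; fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_override(config: dict, override: str) -> None:
    """Apply one 'a.b.c=value' override to a nested dict in place."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = parse_value(raw_value)


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated values into one override list per job (naive)."""
    choices = []
    for override in overrides:
        key, _, values = override.partition("=")
        choices.append([f"{key}={value}" for value in values.split(",")])
    return [list(job) for job in product(*choices)]


# Made-up starting values.
config = {
    "training": {"learning_rate": 0.01},
    "model": {"architecture": {"kernel_size": [3, 5]}},
}
apply_override(config, "training.learning_rate=0.001")
apply_override(config, "model.architecture.kernel_size=[3,7]")

jobs = expand_sweep(["training.learning_rate=0.001,0.0001,0.00001"])
print(config)
print(len(jobs))
```

A multi-run with several swept keys produces the Cartesian product of their values, which is why `expand_sweep` uses `itertools.product`.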
