Configuration Reference¶
DNANet uses Hydra for hierarchical configuration
composition. The master config at conf/config.yaml merges independent
config groups (data, model, training, evaluation, logging) into a single
resolved config object.
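The composition step can be pictured as an ordered merge. The sketch below is not Hydra's implementation; it is a minimal standard-library approximation, assuming each group's content lands under a key named after the group and that `_self_` marks the point where the master config's own keys are applied. The helper names (`deep_merge`, `compose`) and the trimmed group contents are hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` wins over `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def compose(defaults: list, groups: dict, primary: dict) -> dict:
    """Merge the selected option of each group in list order; `_self_` applies the primary config."""
    resolved: dict = {}
    for entry in defaults:
        if entry == "_self_":
            resolved = deep_merge(resolved, primary)
        else:
            (group, option), = entry.items()
            # Assumption: a group's content is placed under a key named after the group.
            resolved = deep_merge(resolved, {group: groups[group][option]})
    return resolved


# Hypothetical, heavily trimmed group contents; the real files hold many more keys.
groups = {
    "data": {"dnanet_rd": {"name": "dnanet_rd", "kit": "PPF6C"}},
    "model": {"unet": {"architecture": {"depth": 4, "num_filters": 32}}},
}
defaults = [{"data": "dnanet_rd"}, {"model": "unet"}, "_self_"]
primary = {"task": "train", "seed": 42}

cfg = compose(defaults, groups, primary)
print(cfg["data"]["kit"], cfg["model"]["architecture"]["depth"], cfg["task"])
```

Because `_self_` comes last in the defaults list, any key set directly in the master config would take precedence over the same key coming from a group.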
Master Config¶
# conf/config.yaml
defaults:
- data: dnanet_rd
- model: unet
- training: segmentation
- evaluation: segmentation
- logging: mlflow
- _self_
task: train # One of: train, evaluate, cross_validate
seed: 42 # Random seed for reproducibility
output_dir: outputs/${now:%Y-%m-%d_%H-%M-%S}
verbosity: INFO # DEBUG, INFO, WARNING, ERROR
checkpoint: null # Path to checkpoint (for evaluate/resume)
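As a rough illustration of the top-level keys, the sketch below checks `task` and `verbosity` against the values listed in the comments and expands the `${now:...}` pattern of `output_dir` with `datetime.strftime`, which is the formatting convention that pattern follows. The function names are hypothetical and the checks are an assumption about what a caller might want to validate, not DNANet's own code.

```python
from datetime import datetime

VALID_TASKS = {"train", "evaluate", "cross_validate"}
VALID_VERBOSITY = {"DEBUG", "INFO", "WARNING", "ERROR"}


def check_top_level(cfg: dict) -> None:
    """Raise ValueError if `task` or `verbosity` holds a value outside the documented set."""
    if cfg["task"] not in VALID_TASKS:
        raise ValueError(f"unknown task: {cfg['task']!r}")
    if cfg["verbosity"] not in VALID_VERBOSITY:
        raise ValueError(f"unknown verbosity: {cfg['verbosity']!r}")


def expand_output_dir(pattern: str, moment: datetime) -> str:
    """Replace a single `${now:<strftime format>}` placeholder with the formatted timestamp."""
    prefix, _, rest = pattern.partition("${now:")
    fmt, _, suffix = rest.partition("}")
    return prefix + moment.strftime(fmt) + suffix


cfg = {"task": "train", "seed": 42, "verbosity": "INFO", "checkpoint": None}
check_top_level(cfg)

# A fixed timestamp is used here so the result is reproducible.
run_dir = expand_output_dir("outputs/${now:%Y-%m-%d_%H-%M-%S}", datetime(2024, 1, 2, 3, 4, 5))
print(run_dir)
```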
Config Groups¶
Data (conf/data/)¶
Controls dataset loading. Each YAML file maps to a supported dataset.
| Key | Type | Description |
|---|---|---|
| name | str | Dataset identifier |
| kit | str | Kit/scaling strategy (PPF6C, GLOBALFILER) |
| dataset_strategy | str | File handling strategy (NFI_RND, PROVEDIT) |
| root | str | Path to HID files directory |
| annotations_path | str/null | Path to annotation files |
| hid_to_annotations_path | str/null | CSV mapping HID filenames to annotations |
| best_ladder_paths_csv | str/null | CSV mapping samples to ladder files |
| ladder_alleles_csv | str/null | CSV with expected ladder alleles |
| data_loading_strategy | str | raw, analyzed, or superior |
| adjustment_of_annotations | str/null | top, complete, or null |
| limit | int/null | Max images to load |
| skip_if_invalid_ladder | bool | Skip images with bad ladders |
| include_size_standard | bool | Include 6th dye (size standard) |
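The keys above can be mirrored in a typed structure that rejects values outside the documented options. This is a hedged sketch only: the `DataConfig` class, its default values, the example `root` path, and the rule that `limit` must be positive are all assumptions made for illustration and are not taken from DNANet.

```python
from dataclasses import dataclass
from typing import Optional

LOADING_STRATEGIES = ("raw", "analyzed", "superior")
ANNOTATION_ADJUSTMENTS = ("top", "complete", None)


@dataclass
class DataConfig:
    """Hypothetical typed mirror of one conf/data/*.yaml file."""

    name: str
    kit: str
    dataset_strategy: str
    root: str
    annotations_path: Optional[str] = None
    hid_to_annotations_path: Optional[str] = None
    best_ladder_paths_csv: Optional[str] = None
    ladder_alleles_csv: Optional[str] = None
    data_loading_strategy: str = "raw"  # assumed default
    adjustment_of_annotations: Optional[str] = None
    limit: Optional[int] = None
    skip_if_invalid_ladder: bool = True  # assumed default
    include_size_standard: bool = False  # assumed default

    def __post_init__(self) -> None:
        if self.data_loading_strategy not in LOADING_STRATEGIES:
            raise ValueError(f"bad data_loading_strategy: {self.data_loading_strategy!r}")
        if self.adjustment_of_annotations not in ANNOTATION_ADJUSTMENTS:
            raise ValueError(f"bad adjustment_of_annotations: {self.adjustment_of_annotations!r}")
        # Assumption: a limit, when given, must allow at least one image.
        if self.limit is not None and self.limit <= 0:
            raise ValueError("limit must be a positive integer or None")


# Example values; the root path is a placeholder.
data_cfg = DataConfig(
    name="dnanet_rd",
    kit="PPF6C",
    dataset_strategy="NFI_RND",
    root="data/hid",
    data_loading_strategy="analyzed",
    adjustment_of_annotations="top",
    limit=100,
)
print(data_cfg.name, data_cfg.data_loading_strategy, data_cfg.limit)
```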
Available Data Configs¶
dnanet_rd — NFI R&D 2p/5p mixture dataset (PPF6C kit, 350 samples)
provedit — PROVEDIt court validation dataset (GlobalFiler kit, ~750 samples)
Model (conf/model/)¶
Defines the neural network architecture and loss function using Hydra's
_target_ instantiation.
unet — U-Net Segmentation¶
architecture:
  _target_: dnanet.models.unet.UNet
  depth: 4
  kernel_size: [3, 5] # (height=dyes, width=signal)
  num_filters: 32
loss:
  _target_: dnanet.models.loss.DiceLoss
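The idea behind `_target_` can be sketched in a few lines: the dotted path is imported, and the remaining keys are passed as keyword arguments. This is a simplified stand-in for `hydra.utils.instantiate`, which handles considerably more (nested nodes, for example). The local `instantiate` helper is hypothetical, and a standard-library class replaces `dnanet.models.unet.UNet` so the example runs without the project installed.

```python
import importlib


def instantiate(node: dict):
    """Import the object named by `_target_` and call it with the other keys as keyword arguments."""
    kwargs = {key: value for key, value in node.items() if key != "_target_"}
    module_path, _, attr = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    return target(**kwargs)


# Stand-in target: datetime.timedelta plays the role of a model class here.
node = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}
obj = instantiate(node)
print(type(obj).__name__, obj.total_seconds())
```

With the real config, the `architecture` node would be handled the same way, with `depth`, `kernel_size`, and `num_filters` reaching the model constructor as keyword arguments.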
autoencoder — Convolutional Autoencoder¶
architecture:
  _target_: dnanet.models.autoencoder.PerDyeConv1dAutoencoder
  in_channels: 6
  input_length: 4096
  hidden_channels: 64
  depth: 4
  compression: 8
loss:
  _target_: torch.nn.MSELoss
Used to pretrain a compact profile representation. PeakNet later reuses the encoder output as global context.
peak_classifier — Peak Classification¶
architecture:
  _target_: dnanet.models.peak_classifier.PeakClassificationModel
  width: 120
  n_markers: 28
  embedding_dim: 8
  hidden_channels: [32, 64]
  pooling: flat
loss:
  _target_: torch.nn.CrossEntropyLoss
Used both as a standalone peak classifier and as the local branch inside PeakNet.
peaknet — Combined Classifier¶
architecture:
  _target_: dnanet.models.peaknet.CombinedClassifier
  autoencoder: ...
  peak_classifier: ...
  hidden_dims: [64, 32]
  combiner: mlp # mlp, film, or attention
loss:
  _target_: torch.nn.CrossEntropyLoss
Combines local peak-window features with full-profile context from the autoencoder encoder.
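One plausible reading of the `mlp` combiner is that the local feature vector and the global context vector are concatenated and passed through fully connected layers sized by `hidden_dims`. The sketch below only computes the resulting layer shapes under that assumption. The feature sizes (128 and 64), the class count, and the `mlp_layer_shapes` helper are hypothetical and not taken from DNANet.

```python
def mlp_layer_shapes(local_dim: int, context_dim: int, hidden_dims: list, n_classes: int) -> list:
    """Return (in_features, out_features) pairs for an MLP applied to concatenated features."""
    # Assumption: the two feature vectors are concatenated before the first layer.
    dims = [local_dim + context_dim, *hidden_dims, n_classes]
    return list(zip(dims[:-1], dims[1:]))


# Hypothetical sizes for the local branch output, the encoder output, and the label set.
shapes = mlp_layer_shapes(local_dim=128, context_dim=64, hidden_dims=[64, 32], n_classes=2)
print(shapes)
```

The `film` and `attention` options would presumably fuse the two inputs differently, so this shape calculation would not carry over to them unchanged.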
Training (conf/training/)¶
Controls the training loop, optimizer, scheduler, and callbacks.
| Key | Type | Description |
|---|---|---|
| type | str | segmentation, classification, reconstruction |
| max_epochs | int | Maximum training epochs |
| batch_size | int | Batch size |
| learning_rate | float | Initial learning rate |
| weight_decay | float | L2 regularization |
| val_fraction | float | Fraction of data for validation (0.0–1.0) |
| num_workers | int | DataLoader workers |
| threshold | float | Prediction threshold (segmentation only) |
| n_folds | int | Number of cross-validation folds |
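To make `val_fraction` and `n_folds` concrete, the sketch below splits sample indices with a seeded shuffle. It illustrates the general technique only; how DNANet actually partitions its data (for example, whether it groups or stratifies samples) is not stated here, and the function names are hypothetical.

```python
import random


def train_val_split(n_samples: int, val_fraction: float, seed: int):
    """Shuffle indices with a fixed seed and hold out `val_fraction` of them for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_samples * val_fraction))
    return indices[n_val:], indices[:n_val]


def k_fold_indices(n_samples: int, n_folds: int, seed: int):
    """Yield (train, val) index lists so that every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


# 350 matches the sample count listed for dnanet_rd; 0.2 and 5 are example settings.
train_idx, val_idx = train_val_split(n_samples=350, val_fraction=0.2, seed=42)
print(len(train_idx), len(val_idx))

fold_sizes = [(len(tr), len(va)) for tr, va in k_fold_indices(n_samples=350, n_folds=5, seed=42)]
print(fold_sizes)
```

Reusing the same seed reproduces the same split, which is the role the top-level `seed` key would be expected to play.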
Early Stopping¶
Checkpointing¶
Scheduler¶
Evaluation (conf/evaluation/)¶
metrics:
- pixel_precision
- pixel_recall
- pixel_f1_score
- average_binary_iou
save_predictions: false
predictions_dir: predictions
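The listed metrics follow the usual definitions over binary masks. The sketch below computes them on flat lists of 0/1 pixels after applying a threshold. It is an illustration under stated assumptions: pixels with probability at or above the threshold count as positive, empty denominators yield 0.0, and `average_binary_iou` is taken to be the mean IoU over images. None of these choices is confirmed as DNANet's behavior.

```python
def binarize(probabilities: list, threshold: float) -> list:
    """Mark a pixel positive when its probability is at or above the threshold (assumed rule)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def pixel_metrics(pred: list, target: list) -> dict:
    """Compute precision, recall, F1, and IoU for one pair of flat binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "pixel_precision": precision,
        "pixel_recall": recall,
        "pixel_f1_score": f1,
        "binary_iou": iou,
    }


def average_binary_iou(preds: list, targets: list) -> float:
    """Mean IoU over images (assumed meaning of `average_binary_iou`)."""
    ious = [pixel_metrics(p, t)["binary_iou"] for p, t in zip(preds, targets)]
    return sum(ious) / len(ious)


# Toy data: one five-pixel image scored against its ground-truth mask.
pred = binarize([0.9, 0.8, 0.2, 0.6, 0.1], threshold=0.5)
target = [1, 0, 0, 1, 1]
scores = pixel_metrics(pred, target)
print(pred, scores)
print(average_binary_iou([pred, [1, 1]], [target, [1, 1]]))
```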
Logging (conf/logging/)¶
mlflow — MLflow experiment tracking (default)
tensorboard — TensorBoard logging
csvlogger — CSV file logging