Training Pipeline
From raw hydrometeorological data to a trained error-correction model. This page documents the complete pipeline: data assembly, feature engineering, optimization strategy, and experiment design.
Model Architecture

Hydra v3 architecture: Feature Importance Gate, GRU encoder, Multi-Scale Temporal Convolutions, Transformer encoder with attention pooling, and regime-conditioned bias correction.
Data Assembly
NWM v2.1 Retrospective
Hourly CHRTOUT streamflow analysis from the National Water Model retrospective run (1979-2020). Provides the baseline forecast that Hydra corrects.
nwm_cms
USGS Streamflow
Hourly observed discharge from USGS gauging stations. Serves as ground truth for computing residuals and evaluating model skill.
usgs_cms
ERA5 / ERA5-Land
Meteorological reanalysis providing atmospheric and land-surface variables. 6-hourly data reindexed to hourly with nearest-neighbor (tolerance = 3h, no future leakage).
15 features
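The 6-hourly-to-hourly reindexing described above can be sketched with pandas. This is an illustration, not the project's actual code; the series name `t2m` and the sample values are hypothetical. Note that a plain nearest-neighbor reindex can match a *future* timestamp, so a strictly leakage-safe variant would use `method="ffill"` instead.

```python
import pandas as pd

# Hypothetical 6-hourly ERA5 series (name and values are illustrative).
six_hourly = pd.date_range("2019-01-01", periods=4, freq="6h")
era5 = pd.Series([0.0, 1.0, 2.0, 3.0], index=six_hourly, name="t2m")

# Reindex to the hourly grid with nearest-neighbor matching and a 3 h
# tolerance; timestamps farther than 3 h from any source point become NaN.
hourly = pd.date_range("2019-01-01 00:00", "2019-01-01 18:00", freq="1h")
era5_hourly = era5.reindex(hourly, method="nearest",
                           tolerance=pd.Timedelta("3h"))
```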
Target Variables
Residual Mode (default)
y_residual = USGS - NWM
Model predicts the NWM error; corrected flow = NWM + predicted residual
Direct Mode
y_corrected = USGS
Model directly predicts observed streamflow without explicit residual decomposition
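The two target modes can be expressed in a few lines. This is a minimal numpy sketch; the function names are illustrative, not taken from the project's codebase.

```python
import numpy as np

def make_target(usgs_cms: np.ndarray, nwm_cms: np.ndarray,
                mode: str = "residual") -> np.ndarray:
    """Build the training target for either mode."""
    if mode == "residual":
        return usgs_cms - nwm_cms   # y_residual = USGS - NWM
    return usgs_cms                 # direct mode: y_corrected = USGS

def corrected_flow(nwm_cms: np.ndarray,
                   predicted_residual: np.ndarray) -> np.ndarray:
    """Recover corrected streamflow from a residual-mode prediction."""
    return nwm_cms + predicted_residual
```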
Input Features
ERA5 Meteorological Features
Normalization
1. Z-score normalization: computed on the training split only, then applied identically to the validation and test splits. Prevents information leakage.
2. asinh target transform: applied to residual and corrected targets. Stabilizes variance across low-flow and high-flow regimes without log-domain issues at zero.
3. Per-site training: each gauge is trained independently to prevent cross-site sequence leakage.
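The leakage-safe normalization pattern can be sketched as follows. A minimal numpy illustration, assuming the actual pipeline follows the same fit-on-train, apply-everywhere convention; function names are hypothetical.

```python
import numpy as np

def fit_zscore(train_features: np.ndarray):
    """Fit normalization statistics on the training split only."""
    mu = train_features.mean(axis=0)
    sd = train_features.std(axis=0) + 1e-8  # guard against constant columns
    return mu, sd

def apply_zscore(x: np.ndarray, mu, sd) -> np.ndarray:
    """Apply the *training* statistics to any split (train/val/test)."""
    return (x - mu) / sd

def transform_target(y: np.ndarray) -> np.ndarray:
    """asinh transform: roughly linear near zero, log-like in the tails."""
    return np.arcsinh(y)

def inverse_target(z: np.ndarray) -> np.ndarray:
    """Invert the asinh transform to recover flows in physical units."""
    return np.sinh(z)
```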
Data Augmentation
During training, each input sequence has a 50% chance of being perturbed with additive Gaussian noise (sigma = 0.05). This regularizes against overfitting to exact feature values and improves generalization to unseen weather patterns.
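The augmentation step described above amounts to a coin flip per sequence. A small numpy sketch of that logic (the function name is illustrative):

```python
import numpy as np

def augment_sequence(seq: np.ndarray, rng: np.random.Generator,
                     p: float = 0.5, sigma: float = 0.05) -> np.ndarray:
    """With probability p, add Gaussian noise (std = sigma) to the
    normalized input sequence; otherwise return it unchanged."""
    if rng.random() < p:
        return seq + rng.normal(0.0, sigma, size=seq.shape)
    return seq
```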
Data Splits
Training
2010-01-01 to 2017-12-31
8 years of hourly data per site (~70,000 samples)
Validation
2018-01-01 to 2018-12-31
1 year for early stopping and hyperparameter selection
Test
2019-01-01 to 2020-12-31
2 years held out for final evaluation (never seen during training)
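The chronological splits above can be applied to a time-indexed frame with simple label slicing. A pandas sketch under the assumption that each site's data carries a DatetimeIndex; the end timestamps include the final hour of each split.

```python
import pandas as pd

# Split boundaries from the table above (both endpoints inclusive).
SPLITS = {
    "train": ("2010-01-01", "2017-12-31 23:00"),
    "val":   ("2018-01-01", "2018-12-31 23:00"),
    "test":  ("2019-01-01", "2020-12-31 23:00"),
}

def split_by_date(df: pd.DataFrame) -> dict:
    """Slice a time-indexed frame into non-overlapping chronological splits."""
    return {name: df.loc[start:end] for name, (start, end) in SPLITS.items()}
```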
Training Configuration
- Epochs: maximum training epochs
- Batch size: sequences per gradient step
- Sequence length: 7-day input window
- Learning rate: initial learning rate
- Patience: early-stopping epochs
Optimizer
- Primary: Ranger (RAdam + Lookahead)
- Fallback: AdamW
- Weight decay: 5e-5
- LR schedule (Ranger): ReduceOnPlateau (patience=3, factor=0.5)
- LR schedule (AdamW): CosineAnnealing (T_max = epochs)
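The plateau schedule's semantics can be illustrated without any framework code. This pure-Python sketch mirrors the stated patience=3, factor=0.5 behavior (the real pipeline presumably uses PyTorch's `ReduceLROnPlateau`); the class name is hypothetical.

```python
class PlateauLR:
    """Sketch of ReduceOnPlateau semantics: halve the learning rate once
    the validation loss fails to improve for more than `patience`
    consecutive epochs."""

    def __init__(self, lr: float, patience: int = 3, factor: float = 0.5):
        self.lr, self.patience, self.factor = lr, patience, factor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```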
Training Details
- Mixed precision: FP16 via torch.autocast (GPU)
- Gradient clipping: max_norm = 1.0
- Best model: checkpoint with lowest validation loss
- Target transform: asinh(x) for variance stabilization
- Sites: trained independently per gauge
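Global-norm gradient clipping with max_norm = 1.0 works as follows. A numpy sketch of the same rule `torch.nn.utils.clip_grad_norm_` applies: if the L2 norm over all parameter gradients exceeds the threshold, every gradient is rescaled proportionally.

```python
import numpy as np

def clip_grad_norm(grads: list, max_norm: float = 1.0) -> list:
    """Rescale all gradients by max_norm / total_norm when the global
    L2 norm exceeds max_norm; leave them untouched otherwise."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]
```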
Multi-Objective Loss
The training loss combines multiple objectives. Core terms are always active; optional terms are enabled per-experiment. A LossAutoNormalizer uses exponential moving averages to keep all components at unit scale, so weights act as pure priority signals.
Gaussian NLL (core)
Heteroscedastic negative log-likelihood on residual and corrected predictions. Learns per-sample uncertainty.
Consistency Loss (core)
MSE between the corrected prediction and observed streamflow. Ensures residual + NWM aligns with the direct correction.
Non-Negativity Penalty (physics)
Physics constraint penalizing negative streamflow: relu(-Q)^2. Streamflow cannot be negative.
NSE Surrogate (hydrology)
Differentiable Nash-Sutcliffe Efficiency proxy. Directly optimizes the standard hydrological skill metric.
KGE Stabilizer (hydrology)
Kling-Gupta decomposition into correlation, variability-ratio, and bias-ratio components.
Quantile Pinball (uncertainty)
Pinball loss for probabilistic prediction intervals. Calibrates uncertainty quantiles.
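Several of these components, and the EMA-based auto-normalizer, can be sketched compactly. A numpy illustration only: the actual losses are presumably implemented in PyTorch, and the `LossAutoNormalizer` internals shown here (EMA of each term's magnitude, divided out so weights become pure priorities) are an assumption consistent with the description above.

```python
import numpy as np

def gaussian_nll(obs, mu, log_var):
    """Heteroscedastic Gaussian NLL (constant term dropped)."""
    return np.mean(0.5 * (log_var + (obs - mu) ** 2 / np.exp(log_var)))

def nonneg_penalty(q_pred):
    """Physics constraint: relu(-Q)^2 penalizes negative streamflow."""
    return np.mean(np.maximum(-q_pred, 0.0) ** 2)

def nse_surrogate(obs, pred, eps=1e-8):
    """Differentiable 1 - NSE proxy; minimizing drives NSE toward 1."""
    return np.sum((obs - pred) ** 2) / (np.sum((obs - obs.mean()) ** 2) + eps)

def pinball(obs, pred_q, tau):
    """Pinball (quantile) loss for the tau-th prediction-interval bound."""
    d = obs - pred_q
    return np.mean(np.maximum(tau * d, (tau - 1.0) * d))

class LossAutoNormalizer:
    """Track an EMA of each component's magnitude and divide it out,
    keeping every term near unit scale so weights act as priorities."""

    def __init__(self, beta: float = 0.99):
        self.beta, self.ema = beta, {}

    def __call__(self, name: str, value: float) -> float:
        prev = self.ema.get(name, abs(value))
        self.ema[name] = self.beta * prev + (1.0 - self.beta) * abs(value)
        return value / (self.ema[name] + 1e-8)
```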
Experiment Design
19 experiments across 3 study sites systematically ablate architecture, training strategy, and input features. Each experiment uses identical data splits and evaluation protocol.
Architecture Ablation
Compare encoder architectures while holding inputs and training procedure constant.
- LSTM
- Transformer-only
- GRU-Transformer v2
- Hydra v3
Training Ablation
Vary training constraints and data sampling strategies on the best architecture.
- Causal attention mask
- Non-negativity constraint
- Event oversampling (3x Q90+)
- Combined configurations
Input Feature Ablation
Test which input sources drive predictive skill.
- NWM + ERA5 (standard)
- ERA5-only (no NWM)
- USGS + NWM + ERA5
- USGS + ERA5 (no NWM)
Now that you understand the training pipeline, see how different configurations perform.
Explore Experiments