Training Pipeline

From raw hydrometeorological data to a trained error-correction model. This page documents the complete pipeline: data assembly, feature engineering, optimization strategy, and experiment design.

Model Architecture

Hydra v3 architecture: Feature Importance Gate, GRU encoder, Multi-Scale Temporal Convolutions, Transformer encoder with attention pooling, and regime-conditioned bias correction.

Data Assembly

NWM v2.1 Retrospective (nwm_cms)

Hourly CHRTOUT streamflow analysis from the National Water Model retrospective run (1979-2020). Provides the baseline forecast that Hydra corrects.

USGS Streamflow (usgs_cms)

Hourly observed discharge from USGS gauging stations. Serves as ground truth for computing residuals and evaluating model skill.

ERA5 / ERA5-Land (15 features)

Meteorological reanalysis providing atmospheric and land-surface variables. 6-hourly data are reindexed to hourly with nearest-neighbor matching (tolerance = 3 h, no future leakage).
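The reindexing step can be sketched with pandas. This is a minimal example with made-up values and dates; the `temp_c` column name comes from the feature list below, everything else is illustrative:

```python
import numpy as np
import pandas as pd

# Synthetic 6-hourly ERA5 series for one site (values are made up)
idx6 = pd.date_range("2010-01-01", periods=8, freq="6h")
era5 = pd.DataFrame({"temp_c": np.linspace(0.0, 7.0, 8)}, index=idx6)

# Hourly target index matching the NWM/USGS records
hourly = pd.date_range("2010-01-01", "2010-01-02 18:00", freq="1h")

# Nearest-neighbor reindex; any hour farther than 3 h from an ERA5
# record becomes NaN rather than borrowing a distant value
era5_hourly = era5.reindex(hourly, method="nearest", tolerance=pd.Timedelta("3h"))
```

With a 6-hourly source, every hourly timestamp sits within 3 h of some record, so no gaps appear here; the tolerance matters when source records are missing.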

Target Variables

Residual Mode (default)

y_residual = USGS - NWM

Model predicts the NWM error; corrected flow = NWM + predicted residual

Direct Mode

y_corrected = USGS

Model directly predicts observed streamflow without explicit residual decomposition
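The two target modes reduce to simple arithmetic; a small numpy sketch with made-up flow values:

```python
import numpy as np

usgs = np.array([12.0, 15.5, 9.8])    # observed discharge (ground truth)
nwm = np.array([10.0, 14.0, 11.0])    # NWM baseline forecast

# Residual mode (default): the model learns the NWM error ...
y_residual = usgs - nwm
# ... and at inference the corrected flow adds the predicted residual back
corrected = nwm + y_residual          # recovers USGS exactly for a perfect residual

# Direct mode: the model predicts observed flow itself
y_direct = usgs
```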

Input Features

ERA5 Meteorological and Temporal Features

  • Temperature: air temperature at 2 m (temp_c)
  • Dewpoint: dewpoint temperature at 2 m (dewpoint_c)
  • Pressure: surface pressure (pressure_hpa)
  • Precipitation: total hourly precipitation (precip_mm)
  • Radiation: surface solar radiation (radiation_mj_m2)
  • Wind Speed: 10 m wind speed (wind_speed)
  • VPD: vapor pressure deficit (vpd_kpa)
  • Humidity: relative humidity (rel_humidity_pct)
  • Soil Moisture: volumetric water content (soil_moisture_vwc)
  • Hour Encoding: cyclical hour of day (hour_sin/cos)
  • Day-of-Year: cyclical day of year (doy_sin/cos)
  • Month Encoding: cyclical month (month_sin/cos)
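The cyclical encodings map each time index onto the unit circle so that period boundaries stay adjacent; a minimal sketch for the hour-of-day pair:

```python
import numpy as np

hours = np.arange(24)
hour_sin = np.sin(2 * np.pi * hours / 24)
hour_cos = np.cos(2 * np.pi * hours / 24)

# 23:00 and 00:00 land next to each other on the circle, unlike the raw
# values 23 and 0; the same construction gives doy_sin/cos and month_sin/cos
```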

Normalization

  1. Z-score normalization: computed on the training split only, then applied identically to the validation and test splits. Prevents information leakage.
  2. asinh target transform: applied to the residual and corrected targets. Stabilizes variance across low-flow and high-flow regimes without log-domain issues at zero.
  3. Per-site training: each gauge is trained independently to prevent cross-site sequence leakage.
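Steps 1 and 2 can be sketched in numpy on synthetic data (np.arcsinh is the asinh transform and np.sinh its exact inverse, which matters when mapping predictions back to flow units):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, (1000, 3))   # synthetic feature matrix
X_val = rng.normal(5.0, 2.0, (200, 3))
y_train = rng.normal(0.0, 50.0, 1000)       # synthetic residual target

# Step 1: z-score statistics from the TRAINING split only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_z = (X_train - mu) / sigma
X_val_z = (X_val - mu) / sigma              # same stats reused: no leakage

# Step 2: asinh transform on the target; ~linear near 0, ~log for large |y|
y_train_t = np.arcsinh(y_train)
y_restored = np.sinh(y_train_t)             # exact inverse for postprocessing
```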

Data Augmentation

During training, each input sequence has a 50% chance of being perturbed with additive Gaussian noise (sigma = 0.05). This regularizes against overfitting to exact feature values and improves generalization to unseen weather patterns.
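A minimal sketch of this augmentation (the `augment` function name is hypothetical; shapes follow the 168-hour, 15-feature windows described on this page):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(seq: np.ndarray, p: float = 0.5, sigma: float = 0.05) -> np.ndarray:
    """With probability p, perturb a (time, features) window with Gaussian noise."""
    if rng.random() < p:
        return seq + rng.normal(0.0, sigma, seq.shape)
    return seq

window = np.zeros((168, 15))   # one normalized 7-day input window
augmented = augment(window)    # returned unchanged or jittered, 50/50
```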

Data Splits

Timeline: 2010 → Train (8 yr) → 2018 → Validation (1 yr) → 2019 → Test (2 yr) → 2021

Training

2010-01-01 to 2017-12-31

8 years of hourly data per site (~70,000 samples)

Validation

2018-01-01 to 2018-12-31

1 year for early stopping and hyperparameter selection

Test

2019-01-01 to 2020-12-31

2 years held out for final evaluation (never seen during training)
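With pandas, the date-based splits are one-liners on a datetime-indexed frame (synthetic data here; the row counts recover the ~70,000 training samples quoted above):

```python
import numpy as np
import pandas as pd

# Synthetic hourly frame spanning the full study period
idx = pd.date_range("2010-01-01", "2020-12-31 23:00", freq="1h")
df = pd.DataFrame({"q": np.arange(len(idx), dtype=float)}, index=idx)

# Date-based splits; end dates are inclusive with partial-string slicing
train = df.loc["2010-01-01":"2017-12-31"]   # 8 years -> 70,128 hourly rows
val = df.loc["2018-01-01":"2018-12-31"]     # 1 year  ->  8,760 hourly rows
test = df.loc["2019-01-01":"2020-12-31"]    # 2 years -> 17,544 hourly rows
```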

Training Configuration

  • Epochs: 40 (maximum training epochs)
  • Batch size: 64 (sequences per gradient step)
  • Sequence length: 168 h (7-day input window)
  • Learning rate: 5e-4 (initial)
  • Patience: 5 (early-stopping epochs)

Optimizer

  • Primary: Ranger (RAdam + Lookahead)
  • Fallback: AdamW
  • Weight decay: 5e-5
  • LR schedule (Ranger): ReduceLROnPlateau (patience=3, factor=0.5)
  • LR schedule (AdamW): CosineAnnealingLR (T_max = epochs)
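Since Ranger ships as a third-party package, here is a sketch of the documented fallback path only: AdamW with cosine annealing over the full epoch budget. The Linear model is a stand-in, not the Hydra network:

```python
import torch

model = torch.nn.Linear(15, 1)   # stand-in for the Hydra network
EPOCHS = 40

opt = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=5e-5)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... forward/backward over training batches ...
    opt.step()    # dummy step here so the optimizer/scheduler ordering is valid
    sched.step()  # anneal the LR once per epoch, reaching ~0 at T_max
```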

Training Details

  • Mixed precision: FP16 via torch.autocast (GPU)
  • Gradient clipping: max_norm = 1.0
  • Best model: checkpoint with the lowest validation loss
  • Target transform: asinh(x) for variance stabilization
  • Sites: trained independently per gauge
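These pieces combine into a standard mixed-precision training step. A sketch under stated assumptions: the GRU-plus-head model is a stand-in rather than the Hydra v3 architecture, and the autocast/GradScaler path is only active when a GPU is present:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in network: a GRU encoder with a linear head (NOT the full Hydra v3)
model = torch.nn.GRU(15, 32, batch_first=True).to(device)
head = torch.nn.Linear(32, 1).to(device)
params = list(model.parameters()) + list(head.parameters())
opt = torch.optim.AdamW(params, lr=5e-4, weight_decay=5e-5)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # no-op on CPU

x = torch.randn(64, 168, 15, device=device)  # (batch, 168 h window, features)
y = torch.randn(64, 1, device=device)

opt.zero_grad()
with torch.autocast(device_type=device, enabled=(device == "cuda")):  # FP16 on GPU
    hidden, _ = model(x)
    loss = torch.nn.functional.mse_loss(head(hidden[:, -1]), y)

scaler.scale(loss).backward()
scaler.unscale_(opt)                                  # clip in unscaled grad space
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)  # max_norm = 1.0 per the page
scaler.step(opt)
scaler.update()
```

Unscaling before `clip_grad_norm_` matters: clipping scaled FP16 gradients would apply the wrong effective threshold.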

Multi-Objective Loss

The training loss combines multiple objectives. Core terms are always active; optional terms are enabled per-experiment. A LossAutoNormalizer uses exponential moving averages to keep all components at unit scale, so weights act as pure priority signals.
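The LossAutoNormalizer internals are not shown on this page; the following is one plausible sketch of the described mechanism (an exponential moving average of each term's magnitude, divided out so the configured weights act as pure priorities), not the project's actual implementation:

```python
import torch

class LossAutoNormalizer:
    """Assumed mechanism: track an EMA of each loss term's magnitude and
    divide it out, so each normalized term has roughly unit scale."""

    def __init__(self, names, beta=0.99, eps=1e-8):
        self.beta, self.eps = beta, eps
        self.ema = {name: None for name in names}

    def __call__(self, name, value):
        mag = value.detach().abs()  # never backprop through the running stats
        prev = self.ema[name]
        self.ema[name] = mag if prev is None else self.beta * prev + (1 - self.beta) * mag
        return value / (self.ema[name] + self.eps)

norm = LossAutoNormalizer(["nll", "consistency"])
# Raw terms at wildly different scales become comparable after normalization,
# so the 1.0 and 0.5 weights act as priorities rather than unit corrections
total = 1.0 * norm("nll", torch.tensor(250.0)) + 0.5 * norm("consistency", torch.tensor(0.02))
```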

Gaussian NLL (core, always active)

Heteroscedastic negative log-likelihood on the residual and corrected predictions. Learns per-sample uncertainty.

Consistency Loss (core, always active)

MSE between the corrected prediction and observed streamflow. Ensures that residual + NWM aligns with the direct correction.

Non-Negativity Penalty (physics, optional)

Physics constraint penalizing negative streamflow: relu(-Q)^2. Streamflow cannot be negative.
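The penalty is a one-liner; a numpy sketch (a training version would use differentiable tensor ops):

```python
import numpy as np

def nonneg_penalty(q_pred: np.ndarray) -> float:
    """relu(-Q)^2 averaged over the batch: zero whenever all flows are >= 0."""
    return float(np.mean(np.clip(-q_pred, 0.0, None) ** 2))

nonneg_penalty(np.array([3.0, 0.0, -2.0]))  # only the -2.0 entry is penalized
```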

NSE Surrogate (hydrology, optional)

Differentiable Nash-Sutcliffe Efficiency proxy. Directly optimizes the standard hydrological skill metric.
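For reference, the metric being proxied, in numpy (a training surrogate would minimize 1 - NSE with differentiable tensor ops; the epsilon guards against a constant observation series):

```python
import numpy as np

def nse(sim: np.ndarray, obs: np.ndarray, eps: float = 1e-8) -> float:
    """Nash-Sutcliffe Efficiency: 1.0 is perfect; 0.0 matches the obs mean."""
    return float(1.0 - np.sum((sim - obs) ** 2)
                 / (np.sum((obs - obs.mean()) ** 2) + eps))
```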

KGE Stabilizer (hydrology, optional)

Kling-Gupta Efficiency decomposition into correlation, variability-ratio, and bias-ratio components.

Quantile Pinball (uncertainty, optional)

Pinball loss for probabilistic prediction intervals. Calibrates the uncertainty quantiles.
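A numpy sketch of the pinball loss for a single quantile (the specific quantile levels used by Hydra are not stated on this page; the tau values in the comment are illustrative):

```python
import numpy as np

def pinball(y: np.ndarray, y_hat: np.ndarray, tau: float) -> float:
    """Quantile loss: under-prediction costs tau, over-prediction (1 - tau)."""
    err = y - y_hat
    return float(np.mean(np.maximum(tau * err, (tau - 1.0) * err)))

# Summing this over several quantiles, e.g. tau in (0.05, 0.5, 0.95),
# trains lower, median, and upper interval bounds together.
```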

Experiment Design

19 experiments across 3 study sites systematically ablate architecture, training strategy, and input features. Each experiment uses identical data splits and evaluation protocol.

Architecture Ablation

Compare encoder architectures while holding inputs and training procedure constant.

  • LSTM
  • Transformer-only
  • GRU-Transformer v2
  • Hydra v3

Training Ablation

Vary training constraints and data sampling strategies on the best architecture.

  • Causal attention mask
  • Non-negativity constraint
  • Event oversampling (3x Q90+)
  • Combined configurations

Input Feature Ablation

Test which input sources drive predictive skill.

  • NWM + ERA5 (standard)
  • ERA5-only (no NWM)
  • USGS + NWM + ERA5
  • USGS + ERA5 (no NWM)

Now that you understand the training pipeline, see how different configurations perform.

Explore Experiments

Master's Thesis Project | Appalachian State University | 2024–2025

Hydra Transformer for NWM Streamflow Error Correction