How do you decide what to test and monitor in a pipeline, given you cannot check everything?
Quality-engineering judgment: risk-weighted coverage versus checkbox testing.
Weight coverage by blast radius and detectability. Tables feeding executive dashboards, financial reporting, and ML training get the strictest coverage. Silent failure modes (a join quietly going one-to-many, volume drifting down) get monitors because no one will report them. Standard layers: schema and uniqueness tests at the transformation layer, freshness and volume monitors at the platform level, anomaly detection on business-critical metrics. Name the discipline of pruning noisy alerts: a monitor nobody trusts is worse than no monitor.
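As a rough illustration of the two silent failure modes named above, here is a minimal sketch in plain Python. The function names (`fanout_keys`, `volume_dropped`), the `customer_id` column, and the 50% tolerance are hypothetical choices made for this example, not part of any particular tool. In practice these checks usually live in a transformation framework's uniqueness tests or a platform's volume monitors.

```python
from collections import Counter
from statistics import median


def fanout_keys(rows, key):
    """Return the key values that occur more than once in `rows`.

    If a join assumes one row per key on this side, any value returned
    here would silently turn the join one-to-many.
    """
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def volume_dropped(history, today, tolerance=0.5):
    """Return True when today's row count is below tolerance * median(history).

    `tolerance` is an assumed threshold; tune it per table so the
    monitor stays quiet on normal variation.
    """
    if not history:
        # No baseline yet, so there is nothing to compare against.
        return False
    return today < tolerance * median(history)


# Hypothetical dimension table that should have one row per customer_id.
customers = [
    {"customer_id": 1},
    {"customer_id": 2},
    {"customer_id": 2},  # duplicate key: would fan out a join
]
duplicates = fanout_keys(customers, "customer_id")

# Hypothetical daily row counts for a fact table.
recent_counts = [1000, 1040, 980]
alert = volume_dropped(recent_counts, today=400)
```

Comparing against the median instead of the mean keeps one unusual day in the history from shifting the baseline, which is one small way to reduce the noisy alerts the answer warns about.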