Why Dashboards Keep Missing What Matters

June 29, 2026 | 12 min read

More than a thousand sensors in a wind turbine. Drivetrain temperature. Blade pitch. The data is there. The decisions are not.

Operators watch dashboards built across three timescales. Supervisory Control and Data Acquisition (SCADA) telemetry rolls up to 10-minute averages. Vibration spectra arrive on hourly or daily reports. Asset performance gets reviewed weekly, monthly, or after a failure. Each layer is useful. None of them see what is about to happen.

The pattern is familiar. A turbine performs inside every threshold. The dashboard stays green. A bearing fails anyway. Multi-million dollar damage occurs before the first alert flashes.

Disruptions and faults are not anomalies. They are part of the operation. A cloud crosses a solar farm. A tree drops on a power line. A bearing enters distress. Assets fail every day, and operators handle them after the fact. Speed of detection separates a managed event from a catastrophe.

The interval between fault and response is where catastrophes live. Ververica removes the interval. Sensor data, context, and inference run as one continuous stream, sub-second end to end. Detection happens at the speed the equipment changes, not the speed the dashboard refreshes. Informed decisions are made earlier and faster. Fixes move from emergency repairs to scheduled tasks. Outages averted. Ververica puts the operator more in control.

Figure one: Modern Wind Turbines Data Production

The Limits Of Average

The default SCADA output in commercial wind turbines is 10-minute averages of 1 Hz signals. Operators work on information that reduces 600 samples to four metrics: mean, max, min, and standard deviation.

Limiting data points saves bandwidth and storage, but it hides almost everything that matters for early fault detection. A faulty bearing does not raise its average temperature for weeks. It produces brief, intermittent oscillations. The averaging erases them. The operator sees nothing. The fault is already forming.
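The effect of the 10-minute reduction can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real turbine: the 60 °C baseline, the 10-second burst, and its ±2 °C amplitude are all made-up values, and `summarize` and `ten_minute_window` are hypothetical helper names.

```python
import statistics

def summarize(samples):
    """Reduce a window of samples to the four SCADA-style statistics."""
    return {
        "mean": statistics.fmean(samples),
        "max": max(samples),
        "min": min(samples),
        "std": statistics.pstdev(samples),
    }

def ten_minute_window(burst=False):
    """Return 600 one-second temperature samples (deg C, illustrative values)."""
    samples = [60.0] * 600              # 600 s at 1 Hz, flat baseline
    if burst:
        for i in range(300, 310):       # hypothetical 10-second oscillation
            samples[i] += 2.0 if i % 2 == 0 else -2.0
    return samples

healthy = summarize(ten_minute_window())
faulty = summarize(ten_minute_window(burst=True))

print("mean:", healthy["mean"], "vs", faulty["mean"])
print("faulty max/min/std:", faulty["max"], faulty["min"], round(faulty["std"], 3))
```

In this toy case the mean is identical in both windows, so any rule on the average sees nothing. The max, min, and standard deviation move slightly, but they carry no information about when the oscillation happened, how long it lasted, or whether it repeats.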

A 2023 SAGE review by Pandit, Astolfi and colleagues said it plainly. Abnormal vibration and anomalous electrical behavior are very hard to detect from data averaged over 10 minutes. The signal exists. The pipeline destroys it.

Operators compensate with parallel condition monitoring systems. Vibration spectra. Oil particle analysis. Acoustic emission. Parallel systems mean extra cost, and that cost is often the first to be cut, or is never approved in the first place. Worse, they sit on separate infrastructure, separate dashboards, separate review cycles. Correlation happens manually, after the fact, by an engineer working on vibes.

The pattern is not unique to wind. The same problem hits every plant type. Grid operators report the same gap between SCADA polling cycles, which run at 1, 10, or 20 seconds, and phasor measurement units that sample at 10 to 20 Hz. The high-resolution signal sees voltage peaks and frequency excursions the slower system averages away.

Rules Catch What You Already Know

Threshold-based alarms are the operator's standard tool. Gearbox oil temperature above the manufacturer’s limit. Generator winding above the insulation class rating. Vibration RMS above the alarm setpoint. Each rule encodes a known failure mode.

The problem is everything else.

A misaligned main bearing transfers thrust load to the gearbox. Internal clearances grow. Planetary alignment drifts. None of these states cross a threshold. They emerge from a relationship between variables: rotor speed, temperature gradients, subtle shifts in power curve residuals.

A study of this topic found that fixed-threshold methods produce both missed detections and false positives. The operating envelope of a turbine is not a box. It is a moving surface. It changes with wind, season, age, and load history.

Rules are expensive to maintain. Every new turbine model brings new thresholds. Every site has its own quirks. Every false alarm erodes operator trust in the system that raised it.

What Machine Learning Actually Does

The shift from rules to learned models does not replace engineering judgment. It recognizes patterns an engineer cannot enumerate.

A model trained on healthy operating data learns the relationship between hundreds of variables across the operating envelope. When the turbine deviates, the model flags it, even when no single variable crosses a threshold.

The published results are concrete. Regression models for generator bearing temperature, built on power output, nacelle temperature, and shaft speed, flagged catastrophic bearing failures 25 days early. LSTM autoencoders on vibration data reported outlier detection around 97% on real wind farm data. Other studies forecast remaining useful life weeks out, accurate to within hours.

These approaches share a requirement traditional architectures cannot meet. They need data at the resolution where the signal lives. Joined across components. Joined with weather and operating context. Processed continuously, not in nightly batches. This is where data streaming changes the equation.

Rules and learned models are not alternatives. The mature architecture runs both, on the same stream. Complex event processing (CEP) catches multi-step patterns operators already know. Two consecutive overtemperature readings inside a window. Two turbines on one site crossing a wind threshold within a minute. Learned models catch the rest. Each surfaces what the other misses.
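One of the patterns named above, two consecutive overtemperature readings inside a window, can be sketched in plain Python. This is not a CEP engine API; `Reading`, `consecutive_overtemp`, the limit, and the window length are hypothetical, and the sketch assumes readings arrive in event-time order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    asset_id: str
    event_time_s: float
    temp_c: float

def consecutive_overtemp(readings, limit_c, within_s):
    """Yield (first, second) when one asset reports two consecutive readings
    above limit_c that are at most within_s seconds apart."""
    last_hot = {}  # asset_id -> previous reading, kept only if it was over the limit
    for r in readings:
        if r.temp_c > limit_c:
            prev = last_hot.get(r.asset_id)
            if prev is not None and r.event_time_s - prev.event_time_s <= within_s:
                yield prev, r
            last_hot[r.asset_id] = r
        else:
            # A normal reading breaks the "consecutive" requirement.
            last_hot.pop(r.asset_id, None)
```

A call such as `list(consecutive_overtemp(stream, limit_c=95.0, within_s=60.0))` would return the matching pairs. Three hot readings in a row produce two overlapping matches, which a real deployment might want to collapse into one alert.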

This is the architecture Ververica was built for. One engine. One stream. CEP patterns and trained models running side by side against the same sensor data, with the same event-time semantics, the same state, the same delivery guarantees. Rules and models stop being separate systems.

You Already Have A Streaming Stack

Fair assumption. Kafka moves the events. Spark Structured Streaming runs micro-batch jobs. A feature store holds the model inputs. Three mature tools, already in the building.

That stack does not do what multi-million dollar industrial equipment requires.

A trigger interval of a few seconds to minutes is blindness by design, repeated forever. The bearing does not wait for the next batch boundary. Event-time correlation across sensor, weather, and maintenance streams is exactly the join Spark does poorly, and Kafka does not do at all. The feature store splits your logic in two. One code path computes features for training. Another recomputes them at serving time. When the two silently drift apart, your model misses a fault and nobody notices.

Flink was built for this shape of problem. True one-event-at-a-time processing. Native event-time semantics with watermarks: a late sensor reading lands in the right window instead of the wrong one. Managed keyed state per asset, held in the engine, not in a database you query on every event. One feature definition, computed once, used for the backtest and the live score. CEP rules and the learned models run in the same job, against the same stream, with the same delivery guarantees.
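The event-time idea can be illustrated without any streaming engine. The sketch below is plain Python, not the Flink API: `EventTimeWindows` is a hypothetical class that assigns readings to tumbling windows by their own timestamp and closes a window only once a single global watermark (the highest event time seen minus an allowed lateness) has passed its end. Per-asset state lives in a dictionary here, standing in for keyed state.

```python
from collections import defaultdict

class EventTimeWindows:
    """Toy tumbling event-time windows with one global watermark."""

    def __init__(self, window_s, max_lateness_s):
        self.window_s = window_s
        self.max_lateness_s = max_lateness_s
        self.watermark = float("-inf")
        self.open = defaultdict(list)   # (asset_id, window_start) -> values

    def add(self, asset_id, event_time_s, value):
        """Add one reading; return the windows this reading caused to close."""
        window_start = event_time_s - (event_time_s % self.window_s)
        if window_start + self.window_s <= self.watermark:
            return []                   # too late: that window was already emitted
        self.open[(asset_id, window_start)].append(value)
        # The watermark trails the newest event time by the allowed lateness.
        self.watermark = max(self.watermark, event_time_s - self.max_lateness_s)
        ready = sorted(k for k in self.open
                       if k[1] + self.window_s <= self.watermark)
        return [(asset, start, self.open.pop((asset, start)))
                for asset, start in ready]
```

With `window_s=10` and `max_lateness_s=5`, a reading stamped 8 that arrives after a reading stamped 12 still lands in the 0 to 10 window, because the watermark has not yet passed 10. A processing-time system would have placed it in whichever batch happened to be open when it arrived.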

The Architecture Problem

A typical wind farm monitoring stack looks like this:

SCADA writes to a historian.
Vibration data lands in a separate condition monitoring database.
Weather data comes from a third source.
Maintenance logs sit in an asset management system.
Reporting happens in a BI tool that pulls from all of them on a schedule.

A model that needs to correlate signals across these systems in real time cannot. This architecture is the failure. The data exists. It does not arrive together. It does not arrive fresh.

Streaming changes the topology. Sensor data, weather feeds, control system events, and maintenance records flow into one continuous pipeline. Models score each turbine continuously, against its current context, against the rest of the fleet. Anomalies surface within seconds, not days. Engineers receive ranked alerts with the contributing signals attached, not a wall of green punctuated by an unexplained red.

The same architecture supports the historical work. Models train on years of data. Backtests run against the full event history. The lakehouse holds what happened. The stream holds what is happening. Both share one definition of the data.

One pipeline ingests sensor events. Multiple detectors run on the same stream in parallel: a health scoring window keyed by asset, a power curve deviation tracker, a yaw alignment monitor, a grid frequency monitor keyed by site, and a CEP engine watching for known multi-event patterns. Outputs feed one dashboard, one alert feed, one control loop.

Figure two: Reference architecture: monitoring and risk classification in real-time using the Ververica Platform

The Number That Matters

Most people frame the argument as a cost question. A failure carries a measurable cost. State it as a consequence.

A single onshore gearbox failure costs $250,000 to $300,000 in repair. Gearbox failure rates run around 0.154 events per turbine per year. Annual main bearing failure rates are in the range of 3 to 6%. Crane mobilization adds $350,000 per week. Offshore vessels run $150,000 to $300,000 per day, with a month of mobilization typical. One case study reported $820,000 in lost revenue across three turbines over 28 months from main bearing failures alone.

A bearing run to failure transfers loads to the gearbox. The gearbox absorbs the cost. A blade liberation event can damage the tower. Insurance responds.

Against those numbers, one avoided catastrophic failure pays for a streaming pipeline and a fleet of trained models.
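The gearbox figures quoted above can be turned into a rough expected annual cost. The failure rate, repair range, and crane rate come from the text; the fleet size of 100 turbines and the assumption of one crane week per failure are illustrative inputs, not sourced numbers.

```python
GEARBOX_FAILURES_PER_TURBINE_YEAR = 0.154      # from the text
GEARBOX_REPAIR_USD = (250_000, 300_000)        # low, high repair cost from the text
CRANE_USD_PER_WEEK = 350_000                   # from the text

def expected_annual_gearbox_cost(turbines, crane_weeks_per_failure=0.0):
    """Return (expected failures per year, low cost, high cost) for a fleet.

    crane_weeks_per_failure is an assumption supplied by the caller.
    """
    expected_failures = turbines * GEARBOX_FAILURES_PER_TURBINE_YEAR
    crane_cost = crane_weeks_per_failure * CRANE_USD_PER_WEEK
    low, high = (expected_failures * (repair + crane_cost)
                 for repair in GEARBOX_REPAIR_USD)
    return expected_failures, low, high

# Hypothetical 100-turbine onshore fleet.
failures, low, high = expected_annual_gearbox_cost(100)
print(f"{failures:.1f} failures/yr, ${low:,.0f} to ${high:,.0f} in repairs")

_, low_crane, high_crane = expected_annual_gearbox_cost(100, crane_weeks_per_failure=1)
print(f"with one crane week each: ${low_crane:,.0f} to ${high_crane:,.0f}")
```

Under those assumptions, repairs alone come to roughly $3.85 million to $4.62 million per year for the fleet, before lost revenue is counted.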

The calculation is not niche. It applies wherever assets are expensive, downtime is expensive, and failure modes are too subtle for rules to pre-enumerate. Wind. Hydro. Power transmission. Petrochemicals. Heavy industry. Rail. Data centers. Same structure. Same cost. Same arithmetic.

What To Build, Not What To Buy

For data and platform engineers, the practical question is what changes in the architecture.

Four things, in order.

First, data resolution. Move the high-frequency signals off the historian and into the stream. The 10-minute average is a reporting artifact, not a monitoring strategy. The model needs the underlying samples.

Second, integration. Sensor streams, weather feeds, control events, and maintenance history have to share one event time, one schema, one access pattern. Joins happen at the platform layer, not in the dashboard.

Third, feedback. The model surfaces a candidate fault. The technician confirms or rejects it. That label flows back to training. Without the loop, the model decays. With it, the model gets better than the engineer who built it.

Fourth, action. The mature system does not stop at alerting. High-confidence predictions trigger commands the control layer executes: reduce RPM, ramp down power, feather blades. Side outputs route by risk class. Medium-confidence events enter the maintenance queue. The loop closes without waiting for a human to read the dashboard.
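Routing by risk class can be sketched as a small function. The three route names and both confidence thresholds are hypothetical; real cut-offs would be set per asset and agreed with the control engineers, and this sketch does not issue any actual control command.

```python
from enum import Enum

class Route(Enum):
    CONTROL_COMMAND = "control_command"       # e.g. reduce RPM, feather blades
    MAINTENANCE_QUEUE = "maintenance_queue"   # reviewed and scheduled by a technician
    LOG_ONLY = "log_only"                     # kept for training and backtests

def route_prediction(confidence, high=0.9, medium=0.6):
    """Map a fault-prediction confidence in [0, 1] to a risk-class route.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= high:
        return Route.CONTROL_COMMAND
    if confidence >= medium:
        return Route.MAINTENANCE_QUEUE
    return Route.LOG_ONLY
```

Keeping the routing decision in one small, testable function makes the thresholds explicit and auditable, which matters once a prediction can change how the equipment runs.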

None of this is exotic. It is what mature streaming architectures look like. The shift is treating the monitoring stack as one continuous system, not three disconnected ones.

From Observation To Action

The wind industry accepts that retrospective analysis is not enough. Every operator has a bearing failure story. Every operator has a dashboard that stayed green through a costly failure.

The question is not whether to move to continuous, learned monitoring. The question is how fast the data arrives, how cleanly the signals join, and how quickly the model output reaches the engineer who can act.

This is the work Ververica's Unified Streaming Data Platform exists to support. Built on Apache Flink® and powered by the VERA engine, it processes sensor data, control events, and contextual feeds as one continuous stream. Sub-100-millisecond inference. Full lineage from event to alert. Real-time and historical workloads in one engine. Grid operators already detect and act on faults inside 50 milliseconds. The wind farm case is the same problem at a slower timescale.

The dashboard told you the turbine was fine. The data told a different story. The architecture decides which one you hear first.

More Resources

Read how the VERA stream processing engine supports industrial monitoring that captures, analyzes, and acts on data in real time.

Don't wait for a catastrophic event. Book a fully tailored demo now.
