Ververica

Why Dashboards Keep Missing What Matters


A modern wind turbine carries more than a thousand sensors covering everything from drivetrain temperature to blade pitch. The data is there. The decisions, often, are not.

Operators sit in front of dashboards built across three timescales. SCADA telemetry rolls up to 10-minute averages. Vibration spectra (vibration signals broken down into their constituent frequencies) arrive in hourly or daily reports. Asset performance and maintenance needs are reviewed weekly, monthly, or after a failure. Each layer is useful. But on its own, no layer can see, let alone predict, what is about to happen.

The result is a familiar pattern. A turbine performs inside every threshold. The dashboard stays green. A bearing fails anyway.

Unfortunately, disruptions and faults are not anomalies. They are expected, part of the routine. A cloud over a solar farm, a tree on a power line, a mechanical bearing entering distress: performance fluctuates with external conditions, assets fail every day, and operators are left to handle failures after they occur. What separates a managed event from a catastrophic one is the speed of detection and containment.

The interval between fault and response is where catastrophes live. Ververica eliminates the interval. Sensor data, context, and inference run as one continuous stream, sub-100ms end to end. Detection happens at the speed the equipment is changing, not the speed the dashboard refreshes.

Figure one: Data production in a modern wind turbine

THE LIMITS OF AVERAGE

The default SCADA output for commercial wind turbines is a set of 10-minute statistics computed from 1 Hz signals. Operators therefore work with information that reduces 600 samples to four numbers: mean, maximum, minimum, and standard deviation.

Reducing the data this way made sense when bandwidth was scarce and storage was expensive. However, it hides almost everything that matters for early fault detection. A faulty bearing can run for weeks without raising its average temperature. What it produces instead are brief, intermittent oscillations, and a 10-minute summary smooths them away before the data ever reaches an operator.
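To see what the four-number summary throws away, consider a minimal Python sketch with invented values: two 10-minute windows of a 1 Hz bearing-temperature signal, one containing a sustained five-second excursion and one containing the same five high readings scattered as isolated spikes. The baseline, excursion height, and sample positions are assumptions for illustration, not field data.

```python
import statistics

BASELINE_C = 60.0  # hypothetical healthy bearing temperature
EXCURSION_C = 8.0  # hypothetical height of each high reading
N = 600            # 10 minutes of a 1 Hz signal

def scada_summary(window):
    """The four numbers a 10-minute SCADA record keeps: mean, max, min, std."""
    return (statistics.fmean(window), max(window), min(window), statistics.pstdev(window))

def longest_run_above(window, limit):
    """Length of the longest run of consecutive samples above `limit`."""
    best = run = 0
    for x in window:
        run = run + 1 if x > limit else 0
        best = max(best, run)
    return best

# Window A: one sustained 5-second excursion.
burst = [BASELINE_C] * N
for i in range(300, 305):
    burst[i] = BASELINE_C + EXCURSION_C

# Window B: the same five high readings, scattered as isolated one-second spikes.
scattered = [BASELINE_C] * N
for i in (0, 120, 240, 360, 480):
    scattered[i] = BASELINE_C + EXCURSION_C

print(scada_summary(burst) == scada_summary(scattered))  # True: identical summaries
print(longest_run_above(burst, BASELINE_C + 1.0))         # 5
print(longest_run_above(scattered, BASELINE_C + 1.0))     # 1
```

Both windows yield the same mean, max, min, and standard deviation, and the mean moves by only about 0.07 °C. Only the raw samples reveal that one window holds a sustained burst. Real bearing signatures are richer than this toy, but the information lost in the reduction is of the same kind.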

A 2023 SAGE review by Pandit, Astolfi and colleagues put it bluntly: abnormal vibration in damaged mechanical components and anomalous behavior in faulty electrical components are very difficult to detect from data averaged over 10 minutes. The signal exists, but the pipeline destroys it.

Operators compensate for these known limitations by establishing parallel condition monitoring systems. Vibration spectra. Oil particle analysis. Acoustic emission. These work, when they exist, but they sit on separate infrastructure, separate dashboards, separate review cycles. Correlation happens manually, after the fact, by an engineer working on vibes.

The pattern is not unique to wind energy. The specifics differ, but the same problem exists across plant types. Grid operators report the same gap between their SCADA polling cycles, which run at 1, 10, or 20 seconds, and the phasor measurement units that sample at 10–20 Hz. The high-resolution signal sees voltage peaks and frequency excursions that the slower system averages away.
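The same loss appears when a slow poll samples a fast signal. This Python sketch assumes a 50 Hz grid and invents a half-second dip to 49.8 Hz, then compares a 20 Hz measurement with a SCADA poll every 10 seconds. All values are hypothetical.

```python
NOMINAL_HZ = 50.0  # assumption: a 50 Hz grid
PMU_RATE_HZ = 20   # upper end of the 10–20 Hz range cited above
SCADA_POLL_S = 10  # one of the polling cycles cited above
DURATION_S = 20

def grid_frequency(t):
    """Hypothetical grid frequency: a half-second dip to 49.8 Hz starting at t = 4.0 s."""
    return 49.8 if 4.0 <= t < 4.5 else NOMINAL_HZ

# High-rate measurement: every sample in the interval.
pmu_samples = [grid_frequency(i / PMU_RATE_HZ) for i in range(DURATION_S * PMU_RATE_HZ)]

# Polled SCADA: one snapshot per cycle, at t = 0 s and t = 10 s.
scada_samples = [grid_frequency(t) for t in range(0, DURATION_S, SCADA_POLL_S)]

print(min(pmu_samples))    # 49.8
print(min(scada_samples))  # 50.0
```

Ten of the 20 Hz samples fall inside the dip; neither polled snapshot does, so the slower system reports a perfectly nominal grid.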

RULES CATCH WHAT YOU ALREADY KNOW

Threshold-based alarms are the operator's standard tool. Gearbox oil temperature above the OEM limit. Generator winding above the insulation class rating. Vibration RMS above the alarm setpoint. Each rule encodes a known failure mode.
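Expressed as code, a threshold rule set is little more than a table of signal names and limits. In this Python sketch, the signal names and limit values are hypothetical; real limits would come from OEM documentation and the insulation class.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThresholdRule:
    name: str
    signal: str
    limit: float

# Hypothetical limits, one rule per known failure mode.
RULES = [
    ThresholdRule("gearbox_oil_overtemp", "gearbox_oil_temp_c", 80.0),
    ThresholdRule("generator_winding_overtemp", "generator_winding_temp_c", 155.0),
    ThresholdRule("vibration_rms_alarm", "vibration_rms_mm_s", 7.1),
]

def evaluate(reading):
    """Return the names of every rule the reading violates; missing signals never fire."""
    return [r.name for r in RULES if reading.get(r.signal, float("-inf")) > r.limit]

print(evaluate({"gearbox_oil_temp_c": 84.0,
                "generator_winding_temp_c": 120.0,
                "vibration_rms_mm_s": 3.2}))
# ['gearbox_oil_overtemp']
```

Each rule looks at one signal in isolation. A reading with every value just under its limit returns an empty list, however unusual the combination.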

The problem is everything else.

A misaligned main bearing transfers thrust load to the gearbox. Internal clearances grow. Planetary alignment drifts. None of these states cross any threshold. They emerge from a relationship between variables: rotor speed, temperature gradients, subtle shifts in power curve residuals.

One study of this topic notes that traditional fixed-threshold methods produce both missed detections and false positives, because a turbine's operating envelope is not a box. It is a moving surface that changes with wind, season, age, and load history.

Rules are also expensive to maintain. Every new turbine model brings new thresholds. Every site has its own quirks. Every alarm that turns out to be only noise erodes operator trust in the system that raised it.

WHAT MACHINE LEARNING ACTUALLY DOES

The shift from rules to learned models is not about replacing engineering judgment. It is about recognizing patterns an engineer cannot enumerate.

A model trained on healthy operating data learns the relationship between hundreds of variables across the operating envelope. When the turbine deviates from that relationship, the model flags the deviation, even if no single variable has crossed an established threshold.

The published results are concrete. Regression-based generator bearing temperature models, built on power output, nacelle temperature, and shaft speed, have flagged catastrophic bearing failures 25 days before the damage. LSTM autoencoder approaches on vibration data have reported outlier detection performance around 97% on real wind farm data. Other studies forecast remaining useful life on multi-week horizons with most predictions accurate to within hours.
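A normal-behavior model of this kind can be sketched in plain Python. This version fits an ordinary least-squares regression of bearing temperature on power output, nacelle temperature, and shaft speed, using synthetic "healthy" data whose coefficients and noise level are invented. It then flags readings whose residual exceeds four standard deviations. It illustrates the residual approach under those assumptions and does not reproduce any published model.

```python
import random
import statistics

def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0, *f] for f in features]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, f):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], f))

# Synthetic "healthy" history: (power_mw, nacelle_temp_c, shaft_rpm) -> bearing_temp_c.
# The coefficients and noise level are invented for illustration.
rng = random.Random(42)
features, temps = [], []
for _ in range(2000):
    power, nacelle, rpm = rng.uniform(0.0, 3.0), rng.uniform(10.0, 35.0), rng.uniform(1000.0, 1800.0)
    features.append((power, nacelle, rpm))
    temps.append(25.0 + 6.0 * power + 0.8 * nacelle + 0.01 * rpm + rng.gauss(0.0, 0.5))

coef = fit(features, temps)
sigma = statistics.pstdev([t - predict(coef, f) for f, t in zip(features, temps)])

FIXED_ALARM_C = 95.0  # hypothetical static bearing-temperature alarm

def is_anomalous(f, observed_c, k=4.0):
    """Flag readings hotter than the model expects by more than k residual standard deviations."""
    return observed_c - predict(coef, f) > k * sigma

reading = (1.0, 15.0, 1200.0)       # the model expects roughly 55 °C here
print(is_anomalous(reading, 62.0))  # True, although 62 °C is far below FIXED_ALARM_C
print(is_anomalous(reading, 55.3))  # False
```

The check is one-sided because a bearing running hotter than its operating context explains is the concern here. A production model would be trained on real healthy history and would likely use more inputs.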

These approaches share a requirement that traditional architectures struggle to meet. They need data at the resolution where the signal lives, joined across components, joined with weather and operating context, and processed continuously rather than in nightly batches. This is where data streaming changes the equation.

Rules and learned models are not alternatives. The mature architecture runs both, on the same stream. Complex event processing (CEP) catches multi-step patterns operators already know to look for: two consecutive overtemperature readings inside a window, two turbines on the same site crossing a wind threshold within a minute. Learned models catch the rest. Each surfaces what the other misses.
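The first pattern named above, two consecutive overtemperature readings inside a window, can be sketched without a CEP library. This Python version keeps one piece of keyed state per turbine (its previous reading) and emits a match when two back-to-back readings both exceed the limit within the window. The limit, window, and record layout are hypothetical. In Flink, the same intent would typically be expressed with the CEP library's Pattern API, which handles out-of-order events through watermarks rather than the simple sort used here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    turbine_id: str
    ts: float      # event time, seconds
    temp_c: float

OVERTEMP_C = 80.0  # hypothetical limit
WINDOW_S = 60.0    # hypothetical pattern window

def detect_consecutive_overtemp(readings):
    """Yield (turbine_id, first_ts, second_ts) whenever two consecutive readings
    from the same turbine both exceed OVERTEMP_C no more than WINDOW_S apart."""
    last = {}  # keyed state: previous reading per turbine
    for r in sorted(readings, key=lambda r: r.ts):
        prev = last.get(r.turbine_id)
        if (prev is not None
                and prev.temp_c > OVERTEMP_C and r.temp_c > OVERTEMP_C
                and r.ts - prev.ts <= WINDOW_S):
            yield (r.turbine_id, prev.ts, r.ts)
        last[r.turbine_id] = r

events = [
    Reading("T1", 0.0, 82.0), Reading("T2", 5.0, 85.0), Reading("T1", 30.0, 83.0),
    Reading("T2", 100.0, 86.0), Reading("T1", 40.0, 70.0), Reading("T1", 50.0, 81.0),
]
print(list(detect_consecutive_overtemp(events)))  # [('T1', 0.0, 30.0)]
```

T2's two hot readings are 95 seconds apart, and T1's later hot reading follows a normal one, so only the first T1 pair matches.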

This is the architecture Ververica was built for. One engine. One stream. CEP patterns and trained models running side by side against the same sensor data, with the same event-time semantics, the same state, the same delivery guarantees. Rules and models stop being separate systems.

THE ARCHITECTURE PROBLEM

A typical wind farm monitoring stack looks like this:

  • SCADA writes to a historian.
  • Vibration data lands in a separate condition monitoring database.
  • Weather data comes from a third source.
  • Maintenance logs sit in an asset management system.
  • Reporting happens in a BI tool that pulls from all of them on a schedule.

For a model that needs to correlate signals across these systems in real time, this architecture is itself the point of failure. The data exists, but it does not arrive together, and it does not arrive fresh.

Figure two: Reference architecture: monitoring and risk classification in real-time using the Ververica Platform

Streaming changes the topology. Sensor data, weather feeds, control system events, and maintenance records flow into one continuous pipeline. Models score each turbine continuously, both against its current operating context and against the rest of the fleet. Anomalies surface within seconds, not days. Engineers receive ranked alerts with the contributing signals attached, not a wall of green dashboards punctuated by an unexplained red.
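Scoring a turbine against the rest of the fleet, with the contributing signals attached, can be sketched as a robust z-score per signal over one time slice. This Python sketch uses the modified z-score (median and median absolute deviation) as a stand-in technique. The signal names, values, and the 3.5 cut-off are assumptions, and a streaming job would recompute the scores per window rather than over a static dictionary.

```python
import statistics

def fleet_alerts(snapshot, z_limit=3.5):
    """Rank turbines by how far they sit from the fleet, per signal.

    snapshot maps turbine_id -> {signal: value} for one time slice.
    Returns (score, turbine_id, contributing_signals) tuples, highest score first.
    """
    signals = list(next(iter(snapshot.values())))
    centre = {}
    for s in signals:
        values = [m[s] for m in snapshot.values()]
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values) or 1e-9  # guard against zero spread
        centre[s] = (med, mad)
    alerts = []
    for tid, metrics in snapshot.items():
        # Modified z-score: 0.6745 * (x - median) / MAD.
        z = {s: 0.6745 * (metrics[s] - med) / mad for s, (med, mad) in centre.items()}
        contributing = sorted((s for s in z if abs(z[s]) > z_limit), key=lambda s: -abs(z[s]))
        if contributing:
            alerts.append((max(abs(v) for v in z.values()), tid, contributing))
    return sorted(alerts, reverse=True)

snapshot = {
    "T1": {"bearing_temp_c": 60.0, "power_residual_kw": -10.0},
    "T2": {"bearing_temp_c": 61.0, "power_residual_kw": 5.0},
    "T3": {"bearing_temp_c": 59.0, "power_residual_kw": 0.0},
    "T4": {"bearing_temp_c": 60.0, "power_residual_kw": -5.0},
    "T5": {"bearing_temp_c": 62.0, "power_residual_kw": 10.0},
    "T6": {"bearing_temp_c": 70.0, "power_residual_kw": -120.0},
}
for score, turbine, signals in fleet_alerts(snapshot):
    print(turbine, round(score, 1), signals)
```

Only T6 is reported, with the power-curve shortfall ranked ahead of the bearing temperature as the stronger contributor.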

The same architecture supports the historical work. Models train on years of data, and backtests run against the full event history. The lakehouse holds what happened, the stream holds what is happening, and both share one definition of the data.

One pipeline ingests sensor events. Multiple detectors run on the same stream in parallel: a health scoring window keyed by asset, a power curve deviation tracker, a yaw alignment monitor, a grid frequency monitor keyed by site, and a CEP engine watching for known multi-event patterns. Outputs feed one dashboard, one alert feed, one control loop.
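The fan-out described above, one stream feeding several keyed detectors whose outputs merge into one alert feed, can be sketched in batch form. In this Python sketch the window length, detector logic, limits, and field names are hypothetical, a 50 Hz grid is assumed, and only two of the five detectors are modeled. In a Flink job, each detector would instead be its own keyed window operator reading the same source.

```python
from collections import defaultdict

WINDOW_S = 60      # hypothetical tumbling-window length
NOMINAL_HZ = 50.0  # assumption: a 50 Hz grid

def tumbling_windows(events, key_field):
    """Bucket events by (key, window start): a batch stand-in for keyed event-time windows."""
    buckets = defaultdict(list)
    for e in events:
        start = int(e["ts"] // WINDOW_S) * WINDOW_S
        buckets[(e[key_field], start)].append(e)
    return buckets

def health_score(key, window):
    """Hypothetical health detector: mean bearing temperature per asset and window."""
    avg = sum(e["bearing_temp_c"] for e in window) / len(window)
    return [("health", key, round(avg, 1))] if avg > 75.0 else []

def grid_frequency_monitor(key, window):
    """Hypothetical grid detector: worst frequency deviation per site and window."""
    worst = max(abs(e["grid_hz"] - NOMINAL_HZ) for e in window)
    return [("grid_frequency", key, round(worst, 2))] if worst > 0.2 else []

# Each detector declares the key it partitions by; all of them read the same events.
DETECTORS = [("asset_id", health_score), ("site_id", grid_frequency_monitor)]

def run(events):
    """Fan the stream out to every detector and merge the results into one alert feed."""
    alert_feed = []
    for key_field, detector in DETECTORS:
        for (key, start), window in sorted(tumbling_windows(events, key_field).items()):
            alert_feed.extend((start, *alert) for alert in detector(key, window))
    return sorted(alert_feed)

events = [
    {"ts": 5, "asset_id": "WT-01", "site_id": "S1", "bearing_temp_c": 78.0, "grid_hz": 50.01},
    {"ts": 20, "asset_id": "WT-01", "site_id": "S1", "bearing_temp_c": 80.0, "grid_hz": 49.75},
    {"ts": 30, "asset_id": "WT-02", "site_id": "S1", "bearing_temp_c": 60.0, "grid_hz": 49.98},
]
print(run(events))
# [(0, 'grid_frequency', 'S1', 0.25), (0, 'health', 'WT-01', 79.0)]
```

Adding a power curve tracker or a yaw monitor would mean appending another (key, detector) pair; the ingest path stays the same.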

THE NUMBER THAT MATTERS

The argument is usually framed as a cost question. It should be framed as a question of consequence: what does a failure actually cost?

A single onshore gearbox failure costs $250,000–$300,000 in repair. Gearbox failure rates run around 0.154 events per turbine per year. Main bearing failure rates of 3–6% per year are not unusual. Crane mobilization adds $350,000 per week. Offshore vessels run $150,000–$300,000 per day, with a month of mobilization typical. One case study reported $820,000 in lost revenue across three turbines over 28 months from main bearing failures alone.
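As a worked example using the onshore figures above, the expected yearly gearbox cost per turbine is the failure rate multiplied by the cost per event. The sketch assumes, as a hypothetical upper case, that each failure also needs one week of crane hire. It ignores lost revenue, offshore vessel costs, and knock-on damage.

```python
# Figures quoted in the text (onshore gearbox).
GEARBOX_REPAIR_USD = (250_000, 300_000)
GEARBOX_FAILURES_PER_TURBINE_YEAR = 0.154
CRANE_USD_PER_WEEK = 350_000

def expected_annual_gearbox_cost(repair_usd, crane_weeks):
    """Expected yearly gearbox cost for one turbine: failure rate x (repair + crane hire)."""
    return GEARBOX_FAILURES_PER_TURBINE_YEAR * (repair_usd + crane_weeks * CRANE_USD_PER_WEEK)

low = expected_annual_gearbox_cost(GEARBOX_REPAIR_USD[0], crane_weeks=0)
high = expected_annual_gearbox_cost(GEARBOX_REPAIR_USD[1], crane_weeks=1)  # assumed one crane week
print(f"${low:,.0f} to ${high:,.0f} per turbine per year")
# $38,500 to $100,100 per turbine per year
```

Multiplying by fleet size, and adding main bearing failures at the quoted 3–6% per year, scales this into the budget the streaming pipeline has to beat.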

A bearing run to failure transfers load to the gearbox, and the gearbox then absorbs the damage and the cost. A blade liberation event can damage the tower, at which point the insurer becomes involved.

Against those numbers, a single avoided catastrophic failure can cover the cost of a streaming pipeline, and the fleet of trained models running on it goes on to avoid further failures.

The calculation is not niche. It applies wherever assets are expensive, downtime is expensive, and failure modes are subtle enough that rules cannot enumerate them in advance. Wind. Hydro. Power transmission. Petrochemicals. Heavy industry. Rail. Data centers. The structure is the same. The business cost of failure is driven by the same factors. The benefit arithmetic is the same.

WHAT TO BUILD, NOT WHAT TO BUY

For data and platform engineers, the practical question is what changes in the architecture.

Four things, in order.

First, data resolution. Move the high-frequency signals off the historian and into the stream. The 10-minute average is a reporting artifact, not a monitoring strategy. The model needs the underlying samples.

Second, integration. Sensor streams, weather feeds, control events, and maintenance history have to share one event time, one schema, and one access pattern. Joins happen at the platform layer, not in the dashboard.
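Sharing one event time makes joins such as "attach the weather in effect at this reading" well defined. Here is a minimal Python sketch of such an as-of join, assuming hypothetical record layouts and a ten-minute staleness limit. At the platform layer, Flink SQL offers event-time temporal joins for the same purpose.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Weather:
    ts: float  # event time, seconds
    wind_ms: float
    ambient_c: float

def as_of_join(sensor_events, weather, max_age_s=600.0):
    """Attach to each sensor event the latest weather record at or before its event time.

    Weather older than max_age_s is treated as missing rather than silently reused.
    """
    weather = sorted(weather, key=lambda w: w.ts)
    times = [w.ts for w in weather]
    joined = []
    for e in sensor_events:
        i = bisect.bisect_right(times, e["ts"]) - 1  # last record with ts <= event ts
        w = weather[i] if i >= 0 and e["ts"] - weather[i].ts <= max_age_s else None
        joined.append({**e,
                       "wind_ms": w.wind_ms if w is not None else None,
                       "ambient_c": w.ambient_c if w is not None else None})
    return joined

weather = [Weather(0.0, 8.0, 12.0), Weather(600.0, 11.5, 11.0)]
readings = [{"ts": 300.0, "bearing_temp_c": 61.0}, {"ts": 650.0, "bearing_temp_c": 63.5}]
for row in as_of_join(readings, weather):
    print(row)
```

Joining on event time rather than arrival time means a late-arriving sensor record still picks up the weather that applied when it was measured.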

Third, feedback. The model surfaces a candidate fault. The technician confirms or rejects it. That label flows back to training. The system improves. Without the feedback loop, the model decays. With it, the model gets better than the engineer who built it.
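Here is a minimal sketch of that loop. Each alert waits for a technician's verdict, which becomes a labeled training example and updates a running precision figure. The class and field names are hypothetical, and retraining itself is out of scope.

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackLoop:
    """Hypothetical store that turns technician verdicts into training labels."""
    pending: dict = field(default_factory=dict)        # alert_id -> features seen by the model
    training_set: list = field(default_factory=list)   # (features, label) pairs
    confirmed: int = 0
    rejected: int = 0

    def raise_alert(self, alert_id, features):
        self.pending[alert_id] = features

    def record_verdict(self, alert_id, is_fault):
        """Move an alert out of the queue and keep the verdict as a label (1 = real fault)."""
        features = self.pending.pop(alert_id)
        self.training_set.append((features, 1 if is_fault else 0))
        if is_fault:
            self.confirmed += 1
        else:
            self.rejected += 1

    def precision(self):
        """Share of reviewed alerts that were real faults, or None before any review."""
        reviewed = self.confirmed + self.rejected
        return self.confirmed / reviewed if reviewed else None

loop = FeedbackLoop()
loop.raise_alert("a1", {"residual_c": 7.0})
loop.raise_alert("a2", {"residual_c": 2.1})
loop.record_verdict("a1", is_fault=True)
loop.record_verdict("a2", is_fault=False)
print(loop.precision(), loop.training_set)
```

Rejected alerts are kept as negative labels, so the next training run sees exactly the cases the model got wrong.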

Fourth, action. The mature system does not stop at alerting. High-confidence predictions trigger commands the control layer can execute: reduce RPM, ramp down power, feather blades. Side outputs route by risk class. Medium-confidence events enter the maintenance queue. The architecture closes the loop without waiting for a human to read the dashboard.
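Routing by risk class can be sketched as a function that maps each prediction to one of several outputs, in the spirit of Flink side outputs. The confidence cut-offs, command names, and record layout in this Python sketch are hypothetical.

```python
from enum import Enum

class Route(Enum):
    CONTROL = "control"          # command the control layer can execute
    MAINTENANCE = "maintenance"  # work order for the maintenance queue
    LOG = "log"                  # kept for trend analysis only

# Hypothetical confidence cut-offs; real values would be set per failure mode.
HIGH_CONFIDENCE = 0.9
MEDIUM_CONFIDENCE = 0.6

def route(prediction):
    """Map one model prediction to an output and the payload that output expects."""
    p = prediction["fault_probability"]
    if p >= HIGH_CONFIDENCE:
        return Route.CONTROL, {"turbine_id": prediction["turbine_id"], "command": "reduce_rpm"}
    if p >= MEDIUM_CONFIDENCE:
        return Route.MAINTENANCE, {"turbine_id": prediction["turbine_id"], "priority": "inspect"}
    return Route.LOG, prediction

def split(predictions):
    """Partition a batch of predictions into one list per output."""
    outputs = {r: [] for r in Route}
    for pred in predictions:
        r, payload = route(pred)
        outputs[r].append(payload)
    return outputs

batch = [
    {"turbine_id": "WT-07", "fault_probability": 0.95},
    {"turbine_id": "WT-03", "fault_probability": 0.70},
    {"turbine_id": "WT-01", "fault_probability": 0.10},
]
for r, payloads in split(batch).items():
    print(r.value, payloads)
```

Keeping the routing decision in one function makes the cut-offs easy to audit and to tune as technician feedback accumulates.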

None of this is exotic. It is what mature streaming architectures look like. The shift is treating the monitoring stack as one continuous system rather than three disconnected ones.

FROM OBSERVATION TO ACTION

The wind industry accepts that retrospective analysis is not enough. Every operator has a bearing failure story. Every operator has a dashboard that stayed green through a costly failure.

The question is not whether to move to continuous, learned monitoring. The question is how fast the data can arrive, how cleanly the signals can be joined, and how quickly the model output can reach the engineer who can act on it.

This is the work Ververica's Unified Streaming Data Platform exists to support. Built on Apache Flink® and powered by the VERA engine, it processes sensor data, control events, and contextual feeds as one continuous stream. Sub-100ms inference. Full lineage from event to alert. Real-time and historical workloads in one engine. Grid operators already use streaming architectures of this kind to detect and act on faults inside 50 milliseconds. The wind farm case is the same problem at a slower timescale.

The dashboard told you the turbine was fine. The data told a different story. The architecture decides which one you hear first.

MORE RESOURCES

Read how the VERA stream processing engine helps teams design and operate industrial monitoring that addresses the risks of data not being captured, analyzed, and acted on in real time.

Don’t wait for a catastrophic event to happen. Book a fully tailored demo now.
