
Energy Resilience Is an Operations Problem

The electric grid is being asked to support a level of load growth, volatility, and operational complexity that many utility systems were not designed to handle.

SkyHaven Group
AI Energy Field Note | Chris Parker | Sept 19, 2025 | 9 Min Read
Operational intelligence layer


Grid resilience is usually framed as a construction challenge. But as load growth, climate exposure, distributed resources, and system complexity accelerate, the harder question is operational: can utilities understand risk fast enough to act?

Opening position

The electric grid is being pushed into a more volatile operating regime. New demand is arriving faster, distributed resources are changing power flows, and weather risk is reducing the margin for slow coordination.

Data centers, industrial expansion, electrification, electric vehicle charging, distributed energy resources, and digital infrastructure are all adding pressure to systems that were built around more predictable assumptions. At the same time, extreme weather, aging assets, vegetation exposure, wildfire risk, flooding, and constrained equipment supply chains are making disruption harder to absorb.

The usual answer is to build more and harden more. Utilities do need transmission expansion, substation upgrades, undergrounding where it is justified, automation, storage, backup generation, and more durable distribution infrastructure. But capital investment alone does not solve the operating challenge.

A hardened asset can still fail if its condition is misunderstood. A backup resource can still be unavailable if it is not tied into operational planning. A restoration plan can still break down if crews, materials, access routes, outage intelligence, and customer priorities are assembled too late.

Resilience becomes operational when the grid can be understood, prioritized, and coordinated under changing conditions.

The next resilience problem is therefore not only a question of assets. It is a question of operating intelligence: the connective layer that lets utilities see the physical grid, the digital systems that describe it, and the human workflows that keep it running as one decision environment.

Planning to operations

Resilience is becoming an operating discipline

For decades, resilience has been managed through long-range plans, reliability programs, capital budgets, vegetation cycles, emergency exercises, and after-action reviews. Those practices still matter. The issue is that the conditions they are trying to manage now change faster than traditional planning cycles.

A forecast can alter field readiness in days. A transformer shortage can change restoration strategy. Load growth can concentrate around a substation before the capital plan catches up. A new industrial customer, a data center, or an electrification cluster can change the risk profile of a circuit that previously looked stable.

That means resilience can no longer live mostly in static plans. Utilities need continuous visibility into which feeders face wind, flooding, heat, vegetation, or wildfire exposure; which substations are carrying load outside original assumptions; which assets show deteriorating condition; which work has been deferred because of labor, materials, or access; and which failures would create the largest customer, safety, financial, or regulatory consequences.

Those questions cannot be answered well by disconnected reports. They require a shared operating model that connects asset hierarchy, network topology, geospatial exposure, telemetry, outage history, inspections, work management, crew capacity, material availability, and customer criticality.

  • Planning (programs, budgets, forecasts, reliability studies) → seam → Operations (system state, outages, crews, switching, materials)
  • Known risk (exposure, asset health, historical failure patterns) → translation → Executed work (prioritized work packages before the next event)

Without that connective layer, resilience work fragments across departments. Each team may understand a piece of the problem, but no one has a complete enough picture to decide what matters most before conditions deteriorate.

Modeling challenge

The grid is harder to model operationally

Distribution systems were not originally designed to behave like highly instrumented, bidirectional, software-mediated networks. Traditional planning assumed centralized generation, radial distribution patterns, relatively predictable demand, and limited real-time visibility below the substation.

That model is being challenged by rooftop solar, batteries, flexible industrial loads, electric vehicles, microgrids, data center demand, and weather-driven peaks that move faster than planning assumptions. The result is not only an infrastructure problem. It is a modeling problem.

A transformer is not just a record in an asset system. It belongs to a feeder, serves a group of customers, carries a load profile, has a maintenance history, sits in a specific geography, and may become more critical when nearby demand, switching configurations, or exposure conditions change.
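One way to make that concrete is to model the transformer as a record that carries its operating context with it. The sketch below is illustrative only: the class, its fields, the `is_elevated` rule, and the 0.9 loading threshold are all assumptions made for the example, not a utility data standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transformer:
    """Hypothetical transformer record that carries its operating context."""
    asset_id: str
    feeder_id: str
    rated_kva: float
    peak_load_kva: float
    customers_served: int
    serves_critical_facility: bool = False
    open_work_orders: List[str] = field(default_factory=list)

    def loading_ratio(self) -> float:
        """Peak load as a fraction of nameplate rating."""
        return self.peak_load_kva / self.rated_kva

    def is_elevated(self, loading_threshold: float = 0.9) -> bool:
        """True when the unit is heavily loaded and also carries consequence.

        The 0.9 default is an assumed placeholder, not an engineering limit.
        """
        heavy = self.loading_ratio() >= loading_threshold
        consequential = self.serves_critical_facility or bool(self.open_work_orders)
        return heavy and consequential


# Same asset under two demand conditions: only the load profile changes.
before = Transformer("TX-1001", "FDR-12", rated_kva=500, peak_load_kva=300,
                     customers_served=84, serves_critical_facility=True)
after = Transformer("TX-1001", "FDR-12", rated_kva=500, peak_load_kva=470,
                    customers_served=84, serves_critical_facility=True)
print(before.is_elevated(), after.is_elevated())
```

The asset record itself is identical in both cases. What changes its criticality is the context attached to it, which is the point of treating it as more than a row in an asset system.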

The same logic applies at the system level. A storm forecast is not useful until it can be translated into likely impacts across assets, terrain, access routes, customer segments, and restoration resources. A wildfire layer is not enough unless it is connected to conductor type, vegetation proximity, wind exposure, device settings, inspection status, and emergency procedures.

Operational model: from records to decisions

  • Asset: age, condition, hierarchy, maintenance history
  • Topology: feeders, switching, protection, downstream customers
  • Exposure: weather, wildfire, flooding, vegetation, terrain
  • Work: open orders, deferred work, crews, access, materials
  • Decision layer: risk ranking, scenarios, evidence, action
  • Outcome: restoration, reliability, learning, model improvement

Many utilities already have the necessary facts somewhere. The difficulty is that those facts are not structured to support fast, evidence-backed operating decisions. When the model of the system is incomplete, decisions become dependent on manual synthesis, institutional memory, and heroic coordination during the event itself.

Data readiness

Data volume is not the same as readiness

Most utilities already operate a large and growing technology estate: SCADA for telemetry and control, ADMS for distribution operations, OMS for outage management, AMI for meter-level signals, GIS for location and network context, EAM or work management for maintenance, ERP for materials and finance, customer systems for service context, and specialized tools for vegetation, inspections, forecasting, and reliability analytics.

The problem is not the absence of data. The problem is that the data often fails to form a trustworthy operating picture. Asset records can lag field reality. Work systems can record completion without showing operational risk. Weather exposure can be modeled separately from feeder-level consequences. Inspection findings can stay disconnected from capital prioritization. Restoration estimates can still depend on experienced people assembling context by hand.

A useful resilience architecture treats operational context as a first-class asset. Asset identifiers have to be reconciled across systems. Geospatial data has to be accurate enough for field and hazard analysis. Timestamps have to be normalized across telemetry and event data. Work records have to carry enough structure to support risk-based prioritization.

  • Asset identity has to be consistent across planning, operations, GIS, work management, and finance.
  • Geography has to be precise enough to connect infrastructure to weather, vegetation, terrain, and access.
  • Events have to be time-aligned so teams can compare telemetry, outage signals, crew movement, and restoration actions.
  • Work history has to describe operational consequence, not just task completion.
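The first and third requirements above, consistent asset identity and time alignment, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `CROSSWALK` table, the system names, the identifiers, and the UTC offsets are hypothetical, and a real crosswalk would be a governed dataset rather than a literal in code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical crosswalk: (source system, local ID) -> one canonical asset ID.
CROSSWALK = {
    ("gis", "G-7781"): "TX-1001",
    ("eam", "000451"): "TX-1001",
    ("scada", "SUB4.T2"): "TX-1001",
}


def canonical_id(system: str, local_id: str) -> Optional[str]:
    """Return the canonical asset ID, or None if the record is unreconciled."""
    return CROSSWALK.get((system.lower(), local_id.strip()))


def to_utc(ts: str, utc_offset_hours: float = 0.0) -> datetime:
    """Parse an ISO 8601 timestamp and normalize it to UTC.

    Naive timestamps are assumed to be in the source system's local
    offset, which the caller supplies.
    """
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return parsed.astimezone(timezone.utc)


events = [
    ("eam", "000451", "2025-09-19T19:05:00+00:00", 0),   # already carries an offset
    ("scada", "SUB4.T2", "2025-09-19T14:02:10", -5),     # naive local time at UTC-5
]

# One timeline, one asset identity, regardless of which system reported.
timeline = sorted(
    (to_utc(ts, offset), canonical_id(source, local_id), source)
    for source, local_id, ts, offset in events
)
```

Once both records resolve to the same asset and the same clock, the SCADA event sorts ahead of the work-system entry, which is the ordering a reviewer would need to reconstruct what happened.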

Without that foundation, analytics and AI outputs become difficult to validate, difficult to trust, and difficult to use in actual workflows. More dashboards do not solve the problem if the underlying context remains fragmented.

Technical foundation

The foundation is an operational intelligence layer

The practical answer is not another generic data lake or dashboard. Energy resilience needs a decision-oriented system that connects source data to the operating questions utilities have to answer before, during, and after disruptive events.

At the data level, that layer has to combine batch, streaming, and event-driven patterns. Historical outage records, inspection results, asset condition, capital plans, vegetation cycles, and maintenance history can often be handled through warehouse or lakehouse structures. SCADA events, AMI last-gasp signals, switching activity, weather alerts, and crew status updates need near-real-time event processing.

Geospatial intelligence has to be central rather than incidental. Resilience depends on where assets sit, what surrounds them, how crews reach them, which customers they serve, and what external conditions are likely to affect them. Location is not decoration; it is one of the main dimensions of operational risk.

At the modeling level, utilities need entity resolution, topology-aware analytics, time alignment, and risk scoring that can operate across domains. A feeder-level risk model should combine asset age and condition, recent faults, vegetation exposure, forecasted weather, customer criticality, switching options, historical outage behavior, and open work orders.
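A feeder-level score of that kind can be sketched as a weighted sum. The factor names follow the list above, but the weights are assumed placeholders, each input is assumed to be pre-normalized to the range 0 to 1, and switching options and historical outage behavior are left out to keep the example short.

```python
from typing import Dict, Tuple

# Assumed weights that sum to 1.0; a real model would be calibrated
# against outage history rather than set by hand.
WEIGHTS = {
    "asset_condition": 0.25,
    "recent_faults": 0.15,
    "vegetation_exposure": 0.20,
    "forecast_weather": 0.20,
    "customer_criticality": 0.15,
    "open_work_orders": 0.05,
}


def feeder_risk(factors: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Weighted sum of factors, each pre-normalized to the range 0..1.

    Returns the score plus per-factor contributions so the result
    can be explained rather than only ranked.
    """
    missing = set(WEIGHTS) - set(factors)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    contributions = {name: WEIGHTS[name] * factors[name] for name in WEIGHTS}
    return sum(contributions.values()), contributions


score, why = feeder_risk({
    "asset_condition": 0.6,
    "recent_faults": 0.4,
    "vegetation_exposure": 0.8,
    "forecast_weather": 0.9,
    "customer_criticality": 0.5,
    "open_work_orders": 0.2,
})
top_driver = max(why, key=why.get)
```

Returning the contributions alongside the score is a deliberate choice: a ranking that cannot show which factor drove it is hard for an engineer to challenge or accept.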

The objective is not to build a perfect digital twin as an abstract technology project. The objective is to build an operational representation of the grid that is current enough, accurate enough, and explainable enough to support real decisions.

AI placement

AI belongs inside the operating model, not on top of it

AI can help utilities manage resilience, but only when it is grounded in the operating model. Language models can summarize inspection notes, extract structured findings from field reports, compare current conditions to historical outage patterns, support restoration communications, assist engineering review, and help planners explore scenarios.

Predictive models can estimate asset failure probability, outage likelihood, storm impact, vegetation risk, load growth, material demand, and restoration windows. But those capabilities only become operationally valuable when they are tied to governed data, workflow integration, decision rights, and evidence trails.

A recommendation to prioritize a feeder for vegetation work should show the evidence: proximity, growth rate, weather exposure, outage history, customer impact, inspection findings, and pending work. A restoration estimate should expose assumptions around damage severity, crew travel, material availability, access constraints, weather progression, and confidence range.
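One way to enforce that discipline is to make the assumptions part of the estimate itself, so that an estimate without them cannot be released. The sketch below is hypothetical: the `RestorationEstimate` class, its field names, and the publishing rule are assumptions chosen to mirror the list in the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption categories taken from the paragraph above.
REQUIRED_ASSUMPTIONS = (
    "damage_severity",
    "crew_travel",
    "material_availability",
    "access_constraints",
    "weather_progression",
)


@dataclass
class RestorationEstimate:
    """Hypothetical estimate that carries its own evidence trail."""
    feeder_id: str
    low_hours: float
    expected_hours: float
    high_hours: float
    assumptions: Dict[str, str] = field(default_factory=dict)
    model_version: str = "unversioned"

    def missing_assumptions(self) -> List[str]:
        """Required assumption categories that have not been stated."""
        return [k for k in REQUIRED_ASSUMPTIONS if k not in self.assumptions]

    def is_publishable(self) -> bool:
        """Publishable only with a coherent range and every assumption stated."""
        ordered = self.low_hours <= self.expected_hours <= self.high_hours
        return ordered and not self.missing_assumptions()


draft = RestorationEstimate(
    "FDR-12", low_hours=4, expected_hours=7, high_hours=12,
    assumptions={"damage_severity": "moderate", "crew_travel": "45 min"},
)
print(draft.is_publishable(), draft.missing_assumptions())
```

Here the draft is held back because three assumption categories are still unstated, which gives a human reviewer a specific list to resolve instead of a bare number to trust.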

This is the difference between AI as a demo and AI as an operating capability. In utility environments, an output has to be explainable to engineers, useful to operators, safe for field teams, and defensible to executives, regulators, and customers. That requires lineage, validation, monitoring, versioning, and human review from the start.

AI does not fix fragmented operations. It amplifies the quality of the operating model it is placed inside.

Execution

Resilience depends on role-specific execution

A resilience platform has to serve different roles without creating different realities. Distribution planners, operators, field supervisors, vegetation teams, supply chain leaders, engineering groups, and customer operations each need a different lens on the system. But they should be working from the same underlying facts.

  • Distribution planners: evaluate exposure, reliability trends, load growth, asset health, and capital trade-offs over multiple years.
  • Operations teams: see current state, outage predictions, switching options, crew status, and restoration confidence during events.
  • Field supervisors: confirm job readiness, safety conditions, access constraints, material availability, and work package clarity.
  • Vegetation teams: connect growth models, inspection imagery, weather exposure, outage drivers, and prioritization logic.
  • Supply chain teams: forecast demand for transformers, poles, conductors, fuses, reclosers, and other critical materials.
  • Customer operations: understand critical facilities, customer impact, communication confidence, and restoration uncertainty.

Resilience failures often happen at the seams. A risk can be visible in planning but not converted into work. A work order can exist but remain deprioritized before storm season. A material constraint can be obvious to supply chain but absent from restoration planning. A critical customer can be known to customer operations but not reflected in feeder-level prioritization.

Operational intelligence reduces those seams by making relationships visible. The planner’s risk view, the operator’s event view, and the field supervisor’s work view should be different projections of the same operating model.
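The idea of different projections over one model can be shown with a small sketch. Everything here is illustrative: the record layout, the field names, and the role-to-field mapping are assumptions, and a production system would enforce this through governed views rather than a dictionary.

```python
# One shared record per feeder; field names and values are illustrative.
OPERATING_MODEL = {
    "FDR-12": {
        "risk_score": 0.72,
        "load_growth_pct": 8.5,
        "active_outages": 1,
        "crews_assigned": 2,
        "open_work_orders": ["WO-3301"],
        "materials_ready": False,
    },
}

# Each role reads a different subset of the same underlying facts.
ROLE_FIELDS = {
    "planner": ("risk_score", "load_growth_pct", "open_work_orders"),
    "operator": ("risk_score", "active_outages", "crews_assigned"),
    "field_supervisor": ("open_work_orders", "materials_ready", "crews_assigned"),
}


def view_for(role: str, feeder_id: str) -> dict:
    """Project the shared feeder record onto the fields a role needs."""
    record = OPERATING_MODEL[feeder_id]
    return {name: record[name] for name in ROLE_FIELDS[role]}


planner_view = view_for("planner", "FDR-12")
operator_view = view_for("operator", "FDR-12")
```

Because both views read from the same record, the planner and the operator cannot end up with different risk scores for the same feeder, even though each sees a different slice.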

Workforce

The workforce constraint makes system design more important

Utilities are also dealing with a knowledge problem. Experienced engineers, operators, planners, and field personnel know how the system behaves, which areas are difficult to access, which assets are recurring problems, which restoration estimates are realistic, and which historical workarounds are safe. Much of that knowledge is carried in people rather than systems.

As experienced workers retire or move into new roles, utilities risk losing institutional memory that rarely appears in structured records. Operational intelligence can help preserve and scale that knowledge, but only if it is designed to support professional judgment rather than bypass it.

Systems should capture decisions, assumptions, exceptions, field observations, engineering rationale, and after-action lessons in a reusable way. When a future event resembles a previous event, teams should be able to find the comparable case, see what actions were taken, understand what worked, and avoid repeating mistakes.
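Finding the comparable case can start with something as simple as tag overlap between the current event and past after-action reviews. The sketch below uses Jaccard similarity as a stand-in technique. The event IDs, tags, and lessons are invented for illustration, and a real system would likely use richer features than tags.

```python
def jaccard(a: set, b: set) -> float:
    """Share of tags the two sets have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical after-action records; IDs, tags, and lessons are made up.
PAST_EVENTS = [
    {"id": "AAR-001", "tags": {"ice", "transmission", "rural", "access_blocked"},
     "lesson": "Stage crews before roads close."},
    {"id": "AAR-002", "tags": {"wind", "vegetation", "suburban"},
     "lesson": "Trim priority feeders ahead of the forecast window."},
    {"id": "AAR-003", "tags": {"heat", "transformer_overload", "urban"},
     "lesson": "Pre-position spare transformers near loaded substations."},
]


def comparable_cases(current_tags: set, events: list, top_n: int = 2) -> list:
    """Return up to top_n past events ranked by tag overlap, dropping non-matches."""
    scored = [(jaccard(current_tags, event["tags"]), event) for event in events]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [event for _, event in scored[:top_n]]


matches = comparable_cases({"wind", "vegetation", "rural"}, PAST_EVENTS)
```

The retrieval step only surfaces candidates. Deciding whether a past case really applies stays with the engineer or operator reading it.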

This is especially valuable for newer engineers and operators who need to learn complex systems quickly. A well-designed resilience platform becomes a practical learning system by connecting data, standards, historical events, expert judgment, and workflow context.

Operating questions

What resilient grid operations should be able to answer

A mature resilience capability should help utilities answer questions that cut across planning, operations, field execution, and customer response. These questions are not narrow analytics tasks. They are operating questions that require context, evidence, workflow, and accountability.

  1. Which feeders are most exposed to the next weather event, and why?
  2. Which assets combine high likelihood of failure with high consequence of failure?
  3. Which planned work would reduce the most risk if completed before storm season?
  4. Which materials are likely to constrain restoration under different damage scenarios?
  5. Which substations are seeing load growth that changes their resilience profile?
  6. Which circuits serve critical facilities, and how do switching options change under outage conditions?
  7. Which restoration estimates are supported by historical evidence, and where is uncertainty too high?
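Question 2 in the list above lends itself to a short worked sketch: multiply an asset's probability of failure by the consequence if it fails, then rank. The asset IDs, probabilities, and customer-hour figures below are invented for illustration, and using customer-hours as the only consequence measure is a simplifying assumption.

```python
# Hypothetical inputs: annual failure probability and customer-hours
# interrupted if the asset fails. All values are illustrative.
ASSETS = [
    {"asset_id": "TX-1001", "p_fail": 0.08, "customer_hours_if_failed": 4200},
    {"asset_id": "RC-2040", "p_fail": 0.20, "customer_hours_if_failed": 600},
    {"asset_id": "TX-1177", "p_fail": 0.02, "customer_hours_if_failed": 9000},
]


def rank_by_expected_consequence(assets: list) -> list:
    """Rank assets by likelihood of failure times consequence of failure."""
    scored = [
        dict(a, expected_customer_hours=a["p_fail"] * a["customer_hours_if_failed"])
        for a in assets
    ]
    return sorted(scored, key=lambda a: a["expected_customer_hours"], reverse=True)


ranking = rank_by_expected_consequence(ASSETS)
order = [a["asset_id"] for a in ranking]
```

In this example the asset most likely to fail (RC-2040) ranks last, and the one with the largest consequence (TX-1177) ranks second. Neither likelihood nor consequence alone produces the same order as the combination.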

Answering those questions requires more than models. Weather forecasts have to become asset-level exposure estimates. Asset health indicators have to become risk-ranked work recommendations. AMI and outage signals have to become damage inference. Crew and material constraints have to become restoration scenarios. After-action reviews have to become improvements in models, procedures, and planning assumptions.

The value is not in any single tool. The value is in the operating loop: sense, interpret, prioritize, act, and learn.

The real imperative

The next resilient grid will be defined by what it can understand.

The grid still needs investment in generation, transmission, distribution, automation, storage, hardening, and modernization. But those investments will not deliver their full resilience value unless utilities also improve how they operate the system.

Energy resilience should be understood as the ability to maintain and restore service under changing conditions through infrastructure strength, situational awareness, risk-based prioritization, coordinated execution, and continuous learning.

That definition clarifies the technical challenge. Utilities need better asset models, better geospatial intelligence, better event processing, better work integration, better evidence lineage, and better decision support for the people responsible for operating the grid.

The next generation of resilient energy systems will not be defined only by what gets built in the field. It will also be defined by the intelligence layer that helps utilities understand vulnerability, choose the actions that matter most, and coordinate people, materials, and infrastructure when conditions become difficult.