When the plant learns to run itself: reinforcement learning agents in desalination digital twins - Smart Water Magazine

13/05/2026

When ACCIONA and Siemens commissioned the Al Khobar 1 desalination plant in Saudi Arabia in 2020, they did something the industry had never seen: they ran the entire start-up sequence remotely, from Madrid, through a digital twin processing more than 40,000 real-time signals. With engineers unable to travel during COVID-19, the simulation platform held the plant together, virtually testing control programmes, running start-up sequences, and training operators before a single litre of water was produced. The plant hit its first-water milestone on schedule and went on to achieve energy consumption below 4 kWh/m³.

On the other side of the world, IDE Technologies' digital twin at the Carlsbad desalination plant in California, the Western Hemisphere's largest, uses five years of operational data to model membrane biofouling at the individual element level, with projected maintenance savings of up to $1.5 million over five years. In Singapore, Gradiant's SmartOps AI platform is being applied to a demonstration facility that targets energy consumption below 2 kWh/m³, against an industry benchmark of 3.5 kWh/m³.

These are the most sophisticated digital twin deployments in desalination today. None of them uses reinforcement learning (RL). That is about to change.

What digital twins do and cannot do

The current generation of desalination digital twins operates across three broad modes.

The first is engineering simulation: a physics-based replica of the plant used for commissioning, operator training, and control system testing. ACCIONA's Siemens SIMIT platform at Al Khobar 1, which models every hydraulic system from seawater intake to product water tank, is the clearest example.

The second is predictive monitoring: machine learning models trained on operational data to forecast equipment behaviour, detect fouling onset, and flag anomalies before they become failures. IDE's membrane degradation model at Carlsbad sits here, as does Gradiant's membrane cleaning prediction at PUB Singapore's Bedok NEWater Factory, which achieved 98.1% accuracy in predicting when cleaning was needed.

The third is optimisation: AI tools that recommend or automatically adjust operating setpoints to reduce energy or chemical consumption. ACCIONA's ACRRO® system applies differential equation modelling to identify optimal configurations for RO racks. It is paired with Insight, a real-time ML tool that adjusts process parameters continuously; together they form what the company calls a dual-model optimisation system, deployed at a facility in Qatar. Gradiant's algorithms, derived from its 2022 acquisition of Canadian ML startup Synauta, deliver setpoint recommendations to operators at plants run by ENGIE, PUB Singapore, and others. At a large SWRO facility in the Middle East, they achieved up to 5% energy savings verified under the International Performance Measurement and Verification Protocol (IPMVP®), with savings reaching 18% at smaller plants in Australia.

Pani Energy, a Victoria, BC-based startup whose Zed platform operates across more than 100 facilities in ten countries, targets up to 20% energy reduction and extends membrane life by an estimated 20 to 35% through condition-based cleaning schedules. Its partnership with Aquatech International targets seawater RO energy consumption of 2.7 kWh/m³, against the current standard of 3.5 kWh/m³. Veolia's Hubgrade platform connects more than 50,000 assets across 5,000 sites globally, though its depth in desalination specifically lags its wastewater footprint.

What this entire generation of systems shares is a fundamental constraint: they are reactive, not adaptive. They learn from historical data, but they do not learn from the consequences of their own actions. When conditions fall outside the range of their training, such as an unusual algal bloom, an unexpected shift in feed salinity, or a combination of equipment states not previously encountered, they cannot discover new strategies. They can only extrapolate from what they already know. Reinforcement learning is a different kind of machine intelligence.

The RL frontier

Where supervised machine learning learns from labelled examples, reinforcement learning learns by doing. An RL agent takes actions, observes outcomes, receives rewards or penalties, and gradually discovers policies that maximise long-term performance, without being told explicitly what those policies should be. Applied to a desalination plant, the implications are considerable. An RL agent controlling RO operations would not need to be programmed with the optimal pressure for a given salinity and temperature combination. It would discover that optimum and continue refining it as conditions changed. It would learn the relationship between today's membrane cleaning decision and next month's energy bill. It would find operating strategies that no human engineer had thought to try.
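The act, observe, reward, update loop described above can be sketched with a deliberately tiny example. Everything in it is an assumption made for illustration: the three pressure setpoints, the reward values, and the `toy_plant_reward` function are hypothetical and are not drawn from any plant or study cited in this article. The example is also stateless (a bandit-style problem); a real RL agent would additionally condition its choice on plant state such as salinity and temperature.

```python
import random

# Hypothetical discrete actions: candidate feed-pressure setpoints in bar.
PRESSURES = [55, 60, 65]


def toy_plant_reward(pressure, rng):
    """Made-up reward: 60 bar is best on average; noise mimics plant variability."""
    mean = {55: 0.2, 60: 1.0, 65: 0.5}[pressure]
    return mean + rng.gauss(0.0, 0.1)


def train(episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning of the average reward of each setpoint."""
    rng = random.Random(seed)
    value = {p: 0.0 for p in PRESSURES}  # running reward estimate per action
    count = {p: 0 for p in PRESSURES}    # how often each action was tried
    for _ in range(episodes):
        # Explore occasionally, otherwise exploit the best estimate so far.
        if rng.random() < epsilon:
            action = rng.choice(PRESSURES)
        else:
            action = max(PRESSURES, key=value.get)
        reward = toy_plant_reward(action, rng)
        count[action] += 1
        # Incremental mean: the estimate moves toward the observed outcome.
        value[action] += (reward - value[action]) / count[action]
    return value


if __name__ == "__main__":
    estimates = train()
    print("Preferred setpoint (bar):", max(estimates, key=estimates.get))
```

The agent is never told that 60 bar is the best choice in this toy setup; it infers that from the rewards its own actions produce, which is the property the paragraph above describes.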

The academic results are striking. A 2025 paper from Korean researchers, the first dedicated multi-agent reinforcement learning (MARL) study for a two-stage RO system, trained QMIX and VDN agents (two cooperative MARL algorithms) on a numerical model calibrated against real operational data from an industrial plant. The agents achieved lower specific energy consumption than a single-agent baseline while meeting production targets, demonstrating that multi-agent architectures can capture the interactions between RO stages that single-agent approaches miss.
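For readers unfamiliar with the two algorithms, their core idea can be stated compactly. The notation below follows the general MARL literature rather than the Korean paper itself, so the symbols are assumptions: $Q_i$ is agent $i$'s own action-value, $\tau_i$ its observation history, $u_i$ its action, $s$ the global state, and $f_{mix}$ a learned mixing function.

```latex
% VDN: the joint value is a plain sum of the per-agent values
Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) = \sum_{i=1}^{N} Q_i(\tau_i, u_i)

% QMIX: a state-conditioned mixing function, constrained to be monotonic
Q_{tot}(\boldsymbol{\tau}, \mathbf{u}, s)
  = f_{mix}\big(Q_1(\tau_1, u_1), \dots, Q_N(\tau_N, u_N);\, s\big),
\qquad
\frac{\partial Q_{tot}}{\partial Q_i} \ge 0 \quad \forall i
```

In both cases each agent can act greedily on its own $Q_i$ while training still optimises a shared objective. In a two-stage RO setting, one plausible mapping (an assumption, not a detail confirmed by the source) would be one agent per stage.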

A cascade architecture developed at Qatar University paired a deep deterministic policy gradient (DDPG) agent for continuous pump pressure control with a discrete agent optimising setpoints against time-of-day electricity pricing; after 1,000 training episodes, the system outperformed proportional-integral-derivative (PID) control for flow tracking and reduced energy costs under variable demand.

The University of Alberta's benchmarking study found that a convolutional neural network-enhanced soft actor-critic (CNN-SAC) algorithm outperformed all competing approaches, including DDPG, proximal policy optimisation (PPO), and twin delayed deep deterministic policy gradient (TD3), for managing a renewable-powered RO system, with SAC's entropy-based exploration proving particularly robust under the uncertainty of variable solar generation. Separately, Shim et al. (2025) applied RL to optimise membrane cleaning schedules in an industrial RO process for ultrapure water production, a methodology directly transferable to desalination, achieving a 16.13% reduction in operating costs and a 139.53% increase in RO operating time through smarter clean-in-place (CIP) decisions alone.
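The "entropy-based exploration" credited to SAC above refers to the maximum-entropy objective that defines the algorithm in general. It is reproduced here in standard textbook notation, not in the notation of the Alberta study: $\rho_\pi$ is the state-action distribution induced by policy $\pi$, $r$ the reward, and $\alpha$ a temperature weight.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

A larger $\alpha$ rewards the policy for keeping its action choices varied, which discourages premature commitment to a single strategy. That is one plausible reading of why the approach held up under fluctuating solar input.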

The industrial infrastructure for RL is also beginning to materialise. In November 2024, AVEVA and NVIDIA published work coupling AVEVA's Dynamic Simulation platform, a first-principles process simulator, with NVIDIA's Raptor distributed reinforcement learning engine, originally developed to design arithmetic circuits for NVIDIA's own silicon. The demonstration involved a chemical distillation column: the RL agent stabilised the system after large feed composition changes in half the time a manual operator required. AVEVA defines an autonomy scale from Level 0 (fully manual) to Level 5 (full autonomy), with RL targeting Levels 3 to 5, and its published roadmap calls for piloting the integration with major energy companies before productising it through no-code workflows. The sectors listed include chemicals, refining, and process industries, which are desalination's closest process analogues. No water-specific deployment has been announced, but the tooling now exists. The question is no longer whether industrial RL is possible. It is who will apply it to water first.

The hard problems

Three obstacles stand between the academic results and a production RL deployment at an operational desalination plant. Mohtada Sadrzadeh, Professor of Mechanical Engineering at the University of Alberta and one of the leading researchers on RL for membrane processes, is direct about the timeline: "In the next three to five years, I expect we will begin to see early deployments in real plants, but in a supportive role. The move toward fully autonomous, closed-loop control will take longer, likely five to ten years." The constraint, he argues, is not algorithmic. "The reason for this isn't an algorithmic limitation. It's about trust. When you're producing drinking water, there is no margin for error."

The first obstacle is the simulation gap. Every RL result cited above was achieved in a digital twin, not a physical plant. When the agent moves from simulation to reality, it encounters dynamics its training never prepared it for. As Sadrzadeh puts it: "Real desalination plants are totally different. Feedwater quality fluctuates, membranes age, fouling develops in unpredictable ways, and small disturbances can propagate in complex ways." In process control, this is known as the reality gap: the difference between how a system behaves in a model and how it behaves in operation. The Korean MARL study is the closest the desalination field has come to addressing it, calibrating its simulation against real operational data rather than purely theoretical models, but the agents were not deployed on the physical plant. No desalination RL paper has yet employed domain randomisation or transfer learning to harden an agent against this gap. "The real technical challenge," Sadrzadeh explains, "is ensuring that an RL agent can carry what it has learned in simulation into these uncertain, time-varying environments and still behave reliably."
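Domain randomisation, mentioned above, means training the agent across many perturbed versions of the simulator rather than a single calibrated one. The sketch below shows only the sampling step. The parameter names and ranges (`SimParams`, `RANGES`) are hypothetical values picked for illustration and do not come from any cited study.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    feed_salinity_g_per_l: float
    feed_temp_c: float
    permeability_factor: float  # 1.0 = new membrane, lower = aged or fouled


# Hypothetical ranges, chosen only to illustrate the idea.
RANGES = {
    "feed_salinity_g_per_l": (32.0, 45.0),
    "feed_temp_c": (15.0, 35.0),
    "permeability_factor": (0.6, 1.0),
}


def sample_params(rng):
    """Draw one randomised plant configuration for a training episode."""
    return SimParams(
        **{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    )


def randomised_episodes(n, seed=0):
    """Return n independently sampled configurations, one per episode."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n)]


if __name__ == "__main__":
    for params in randomised_episodes(3):
        print(params)
```

In a training loop, the simulator would be reset with a fresh `SimParams` at the start of each episode, so the learned policy cannot rely on any single set of plant conditions.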

The second obstacle is regulation. Under the EU AI Act, AI systems used as safety components in the supply of water fall under the critical infrastructure category in Annex III, point 2, and are therefore classified as high-risk under Article 6(2). That classification brings requirements for conformity assessment, technical documentation, transparency, and human oversight, with obligations for Annex III high-risk systems applying from August 2026. What this means in practice for an RL agent actively controlling a treatment process remains undefined: no regulator has yet published guidance on the question. In the Gulf, where the largest desalination capacity sits, regulatory frameworks for AI in critical infrastructure are similarly nascent.

The third obstacle is trust. A plant supplying drinking water to hundreds of thousands of people is not an environment where operators will readily hand control to an algorithm that learned its policy in a computer model. The commercial digital twins deployed today, at Al Khobar 1, Carlsbad, and PUB Singapore, have earned operator confidence precisely because they work in advisory mode: they recommend, and humans decide. Moving to autonomous RL control requires what Sadrzadeh describes as a layered safety architecture: strict technical limits on pressure, recovery, salinity, and flux that the agent cannot violate, a supervisory control layer that filters out any action posing a risk, and operators kept firmly in the loop. "The AI is not replacing control systems," he says. "It is refining and optimising them." The goal, in his framing, is not to hand over control but to build a partnership where human expertise and machine intelligence complement each other. The gains projected in simulation are compelling: a 16% reduction in operating costs and a 139% increase in RO operating time through RL-optimised cleaning decisions alone. But realising them in production will require earning trust one carefully monitored step at a time.
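The supervisory layer described above can be pictured as a filter sitting between the agent and the plant. The following is a minimal sketch under stated assumptions: the limit values, the rate limit, and the names `Limits` and `safety_filter` are all hypothetical, and a real implementation would live in the plant's certified control system rather than in a script like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Hypothetical hard limits; real values are plant-specific.
    pressure_bar: tuple = (40.0, 70.0)
    recovery: tuple = (0.35, 0.50)
    max_pressure_step_bar: float = 2.0  # largest change allowed per step


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def safety_filter(proposed, current_pressure, limits=Limits()):
    """Return (safe_action, overridden) for an agent's proposed setpoints."""
    # Rate-limit the pressure change, then enforce the absolute bounds.
    step = clamp(
        proposed["pressure_bar"] - current_pressure,
        -limits.max_pressure_step_bar,
        limits.max_pressure_step_bar,
    )
    pressure = clamp(current_pressure + step, *limits.pressure_bar)
    recovery = clamp(proposed["recovery"], *limits.recovery)
    safe = {"pressure_bar": pressure, "recovery": recovery}
    # overridden is True whenever the filter had to change the proposal.
    return safe, safe != proposed


if __name__ == "__main__":
    action, overridden = safety_filter(
        {"pressure_bar": 80.0, "recovery": 0.60}, current_pressure=60.0
    )
    print(action, overridden)
```

Logging every overridden action would give operators a running record of where the agent's proposals diverge from the permitted envelope, which fits the step-by-step trust building the paragraph describes.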

The next efficiency frontier

The desalination industry spent decades optimising membranes. The results were transformative: energy consumption fell from roughly 20 kWh/m³ in the 1970s to around 2.0 to 2.5 kWh/m³ for the RO stage in best-in-class plants today. The thermodynamic floor is approximately 1.07 kWh/m³. The remaining gap will not be closed by better membranes alone. It will be closed by better decisions about how to use them: decisions about pressure, flow, cleaning timing, and energy scheduling made thousands of times a day and optimised continuously against changing conditions.
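The article does not state the assumptions behind the thermodynamic floor, but a common idealisation reproduces a value close to it. The sketch below assumes a reversible process, complete salt rejection, and osmotic pressure that scales linearly with salt concentration, which gives a minimum energy per cubic metre of permeate of (π₀ / r) · ln(1 / (1 − r)) for feed osmotic pressure π₀ and recovery r. The inputs of 27.7 bar and 50% recovery are assumptions chosen for illustration, not figures from the source.

```python
import math

# 1 bar = 1e5 J/m3 and 1 kWh = 3.6e6 J, so this converts bar to kWh/m3.
BAR_TO_KWH_PER_M3 = 1e5 / 3.6e6


def min_sec_kwh_per_m3(osmotic_pressure_bar, recovery):
    """Ideal minimum specific energy per m3 of permeate (see assumptions above)."""
    if not 0.0 < recovery < 1.0:
        raise ValueError("recovery must be strictly between 0 and 1")
    pi0 = osmotic_pressure_bar * BAR_TO_KWH_PER_M3
    return (pi0 / recovery) * math.log(1.0 / (1.0 - recovery))


if __name__ == "__main__":
    # Assumed inputs: about 27.7 bar feed osmotic pressure, 50% recovery.
    floor = min_sec_kwh_per_m3(27.7, 0.5)
    print(f"Ideal minimum: {floor:.2f} kWh/m3")
    for actual in (2.0, 2.5):
        print(f"Headroom at {actual} kWh/m3: {actual - floor:.2f} kWh/m3")
```

Under these assumptions the result lands at roughly 1.07 kWh/m³. The same function shows why recovery matters: the ideal minimum rises as more of the feed is converted to permeate, so the size of the remaining gap depends on how the plant is operated.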

Reinforcement learning is the class of machine learning designed specifically to improve through trial-and-error interaction with its environment. For desalination, the field currently sits between Technology Readiness Level (TRL) 2 and 3, with the Korean MARL study reaching TRL 4 through calibration against real plant data. The jump to a pilot demonstration requires solving the reality gap, establishing a credible safety architecture, and finding an operator willing to go first. As Sadrzadeh puts it, the transition will not be a leap. It will be "a carefully guided evolution."

The first operator to run a reinforcement learning agent in production will not just save energy. They will redefine what it means to operate a plant.