From commanding a nuclear-powered fast-attack submarine to leading hyperscale data centre projects at AWS, Meta and Oracle, Tony Grayson brings a unique blend of precision, resilience and technical expertise to AI infrastructure.
Now President and General Manager at Northstar Federal & Northstar Enterprise & Defense, Grayson (above, right) will be delivering the Principal Keynote at the Cloud & Datacenter Convention 2025 in Sydney, examining how reinforcement learning (RL) compute and agentic AI are driving the shift to distributed computing.
W.Media sat down with him to discuss what these developments mean for infrastructure design, rapid deployment, sustainability, and the future demands on telecom and cloud operators.
W.Media: As someone who built and scaled a company to millions in contracts, what advice do you have for organisations planning AI infrastructure investments today? How should they structure these investments to maintain flexibility for pivots, exits, or technology shifts over the next 3-5 years?
Grayson: From my experience at NorthStar and scaling EdgePoint Systems, agility must be embedded at every layer to keep pace with AI’s rapid change. First, use modular, incremental builds over monolithic ones. Modular data centres (MDCs) deploy in 3–9 months at ~US$7–9m per MW (US/Australia averages), versus 18–24 months and US$12–15m per MW for hyperscale. This avoids over-provisioning and stranded capacity from hardware refresh cycles. Think of your design as Lego blocks: standardise components for mass customisation while ensuring maintainability.
MDCs also help bypass 6–18 month permitting delays; prefabricated builds can cut on-site timelines by 50–70% and avoid full environmental reviews, enabling brownfield retrofits or edge sites near substations to sidestep grid queue delays like PJM’s multi-year backlogs. Second, choose vendor-agnostic, hybrid/multi-cloud architectures to avoid lock-in – accommodating NVIDIA, AMD, Groq and standards like ONNX. Consider Opex leasing for MDCs to support exits or pivots.
Third, budget 20–30% for emerging tech such as distributed reinforcement learning (RL) and agentic AI: RL can cut latency to under 10 ms and, per McKinsey, agentic AI could hit US$50bn by 2030 (45% CAGR). Use scenario planning to map the “what-ifs” and prepare for inference-heavy workloads, projected at 60–80% of AI spend by 2030 (US$254bn, 17.5% CAGR). Finally, link all investments to ROI – MDCs can cut build costs by 40–60% and accelerate revenue through faster deployment. Remember, technology alone doesn’t generate revenue; adaptable infrastructure does.
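To make that ROI linkage concrete, here is a minimal sketch comparing effective cost per MW for a modular versus a hyperscale build, using mid-points of the indicative figures above (US$7–9m per MW and 3–9 months for MDCs; US$12–15m per MW and 18–24 months for hyperscale); the monthly revenue rate is purely an illustrative assumption.

```python
# Minimal sketch: effective cost per MW = capex + revenue forgone while
# building, using mid-points of the figures quoted above. The monthly
# revenue per MW is an illustrative assumption, not a figure from the interview.

def effective_cost_per_mw(capex_per_mw, months_to_deploy, revenue_per_mw_month):
    revenue_forgone = months_to_deploy * revenue_per_mw_month
    return capex_per_mw + revenue_forgone

mdc = effective_cost_per_mw(capex_per_mw=8_000_000, months_to_deploy=6,
                            revenue_per_mw_month=400_000)   # assumed revenue rate
hyperscale = effective_cost_per_mw(capex_per_mw=13_500_000, months_to_deploy=21,
                                   revenue_per_mw_month=400_000)

print(f"MDC effective cost per MW:        US${mdc:,.0f}")
print(f"Hyperscale effective cost per MW: US${hyperscale:,.0f}")
print(f"Gap per MW from faster deployment and lower capex: US${hyperscale - mdc:,.0f}")
```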
W.Media: Given your emphasis that “silicon moves faster than steel and concrete,” how should organisations balance long-term infrastructure investments with the reality that silicon refreshes every 12-24 months? What planning frameworks work best for this paradox?
Grayson: The phrase “silicon moves faster than steel and concrete” sums up the biggest challenge in AI infrastructure. Hardware refreshes every 12–24 months, which is far faster than traditional data centre build cycles, and that can leave you with millions in stranded assets if you’re not careful. The key is to decouple your infrastructure from any single generation of silicon. That’s becoming even more important as we enter the Rubin era, with rack power densities pushing toward 800 kW to 1.5 MW, well beyond today’s Hopper and Grace Blackwell systems – and those racks can weigh twice as much as today’s.
One approach I’ve used is what I call a “modular refresh cycle,” breaking infrastructure into pods that can be upgraded without disrupting the entire site. At NorthStar, our modular data centres support racks from 30–132 kW and use advanced liquid cooling, so we can roll new silicon in and out as needed. We plan in 18–24 month horizons but model over a five-year lifecycle, factoring in the 20–30% opex savings modularity delivers. To navigate this uncertainty, use techniques such as Monte Carlo simulation for silicon price volatility and sensitivity analysis for refresh impacts.
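As a toy illustration of that kind of analysis, the sketch below runs a Monte Carlo simulation over a five-year lifecycle, treating both the refresh timing (every 18–24 months) and the cost of each silicon refresh as random variables; the cost distribution and dollar values are assumptions purely for illustration.

```python
import random

# Toy Monte Carlo over a five-year lifecycle: refreshes land every 18-24
# months and the cost per MW of each refresh is volatile. The lognormal
# parameters (centred near US$4m per MW) are illustrative assumptions.

def lifecycle_refresh_cost(trials=10_000, years=5):
    totals = []
    for _ in range(trials):
        month, total = 0, 0.0
        while True:
            month += random.randint(18, 24)            # refresh cadence in months
            if month > years * 12:
                break
            total += random.lognormvariate(15.2, 0.35) # volatile refresh cost per MW
        totals.append(total)
    totals.sort()
    return totals[len(totals) // 2], totals[int(len(totals) * 0.95)]

median, p95 = lifecycle_refresh_cost()
print(f"Median five-year refresh spend per MW:  US${median:,.0f}")
print(f"95th percentile (budget for this case): US${p95:,.0f}")
```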
The competitive landscape is shifting too: while NVIDIA dominates training via CUDA, AMD is challenging in inference with the MI300X today and the MI400X to come, and custom silicon like Groq’s may optimise further. RL training (as in Grok 4) favours distributed compute, reducing centralisation needs.
We’re focused on GPUs today, but the future is disaggregated architectures with Compute Express Link (CXL), which lets CPUs and GPUs pool memory on demand. Traditional GPUs tie HBM to each chip, causing stranded capacity and 30% higher switch costs. CXL delivers more than 30% better performance per watt and cuts total costs by 20–30%. I’ve seen MDCs running CXL-pooled accelerators improve throughput by 25% over GPU-only setups, while open frameworks like ONNX help avoid vendor lock-in.
Each technology wave – GPUs, distributed RL, quantum (potentially needing specialized shielded facilities) – demands different infrastructure. The old data centre assumption of 20- to 30-year customer lifecycles is dead. AI moves too fast, and you have to design for 3- to 5-year obsolescence risks. Modularity and flexible opex models keep your steel and concrete serving as a foundation, not a cage.
W.Media: In a recent article you wrote, you challenged the industry to ask “how are you going to make money?” before building massive AI infrastructure. What specific financial metrics and ROI models should organisations use when planning for unpredictable AI workload requirements?
Grayson: The challenge boils down to this: language models don’t inherently monetise unless you’re a cloud service provider or neo-cloud – inference is where the revenue lies. Capex matters, but opex will dictate sustainability. To plan amid workload unpredictability, you need to focus on metrics that tie directly to value creation.
Key metrics to track: prioritise TCO per inference – target 30–70% reductions via custom silicon like Groq LPUs, which can generate up to 50x more revenue (think US$15,500/day per rack versus US$310/day for H100 equivalents). Monitor power usage effectiveness, targeting a PUE under 1.2 for MDCs versus 1.5+ for legacy sites. Track capex per MW – modular builds offer significant cost advantages over traditional approaches. Watch stranded-capacity risk, which can hit US$100–500 million for inflexible builds.
Don’t forget inference throughput – Groq can deliver 100,000+ tokens/sec versus around 2,000 on an H100 – and energy efficiency measured in tokens/sec/kW.
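As a quick back-of-the-envelope on those last two metrics, the sketch below converts the throughput figures quoted above into tokens/sec/kW and an energy cost per million tokens; the rack power draws and electricity price are illustrative assumptions.

```python
# Back-of-the-envelope: tokens/sec/kW and energy cost per million tokens.
# Throughput figures are those quoted above; the 40 kW rack draw and
# US$0.10/kWh electricity price are illustrative assumptions.

def efficiency(tokens_per_sec, rack_kw, usd_per_kwh=0.10):
    tokens_per_sec_per_kw = tokens_per_sec / rack_kw
    tokens_per_hour = tokens_per_sec * 3600
    energy_cost_per_hour = rack_kw * usd_per_kwh
    usd_per_million_tokens = energy_cost_per_hour / (tokens_per_hour / 1_000_000)
    return tokens_per_sec_per_kw, usd_per_million_tokens

for name, tps in [("Groq rack (assumed 40 kW)", 100_000),
                  ("H100 rack (assumed 40 kW)", 2_000)]:
    eff, cost = efficiency(tps, rack_kw=40)
    print(f"{name}: {eff:,.0f} tokens/sec/kW, ~US${cost:.2f} energy per 1M tokens")
```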
For ROI models, implement what I call a “Phased Payback” approach. Compute your Internal Rate of Return over five years – you want 25%+ for a 1 MW MDC with NVIDIA B200 GPUs, which could yield US$3.4 million in margins from AI services. Use Net Present Value to discount future cash flows, factoring in the inference market’s projected 17.5% CAGR to US$254 billion by 2030. Account for opportunity costs – faster modular deployments can save you millions in delayed revenue.
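A minimal sketch of that “Phased Payback” arithmetic for a hypothetical 1 MW MDC: the US$3.4 million annual margin cited above is held flat for five years, while the US$8 million capex and 12% discount rate are illustrative assumptions.

```python
# Minimal NPV/IRR sketch for a hypothetical 1 MW MDC over five years.
# The US$8m capex and 12% discount rate are illustrative assumptions; the
# US$3.4m annual margin is the figure cited above, held flat for simplicity.

def npv(rate, cashflows):
    """cashflows[0] is the upfront (negative) capex at year 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.9, hi=2.0, tol=1e-6):
    """Discount rate where NPV crosses zero, found by simple bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if npv(mid, cashflows) > 0 else (lo, mid)
    return (lo + hi) / 2

flows = [-8_000_000] + [3_400_000] * 5
print(f"NPV at a 12% discount rate: US${npv(0.12, flows):,.0f}")
print(f"Five-year IRR:              {irr(flows):.1%}")
```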
For uncertainty, leverage Monte Carlo simulations on workload shifts – such as the projected 60–80% inference dominance – and silicon pricing fluctuations; H100 rates have dropped from US$4/hour to around US$0.90/hour.
Scenario-based ROI is essential: the base case assumes centralised training, the optimistic case factors in distributed RL slashing costs by 35%, and the pessimistic case accounts for 20% capacity stranding. Always ground everything in revenue fundamentals: What’s your dollar-per-token or dollar-per-query yield? Custom chips often deliver 10–25x better cost-per-inference, making them a solid hedge against commoditisation.
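To show how that scenario discipline can roll up into a single planning number, here is a toy probability-weighted ROI across the three cases just described; the probabilities and per-scenario return multiples are illustrative assumptions.

```python
# Toy scenario-weighted ROI across the base / optimistic / pessimistic cases
# described above. Probabilities and return multiples are illustrative assumptions.

scenarios = {
    # name: (probability, five-year return multiple on capital deployed)
    "base: centralised training holds":          (0.5, 1.25),
    "optimistic: distributed RL cuts costs 35%": (0.3, 1.60),
    "pessimistic: 20% of capacity stranded":     (0.2, 0.85),
}

assert abs(sum(p for p, _ in scenarios.values()) - 1.0) < 1e-9

expected = sum(p * multiple for p, multiple in scenarios.values())
worst = min(multiple for _, multiple in scenarios.values())

print(f"Probability-weighted return multiple: {expected:.2f}x")
print(f"Worst case to stress-test against:    {worst:.2f}x")
```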
W.Media: You’ve advocated strongly for modular data centres over traditional giga-campus approaches. What are the key decision criteria organisations should use to determine when modularity makes sense versus when scale economics favour larger, centralised infrastructure?
Grayson: Modularity excels when agility outweighs raw scale economies. Frankly, I’m sceptical that sheer scale is always necessary – AI accelerators have doubled PFLOPS roughly every six months, suggesting token volumes may plateau amid data scarcity, while RL compute pushes toward distribution.
From a cost and risk perspective, modularity wins at under US$10 million per MW with 35–60% TCO savings, avoiding US$100 million-plus in stranded assets from tech shifts. Centralised infrastructure works for ultra-low US$/kWh at 100 MW-plus scales, but carries higher upfront risk. Workload type matters too – edge inference and distributed RL, like Grok 4, favour MDCs for sub-10 ms latency, while massive pre-training still demands hyperscale’s bandwidth and density. Scalability also plays a role: choose incremental pod additions for volatile demand, centralised for predictable, high-volume training.
On sustainability and sovereignty, MDCs integrate renewables more easily – achieving superior PUE performance – and enable data locality. Modularity also reduces embodied carbon versus traditional builds – by 20–30% through prefabrication, recycled materials and reduced waste, and by as much as 40–60% in the best cases.
Geography often decides: opt for edge or regional modularity for latency-sensitive applications, and centralised for bulk compute in energy-rich areas. Often a hybrid model, with MDCs augmenting hyperscale, strikes the best balance. Modular approaches also offer deployment advantages through streamlined permitting processes.
W.Media: With chips showing 50x+ revenue potential over traditional GPUs, how should organisations hedge their bets across different silicon architectures (NVIDIA, AMD, Groq, AWS Inferentia) when the “winner” is still unclear?
Grayson: As custom chips like Groq’s demonstrate 50x revenue edges and AMD’s MI300X gains traction in inference, hedging is about building agnosticism into your stack. Deploy only for immediate needs, because land acquisition and permitting timelines can bottleneck pivots. Standardise designs so you can quickly swap between architectures, and make sure you have clear upgrade paths. Support both greenfield and brownfield sites for versatility. At NorthStar, our MDCs accommodate NVIDIA, AMD, Groq, and AWS Inferentia through flexible 30–132 kW+ racks and ONNX compatibility.
Refresh strategy should align to 12–18-month cycles, and MDCs allow downtime-free rollouts. Diversify partnerships to gain access to betas and co-development opportunities, and monitor ecosystem shifts. NVIDIA’s CUDA dominance in training may not hold in inference, where efficiency trumps generality. AMD’s Developer Cloud, launched in June 2025, is a good example: ROCm 7 enhancements, the MI350X delivering up to 35x better inference than prior generations in 2025, and the MI400X arriving in 2026, paired with aggressive pricing and an open ecosystem that rivals NVIDIA’s DGX Cloud. This accelerates alternatives for inference and can lower TCO by delivering 40% more tokens per dollar.
W.Media: Given your nuclear submarine background and advisory work on using SMRs and micro-reactors for edge AI workloads, what role do you see nuclear power playing in the future of data centre infrastructure, particularly for mission-critical and carbon-free AI deployments? How should DC operators view renewables in the meantime?
Grayson: Drawing on my nuclear submarine command and SMR advisory roles, I believe nuclear will be transformative for resilient, carbon-free AI infrastructure, especially at the edge. SMRs and micro-reactors could be powering MDCs by 2035 – a very different timeline from much of the marketing out there – offering baseload energy for sovereign, mission-critical deployments. Realistically, timelines vary: Gen III+ reactors (with passive cooling) are deployable now, micro-reactors may hit scale by 2027–2028, while Gen IV faces hurdles in design approval, testing and fuel sourcing.
By 2035, we might see 10 MW racks equating to today’s 3 GW in PFLOPS, amplifying nuclear’s appeal. In the interim, treat renewables as a vital bridge: integrate solar and wind for 40%+ of MDC energy mixes, bolstered by batteries and microgrids for stability. Explore natural gas as a reliable backup; hydrogen’s potential remains, though it is progressing slower than anticipated. On sustainability metrics beyond PUE: to complement renewables, consider embodied-carbon reductions via modularity (20–30% lower) and water efficiency – AI could demand 4.2–6.6 billion cubic metres of water globally by 2027, but closed-loop liquid cooling in MDCs recycles 90–95% of water, addressing consumption of 1–5 litres per query.
W.Media: You mentioned that Grok 4’s approach to distributed reinforcement learning could change deployment models. How should organisations prepare for a potential shift from centralised hyperscaler dependency to more distributed, edge-focused AI architectures?
Grayson: Grok 4 marked a big shift in training balance. Earlier LLMs focused mostly on pre-training, with only light reinforcement learning from human feedback (RLHF). Grok 4 used around 100× more total compute than Grok 2, splitting it equally between pre-training and RL, and delivered state-of-the-art results on benchmarks like Humanity’s Last Exam. This RL stage deepens a model’s reasoning, and Grok 4’s multi-agent RL – where agents debate answers or simulate reasoning paths – has shown strong performance for low-latency, edge-focused use cases.
Unlike pre-training, which needs monolithic clusters, RL workloads are more parallel and tolerant of latency. The workflow splits into three roles: rollout workers (generate outputs), evaluation nodes (score outputs), and learner nodes (update parameters). Rollout workers and evaluators can tolerate tens to hundreds of milliseconds of latency and run on older or commodity hardware, while newer methods like GRPO cut inter-node communication and eliminate separate critic models, lowering TCO by 10–20%.
INTELLECT-2’s demo showed a 32B-parameter RL setup reducing response times by 15% and failed requests by 24%. Models of up to 10–30B parameters can run full RLHF on a single GPU, while even 70B+ models can be distributed across cheaper resources using frameworks like OpenRLHF, TBA, and Ray RLlib.
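A highly simplified sketch of that three-role split, using plain Python threads and queues in place of a real framework such as OpenRLHF or Ray RLlib; the one-parameter “policy” and toy reward are stand-ins, purely to show rollout, evaluation and learning running as separate, latency-tolerant workers.

```python
import math
import queue
import random
import threading

# Toy stand-in for the rollout / evaluation / learner split described above.
# Real systems distribute these roles across machines; here each role is a
# thread and the "policy" is a single number.

rollouts, scored = queue.Queue(), queue.Queue()
policy = {"param": 0.0}                       # stand-in for model weights

def rollout_worker(n=500):
    """Generate candidate outputs from the current policy (latency-tolerant)."""
    for _ in range(n):
        rollouts.put(policy["param"] + random.gauss(0, 1.0))
    rollouts.put(None)                        # signal end of the batch

def evaluation_node():
    """Score each output; this toy reward prefers outputs near 3.0."""
    while (sample := rollouts.get()) is not None:
        scored.put((sample, -abs(sample - 3.0)))
    scored.put(None)

def learner_node(lr=0.2):
    """Nudge the policy toward higher-reward outputs (toy update rule)."""
    while (item := scored.get()) is not None:
        sample, reward = item
        policy["param"] += lr * math.exp(reward) * (sample - policy["param"])

threads = [threading.Thread(target=fn) for fn in (rollout_worker, evaluation_node, learner_node)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Policy parameter after one round: {policy['param']:.2f}")
```

In a real deployment the rollout and evaluation roles would sit on commodity or edge hardware, with only the learner needing tighter interconnect.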
To prepare: invest in edge MDCs for sub-10 ms evaluations using frameworks like OpenRLHF and methods like GRPO; pilot distributed RL workflows across regional pods (potentially cutting TCO by 35%); adopt a hybrid strategy that keeps hyperscale for pre-training while shifting RL and inference to the edge for sovereignty and cost savings; and build tooling with federated learning and vendor-agnostic designs for swift adaptation.
W.Media: What lessons can Australian data centre operators take from the way the industry is developing in the US?
Grayson: Australian operators should heed US pitfalls like overbuilds leading to stranded assets, pivoting instead to modular and edge models for superior agility. Adopt modular approaches for rapid deployment, enhanced sustainability via renewables, and a focus on RL and inference – use vendor-agnostic pods to hedge silicon volatility. Emphasise sovereign AI through local edge infrastructure.
The core US lesson: in AI’s whirlwind evolution, prioritise modularity over mass – build adaptable, distributed systems that keep pace with silicon rather than being locked into obsolete monoliths. Flexibility is the ultimate edge. For Australian operators, this also means prioritising compliance with APRA regulations for financial stability and data protection, alongside global standards like GDPR for cross-border operations. In distributed RL setups, incorporate cyber hardening – for example, zero-trust architectures and encrypted federated learning – to safeguard against threats in edge environments, ensuring sovereignty without sacrificing performance.
W.Media: What one lesson would you hope attendees will take away from your Keynote in Sydney?
Grayson: The future of AI infrastructure is unpredictable – cultivate the agility to pivot swiftly, minimising long-term capex in deployments where rapid market shifts could erode returns. Modularity isn’t just a tactic; it’s the principle for thriving amid uncertainty.
Event details: Sydney International Convention Centre, 21 August 2025, 8:00am–8:30pm.
Register here: https://clouddatacenter.events/events/sydney-cloud-datacenter-convention-2025/