Fractile raises US$ 220 million for AI inference hardware

May 15, 2026 at 12:43 PM GMT+8

Fractile, a British AI hardware startup, has raised US$ 220 million in new funding to develop chips and systems designed for large-scale AI inference workloads, a market where current hardware is becoming a bottleneck as frontier models generate longer outputs and use more compute at inference time.

According to a company press release, Fractile is focused on accelerating inference for large language models (LLMs) and reasoning systems. Current GPUs struggle with the memory bandwidth and latency requirements of long-context inference, particularly as advanced AI systems increasingly generate millions of tokens per task. Fractile argues that inference, rather than model training, is becoming the main infrastructure constraint for frontier AI systems.

The startup says that its hardware is being designed to increase inference throughput from 40 tokens per second on current systems to more than 1,000 tokens per second for large-context workloads. Its approach spans chip design, memory architecture, systems engineering, and foundry-level optimization.

Workloads such as reasoning models, agentic coding systems, and other applications require long chains of sequential inference. Some AI systems are already producing outputs approaching 100 million tokens, which can take weeks to complete on existing hardware.
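As a rough sanity check on those figures (a back-of-envelope sketch, not anything published by the company), generating a 100-million-token output strictly sequentially at the quoted rates works out as follows:

```python
# Back-of-envelope check of the throughput figures cited above.
# The 40 and 1,000 tokens/sec rates and the ~100M-token output size
# are taken from the article; everything else is illustrative.

def completion_time_days(total_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock days to generate `total_tokens` one token at a time."""
    seconds = total_tokens / tokens_per_second
    return seconds / 86_400  # seconds per day

output_tokens = 100_000_000  # ~100 million tokens per task

current = completion_time_days(output_tokens, 40)      # today's systems
target = completion_time_days(output_tokens, 1_000)    # Fractile's stated goal

print(f"At 40 tok/s:    {current:.1f} days")   # ~28.9 days, i.e. weeks
print(f"At 1,000 tok/s: {target:.1f} days")    # ~1.2 days
```

The roughly four-week figure at 40 tokens per second is consistent with the article's claim that such outputs "can take weeks to complete on existing hardware."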

The funding round was led by Accel, Factorial Funds and Founders Fund, with participation from Conviction, Gigascale, 01A, Felicis, Buckley Ventures and 8VC.

The broader argument reflects a growing view inside the AI industry that inference is becoming both the main revenue engine and the largest infrastructure cost for frontier AI companies. That trend has intensified demand for specialized hardware beyond traditional GPU architectures: as models become more capable, companies are deploying more compute during inference to improve reasoning quality, autonomy and reliability.