Unlocking a post-Moore's Law approach to computing

Introducing XONAI

Just over a year ago, we launched XONAI Computing to create a new type of platform that transparently optimises the execution performance of demanding cloud workloads. Last week, the team released their first benchmark results, which outperformed even our most optimistic expectations: over twice as fast as standard Apache Spark and more than 40% faster than Amazon's EMR big data platform, whilst instantly slashing compute costs. This article covers why this matters.

XONAI is led by Brock and Leandro, co-founders with previous experience in optical physics and in accelerator design at Intel and Graphcore. They are working with our friends and early investors who have first-hand experience of the pain, such as Mehdi Ghissassi (Head of Application at Google DeepMind, previously the lead on optimising Google's cloud, and an Atomico investor) and Martin Gould (Head of Product for Spotify's Content Platform).

Why optimising cloud computation matters

Cloud computation underpins much of the modern online world, including big data analytics and AI workloads: from social media and e-commerce to banking and the scientific analysis behind genomics and aircraft design. Data scientists and developers face real commercial pressure from the time required to analyse data, with jobs often running overnight, constraining business objectives and incurring costs in the tens of millions of dollars per day.

While long promoted as the answer to companies' computing requirements, the cloud has reached the point of being an economic burden. a16z wrote a great article on this, highlighting that the economic cost is likely in the hundreds of billions of dollars, money which currently flows to Amazon and the like but which would otherwise be more gainfully deployed in the market, in R&D or in the consumer's pocket. The cloud providers have little incentive to improve efficiency because they charge based on compute time used. It is also highly damaging to the environment: cloud computing alone is responsible for 3.7% of global carbon emissions (more than air travel!), with 44 million tonnes produced by Amazon alone each year, whilst the electronic waste from constantly replacing hardware sends rare precious metals to landfill.

Cloud computing is fulfilled by only a small number of providers, such as Amazon, who offer managed versions of big data distributed processing frameworks such as Apache Spark, which is used by over 1,000 organisations, including Shopify, Yelp and even NASA's Jet Propulsion Laboratory.

Critically, the monopoly-like behaviour of the few cloud suppliers limits innovation more broadly. We see it regularly in our own portfolio: it is often financially impossible to run all of the analysis that we would want to run. This is particularly evident in training large language models like GPT-3. These almost superhuman models have rightly garnered significant attention, but the processing power required to train them, even when carefully tuned for current cutting-edge processors, costs millions of pounds. That again locks this capability in the hands of the same few companies.

The only partial solutions available to these companies today require time and effort to migrate into closed ecosystems, and drive up costs even further.

Overcoming Moore’s Law

The growth of computing power has largely followed Moore's Law since it was first observed in 1965, incrementally increasing available processing power by shrinking transistors and increasing their density. With process nodes now at 3 nanometres, chip manufacturing is bumping up against the limits of physics and heat dissipation. Incumbent solutions are often more "hacks" than solutions - one example being turning areas of a chip off temporarily to reduce power demand.

In general, there is an implicit trade-off between flexibility of use cases and speed of performance, with big data opting for more efficiency at the expense of flexibility. For example, AI chips are extremely fast at the matrix multiplications that dominate AI workloads, but quite useless for serving an image on a website. One day quantum computers may be many, many times faster at a tiny subset of functions, but they will still require more general computation to interface with the rest of the application.

Building a universal compute fabric

The fast-growing availability of application-specific hardware is creating a new problem: the hardware environments in which software runs are becoming increasingly diverse, whilst the development tools provided by vendors are hardly interchangeable, requiring duplicate engineering effort to execute the same logic on different chips. The established paradigm in compute-intensive software (such as big data and AI) is to run a workload entirely on one specific type of chip. This is today's version of heterogeneous compute: matching software to specific hardware, but statically and at the level of whole workloads.

This does not guarantee that the best chip will be dynamically and optimally selected for each execution step and, even on a single chip, execution performance is hardly ever optimal, often as a consequence of intermixing distinct software libraries or failing to optimally combine processing steps across data pipelines. The result is that raw processing power is underutilised, driving up costs and emissions. What is required is a layer that allocates compute-intensive software to the optimal hardware: a universal compute fabric.
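As a purely hypothetical illustration (not XONAI's actual implementation, and no real API is assumed), the core idea of such a layer can be sketched as a scheduler that estimates the cost of each pipeline step on each available backend and dispatches the step to the cheapest one:

```python
# Hypothetical sketch of per-step hardware dispatch. All class and
# function names are illustrative, not part of any real library.

class Backend:
    def __init__(self, name, cost_table):
        self.name = name
        self.cost_table = cost_table  # step kind -> relative cost

    def estimate_cost(self, step):
        # Steps this backend handles poorly fall back to a high default cost.
        return self.cost_table.get(step, 100.0)

def dispatch(step, backends):
    """Pick the backend with the lowest estimated cost for this step."""
    return min(backends, key=lambda b: b.estimate_cost(step))

# Toy cost model: CPUs are good general-purpose workers, accelerators
# excel at a narrow set of operations (cf. the flexibility/speed trade-off).
cpu = Backend("cpu", {"parse": 1.0, "filter": 1.0, "matmul": 50.0})
gpu = Backend("gpu", {"matmul": 2.0, "filter": 5.0})

pipeline = ["parse", "filter", "matmul"]
plan = [(step, dispatch(step, [cpu, gpu]).name) for step in pipeline]
# The matrix multiplication lands on the accelerator; everything else
# stays on the general-purpose CPU.
```

The point of the sketch is the per-step granularity: rather than committing a whole workload to one chip, each execution step is matched to hardware dynamically.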

XONAI is solving this problem with a new type of platform that transparently performs code optimisation across data processing pipelines and selects the best hardware to run individual execution steps.

With an initial focus on big data analytics and AI workloads, their benchmark results indicate that their engine instantly provides more than 2x faster execution of Apache Spark workloads, with no changes to code or infrastructure. 

With XONAI, organisations can leverage a new type of solution that doesn't require any changes to code or infrastructure, often just running from a template machine "image" on existing cloud services, something that can be tested or activated on a 10-minute call. They are rolling out trials with their first prospective customers within the next two months, who altogether spend over $300m on public cloud services each year.

For more information on their initial results, the team has covered them extensively in this blog post.

This is the first step of a much larger vision and product. The platform currently supports general-purpose hardware (such as Intel and Arm CPUs), exploiting built-in accelerator features such as vectorisation and employing a number of automatic code optimisations. In reality, data processing pipelines are littered with functions that could be broken down into much faster primitives and deployed across faster hardware, but doing so would require specialised developers and resources.
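A simplified example of the kind of rewrite at stake (illustrative only, not XONAI's engine): the same pipeline logic written as separate passes over the data, each materialising an intermediate result, versus a single fused pass that an optimising layer could produce automatically:

```python
# Illustrative only: three separate passes over the data versus one
# fused pass that avoids materialising intermediate lists.

data = list(range(100_000))

def unfused(xs):
    squared = [x * x for x in xs]               # pass 1: materialise squares
    kept = [s for s in squared if s % 3 == 0]   # pass 2: materialise filter
    return sum(kept)                            # pass 3: reduce

def fused(xs):
    # One pass, no intermediate allocations: the kind of pipeline-level
    # rewrite a developer would otherwise have to do by hand.
    return sum(x * x for x in xs if (x * x) % 3 == 0)

assert unfused(data) == fused(data)  # identical results, fewer passes
```

The fused form does strictly less work (one traversal, no temporary lists) while computing the same answer, which is the essence of combining processing steps across a pipeline rather than optimising each in isolation.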

Such framework-specific utilisation, for example CUDA in deep learning, is only ever viable where the majority of the programme requires the same small set of functions (again, trading generality and freedom of hardware selection for speed). XONAI achieves the same goal universally and transparently.

Their next step is to expand hardware support (such as GPUs) and leverage Kubernetes to deliver multi-cloud, serverless and modularised solutions for powering data pipelines at scale, ultimately realising a universal compute fabric. This will free developers from needing to think about the intricate details of hardware selection and cluster management to achieve optimal execution performance.

Creating vertically integrated companies

As a venture creator, Deep Science Ventures led the initial discovery, branching from exploring the quantum computing space and speaking to hundreds of scientists and developers before hitting on the ideal solution and building out the early team behind XONAI. We believe in the power of ecosystems, whether that’s synergistic therapeutics companies in our Pharma team, or our carbon capture and valorisation companies in our Climate team - we’re taking a similar approach in our Computation team.

As XONAI develops this universal compute fabric, we are actively working on identifying the optimum hardware for future computational loads and how those loads will develop and grow over time. This will involve radically rethinking the industry's historical approach of incremental improvements, e.g. moving cache a few nanometres closer to the processing components, in favour of novel approaches that remove the trade-off between flexibility and efficiency and can seamlessly adapt to evolving loads.

Over the coming months, we will be deep diving into alternative semiconductor substrates and reconfigurable architectures. If you're interested in joining the conversation alongside the XONAI team, senior execs in the semiconductor industry and researchers in the space, request access to the community here.

We're particularly interested in people with backgrounds in non-volatile memories, alternative semiconductor materials and low-level reconfigurability in in-memory computing (IMC), FPGAs and the like, but, as always, we're open to ideas.