
Roadmap: AI Infrastructure


Bessemer has had a long history of infrastructure investing: from partnering with chip and semiconductor leaders such as Habana and Intucell, to backing developer platform pioneers Twilio and Auth0 at the earliest stages, to participating in the modern data stack movement with open-source leaders such as HashiCorp and Imply. Today, another wave is upon us, with AI ushering in a new generation of infrastructure tools purpose-built for enterprises leveraging AI in their platforms.

The AI revolution is catalyzing an evolution in the data stack

Machine learning has advanced dramatically in recent years. Since the 2017 breakout paper “Attention is all you need,” which laid the foundation for the transformer deep learning architecture, we have reached a Cambrian explosion of AI research, with new papers published every day and progress compounding at an astonishing pace.

This tectonic shift in AI innovation is catalyzing an evolution in data infrastructure across many vectors. 

First, AI is powering the modern data stack, and incumbent data infrastructure companies have started incorporating AI functionalities for synthesis, retrieval, and enrichment within data management. Additionally, recognizing the strategic importance of the AI wave as a business opportunity, several incumbents have even released entirely new products to support AI workloads and AI-first users. For instance, many database companies now support embeddings as a data type, either as a new feature or standalone offering.
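
To make that concrete, here is a minimal, vendor-agnostic sketch of what “embeddings as a data type” enables: each row carries a fixed-width vector, and queries rank rows by similarity rather than exact match. The documents, dimensions, and the nearest() helper are illustrative stand-ins, and the search is brute force; production vector stores rely on approximate-nearest-neighbor indexes instead.

    # Illustrative sketch (not any vendor's actual API): storing a fixed-width
    # float vector per row and querying by similarity instead of exact match.
    import numpy as np

    # Hypothetical table: each document row carries an embedding column.
    docs = ["refund policy", "shipping times", "gift cards"]
    embeddings = np.random.rand(len(docs), 384).astype(np.float32)  # stand-in for real model output
    embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # normalize for cosine similarity

    def nearest(query_vec: np.ndarray, k: int = 2) -> list[str]:
        """Brute-force cosine similarity; real vector stores use ANN indexes (e.g., HNSW, IVF)."""
        query_vec = query_vec / np.linalg.norm(query_vec)
        scores = embeddings @ query_vec           # cosine similarity because rows are normalized
        top = np.argsort(-scores)[:k]             # indices of the k most similar rows
        return [docs[i] for i in top]

    query = np.random.rand(384).astype(np.float32)  # would come from the same embedding model
    print(nearest(query))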

Next, data and AI are inextricably linked. Data continues to grow at a phenomenal rate, pushing the limits of current infrastructure tooling. The volume of generated data, especially unstructured data, is projected to skyrocket to 612 zettabytes by 2030, driven by the wave of ML/AI excitement and synthetic data produced by generative models across all modalities. (One zettabyte = one trillion gigabytes, or one billion terabytes.) In addition to volume, data types and sources continue to grow in complexity and variety. Companies are responding by developing new hardware, including more powerful processors (e.g., GPUs, TPUs), better networking hardware to facilitate efficient data movement, and next-gen storage devices.

Lastly, building on recent progress in ML and hardware, a new wave of AI-native and AI-embedded startups is emerging; these companies either leverage AI/ML from the ground up or use it to augment existing capabilities. Unfortunately, much of today’s data infrastructure and tooling is still not optimized for AI use cases. Like forcing a square peg into a round hole, AI engineers have had to create workarounds and hacks within their current infrastructure.

An emerging infrastructure stack purpose-built for AI

With numerous “why now” tailwinds building in recent years, the lack of native, purpose-built tooling has paved the way for a new AI infrastructure stack serving AI-native and AI-embedded companies.

We are in the midst of a massive technological shift—innovation within this emerging AI infrastructure stack is progressing at an unprecedented pace. Even as we write this roadmap and develop our views, researchers are publishing new papers every day, making previous views obsolete. The rapidly changing environment is intimidating, but the potential and opportunities for startups are expansive, despite unknown variables. 

As it often goes, we’re investing as the revolution is happening. With new cutting-edge research released daily, it can sometimes feel like the ground is shifting beneath our feet. We are constantly incorporating the latest developments into our thesis. Here are several themes we are drawn to:

1. Innovations in scaling, novel model architectures, and special-purpose foundation models

The model layer is shaping up to be one of the most dynamic and hotly contested layers within the AI infrastructure stack. Foundation models are the new “oil,” and given the strategic importance of this part of the stack, the winners here may define the future of downstream applications for many years to come as more and more companies build on top of these models.

Consequently, we’ve seen an explosion of activity at the model layer, from open-source models to small language models. Much of the activity and capital is focused on scaling transformer-based models (e.g., via data and model parallelism, mixed modality, etc.) or pushing these models along various performance properties (e.g., cost, latency, deployment, memory footprint, context window, etc.). For instance, several teams are improving the building blocks (primitives) of generative models, such as attention and convolution mechanisms, to create more powerful, capable, and efficient AI technology. Due to the capital intensity of model training, many of these efforts are venture capital-funded. Beyond training costs, innovating at this layer also demands a high bar of human capital and specialized resources: teams with the right mix of research and engineering talent. We cover more of the current state of innovation, competition, and funding dynamics at the model layer in our upcoming 2024 State of the Cloud report.

But “attention is not all you need”: researchers are also developing non-transformer architectures, continually pushing the limits of what is possible for foundation models. For example, state-space models (SSMs), such as Mamba, and various recurrent architectures are expanding the frontier toward foundation models that are less computationally intensive and exhibit lower latency, potentially providing a cheaper and faster alternative to traditional transformers for training and inference. SSMs focused on dynamic, continuous systems have existed since the 1960s, but have only recently been applied to discrete, end-to-end sequence modeling. Linear complexity also makes SSMs a great choice for long-context modeling, and we’re seeing several companies blossom on this front. While early results suggest impressive efficiency, researchers still have a long way to go to demonstrate the properties (e.g., control, alignment, reasoning) now taken for granted in the transformer ecosystem.
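
For intuition on why these architectures scale differently, here is a toy, non-selective state-space recurrence: each token performs a constant-size state update, so cost grows linearly with sequence length, whereas self-attention computes pairwise scores that grow quadratically. This is only a sketch under simplified assumptions (fixed A, B, C matrices); real SSMs like Mamba learn input-dependent parameters and use hardware-aware parallel scans.

    # Toy linear state-space recurrence (illustrative only; real SSMs like Mamba
    # use learned, input-dependent parameters and optimized parallel scans).
    import numpy as np

    d_state, d_model, seq_len = 16, 8, 1024
    A = np.eye(d_state) * 0.95                   # state transition (kept stable for the toy example)
    B = np.random.randn(d_state, d_model) * 0.1  # input projection
    C = np.random.randn(d_model, d_state) * 0.1  # output projection

    x = np.random.randn(seq_len, d_model)        # input sequence
    h = np.zeros(d_state)                        # fixed-size hidden state
    outputs = []
    for t in range(seq_len):                     # one O(1) update per token => O(L) overall,
        h = A @ h + B @ x[t]                     # versus O(L^2) pairwise scores in self-attention
        outputs.append(C @ h)
    y = np.stack(outputs)
    print(y.shape)  # (1024, 8)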

Additionally, groundbreaking research within geometric deep learning, including categorical deep learning and graph neural networks, is equipping researchers with methods for structured reasoning. While this field has existed for quite a while, it has earned renewed interest in this new wave of AI, as geometric methods enable deep learning algorithms to take into account the geometric structures embedded in real-world data (e.g., abstract syntax trees in code, biological pathways) and can be applied across many domains.

Furthermore, beyond general-purpose models, there is a proliferation of teams training special-purpose models for code generation, biology, video, image, speech, robotics, music, physics, brainwaves, and more, adding another vector of diversity and flexibility to the model layer.

2. Innovations in model deployment and inference

The compute layer is one of the most complex layers of the AI infrastructure stack, not only because it quite literally powers other parts of the stack, but also because it blends innovations and interactions across hardware (such as GPUs and custom-built accelerators), software (such as operating systems, drivers, provisioning tools, frameworks, compilers, and monitoring and management software), and business models. Adding to this complexity, both large incumbents and startups are innovating in this area.

At the hardware layer, GPU costs are coming down as supply chain shortages ease. Next-gen GPUs, such as NVIDIA’s H100 and B100 series, combined with advancements in interconnect technology, are scaling data and GPU parallelism at the model layer. 

Beyond hardware, various algorithmic and infrastructure innovations are enabling new AI capabilities. For example, the self-attention mechanism in the transformer architecture has become a key bottleneck due to its high compute requirements—specifically, quadratic time and space complexity. To address these challenges, the ML systems community has published a variety of model and infra-layer research: evolutions of self-attention (e.g. Ring Attention), KV-cache optimizations (e.g. channel quantization, pruning, approximation), etc. These innovations reduce the memory footprint for the decoding steps of LLMs, unlocking faster inference, longer contexts, and cost efficiencies. 
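
As a rough illustration of where the KV cache fits in, the sketch below decodes with a single attention head while caching keys and values: only the newest token is projected at each step, and attention runs over the cached history. The weights, dimensions, and decode_step() helper are hypothetical; real serving stacks implement this inside fused GPU kernels, and the cache tensors are precisely what quantization and pruning techniques shrink.

    # Minimal single-head decoding sketch with a KV cache (illustrative, not an
    # optimized kernel). The cache avoids re-projecting past tokens on every step;
    # its size grows with sequence length, which is what KV-cache optimizations target.
    import numpy as np

    d = 64
    Wq, Wk, Wv = (np.random.randn(d, d) * 0.02 for _ in range(3))
    k_cache, v_cache = [], []                    # grows by one entry per generated token

    def decode_step(x_t: np.ndarray) -> np.ndarray:
        """x_t: hidden state of the newest token, shape (d,)."""
        q = Wq @ x_t
        k_cache.append(Wk @ x_t)                 # only the new token's K/V are computed
        v_cache.append(Wv @ x_t)
        K = np.stack(k_cache)                    # (t, d)
        V = np.stack(v_cache)
        scores = K @ q / np.sqrt(d)              # attention over all cached positions
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V                       # context vector for the next layer

    for _ in range(10):                          # simulate ten decoding steps
        out = decode_step(np.random.randn(d))
    print(len(k_cache), out.shape)               # 10 (64,)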

As we move towards personalized, cheaper fine-tuning approaches, many open questions remain. Methods like LoRA have unlocked memory- and cost-efficient fine-tuning, but scalably managing GPU resources to serve fine-tuned models has proven difficult (GPU utilization tends to be low as it is, and copying weights in and out of memory reduces arithmetic intensity). While improvements in batching, quantization, and, higher up the stack, serverless infrastructure have made deployment more turnkey, lots of low-hanging fruit remains. Projects like SkyPilot and vLLM, alongside companies like Modal, Together AI, Fireworks, and Databricks, are pushing the frontier here.
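
For readers less familiar with LoRA, the sketch below shows the core idea: the pretrained weight stays frozen and a low-rank update BA is learned on top, so only a small number of parameters train and adapters can be swapped per task or tenant. This is a minimal illustration rather than any library’s actual API (the class name and initialization choices are ours), but it hints at the serving challenge above: every adapter is an extra set of weights to move in and out of GPU memory.

    # Minimal LoRA-style layer (illustrative): y = W x + (alpha / r) * B (A x),
    # with W frozen and only A, B trainable. Not any library's actual API.
    import numpy as np

    class LoRALinear:
        def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
            self.W = np.random.randn(d_out, d_in) * 0.02   # frozen pretrained weight
            self.A = np.random.randn(r, d_in) * 0.01       # trainable down-projection
            self.B = np.zeros((d_out, r))                  # trainable up-projection (zero init
            self.scale = alpha / r                         #  => adapter starts as a no-op)

        def __call__(self, x: np.ndarray) -> np.ndarray:
            return self.W @ x + self.scale * (self.B @ (self.A @ x))

    layer = LoRALinear(d_in=512, d_out=512)
    print(layer(np.random.randn(512)).shape)  # (512,)
    # Trainable params: r*(d_in+d_out) = 8*1024, versus 512*512 for full fine-tuning.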

Vendors in this layer have an outsized impact on the unit economics (particularly gross margins) of the AI application companies leveraging their services, and we anticipate that these dynamics will continue to drive innovation based on demand from downstream applications.

3. Cutting-edge model training and development techniques

As highlighted earlier, AI research is progressing at a breathtaking pace, and most notably, we are in an exciting period where new AI methods and techniques across pre-training, training, and development are in bloom. New methods are being developed every day alongside the evolution of existing ones, meaning that the AI infrastructure stack is continually being defined and redefined.

We are seeing these techniques proliferate across the stack, advancing LLM and diffusion model outputs on base performance parameters (such as accuracy and latency) and pushing the limits on new frontiers (such as reasoning, multimodality, vertical-specific knowledge, and even agentic AI or emergent capabilities). We highlighted a few architectural paradigms in Section 1, but other examples of techniques include:

  • Fine-tuning and alignment: supervised feedback, specialized training data, or refining weights to adapt models for specific tasks (e.g. RLHF, constitutional AI, PEFT)
  • Retrieval-augmented generation (RAG): connecting the LLM to external knowledge sources through retrieval mechanisms, combining generative functions with an ability to search and/or incorporate data from relevant knowledge bases
  • Prompting paradigms: an interactive process where the LLM is instructed and guided to the desired outcome (e.g. few-shot learning, many-shot in-context learning, step-back prompting, CoT, ToT)
  • Model mixing and merging: machine learning approaches that mix separate AI model sub-networks or checkpoints to jointly perform a task (e.g. MoE, SLERP, DARE, TIES, frankenmerging); a minimal merging sketch follows this list
  • Training stability: decisions around normalization methods (e.g. LayerNorm vs. RMSNorm), activations, and other architectural properties can affect training stability and performance
  • Parameter efficiency: various methods such as efficient continual pre-training that affect model capabilities and efficiency
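
To ground the model mixing and merging bullet above, here is a toy merging step: spherical linear interpolation (SLERP) between corresponding tensors of two checkpoints. Real merging tools operate tensor by tensor with additional machinery (sign resolution in TIES, sparsification in DARE); the function below is an illustrative sketch, not a production recipe.

    # Toy weight-merging sketch: SLERP between two checkpoints' parameters.
    # Only the core interpolation step is shown.
    import numpy as np

    def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
        a, b = w_a.ravel(), w_b.ravel()
        a_n, b_n = a / (np.linalg.norm(a) + eps), b / (np.linalg.norm(b) + eps)
        omega = np.arccos(np.clip(a_n @ b_n, -1.0, 1.0))   # angle between the two weight vectors
        if omega < eps:                                     # nearly parallel: fall back to lerp
            return (1 - t) * w_a + t * w_b
        merged = (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
        return merged.reshape(w_a.shape)

    ckpt_a = np.random.randn(4, 4)   # stand-ins for one tensor from each checkpoint
    ckpt_b = np.random.randn(4, 4)
    print(slerp(ckpt_a, ckpt_b, t=0.5))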

While there is a trade-off between simplicity of experimentation and the efficacy of these methods, we predict that these techniques will inspire new developments as researchers iterate faster and solve for real-world scalability and applicability. Furthermore, it is common in applied AI to see a combination of techniques deployed together, but ultimately, the methods that deliver the most value for the effort will likely dominate the applied AI space. Additionally, the landscape is evolving dynamically as base models improve and as more AI-powered solutions are deployed in production under real-world constraints.

Ultimately, we believe we are in the early days here, and no dominant approach has been established yet, especially for enterprise AI. We are thus excited to partner with companies developing, enabling, or commercializing these techniques, as they will transform and reimagine how we build, develop, operate, and deploy AI models and applications, and will form the key tooling layer for AI companies.

4. DataOps 2.0 in the age of AI

We began this article by claiming that data and AI outputs are inextricably linked. We see this playing out across many vectors: data quality affects AI output (garbage in, garbage out), recent AI innovations are unlocking insights from previously untapped data sources (such as unstructured data), and proprietary data serves as a competitive advantage and moat for AI-native companies. We explored this relationship in our Data Shift Right article and highlighted new data strategies that companies are leveraging to optimize for AI competitive advantage in our recent Data Guide.

Given these catalysts, new demands are being placed on data ops, resulting in the emergence of new approaches and frameworks for storage, labeling, pipelining, preparation, and transformation. A few exciting examples:

  • At the pre-processing stage, we are seeing the rise of data curation and ETL solutions purpose-built to manipulate data into a form an LLM can understand.
  • The emergence of new data types (e.g. embeddings) has inspired entirely new data ops categories such as vector databases.
  • Data annotation has evolved in the age of AI to incorporate advanced data-centric methods, which have sped up prior manual or weak-supervision approaches, and brought more non-technical end-users into the fold.
  • The AI revolution has ushered in the mainstream embrace of tooling for processing various modalities of data, especially unstructured data (such as video and images), and many of these state-of-the-art tools are now integrated into day-to-day workflows. Previously, dealing with these modalities was challenging and often bespoke, leaving organizations unable to fully glean value from these rich data sources.
  • New enterprise toolchains and data workflows, such as the RAG stack, are emerging as organizations leverage innovations in model training and inference techniques (see Section 3); a skeletal example follows this list.
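
As a skeletal example of the RAG-style workflow referenced in the last bullet, the sketch below chunks documents, embeds them, retrieves the most similar chunks for a question, and assembles a grounded prompt. The embed() function here is a hash-based placeholder (consistent only within a run) and the generation call is left out entirely; a real pipeline would call actual embedding and LLM endpoints and persist vectors in a vector database.

    # Skeleton of a RAG-style workflow (illustrative). embed() is a placeholder
    # for whatever embedding model a team actually uses.
    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Placeholder embedding derived from a hash; a real pipeline would call an embedding model."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(256)
        return v / np.linalg.norm(v)

    def chunk(doc: str, size: int = 200) -> list[str]:
        """Naive fixed-size chunking; production pipelines split on structure (sections, sentences)."""
        return [doc[i:i + size] for i in range(0, len(doc), size)]

    corpus = ["Our warranty covers defects for 24 months...", "Returns are accepted within 30 days..."]
    chunks = [c for doc in corpus for c in chunk(doc)]
    index = np.stack([embed(c) for c in chunks])          # the "vector database" of this sketch

    def answer(question: str, k: int = 2) -> str:
        scores = index @ embed(question)                   # cosine similarity (vectors are normalized)
        context = "\n".join(chunks[i] for i in np.argsort(-scores)[:k])
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        return prompt                                      # a real system would send this to an LLM

    print(answer("How long is the warranty?"))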

Just as the modern data stack has fueled the rise of iconic decacorns within the data ops space, we believe a new generation of data ops giants will emerge fueled by a focus on AI workflows.

5. Next-gen observability

Alongside each wave of new technology, observability has taken on new forms (e.g., data observability in the modern data stack, APM for cloud application development). Similarly, we’re seeing observability evolve in the age of AI, with a new suite of vendors emerging to help companies monitor model and AI application performance. Many companies have entered the market with one key wedge, either in pre-production (e.g., LLM evaluation, testing), in post-production (e.g., monitoring, catching drift and bias, explainability), or in adjacent functions such as model security and compliance, smart routing, and caching. We anticipate (and have already seen) the long-term roadmaps of these companies converging on an end-to-end observability platform: a single source of truth for model performance in both pre- and post-production environments.
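
To illustrate the kind of instrumentation these platforms converge on, here is a minimal sketch that wraps a model call and records latency, a crude token count, and a toy quality score per request; the same records can feed offline evaluation and production drift checks. The score() heuristic and field names are our own stand-ins rather than any vendor’s schema, and real systems would use proper evaluation methods and a telemetry backend.

    # Minimal sketch of LLM observability instrumentation (illustrative; score()
    # is a stand-in for real evaluation methods such as reference-based grading
    # or LLM-as-judge).
    import time, statistics

    LOG: list[dict] = []   # a real system would ship these records to a telemetry backend

    def score(output: str, expected_keywords: list[str]) -> float:
        """Toy quality metric: fraction of expected keywords present in the output."""
        hits = sum(1 for kw in expected_keywords if kw.lower() in output.lower())
        return hits / max(len(expected_keywords), 1)

    def observed_call(llm_fn, prompt: str, expected_keywords: list[str]) -> str:
        start = time.perf_counter()
        output = llm_fn(prompt)                            # the wrapped model/endpoint call
        LOG.append({
            "latency_ms": (time.perf_counter() - start) * 1000,
            "prompt_tokens": len(prompt.split()),          # crude token proxy for the sketch
            "quality": score(output, expected_keywords),
        })
        return output

    fake_llm = lambda p: "Refunds are issued within 30 days of purchase."
    observed_call(fake_llm, "What is the refund window?", ["refund", "30 days"])
    print(statistics.mean(r["quality"] for r in LOG), LOG[-1]["latency_ms"])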

We’re enthusiastic about a Datadog-like outcome in observability for AI. However, given the ever-changing environment of new models, new training and fine-tuning techniques, and new types of applications, winning in observability will likely require a team capable of delivering high product velocity, perhaps more so than in other spaces. As we’ve gleaned from Datadog’s rise, the company was able to break out of a crowded landscape of a dozen or so similar competitors by (a) executing rapidly on a broad product and capability set, (b) building deep coverage of what Datadog could monitor, and (c) enabling broad integration support to bring as many adjacent systems as possible into its ecosystem. We’re excited to meet and support the next generation of startups taking on this endeavor for the AI stack.

6. Orchestration

As newcomer LLM and generative AI application companies continue to grow, we see a significant opportunity for companies in the orchestration layer to become the backbone of AI development. Playing an “orchestra conductor”-like role in the AI development lifecycle, with the integral responsibility of coordinating the development, deployment, integration, and general management of AI applications, orchestration vendors serve as a critical (and importantly, vendor-neutral) centralized hub that harmonizes the sprawl of AI tools developers encounter.

Companies like Langchain and LlamaIndex are early breakouts in this space for LLMs, with strong open-source ecosystems buoying adoption within companies. They’ve created frameworks that give developers a set of best practices and a toolkit for building their own LLM applications, abstracting away much of the complexity of connecting the right data sources to models, implementing retrieval methods, and beyond. Beyond LLMs, we are seeing an ecosystem of vendors create orchestration solutions for agent-based applications, further streamlining the development process for innovative agentic AI applications. Much like React’s success in simplifying web development, we anticipate a similar opportunity for AI orchestration vendors to streamline development and enable a much broader set of developers to build various types of AI applications (LLM, agent, computer vision, etc.).
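
To make the orchestration role concrete, here is a framework-agnostic sketch of the loop such vendors wrap: the model proposes the next action, the orchestrator dispatches to a tool or returns a final answer, and intermediate results are threaded back into context. This is not the Langchain or LlamaIndex API; the fake_llm() stub, tool registry, and JSON action format are assumptions for illustration.

    # Framework-agnostic sketch of an agent-style orchestration loop (this is NOT
    # the Langchain or LlamaIndex API; those frameworks wrap a loop of this shape
    # with tooling for prompts, retries, tracing, and data connectors).
    import json

    TOOLS = {
        "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy tool; never eval untrusted input
        "search": lambda q: f"(stub search results for: {q})",
    }

    def fake_llm(history: list[str]) -> str:
        """Stand-in for a model that emits JSON actions; a real agent would call an LLM here."""
        if not any("calculator" in h for h in history):
            return json.dumps({"action": "calculator", "input": "21 * 2"})
        return json.dumps({"action": "final", "input": "The answer is 42."})

    def run_agent(task: str, max_steps: int = 5) -> str:
        history = [f"Task: {task}"]
        for _ in range(max_steps):
            decision = json.loads(fake_llm(history))               # model chooses the next action
            if decision["action"] == "final":
                return decision["input"]
            result = TOOLS[decision["action"]](decision["input"])  # orchestrator dispatches the tool
            history.append(f"{decision['action']} -> {result}")    # result is fed back into context
        return "Stopped: step limit reached."

    print(run_agent("What is 21 * 2?"))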

A massive opportunity exists for AI infrastructure businesses

As Mark Twain once famously said: “When everyone is looking for gold, it’s a good time to be in the pick and shovel business.” We believe that a massive opportunity exists to build “picks and shovels” for machine learning, and that many multi-billion-dollar companies will be built by equipping enterprises with the tools and infrastructure to operationalize AI. 

Having partnered with category-defining data infrastructure and developer platform companies such as Auth0, HashiCorp, Imply, Twilio, and Zapier, we know that building novel and foundational technologies within the infrastructure layer is challenging and often requires specialized knowledge and resources. As such, we offer extensive networks and tailored resources to support AI infrastructure founders in their drive for innovation as they ride these tailwinds, including:

  • Renowned operating and technical advisors including experts such as Adam Fitzgerald (Head of Developer Relations at HashiCorp), Emilio Escobar (CISO at Datadog), Mike Gozzo (Chief Product and Technology Officer at Ada), Lance Co Ting Keh (former AI Lead at GoogleX), Solmaz Shahalizadeh (former Head of Data at Shopify), Talha Tariq (CIO & CSO at HashiCorp), and Tony Rodoni (former EVP at Salesforce).
  • Unique access and credit programs with compute providers and cloud vendors
  • Invite-only events, briefing sessions, and presentation opportunities with leading academics and business leaders in the field of AI
  • Community-specific networking groups for functional leaders at AI startups
  • AI-specific talent network for startups to leverage when building out their teams

If you are a technical or early-stage team building in this space, please reach out to David Cowan, Janelle Teng, and Bhavik Nagda. For growth-stage startups, please reach out to Elliott Robinson, Mary D’Onofrio, and Grace Ma.

Special thanks to Lance Co Ting Keh (BVP Operating Advisor), Solmaz Shahalizadeh (BVP Operating Advisor & Former Head of Data at Shopify), and Will Gaviria Rojas (Co-founder of Coactive.AI) for their feedback.
