Cerebras Systems Q1 2026 Earnings Call Transcript

Key Takeaways

  • Positive Sentiment: Cerebras reported record Q1 core revenue of $191.3 million, up 92% year over year, driven by strong growth in both cloud/services and hardware.
  • Positive Sentiment: The company highlighted major customer momentum, including a more than $20 billion agreement with OpenAI and a newly signed definitive agreement with AWS to deploy Cerebras in AWS data centers.
  • Positive Sentiment: Management emphasized that Cerebras’ wafer-scale architecture gives it an order-of-magnitude speed advantage versus GPUs, which they believe makes fast AI inference more valuable and broadens the addressable market.
  • Neutral Sentiment: Gross margin improved sharply in Q1, but management said cloud margin will temporarily decline as the company rents third-party capacity to meet demand while it builds out its own data centers.
  • Positive Sentiment: The company ended the quarter with $3.3 billion in cash and marketable securities and raised additional capital through equity, debt, and its IPO to fund aggressive capacity expansion.
AI Generated. May Contain Errors.

There are 13 speakers on the call.

Speaker 6

Good afternoon. Welcome to Cerebras Systems' first quarter fiscal year 2026 earnings conference call. Currently, all participants are in a listen only mode. Following management's prepared remarks, we will open the call for questions. Please note that today's call is being recorded. I will now turn the call over to Sean Dorsey, Head of Investor Relations. Please go ahead.

Speaker 9

Thank you, operator. Good afternoon, everyone. Welcome to Cerebras Systems' first earnings call as a public company. Earlier today, we issued our press release and posted our supplemental earnings presentation to the investor relations section of our website. A replay of this webcast will also be available on our investor relations website following the call. Joining me today are Andrew Feldman, our Co-founder, Chief Executive, and President, and Bob Komin, our Chief Financial Officer. Before we begin, I would like to remind everyone that today's discussion will include forward-looking statements under the safe harbor of the Private Securities Litigation Reform Act of 1995. These statements include, but are not limited to, statements regarding our future financial performance, business strategy, market opportunity, customer demand, product roadmap, technology leadership, supply chain, operating model, and outlook for Q2 and full year 2026.

Speaker 9

Forward-looking statements are based on current expectations and assumptions and are subject to risks and uncertainties that could cause actual results to differ materially from those expressed or implied. These risks are described in our SEC filings, including our final prospectus related to our IPO and our future periodic filings with the SEC. We undertake no obligation to update these forward-looking statements except as required by law. During today's call, we will also discuss certain non-GAAP financial measures. Reconciliations between GAAP and non-GAAP results are included in today's press release and supplemental materials, which are available on the investor relations page of our website. With that, I'll turn the call over to Andrew.

Operator

Thank you, Sean. Thank you everyone for joining us today. This has been an extraordinary several months. I want to begin by thanking our customers, our partners, suppliers, employees, and shareholders. We would not be here without your trust and your support. Earlier today, we posted our Q1 2026 results. We delivered a strong quarter. We delivered core revenue of $191.3 million, up 92% year-over-year. Core hardware revenue contributed $111.6 million, up 60% year-over-year, while core cloud and services revenue contributed $79.8 million, up 167% year-over-year. Bob will share more color on our financial results shortly. Before Bob digs into that, I'd like to say a few things about the market. I'll divide my comments into several sections.

Operator

I'll begin by spending a few minutes sharing my views on the larger drivers underpinning the AI revolution, their impact on the compute market, and why speed wins. I'll then turn to our successes in Q1 with special attention to our progress with OpenAI and AWS. Finally, I will talk about how we expect to avoid many of the supply chain challenges that bedevil others in our space. To understand the dynamics in the compute market, it's important to realize that AI provides new capabilities to computers. AI gives computers purchase on whole swaths of the world that had previously been foreclosed. This is why AI is so transformative and why its impact is so profound, and why we believe it increases the size of the market addressable to compute by many thousands of times.

Operator

Computers have historically been good at math, very good, but they were relatively poor at everything else. They did not provide much insight into text or images. For these modalities, all they could do was store and retrieve. Computers were at their best in a 2D world of numbers. In a real world of three dimensions, they were challenged. AI opens up the world of human experience to computers. As a result, the size of the market increases exponentially. It is as if prior to AI, computers worked in black and white and in two dimensions, and after AI, they address a world of color in many dimensions. This is why AI has spurred an explosion in the demand for compute. Computers can now do things they have never done before, which is why, in our opinion, demand will continue to accelerate for many years to come.

Operator

Text, images, video, agents, robotics, these are all part of how AI expands the computer's ability to understand, participate, and take actions in the world. These all represent opportunities for Cerebras. Let's look at the specifics of how this is unfolding. Prior to 2025, AI was a parlor trick, a novelty. Interesting, but not useful. Cool, but not valuable. AI is now valuable because it has become profoundly useful. Led by OpenAI, the foundation model providers pioneered the way. The foundation model makers, and shortly thereafter, the open source models, made models smart enough to be useful across many domains. Once something is useful, people use it. Once people start using a technology, speed determines its productivity. Fast is productive and slow is unproductive. Speed provides answers in less time, providing competitive advantage. Speed makes the largest and smartest frontier models interactive.

Operator

Speed enables agents to complete tasks faster. Fast tokens are the most valuable tokens because they get more work done in less time. Today, Cerebras delivers the fastest AI in the world, bar none, not by a little bit, but by an order of magnitude. We do this for small models, for medium models, and for the largest models in the industry. We do this for models with small KV caches, with medium KV caches, and with giant KV caches. We generate tokens faster than anyone else. What I'd like to show you right now is a quick demo of just how much faster we are than GPUs on Kimi K2, a trillion-parameter open source model. We're going to run the exact same prompt. On the left is Cerebras; on the right is a leading GPU. The only difference, same model. We're finished already. Same model. Same prompt.

Operator

The difference is hardware. We're finished. It took us 21 seconds. We're now waiting on the GPU. Still waiting. We've increased the speed 5x in the video so you don't have to wait as long as you otherwise would. Still waiting. Okay. What Cerebras did in 21 seconds took the GPU 4 minutes and 37 seconds. The same model, the same prompt. That's what it means to be 13 times faster. In AI, inference speed is productivity. Slow isn't productive. This should not come as a surprise. It is in line with our everyday experience. How big is the market for slow search? How big is the market for slow internet access? Any of you still use dial-up? How long will you wait for a website to resolve? Why would it be different for AI?
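The "13 times faster" figure follows directly from the two timings quoted in the demo. A minimal check, using only the numbers stated above (the variable names are illustrative):

```python
# Timings quoted in the demo, converted to seconds.
cerebras_seconds = 21
gpu_seconds = 4 * 60 + 37  # 4 minutes 37 seconds

# Speedup is the ratio of the slower time to the faster time.
speedup = gpu_seconds / cerebras_seconds

print(f"GPU time: {gpu_seconds} s")
print(f"Speedup: {speedup:.1f}x")  # about 13.2x, quoted on the call as 13 times
```

The 5x playback speed mentioned for the video affects only how long viewers waited, not the ratio itself.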

Operator

Not only does speed increase the value of tokens, but speed accelerates the adoption of AI. When AI is fast, it's more fun to use. People use it. They use it more often, for more things, and they use it to solve more important problems. With fast AI, users invent things that never existed before. They solve problems in new ways. They develop new offerings, new business models. This is what speed does, and this is what Cerebras' speed enables. A final point on speed. There recently has been a great deal of focus, especially at the frontier model level, on safety and the importance of guardrails. How do guardrails work? Guardrails add a layer of compute on top of the AI to create a safer experience. This compute takes time, and it takes more time on slow infrastructure.

Operator

Traditionally, guardrails force a trade-off between safety and user experience, between safe and fast. Cerebras eliminates this trade-off. Fast AI inference allows guardrails to work without inserting crippling delays. AI is safer with these guardrails, and AI is safer and more productive when it's blisteringly fast. Our performance advantage is born of our wafer scale architecture. We're more than an order of magnitude faster than GPUs because we solve problems that haven't been solved or couldn't be solved by others. The problems of yield, cross-reticle connectivity, mismatches in thermal expansion, power delivery, and cooling are all problems that the industry struggles with, but that Cerebras solved years ago. Moreover, the advantages of wafer scale are durable.

Operator

By building chips that are 58 times larger than the largest competitor, we're able to use SRAM and benefit from its blistering speed, while competitive offerings use HBM, which is slow, expensive, and in short supply. We see the advantage of wafer scale technology expanding our performance lead as we bring next-generation solutions to market. The technological underpinnings of wafer scale fundamentally advantage additional technologies in the future. For example, wafer scale technology brings profound advantage to memory stacking and optical integration. As we look further into the future, data centers in space are also advantaged by wafer scale integration. Wafer scale compute not only delivers faster speeds and, for latency-sensitive workloads, less power per unit of compute than GPUs do. Most importantly, it requires less chip-to-chip communication.

Operator

Chip-to-chip communication is one of the fundamental limitations of terrestrial data centers, and a yet-to-be-solved problem for data centers in space. With this as a backdrop, how did we meet this extraordinary market in the first quarter of 2026, and how did we leave Q1 even better positioned? In this section, I'll focus on our partnerships with OpenAI and AWS as they took shape in this quarter. We signed a definitive agreement with OpenAI on December 24, 2025, for the purchase of more than $20 billion of Cerebras compute over the next several years. By February 1st, we were in production, running a model we'd never before seen, 35 days from signature to production deployment. Beyond the transformative revenue ramifications, our collaboration with OpenAI gives us a direct view into frontier model development and the direction it is moving.

Operator

By pairing frontier model intelligence with the world's fastest inference, we build products and technologies that others simply can't. In fact, the boundaries of these capabilities have yet to be fully explored. OpenAI and Cerebras are excited that GPT-5.4 is now running on Cerebras. This collaboration brings together OpenAI's frontier models with Cerebras' wafer-scale inference infrastructure to enable highly responsive model interactions. GPT-5.4 on Cerebras is currently available to OpenAI engineers and to select OpenAI customers as part of OpenAI's strategic rollout. OpenAI and Cerebras are also actively working to bring GPT-5.5 onto Cerebras as part of the next phase of this rollout, and expect to share more shortly. In March, continuing this trend, we signed a binding term sheet with AWS to deploy Cerebras in AWS data centers.

Operator

Our solutions will combine AWS's leading Trainium3 chips with Cerebras' CS-3 in a disaggregated solution that is expected to be an order of magnitude faster. Trainium will do prefill and Cerebras will do decode, and together the solution is expected to deliver the fastest tokens at massive throughput. Remember, disaggregated solutions are a significant opportunity for Cerebras. The technical strategy is one of divide and conquer. It is based on the recognition that inference has two computational components. The first is where we process the prompt. This is called prefill and is highly parallelizable. The second is where we generate the response. This is called decode and is strictly sequential. By using different processors for the prefill and for the decode, we can deliver truly exceptional results.
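The prefill/decode split described above can be illustrated with a toy sketch. Everything here is hypothetical: the function names, the arithmetic standing in for a model, and the list standing in for a KV cache are assumptions for illustration only, not Cerebras or AWS software. The point is the shape of the dependency: prefill work on prompt positions can be done side by side, while each decode step needs the result of the step before it.

```python
from typing import List

def toy_kv_entry(token: int) -> int:
    """Hypothetical stand-in for the per-token state a real model would cache."""
    return (token * 31 + 7) % 101

def prefill(prompt_tokens: List[int]) -> List[int]:
    """Process the whole prompt at once.

    Each position is computed independently in this toy, so the work
    could be spread across many units in parallel.
    """
    return [toy_kv_entry(t) for t in prompt_tokens]

def decode(kv_cache: List[int], max_new_tokens: int) -> List[int]:
    """Generate tokens one at a time.

    Each step reads the cache built by all previous steps, so step N
    cannot start until step N-1 has finished.
    """
    cache = list(kv_cache)  # copy so the caller's cache is untouched
    generated = []
    for _ in range(max_new_tokens):
        next_token = sum(cache) % 101  # toy "model": depends on all prior state
        generated.append(next_token)
        cache.append(toy_kv_entry(next_token))
    return generated

# In a disaggregated setup, these two calls would run on different processors,
# with the cache handed from the prefill side to the decode side.
kv = prefill([3, 14, 15])
generated = decode(kv, 4)
print(kv, generated)
```

Because the decode loop is a chain of dependent steps, adding more parallel hardware does not shorten it; only making each step faster does, which is the argument made above for pairing a throughput-oriented prefill processor with a latency-oriented decode processor.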

Operator

We are also proud to announce that we have, as of this week, completed a definitive agreement with AWS that will begin our technical collaboration as well as prepare for deployments in their data centers. As you all know, AWS is a leading cloud compute company and one of the most important providers in the world for developers and enterprises. Many enterprises want to run AI where they store their data and where they have existing agreements, and where the environment is familiar and is secure. As a result, AWS provides an easy way for Cerebras solutions to meet the world's enterprises where they already are. Let's for a minute now turn to supply chain. Keeping up with this extraordinary market growth has brought supply chain challenges to many in our industry. At Cerebras, we have several fundamental advantages.

Operator

First, the binding constraint in the market right now is HBM memory. It's in short supply, it's expensive, and we don't use it. We avoid this constraint entirely. We use SRAM, and SRAM is printed on our logic wafer. It's not a separate chip. As long as you can make the chip, you can make SRAM. Its supply is approximately infinite. The second binding constraint is the CoWoS process at TSMC. We don't use it. Again, we sidestep this constraint. Third, 3 nanometer capacity at TSMC is a constraint, and again, we don't use it. We're the fastest in the world, and happily at the 5 nanometer node where there is less contention for fab resources and where manufacturing is less expensive. Our partnership with TSMC deserves special mention, as they know more about chip making than just about anyone else on Earth.

Operator

They believed in the wafer scale approach from the time we were a tiny team with nothing but a PowerPoint slide, and they've been with us along the way. They have proven themselves to be an extraordinary partner. Just as a reminder, our saleable unit is not our wafer, but our CS-3 system. We sell the CS-3 for on-premise deployments or time on the CS-3 through our Cerebras Cloud or through our partner's cloud. We manufacture our CS-3s in the U.S. and in fact, to the best of my knowledge, we are the only accelerator maker to manufacture exclusively in the U.S. We have added hundreds of thousands of square feet of manufacturing and clean room space to support our growth. We've expanded our partnership with Flex and are proud to have added Sanmina as our second major contract manufacturer to assist us in managing our expansion.

Operator

Finally, it's no secret that data center capacity is at a premium. It's a dog fight out there. Despite this, we've added data centers around the world. We've added data centers across the U.S. and Canada, Europe, including France and the Nordics, and we're in early discussions for data centers in Israel, the UAE, Australia, Singapore, India, and Indonesia. We're expanding the capacity we need to serve customers, and we're doing it with urgency. The demand environment is strong, but this is not just about demand, it's about building the infrastructure required for the next phase of AI. To wrap up, there's a tectonic shift in compute demand brought about by AI's ability to make the world around us tractable for computers. As a result, the market will need vastly more compute, in my view, for decades.

Operator

AI power users represent today a tiny fraction of the world's population, by some estimates less than 1%, and compute and memory are already in tight supply. Just imagine. To this AI revolution, we bring leadership technology, which in turn enables us to deliver the fastest AI inference in the world by more than an order of magnitude. Fast tokens are more valuable tokens, and Cerebras tokens are the fastest. The result was a record quarter. With that, I'll turn things over to Bob, and he can provide more color on the financial results. Bob?

Speaker 2

Thank you, Andrew, and good afternoon, everyone. I want to also add my thanks to our customers, partners, team Cerebras, and the investment community, both those who are new and those who have gotten to know us over the last several years. Cerebras is more than 10 years into the journey, and we're still just at the very beginning. I want to thank everyone for joining us today on our first earnings call operating as a public company. The opportunities we see ahead for us with fast AI are massive, and we appreciate everyone who has chosen to join us for the road ahead. Today, I want to describe the financial framework we will use to discuss our results. It's the same way that we evaluate our financial performance and make resource allocation decisions internally.

Speaker 2

This framework provides additional visibility into amounts that are embedded in our reported GAAP revenue and cost of revenue, which we believe provides more transparency as well as direct comparability to our prior historical results to better analyze our trends. Beginning in Q1 2026, we have data center costs, which our contract with OpenAI has us pass through to them with a 3% markup. These data center pass-through items are reported gross, so they increase both our cloud and other services revenue and cost of services but are at a significantly lower margin than the rest of our business. These amounts start out small in Q1, but they'll become more significant over time. OpenAI has the option to choose whether to receive its future committed amounts in our cloud or in its own data centers, which would mean there would be no future corresponding pass-through amounts for that capacity.

Speaker 2

Because these amounts can be highly variable and are outside of our control, we're excluding them from our core business metrics. We also now have non-cash amortization of customer warrants that is recorded as a reduction in revenue for both our hardware and cloud and other services GAAP revenue line items, depending on the related services the customer is purchasing. We're adjusting our GAAP numbers to exclude the impact of these items and a few other common ones like stock-based compensation and one-time items, and we define the resulting non-GAAP amounts as our core business metrics. I will only be discussing these core metrics today. Reconciliations to GAAP for all of our non-GAAP items are available in today's earnings materials and on our website. I'd like to start with revenues. Q1 was another record quarter for Cerebras. Our core total revenue was $191.3 million, representing 92% year-over-year growth.

Speaker 2

Looking at revenue by type. Core cloud and other services revenue reached $79.8 million and grew 167% year-over-year. Market demand for Cerebras Inference Cloud remains incredibly strong. We are ramping our capacity rapidly, and we saw a meaningful pickup in revenue across Q1 as we began our ramp with OpenAI in February, as well as from other customers using the Cerebras Cloud. We expect increasing year-over-year growth rates for each quarter in 2026, with more of this revenue coming later in the year as the ramp in our cloud capacity deployments accelerates. Core hardware revenue was $111.6 million, up 60% year-over-year. We plan to see decreasing hardware revenue for the next few quarters as our existing POs are delivered and our mix shifts towards the majority of our hardware production being deployed in Cerebras Cloud to fulfill our significant contracts.

Speaker 2

This trend could change relatively quickly, however, as OpenAI and AWS, as well as other customers, make decisions about when and how they prefer to deploy our hardware solutions in our data centers or theirs. Now moving on to gross margin. Core gross margin was 46.5% in the quarter, compared to 42.1% in the prior year period, and 41% last quarter. Core Cloud and services margin improved significantly to 52.9% in the quarter from lower levels we saw last year as we launched the Cerebras Cloud service. The primary reasons for the increase were higher pricing as the market is now valuing higher speed inference at a premium and market demand exceeds supply. The utilization of our systems that we began to deploy in late 2025 improved quickly, and there was a small amount of rent backs, relatively speaking, to increase capacity from a customer.

Speaker 2

For the rest of 2026, in order to accelerate our ability to service the significant near-term demand in our contracted backlog, we've chosen to make more capacity available sooner by temporarily renting our own systems back from an existing customer while we aggressively build out and deploy our own data center capacity. The additional cost of renting third-party capacity will depress core Cloud and other services margin temporarily from current levels. We expect the impact to be a decrease of 10-15 margin points based on the volumes we are now anticipating before beginning to ramp back towards our target margin of 60% plus as we transition away from our rented systems. Core hardware margin was 42% compared to 30.6% in Q1 2025.
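The segment margins quoted above can be cross-checked against the overall core gross margin by weighting each by its segment revenue. This is a minimal sketch using only figures stated in the prepared remarks; the variable names are illustrative, and the last two lines simply apply the stated 10-15 point decrease to the current cloud margin.

```python
# Q1 2026 core figures as stated on the call (revenue in $ millions).
hardware_revenue = 111.6
cloud_revenue = 79.8
hardware_margin = 0.42
cloud_margin = 0.529

# Blended margin = total gross profit / total revenue.
gross_profit = hardware_revenue * hardware_margin + cloud_revenue * cloud_margin
blended_margin = gross_profit / (hardware_revenue + cloud_revenue)
print(f"Blended core gross margin: {blended_margin:.1%}")  # about 46.5%

# Cloud margin after the stated temporary decrease of 10 to 15 points.
cloud_margin_low = cloud_margin * 100 - 15
cloud_margin_high = cloud_margin * 100 - 10
print(f"Implied cloud margin range: {cloud_margin_low:.1f}% to {cloud_margin_high:.1f}%")
```

The segment revenues sum to $191.4 million rather than the reported $191.3 million, which is consistent with rounding in the individually reported figures.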

Speaker 2

Over the last few quarters, we've benefited from the timing of incremental performance-based incentive pricing after the target was achieved but was recognized prospectively for the remaining systems that had not yet been shipped. We expect core hardware margin to be more similar to the first half of 2025 and return to the low 30s as this contract pricing normalizes. As a reminder, when we sell hardware systems and recognize that revenue up front, we also include support and other services which have significantly higher margins. As a result, total profitability over the life of the individual contracts is much closer to our target overall gross margin. These additional elements of revenue are required to be recognized over the contracted life of the services and are recorded as core Cloud and other services, so are not included in our core hardware revenue and gross margin.

Speaker 2

We are focused on improving gross margin over time through scale economies, improved product throughput and performance, manufacturing efficiency, utilization of cloud capacity, and performance-driven pricing improvements to achieve our long-term overall gross margin target of 60%. At the same time, we will continue to be aggressive and creative, including potentially investing ahead of demand when we see attractive long-term opportunities to gain key customers, accelerate revenues, and drive gains in market share. Now I'm going to talk about operating expenses. Our non-GAAP operating expenses were $92.6 million, up 51% from a year ago at just more than half the rate of core revenue growth of 92%, demonstrating the strong operating leverage available as we grow our business. R&D was our largest area of investment at $69.8 million.

Speaker 2

We believe sustained R&D investment is essential to maintaining our technology leadership and requires being at the frontier of AI across silicon, systems, software, models, and cloud infrastructure to deliver the fastest performance. We have an exciting product roadmap to bring to market over the next several years, including near-term innovations such as the implementation of disaggregated inference solutions with multiple hardware partners, which we expect to begin to deliver in the second half of this year. Sales and marketing expense was $12.9 million, reflecting continued investment in customer engagement, field capacity, developer adoption, and go-to-market infrastructure to support increasing market demand. G&A expense was $9.9 million and will continue to step up significantly next quarter due to incremental costs associated with operating as a public company and rapid growth in the size of the business. Moving on to profitability.

Speaker 2

Core non-GAAP operating loss improved to near breakeven at -$3.5 million with operating margin of -2%, a significant improvement from a year ago when core operating loss was -$19.3 million and operating margin was -19%. It was also a nice improvement sequentially from Q4 2025 when operating margin was -10%. Core non-GAAP net loss was $2.5 million. While the temporary reduction in gross margin I described earlier, which will result from renting back our systems until we deploy significant capacity in our own data centers, will cause these metrics to regress somewhat for the next few quarters, we believe the steady improvement that we delivered over the past several quarters highlights our ability to achieve our target profitability profile of approximately 60% gross margin and 40% operating margin in the medium to long term. Moving on to our current cash position.
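The operating figures quoted in the remarks above can be reconciled with a few lines of arithmetic. This sketch uses only numbers stated on the call; the variable names are illustrative.

```python
# Non-GAAP operating expense components as stated (in $ millions).
rd_expense = 69.8
sales_marketing_expense = 12.9
ga_expense = 9.9

# The three components should add up to the stated total of $92.6 million.
opex_total = rd_expense + sales_marketing_expense + ga_expense
print(f"Total non-GAAP opex: ${opex_total:.1f}M")

# Operating margin = operating income / revenue.
core_revenue = 191.3
core_operating_loss = -3.5
operating_margin = core_operating_loss / core_revenue
print(f"Core operating margin: {operating_margin:.1%}")  # about -1.8%, stated as -2%
```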

Speaker 2

We ended the quarter with $3.3 billion in cash equivalents, restricted cash and marketable securities. We've accelerated the pace of our fundraising over the last several quarters to support our increasing growth rate and provide us with the liquidity we need to scale. As a reminder, we raised $1 billion in Series G equity in September 2025, another $1 billion in Series H equity in February 2026, added a revolving credit facility for up to $850 million in April 2026, and then just a few weeks ago completed the largest semiconductor IPO in history, raising another $6.4 billion. We are well-positioned with the financial flexibility to accelerate the sourcing and deployment of data centers and our supply chain to support significant near-term growth of our cloud business. Now turning to our outlook.

Speaker 2

We'll typically provide quarterly guidance, but since this is our first earnings call, we'll also provide some color on the year. In our core business in Q2, we expect core revenue of approximately $194 million, representing year-over-year growth of 88%. Core gross margin in the range of 36% to 38%. Core operating margin in the range of -30% to -32%. For the full year 2026, we currently project core revenue in the range of $855 million to $865 million, representing year-over-year growth of 69% at the midpoint. Core gross margin in the range of 38% to 41%, and core operating margin in the range of -28% to -32%. In summary, we made significant progress in our business during the first quarter. We delivered strong revenue growth, gross margin improvement, and meaningful customer momentum.
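The guidance above implies a few figures that were not stated directly. The sketch below derives them from the quoted numbers only; the results are back-of-envelope implications under the stated growth rates, not company-reported values, and the variable names are illustrative.

```python
# Guidance and results as stated on the call (in $ millions).
q1_2026_revenue = 191.3
q2_2026_guidance = 194.0
q2_yoy_growth = 0.88
fy_2026_low, fy_2026_high = 855.0, 865.0
fy_yoy_growth_at_midpoint = 0.69

# Midpoint of the full-year range.
fy_2026_midpoint = (fy_2026_low + fy_2026_high) / 2

# Prior-year revenue implied by the stated growth rates.
implied_q2_2025 = q2_2026_guidance / (1 + q2_yoy_growth)
implied_fy_2025 = fy_2026_midpoint / (1 + fy_yoy_growth_at_midpoint)

# Revenue implied for the second half of 2026 at the guidance midpoint.
implied_h2_2026 = fy_2026_midpoint - q1_2026_revenue - q2_2026_guidance

print(f"FY2026 midpoint: ${fy_2026_midpoint:.0f}M")
print(f"Implied Q2 2025 core revenue: ${implied_q2_2025:.1f}M")
print(f"Implied FY2025 core revenue: ${implied_fy_2025:.1f}M")
print(f"Implied H2 2026 core revenue: ${implied_h2_2026:.1f}M")
```

At the midpoint, the second half would account for more than half of full-year revenue, which lines up with the earlier comment that more cloud revenue is expected later in the year.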

Speaker 2

We significantly strengthened our balance sheet through our IPO and our fundraising activities. We're poised to continue executing on the enormous amount of opportunity we see. We're working hard to bring more data center capacity online as soon as possible to meet robust demand. With that, I'll turn the call back to Andrew for closing remarks. Andrew?

Operator

Thank you, Bob. Cerebras was founded on the belief that AI infrastructure needed a new approach, one that was built from a clean sheet. The progress we report today reinforces this belief. The world needs faster AI. Faster AI, like faster versions of all technologies before it, drives adoption, usage, and customer experience. When given the choice, who wants slow? We're built to deliver fast AI. That's what we do. As AI continues to expand its footprint, so will we. We're proud to be a public company. We're redoubling our effort on the work ahead to continue to fuel our culture with fearless engineering and with the ability to delight our customers with experiences that are unavailable elsewhere. We also will work diligently to communicate with our stakeholders and our investors, and to do so with transparency and with discipline. We thank you for joining us today.

Operator

Operator, please open the line for questions.

Speaker 6

Thank you. As a reminder, to ask a question, you will need to press star one one on your telephone. To remove yourself from the queue, you may press star one one again. Please limit yourself to one question and one follow-up to allow everyone the opportunity to participate. Please stand by while we compile the Q&A roster. Our first question comes from the line of Timothy Arcuri of UBS. Please go ahead, Timothy.

Speaker 11

Thanks a lot. Andrew, now that you have the definitive agreement with AWS, can you just sort of help us to think about the timing on this and your ability to supply that customer? I know you had to put in your wafer orders back in February. Can you just give us a little bit of help in terms of when you can start to ship to them? Thanks.

Operator

Sure. I think TSMC has been extremely good to us. We are in the happy position of having supply for our plan and beyond in 2026. I think you should expect to see AWS's impact in 2027.

Speaker 11

Got it. If I could ask a quick follow-up. I also heard, Andrew, you talked about multiple partners for disaggregated solutions. Does this imply that there's another customer beyond AWS? I guess I ask because I did see that Cerebras had a presence at Microsoft Build. I'm just wondering what you mean by the multiple partners. Thanks.

Operator

I think the opportunity to provide decode for people who have GPUs is real and in front of us. I think that's exciting. I think that the GPU as an architecture struggles with the sequential nature of decode.

Operator

We are extraordinary at it. It makes sense to explore partnerships on that vector.

Speaker 6

Thank you. Our next question comes from the line of Thomas O'Malley of Barclays. Your line is open, Tom.

Speaker 10

Thanks, guys, for taking my question, and congrats on the nice results. Andrew, I wanted to ask you a question on your TAM. I think that during the process, there was a lot of conversation about your ability to handle larger models. When you look at Kimi, that's one example of a large model. You're again showing a demonstration today about attacking larger models as well. Jensen spent time talking about 25% of the inferencing market is fast inferencing, and maybe even took a step back from that on the last call. What do you think your TAM is when you look at the broader AI market? Would love to get your opinion there.

Operator

Thanks for the question. We look out into technologies and can't find examples of where slow has owned meaningful portions of the market over medium periods of time. I think you should think very carefully about the example of search. There is no slow search because nobody wants it. There's no more dial-up because nobody wants it. I think when given the choice on the same model between fast and slow, I don't think it's a very hard decision. When we look out at the space, we see the entire inference market as available to us for fast inference. I mean, who doesn't want answers in less time? Who doesn't want more productive agents? That's what we see. I know that's at odds with GPU makers. Both of our arguments are, I think in some way, self-interested.

Operator

We build fast and think the market's big for fast. I'm not surprised at that.

Speaker 10

Super helpful. We might find this out in the filings, just wanted to give it a crack on the call. Did you have any top 10% customers, and are you willing to share on the call how large they were? Thank you.

Operator

I don't think we should share on the call. I think you'll see in the filings.

Speaker 10

Thanks, guys. Congrats on the results.

Operator

Appreciate it.

Speaker 6

Thank you. Our next question comes from the line of Quinn Bolton of Needham & Company. Your line is open, Quinn.

Speaker 7

Thank you. Andrew, Bob, congratulations on your first call as a public company. Andrew, I wanted to follow up on the inference TAM question. Obviously, you guys are addressing the fast inference portion of the market, which you think allows you to address the entire market, but your tokens may be more expensive. Just wondering if you could address the higher token cost for fast inference. How much of the market do you think is willing to pay a premium for fast inference? Then I've got a follow-up on the roadmap.

Operator

I think today, in many instances, fast is priced at a premium. I think you saw Anthropic offer a service. In fact, most now offer services in which fast tokens are sold at a premium. I think they're sold at a premium because they're more valuable. Right? I think you can look to your own experience with your internet provider. If dial-up were free, do you want it? I think the answer there is quite the contrary. You'd have to pay quite a bit of money to get someone to take dial-up. I think that the reason right now that there's a premium is because people prefer fast. It's more valuable. I think we'll see over time how that shapes out.

Speaker 7

Got it. The question, just with the AWS definitive agreement now signed, if you look across the compute spectrum, oftentimes these AI compute deals can extend into the gigawatt range. Just wondering, can you give us any sense of the scale? Is this tens of megawatts, hundreds of megawatts? Could it reach a gigawatt? Just any sense on the size of the AWS partnership and definitive agreement?

Operator

I don't think we're sharing that at this time.

Speaker 7

Understood. Thank you.

Operator

Sure enough.

Speaker 6

Thank you. Our next question comes from the line of Atif Malik of Citi. Your line is open, Atif.

Speaker 1

Thank you for taking my questions, and congratulations on the debut. Andrew, on the OpenAI and AWS partnerships, what is the decision tree for them to take the future commitments in cloud or as hardware in their data centers?

Operator

First, greetings, Atif. Good to hear from you again. Second, with AWS, they are deployed in AWS data centers. That's the deal. I think OpenAI has a choice. They can deploy it in their data centers, in a model where they buy the hardware, or they can receive the compute via cloud service. I think it will depend on OpenAI's portfolio decision of their data centers and their various capacity versus what we can bring in data centers. I think that's likely to be the determining factor, but I think that's really an important question for them.

Speaker 1

Got it. Bob, as a follow-up, Andrew talked about the dogfight in terms of data centers and power availability and whatnot. When you look at your full-year outlook, and thank you for providing that on this call, how much of that is new data centers or new powered shells versus renting back from your existing G42 customer or your Cerebras Cloud?

Operator

This is Andrew. We're trying to add data center space as fast as we can. We're engaged with builders throughout North America, data center operators in Europe, in the Middle East. We have new data centers coming on board in Q3, Q4, Q1, Q2, Q3, Q4 of next year, and are adding more. We're in discussions with literally dozens of different data center owner operators. I think the answer is all of the above. The demand for our product right now is so significant, we are seeking data center capacity around the world as quickly as we can.

Speaker 6

Thank you. Our next question comes from the line of Joe Moore of Morgan Stanley. Your line is open, Joe.

Speaker 3

Yeah, thank you. Along the same lines as the last question, is the constraint on your growth 5-nanometer wafer capacity? Is it space and power and the kind of build-out of your cloud? Or are there some other constraints that we should be thinking of? It feels like demand is not the constraint here, it's how quickly you can ramp.

Operator

Demand is not the constraint. Supply is not the constraint. The constraint is data centers.

Speaker 3

Okay, that's helpful. To the extent that your gross margins are better than we had modeled, is that a function of sort of a quicker ramp of that internal capacity versus the G42 rental or just what are the dynamics of gross margin through the rest of this year?

Speaker 2

Thanks, Joe. There's a few things going on. One is actually higher pricing. Because there's tremendous demand, we've been able to see higher pricing from existing customers. Even as OpenAI is starting to ramp, that's been an upside to our gross margin profile and something that we're reflecting now in the outlook for the rest of the year. Another way to think about it is the competition has also increased in price. They have higher costs for HBM and other things. I think the floor in the marketplace has come up a bit. Then we've been able to look at the timing of the amount of capacity that we need to bring on and the economics around it, which we were estimating a couple of quarters ago.

Speaker 2

That's also turned out to be a bit more favorable, both in terms of how much is coming on when, and also the amount that we're paying. I think all of those factors, as they play out for the rest of the year, will allow us to be at higher gross margins than what we had predicted at the beginning.

Speaker 6

Thank you. Our next question comes from the line of Joshua Buchalter of TD Cowen. Your line is open, Joshua.

Speaker 4

Hey, guys, thanks for taking my question, and welcome to the fun world of earnings calls. Sorry to keep pulling at this thread. I wanted to follow up on Tom and Quinn's earlier questions about the ability to service some of the larger models, maybe using the demo that you guys provided of supporting the trillion-parameter Kimi model. Any details you can give on the specs that were in that benchmark you showed, like how many CS-3s were used to support Kimi and maybe what the competing GPU-based rack architecture was? Thank you.

Operator

By way of comparison, we used a leading inference cloud. We tried to do our best to compare top of tree to top of tree. My understanding is that they're using B300 to serve as an endpoint for this model, but I can double-check that for you. I think there is a fundamental misunderstanding propagated by some analysts who just didn't understand that our architecture was perfectly suited for these models of large size, small size, medium, with big caches and small caches, and that we can do them and are doing them, not just in this demo, but for OpenAI at frontier models.

Speaker 2

Right. There are only two hardware vendors that currently serve OpenAI models, and we're one of them. It is sort of a proof point, right? An empirical validation that big models work just fine on us, and we have the same advantage as on small models.

Speaker 4

Okay, understood. Thank you for the detail there. Maybe for Bob, as we think about the annual guide you gave, I think it implies sort of 20%+ half-over-half growth. Any help you can give us on how much of the second half growth is from pricing? Or maybe OpenAI contribution that we should expect as you build up to that first 250 megawatt build. Thank you.

Speaker 2

Yeah. Look, I think this initial guide coming out is really focused on the first quarter and looking forward for the rest of the year, where we have data centers coming on largely in the back end of the year. A lot of the improvement is going to come from OpenAI being deployed in our cloud, and it's back-end loaded. As I mentioned in my remarks at the beginning, we actually have in the forecast that hardware will come down a little bit sequentially for the rest of the year. I'm being conservative for the second half as we're still pretty early in the year. Data center capacity is coming on, and as we move throughout the year, we'll update you as we have more information about the progress and timing.

Speaker 6

Thank you. Our next question comes from the line of Matt Bryson of Wedbush Securities. Your line is open, Matt.

Speaker 5

Hey, thanks for taking my question. Just going back to trying to figure out the market. It sounds like there's some more opportunity for what we're seeing with Amazon, where they're using the Cerebras solution for decode. We're thinking about the amount of value that you're capturing in that type of architecture versus prefill. Is there any chance you could take a swag at kind of what portion of the value is in the Cerebras system?

Speaker 2

Not exactly. Let me share maybe a different crack at the problem. A disaggregated prefill-decode solution is really good in some instances, in particular if you know the shape of the work it's intended to support. When you specialize, right, when you buy some hardware for prefill and some for decode, you embed in your hardware deployment an assumption about the shape of the traffic. If the traffic looks different, then you have stranded compute, low utilization, and higher cost. This is obviously a huge opportunity for a hyperscaler like AWS because they have technology that can drive traffic of the shape they expected to their disaggregated solution and route it to other solutions if it's different from that assumption. Right? The value of the solution is highest to a hyperscaler.

Speaker 2

The exact split of value between us and Trainium is very difficult to say. As nobody has yet deployed a true disaggregated solution, we have a lot to learn in the market still.

Speaker 5

Understood. That's helpful. Then just one for you, Bob. When we're thinking about you renting capacity from a customer to fill that OpenAI demand, is the full rental requirement baked into your quarterly guide? Or is there any chance that there's a further impact on gross margins in Q3? Basically, I'm trying to figure out if gross margins in Q2 are the trough.

Speaker 2

The rental costs that we're assuming for the rest of the year are baked into Q2 and the annual guide.

Speaker 5

Awesome. Thank you.

Speaker 6

Thank you. Our next question comes from the line of Vijay Rakesh of Mizuho. Your line is open, Vijay.

Speaker 12

Yeah. Hi. Thanks, Andrew and Bob. Congratulations on a good quarter and guide here. Just wondering, you mentioned a 50-megawatt-per-month ramp for Q26. I'm just wondering how that is going and how you see that scaling into 2027? I have a quick follow-up.

Operator

I don't think I mentioned that. Maybe I didn't hear the question right. Could you repeat the question?

Speaker 12

I think you had talked about a 50-megawatt-per-month ramp into full Q26. Just wondering how that is going, how you see that beyond, and how that capacity ramps into 2027.

Operator

Yeah. Okay. I don't remember giving specifics on the monthly ramp. We are seeking, on average, to put a huge amount of capacity in through the end of 2026 and into 2027. As you know, we signed our agreement with OpenAI at the end of 2025, which means you probably need six or eight or 10 months at a minimum to bring on vastly more capacity. As our business ramps, we are signing large deals as well, many of which will come on in the first part of 2027. I think we announced a 120-megawatt deal with Bell Canada, for example, in a facility there that does have room to expand. I think while we haven't given specifics, we are working our hardest to add as much capacity as we can between now and the end of 2027.

Speaker 12

Got it. Obviously you mentioned fast inference is very disruptive. You probably see a lot of LLM frontier model guys try to move to fast inference. I'm just wondering how you see your customer pipeline broadening out into 2027, if you were to look out beyond OpenAI and AWS. Thanks.

Operator

Sure. Look, we're pleased with the way the customer pipeline's going. I think obviously deals of the size of OpenAI or the size that AWS could do are few and far between. The business is robust, and we're happy at the rate at which we're signing new customers. We're also happy at the rate at which existing customers are doubling down, growing their footprint, and the rate at which their token consumption is up and to the right. On all fronts, we're pretty pleased.

Speaker 6

Thank you. Our next question comes from the line of Richard Shannon of Craig-Hallum Capital Group. Your line is open, Richard.

Speaker 8

Thanks, Andrew and Bob, for letting me ask a couple questions. Congrats on the first quarter call here. Andrew, my first question is following up on a couple of your different prepared remarks regarding OpenAI. You talked about standing up a new model in under 35 days here. You also mentioned doing some work with GPT-5.4. Love to hear about your experience in bringing up the first model, the Codex-Spark, what you've learned from that, and how you've applied that to the GPT-5 work you might be doing going forward with OpenAI and/or other customers.

Operator

I think foundation model providers are fundamentally different. They are at the absolute cutting edge. What you see when you engage with them is really quite extraordinary. The amount of work that goes into a foundation model, and the visibility that we have is really one of the exceptional advantages that we get from this partnership. I think beginning with Spark, we got better. I think it improved us. It challenged us. We were up to the task. We very much enjoy working with their engineering team, and I think from the feedback we've gotten, they found kindred spirits and enjoy working with our team as well. I think the way to temper metal is with fire, and I think we're proud of our work with them and our continued work. I think it's a really thoughtful question.

Operator

I think having access to extraordinary customers and partners is a fundamental and long-term differentiator.

Speaker 8

Thanks for that, Andrew. My follow-on question is regarding AWS. There are media reports out there that Amazon may be trying to sell the Trainium-based hardware externally and not just in their own data centers. Do you view this as an opportunity for Cerebras?

Operator

I do.

Speaker 8

Okay, great. Thank you, Andrew.

Operator

Thank you. With that, I think we'll wrap up.

Speaker 6

Yes, sir. We have reached the end of the Q&A session, and that does conclude today's conference call. Thank you for participating. You may now disconnect.