
How Google Earns the Right to Ship the Next Agent

The Agentforce playbook from inside Google’s TDX 2026 session — three pillars, five patterns, three rollout phases, and the gate they refuse to skip.

11 min read
By Anna Bromley
Google's agentic evolution of Salesforce, from Case Summary in 2023 to multi-agent systems in 2026 and beyond

Every CTO has watched a version of this happen. Three teams. Three agents. Each technically working. None of them the thing the business actually wanted. By month four nobody is using them, and the AI programme is quietly relabelled as “learnings.”

I sat in a session at TDX 2026 that named the problem on stage. The Google team running it call it building for the sake of building, and they spent a fair chunk of the hour warning the room about it. They've been rolling Agentforce out inside Google since late 2023. They didn't arrive at a multi-agent estate by being clever. They got there by being disciplined, in a way most enterprises aren't willing to be.

The discipline is the playbook. Pick one painful problem. Ship the smallest possible agent against it. Prove it works against a hard accuracy bar. Earn the right to do the next thing. Repeat. That’s the whole game.

What follows is the playbook, reconstructed from the session itself. Three pillars Google rolled Agentforce against. A funnel they use to turn ideas into use cases. Five architecture patterns they will and won’t build. A five-stage evolution they walked through, slowly. A three-phase rollout they apply to every single agent. And the monitoring layer that sits behind all of it.

If you’re a CTO, CIO or COO trying to work out how to say a defensible yes to an Agentforce pilot, this is the closest thing to a reference architecture you’ll get from someone who’s already done it at scale.

The three pillars Google rolled Agentforce against

Before they built anything, Google’s team named what Agentforce was for. Three pillars, deliberately narrow, each tied to a measurable outcome.

The first is enterprise productivity. The thing the rest of the business sees. Less time on routine work, more on the work only humans should do. The second is security and privacy. Agents that don’t leak data, don’t take actions outside their permissions, and don’t embarrass anyone. The third is developer velocity. Shared resources, shared standards, a shared platform, so every new agent isn’t a snowflake.

Sounds obvious. It isn’t. Most enterprises start an Agentforce pilot with a vague ambition (“deploy AI”) and a single use case. Six months later they have a feature, not a programme. And the feature is the thing the next reorg quietly buries.

If you can’t say in one line what your three pillars are, you don’t have an Agentforce programme yet. You have a project.

Start with friction, not technology

Before Google picked a use case, they invited their own users in. Not as a focus group, but as the source of the work. The funnel they used to turn user friction into shipped capability is the most reusable thing in the whole session.

Google's ideas-to-opportunities funnel: explore data, define use case, illustrate solutions, with stages of understand, define, sketch, decide, prototype, validate

Google’s ideas-to-opportunities funnel. Six steps that put users at the start, not the end.

Read it left to right. Explore data means studying business opportunities from multiple angles before naming a target. Define use case is a joint working session with end users to validate that the friction is real. Illustrate solutions is where high-level architectures are sketched and a roadmap is shaped.

Underneath those three sit six sequential steps the team walks every time: Understand, Define, Sketch, Decide, Prototype, Validate. The build doesn’t start at Understand. The build starts at Validate. Everything before that is the work that protects you from shipping the wrong agent at speed.

The anti-pattern Google warned against is the Slack thread that opens with “we should build an Agentforce agent for X.” A solution looking for a problem. The funnel exists to stop that conversation before it consumes a sprint.

The five patterns Google will and won’t build

Once a use case clears the funnel, the next decision is architectural. Google standardised on five prescriptive patterns. The point of standardising is the point of all standards: faster decisions, fewer surprises, easier governance.

Google's five prescriptive Agentforce patterns: Salesforce Native, Co-Pilots and Agents, Content Ingestion, 1P Services Integration, Agent Interoperability

Five patterns, ordered by ambition. And by delivery risk.

Google's five prescriptive Agentforce patterns, with the delivery risk read for each
| Pattern | When to use it | The delivery risk |
| --- | --- | --- |
| Salesforce Native | Use case lives entirely inside Salesforce data, built within the Agentforce framework | Low. Limited reach, but the safest first step. |
| Co-Pilots & Agents | Human-in-the-loop assistants for real-time conversational support | Medium. Adoption fails if the UX is wrong, not if the model is wrong. |
| Content Ingestion (RAG) | Unstructured knowledge needs to reach the agent through Salesforce vector stores | Medium. Content quality dictates outcome. Bad knowledge in, bad answers out. |
| 1P Services Integration | Agent invokes internal enterprise APIs or tools in user context | High. Auth, scopes, audit. The trust layer earns its keep here. |
| Agent Interoperability | Orchestrating multiple autonomous agents under a defined user context | Highest. Don't start here. Earn the right to it. |

The framing here is more important than the list. Each pattern is a conscious choice between reach and risk. Salesforce Native is the safest, but it can’t see anything outside Salesforce. Agent Interoperability is the most powerful, but it asks more of your governance than most enterprises can give. Most pilots should start with one of the first three.

When the team at Google described 1P Services Integration, where agents call internal enterprise APIs in user context, they spent more time on the trust layer than on the agent. That ratio is correct. The reason most agentic systems struggle the moment they leave Salesforce is that auth, scopes, and audit don’t scale by accident.
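To make the trust-layer point concrete, here is a minimal Python sketch of the shape that work takes: a tool call that runs in the user's context rather than the agent's, checks a scope before executing, and writes an audit record either way. Every name here (`UserContext`, `call_internal_api`, the `crm.read` scope) is illustrative, not an Agentforce or Google API.

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.audit")

@dataclass
class UserContext:
    user_id: str
    scopes: set = field(default_factory=set)

def call_internal_api(ctx: UserContext, tool_name: str,
                      required_scope: str, payload: dict) -> dict:
    """Invoke an internal tool in the user's context, not the agent's.

    The agent never holds a broader credential than the human it acts for,
    and every attempt leaves an audit record, allowed or denied.
    """
    if required_scope not in ctx.scopes:
        audit_log.warning("DENIED user=%s tool=%s scope=%s",
                          ctx.user_id, tool_name, required_scope)
        raise PermissionError(
            f"{ctx.user_id} lacks scope {required_scope!r} for {tool_name}")
    audit_log.info("CALL user=%s tool=%s payload_keys=%s",
                   ctx.user_id, tool_name, sorted(payload))
    result = {"tool": tool_name, "status": "ok"}  # stand-in for the real API call
    audit_log.info("DONE user=%s tool=%s status=%s",
                   ctx.user_id, tool_name, result["status"])
    return result
```

The design choice worth copying is that the deny path is logged as loudly as the allow path. When an agent misbehaves, the question is always "what did it try to do", and a scope check without an audit record cannot answer it.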

“Building for the sake of building.”

The failure mode Google warned the room about
TDX San Francisco 2026

The five-stage evolution: how you earn the right to the next agent

This is the slide of the session. Google’s journey, from the first Agentforce-style use case in late 2023 to a multi-agent estate in 2026 and beyond, broken into five stages. The middle of the slide draws a line between two eras: AI Inference on the left, Agentic AI on the right. That line is the boundary nearly every enterprise underestimates.

Google's five-stage agentic evolution, and what each stage proves before you advance
| Stage | What it does | What it proves before you advance |
| --- | --- | --- |
| 1. Case Summary | LLM + prompt. Summarises records and conversations into something a human can act on. | You can stand a model up safely against your data without breaking anything. |
| 2. Reply Generation | LLM + retrieval (RAG). Pulls in knowledge to draft responses for human review. | Your knowledge base is fit for AI consumption. Written, tagged, governed. |
| 3. Next Best Action | LLM + retrieval + function calling. Suggests, and then takes, defined actions. | Your tools, permissions and audit trail can survive an automated caller. |
| 4. Service Agent | Reasoning loop with many tools. The first true agentic step. | You have the observability to debug a non-deterministic system in production. |
| 5. Multi-Agent | Agents talking to agents. Interoperability across systems and boundaries. | Your governance model handles autonomous orchestration. Most aren't ready for this. |

Stages 1 to 3 are configuration. You’re using a model, you’re grounding it in your data, you’re pointing it at a defined action. The work is real, but the system is deterministic enough that delivery teams already know how to think about it.

Stage 4 is where the system changes shape. Once you give the agent many tools and a reasoning loop, you’re running a non-deterministic system in production. Same input, different output, every time. The same controls don’t apply. This is where most pilots quietly stop, because the operating model behind them was never upgraded for the new kind of system.
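The shape change can be sketched in a few lines of Python. Stages 1 to 3 are a fixed pipeline; stage 4 replaces the pipeline with a loop in which the model picks the next tool at run time. Everything below (`reasoning_loop`, the step cap, the toy model) is a hypothetical sketch of the control structure, not code from the session.

```python
def reasoning_loop(task: str, tools: dict, model_step, max_steps: int = 5) -> str:
    """Minimal stage-4 sketch: the model chooses the next tool each turn.

    Unlike a stage-1-to-3 pipeline (prompt -> retrieve -> one defined action),
    the route through the tools is decided at run time, so two runs of the
    same task can take different paths. The step cap, the trace, and the
    escalation fallback are the operational controls that make that
    tolerable in production.
    """
    trace = []
    for _ in range(max_steps):
        action, argument = model_step(task, trace)   # model's (non-deterministic) choice
        if action == "finish":
            return argument
        observation = tools[action](argument)        # execute the chosen tool
        trace.append((action, argument, observation))
    return "escalate_to_human"                       # budget exhausted -> hand off

# Deterministic toy stand-in for the model: look something up, then finish.
def toy_model(task, trace):
    if trace:
        return "finish", f"answer based on {trace[-1][2]}"
    return "lookup", task

tools = {"lookup": lambda q: f"kb-hit for {q!r}"}
print(reasoning_loop("reset password", tools, toy_model))
```

The point of the sketch is what is *not* in it: there is no fixed sequence to test against, which is exactly why the observability demand jumps at stage 4.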

Stage 5 is where you let agents talk to agents. Google were honest that they’re still working through the interoperability question themselves. If they’re cautious, you should be too.

Each stage proves the thing the next stage depends on. Skip a stage and you’re borrowing risk you’ll repay with interest.

Shadow, Supervised, Automated: the three rollout phases

The five-stage evolution is what Google built. The three-phase rollout is how they put each one into the hands of real users. Every agent walks the same path.

Google's three rollout phases for every agent. The gate isn't a date, it's a number.
| Phase | What happens | What you gate on |
| --- | --- | --- |
| 1. Shadow | The agent runs alongside the human. It produces an answer. The human ignores it, sees it, or compares it. Nothing the agent says reaches a customer. | Output quality vs the human baseline. Google won't advance the agent until it clears the 80–90% accuracy bar. |
| 2. Supervised | The agent's output reaches a customer, but only after a human has reviewed and approved it. Human-in-the-loop is on every interaction. | Edit rate. If reviewers are rewriting most of what comes through, you're not ready for the next phase. |
| 3. Automated | The agent acts directly. Humans review by exception, on the back of monitoring, not on every transaction. | Drift, error rate, escalation rate. The monitoring layer is the safety net, and you watch it actively. |

The shadow phase is the most underused mechanic in enterprise AI. The agent runs, but its output never reaches a customer. You compare it against what humans actually did. You measure how often it agreed, where it diverged, where it improved. You tune it without risk.

The number Google used to advance is the part that surprises people: the agent has to clear an 80–90% accuracy threshold against the human baseline before they’ll progress it to supervised. That bar is high enough that most agents fail it on the first attempt. Which is the point. The threshold doesn’t exist to slow the team down. It exists to make sure the next phase is defensible.
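A shadow-phase gate is simple enough to sketch. Assuming you have paired agent and human outputs and some agreement judgment you trust (exact match, a rubric, a reviewer), the gate is one function. The names and the 0.85 default are illustrative; the default is just a point inside the 80–90% band the session described.

```python
def shadow_gate(agent_outputs, human_outputs, agree, threshold=0.85):
    """Compare shadow-mode agent output to what humans actually did.

    `agree` is whatever agreement judgment you trust for this use case.
    Returns the measured accuracy and whether the agent may advance
    from shadow to supervised.
    """
    matches = sum(agree(a, h) for a, h in zip(agent_outputs, human_outputs))
    accuracy = matches / len(human_outputs)
    return accuracy, accuracy >= threshold

# Example: two of three shadow answers matched the human baseline.
acc, advance = shadow_gate(
    ["refund approved", "escalate", "refund denied"],
    ["refund approved", "escalate", "refund approved"],
    agree=lambda a, h: a == h,
)
# ~0.67 accuracy: the agent stays in shadow and gets tuned.
```

The hard part in practice is not the arithmetic, it is choosing an `agree` function the business will accept as a proxy for "good enough to show a reviewer".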

Supervised is human-in-the-loop on every interaction. The metric you watch isn’t accuracy, it’s edit rate. If reviewers are rewriting most of what the agent produces, you don’t have a candidate for automation, you have a candidate for redesign.
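Edit rate is equally cheap to measure once you log both the agent's draft and what the reviewer actually sent. A rough character-level sketch, using Python's standard-library `difflib`; the `tolerance` knob (how much change still counts as "approved as-is") is a hypothetical parameter you would calibrate per use case.

```python
import difflib

def edit_rate(drafts, approved, tolerance=0.1):
    """Fraction of supervised interactions where the reviewer materially
    rewrote the agent's draft before it reached the customer.
    """
    def materially_edited(draft, final):
        similarity = difflib.SequenceMatcher(None, draft, final).ratio()
        return (1 - similarity) > tolerance  # more than `tolerance` changed

    edited = sum(materially_edited(d, f) for d, f in zip(drafts, approved))
    return edited / len(drafts)
```

A falling edit rate over successive weeks is the signal that supervised is working; a flat one is the redesign signal the paragraph above describes.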

Automated is what the headlines describe. The headlines skip the two phases before it. Without shadow and supervised, automated is just hope.

Google won’t advance an agent to the next phase until it clears their internal accuracy bar. The bar is high enough that most never clear it first time. That’s how you know the bar is real.

Six capabilities you need before you advance

None of the phases above survives contact with a real production estate without an operating layer behind them. Google standardised on six capabilities. They are not nice-to-haves. They are the prerequisite to advancing.

Google's six core capabilities for AI monitoring and adoption: selective gating, centralised dashboards, feedback loop, standardised logging, prompt management, performance monitoring

The six capabilities, grouped: cost and scale control, visibility and ROI tracking, continuous quality tuning.

The six monitoring and adoption capabilities Google put in place before scaling
| Capability | What it gives you |
| --- | --- |
| Selective Gating | Criteria-based access tiers so rollout is incremental and cost is contained |
| Centralised Dashboards | One interface for adoption, success rate and ROI across teams and use cases |
| Feedback Loop | User sentiment captured in the UI itself, fed back into model and prompt tuning |
| Standardised Logging | Audit-grade logs of templates, versions and I/O for quality and compliance review |
| Prompt Management | A shared workspace where prompts are written, versioned and deployed as code |
| Performance Monitoring | Continuous tracking of latency, throughput and error rates with threshold alerts |
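Two of the six, standardised logging and performance monitoring, can share one sketch: an append-only JSON-lines record that captures which prompt version produced which output, plus a threshold check over the same records. All field names and thresholds here are illustrative assumptions, not Agentforce schema.

```python
import json
import time

def log_interaction(log, *, agent, prompt_version, inputs, output,
                    latency_ms, error=False):
    """Audit-grade record: prompt version, I/O, latency. Append-only JSONL."""
    log.append(json.dumps({
        "ts": time.time(), "agent": agent, "prompt_version": prompt_version,
        "inputs": inputs, "output": output,
        "latency_ms": latency_ms, "error": error,
    }))

def breached_thresholds(log, *, p95_latency_ms=2000, max_error_rate=0.02):
    """Threshold alerts over the same log: p95 latency and error rate."""
    rows = [json.loads(line) for line in log]
    latencies = sorted(r["latency_ms"] for r in rows)
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    error_rate = sum(r["error"] for r in rows) / len(rows)
    alerts = []
    if p95 > p95_latency_ms:
        alerts.append(f"p95 latency {p95}ms over {p95_latency_ms}ms budget")
    if error_rate > max_error_rate:
        alerts.append(f"error rate {error_rate:.1%} over {max_error_rate:.0%} budget")
    return alerts
```

The useful property is that the alerting reads the same records the compliance review reads. One log, two consumers, no drift between what you monitor and what you can explain afterwards.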

Most failed Agentforce pilots don’t fail in the agent. They fail in the absence of these six. There’s no dashboard, so adoption isn’t visible. There’s no gating, so a bad release reaches everyone at once. There’s no logging, so when something goes wrong nobody can explain why. The agent gets the blame. The architecture takes the loss.

This is the same pattern we wrote about in Your Agent Isn’t Broken. Your Architecture Is. The agent is doing exactly what the architecture allowed it to do. If the architecture allows drift in the dark, drift is what you’ll get.

The trap: building for the sake of building

The phrase came up more than once in the session. It’s the failure mode Google warned the room about, and the one most likely to be quietly happening inside any large Salesforce estate right now.

Inside an enterprise, it looks like this. A sponsor is measured on “AI deployments shipped” rather than “AI deployments adopted.” A platform team is rewarded for the demo, not the dashboard six months later. A consultancy proposes an Agentforce pilot scoped around a use case nobody actually asked for, because the use case is interesting and the buyer wants to feel modern.

The output is a string of agents that technically work. The outcome is delivery drift and value leakage. Programmes that consume budget without changing what the business looks like.

The fix is the discipline. The funnel up front. The patterns picked on purpose. The phases gated on a number, not a date. The monitoring layer in place before anything goes near production. None of it is glamorous. All of it is what stands up to scrutiny.

The five-question pre-mortem to steal

If you take one thing from the Google session, take this. Five questions to ask before you sign an Agentforce SOW. They’re short. They’re unforgiving. If you can’t answer them, you’re not ready to start.

The five questions to ask before signing an Agentforce SOW
| Question | Why it matters |
| --- | --- |
| 1. What pillar does this serve, in one line? | If you can't name the productivity, security or developer outcome, you don't have a business case yet. |
| 2. Whose friction are we removing, and have we asked them? | Google invited their users in before they picked a use case. Skipping this is the fastest route to 'building for the sake of building'. |
| 3. Which of the five patterns are we choosing, and why not the simpler one? | Agent Interoperability is exciting. Salesforce Native is shippable. Pick the simplest pattern that solves the problem. |
| 4. Which step of the evolution are we on? | Each step proves something the next step depends on. Skipping a step is borrowing risk you'll repay with interest. |
| 5. Do we have the six monitoring capabilities in place before we ship? | Most failed pilots fail not in the agent but in the absence of the operating layer around it. |

The questions don’t need a delivery partner to answer. They need an honest CTO, an honest sponsor and an honest delivery lead in the same room for an afternoon. If the answers come out clean, the pilot will be defensible. If they come out fuzzy, the pilot is a budget you’ll write off.

What this means for your Agentforce pilot

Google’s playbook isn’t a Google story. Strip the brand and the budget and what you’re left with is a pattern any serious enterprise can run. Three pillars. A user-led funnel. Five patterns picked on purpose. A five-stage evolution where each stage has to earn the next. A three-phase rollout where the gate is a number. Six monitoring capabilities in place before any of it scales.

The reason the playbook works isn’t that Google are smarter. It’s that they refuse to skip steps. Most pilots fail because somebody, somewhere, decided the discipline didn’t apply this time. The output of skipping the discipline is always the same: an agent that demos beautifully, gets shipped quietly, and is quietly turned off six months later.

The version that works is the unglamorous one. Smaller. Slower at the start. Defensible at every step. The kind of pilot a CFO can sign off without flinching, and the kind a CTO can describe to the board in five sentences.

The win isn’t the agent. The win is the discipline that lets you ship the next one.

Run your Agentforce pilot the way Google ran theirs

Our 8-week Agentforce Pilot accelerator follows the same logic as the Google playbook. We start with the friction, not the technology. We pick the simplest pattern that solves the problem. We gate every phase on a number, not a date. By the end, you have a defensible answer for your board on whether to scale, redesign, or stop. No drift. No drama.

Explore the Agentforce Pilot


About the author

Anna Bromley

Director, Agile Delivery
