The Question Every CIO Should Ask Their Salesforce Consultant in 2026: 'Show Me Your AI Review Stack'
Last quarter I sat in a room with a CIO whose Salesforce org had just failed an AppExchange security review. The issue was a dozen dynamic SOQL queries that skipped CRUD and field-level security checks — a known, documented, trivially-catchable class of bug. Their consulting partner had reviewed and approved every one of those pull requests over the prior eighteen months. Manually. By a senior engineer. Who was doing their honest best.
The CIO asked me a good question: “If they had a senior engineer reviewing every line, how did twelve of these get through?”
The answer is that humans miss things. Senior engineers miss things. That’s why pilots use checklists, surgeons use timeouts, and software teams use automation. The question in 2026 isn’t whether your Salesforce consultant has senior engineers — the good ones all do. The question is whether those senior engineers are augmented by the tooling that catches what humans miss.
This post is for CIOs, VPs of Engineering, and Salesforce app owners who are evaluating consulting partners — either a new one, or the one they already have. It gives you one question to ask, a seven-point checklist to score the answer, and a public benchmark you can verify without being a developer yourself.
The One Question
The next time you’re on a call with a Salesforce consultant — your current one or a new one you’re evaluating — ask this:
“Show me your AI-augmented code review stack.”
Not “do you use AI.” Every vendor will say yes. The interesting answers start when you ask them to show you.
A vendor that’s serious about this has a demonstrable stack: skills, validators, hooks, language servers, scoring, and a clear story about what their senior engineers do on top. A vendor that isn’t will pivot to talking about “methodology” or “our proven framework” or “AI-assisted development” as a marketing line. You can hear the difference in the first thirty seconds.
Here’s what to listen for.
The Seven-Point Checklist
Score the vendor’s answer against these seven points. A modern Salesforce consulting partner in 2026 should hit all seven. If they hit five or six, they’re keeping up. If they hit fewer than three, they’re falling behind.
1. Language Server Protocol (LSP) integration for Apex and LWC
Language servers give the AI real-time feedback about syntax errors, type mismatches, and symbol resolution — the same thing your engineers see in their editor. A serious review stack runs the Apex Language Server (apex-jorje-lsp) and the LWC Language Server during review, not just as an editor convenience. This catches a whole class of issues before the review even starts.
Good answer: “We run apex-jorje-lsp and @salesforce/lwc-language-server as part of our review loop. If the LSP flags a syntax error, the review fixes it automatically and re-runs.”
Bad answer: “We use VS Code, which has language servers built in.” (That’s editor tooling, not review tooling.)
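For readers who want to see what "running a language server during review" means mechanically, here is a minimal sketch. The Content-Length framing, the textDocument/didOpen notification, and the severity value 1 for errors come from the Language Server Protocol itself. The "apex" language id, the file URI, and the helper names are assumptions for illustration, and the sketch does not launch apex-jorje-lsp or any real server.

```python
import json
from typing import List


def frame_lsp_message(payload: dict) -> bytes:
    """Wrap a JSON-RPC payload in the Content-Length header LSP requires."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def did_open(uri: str, language_id: str, text: str) -> dict:
    """Build the notification that tells a language server a file is open."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didOpen",
        "params": {
            "textDocument": {
                "uri": uri,
                "languageId": language_id,  # "apex" is an assumption here
                "version": 1,
                "text": text,
            }
        },
    }


def errors_only(publish_diagnostics: dict) -> List[str]:
    """Keep only Error-severity items from a publishDiagnostics notification."""
    found = []
    for diag in publish_diagnostics["params"]["diagnostics"]:
        if diag.get("severity") == 1:  # 1 means Error in the LSP spec
            line = diag["range"]["start"]["line"] + 1  # LSP lines are zero-based
            found.append(f"line {line}: {diag['message']}")
    return found
```

A review loop would send the framed message to the server process, read back the diagnostics, and refuse to proceed while errors_only returns anything.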
2. Governor limit and bulkification analysis
Generic AI code review misses bulkification constantly. The classic example: a for loop that calls a helper method, and the helper method does a SOQL query. The loop looks clean. The helper looks clean. Together, they’re an N+1 query waiting to blow up on bulk load. A Salesforce-aware review stack traces these calls and flags them.
Good answer: “Our Apex scorer specifically looks for SOQL and DML inside loops, including when they’re wrapped in helper methods. We’ve got about 90 validation points on Apex alone.”
Bad answer: “We follow Salesforce best practices.” (That’s a poster on a wall, not a tool.)
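The helper-method trap described above is a reachability problem, and a rough sketch makes that concrete. The call map and method names below are hypothetical and hand-built. A real scorer would derive them from parsed Apex, and nothing here reflects how sf-skills actually implements its check.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical summary of one Apex class: which methods each method calls,
# and which methods contain a SOQL query in their own body.
CALLS: Dict[str, List[str]] = {
    "processAccounts": ["enrich", "formatName"],
    "enrich": ["loadContacts"],
    "loadContacts": [],
    "formatName": [],
}
HAS_SOQL: Set[str] = {"loadContacts"}


def reaches_soql(method: str, calls: Dict[str, List[str]],
                 has_soql: Set[str], seen: Optional[Set[str]] = None) -> bool:
    """Depth-first search: does this method run SOQL, directly or indirectly?"""
    seen = set() if seen is None else seen
    if method in seen:  # already explored, also guards against recursion cycles
        return False
    seen.add(method)
    if method in has_soql:
        return True
    return any(reaches_soql(c, calls, has_soql, seen) for c in calls.get(method, []))


def flag_loop_calls(calls_inside_loops: List[Tuple[str, str]],
                    calls: Dict[str, List[str]],
                    has_soql: Set[str]) -> List[Tuple[str, str]]:
    """Return (caller, callee) pairs where a call made inside a loop reaches SOQL."""
    return [(caller, callee) for caller, callee in calls_inside_loops
            if reaches_soql(callee, calls, has_soql)]
```

With the sample data, a loop in processAccounts that calls enrich is flagged even though enrich contains no query itself, while the call to formatName is not.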
3. CRUD / Field-Level Security enforcement
This is the one that cost the CIO at the top of this post an AppExchange submission. Dynamic SOQL needs a WITH SECURITY_ENFORCED or WITH USER_MODE clause, or a Security.stripInaccessible call on the results, and DML needs an equivalent check before it writes. These are easy to miss manually because the code compiles and runs fine without them. AI review with the right rules flags them consistently.
Good answer: “Every Database.query and every DML statement gets checked for CRUD/FLS enforcement. We’ve extended the base sf-apex skill with a custom validator because this is the number one AppExchange rejection reason we see.”
Bad answer: “We do security review at the end of the project.” (Too late. Costs 10× to fix in QA.)
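To show why this class of bug is mechanically catchable, here is a deliberately crude sketch: a text scan that flags Database.query calls with no enforcement marker nearby. The marker list and the two-line window are assumptions made for the example. A production validator would work on a parsed syntax tree, not on raw lines.

```python
import re
from typing import List

# Strings that suggest access checks are enforced. This list is an
# assumption for the sketch, not an exhaustive or official rule set.
ENFORCEMENT_MARKERS = (
    "WITH SECURITY_ENFORCED",
    "WITH USER_MODE",
    "AccessLevel.USER_MODE",
    "Security.stripInaccessible",
)

DYNAMIC_QUERY = re.compile(r"Database\.query\s*\(", re.IGNORECASE)


def unguarded_dynamic_queries(apex_source: str, window: int = 2) -> List[int]:
    """Return 1-based line numbers of dynamic queries with no marker nearby."""
    lines = apex_source.splitlines()
    flagged = []
    for index, line in enumerate(lines):
        if not DYNAMIC_QUERY.search(line):
            continue
        # Look a few lines either side of the call for any enforcement marker.
        start = max(0, index - window)
        context = "\n".join(lines[start:index + window + 1]).upper()
        if not any(marker.upper() in context for marker in ENFORCEMENT_MARKERS):
            flagged.append(index + 1)
    return flagged
```

A check this simple will produce false positives and false negatives, which is exactly why the human review described later still matters.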
4. SOQL schema validation before execution
A surprising amount of broken code makes it to sandboxes because a SOQL query references a field that was renamed, removed, or never existed. A modern review stack validates SOQL against the live org schema before the code ever runs. This catches typos, renamed fields, and managed-package field-name collisions at review time — not at deploy time.
Good answer: “We run SOQL pre-validation against the target org’s schema. If a query references a field that doesn’t exist, the review catches it before CI.”
Bad answer: “SOQL errors show up in tests.” (Fine, but expensive — you’re paying for three rounds of CI to find what schema validation would find in three seconds.)
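A minimal sketch of the idea, assuming a schema snapshot has already been pulled from the target org. The SCHEMA dictionary and the field names are hypothetical, and the regular expression only handles flat SELECT ... FROM ... queries (no subqueries, relationship fields, or aggregate functions).

```python
import re
from typing import Dict, List, Set

# Hypothetical snapshot of the target org's schema.
SCHEMA: Dict[str, Set[str]] = {
    "Account": {"Id", "Name", "Industry"},
}

SIMPLE_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+(?P<object>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def unknown_fields(soql: str, schema: Dict[str, Set[str]]) -> List[str]:
    """List the fields in a flat SOQL query that the schema does not contain."""
    match = SIMPLE_SELECT.match(soql)
    if match is None:
        raise ValueError("only flat SELECT ... FROM ... queries are handled")
    obj = match.group("object")
    if obj not in schema:
        return [f"{obj} (object not in schema)"]
    # SOQL field names are case-insensitive, so compare in lower case.
    known = {name.lower() for name in schema[obj]}
    fields = [field.strip() for field in match.group("fields").split(",")]
    return [f"{obj}.{field}" for field in fields if field.lower() not in known]
```

Running it on a query that selects Industry__c from Account reports that one field, which is the kind of typo or rename that would otherwise surface only at deploy time.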
5. Scored, quantitative review output — not just prose
A good review stack doesn’t just say “looks good.” It produces a score — how many validation points passed, how many failed, and why. This makes review output auditable: you can see, sprint over sprint, whether your codebase quality is improving or degrading. It also makes consultants accountable — the score is a number, not a vibe.
Good answer: “Every PR gets a scored review — Apex at 90 points, LWC at 165, Flow at 110, SOQL at 100. We send you the trend over the engagement.”
Bad answer: “Our senior engineer reviews and approves each PR.” (Fine, but unauditable. You’re trusting the human without a paper trail.)
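What "a number, not a vibe" could look like as data is sketched below. The check names and point weights are invented for illustration. They are not the actual sf-skills rubric, which the post only describes by its totals.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Check:
    name: str
    points: int      # weight of this check (hypothetical values)
    passed: bool
    note: str = ""   # why it failed, kept for the audit trail


def score(checks: List[Check]) -> Dict[str, object]:
    """Summarise a review as points earned, points possible, and failure reasons."""
    earned = sum(c.points for c in checks if c.passed)
    possible = sum(c.points for c in checks)
    failures = [f"{c.name}: {c.note}" for c in checks if not c.passed]
    return {"earned": earned, "possible": possible, "failures": failures}


review = score([
    Check("no SOQL in loops", 10, True),
    Check("CRUD/FLS enforced", 15, False, "Database.query without user mode"),
    Check("test class present", 5, True),
])
```

Storing one such record per PR is what makes a sprint-over-sprint trend possible.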
6. Hook-based auto-validation on every Write/Edit
The difference between “we run AI review on PRs” and “we run AI review on every line of code as it’s written” is enormous. A mature stack uses PostToolUse hooks that fire automatically after every file edit — running Prettier, PMD, ESLint, and custom scorers — so the AI catches issues in the same second they’re introduced, not fifteen minutes later when the engineer opens a PR.
Good answer: “Our hooks fire on every file write. The validators run in the background and the engineer sees issues immediately — no wait, no PR cycle.”
Bad answer: “We run review when the PR is opened.” (Better than nothing, but you’re paying for slower feedback loops.)
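The dispatch step behind a hook like this can be sketched in a few lines. The payload shape (a JSON object with tool_name and tool_input.file_path) is an assumption about what the hook receives, and the "apex-scorer" and "lwc-scorer" names are hypothetical stand-ins for the custom scorers mentioned above. The sketch only decides which validators apply. It does not run them.

```python
import json
from pathlib import PurePosixPath
from typing import Dict, List

# Which validators to run for which file type. Prettier, PMD, and ESLint are
# named in the post; the scorer names are hypothetical.
VALIDATORS_BY_SUFFIX: Dict[str, List[str]] = {
    ".cls": ["prettier", "pmd", "apex-scorer"],
    ".trigger": ["prettier", "pmd", "apex-scorer"],
    ".js": ["prettier", "eslint", "lwc-scorer"],
    ".html": ["prettier", "lwc-scorer"],
}


def validators_for(hook_payload_json: str) -> List[str]:
    """Pick validators for the file a Write or Edit tool call just touched."""
    payload = json.loads(hook_payload_json)
    if payload.get("tool_name") not in {"Write", "Edit"}:
        return []  # ignore tool calls that do not modify files
    file_path = payload.get("tool_input", {}).get("file_path", "")
    return VALIDATORS_BY_SUFFIX.get(PurePosixPath(file_path).suffix, [])
```

Keeping the dispatcher this small is a design choice: the hook stays fast, and each validator can be added or removed by editing one table.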
7. A clear story about what the humans do on top
This is the tell. The best vendors are the ones who can articulate precisely what their senior engineers do that the AI doesn’t. Business logic validation. Regression intent. Org-specific context. Architectural review. The AI is a force multiplier; the humans are still doing the hard thinking. A vendor that says “AI does everything” is lying. A vendor that says “AI does nothing meaningful” is lying differently. The truth is a clear division of labor, and a good vendor can explain it in two minutes.
Good answer: “AI catches the mechanical stuff — governor limits, security, patterns, scoring. Our senior engineers focus on intent: is this the right abstraction, does it match the business logic, will it scale with your data model, does it fit the existing architecture? We don’t use AI to skip senior review — we use it to make senior review faster.”
Bad answer: “Our AI is state of the art.” (Not an answer.)
The Public Benchmark You Can Verify
Here’s the part that makes this checklist useful even if you’re not a developer: most of what I just described is available as an open-source project that any Salesforce consultant can install in five minutes.
It’s called sf-skills, maintained by Jag Valaiyapathy under an MIT license. It’s a free, public repository containing 36 production-grade Salesforce skills for Claude Code: Apex (90-point scoring), LWC (165-point scoring with accessibility and SLDS checks), Flow (110-point scoring), SOQL (100-point scoring with live query plan analysis), plus skills for Data Cloud, Agentforce, OmniStudio, and more. It includes the LSP integration, the hook system, the validator dispatcher, and the auto-fix loops I described above. All open source. All free.
This is important because it sets the floor for what AI-augmented Salesforce review should look like in 2026. Any consultant who isn’t at least running sf-skills — or something comparable — is demonstrably behind a free, publicly available baseline. You don’t have to take our word for it; you can send them the repository link and ask them to show you their equivalent.
We use sf-skills as our foundation at Cumulus Vision. We didn’t write it, and we’re not going to pretend we did. Jag and the contributors have built something genuinely excellent, and reinventing it would be wasteful engineering theater. What we do on top of it is the work worth paying for: extensions for the gaps we hit on real engagements, tighter integration with CI pipelines, senior engineers making architectural calls, and governance that maps AI review output to audit requirements.
That’s the honest version of “AI-augmented Salesforce delivery” in 2026. A strong open-source foundation plus senior engineering judgment plus client-specific extensions. Anything less is falling behind; anything that claims more is probably marketing.
How to Actually Run the Conversation
You don’t need to be a developer to have this conversation productively. Here’s a script:
- Send them the link to sf-skills before the call. Ask them to come prepared to discuss it.
- On the call, ask the one question: “Show me your AI-augmented code review stack running on real code.”
- Score them against the seven-point checklist. You don’t need to understand the technical details. You need to see whether they can show you something working, in real time, or whether they dodge.
- Ask what they add on top of sf-skills (or their equivalent). The answer should be specific: named extensions, named validators, a clear story about human review. Vague answers here are a signal.
- Ask for one sample PR review output. A scored, written review artifact. If they can’t produce one in under an hour, they probably don’t generate them.
A serious partner will handle all five of these in a single call. They’ll probably enjoy the conversation — it’s a chance to show work they’re proud of. A partner that’s falling behind will schedule a follow-up, pivot to talking about “methodology,” or send you a case study PDF.
The 2026 Salesforce Partner
The shorthand for what we’re describing: in 2026, a competent Salesforce consulting partner has senior engineers and a demonstrable AI review stack. Not one or the other — both. The senior engineers handle architecture, business logic, and judgment calls. The AI stack handles the mechanical floor: governor limits, bulkification, security, schema validation, and consistent scoring. Together they ship faster, cheaper, and with fewer defects than either could alone.
If your current consultant can check every box on this list, you have a good partner. Keep them.
If they can check five or six, have the conversation about closing the gap. Most of the missing pieces are free and open source.
If they can check fewer than three, you’re paying 2026 rates for 2023 delivery. That’s the conversation worth having at your next QBR.
And if you want to see what every box checked actually looks like on your own Salesforce codebase — not a slide deck, not a case study, a live demo of the review stack running on a slice of your code — we’ll set that up. Send us a sample of your org and we’ll show you exactly what a modern Salesforce partner looks like in practice.
Beyond Code Review: Custom Agent Orchestration
Code review is one slice of what AI-augmented Salesforce delivery looks like in 2026. It’s the slice most teams ask about first because it’s the most visible. But the bigger shift — the one that actually changes how Salesforce projects get delivered — is custom agent orchestration across the full tool chain.
At Cumulus Vision we’ve built our own agent orchestration layer on top of the Claude Agent SDK, purpose-built for Salesforce teams who live inside an enterprise toolchain. What it does today:
- Connects Salesforce to Jira. User stories, acceptance criteria, and sprint context flow into the agents that scaffold metadata, Apex, and LWC. When a PR lands, the Jira ticket updates automatically with the scored review output from the seven-point stack above.
- Drives the CI/CD pipeline on Azure DevOps. Agents open PRs, request reviews, respond to pipeline failures, and gate deployments based on the same scored review criteria. Nothing ships without hitting the score threshold, and every gate decision is logged with full audit context.
- Runs with governance that security will actually sign off on. Every agent action is logged with prompt, response, and outcome. Permissions are scoped per-agent, per-tool, per-environment. Nothing runs without policy, and nothing runs without a paper trail. Designed to hold up under mid-market CISO review.
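As a rough illustration of a score-gated deployment with a paper trail, consider the sketch below. The field names, the 85 percent threshold, and the in-memory log are all assumptions for the example. It is not a description of our orchestration layer or of any Azure DevOps API.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def gate_decision(pr_id: str, earned: int, possible: int,
                  threshold_pct: float,
                  now: Optional[datetime] = None) -> Dict[str, object]:
    """Decide whether a PR may deploy and capture why, for the audit log."""
    now = now or datetime.now(timezone.utc)
    pct = 100.0 * earned / possible
    return {
        "pr": pr_id,
        "score_pct": round(pct, 1),
        "threshold_pct": threshold_pct,
        "allowed": pct >= threshold_pct,
        "decided_at": now.isoformat(),
    }


def append_audit_line(log_lines: List[str], record: Dict[str, object]) -> None:
    """Append one JSON line per decision; sorted keys keep diffs stable."""
    log_lines.append(json.dumps(record, sort_keys=True))


audit_log: List[str] = []
append_audit_line(audit_log, gate_decision("PR-1", 81, 90, 85.0))
append_audit_line(audit_log, gate_decision("PR-2", 72, 90, 85.0))
```

The first decision passes at 90 percent and the second is blocked at 80 percent, and both leave a record either way.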
The stack we run today is Salesforce + Jira + Azure DevOps because that’s what our current clients run. But the orchestration layer is tool-agnostic: if your stack is Salesforce + ServiceNow + GitHub Actions, or Salesforce + Linear + GitLab, or something else entirely, the architecture transfers. We’ve designed it so that adding a new tool is a configuration problem, not a rewrite.
This is the work CIOs are asking us about now — not “can you write Apex with Claude” (yes, so can everyone) but “can you give my team an agent layer that makes our specific toolchain faster, with the governance our security team requires.” That’s a different conversation, and it’s the one worth having if you’re planning a serious AI investment in 2026.
If that’s where your head is, let’s talk. We’ll scope your toolchain, design the orchestration, and deploy an agent layer with the governance model baked in from day one.