Every federal CIO I have spoken with in the past eighteen months has a version of the same story. Leadership wants an AI strategy. OMB wants an AI use case inventory. The agency head wants a demo at the next all-hands. And somewhere in the middle of all of it, the data engineering team is staring at a pipeline that was designed a decade ago and is now supposed to power a large language model.
The pressure is real. Executive Order 14110 on Safe, Secure, and Trustworthy AI directed civilian agencies to designate Chief AI Officers, publish inventories of high-impact AI use cases, and begin assessing their AI maturity. OMB M-24-10 added accountability structures and risk management requirements on top of that. The CISA AI roadmap pushed agencies toward active threat modeling of AI systems. Each layer of policy is legitimate on its own. Stacked, they create a mandate environment that agencies are trying to outrun with infrastructure that was never designed for the race.
What follows is not a policy summary. There are plenty of those. This is a note on what we actually see on the ground when we walk into federal AI engagements, the patterns that predict failure, and the sequencing that works.
"The agencies making real progress on AI are not the ones with the biggest AI budgets. They are the ones that spent the previous two years fixing their data."Gaurav Arora, Founder & CEO
Let us ground this quickly before going further. Three documents shape most of what agencies are being asked to do right now.
OMB M-24-10 (Advancing Governance, Innovation, and Risk Management for Agency Use of Artificial Intelligence) requires agencies to designate a Chief AI Officer with real authority, not a ceremonial title bolted onto an existing role. It requires risk management practices for high-impact AI, minimum practices for rights-impacting and safety-impacting systems, and annual reporting on use case inventories. The accountability structure is meaningful: CAIOs are responsible to agency heads, and non-compliance carries the kind of visibility that tends to show up in FITARA scorecards and congressional inquiries.
EO 14110 follow-on actions created specific cross-agency requirements around AI workforce development, procurement standards for AI vendors, and interagency data sharing for AI safety evaluation. The AI Safety Institute at NIST is now a real counterpart for agencies trying to evaluate frontier model risk. Procurement is getting harder, not easier, which means agencies that want to move fast on AI need FedRAMP-authorized or equivalent paths, not experimental vendor relationships.
Zero-trust architecture requirements under M-22-09 intersect with AI in ways that are not always obvious. AI systems consume identity signals, network telemetry, and device posture data to function effectively. An agency that has not completed its zero-trust data pillar is missing the instrumentation that makes behavioral AI useful. The two programs, ZTA and AI, are not sequential. They are parallel, and agencies that treat them as separate work streams end up doing redundant infrastructure work at significant cost.
Chief AI Officers must have the authority to halt high-impact AI deployments that fail risk review. That authority means little without a functioning governance process, which requires data lineage, audit logging, and a model registry. Very few agencies have all three.
Program offices submitting use case inventories need to classify each use case against the rights-impacting and safety-impacting thresholds in the OMB memo. Classification requires documentation of training data provenance, model version, deployment context, and human oversight mechanism. For most legacy AI deployments, that documentation simply does not exist.
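To make that concrete, here is a minimal sketch of what a use case inventory record could look like if the documentation were captured as structured data. The field names are ours, not OMB's schema, and the classification logic is deliberately simplified; the point is that classification is a function of documentation that has to exist first.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AIUseCaseRecord:
    """Illustrative inventory record; field names are ours, not OMB's schema."""
    use_case_id: str
    system_name: str
    model_version: str
    training_data_sources: List[str]   # provenance: where the training data came from
    deployment_context: str            # where and how model output is used in the mission
    human_oversight: str               # e.g. "analyst reviews every determination"
    rights_impacting: bool             # affects rights, benefits, or access for individuals
    safety_impacting: bool             # affects physical safety or critical infrastructure

    def is_high_impact(self) -> bool:
        # Either threshold triggers the full risk review and minimum practices.
        return self.rights_impacting or self.safety_impacting

    def documentation_gaps(self) -> List[str]:
        # A record that cannot answer these questions cannot be classified at all,
        # which is the real state of most legacy deployments.
        gaps = []
        if not self.training_data_sources:
            gaps.append("training data provenance")
        if not self.model_version:
            gaps.append("model version")
        if not self.human_oversight:
            gaps.append("human oversight mechanism")
        return gaps
```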
Across the federal AI engagements we have run, the failures cluster around the same three patterns. Naming them is useful because they are preventable, but only if you recognize them early.
The most common failure mode is also the most avoidable. An agency runs a proof of concept with a commercial large language model, gets impressive demo results, and kicks off a production authorization effort, only to discover three months in that the underlying data is inconsistent, unlinked across systems, or governed by legal restrictions that make it unusable for model training or inference in production.
The demo works because demos are staged. The model is pointed at a curated sample of clean documents. Production does not look like that. Production looks like seventeen authoritative data sources with inconsistent schemas, PII that has not been scanned or redacted, retention policies that have not been adjudicated for AI use, and access controls that were designed for human users, not service accounts running inference workloads.
The fix is not to slow down the model evaluation. It is to run the data readiness assessment in parallel, before you commit engineering capacity to the authorization path. Six weeks of data profiling is cheaper than six months of an authorization effort that dies when the data layer fails review.
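What does six weeks of data profiling actually produce? At minimum, a per-source profile and a cross-source comparison. Here is a minimal sketch, assuming the sources can be loaded into pandas dataframes and using crude regex patterns as a stand-in for a real PII scanner and your privacy office's definitions:

```python
import re
import pandas as pd

# Crude patterns for common PII; a real assessment would use a proper scanner,
# but even this is enough to flag candidate columns for review.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def profile_source(name: str, df: pd.DataFrame) -> dict:
    """Profile one data source: schema, null rates, and columns that look like PII."""
    pii_columns = []
    for col in df.select_dtypes(include="object").columns:
        sample = df[col].dropna().astype(str).head(500)
        if any(p.search(v) for v in sample for p in PII_PATTERNS.values()):
            pii_columns.append(col)
    return {
        "source": name,
        "columns": {c: str(t) for c, t in df.dtypes.items()},
        "null_rate": df.isna().mean().round(3).to_dict(),
        "pii_candidates": pii_columns,
    }

def schema_drift(profiles: list[dict]) -> set[str]:
    """Columns that do not appear in every source, i.e. the likely linkage problems."""
    all_cols = [set(p["columns"]) for p in profiles]
    return set.union(*all_cols) - set.intersection(*all_cols)
```

The schema_drift output is usually where the unpleasant surprises live: the fields you need for linkage that only exist in some of the sources.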
FedRAMP authorization means a cloud service provider has met a defined set of security controls. It does not mean the service is appropriate for your use case, your data classification level, or your agency's specific risk posture. We see agencies short-circuit their own vendor evaluation because a vendor has a FedRAMP Moderate authorization, when the actual workload involves data that should be handled at High, or when the authorized boundary does not include the specific AI service the agency wants to use.
The FedRAMP marketplace is maturing for AI services, but it is still incomplete. Several of the most capable commercial AI services remain under review or authorized only at Moderate. Agencies with FedRAMP High or Sensitive Compartmented Information requirements need to plan for either on-premises or air-gapped deployment, or accept that the timeline for commercial cloud authorization will drive their schedule. Neither of those is a bad answer. The mistake is assuming FedRAMP Moderate is a blanket approval to proceed.
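A useful discipline is to write the boundary check down before the vendor conversation starts, even informally. A sketch of that check follows, with generic impact levels and made-up service names purely for illustration:

```python
from enum import IntEnum

class ImpactLevel(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3

def authorization_findings(workload_level: ImpactLevel,
                           service_level: ImpactLevel,
                           service_boundary: set[str],
                           required_services: set[str]) -> list[str]:
    """Return the reasons a cloud authorization does NOT cover a workload.

    An empty list means no obvious mismatch; it does not mean the service is
    appropriate for the use case or the agency's risk posture.
    """
    findings = []
    if service_level < workload_level:
        findings.append(
            f"Workload requires {workload_level.name}; service authorized at {service_level.name}."
        )
    missing = required_services - service_boundary
    if missing:
        findings.append(f"Outside authorized boundary: {', '.join(sorted(missing))}.")
    return findings

# Example: a Moderate authorization whose boundary omits the AI service the agency wants.
print(authorization_findings(
    workload_level=ImpactLevel.HIGH,
    service_level=ImpactLevel.MODERATE,
    service_boundary={"object-storage", "compute"},
    required_services={"compute", "managed-llm-endpoint"},
))
```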
The third failure mode is structural. An agency stands up an AI center of excellence at the enterprise level, but the actual AI deployments are happening inside program offices with their own IT shops, their own data systems, and their own interpretations of what the CAIO memo requires of them. The center of excellence publishes guidance. Program offices ignore it or work around it, not out of malice but because they have delivery pressure and the governance process adds time without adding value in their specific context.
The result is an inventory of AI use cases that does not match reality, risk reviews conducted after the fact, and a CAIO who is nominally accountable for systems they were not consulted on. That is the environment that produces the kind of AI incidents that trigger congressional scrutiny.
The agencies making real progress are not distinguished by having larger budgets or more permissive acquisition vehicles, though both help. They share three operational characteristics that are replicable.
They invested in data infrastructure before the AI mandate arrived. Cloud migration, data lake consolidation, and metadata management work that happened in 2022 and 2023 is now paying off. Agencies that treated those programs as modernization for its own sake, not as AI preparation, are finding that the boring infrastructure work is the actual AI readiness investment. The ones that deferred it are now trying to do two things at once.
They have a functioning CAIO with cross-agency authority. Not a coordinator, not a liaison. An officer with the standing to pause a deployment and require remediation before it proceeds. That kind of authority only works if leadership has explicitly signaled it, and if the governance process is fast enough not to become a bottleneck. The most effective CAIO offices we have seen run a lightweight, tiered review: low-impact use cases get a fast-track review in two weeks, high-impact ones get a full panel, and the criteria are published and consistent.
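The routing logic itself is simple; the value is in publishing it. Here is a sketch of the tiered review just described, where the two-week fast track comes from the example above and the full-panel target and artifact lists are illustrative assumptions, not a prescribed checklist:

```python
from dataclasses import dataclass

@dataclass
class ReviewRoute:
    track: str               # "fast-track" or "full-panel"
    target_days: int         # published service-level target for the review
    required_artifacts: tuple

def route_review(rights_impacting: bool, safety_impacting: bool) -> ReviewRoute:
    """Route a use case into the tiered review, based on the published criteria."""
    if rights_impacting or safety_impacting:
        return ReviewRoute(
            track="full-panel",
            target_days=45,  # assumed target; publish whatever your CAIO office commits to
            required_artifacts=("data provenance", "model card", "impact assessment",
                                "human oversight plan", "rollback plan"),
        )
    return ReviewRoute(
        track="fast-track",
        target_days=14,      # the two-week fast track for low-impact use cases
        required_artifacts=("use case description", "data sources", "point of contact"),
    )
```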
They treat AI as an engineering problem, not a procurement problem. The agencies that are moving are the ones where the CAIO office and the CIO office are working the same problem from both ends: governance from one side, infrastructure from the other. The agencies that are stuck are the ones where AI is primarily a contracting conversation, where the question is which vendor to award to rather than what the agency needs to build or fix to make any vendor successful.
"An ATO on a broken data pipeline is a compliance artifact. It is not a working AI system."From the federal practice
When we come into a federal AI engagement, we use a five-horizon model called Aizen, developed through our Otonmi AI division, to sequence the work. The model is designed to prevent what we call pilot purgatory: the state in which an agency has demonstrated AI capability but cannot move it to production because the surrounding infrastructure was never built.
The five horizons are not phases with hard gates. They are parallel tracks with dependencies. The recommendations that follow reflect how those horizons map to the federal context.
If you are reading this from inside a federal agency, a specific sequence of actions is more useful than a framework. Here is what we would recommend given where most agencies are in their AI journey right now.
Audit your existing AI deployments before submitting your use case inventory. OMB requires inventories to include high-impact use cases. Many agencies have systems that meet the definition of high-impact AI that were deployed before the current governance requirements existed. Finding those systems and bringing them under governance is harder but more important than identifying new use cases. Undocumented deployments are a liability, not an asset.
Run a data readiness assessment on your top three candidate use cases. Six weeks, real data, with your data stewards and legal counsel in the room from the start. The output should be a clear statement of what exists, what is missing, what requires remediation, and what the timeline and cost of remediation look like. That document is what separates an AI strategy from an AI aspiration.
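If it helps to see the shape of that output, here is one way to structure the findings so they roll up into a remediation timeline. The fields are illustrative, not a prescribed template:

```python
from dataclasses import dataclass, field

@dataclass
class ReadinessFinding:
    data_asset: str                                   # the source or dataset assessed
    exists: bool                                      # available and accessible for this use case?
    gaps: list[str] = field(default_factory=list)     # what is missing: linkage, labels, quality
    remediation: str = ""                             # what it takes to close the gap
    est_weeks: int = 0                                # remediation timeline
    est_cost: int = 0                                 # remediation cost, in whatever unit your CFO uses

@dataclass
class ReadinessReport:
    use_case: str
    findings: list[ReadinessFinding] = field(default_factory=list)

    def remediation_weeks(self) -> int:
        # Conservative roll-up: assume remediation is serialized unless you know otherwise.
        return sum(f.est_weeks for f in self.findings if not f.exists or f.gaps)

    def blockers(self) -> list[str]:
        # Assets that simply do not exist yet are strategy problems, not engineering tasks.
        return [f.data_asset for f in self.findings if not f.exists]
```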
Map your AI infrastructure needs against your zero-trust implementation status. If you are behind on ZTA, your AI program will be slower than you expect, because the data instrumentation required for both programs overlaps significantly. Understanding the intersection lets you sequence the work to avoid doing it twice.
Establish the CAIO governance process before announcing use cases publicly. The governance process needs to be operational before it is tested. Announcing use cases and then building the review process creates the wrong kind of pressure. Agencies that have gotten this right stood up the process quietly, ran a few use cases through it to calibrate the criteria, and then opened it to program offices with a working model to follow.
The federal AI moment is real. The policy pressure is real. The opportunity to use AI to improve service delivery, reduce administrative burden, and make better decisions with complex data is also real. None of that changes the fact that the work is hard, and that most of the hard work is not about AI. It is about the data infrastructure, the governance process, and the organizational clarity that AI requires to function at scale.
The agencies that will be ahead in two years are the ones doing that unglamorous work right now, while everyone else is still debating which model to buy.
Bring the problem. We'll come back with a written brief: what to build, what to defer, and where AI actually moves the number. No deck pitches.