Every federal CIO I have spoken with in the past eighteen months has a version of the same story. Leadership wants an AI strategy. OMB wants an AI use case inventory. The agency head wants a demo at the next all-hands. And somewhere in the middle of all of it, the data engineering team is staring at a pipeline that was designed a decade ago and is now supposed to power a large language model.
The pressure is real. Executive Order 14110 on Safe, Secure, and Trustworthy AI directed civilian agencies to designate Chief AI Officers, publish inventories of high-impact AI use cases, and begin assessing their AI maturity. OMB M-24-10 added accountability structures and risk management requirements on top of that. The CISA AI roadmap pushed agencies toward active threat modeling of AI systems. Each layer of policy is legitimate on its own. Stacked, they create a mandate environment that agencies are trying to outrun with infrastructure that was never designed for the race.
What follows is not a policy summary. There are plenty of those. This is a note on what we actually see on the ground when we walk into federal AI engagements, the patterns that predict failure, and the sequencing that works.
"The agencies making real progress on AI are not the ones with the biggest AI budgets. They are the ones that spent the previous two years fixing their data."Gaurav Arora, Founder & CEO
Let us ground this quickly before going further. Three documents shape most of what agencies are being asked to do right now.
OMB M-24-10 (Advancing Governance, Innovation, and Risk Management for Agency Use of Artificial Intelligence) requires agencies to designate a Chief AI Officer with real authority, not a ceremonial title bolted onto an existing role. It requires risk management practices for high-impact AI, minimum practices for rights-impacting and safety-impacting systems, and annual reporting on use case inventories. The accountability structure is meaningful: CAIOs are responsible to agency heads, and non-compliance carries the kind of visibility that tends to show up in FITARA scorecards and congressional inquiries.
EO 14110 follow-on actions created specific cross-agency requirements around AI workforce development, procurement standards for AI vendors, and interagency data sharing for AI safety evaluation. The AI Safety Institute at NIST is now a real counterpart for agencies trying to evaluate frontier model risk. Procurement is getting harder, not easier, which means agencies that want to move fast on AI need FedRAMP-authorized or equivalent paths, not experimental vendor relationships.
Zero-trust architecture requirements under M-22-09 intersect with AI in ways that are not always obvious. AI systems consume identity signals, network telemetry, and device posture data to function effectively. An agency that has not completed its zero-trust data pillar is missing the instrumentation that makes behavioral AI useful. The two programs, ZTA and AI, are not sequential. They are parallel, and agencies that treat them as separate work streams end up doing redundant infrastructure work at significant cost.
Chief AI Officers must have the authority to halt high-impact AI deployments that fail risk review. That authority means little without a functioning governance process, which requires data lineage, audit logging, and a model registry. Very few agencies have all three.
Program offices submitting use case inventories need to classify each use case against the rights-impacting and safety-impacting thresholds in the OMB memo. Classification requires documentation of training data provenance, model version, deployment context, and human oversight mechanism. For most legacy AI deployments, that documentation simply does not exist.
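To make that concrete, here is a minimal sketch of what a use case inventory record could look like if the documentation were captured as structured data. The field names are ours, not OMB's schema, and the classification logic is deliberately simplified; the point is that classification is a function of documentation that has to exist first.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AIUseCaseRecord:
    """Illustrative inventory record; field names are ours, not OMB's schema."""
    use_case_id: str
    system_name: str
    model_version: str
    training_data_sources: List[str]   # provenance: where the training data came from
    deployment_context: str            # where and how model output is used in the mission
    human_oversight: str               # e.g. "analyst reviews every determination"
    rights_impacting: bool             # affects rights, benefits, or access for individuals
    safety_impacting: bool             # affects physical safety or critical infrastructure

    def is_high_impact(self) -> bool:
        # Either threshold triggers the full risk review and minimum practices.
        return self.rights_impacting or self.safety_impacting

    def documentation_gaps(self) -> List[str]:
        # A record that cannot answer these questions cannot be classified at all,
        # which is the real state of most legacy deployments.
        gaps = []
        if not self.training_data_sources:
            gaps.append("training data provenance")
        if not self.model_version:
            gaps.append("model version")
        if not self.human_oversight:
            gaps.append("human oversight mechanism")
        return gaps
```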
Across the federal AI engagements we have run, the failures cluster around the same three patterns. Naming them is useful because they are preventable, but only if you recognize them early.
The most common failure mode is also the most avoidable. An agency runs a proof of concept with a commercial large language model, gets impressive demo results, and kicks off a production authorization effort, only to discover three months in that the underlying data is inconsistent, unlinked across systems, or governed by legal restrictions that make it unusable for model training or inference in production.
The demo works because demos are staged. The model is pointed at a curated sample of clean documents. Production does not look like that. Production looks like seventeen authoritative data sources with inconsistent schemas, PII that has not been scanned or redacted, retention policies that have not been adjudicated for AI use, and access controls that were designed for human users, not service accounts running inference workloads.
The fix is not to slow down the model evaluation. It is to run the data readiness assessment in parallel, before you commit engineering capacity to the authorization path. Six weeks of data profiling is cheaper than six months of an authorization effort that dies when the data layer fails review.
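What does six weeks of data profiling actually produce? At minimum, a per-source profile and a cross-source comparison. Here is a minimal sketch, assuming the sources can be loaded into pandas dataframes and using crude regex patterns as a stand-in for a real PII scanner and your privacy office's definitions:

```python
import re
import pandas as pd

# Crude patterns for common PII; a real assessment would use a proper scanner,
# but even this is enough to flag candidate columns for review.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def profile_source(name: str, df: pd.DataFrame) -> dict:
    """Profile one data source: schema, null rates, and columns that look like PII."""
    pii_columns = []
    for col in df.select_dtypes(include="object").columns:
        sample = df[col].dropna().astype(str).head(500)
        if any(p.search(v) for v in sample for p in PII_PATTERNS.values()):
            pii_columns.append(col)
    return {
        "source": name,
        "columns": {c: str(t) for c, t in df.dtypes.items()},
        "null_rate": df.isna().mean().round(3).to_dict(),
        "pii_candidates": pii_columns,
    }

def schema_drift(profiles: list[dict]) -> set[str]:
    """Columns that do not appear in every source, i.e. the likely linkage problems."""
    all_cols = [set(p["columns"]) for p in profiles]
    return set.union(*all_cols) - set.intersection(*all_cols)
```

The schema_drift output is usually where the unpleasant surprises live: the fields you need for linkage that only exist in some of the sources.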
FedRAMP authorization means a cloud service provider has met a defined set of security controls. It does not mean the service is appropriate for your use case, your data classification level, or your agency's specific risk posture. We see agencies short-circuit their own vendor evaluation because a vendor has a FedRAMP Moderate authorization, when the actual workload involves data that should be handled at High, or when the authorized boundary does not include the specific AI service the agency wants to use.
The FedRAMP marketplace is maturing for AI services, but it is still incomplete. Several of the most capable commercial AI services remain under review or authorized only at Moderate. Agencies with FedRAMP High or Sensitive Compartmented Information requirements need to plan for either on-premises or air-gapped deployment, or accept that the timeline for commercial cloud authorization will drive their schedule. Neither of those is a bad answer. The mistake is assuming FedRAMP Moderate is a blanket approval to proceed.
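A useful discipline is to write the boundary check down before the vendor conversation starts, even informally. A sketch of that check follows, with generic impact levels and made-up service names purely for illustration:

```python
from enum import IntEnum

class ImpactLevel(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3

def authorization_findings(workload_level: ImpactLevel,
                           service_level: ImpactLevel,
                           service_boundary: set[str],
                           required_services: set[str]) -> list[str]:
    """Return the reasons a cloud authorization does NOT cover a workload.

    An empty list means no obvious mismatch; it does not mean the service is
    appropriate for the use case or the agency's risk posture.
    """
    findings = []
    if service_level < workload_level:
        findings.append(
            f"Workload requires {workload_level.name}; service authorized at {service_level.name}."
        )
    missing = required_services - service_boundary
    if missing:
        findings.append(f"Outside authorized boundary: {', '.join(sorted(missing))}.")
    return findings

# Example: a Moderate authorization whose boundary omits the AI service the agency wants.
print(authorization_findings(
    workload_level=ImpactLevel.HIGH,
    service_level=ImpactLevel.MODERATE,
    service_boundary={"object-storage", "compute"},
    required_services={"compute", "managed-llm-endpoint"},
))
```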
The third failure mode is structural. An agency stands up an AI center of excellence at the enterprise level, but the actual AI deployments are happening inside program offices with their own IT shops, their own data systems, and their own interpretations of what the CAIO memo requires of them. The center of excellence publishes guidance. Program offices ignore it or work around it, not out of malice but because they have delivery pressure and the governance process adds time without adding value in their specific context.
The result is an inventory of AI use cases that does not match reality, risk reviews conducted after the fact, and a CAIO who is nominally accountable for systems they were not consulted on. That is the environment that produces the kind of AI incidents that trigger congressional scrutiny.
The agencies making real progress are not distinguished by having larger budgets or more permissive acquisition vehicles, though both help. They share three operational characteristics that are replicable.
They invested in data infrastructure before the AI mandate arrived. Cloud migration, data lake consolidation, and metadata management work that happened in 2022 and 2023 is now paying off. Agencies that treated those programs as modernization for its own sake, not as AI preparation, are finding that the boring infrastructure work is the actual AI readiness investment. The ones that deferred it are now trying to do two things at once.
They have a functioning CAIO with cross-agency authority. Not a coordinator, not a liaison. An officer with the standing to pause a deployment and require remediation before it proceeds. That kind of authority only works if leadership has explicitly signaled it, and if the governance process is fast enough not to become a bottleneck. The most effective CAIO offices we have seen run a lightweight, tiered review: low-impact use cases get a fast-track review in two weeks, high-impact ones get a full panel, and the criteria are published and consistent.
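The routing logic itself is simple; the value is in publishing it. Here is a sketch of the tiered review just described, where the two-week fast track comes from the example above and the full-panel target and artifact lists are illustrative assumptions, not a prescribed checklist:

```python
from dataclasses import dataclass

@dataclass
class ReviewRoute:
    track: str               # "fast-track" or "full-panel"
    target_days: int         # published service-level target for the review
    required_artifacts: tuple

def route_review(rights_impacting: bool, safety_impacting: bool) -> ReviewRoute:
    """Route a use case into the tiered review, based on the published criteria."""
    if rights_impacting or safety_impacting:
        return ReviewRoute(
            track="full-panel",
            target_days=45,  # assumed target; publish whatever your CAIO office commits to
            required_artifacts=("data provenance", "model card", "impact assessment",
                                "human oversight plan", "rollback plan"),
        )
    return ReviewRoute(
        track="fast-track",
        target_days=14,      # the two-week fast track for low-impact use cases
        required_artifacts=("use case description", "data sources", "point of contact"),
    )
```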
They treat AI as an engineering problem, not a procurement problem. The agencies that are moving are the ones where the CAIO office and the CIO office are working the same problem from both ends: governance from one side, infrastructure from the other. The agencies that are stuck are the ones where AI is primarily a contracting conversation, where the question is which vendor to award to rather than what the agency needs to build or fix to make any vendor successful.
"An ATO on a broken data pipeline is a compliance artifact. It is not a working AI system."From the federal practice
When we come into a federal AI engagement, we use a five-horizon model called Aizen, developed through our Otonmi AI division, to sequence the work. The model is designed to prevent what we call pilot purgatory: the state in which an agency has demonstrated AI capability but cannot move it to production because the surrounding infrastructure was never built.
The five horizons are not phases with hard gates. They are parallel tracks with dependencies. The recommendations that follow reflect how those horizons map to the federal context.
If you are reading this from inside a federal agency, a specific sequence of actions is more useful than a framework. Here is what we would recommend given where most agencies are in their AI journey right now.
Audit your existing AI deployments before submitting your use case inventory. OMB requires inventories to include high-impact use cases. Many agencies have systems that meet the definition of high-impact AI that were deployed before the current governance requirements existed. Finding those systems and bringing them under governance is harder but more important than identifying new use cases. Undocumented deployments are a liability, not an asset.
Run a data readiness assessment on your top three candidate use cases. Six weeks, real data, with your data stewards and legal counsel in the room from the start. The output should be a clear statement of what exists, what is missing, what requires remediation, and what the timeline and cost of remediation look like. That document is what separates an AI strategy from an AI aspiration.
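If it helps to see the shape of that output, here is one way to structure the findings so they roll up into a remediation timeline. The fields are illustrative, not a prescribed template:

```python
from dataclasses import dataclass, field

@dataclass
class ReadinessFinding:
    data_asset: str                                   # the source or dataset assessed
    exists: bool                                      # available and accessible for this use case?
    gaps: list[str] = field(default_factory=list)     # what is missing: linkage, labels, quality
    remediation: str = ""                             # what it takes to close the gap
    est_weeks: int = 0                                # remediation timeline
    est_cost: int = 0                                 # remediation cost, in whatever unit your CFO uses

@dataclass
class ReadinessReport:
    use_case: str
    findings: list[ReadinessFinding] = field(default_factory=list)

    def remediation_weeks(self) -> int:
        # Conservative roll-up: assume remediation is serialized unless you know otherwise.
        return sum(f.est_weeks for f in self.findings if not f.exists or f.gaps)

    def blockers(self) -> list[str]:
        # Assets that simply do not exist yet are strategy problems, not engineering tasks.
        return [f.data_asset for f in self.findings if not f.exists]
```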
Map your AI infrastructure needs against your zero-trust implementation status. If you are behind on ZTA, your AI program will be slower than you expect, because the data instrumentation required for both programs overlaps significantly. Understanding the intersection lets you sequence the work to avoid doing it twice.
Establish the CAIO governance process before announcing use cases publicly. The governance process needs to be operational before it is tested. Announcing use cases and then building the review process creates the wrong kind of pressure. Agencies that have gotten this right stood up the process quietly, ran a few use cases through it to calibrate the criteria, and then opened it to program offices with a working model to follow.
The federal AI moment is real. The policy pressure is real. The opportunity to use AI to improve service delivery, reduce administrative burden, and make better decisions with complex data is also real. None of that changes the fact that the work is hard, and that most of the hard work is not about AI. It is about the data infrastructure, the governance process, and the organizational clarity that AI requires to function at scale.
The agencies that will be ahead in two years are the ones doing that unglamorous work right now, while everyone else is still debating which model to buy.
Bring the problem. We'll come back with a written brief: what to build, what to defer, and where AI actually moves the number. No deck pitches.