Agentic AI Is Changing What the Government Is Buying
Federal agencies are beginning to consider AI systems that can act, not only advise. That shift makes authority, control, evidence, and accountability central acquisition questions.
Salesforce executive Paul Tatum recently predicted that government could become the largest user of agentic artificial intelligence. The claim is notable, but it is neither independently established nor currently measurable. There is no agreed method for comparing agent adoption across industries, and the public record does not provide a complete inventory of federal agentic AI deployments.
Still, the prediction is a useful market signal.
The underlying capability shift is more than incremental. Federal agencies are considering systems that can do more than retrieve information, summarize records, or generate draft content. Agentic systems may interpret objectives, plan multiple steps, access government data, invoke software tools, and execute portions of government workflows with direct operational effects.
That changes the procurement question.
The issue is no longer only whether an artificial intelligence system can produce a useful answer. It is increasingly what the system is permitted to do, how the agency will control it, how its actions can be reconstructed, and who will be responsible when those actions affect government operations.
Agentic AI is not yet a settled federal procurement category. Most publicly disclosed federal activity remains developmental, limited, or embedded within broader software and modernization initiatives. Near-term deployments are also more likely to be bounded and human-supervised than fully autonomous.
Even with those qualifications, the acquisition implications are significant. Agentic AI is beginning to change what the government is buying.
Agentic AI Is Not Simply the Next Chatbot
A conventional chatbot or generative AI assistant typically responds to a human request by producing information or content. It may answer a question, summarize a document, search a knowledge base, or prepare a draft for review.
Agentic AI can go further. An agentic system may receive an objective, break it into steps, retrieve relevant information, select among available tools, execute an action, evaluate the result, and determine what should happen next. Depending on its design, it might also request human approval, escalate an exception, or coordinate with other agents.
Federal cybersecurity guidance titled “Careful Adoption of Agentic AI Services,” developed by the Cybersecurity and Infrastructure Security Agency, the National Security Agency, and U.S. and international partners, describes agentic systems as combining AI models with tools, data sources, memory, planning, and execution capabilities.
This distinguishes agentic AI from traditional automation, which generally follows predefined rules in structured environments and does not independently interpret an underspecified objective.
That does not mean federal agencies are preparing to give AI systems unrestricted authority. The more plausible federal model is bounded, permissioned, task-specific, monitored, and subject to defined human approval.
An agent might be allowed to retrieve records, prepare a case summary, route an inquiry, initiate a routine service request, or recommend the next step in a process. The same agent might be barred from making a final eligibility determination, approving a payment, altering an official record, or communicating externally without human authorization.
The relevant acquisition distinction is therefore not simply whether a product uses a large language model. It is whether model output can be translated into action affecting a government workflow, record, system, employee, contractor, applicant, beneficiary, or member of the public.
Once AI can act, the procurement consequences become more demanding.
The Federal Market Is Forming Before the Category Is Settled
Agentic AI is entering the federal market, but generally not through procurements labeled exclusively as “agentic AI.”
The capability is more likely to appear within enterprise software, cloud platforms, customer relationship management systems, information technology service management, contact-center modernization, case management, intelligent automation, data platforms, systems integration, and managed services.
The General Services Administration’s OneGov agreements provide an early illustration. The agency’s agreement with ServiceNow makes advanced agentic capabilities and AI agents available through a simplified licensing model for eligible federal customers. The Gemini for Government agreement with Google similarly includes agentic capabilities, prepackaged AI agents, and tools that allow federal employees to create their own agents within a broader AI and cloud offering.
These agreements matter because they create acquisition pathways and make agentic capabilities available to federal buyers. They do not, by themselves, establish that agencies have deployed those capabilities in production or achieved the projected operational benefits.
That distinction is important. Market availability is not the same as agency adoption. A contract award is not proof of successful performance. An entry in an agency AI inventory does not necessarily establish operational deployment. A vendor demonstration does not show that the technology will perform reliably in a federal production environment.
The public record remains uneven. Agencies disclose AI activity through inventories, acquisition announcements, strategy documents, and pilot descriptions, but those materials do not always identify the contractor, architecture, autonomy level, procurement vehicle, operational scale, or measured result. The General Services Administration’s 2025 AI use-case inventory, for example, distinguishes among predeployment, pilot, and deployed systems, illustrating how widely maturity can vary even within one agency.
At the same time, federal standards and security organizations are beginning to treat AI agents as a distinct operational concern.
In February 2026, the National Institute of Standards and Technology (NIST) launched its AI Agent Standards Initiative. The initiative focuses on industry-led standards, open-source protocol development, and research into agent security and identity. The National Cybersecurity Center of Excellence is separately examining software and AI agent identity and authorization, including how organizations should manage the identities, permissions, and access of software and AI agents.
These developments establish that the federal government is preparing for agentic AI as an operational technology. They also show that the underlying standards remain immature.
The government is creating purchasing channels and exploring use cases before it has developed a uniform acquisition definition, standard clause set, mature pricing model, or consistent operating framework for agentic AI.
Agentic AI Changes the Object of Acquisition
An agency acquiring an agentic capability may appear to be purchasing a software product. In practice, it may be acquiring an interconnected system capable of taking actions with operational consequences.
That system can include foundation and specialized models, agency data, retrieval systems, business rules, workflow engines, application programming interfaces, external tools, connectors, identity controls, human approval points, audit logs, monitoring services, cloud infrastructure, contractor personnel, and third-party dependencies.
The performance of the system depends on how those components interact.
A model might generate an accurate recommendation while a connector retrieves the wrong record. An agent might interpret the task correctly but receive excessive permissions. A workflow might function in routine conditions but fail to escalate an exception. A platform update could change system behavior. A third-party service could become unavailable. A consumption-based pricing structure could produce higher-than-expected costs. Inadequate logging could leave the agency unable to determine how an incorrect action occurred.
The government is therefore not merely purchasing intelligence. It may be purchasing the controlled ability to act.
That distinction affects the entire acquisition lifecycle.
Requirements must define not only the outcome the agency wants, but also the authority the system will exercise. Market research must evaluate architecture, permissions, tool access, portability, monitoring, pricing, and transition risk. Source selection must assess operational controls as well as technical capability. Acceptance testing must evaluate the system under realistic conditions. Contract terms must address data rights, configuration changes, system dependencies, performance metrics, and government access to operational evidence.
Existing acquisition authorities provide much of the necessary foundation. The challenge is applying them to a system that is more dynamic, interconnected, and operationally consequential than a conventional software license.
Federal Policy Already Points in This Direction
The federal government does not lack an AI acquisition policy foundation.
Office of Management and Budget Memorandum M-25-22, issued in April 2025, directs agencies to improve the acquisition of AI systems and services. It emphasizes cross-functional engagement, broad market research, testing in scenarios that reflect intended operating conditions, performance-based acquisition, data and intellectual-property planning, vendor-lock-in protections, portability, interoperability, pricing transparency, and ongoing performance monitoring.
The memorandum is not specific to agentic AI, but its direction becomes more consequential when an acquired system can act. Testing in the intended operating environment matters more when a system has access to agency tools and data. Portability matters more when workflows and operational history are embedded in a proprietary platform. Performance monitoring matters more when system behavior may change during contract performance.
Office of Management and Budget Memorandum M-25-21 provides the broader governance framework for federal AI use. It establishes minimum risk-management practices for high-impact AI, including predeployment testing, AI impact assessments, ongoing monitoring, and appropriate human oversight, intervention, and accountability.
Not every agentic use case will qualify as high-impact AI. When an AI system serves as a principal basis for a decision or action with significant effects on rights or safety, however, those requirements may become directly relevant.
NIST’s work on agent standards, identity, authorization, security, and interoperability further illustrates where implementation questions remain unsettled. Federal cybersecurity guidance recommends restricted access, low-risk initial use cases, continuous monitoring, and lifecycle controls.
The policy foundation is not absent. The harder task is translating broad AI policy into acquisition requirements, evaluation methods, contract terms, and surveillance practices for systems capable of action. The buyer-confidence framework below offers one way to begin that translation.
A Federal Agentic AI Buyer-Confidence Framework
What would an agency need to believe before trusting an agentic system in production?
The following eight-part framework is not an official government standard. It is an analytical structure derived from federal AI acquisition policy, cybersecurity guidance, standards activity, and procurement practice.
1. Mission Fit
Is an agentic system appropriate for the task, or would a simpler and more predictable form of automation work?
The agency should consider the operational need, process maturity, data readiness, risk level, anticipated mission value, and availability of less complex alternatives. For citizen-facing or consequential uses, the analysis should also address legal authority, privacy, records management, accessibility, civil rights, due process, and the agency’s capacity to supervise the system.
2. Authority
What may the agent do?
The agency must define which systems the agent may access, which data it may use, which tools it may invoke, which records it may create or modify, which communications it may initiate, and which actions require human approval. Prohibited actions should be as clear as permitted ones.
Authority should be an explicit system and contract design question, not an assumption embedded in technical configuration.
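One way to make authority explicit rather than implied is to express it as a reviewable permission matrix. The sketch below is illustrative only: the action names, the three decision levels, and the authorize function are hypothetical and are not drawn from any federal standard or vendor product. It assumes a default-deny posture, so any action the agency has not listed is refused.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"                        # agent may act on its own
    REQUIRE_APPROVAL = "require_approval"  # a human must approve first
    DENY = "deny"                          # prohibited regardless of approval


# Hypothetical authority matrix for a case-management agent.
AUTHORITY = {
    "retrieve_record": Decision.ALLOW,
    "prepare_case_summary": Decision.ALLOW,
    "route_inquiry": Decision.ALLOW,
    "send_external_message": Decision.REQUIRE_APPROVAL,
    "alter_official_record": Decision.REQUIRE_APPROVAL,
    "approve_payment": Decision.DENY,
    "make_eligibility_determination": Decision.DENY,
}


def authorize(action: str, human_approved: bool = False) -> bool:
    """Return True only if the matrix permits the action.

    Actions missing from the matrix are denied by default.
    """
    decision = AUTHORITY.get(action, Decision.DENY)
    if decision is Decision.ALLOW:
        return True
    if decision is Decision.REQUIRE_APPROVAL:
        return human_approved
    return False
```

Under these assumptions, a prohibited action stays prohibited even when a human approves it, and an unlisted action is treated the same way. A matrix of this kind can be attached to a requirement or reviewed during evaluation, independent of how the vendor enforces it technically.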
3. Evidence
What can the contractor demonstrate before the system operates in a federal environment?
Relevant evidence may include test results, deployment history, failure analysis, red-team findings, security assessments, governance documentation, performance data, and known limitations. Claims such as “enterprise ready,” “autonomous,” or “production proven” have limited procurement value unless the contractor can define and demonstrate them under conditions relevant to the agency.
These first three dimensions address the threshold decision: whether the use case is appropriate, whether the system’s authority is properly bounded, and whether the contractor has supplied enough evidence to justify moving forward.
4. Control
Can authorized humans supervise, constrain, interrupt, correct, override, or shut down the system?
Depending on the use case, effective control may require approval thresholds, escalation procedures, override mechanisms, emergency suspension, manual fallback, and rollback capability. Human oversight should be designed into specific decision and action points rather than offered as a general assurance.
5. Traceability
Can the agency reconstruct what happened?
That may require records of tool calls, data sources, agent actions, human approvals, prompt and policy versions, configuration changes, errors, and escalations. Without sufficient traceability, the agency may be unable to investigate an incident, validate performance, preserve federal records, respond to oversight, or determine responsibility.
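A minimal sketch of what a reconstructable record could look like follows. The AgentEvent fields, the event types, and the function names are assumptions chosen to mirror the items listed above, not a prescribed federal logging schema. The sketch also assumes timestamps are ISO 8601 strings in a single time zone, so that sorting them as text orders them in time.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentEvent:
    timestamp: str    # ISO 8601, single time zone (assumption)
    run_id: str       # groups all events for one agent task
    event_type: str   # e.g. "tool_call", "human_approval", "config_change"
    actor: str        # agent identity or human user
    detail: str       # what was done or decided
    data_sources: tuple = ()
    prompt_version: Optional[str] = None
    policy_version: Optional[str] = None


def to_log_line(event: AgentEvent) -> str:
    """Serialize one event as a single JSON line."""
    return json.dumps(asdict(event), sort_keys=True)


def reconstruct(log_lines, run_id: str) -> list:
    """Return the events for one run, ordered by timestamp."""
    events = [json.loads(line) for line in log_lines]
    matching = [e for e in events if e["run_id"] == run_id]
    return sorted(matching, key=lambda e: e["timestamp"])
```

The design point is that each record carries enough context, including the prompt and policy versions in force, to explain an action after the fact without relying on the vendor's memory of how the system was configured.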
6. Security and Resilience
Can the system resist manipulation, limit the effects of compromise, and fail safely?
The security boundary extends beyond the model to credentials, permissions, external content, connectors, tools, memory, retrieval systems, code execution, and downstream actions. Agencies also need to understand what happens when a model, tool, data source, connector, or external service fails, and whether the system can degrade safely while preserving mission continuity.
These operational dimensions determine whether the agency can remain in control after deployment, observe what the system is doing, and respond effectively when something goes wrong.
7. Performance and Cost
Does the system produce measurable mission value under real operating conditions?
Relevant measures may include completion rate, error rate, escalation rate, rework, processing time, human-review burden, user experience, cost per completed transaction, and mission outcome.
Cost must be evaluated with performance. A system that appears inexpensive per action may become costly if it enters loops, retries failed tasks, generates excessive escalations, requires substantial human review, or depends on multiple consumption-priced services.
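The arithmetic can be sketched as follows. The UsagePeriod fields and the figures in the usage note are hypothetical and chosen only to show the calculation. The sketch assumes that every metered action is billed, including actions from failed attempts and retries, and that human review has a known cost per review. Amounts are kept in whole cents to avoid rounding surprises.

```python
from dataclasses import dataclass


@dataclass
class UsagePeriod:
    completed: int               # transactions that finished successfully
    billable_actions: int        # every metered action, including failed ones
    price_per_action_cents: int  # platform price per metered action
    human_reviews: int           # reviews performed during the period
    cost_per_review_cents: int   # loaded labor cost of one review


def cost_per_completed_cents(period: UsagePeriod) -> float:
    """Total platform and review cost divided by completed transactions."""
    if period.completed == 0:
        raise ValueError("no completed transactions in period")
    platform = period.billable_actions * period.price_per_action_cents
    review = period.human_reviews * period.cost_per_review_cents
    return (platform + review) / period.completed
```

With illustrative inputs of 6,000 billable actions at 5 cents each, 200 human reviews at 400 cents each, and 800 completed transactions, the result is 137.5 cents per completed transaction, far above the 5-cent price of a single action. The gap comes from retries, failed attempts, and review labor, which is why a per-action price alone says little about the cost of a finished transaction.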
8. Portability and Accountability
Can the government preserve its operational history, move away from the provider, and identify who is responsible?
The agency may need access to data, logs, workflow definitions, configurations, performance histories, prompts, policies, and audit records, along with usable export mechanisms and transition support.
Accountability must extend across the delivery chain. The prime contractor, model provider, platform provider, cloud provider, and subcontractors may each control different elements of system behavior. Their responsibilities should be visible before an incident or performance failure occurs.
Together, these dimensions move the acquisition discussion beyond whether an agent can complete a demonstration. They ask whether the agency can govern the capability throughout its operational and contractual life.
Federal AI Contracting Must Define Action
The first contracting challenge is requirements development.
A requirement to “provide an AI agent to improve case management” does not define the system’s authority, boundaries, or acceptable behavior. The agency may need to specify permitted objectives, authorized and prohibited actions, data boundaries, approval points, escalation conditions, logging requirements, and stop conditions.
The requirement should also distinguish between advice and execution. A system that recommends a case disposition presents a different risk profile from one that updates an official record, initiates correspondence, or triggers a payment workflow.
Vendor demonstrations will not resolve these questions. An agentic system may perform well in a scripted environment while failing when confronted with incomplete data, conflicting instructions, malicious content, revoked permissions, unavailable tools, or unpredictable user behavior. Evaluation and acceptance testing should therefore examine how the system behaves under abnormal and adversarial conditions, not only whether it completes an ideal task.
Pricing presents a separate challenge. Commercial agent platforms may charge by user, agent, action, conversation, transaction, token, credit, or compute usage. Federal buyers will need to define what constitutes a billable action, whether failed actions and retries count, how testing is treated, how loops are controlled, when usage alerts are triggered, and whether model substitutions change price.
Agencies must also account for the costs surrounding the agent itself, including integration, data preparation, cybersecurity, governance, monitoring, human review, and transition.
Change control may become continuous. Agent behavior can be affected by a new model, revised prompt, different retrieval source, additional connector, changed permission, new tool, or platform release. Contracts may need to distinguish routine updates from material changes requiring notice, consent, retesting, security review, updated documentation, or rollback capability.
Data rights and portability also become operational concerns. If an agency cannot retrieve its workflow definitions, logs, configurations, performance records, and operating history, changing vendors may require rebuilding the system while losing evidence needed for oversight.
Post-award monitoring must therefore be designed into the contract. Relevant measures may include completion and error rates, escalation and override rates, unauthorized-action attempts, security incidents, data-access violations, consumption costs, model changes, configuration changes, complaints, and performance drift.
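As a rough illustration, several of these measures can be computed from simple period counts and compared against agreed thresholds. The count names, the threshold values, and the function names below are hypothetical. Actual thresholds would be set in the contract or the quality assurance surveillance plan.

```python
# Hypothetical thresholds: floors for rates that should stay high,
# ceilings for rates that should stay low.
MIN_THRESHOLDS = {"completion_rate": 0.90}
MAX_THRESHOLDS = {
    "error_rate": 0.05,
    "escalation_rate": 0.20,
    "override_rate": 0.10,
}


def compute_rates(counts: dict) -> dict:
    """Convert raw period counts into rates per task started."""
    started = counts["tasks_started"]
    if started == 0:
        raise ValueError("no tasks started in period")
    return {
        "completion_rate": counts["completed"] / started,
        "error_rate": counts["errors"] / started,
        "escalation_rate": counts["escalations"] / started,
        "override_rate": counts["overrides"] / started,
    }


def find_breaches(rates: dict) -> list:
    """Return the names of measures outside their thresholds."""
    flagged = [n for n, floor in MIN_THRESHOLDS.items() if rates[n] < floor]
    flagged += [n for n, cap in MAX_THRESHOLDS.items() if rates[n] > cap]
    return sorted(flagged)
```

A check of this kind does not replace judgment about why a measure moved, but it gives the contracting officer's representative a consistent trigger for asking the question.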
For agentic systems, contract administration may be as important as source selection.
What Contractors Should Do Now
Contractors should begin by defining the capability precisely.
A federal customer should be able to understand what the system does, what it does not do, what it can access, what it can change, when a human must intervene, and what happens when the system is uncertain. “Agentic” should not function as a substitute for an operational description.
Claims should be aligned with evidence. Proposal language, capability statements, demonstrations, and agency conversations should be supported by testing, governance documentation, security controls, performance data, deployment history, and candid disclosure of limitations.
Common overclaims create avoidable risk. A contractor should not describe a pilot as production-proven, claim autonomy without defining the actions involved, promise portability without documenting proprietary dependencies, describe a system as auditable when the government cannot access the necessary records, or claim savings without accounting for integration, monitoring, exception handling, and human review.
Authority and control should be built into the architecture. Contractors should be prepared to produce permission matrices, human-approval workflows, prohibited-action lists, escalation rules, shutdown procedures, and manual alternatives.
Observability should be treated as a procurement feature. Federal buyers may need tool-call records, action histories, source records, configuration histories, change notices, and incident evidence. Contractors that cannot provide this visibility may struggle to establish buyer confidence even when the underlying model performs well.
Portability should be addressed before award. Companies should know what the agency can retrieve and reuse if the contract ends, including data, logs, workflow definitions, configurations, performance history, and agent policies.
Pricing should account for the full operating model. Contractors should explain what drives usage, what constitutes a completed transaction, how failed actions are treated, how consumption is monitored, how human oversight affects economics, and which integration or governance costs sit outside the platform license.
Contractors should also inventory every material dependency, including models, cloud providers, platforms, tools, connectors, data sources, subcontractors, open-source components, and external services. A vendor cannot credibly allocate risk or promise portability without understanding its delivery chain.
Teaming arrangements require particular attention. When agentic capabilities are distributed among a prime contractor, model provider, platform vendor, and subcontractors, the parties should establish which entity controls portability, audit access, system changes, and responsibility for performance.
Overstating agentic capability creates risk across the procurement lifecycle. Evaluation claims that cannot be demonstrated, delivery commitments that the architecture cannot support, and contract failures that damage past performance are all downstream consequences of positioning that outruns the evidence.
Adoption May Be Significant Without Being Fast
Government is a plausible major market for agentic AI. Federal agencies operate at enormous scale, manage complex administrative processes, interact with large populations, and face continuing pressure to improve service and productivity.
Those conditions create genuine demand. They do not eliminate the barriers to adoption.
Agent standards remain immature. Security guidance recommends constrained access and low-risk initial uses. Agency data and legacy systems may not support reliable automation. Acquisition and security authorization can take time. Performance baselines are often weak. Consumption pricing may complicate budgeting. Acquisition teams may lack the technical expertise needed to evaluate agent architecture. Consequential public-facing uses may also raise privacy, civil rights, due-process, records, accessibility, and public-trust concerns.
A bounded use case performing well in a controlled environment does not establish that the same system will operate reliably across multiple missions, datasets, components, and security boundaries.
The near-term federal market is therefore more likely to center on human-supervised, task-specific uses within administrative and service workflows. Case assistance, knowledge retrieval, contact-center support, internal operations, document processing, and controlled workflow execution are more plausible starting points than broad autonomous authority.
That is still a meaningful acquisition shift. It is simply more measured than the most ambitious vendor forecasts suggest.
When the object of acquisition includes the controlled ability to act, buyer confidence is not a secondary concern. It is the central one.
The Market Will Turn on Buyer Confidence
The most important question is not how many AI agents the federal government will deploy.
It is whether agencies can trust those agents to remain within defined authority, protect government information, produce measurable value, withstand manipulation, preserve an adequate audit trail, and remain subject to accountable human control.
The market will not mature merely because artificial intelligence can act. It will mature when agencies know how to define, buy, test, constrain, price, observe, monitor, and accept responsibility for those actions.
In the federal market, the advantage will not belong to the company promising the most autonomy. It will belong to the company that gives the government the most confidence in how that autonomy is controlled.