Model costs
Price LLM requests with a model cost catalog and expose realized USD costs in logs, traces, and metrics.
Agentgateway can compute the realized USD cost of each LLM request when you provide a model cost catalog. With a catalog in place, agentgateway attributes cost per request in access logs, traces, and metrics, and exposes the values to CEL expressions as llm.cost and llm.costRates.
Agentgateway does not ship a built-in catalog. Costs are computed only when you configure one (for example, a catalog that you generate with agctl costs import).
In Kubernetes mode, you deliver the catalog as a ConfigMap and reference it from a Gateway-level AgentgatewayParameters resource.
Step 1: Prepare a catalog
Prepare a catalog by creating your own JSON file or using the agctl costs import command.
Catalog JSON format
A model cost catalog is JSON with the following high-level structure. Field names are camelCase, and unknown fields are rejected.
    {
      "providers": {
        "<provider-id>": {
          "models": {
            "<model-name>": {
              "rates": {
                "input": "0.0",
                "output": "0.0",
                "cacheRead": "0.0",
                "cacheWrite": "0.0",
                "reasoning": "0.0",
                "inputAudio": "0.0",
                "outputAudio": "0.0"
              },
              "tiers": [
                {
                  "contextOver": 200000,
                  "rates": {
                    "input": "0.0",
                    "output": "0.0"
                  }
                }
              ]
            }
          }
        }
      }
    }

Key points:
- Lookups are by provider id (such as openai, anthropic, or gcp.gemini) and model name (such as gpt-4o-mini).
- Rates are strings (exact decimals), in USD per 1,000,000 tokens.
- If a rate is omitted, that token type is not priced for the model.
- tiers[] is optional. Each tier selects alternate rates when the request context length is over the tier’s contextOver value. Tiers must be ordered by strictly increasing contextOver.
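The rules above can be checked before a catalog is loaded into a ConfigMap. The following is a minimal sketch in Python, not agentgateway's own validation: the function names `check_rates` and `validate_catalog` are hypothetical, and the gateway may apply stricter or different checks.

```python
from decimal import Decimal, InvalidOperation

# Rate names taken from the catalog format shown above.
RATE_KEYS = {"input", "output", "cacheRead", "cacheWrite",
             "reasoning", "inputAudio", "outputAudio"}


def check_rates(rates, where, errors):
    """Append an error for every rate that is not a known decimal string."""
    for name, value in rates.items():
        if name not in RATE_KEYS:
            errors.append(f"{where}: unknown rate {name!r}")
        elif not isinstance(value, str):
            errors.append(f"{where}.{name}: rate must be a string")
        else:
            try:
                finite = Decimal(value).is_finite()
            except InvalidOperation:
                finite = False
            if not finite:
                errors.append(f"{where}.{name}: {value!r} is not a decimal")


def validate_catalog(catalog):
    """Return a list of problems found in a parsed catalog (hypothetical helper)."""
    errors = []
    for provider, pdata in catalog.get("providers", {}).items():
        for model, entry in pdata.get("models", {}).items():
            where = f"{provider}/{model}"
            for field in sorted(set(entry) - {"rates", "tiers"}):
                errors.append(f"{where}: unknown field {field!r}")
            check_rates(entry.get("rates", {}), where, errors)
            previous = None
            for i, tier in enumerate(entry.get("tiers", [])):
                over = tier.get("contextOver")
                if not isinstance(over, int):
                    errors.append(f"{where}.tiers[{i}]: contextOver must be an integer")
                    continue
                # Tiers must be ordered by strictly increasing contextOver.
                if previous is not None and over <= previous:
                    errors.append(f"{where}.tiers[{i}]: contextOver is not strictly increasing")
                previous = over
                check_rates(tier.get("rates", {}), f"{where}.tiers[{i}]", errors)
    return errors
```

Calling `validate_catalog(json.load(open("catalog.json")))` returns an empty list when none of these checks fail.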
The following minimal example prices two OpenAI models and one tiered Gemini model:
    {
      "providers": {
        "openai": {
          "models": {
            "gpt-4o-mini": {
              "rates": { "input": "0.15", "output": "0.6", "cacheRead": "0.075" }
            }
          }
        },
        "gcp.gemini": {
          "models": {
            "gemini-2.5-pro": {
              "rates": { "input": "1.25", "output": "10", "cacheRead": "0.125" },
              "tiers": [
                {
                  "contextOver": 200000,
                  "rates": { "input": "2.5", "output": "15", "cacheRead": "0.25" }
                }
              ]
            }
          }
        }
      }
    }

Generate a catalog with agctl
Use agctl costs import to generate a catalog JSON file, then load it into a ConfigMap.
Generate a catalog from a supported source. By default, agctl costs import imports every provider that the proxy supports from models.dev. To import only a subset of providers, pass a comma-separated list to --providers.

    agctl costs import --pretty --providers openai,anthropic --out ./catalog.json

Create or update the ConfigMap from the generated file. The --from-file syntax sets the data key to catalog.json.

    kubectl create configmap my-model-costs \
      --from-file=catalog.json=./catalog.json \
      -n agentgateway-system \
      --dry-run=client -o yaml | kubectl apply -f-

Review the ConfigMap catalog.

    kubectl describe configmap my-model-costs -n agentgateway-system

Example output:

    Name:         my-model-costs
    Namespace:    agentgateway-system
    Labels:       <none>
    Annotations:  <none>

    Data
    ====
    catalog.json:
    ----
    {
      "providers": {
        "anthropic": {
          "models": {
            "claude-3-5-haiku-latest": {
              "rates": {
                "input": "0.8",
                "output": "4",
                "cacheRead": "0.08",
                "cacheWrite": "1"
              }
            },
    ...

Reference the ConfigMap from your AgentgatewayParameters resource, as shown in the next section, Configure a catalog as a ConfigMap.
For all options, see the agctl costs import reference.
Step 2: Configure a catalog as a ConfigMap
Create a ConfigMap that holds the catalog JSON. The ConfigMap must be in the same namespace as the Gateway that references it. By default, the catalog is read from the catalog.json data key. If you used the agctl costs import command, you already created the ConfigMap.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: my-model-costs
      namespace: agentgateway-system
    data:
      catalog.json: |
        {
          "providers": {
            "openai": {
              "models": {
                "gpt-4o-mini": {
                  "rates": { "input": "0.15", "output": "0.6", "cacheRead": "0.075" }
                }
              }
            }
          }
        }
    EOF

Create an AgentgatewayParameters resource that references the ConfigMap as a catalog source. Sources are merged in order, with later sources taking precedence at the model level. In each configMap source, the key field is optional and defaults to catalog.json.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayParameters
    metadata:
      name: my-agwp
      namespace: agentgateway-system
    spec:
      modelCatalog:
        sources:
        - configMap:
            name: my-model-costs
            key: catalog.json
    EOF

Attach the AgentgatewayParameters resource to your Gateway with infrastructure.parametersRef.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: agentgateway-proxy
      namespace: agentgateway-system
    spec:
      gatewayClassName: agentgateway
      infrastructure:
        parametersRef:
          name: my-agwp
          group: agentgateway.dev
          kind: AgentgatewayParameters
      listeners:
      - name: http
        port: 80
        protocol: HTTP
        allowedRoutes:
          namespaces:
            from: All
    EOF
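When several catalog sources are listed, the merge rule described above (later sources take precedence at the model level) can be pictured with the following Python sketch. It assumes that "at the model level" means a later source's entry for a model replaces the earlier entry as a whole, rather than merging individual rates; that reading, and the helper name `merge_catalogs`, are assumptions and not taken from the agentgateway implementation.

```python
import copy


def merge_catalogs(sources):
    """Merge parsed catalogs in order; later model entries replace earlier ones."""
    merged = {"providers": {}}
    for source in sources:
        for provider, pdata in source.get("providers", {}).items():
            models = merged["providers"].setdefault(provider, {"models": {}})["models"]
            for model, entry in pdata.get("models", {}).items():
                # Assumption: the whole model entry is replaced, not merged per rate.
                models[model] = copy.deepcopy(entry)
    return merged
```

Under this assumption, an override source that lists only an input rate for gpt-4o-mini would leave that model without a cacheRead rate after the merge, while models that appear only in the earlier source are kept unchanged.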
modelCatalog is honored only on a Gateway-level AgentgatewayParameters resource (attached through Gateway.spec.infrastructure.parametersRef). modelCatalog is ignored on a GatewayClass-level AgentgatewayParameters resource, because ConfigMap references are resolved from the Gateway’s deployment namespace.

Step 3: Generate traffic
Generate traffic through agentgateway that matches a model entry from the catalog. For example steps, see the LLM getting started guide.
Step 4: Use cost data in CEL, logs, traces, and metrics
When a request matches an entry in the catalog, agentgateway populates the following CEL fields:
- llm.cost: The realized USD cost of the request. Includes total plus per-token-type components: input, output, cacheRead, cacheWrite, reasoning, inputAudio, and outputAudio. Unset when the model cannot be priced.
- llm.costRates: The effective USD-per-1,000,000-token rates that were applied, after tier selection. Unset when the model cannot be priced.
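To make the relationship between the two fields concrete, the following Python sketch computes a per-request cost from a catalog entry: it picks the effective rates after tier selection, then multiplies each token count by its rate per 1,000,000 tokens. The function names are hypothetical, and the sketch makes several assumptions that the source does not confirm: a matching tier replaces the base rates as a whole, the caller supplies the context length, and each token count is priced independently (for example, cached tokens are not subtracted from input tokens).

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def effective_rates(entry, context_tokens):
    """Return the rates that apply after tier selection (roughly llm.costRates)."""
    rates = dict(entry.get("rates", {}))
    # Tiers are ordered by increasing contextOver, so the last match wins.
    for tier in entry.get("tiers", []):
        if context_tokens > tier["contextOver"]:
            rates = dict(tier["rates"])  # assumption: tier rates replace base rates
    return rates


def request_cost(entry, usage, context_tokens):
    """Return per-token-type USD costs plus a total (roughly llm.cost)."""
    rates = effective_rates(entry, context_tokens)
    cost = {}
    for token_type, count in usage.items():
        if token_type in rates:  # token types without a rate are not priced
            cost[token_type] = Decimal(rates[token_type]) * count / MILLION
    cost["total"] = sum(cost.values(), Decimal(0))
    return cost
```

With the tiered gemini-2.5-pro entry from the minimal example and a request of 250,000 input tokens and 1,000 output tokens, the context is over 200,000, so the tier rates apply: 250,000 × 2.5 / 1,000,000 = 0.625 for input, 1,000 × 15 / 1,000,000 = 0.015 for output, and 0.64 USD in total.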
The request access log always includes agw.ai.usage.cost.total for LLM requests (it is 0 when the model cannot be priced). For how to view logs and add cost fields, see Metrics and logs.
Step 5: Monitor catalog lookups
Every cost lookup increments the agentgateway_cost_catalog_lookups_total counter, labeled with the lookup status and the request’s gen_ai_system (provider), gen_ai_request_model, and gen_ai_response_model. Use this counter to confirm that your catalog prices your traffic.
The status label is one of the following values:
| Status | Meaning |
|---|---|
| Exact | The provider and model were found in the catalog and priced. |
| Unpriced | The model was found, but the token types in the request had no matching rates. |
| Missing | The provider or model was not found in the catalog. |
| NoCatalog | No catalog is configured. |
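The table can be read as a small decision procedure. The Python sketch below is one interpretation of it, useful for predicting which status a request should produce against a given catalog. The function `lookup_status` is hypothetical, and it assumes that Unpriced means none of the request's token types has a rate; the gateway's exact rule may differ.

```python
def lookup_status(catalog, provider, model, token_types):
    """Predict the lookup status for a request (interpretation of the table)."""
    if catalog is None:
        return "NoCatalog"
    models = catalog.get("providers", {}).get(provider, {}).get("models", {})
    entry = models.get(model)
    if entry is None:
        return "Missing"
    rates = entry.get("rates", {})
    # Assumption: a request is unpriced only when no token type has a rate.
    if not any(token_type in rates for token_type in token_types):
        return "Unpriced"
    return "Exact"
```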
To view the metric, port-forward the proxy and query the metrics endpoint:
Port-forward the gateway proxy.

    kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 15020

Query the metrics endpoint.

    curl -s http://localhost:15020/metrics | grep agentgateway_cost_catalog_lookups_total

Review the metrics.

    agentgateway_cost_catalog_lookups_total{status="NoCatalog",gen_ai_operation_name="chat",gen_ai_system="openai",gen_ai_request_model="gpt-3.5-turbo",gen_ai_response_model="gpt-3.5-turbo-0125",bind="80/agentgateway-system/agentgateway-proxy",gateway="agentgateway-system/agentgateway-proxy",listener="http",route="agentgateway-system/openai",route_rule="unknown"} 1
A rising Missing or Unpriced count means requests are flowing through models that your catalog does not price. Add the missing providers or models to your catalog and update the ConfigMap.
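To watch these counts without reading raw metric lines, the text returned by the metrics endpoint can be tallied by status. The following Python sketch parses lines in the format shown in the example output; the function name `lookups_by_status` is hypothetical, and the simple regular expressions assume well-formed label values like those above.

```python
import re
from collections import Counter

METRIC = "agentgateway_cost_catalog_lookups_total"
# One sample line: metric name, {labels}, then the counter value.
LINE = re.compile(r"^" + re.escape(METRIC) + r"\{(?P<labels>.*)\}\s+(?P<value>\S+)$")
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def lookups_by_status(metrics_text):
    """Sum the lookup counter per status label across all series."""
    totals = Counter()
    for line in metrics_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip comments and unrelated metrics
        labels = dict(LABEL.findall(match.group("labels")))
        totals[labels.get("status", "unknown")] += float(match.group("value"))
    return totals
```

Passing the output of the curl command above to `lookups_by_status` gives one total per status, so a nonzero Missing or Unpriced entry stands out immediately.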