Model costs

Price LLM requests with a model cost catalog and expose realized USD costs in logs, traces, and metrics.

Agentgateway can compute the realized USD cost of each LLM request when you provide a model cost catalog. With a catalog in place, agentgateway attributes cost per request in access logs, traces, and metrics, and exposes the values to CEL expressions as llm.cost and llm.costRates.

Agentgateway does not ship a built-in catalog. Costs are computed only when you configure one (for example, a catalog that you generate with agctl costs import).

In Kubernetes mode, you deliver the catalog as a ConfigMap and reference it from a Gateway-level AgentgatewayParameters resource.

Step 1: Prepare a catalog

Prepare a catalog by creating your own JSON file or using the agctl costs import command.

Catalog JSON format

A model cost catalog is a JSON document with the following high-level structure. Field names are camelCase, and unknown fields are rejected.

```
{
  "providers": {
    "<provider-id>": {
      "models": {
        "<model-name>": {
          "rates": {
            "input": "0.0",
            "output": "0.0",
            "cacheRead": "0.0",
            "cacheWrite": "0.0",
            "reasoning": "0.0",
            "inputAudio": "0.0",
            "outputAudio": "0.0"
          },
          "tiers": [
            {
              "contextOver": 200000,
              "rates": {
                "input": "0.0",
                "output": "0.0"
              }
            }
          ]
        }
      }
    }
  }
}
```

Key points:

- Lookups are by provider id (such as openai, anthropic, or gcp.gemini) and model name (such as gpt-4o-mini).
- Rates are strings (exact decimals), in USD per 1,000,000 tokens.
- If a rate is omitted, that token type is not priced for the model.
- tiers[] is optional. Each tier selects alternate rates when the request context length is over the tier's contextOver value. Tiers must be ordered by strictly increasing contextOver.
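You might want to check a catalog locally against these rules before you load it. The following Python sketch is a hypothetical helper and is not part of agentgateway or agctl. It does three things:

- It rejects unknown field names.
- It requires each rate to be a string that parses as a finite decimal.
- It checks that tiers have strictly increasing contextOver values.

It does not know which provider ids or model names agentgateway recognizes, and agentgateway's own validation may check more.

```python
from decimal import Decimal, InvalidOperation

# Allowed field names, taken from the catalog format described above.
RATE_FIELDS = {"input", "output", "cacheRead", "cacheWrite",
               "reasoning", "inputAudio", "outputAudio"}
MODEL_FIELDS = {"rates", "tiers"}
TIER_FIELDS = {"contextOver", "rates"}


def unknown(obj: dict, allowed: set, where: str) -> list:
    """Report any keys that are not in the allowed set."""
    extra = sorted(set(obj) - allowed)
    return [f"{where}: unknown field(s) {extra}"] if extra else []


def check_rates(rates: dict, where: str) -> list:
    """Rates must be strings that parse as finite decimals."""
    errors = unknown(rates, RATE_FIELDS, where)
    for name, value in rates.items():
        if not isinstance(value, str):
            errors.append(f"{where}: rate {name!r} must be a string")
            continue
        try:
            ok = Decimal(value).is_finite()
        except InvalidOperation:
            ok = False
        if not ok:
            errors.append(f"{where}: rate {name!r} is not a decimal: {value!r}")
    return errors


def validate_catalog(catalog: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    errors = unknown(catalog, {"providers"}, "catalog")
    for provider, pdata in catalog.get("providers", {}).items():
        errors += unknown(pdata, {"models"}, provider)
        for model, mdata in pdata.get("models", {}).items():
            where = f"{provider}/{model}"
            errors += unknown(mdata, MODEL_FIELDS, where)
            errors += check_rates(mdata.get("rates", {}), where)
            previous = None
            for i, tier in enumerate(mdata.get("tiers", [])):
                tier_where = f"{where} tiers[{i}]"
                errors += unknown(tier, TIER_FIELDS, tier_where)
                errors += check_rates(tier.get("rates", {}), tier_where)
                over = tier.get("contextOver")
                # Tiers must be ordered by strictly increasing contextOver.
                if previous is not None and over is not None and over <= previous:
                    errors.append(f"{tier_where}: contextOver must increase strictly")
                previous = over
    return errors
```

The sketch parses rates with Decimal rather than float, which matches the catalog's use of exact decimal strings.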

The following minimal example prices one OpenAI model and one tiered Gemini model:

```
{
  "providers": {
    "openai": {
      "models": {
        "gpt-4o-mini": {
          "rates": { "input": "0.15", "output": "0.6", "cacheRead": "0.075" }
        }
      }
    },
    "gcp.gemini": {
      "models": {
        "gemini-2.5-pro": {
          "rates": { "input": "1.25", "output": "10", "cacheRead": "0.125" },
          "tiers": [
            {
              "contextOver": 200000,
              "rates": { "input": "2.5", "output": "15", "cacheRead": "0.25" }
            }
          ]
        }
      }
    }
  }
}
```
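The following Python sketch shows how these rates translate into a dollar amount by pricing requests against the minimal catalog above. It is an illustration under stated assumptions, not agentgateway's implementation:

- The caller supplies per-token-type counts and a context length. context_tokens, effective_rates, and request_cost are hypothetical names.
- The last tier whose contextOver the context exceeds replaces the base rates.
- Each token type is priced only from the count you pass in. The sketch does not model whether a provider also reports cached tokens inside the input count.

```python
from decimal import Decimal

# The minimal example catalog from above, as a Python dict.
CATALOG = {
    "providers": {
        "openai": {"models": {"gpt-4o-mini": {
            "rates": {"input": "0.15", "output": "0.6", "cacheRead": "0.075"}}}},
        "gcp.gemini": {"models": {"gemini-2.5-pro": {
            "rates": {"input": "1.25", "output": "10", "cacheRead": "0.125"},
            "tiers": [{"contextOver": 200000,
                       "rates": {"input": "2.5", "output": "15", "cacheRead": "0.25"}}]}}},
    }
}
PER_MILLION = Decimal(1_000_000)  # rates are USD per 1,000,000 tokens


def effective_rates(model: dict, context_tokens: int) -> dict:
    """Base rates, or the last tier whose contextOver the context exceeds."""
    rates = model.get("rates", {})
    for tier in model.get("tiers", []):  # tiers are sorted by contextOver
        if context_tokens > tier["contextOver"]:
            rates = tier["rates"]  # assumption: a tier replaces the base rates
    return rates


def request_cost(catalog: dict, provider: str, model_name: str,
                 tokens: dict, context_tokens: int):
    """Return {"total": ..., <tokenType>: ...} in USD, or None if unpriceable."""
    models = catalog.get("providers", {}).get(provider, {}).get("models", {})
    model = models.get(model_name)
    if model is None:
        return None
    rates = effective_rates(model, context_tokens)
    components = {
        kind: Decimal(count) * Decimal(rates[kind]) / PER_MILLION
        for kind, count in tokens.items()
        if kind in rates  # an omitted rate leaves that token type unpriced
    }
    if not components:
        return None
    return {"total": sum(components.values(), Decimal(0)), **components}
```

Under these assumptions, 1,000 input tokens and 500 output tokens on gpt-4o-mini cost 1000 × 0.15 / 1,000,000 + 500 × 0.6 / 1,000,000 = 0.00045 USD. A gemini-2.5-pro request with a 250,000-token context is priced at the tier rates (input 2.5, output 15) instead of the base rates.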

Generate a catalog with agctl

Use agctl costs import to generate a catalog JSON file, then load it into a ConfigMap.

Generate a catalog from a supported source. By default, agctl costs import imports every provider that the proxy supports from models.dev. To import only a subset of providers, pass a comma-separated list to --providers.
```
agctl costs import --pretty --providers openai,anthropic --out ./catalog.json
```

Create or update the ConfigMap from the generated file. The --from-file syntax sets the data key to catalog.json.

```
kubectl create configmap my-model-costs \
  --from-file=catalog.json=./catalog.json \
  -n agentgateway-system \
  --dry-run=client -o yaml | kubectl apply -f-
```

Review the catalog in the ConfigMap.

```
kubectl describe configmap my-model-costs -n agentgateway-system
```

Example output:

```
Name:         my-model-costs
Namespace:    agentgateway-system
Labels:       <none>
Annotations:  <none>

Data
====
catalog.json:
----
{
  "providers": {
    "anthropic": {
      "models": {
        "claude-3-5-haiku-latest": {
          "rates": {
            "input": "0.8",
            "output": "4",
            "cacheRead": "0.08",
            "cacheWrite": "1"
          }
        },
...
```

Reference the ConfigMap from your AgentgatewayParameters resource, as shown in Step 2: Configure a catalog as a ConfigMap.

For all options, see the agctl costs import reference.
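Two optional local checks can help at this point. The following Python sketch uses hypothetical helpers that are not part of agctl or agentgateway:

- summarize() counts models per provider and lists models whose base rates lack an input or output rate. A missing rate is not necessarily an error, for example for a model that produces no output tokens, but that token type is not priced.
- same_catalog() compares the catalog stored in the ConfigMap with your local file as parsed JSON, so whitespace and key order do not matter.

fetch_configmap() assumes that kubectl is on your PATH and can reach the cluster.

```python
import json
import subprocess


def summarize(catalog: dict) -> dict:
    """Count models per provider and list models lacking input or output rates."""
    summary = {"models_per_provider": {}, "missing_input_or_output": []}
    for provider, pdata in sorted(catalog.get("providers", {}).items()):
        models = pdata.get("models", {})
        summary["models_per_provider"][provider] = len(models)
        for name, model in sorted(models.items()):
            rates = model.get("rates", {})
            if "input" not in rates or "output" not in rates:
                summary["missing_input_or_output"].append(f"{provider}/{name}")
    return summary


def fetch_configmap(name: str, namespace: str) -> dict:
    """Read a ConfigMap as JSON with `kubectl get -o json` (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "configmap", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def same_catalog(configmap: dict, local_catalog: dict, key: str = "catalog.json") -> bool:
    """Compare the catalog stored under a data key with a locally loaded catalog."""
    return json.loads(configmap["data"][key]) == local_catalog
```

For example, load ./catalog.json with json.load and pass the result to summarize(). Then pass it to same_catalog() together with fetch_configmap("my-model-costs", "agentgateway-system").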

Step 2: Configure a catalog as a ConfigMap

If you used the agctl costs import command, you already created the ConfigMap and can skip the following example. Otherwise, create a ConfigMap that holds the catalog JSON. The ConfigMap must be in the same namespace as the Gateway that references it. By default, the catalog is read from the catalog.json data key.

```
kubectl apply -f- <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-model-costs
  namespace: agentgateway-system
data:
  catalog.json: |
    {
      "providers": {
        "openai": {
          "models": {
            "gpt-4o-mini": {
              "rates": { "input": "0.15", "output": "0.6", "cacheRead": "0.075" }
            }
          }
        }
      }
    }
EOF
```

Create an AgentgatewayParameters resource that references the ConfigMap as a catalog source. Sources are merged in order, with later sources taking precedence at the model level.

```
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayParameters
metadata:
  name: my-agwp
  namespace: agentgateway-system
spec:
  modelCatalog:
    sources:
      - configMap:
          name: my-model-costs
          key: catalog.json
EOF
```

In the AgentgatewayParameters resource, the configMap key field is optional and defaults to catalog.json. Attach the AgentgatewayParameters resource to your Gateway with infrastructure.parametersRef.

```
kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway-proxy
  namespace: agentgateway-system
spec:
  gatewayClassName: agentgateway
  infrastructure:
    parametersRef:
      name: my-agwp
      group: agentgateway.dev
      kind: AgentgatewayParameters
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
EOF
```

modelCatalog is honored only on a Gateway-level AgentgatewayParameters resource (attached through Gateway.spec.infrastructure.parametersRef). modelCatalog is ignored on a GatewayClass-level AgentgatewayParameters resource, because ConfigMap references are resolved from the Gateway’s deployment namespace.
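This step notes that catalog sources are merged in order, with later sources taking precedence at the model level. The following Python sketch illustrates one reading of that rule. It is an assumption, not a description of agentgateway's code: when two sources define the same provider and model, the later entry replaces the earlier one as a whole, tiers included, instead of combining individual rates. merge_catalogs is a hypothetical name.

```python
import copy


def merge_catalogs(*sources: dict) -> dict:
    """Merge catalogs in order; a later source replaces a whole model entry."""
    merged = {"providers": {}}
    for source in sources:
        for provider, pdata in source.get("providers", {}).items():
            target = merged["providers"].setdefault(provider, {"models": {}})
            for name, model in pdata.get("models", {}).items():
                # Model-level precedence (assumed): the later entry wins as a
                # unit, including its tiers; rates are not combined per field.
                target["models"][name] = copy.deepcopy(model)
    return merged
```

Under this reading, suppose you list a shared ConfigMap first and a small overrides ConfigMap second. The overrides replace individual model entries, and every other model from the first source stays in place.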

Step 3: Generate traffic

Send traffic through agentgateway to a model that has an entry in the catalog. For example steps, see the LLM getting started guide.

Step 4: Use cost data in CEL, logs, traces, and metrics

When a request matches an entry in the catalog, agentgateway populates the following CEL fields:

- llm.cost: The realized USD cost of the request. It includes a total plus per-token-type components: input, output, cacheRead, cacheWrite, reasoning, inputAudio, and outputAudio. Unset when the model cannot be priced.
- llm.costRates: The effective USD-per-1,000,000-token rates that were applied, after tier selection. Unset when the model cannot be priced.

The request access log always includes agw.ai.usage.cost.total for LLM requests. The value is 0 when the model cannot be priced. For details on viewing logs and adding cost fields, see Metrics and logs.
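You can total spend from collected access logs by summing agw.ai.usage.cost.total across entries. The following Python sketch assumes JSON-formatted log lines. Because this page does not show the log layout, the sketch accepts the field either as a flat dotted key or as nested objects; adjust lookup() to your actual log format. lookup and total_spend are hypothetical names.

```python
import json
from decimal import Decimal


def lookup(record: dict, dotted: str):
    """Return a field stored either flat ("a.b.c") or nested ({"a": {"b": ...}})."""
    if dotted in record:
        return record[dotted]
    value = record
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def total_spend(lines, field: str = "agw.ai.usage.cost.total") -> Decimal:
    """Sum the cost field across JSON log lines, skipping non-JSON lines."""
    total = Decimal(0)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        value = lookup(record, field) if isinstance(record, dict) else None
        if value is not None:
            total += Decimal(str(value))  # str() avoids float rounding noise
    return total
```

Unpriced requests log 0, so this sum undercounts spend on models that your catalog does not price.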

Step 5: Monitor catalog lookups

Every cost lookup increments the agentgateway_cost_catalog_lookups_total counter. The counter is labeled with the lookup status and the request's gen_ai_system (provider), gen_ai_request_model, and gen_ai_response_model. Use this metric to confirm that your catalog prices your traffic.

The status label is one of the following values:

| Status | Meaning |
| --- | --- |
| `Exact` | The provider and model were found in the catalog and priced. |
| `Unpriced` | The model was found, but the token types in the request had no matching rates. |
| `Missing` | The provider or model was not found in the catalog. |
| `NoCatalog` | No catalog is configured. |
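The following Python sketch restates the status table as code, which can help you predict the status that a given catalog produces. It is an illustration, not agentgateway's lookup logic. It checks only the base rates, not tiers, and it does not model whether the lookup uses the request or the response model name. LookupStatus and classify are hypothetical names.

```python
from enum import Enum
from typing import Iterable, Optional


class LookupStatus(Enum):
    EXACT = "Exact"
    UNPRICED = "Unpriced"
    MISSING = "Missing"
    NO_CATALOG = "NoCatalog"


def classify(catalog: Optional[dict], provider: str, model: str,
             token_types: Iterable[str]) -> LookupStatus:
    """Map a lookup to the status values in the table above (illustrative only)."""
    if catalog is None:
        return LookupStatus.NO_CATALOG
    models = catalog.get("providers", {}).get(provider, {}).get("models", {})
    entry = models.get(model)
    if entry is None:
        return LookupStatus.MISSING  # provider or model not in the catalog
    rates = entry.get("rates", {})
    if not any(t in rates for t in token_types):
        return LookupStatus.UNPRICED  # model found, no rate for these tokens
    return LookupStatus.EXACT
```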

To view the metric, port-forward the proxy and query the metrics endpoint:

Port-forward the gateway proxy.

```
kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 15020
```

Query the metrics endpoint.

```
curl -s http://localhost:15020/metrics | grep agentgateway_cost_catalog_lookups_total
```

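Review the metrics. In the following example output, the status is NoCatalog because no catalog is configured for the proxy.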

```
agentgateway_cost_catalog_lookups_total{status="NoCatalog",gen_ai_operation_name="chat",gen_ai_system="openai",gen_ai_request_model="gpt-3.5-turbo",gen_ai_response_model="gpt-3.5-turbo-0125",bind="80/agentgateway-system/agentgateway-proxy",gateway="agentgateway-system/agentgateway-proxy",listener="http",route="agentgateway-system/openai",route_rule="unknown"} 1
```

A rising Missing or Unpriced count means that requests are flowing through models that your catalog does not price. Add the missing providers, models, or rates to your catalog and update the ConfigMap.
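To summarize a scrape instead of reading raw lines, the following Python sketch does three things:

- It parses the Prometheus text output from the metrics endpoint.
- It sums agentgateway_cost_catalog_lookups_total by one label.
- It computes the share of lookups that were Missing or Unpriced.

It uses a simplified line parser that is assumed to be sufficient for this metric's output. For general use, a full Prometheus text-format parser is safer. lookups_by and unpriced_share are hypothetical names.

```python
import re
from collections import Counter

METRIC = "agentgateway_cost_catalog_lookups_total"
# One sample per line: name{label="value",...} value
SAMPLE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def lookups_by(text: str, label: str = "status") -> Counter:
    """Sum lookup counts from a /metrics scrape, grouped by one label."""
    counts = Counter()
    for line in text.splitlines():
        match = SAMPLE.match(line)  # comment lines (# HELP, # TYPE) never match
        if not match or match.group(1) != METRIC:
            continue
        labels = dict(LABEL.findall(match.group(2)))
        counts[labels.get(label, "")] += float(match.group(3))
    return counts


def unpriced_share(counts: Counter) -> float:
    """Fraction of lookups that were Missing or Unpriced."""
    total = sum(counts.values())
    return (counts["Missing"] + counts["Unpriced"]) / total if total else 0.0
```

The counter is cumulative since the proxy started. To see whether the share is rising, compare successive scrapes, or query rate() of the counter in Prometheus. Grouping by gen_ai_request_model instead of status shows which models to add to the catalog.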

