Overview

Conversation replays let you re-execute logged conversations using your own LLM provider API keys. This is useful for:
  • A/B testing: Compare responses from different models or providers
  • Model evaluation: Test how a new model handles existing conversations
  • Debugging: Reproduce and analyze model behavior
  • Regression testing: Verify model updates maintain quality

Prerequisites

Before using replays, you need:
  1. Logged conversations: Conversations must be ingested via the Direct API or OpenLLMetry
  2. Provider API key: Your own API key from OpenAI, Anthropic, or other supported providers

Setting up provider credentials

Provider credentials are stored securely (encrypted at rest) and used to make API calls when replaying conversations.

Adding a credential

  1. Go to Settings in the dashboard
  2. Click the Provider Keys tab
  3. Click Add Provider Key
  4. Enter:
    • Name: A descriptive name (e.g., “Production OpenAI Key”)
    • Provider: Select OpenAI or Anthropic
    • API Key: Your provider API key
  5. Click Add Key
Your API key is encrypted before storage. However, replays will make real API calls using your key, which will incur charges from the provider.

Supported providers

| Provider | Models |
| --- | --- |
| OpenAI | GPT-4o (default), other GPT models |
| Anthropic | Claude Sonnet 4 (default), other Claude models |

Running a replay

From the dashboard

  1. Navigate to a conversation in the Conversations view
  2. Click the Replay button in the header
  3. Select a provider credential from the dropdown
  4. Click Start Replay
The replay runs asynchronously. You can view the results once it completes.

Viewing results

After a replay completes, click View Replay to see a side-by-side comparison:
  • Original: The response from the original conversation
  • Replayed: The new response from the replay

How replays work

  1. The system fetches the original conversation messages from storage
  2. User messages are sent to the selected provider using your API key
  3. The new assistant response is captured and stored
  4. Results are displayed alongside the original for comparison
Replays exclude the last assistant message from the input, then re-generate it. This allows you to compare how different models respond to the same conversation context.
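The exclusion step above can be sketched as follows (the helper name and message shape are illustrative, not part of the product API):

```python
def build_replay_input(messages):
    """Drop the trailing assistant message so the provider re-generates it.

    `messages` is a list of {"role": ..., "content": ...} dicts, ordered
    as they appeared in the original conversation.
    """
    if messages and messages[-1]["role"] == "assistant":
        return messages[:-1]
    return list(messages)

conversation = [
    {"role": "user", "content": "What is idempotency?"},
    {"role": "assistant", "content": "Idempotency means repeated calls have the same effect."},
]

# Only the user turn is sent to the provider; the assistant turn is re-generated.
replay_input = build_replay_input(conversation)
```

Because only the final assistant turn is dropped, any earlier assistant turns remain in the input as context, which is what makes the original and replayed responses directly comparable.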

API reference

Create a replay

POST /api/tenants/{tenantId}/replays
Request body:
{
  "conversationId": "conv-123",
  "credentialId": "cred-abc"
}
Response:
{
  "id": "replay-xyz",
  "conversationId": "conv-123",
  "credentialId": "cred-abc",
  "status": "PENDING",
  "createdAt": "2024-01-15T10:30:00Z"
}
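A minimal sketch of building this request in Python. The base URL and tenant ID are placeholders for your own deployment, and authentication headers (not shown in the reference above) are whatever your deployment requires:

```python
import json

API_BASE = "https://api.example.com"  # assumption: replace with your deployment's base URL

def create_replay_request(tenant_id, conversation_id, credential_id):
    """Build the URL and JSON body for POST /api/tenants/{tenantId}/replays."""
    url = f"{API_BASE}/api/tenants/{tenant_id}/replays"
    body = json.dumps({
        "conversationId": conversation_id,
        "credentialId": credential_id,
    })
    return url, body

url, body = create_replay_request("tenant-1", "conv-123", "cred-abc")
# POST `body` to `url` with your usual auth headers; the response includes
# the new replay's "id" and an initial "status" of "PENDING".
```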

Get replay status

GET /api/tenants/{tenantId}/replays/{replayId}
Response (completed):
{
  "id": "replay-xyz",
  "conversationId": "conv-123",
  "credentialId": "cred-abc",
  "status": "COMPLETED",
  "originalMessages": [...],
  "replayedMessages": {
    "provider": "openai",
    "model": "gpt-4o",
    "response": {
      "role": "assistant",
      "content": "..."
    },
    "usage": {
      "prompt_tokens": 100,
      "completion_tokens": 50
    }
  },
  "startedAt": "2024-01-15T10:30:01Z",
  "completedAt": "2024-01-15T10:30:05Z"
}
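Given a `COMPLETED` payload shaped like the example above, a comparison view can be assembled by pairing the original conversation's final assistant turn with the replayed response. This is a sketch, assuming the response shape shown above:

```python
def compare_responses(replay):
    """Pair the original final assistant turn with the replayed one.

    `replay` is a COMPLETED replay payload as returned by
    GET /api/tenants/{tenantId}/replays/{replayId}.
    """
    # The last assistant message in originalMessages is the one that was excluded
    # from the replay input and re-generated.
    original = next(
        (m for m in reversed(replay["originalMessages"]) if m["role"] == "assistant"),
        None,
    )
    replayed = replay["replayedMessages"]["response"]
    return {
        "model": replay["replayedMessages"]["model"],
        "original": original["content"] if original else None,
        "replayed": replayed["content"],
    }
```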

List replays

GET /api/tenants/{tenantId}/replays?conversationId={conversationId}

Replay status

| Status | Description |
| --- | --- |
| PENDING | Replay is queued |
| RUNNING | Replay is in progress |
| COMPLETED | Replay finished successfully |
| FAILED | Replay failed (check the `error` field) |
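Since replays run asynchronously, a client typically polls the status endpoint until one of the two terminal statuses (`COMPLETED` or `FAILED`) is reached. A transport-agnostic sketch, with the HTTP call injected as a callable so the loop works with any client library:

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def wait_for_replay(fetch_status, replay_id, poll_seconds=2.0, timeout=120.0):
    """Poll until the replay reaches a terminal status.

    `fetch_status` is any callable that GETs
    /api/tenants/{tenantId}/replays/{replayId} and returns the parsed JSON.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        replay = fetch_status(replay_id)
        if replay["status"] in TERMINAL_STATUSES:
            return replay
        time.sleep(poll_seconds)
    raise TimeoutError(f"replay {replay_id} did not finish within {timeout}s")
```

The example replay in the API reference completed in a few seconds, but poll intervals and timeouts should be tuned to your provider's typical latency.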

Limitations

  • Replays do not support streaming responses
  • Tool use is captured but tools are not re-executed
  • Only the final assistant response is replayed (not intermediate turns)
  • Image inputs are not currently supported in replays

Security

  • Provider credentials are encrypted at rest using AES-256-GCM
  • Only team members with Admin or Owner role can manage credentials
  • Credentials are never exposed in API responses (only ID and metadata)
  • Replay results are stored and visible only to team members