Voice Call Transcripts (Raw)

Overview

The vogent-raw endpoint accepts raw voice call transcript data and normalizes it into Moda’s conversation format. Unlike the call transcript format in the Direct API (which expects pre-normalized {role, content} turns), this endpoint handles raw transcript data with:

Speaker turns with timing data (start/end timestamps in milliseconds)
IVR navigation markers (<|press:1|>, <|silence|>, <|hangup|>)
Embedded function calls within transcript entries
Detail types for function call responses

Each call is fanned out into one conversation log per spoken utterance. Action markers are filtered out of the fan-out, and function calls are extracted into separate entries. The full unfiltered transcript is preserved in a content block on the first emitted utterance.

Endpoint

POST https://moda-ingest.modas.workers.dev/v1/ingest/vogent-raw

Authentication

Include your Moda API key in the Authorization header:

-H "Authorization: Bearer YOUR_MODA_API_KEY"

Request Format

{
  "environment": "production",
  "events": [
    {
      "id": "call-abc-123",
      "conversationId": "session-456",
      "userId": "user-789",
      "organizationId": "org-001",
      "callType": "phone",
      "transcript": [
        {
          "text": "Thank you for calling. How can I help?",
          "speaker": "AI",
          "startTimeMs": 1000,
          "endTimeMs": 3500
        },
        {
          "text": "I need to check my account balance.",
          "speaker": "HUMAN",
          "startTimeMs": 4000,
          "endTimeMs": 6200
        },
        {
          "text": "<|silence|>",
          "speaker": "AI",
          "startTimeMs": 6200,
          "endTimeMs": 7000
        },
        {
          "text": "",
          "speaker": "AI",
          "startTimeMs": 7000,
          "endTimeMs": 7500,
          "functionCalls": [
            {
              "name": "lookup_account",
              "arguments": { "user_id": "user-789" }
            }
          ]
        },
        {
          "text": "Your current balance is $1,234.56.",
          "speaker": "AI",
          "startTimeMs": 8000,
          "endTimeMs": 10500
        }
      ]
    }
  ]
}

Event Fields

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique identifier for this call event from your telephony system (e.g., Vogent dial ID). Used as the base for generating per-utterance `trace_id` values (`{id}_0`, `{id}_1`, etc.) |
| `conversationId` | string | Groups all utterances from this call into a single conversation in Moda. Maps to `conversation_id` in the database. Use the same value across related calls if they belong to the same session |
| `transcript` | array | Array of transcript entries (see below) |

Your tenant/organization ID is not included in the request body. It is automatically derived from your API key.

Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| `userId` | string | Identifier for the end user on the call (e.g., the customer’s phone number or account ID). Maps to `user_id` in the database |
| `organizationId` | string | Organization identifier for multi-tenant scenarios |
| `callType` | string | Type of call (e.g., `phone`, `video`, `voice`) |

Transcript Entry Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `text` | string | Yes | Spoken text, or an action marker (e.g., `<\|silence\|>`) |
| `speaker` | string | Yes | Must be `"AI"` or `"HUMAN"` |
| `startTimeMs` | number | No | Start time in milliseconds |
| `endTimeMs` | number | No | End time in milliseconds |
| `detailType` | string | No | Entry type (e.g., `"function"` for function call responses) |
| `functionCalls` | array | No | Embedded function calls (see below) |
| `functionCallId` | string | No | ID linking to a function call |

Function Call Fields

| Field | Type | Description |
| --- | --- | --- |
| `name` | string | Function/tool name |
| `arguments` | object | Arguments passed to the function |

Processing Behavior

Speaker Mapping

| Transcript Speaker | Moda Role | `is_client` |
| --- | --- | --- |
| `HUMAN` | `user` | `true` |
| `AI` | `assistant` | `false` |

Action Marker Filtering

The following action markers are filtered from the per-utterance fan-out but preserved in the full transcript content block:

| Marker | Description |
| --- | --- |
| `<\|silence\|>` | Silence period |
| `<\|press:N\|>` | IVR keypress (e.g., `<\|press:1\|>`) |
| `<\|hangup\|>` | Call termination |
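To preview which entries would be dropped from the fan-out, the markers in the table above can be matched client-side. This is a minimal sketch, not the service's own matcher: the regular expression and the `is_action_marker` helper are hypothetical, and the pattern assumes that `N` in `<|press:N|>` is one or more keypad characters (digits, `*`, or `#`).

```python
import re

# Assumed pattern for the three documented markers. The character class for
# press:N (digits, * and #) is a guess; the docs only show <|press:1|>.
ACTION_MARKER = re.compile(r"<\|(?:silence|hangup|press:[0-9*#]+)\|>")


def is_action_marker(text):
    """Return True when the entire entry text is a single action marker."""
    return ACTION_MARKER.fullmatch(text.strip()) is not None


transcript = [
    {"text": "Please hold.", "speaker": "AI"},
    {"text": "<|press:1|>", "speaker": "AI"},
    {"text": "<|silence|>", "speaker": "AI"},
    {"text": "Hello?", "speaker": "HUMAN"},
    {"text": "<|hangup|>", "speaker": "HUMAN"},
]

# Keep only the entries that would count as spoken utterances.
spoken = [e["text"] for e in transcript if not is_action_marker(e["text"])]
print(spoken)  # ['Please hold.', 'Hello?']
```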

Function Call Extraction

Transcript entries with functionCalls are extracted into separate conversation log entries with:

message_source: "tool_call"
A vogent_tool_call content block containing the tool name and arguments
has_tool_use: true

Entries with empty text and function calls are not emitted as utterances; only the tool call log is created. Entries with detailType: "function" (function call responses) are also excluded from the utterance fan-out.
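The filtering and extraction rules above can be approximated locally to estimate what a transcript will produce before sending it. The sketch below is illustrative only. `preview_fan_out` is a hypothetical helper, and it makes three assumptions the documentation does not confirm: one tool call log is created per item in `functionCalls`, entries with empty text and no function calls are dropped, and the marker pattern is the same guess as for action marker filtering.

```python
import re

# Assumed marker pattern (see the action marker table).
MARKER = re.compile(r"<\|(?:silence|hangup|press:[0-9*#]+)\|>")
# Speaker mapping from the Speaker Mapping table.
ROLE_BY_SPEAKER = {"HUMAN": "user", "AI": "assistant"}


def preview_fan_out(transcript):
    """Split a raw transcript into (utterances, tool_calls) lists."""
    utterances, tool_calls = [], []
    for entry in transcript:
        # Assumption: every item in functionCalls becomes its own tool call log.
        for call in entry.get("functionCalls", []):
            tool_calls.append(
                {"name": call["name"], "arguments": call.get("arguments", {})}
            )
        text = entry.get("text", "").strip()
        skipped = (
            not text                                 # empty text: no utterance
            or MARKER.fullmatch(text) is not None    # action marker
            or entry.get("detailType") == "function" # function call response
        )
        if not skipped:
            utterances.append(
                {"role": ROLE_BY_SPEAKER[entry["speaker"]], "content": text}
            )
    return utterances, tool_calls


sample = [
    {"text": "Thank you for calling. How can I help?", "speaker": "AI"},
    {"text": "I need to check my account balance.", "speaker": "HUMAN"},
    {"text": "<|silence|>", "speaker": "AI"},
    {"text": "", "speaker": "AI",
     "functionCalls": [{"name": "lookup_account",
                        "arguments": {"user_id": "user-789"}}]},
    {"text": "Your current balance is $1,234.56.", "speaker": "AI"},
]

utterances, tool_calls = preview_fan_out(sample)
print(len(utterances), len(tool_calls))  # 3 1
```

Under these assumptions, the transcript from the Request Format section would yield three utterances and one function call entry.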

Duration Computation

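Call duration is computed automatically from the transcript timing data (startTimeMs and endTimeMs) and stored on the first utterance. Duration in seconds is calculated as (maxEndTimeMs - minStartTimeMs) / 1000, where maxEndTimeMs is the latest end time and minStartTimeMs is the earliest start time across the transcript entries.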
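As a worked example of the formula, the following sketch computes the duration for the timings used in the Request Format sample. `call_duration_seconds` is a hypothetical helper. Ignoring entries without timing data and returning `None` when no timings exist are assumptions, since the documentation does not describe those cases.

```python
def call_duration_seconds(transcript):
    """Apply (maxEndTimeMs - minStartTimeMs) / 1000 to a transcript."""
    # Assumption: entries that omit a timing field are simply ignored.
    starts = [e["startTimeMs"] for e in transcript if "startTimeMs" in e]
    ends = [e["endTimeMs"] for e in transcript if "endTimeMs" in e]
    if not starts or not ends:
        return None  # Assumption: no timing data means no duration.
    return (max(ends) - min(starts)) / 1000


# Timings taken from the Request Format sample.
sample = [
    {"startTimeMs": 1000, "endTimeMs": 3500},
    {"startTimeMs": 4000, "endTimeMs": 6200},
    {"startTimeMs": 6200, "endTimeMs": 7000},
    {"startTimeMs": 7000, "endTimeMs": 7500},
    {"startTimeMs": 8000, "endTimeMs": 10500},
]

print(call_duration_seconds(sample))  # 9.5, i.e. (10500 - 1000) / 1000
```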

Content Block

The first emitted utterance includes a vogent_call_transcript content block containing:

The full unfiltered transcript (including action markers)
The call ID
All function calls aggregated from the transcript

Batch Ingestion

Send multiple calls in a single request:

{
  "events": [
    {
      "id": "call-1",
      "conversationId": "session-1",
      "transcript": [...]
    },
    {
      "id": "call-2",
      "conversationId": "session-2",
      "transcript": [...]
    }
  ]
}

Examples

curl https://moda-ingest.modas.workers.dev/v1/ingest/vogent-raw \
  -H "Authorization: Bearer YOUR_MODA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {
        "id": "call-001",
        "conversationId": "session-001",
        "organizationId": "org-456",
        "callType": "phone",
        "transcript": [
          {
            "text": "Hi, I need help with my order.",
            "speaker": "HUMAN",
            "startTimeMs": 1000,
            "endTimeMs": 3000
          },
          {
            "text": "Of course! Can you give me your order number?",
            "speaker": "AI",
            "startTimeMs": 3500,
            "endTimeMs": 6000
          }
        ]
      }
    ]
  }'
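The same request can be built from Python using only the standard library. This is a sketch rather than an official client: `build_request` and `send` are hypothetical helper names, and treating the top-level `environment` field as optional is an assumption based on the curl example omitting it.

```python
import json
import urllib.request

INGEST_URL = "https://moda-ingest.modas.workers.dev/v1/ingest/vogent-raw"


def build_request(api_key, events, environment=None):
    """Build (but do not send) a POST request for the vogent-raw endpoint."""
    body = {"events": events}
    if environment is not None:
        # Assumption: "environment" may be omitted, as in the curl example.
        body["environment"] = environment
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the JSON response body.

    urllib raises urllib.error.HTTPError for 4xx and 5xx statuses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


events = [
    {
        "id": "call-001",
        "conversationId": "session-001",
        "transcript": [
            {"text": "Hi, I need help with my order.", "speaker": "HUMAN",
             "startTimeMs": 1000, "endTimeMs": 3000},
        ],
    }
]
request = build_request("YOUR_MODA_API_KEY", events)
# result = send(request)  # uncomment to perform the network call
```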

Response

Success Response

{
  "success": true,
  "count": 2,
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "details": {
    "calls": 1,
    "utterances": 2,
    "function_calls": 0
  }
}

| Field | Type | Description |
| --- | --- | --- |
| `success` | boolean | Whether the request succeeded |
| `count` | number | Total conversation logs created (utterances + function calls) |
| `requestId` | string | Unique request ID for debugging |
| `details.calls` | number | Number of calls processed |
| `details.utterances` | number | Number of spoken utterances (excluding filtered markers) |
| `details.function_calls` | number | Number of function call entries extracted |

Error Response

{
  "success": false,
  "count": 0,
  "message": "Event 0: Missing or invalid 'id' field",
  "requestId": "550e8400-e29b-41d4-a716-446655440000"
}

Validation

The endpoint validates:

Each event has a non-empty id and conversationId
Each event has a non-empty transcript array
Each transcript entry has a text field
Each transcript entry has a speaker of "AI" or "HUMAN"
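Running the same checks client-side catches a 400 response before the request is sent. The sketch below mirrors the four rules listed above. `validate_events` is a hypothetical helper, and apart from the `id` message shown in the Error Response example, the error message wording is invented for illustration.

```python
def validate_events(events):
    """Return a list of error strings; an empty list means the batch looks valid."""
    errors = []
    for i, event in enumerate(events):
        # Rule 1: non-empty id and conversationId.
        for field in ("id", "conversationId"):
            value = event.get(field)
            if not isinstance(value, str) or not value:
                errors.append(f"Event {i}: Missing or invalid '{field}' field")
        # Rule 2: non-empty transcript array.
        transcript = event.get("transcript")
        if not isinstance(transcript, list) or not transcript:
            errors.append(f"Event {i}: Missing or empty 'transcript' array")
            continue
        for j, entry in enumerate(transcript):
            # Rule 3: text must be present (an empty string is allowed).
            if not isinstance(entry.get("text"), str):
                errors.append(f"Event {i}, entry {j}: Missing 'text' field")
            # Rule 4: speaker must be "AI" or "HUMAN".
            if entry.get("speaker") not in ("AI", "HUMAN"):
                errors.append(f"Event {i}, entry {j}: Invalid 'speaker' value")
    return errors


bad_event = {
    "conversationId": "session-1",
    "transcript": [{"text": "Hello", "speaker": "BOT"}],
}
for problem in validate_events([bad_event]):
    print(problem)
```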

Batch Limits

| Limit | Value |
| --- | --- |
| Max events per request | 1,000 |
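Larger backlogs need to be split across several requests to stay within this limit. A minimal sketch follows, with `chunk_events` as a hypothetical helper name. Each yielded dictionary is a complete request body.

```python
def chunk_events(events, max_events=1000):
    """Yield request bodies that each hold at most max_events events."""
    for start in range(0, len(events), max_events):
        yield {"events": events[start:start + max_events]}


# 2,500 placeholder events split into three request bodies.
backlog = [{"id": f"call-{n}", "conversationId": f"session-{n}", "transcript": []}
           for n in range(2500)]
batches = list(chunk_events(backlog))
print([len(b["events"]) for b in batches])  # [1000, 1000, 500]
```

The placeholder events use empty transcripts only to keep the example short. Real events need a non-empty transcript to pass validation.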

Error Handling

| Status | Meaning | Retryable |
| --- | --- | --- |
| 200 | Success | - |
| 400 | Invalid request format or validation error | No |
| 401 | Invalid or missing API key | No |
| 503 | Service temporarily unavailable | Yes |

For 503 errors, use exponential backoff when retrying. Start with 1 second and double each retry, up to a maximum of 30 seconds.
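The retry schedule described above can be sketched as follows. Both helper names are hypothetical, and the code reads "a maximum of 30 seconds" as a cap on each individual delay, which is an interpretation rather than something the documentation states. `send` is any callable that performs the request and returns the HTTP status code.

```python
import time


def backoff_delays(retries, base=1.0, cap=30.0):
    """Yield delays of 1, 2, 4, ... seconds, each capped at `cap`."""
    for attempt in range(retries):
        yield min(cap, base * 2 ** attempt)


def send_with_retry(send, retries=6, sleep=time.sleep):
    """Call send() and retry only while it returns 503."""
    status = send()
    for delay in backoff_delays(retries):
        if status != 503:
            break  # 200, 400 and 401 are not retried
        sleep(delay)
        status = send()
    return status


print(list(backoff_delays(7)))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Passing `sleep` as a parameter makes the retry loop testable without real waiting. Production code might also add random jitter to the delays, which the documentation does not mention.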
