Pipeline (advanced)

Prefer to drive each step yourself? The granular endpoints expose the full pipeline. Most users should follow "Reframe a video" instead.

Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/api/v2/ingest` | Download a video, returns `video_id` |
| POST | `/api/v2/analyze` | Run a model (`scene_detection`, `face_tracking_yolov8`, `active_speaker`, `transcription`, `diarization`) |
| POST | `/api/v2/transform` | Apply a transform (`reframe`, `reframe_llm`) |
| GET | `/api/v2/status/{video_id}` | Poll the most recent step for a video |
| GET | `/api/v2/download/{video_id}` | Download an output (`?suffix=_reframe`) |

Step order

Run the steps in sequence. After each request, poll `/api/v2/status/{video_id}` until the step reports completed, then start the next:

Bash

# 1. Ingest → returns { video_id }
POST /api/v2/ingest        { "url": "https://example.com/video.mp4" }
# 2. Scene detection
POST /api/v2/analyze       { "video_id": "...", "model_type": "scene_detection" }
# 3. Face tracking
POST /api/v2/analyze       { "video_id": "...", "model_type": "face_tracking_yolov8" }
# 4. Active speaker
POST /api/v2/analyze       { "video_id": "...", "model_type": "active_speaker",
                             "parameters": { "face_tracking_model": "face_tracking_yolov8" } }
# 5. Reframe
POST /api/v2/transform     { "video_id": "...", "transform_type": "reframe",
                             "parameters": { "face_tracking_model": "face_tracking_yolov8" } }
# 6. Download
GET  /api/v2/download/{video_id}?suffix=_reframe
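
The sequence above can be sketched as a small Python driver. This is a minimal illustration, not an official client: `BASE_URL`, the `"status"` response field, the `"completed"` and `"failed"` values, and the helper names are all assumptions, and any authentication your account needs is left out. Check them against real responses before relying on it.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: replace with your real API host


def http_call(method, path, body=None):
    """Send one JSON request with the standard library and return the decoded reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},  # add your auth header here
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_until_completed(video_id, call=http_call, interval=2.0, timeout=600.0):
    """Poll the status endpoint until the most recent step is completed."""
    deadline = time.monotonic() + timeout
    while True:
        status = call("GET", f"/api/v2/status/{video_id}")
        state = status.get("status")  # assumption: state is in a "status" field
        if state == "completed":
            return status
        if state == "failed":  # assumption: failures are reported as "failed"
            raise RuntimeError(f"step failed for {video_id}: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"step for {video_id} did not complete in time")
        time.sleep(interval)


def run_pipeline(url, call=http_call, interval=2.0):
    """Run ingest, the three analyses, and reframe; return the download path."""
    tracker = {"face_tracking_model": "face_tracking_yolov8"}
    video_id = call("POST", "/api/v2/ingest", {"url": url})["video_id"]
    wait_until_completed(video_id, call, interval)
    steps = [
        ("/api/v2/analyze", {"model_type": "scene_detection"}),
        ("/api/v2/analyze", {"model_type": "face_tracking_yolov8"}),
        ("/api/v2/analyze", {"model_type": "active_speaker", "parameters": tracker}),
        ("/api/v2/transform", {"transform_type": "reframe", "parameters": tracker}),
    ]
    for path, payload in steps:
        call("POST", path, {"video_id": video_id, **payload})
        # Never start the next step before the current one has completed.
        wait_until_completed(video_id, call, interval)
    return f"/api/v2/download/{video_id}?suffix=_reframe"
```

Passing the transport in as `call` keeps the step ordering testable without network access: swap in a stub that returns canned responses, or wrap `http_call` to add headers and retries.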

Models & results

Each analysis runs asynchronously. Poll `/api/v2/status/{video_id}` until the step completes; the result is also persisted on the volume. Available models:

| Model | Returns |
| --- | --- |
| `scene_detection` | Scene boundaries (start/end time per scene) |
| `face_tracking_yolov8` | Tracked faces with bounding boxes over time |
| `active_speaker` | Per-track speaking scores (see below) |
| `transcription` | Word- and segment-level transcript |
| `diarization` | Speaker-labelled segments (who spoke when) |

Active speaker results are grouped by face track rather than returned as one flat frame-by-frame list; each track carries its own timestamped scores:

JSON

{
  "model": "active_speaker",
  "video_fps": 25.0,
  "total_frames": 4432,
  "track_count": 2,
  "tracks": [
    {
      "face_id": 1,
      "first_seen": 0.0,
      "last_seen": 12.4,
      "speaking_fraction": 0.62,
      "scores": [
        { "timestamp": 0.0, "frame_number": 0, "score": 0.92, "speaking": true }
      ]
    }
  ]
}
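
As a rough illustration of consuming this structure, the sketch below picks the track with the highest `speaking_fraction` and merges consecutive speaking samples into time spans. It assumes `scores` is ordered by time and holds one entry per sampled frame, which the example above suggests but does not guarantee. The helper names and the sample values are made up for the illustration.

```python
def dominant_track(result):
    """Return the track that speaks for the largest share of its time on screen."""
    return max(result["tracks"], key=lambda t: t["speaking_fraction"])


def speaking_intervals(track):
    """Merge runs of consecutive speaking samples into (start, end) timestamps."""
    intervals = []
    start = prev = None
    for sample in track["scores"]:  # assumption: samples are in time order
        if sample["speaking"]:
            if start is None:
                start = sample["timestamp"]
            prev = sample["timestamp"]
        elif start is not None:
            intervals.append((start, prev))
            start = None
    if start is not None:  # close a run that lasts until the final sample
        intervals.append((start, prev))
    return intervals


# Made-up sample in the same shape as the response above.
sample_result = {
    "model": "active_speaker",
    "tracks": [
        {
            "face_id": 1,
            "speaking_fraction": 0.62,
            "scores": [
                {"timestamp": 0.0, "frame_number": 0, "score": 0.92, "speaking": True},
                {"timestamp": 0.04, "frame_number": 1, "score": 0.88, "speaking": True},
                {"timestamp": 0.08, "frame_number": 2, "score": 0.10, "speaking": False},
                {"timestamp": 0.12, "frame_number": 3, "score": 0.75, "speaking": True},
            ],
        },
        {"face_id": 2, "speaking_fraction": 0.10, "scores": []},
    ],
}

main = dominant_track(sample_result)
print(main["face_id"], speaking_intervals(main))
```

Each interval ends at the timestamp of the last speaking sample in the run, so a single isolated sample yields a zero-length span.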

Billing

With the granular flow you are charged when you download the output: 1 credit per second of source video (rounded down), once per video.
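
To estimate the charge on the client side, the rule above reduces to flooring the source duration and charging each video only once. The function names below are hypothetical, and deriving the duration from `total_frames / video_fps` is an assumption; the service's own accounting is authoritative.

```python
import math


def credits_for(duration_seconds):
    """1 credit per second of source video, rounded down."""
    return math.floor(duration_seconds)


def charge_on_download(video_id, duration_seconds, already_charged):
    """Return the credits charged for this download; repeat downloads cost 0."""
    if video_id in already_charged:
        return 0
    already_charged.add(video_id)
    return credits_for(duration_seconds)


# Example: 4432 frames at 25 fps is 177.28 s of source video.
charged = set()
print(charge_on_download("vid123", 4432 / 25.0, charged))  # first download
print(charge_on_download("vid123", 4432 / 25.0, charged))  # same video again
```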