Pipeline (advanced)
Prefer to drive each step yourself? The granular endpoints expose the full pipeline. Most users should use Reframe a video instead.
Endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v2/ingest | Download a video; returns a video_id |
| POST | /api/v2/analyze | Run a model (scene_detection, face_tracking_yolov8, active_speaker, transcription, diarization) |
| POST | /api/v2/transform | Apply a transform (reframe, reframe_llm) |
| GET | /api/v2/status/{video_id} | Poll the most recent step for a video |
| GET | /api/v2/download/{video_id} | Download an output, selected with the suffix query parameter (e.g. ?suffix=_reframe) |
Step order
Run the steps in sequence. After each step, poll /status/{video_id} until it reports completed before starting the next:
Bash
# 1. Ingest → returns { video_id }
POST /api/v2/ingest { "url": "https://example.com/video.mp4" }
# 2. Scene detection
POST /api/v2/analyze { "video_id": "...", "model_type": "scene_detection" }
# 3. Face tracking
POST /api/v2/analyze { "video_id": "...", "model_type": "face_tracking_yolov8" }
# 4. Active speaker
POST /api/v2/analyze { "video_id": "...", "model_type": "active_speaker",
    "parameters": { "face_tracking_model": "face_tracking_yolov8" } }
# 5. Reframe
POST /api/v2/transform { "video_id": "...", "transform_type": "reframe",
    "parameters": { "face_tracking_model": "face_tracking_yolov8" } }
# 6. Download
GET /api/v2/download/{video_id}?suffix=_reframe
Models & results
Each analysis runs asynchronously. Poll /status/{video_id} until it completes; the result is also persisted on the volume. Available models:
| Model | Returns |
|---|---|
| scene_detection | Scene boundaries (start/end time per scene) |
| face_tracking_yolov8 | Tracked faces with bounding boxes over time |
| active_speaker | Per-track speaking scores (see below) |
| transcription | Word- and segment-level transcript |
| diarization | Speaker-labelled segments (who spoke when) |
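Because every analysis is asynchronous, a client needs some form of polling loop between steps. The following is a minimal sketch, not the service's official client. It assumes the status response is a JSON object with a `status` field that becomes `"completed"` on success and `"failed"` on error; the field name, the `"failed"` value, the `BASE_URL` host, and the absence of authentication are all assumptions for illustration.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your real base URL


def fetch_status(video_id):
    """GET /api/v2/status/{video_id} and decode the JSON body (no auth shown)."""
    with urllib.request.urlopen(f"{BASE_URL}/api/v2/status/{video_id}") as resp:
        return json.load(resp)


def wait_until_completed(video_id, fetch=fetch_status, interval=2.0,
                         timeout=600.0, sleep=time.sleep):
    """Poll until the most recent step is completed, then return the payload.

    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(video_id)
        state = payload.get("status")  # assumed field name
        if state == "completed":
            return payload
        if state == "failed":  # assumed failure value
            raise RuntimeError(f"step failed for video {video_id}: {payload}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} not completed after {timeout}s")
        sleep(interval)
```

Calling `wait_until_completed(video_id)` after each POST in the step order would block until that step finishes, so the next request is only sent once the previous result exists.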
Active speaker results are track-based, not frame-by-frame:
JSON
{
"model": "active_speaker",
"video_fps": 25.0,
"total_frames": 4432,
"track_count": 2,
"tracks": [
{
"face_id": 1,
"first_seen": 0.0,
"last_seen": 12.4,
"speaking_fraction": 0.62,
"scores": [
{ "timestamp": 0.0, "frame_number": 0, "score": 0.92, "speaking": true }
]
}
]
}
Billing
With the granular flow you are charged when you download the output: 1 credit per second of source video (rounded down), once per video.
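To make the rounding rule concrete, here is a small sketch of the arithmetic. The function name is hypothetical, and deriving the duration from `total_frames / video_fps` is an assumption made only to reuse the sample numbers above; the service may measure source duration differently.

```python
import math


def credits_for_download(source_duration_seconds):
    """1 credit per second of source video, rounded down (hypothetical helper)."""
    if source_duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return math.floor(source_duration_seconds)


# Sample figures from the active_speaker result: 4432 frames at 25 fps.
duration = 4432 / 25.0                      # 177.28 seconds
print(credits_for_download(duration))       # 177
```

Since the charge applies once per video, downloading the same output again should not add to this total.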