Pipeline (advanced)

Prefer to drive each step yourself? The granular endpoints expose the full pipeline. Most users should use "Reframe a video" instead.

Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/v2/ingest | Download a video, returns video_id |
| POST | /api/v2/analyze | Run a model (scene_detection, face_tracking_yolov8, active_speaker, transcription, diarization) |
| POST | /api/v2/transform | Apply a transform (reframe, reframe_llm) |
| GET | /api/v2/status/{video_id} | Poll the most recent step for a video |
| GET | /api/v2/download/{video_id} | Download an output (?suffix=_reframe) |

Step order

Run the steps in sequence. After each one, poll /status/{video_id} until it reports completed before starting the next:

Bash
# 1. Ingest → returns { video_id }
POST /api/v2/ingest        { "url": "https://example.com/video.mp4" }
# 2. Scene detection
POST /api/v2/analyze       { "video_id": "...", "model_type": "scene_detection" }
# 3. Face tracking
POST /api/v2/analyze       { "video_id": "...", "model_type": "face_tracking_yolov8" }
# 4. Active speaker
POST /api/v2/analyze       { "video_id": "...", "model_type": "active_speaker",
                             "parameters": { "face_tracking_model": "face_tracking_yolov8" } }
# 5. Reframe
POST /api/v2/transform     { "video_id": "...", "transform_type": "reframe",
                             "parameters": { "face_tracking_model": "face_tracking_yolov8" } }
# 6. Download
GET  /api/v2/download/{video_id}?suffix=_reframe
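
The sequence above can be driven from a short script. The following is a minimal Python sketch, not an official client. It assumes two hypothetical callables supplied by the caller: `post(path, body)`, which sends a JSON POST and returns the decoded response, and `wait(video_id)`, which blocks until /status/{video_id} reports completed. Base URL and authentication are not covered on this page, so they are left to those callables.

```python
from typing import Any, Callable, Dict

Json = Dict[str, Any]

# Face-tracking model name used throughout the documented step order.
FACE_MODEL = "face_tracking_yolov8"


def run_granular_pipeline(
    source_url: str,
    post: Callable[[str, Json], Json],
    wait: Callable[[str], None],
) -> str:
    """Run ingest, the three analyses, and reframe in order.

    Returns the download path for the reframed output. `post` and `wait`
    are hypothetical helpers injected by the caller (see the lead-in).
    """
    video_id = post("/api/v2/ingest", {"url": source_url})["video_id"]
    wait(video_id)

    steps = [
        ("/api/v2/analyze", {"model_type": "scene_detection"}),
        ("/api/v2/analyze", {"model_type": FACE_MODEL}),
        ("/api/v2/analyze", {"model_type": "active_speaker",
                             "parameters": {"face_tracking_model": FACE_MODEL}}),
        ("/api/v2/transform", {"transform_type": "reframe",
                               "parameters": {"face_tracking_model": FACE_MODEL}}),
    ]
    for path, body in steps:
        post(path, {"video_id": video_id, **body})
        wait(video_id)  # do not start a step before the previous one completes

    return f"/api/v2/download/{video_id}?suffix=_reframe"
```

Injecting `post` and `wait` keeps the ordering logic separate from transport details, so the same function works with any HTTP library and can be exercised with stubs.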

Models & results

Each analysis runs asynchronously. Poll /status/{video_id} for the result; it is also persisted on the volume. Available models:

| Model | Returns |
| --- | --- |
| scene_detection | Scene boundaries (start/end time per scene) |
| face_tracking_yolov8 | Tracked faces with bounding boxes over time |
| active_speaker | Per-track speaking scores (see below) |
| transcription | Word- and segment-level transcript |
| diarization | Speaker-labelled segments (who spoke when) |
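
Because every analysis is asynchronous, a client needs a polling loop around /status/{video_id}. The sketch below shows one possible shape in Python. The response format is not documented on this page, so the `status` field and the `failed` value are assumptions, and `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON.

```python
import time
from typing import Any, Callable, Dict


def wait_until_completed(
    video_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the most recent step completes, then return its payload.

    Assumptions (not confirmed by the docs): the payload carries a "status"
    field, "completed" marks success, and "failed" marks a terminal error.
    """
    waited = 0.0
    while True:
        payload = fetch_status(video_id)  # e.g. GET /api/v2/status/{video_id}
        status = payload.get("status")
        if status == "completed":
            return payload
        if status == "failed":
            raise RuntimeError(f"step failed for {video_id}: {payload}")
        if waited >= timeout_s:
            raise TimeoutError(f"{video_id} not completed after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

The timeout counts only the time spent sleeping, which is a simplification; a stricter version could measure wall-clock time with `time.monotonic()`.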

Active speaker results are track-based, not frame-by-frame:

JSON
{
  "model": "active_speaker",
  "video_fps": 25.0,
  "total_frames": 4432,
  "track_count": 2,
  "tracks": [
    {
      "face_id": 1,
      "first_seen": 0.0,
      "last_seen": 12.4,
      "speaking_fraction": 0.62,
      "scores": [
        { "timestamp": 0.0, "frame_number": 0, "score": 0.92, "speaking": true }
      ]
    }
  ]
}
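
Since the result is organised per track, typical post-processing iterates over `tracks` rather than frames. As an illustration, the Python sketch below picks the track with the highest `speaking_fraction` and merges a track's consecutive speaking samples into time spans. The function names are hypothetical, and the code assumes `scores` is sorted by `timestamp`, which the example suggests but does not state.

```python
from typing import Any, Dict, List, Optional, Tuple


def dominant_speaker(result: Dict[str, Any]) -> int:
    """Return the face_id of the track with the highest speaking_fraction."""
    best = max(result["tracks"], key=lambda t: t["speaking_fraction"])
    return best["face_id"]


def speaking_intervals(track: Dict[str, Any]) -> List[Tuple[float, float]]:
    """Merge consecutive speaking samples into (start, end) timestamp spans.

    Assumes track["scores"] is ordered by timestamp.
    """
    spans: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last: Optional[float] = None
    for sample in track["scores"]:
        if sample["speaking"]:
            if start is None:
                start = sample["timestamp"]  # a new speaking run begins
            last = sample["timestamp"]
        elif start is not None:
            spans.append((start, last))  # run ended at the previous sample
            start = None
    if start is not None:
        spans.append((start, last))  # close a run that reaches the end
    return spans
```

A span's end is the timestamp of its last speaking sample, so a single isolated sample yields a zero-length span.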

Billing

With the granular flow you are charged when you download the output: 1 credit per second of source video (rounded down), once per video.
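
To make the rounding concrete, here is a small Python sketch of the charge calculation. The function and its `already_charged` flag are hypothetical, and reading "once per video" as "repeat downloads are not charged again" is an interpretation of the sentence above rather than a documented guarantee.

```python
import math


def credits_for_download(source_duration_s: float, already_charged: bool = False) -> int:
    """Credits charged at download: 1 per whole second of source video."""
    if source_duration_s < 0:
        raise ValueError("duration must be non-negative")
    if already_charged:
        return 0  # assumed meaning of "once per video"
    return math.floor(source_duration_s)  # fractional seconds are dropped
```

For example, a source of 4432 frames at 25 fps lasts 177.28 seconds, so `credits_for_download(4432 / 25)` gives 177.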