Video Processing

When you upload a video to Keep'em, the platform runs an automated processing pipeline that prepares your content for interactive viewing. No manual configuration is required.

The Pipeline

Audio Extraction

The audio track is extracted from your video as 16 kHz mono, a format commonly used for speech recognition. This audio is used for transcription. The original video remains untouched for playback.
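For illustration only, a comparable extraction can be sketched locally with ffmpeg called from Python. This assumes ffmpeg is installed and on your PATH. The function names and file paths are hypothetical, and this is not a description of Keep'em's internal tooling.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono audio from a video."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video (only read, never modified)
        "-vn",             # drop the video stream
        "-ac", "1",        # downmix to a single channel (mono)
        "-ar", "16000",    # resample to 16 kHz
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> Path:
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
    return Path(audio_path)
```

Calling extract_audio("talk.mp4", "talk.wav") would write the audio next to the source file and leave talk.mp4 unchanged.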

Transcription

The extracted audio is transcribed to text with precise timestamp alignment. The transcription captures every spoken word and maps it to exact positions in the video.

This transcript serves multiple purposes: it's the primary knowledge source for AI chat, it generates subtitles, and it's used for chapter detection.

Chapter Generation

Keep'em analyzes the transcript for natural topic transitions and generates chapters automatically. Each chapter gets a title and a start timestamp.

Chapters appear in the video player as navigable markers, letting viewers jump to specific sections. They're especially valuable for longer videos where viewers might want to revisit a particular topic.

You can edit generated chapters, add your own, or remove ones that aren't useful.
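Because each chapter is just a title plus a start timestamp, finding the chapter for any playback position is a simple lookup. The sketch below shows one way a player could do this. The Chapter class, the chapter_at function, and the sample chapter titles are hypothetical and are not part of a Keep'em API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chapter:
    title: str
    start: float  # seconds from the beginning of the video


def chapter_at(chapters: list[Chapter], position: float) -> Optional[Chapter]:
    """Return the chapter containing `position`, or None before the first one.

    Assumes `chapters` is sorted by start time.
    """
    starts = [chapter.start for chapter in chapters]
    index = bisect_right(starts, position) - 1
    return chapters[index] if index >= 0 else None


# Made-up chapters for a demo video.
chapters = [
    Chapter("Introduction", 0.0),
    Chapter("Setting up payments", 95.0),
    Chapter("Q&A", 610.0),
]
current = chapter_at(chapters, 120.0)  # the "Setting up payments" chapter
```

A chapter runs until the next chapter's start, so no end timestamp needs to be stored.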

Subtitle Generation

WebVTT subtitle files are created from the transcript. These display as text overlays in the video player and improve accessibility for viewers who are deaf or hard of hearing, watching in noisy environments, or viewing in a language they're still learning.

Subtitles also improve SEO — search engines can index the full text of your video content.
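To show what a WebVTT file looks like, here is a minimal sketch that renders timestamped transcript lines as WebVTT cues. The function names and the (start, end, text) cue shape are assumptions made for the example. Keep'em generates these files for you, so you do not need to run anything like this yourself.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as a WebVTT document."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


vtt = to_webvtt([(0.0, 2.5, "Welcome."), (2.5, 6.0, "Let's get started.")])
```

Each cue is a time range followed by the text to display, which is why the transcript's timestamps carry over directly to subtitles.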

Chunking and Embedding

The transcript is split into chunks of approximately 30 seconds each. Each chunk retains its start and end timestamps. These chunks are then converted into vector embeddings, which are mathematical representations of meaning.
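One simple way to produce roughly 30-second pieces is to merge consecutive timestamped transcript lines until the target duration is reached. The sketch below does that greedily. The Segment class and the greedy strategy are assumptions for illustration, since the exact splitting rules Keep'em uses are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def _merge(parts: list[Segment]) -> Segment:
    """Combine consecutive segments, keeping the outer timestamps."""
    return Segment(parts[0].start, parts[-1].end, " ".join(p.text for p in parts))


def chunk_segments(segments: list[Segment], target_seconds: float = 30.0) -> list[Segment]:
    """Greedily merge consecutive segments into chunks of about `target_seconds`."""
    chunks: list[Segment] = []
    current: list[Segment] = []
    for segment in segments:
        current.append(segment)
        # Close the chunk once it spans at least the target duration.
        if current[-1].end - current[0].start >= target_seconds:
            chunks.append(_merge(current))
            current = []
    if current:  # keep the shorter trailing chunk
        chunks.append(_merge(current))
    return chunks
```

Because every chunk keeps its start and end timestamps, an answer drawn from a chunk can always be traced back to a position in the video.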

When a viewer asks a question, the AI converts their question into the same kind of embedding and searches for the most similar chunks. This semantic search understands meaning, not just keywords. A viewer asking "how do I set up payments?" will match transcript segments about Stripe integration even if the word "payments" never appears.
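The matching step can be sketched as a cosine-similarity ranking. In this toy example the three-dimensional vectors are written by hand to stand in for real embedding-model output, and the function names and sample chunk texts are hypothetical. Which embedding model and similarity measure Keep'em uses is not stated here, so treat cosine similarity as an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_chunks(query_vec: list[float], indexed_chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hand-written vectors standing in for real embeddings (illustrative only).
index = [
    {"start": 0.0, "end": 30.0, "text": "Welcome and agenda", "embedding": [0.9, 0.1, 0.0]},
    {"start": 30.0, "end": 60.0, "text": "Connecting your Stripe account", "embedding": [0.1, 0.9, 0.2]},
    {"start": 60.0, "end": 90.0, "text": "Customizing the player theme", "embedding": [0.0, 0.2, 0.9]},
]
query = [0.2, 0.8, 0.1]  # pretend embedding of "how do I set up payments?"
best = top_chunks(query, index, k=1)[0]
```

Here the best match is the Stripe chunk even though its text shares no words with the question, because the comparison happens between vectors rather than keywords.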

Processing Time

Processing time depends on video length. As a rough guide:

A 10-minute video typically processes in 2–3 minutes
A 30-minute video typically processes in 5–8 minutes
A 60-minute video typically processes in 10–15 minutes

You can continue configuring your event while processing completes. The event becomes viewable once processing reaches the "Ready" status.

Supported Formats

Keep'em accepts most common video formats including MP4, MOV, WebM, and MKV. For best results, upload in MP4 format with H.264 video and AAC audio.
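If your source file is in another format, you can convert it to the recommended combination before uploading. The sketch below builds an ffmpeg command for that, assuming ffmpeg is installed with the libx264 encoder. The function name and file paths are hypothetical, and the encoder settings are left at ffmpeg's defaults.

```python
def build_transcode_command(source_path: str, output_path: str) -> list[str]:
    """Build an ffmpeg command that re-encodes to H.264 video and AAC audio."""
    return [
        "ffmpeg",
        "-i", source_path,
        "-c:v", "libx264",  # H.264 video encoder
        "-c:a", "aac",      # AAC audio encoder
        output_path,        # use a .mp4 extension to get an MP4 container
    ]


command = build_transcode_command("webinar.mkv", "webinar.mp4")
# To run it: subprocess.run(command, check=True)
```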

Maximum upload size depends on your plan's storage allocation.
