How It Works
Keep'em transforms a standard video file into a fully interactive experience through an automated processing pipeline. Here's what happens at each stage.
1. Upload Your Video
You upload a video file through the dashboard or API. Keep'em accepts most common video formats and handles all transcoding internally.
2. Automatic Processing
Once uploaded, the platform processes your video through several steps — all automatic, no configuration required:
- Audio extraction: The audio track is separated from the video for transcription.
- Transcription: The full audio is transcribed to text with precise timestamps. This transcript becomes the foundation of the AI's knowledge about your video.
- Chapter generation: The system analyzes the transcript for natural topic changes and generates chapters. These appear as navigation markers in the video player, letting viewers jump to specific sections.
- Subtitle generation: WebVTT subtitle files are created for accessibility and SEO. Subtitles appear as an overlay on the video player.
- Chunking and embedding: The transcript is split into segments (roughly 30 seconds each) and converted into vector embeddings. These embeddings let the AI search your content semantically, by meaning rather than by keyword match.
This processing typically completes within a few minutes, depending on video length.
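To make the subtitle and chunking steps concrete, here is a minimal sketch in Python. It is not Keep'em's actual implementation: the `Segment` shape, the function names, and the greedy merge strategy are all assumptions chosen for illustration. Only the WebVTT cue layout and the roughly 30-second target come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped piece of transcript (hypothetical shape)."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp: hh:mm:ss.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments: list[Segment]) -> str:
    """Render segments as WebVTT text, one cue per segment."""
    cues = [
        f"{vtt_timestamp(s.start)} --> {vtt_timestamp(s.end)}\n{s.text}"
        for s in segments
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


def _merge(group: list[Segment]) -> Segment:
    """Combine consecutive segments into a single chunk."""
    return Segment(group[0].start, group[-1].end, " ".join(s.text for s in group))


def chunk_transcript(segments: list[Segment], target_seconds: float = 30.0) -> list[Segment]:
    """Greedily merge consecutive segments until a chunk spans about target_seconds."""
    chunks: list[Segment] = []
    current: list[Segment] = []
    for seg in segments:
        current.append(seg)
        if seg.end - current[0].start >= target_seconds:
            chunks.append(_merge(current))
            current = []
    if current:  # keep the shorter tail as its own chunk
        chunks.append(_merge(current))
    return chunks
```

Each resulting chunk would then be passed to an embedding model. That step is omitted here because the model Keep'em uses is not specified.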
3. Add Your Knowledge Base (Optional)
You can upload additional documents — PDFs, help articles, URLs, or plain text — to expand the AI's knowledge beyond the video transcript.
Documents go through a similar pipeline: they're chunked into logical sections, embedded as vectors, and stored alongside your video content. When a viewer asks a question, the AI searches both the transcript and your documents for the best answer.
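As a rough illustration of chunking a document into logical sections, the sketch below splits plain text on blank lines and packs paragraphs together up to a size limit. The function name, the paragraph-based rule, and the `max_chars` default are assumptions for illustration. How Keep'em actually finds section boundaries is not specified here.

```python
def chunk_document(text: str, max_chars: int = 800) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be embedded and stored the same way as a transcript chunk, so that one search can cover both sources.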
4. Viewers Watch and Interact
When a viewer accesses your event (via a hosted page or embedded widget), they see a video player with an integrated chat panel. As the video plays:
- Suggested questions appear based on the current video position, prompting engagement.
- Chapters let viewers navigate to topics they care about.
- The chat panel is always available for questions.
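One plausible way to tie suggested questions to the playback position is to look up the chapter that contains the current timestamp. The sketch below assumes questions are stored per chapter, which is not stated above. The function name and the sample data are hypothetical.

```python
from bisect import bisect_right


def suggested_questions(
    position_seconds: float,
    chapter_starts: list[float],
    chapter_questions: list[list[str]],
) -> list[str]:
    """Return the questions for the chapter containing position_seconds.

    chapter_starts must be sorted ascending and aligned with chapter_questions.
    """
    # bisect_right gives the number of chapters that start at or before the position.
    index = max(bisect_right(chapter_starts, position_seconds) - 1, 0)
    return chapter_questions[index]


# Hypothetical example data: three chapters and one question each.
starts = [0.0, 95.0, 240.0]
questions = [
    ["What problem does this solve?"],
    ["How do I set this up?"],
    ["What does pricing look like?"],
]
current = suggested_questions(100.0, starts, questions)
```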
When a viewer asks a question, the AI retrieves the most relevant chunks from your transcript and documents, considers the viewer's current position in the video, and generates a context-aware response. This typically takes 2–5 seconds.
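The retrieval step can be sketched as a similarity search over transcript and document chunks, with a small bonus for transcript chunks near the viewer's position. This is an assumption-laden illustration: the `Chunk` shape, the linear proximity bonus, and the `position_weight` and `window` values are invented here. How Keep'em actually weighs playback position is not specified.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    start: Optional[float] = None  # seconds into the video; None for document chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_embedding: list[float],
    chunks: list[Chunk],
    position: float,
    top_k: int = 3,
    position_weight: float = 0.1,  # assumed value
    window: float = 120.0,         # assumed value, in seconds
) -> list[Chunk]:
    """Rank chunks by similarity, nudging up transcript chunks near the position."""

    def score(chunk: Chunk) -> float:
        s = cosine(query_embedding, chunk.embedding)
        if chunk.start is not None:
            distance = abs(chunk.start - position)
            # Bonus fades linearly to zero at `window` seconds away.
            s += position_weight * max(0.0, 1.0 - distance / window)
        return s

    return sorted(chunks, key=score, reverse=True)[:top_k]
```

The top-ranked chunks would then be passed to the language model as context for the answer.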
5. Human Escalation (When Needed)
If the AI isn't confident in its answer, or if the viewer explicitly asks for a human, the question gets routed to your team. You receive a notification in Slack or via email with full context: what the viewer was watching, their exact question, and what the AI already attempted.
You respond from wherever you are, and the viewer sees a seamless continuation of their chat.
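The escalation rule described above could look something like the following sketch. The confidence threshold, the phrase list for detecting a request for a human, and the `Escalation` record are all hypothetical. They only mirror the two triggers and the three pieces of context mentioned in this section.

```python
import re
from dataclasses import dataclass
from typing import Union

CONFIDENCE_THRESHOLD = 0.6  # assumed value, for illustration only

# Assumed phrases that signal an explicit request for a person.
HUMAN_REQUEST = re.compile(
    r"\b(human|real person|agent|someone from your team)\b", re.IGNORECASE
)


@dataclass
class Escalation:
    """Context handed to the team when the AI steps aside."""
    question: str          # the viewer's exact question
    video_position: float  # what the viewer was watching, in seconds
    ai_attempt: str        # what the AI already tried to answer


def route(
    question: str, ai_answer: str, confidence: float, video_position: float
) -> Union[str, Escalation]:
    """Return the AI answer, or an Escalation if a human should take over."""
    asked_for_human = HUMAN_REQUEST.search(question) is not None
    if confidence < CONFIDENCE_THRESHOLD or asked_for_human:
        return Escalation(question, video_position, ai_answer)
    return ai_answer
```

In a real deployment the `Escalation` record would be turned into a Slack or email notification. That delivery step is left out of this sketch.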
6. Analytics Tell You What to Improve
As viewers interact with your content, Keep'em tracks how they engage: where they drop off, which questions come up most often, and what the AI handles confidently versus what needs human help.
This data closes the loop. If everyone drops off at the six-minute mark, something's wrong with that section. If the same question keeps coming up, add it to the video or improve your documentation. Your content gets better because you understand how people experience it.
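As a simple illustration of the drop-off analysis, the sketch below buckets each viewer's last watched position by minute and reports the minute where the most viewers left. The input format and function names are assumptions. Keep'em's actual analytics data model is not described here.

```python
from collections import Counter
from typing import Optional


def drop_offs_by_minute(last_positions: list[float]) -> Counter:
    """Count how many viewers stopped watching within each minute of the video."""
    return Counter(int(position // 60) for position in last_positions)


def worst_minute(last_positions: list[float]) -> Optional[tuple[int, int]]:
    """Return (minute, viewers_lost) for the biggest drop-off, or None if no data."""
    counts = drop_offs_by_minute(last_positions)
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical sample: last watched position, in seconds, for five viewers.
sample = [370.0, 365.0, 380.0, 30.0, 900.0]
result = worst_minute(sample)
```

With this sample, three of the five viewers stop during minute six, which is the kind of signal that points to a section worth revisiting.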