Omni Describer - User Guide

Giving a Voice to the Visual World with AI.

It all started with my love for movies. When I realized how many details in my favorite scenes were lost without good audio description, an idea took hold: "Couldn't AI make this easier for us?" I dreamed of a tool that would not only generate descriptions but also give the user full control over them. After months of intense work, countless trials, and many technical hurdles, that dream became Omni Describer.

What's in a Name?
System Requirements
Getting Started: Setting Up Your API Keys
Quick Start: Generating Your First Description
Main Features
A Deep Dive into Advanced Settings
Tips and Tricks for the Best Results
Frequently Asked Questions (FAQ)
Keyboard Shortcuts
Acknowledgments, Contact, and Contributors

What's in a Name?

The "Omni" in the name comes from the Latin omnis, meaning "all" or "every." I chose this name because I didn't want the tool to serve just one purpose. Yes, Omni Describer's primary aim is to make media accessible to blind and visually impaired people by creating audio descriptions. But its purpose doesn't end there.

This is also an exploration tool. A film critic, a student, an artist, or anyone curious about visual details can use features like "Scene Explorer" or "Ask More" to delve into the layers of a video like never before. Omni Describer is a window to see the world through the "eyes" of AI and understand it differently. In short, it is "a describer for everything, for everyone."

System Requirements

For the best performance, your system should meet at least the following requirements:

Operating System: Windows 10 or newer (64-bit).
Memory (RAM): At least 4 GB.
Storage: At least 500 MB of free disk space for the application and temporary files.
Internet Connection: An active internet connection is required to connect to AI services (Google Gemini, OpenAI) and download videos.
Screen Reader: For full accessibility, a screen reader like JAWS, NVDA, or Windows Narrator is recommended.

Getting Started: Setting Up Your API Keys

Omni Describer uses cloud-based AI services to analyze videos and voice the descriptions. Therefore, you need to enter your own API keys before you can start.

Open Settings: Go to the File menu and select Settings... (or press Ctrl + ,).
AI Settings Tab:
- Gemini API Key: This is mandatory for video analysis. Paste your key into the "Gemini API Key:" field. You can get a free API key from Google AI Studio.
- OpenAI API Key (for TTS): This is required for high-quality text-to-speech. Paste your key into this field. You can still use the built-in Windows voices without this key, but OpenAI is recommended for the best results. You can get a key from the OpenAI Platform.
Save: Click Apply or OK to save your settings. You're now ready to go!

Please Note: Your API keys are stored securely on your computer in the application's settings file and are sent only to the respective AI services when connecting to them.

Quick Start: Generating Your First Description

Let's get started! Just follow these simple steps:

Choose a Video: Click a button like "Local Video File" on the main window or select your video source from the File menu.
Select a Prompt (Optional): The dropdown menu lists pre-made instructions that guide the AI. For your first try, "Standard Description" is a great starting point.
Start Processing: The application will now begin analyzing your video. You can follow the progress in the "Status Log" at the bottom of the window. This may take a few minutes, depending on the length of the video.

When the process is complete, the Described Video Player will open automatically, and you can start enjoying your newly described video!

Main Features

The Described Video Player

This is your personal movie theater, with descriptions built in. As the video plays normally, your screen reader (such as JAWS or NVDA) reads the generated audio descriptions aloud at the right moments.

Playback Controls: Use the Play/Pause, Rewind/Forward buttons, or the seek slider to navigate the video.
Current Audio Description Area: You can follow the text of the currently active description here.
Edit Descriptions: If a description is inaccurate, poorly timed, or you want to remove it, click the "Edit Descriptions..." button to easily correct or delete it.
AI Token Usage: This area shows you how many AI "tokens" were used during the process, helping you keep track of your API usage.

Managing Prompt Presets

Prompts are powerful instructions that determine what the AI focuses on. By changing the prompt, you can get descriptions in vastly different styles.

Selecting a Preset: Before processing a video, choose a preset from the dropdown menu on the main window.
Managing Presets: Go to File -> Manage Prompt Presets.... Here you can Add, Edit, or Delete your own custom prompts. This is perfect for saving instructions you use frequently.
Language-Specific: Your prompt presets are saved separately for each language you select in the Settings.

Ask More About the Scene

Ever wonder what a character is holding or what a sign in the background says? This feature lets you ask anything that comes to mind about the scene.

Pause the video at the moment you're curious about.
Click the Ask More... button.
Type your question in the "Your New Question:" field (e.g., "What color is the woman's hat?" or "What does the writing on the wall say?").
Select how many seconds of video the AI should analyze, starting from the current playback position.
Click "Submit Question." The AI's answer will appear in the "Conversation History" area.

Scene Explorer

Scene Explorer is an interactive way to understand the spatial layout of a scene. It puts you in a virtual room that you can navigate with your keyboard.

Pause the video on a scene you want to explore in detail.
Click the Explore Scene... button, then click "Analyze Scene".

You are now in the Scene Explorer. Use your keyboard to explore:

Arrow Keys: Move your virtual position on a grid.
D: Provides a detailed description of the overall scene layout.
L: Announces a list of all objects in the scene.
Shift + L: Switch to "Jump Mode" to select an object and go directly to it.
Enter: Get a detailed description of the nearest object.
Escape: Close the Scene Explorer.
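To picture what "nearest object" means on the grid, here is a small Python sketch. It assumes, purely for illustration, that each detected object sits on an integer (x, y) cell and that "nearest" means straight-line distance from your current position. The names `SceneObject`, `nearest_object`, and `move` are hypothetical and are not part of Omni Describer.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    # Hypothetical representation: an object label plus its grid cell.
    label: str
    x: int
    y: int


def nearest_object(objects, pos_x, pos_y):
    """Return the object closest to (pos_x, pos_y), or None if there are none.

    Distance is Euclidean; on a tie, the object listed first wins.
    """
    if not objects:
        return None
    return min(objects, key=lambda o: math.hypot(o.x - pos_x, o.y - pos_y))


def move(pos_x, pos_y, key, width, height):
    """Apply one arrow-key step, clamping the position to the grid edges."""
    dx, dy = {"Up": (0, -1), "Down": (0, 1), "Left": (-1, 0), "Right": (1, 0)}[key]
    new_x = min(max(pos_x + dx, 0), width - 1)
    new_y = min(max(pos_y + dy, 0), height - 1)
    return new_x, new_y


if __name__ == "__main__":
    scene = [SceneObject("lamp", 0, 0), SceneObject("sofa", 4, 3), SceneObject("door", 9, 5)]
    x, y = move(2, 2, "Right", width=10, height=6)  # one step right
    print(nearest_object(scene, x, y).label)
```

Here one step right from (2, 2) lands on (3, 2), where the sofa is the closest object. The real explorer may use a different layout or distance rule.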

Exporting Your Work

Once you're happy with your descriptions, you can export them from the Described Video Player in several formats:

Export to .TXT: A simple text file with timestamps.
Export to .SRT: A standard subtitle file you can use in video players like VLC.
Export Audio (MP3): This is perhaps the most exciting feature. It voices your descriptions with the voice you selected in Settings (SAPI5 or OpenAI), mixes them with the original video's audio, and automatically lowers the background sound while each description plays, producing an MP3 file you can listen to anywhere.
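For readers curious about the .SRT layout itself: each entry is a sequence number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the text, and a blank line. The Python sketch below writes timestamped descriptions in that layout. The input shape, a list of (start_seconds, end_seconds, text) tuples, is an assumption for illustration and not Omni Describer's internal data format.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(descriptions):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Entries are sorted by start time and numbered from 1; a blank line
    separates consecutive entries.
    """
    blocks = []
    for index, (start, end, text) in enumerate(sorted(descriptions), start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    example = [
        (65.25, 68.0, "Rain streaks the window."),
        (5.0, 8.5, "A woman opens the door."),
    ]
    print(to_srt(example))
```

Because the entries are sorted first, descriptions edited out of order still come out numbered in playback order.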

A Deep Dive into Advanced Settings

The Settings window (Ctrl + ,) gives you fine-grained control over Omni Describer's behavior.

AI Settings Tab

Frame Rate for AI Analysis: Determines how many frames per second are sent to the AI. A lower value (e.g., 5 FPS) can reduce API costs but increases the chance of missing very fast actions.
Send Video Only (No Audio) to AI: This is a useful option to prevent audio from the video (dialogue, music, sound effects) from confusing the AI. Instead of trying to describe an explosion it hears, the AI will focus only on what's happening visually.
Disable Safety Filters (Use with caution): This option may allow the AI to process content and generate descriptions that it might normally flag as sensitive. However, this is not an absolute override. The output is still subject to Google's core safety policies, and there is no guarantee that all filters will be bypassed. Please remember that you are responsible for how you use this feature.
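The trade-off behind Frame Rate for AI Analysis is simple arithmetic: the number of frames sent grows with both the video length and the chosen FPS, while the gap between sampled frames is 1 / FPS. The Python sketch below works this out. It counts frames only and deliberately does not estimate tokens or prices, which depend on the AI service; the function names are hypothetical, and Omni Describer may sample or round slightly differently.

```python
def frames_sent(duration_seconds, fps):
    """Number of frames sampled from a video of the given length at `fps`."""
    if duration_seconds < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps must be > 0")
    return int(duration_seconds * fps)


def frame_interval(fps):
    """Seconds between two consecutive sampled frames."""
    if fps <= 0:
        raise ValueError("fps must be > 0")
    return 1 / fps


if __name__ == "__main__":
    two_hour_film = 2 * 60 * 60  # 7,200 seconds
    for fps in (1, 5, 10):
        frames = frames_sent(two_hour_film, fps)
        print(f"{fps:>2} FPS: {frames:,} frames, one every {frame_interval(fps):.1f} s")
```

For a two-hour film this gives 7,200, 36,000, and 72,000 frames. Dropping from 10 FPS to 5 FPS halves the frames sent, but it also doubles the gap between samples from 0.1 s to 0.2 s, so an action shorter than that gap can fall entirely between two frames.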

Audio Output Tab

Text-to-Speech Engine:
- SAPI5 (Windows Built-in): Uses SAPI5-compliant voices that come with Windows or that you have installed. It does not require an additional API key. The audio quality will vary depending on the quality of the voices installed on your system.
- OpenAI TTS (High Quality): Generally produces more natural and fluent-sounding voices. Using this option requires an OpenAI API key and a payment method associated with your account.

Tips and Tricks for the Best Results

Creating great audio descriptions is an art. While AI is an effective assistant in this art, you'll get the best results when you guide it correctly.

The Power of Prompts: Your Director's Notes

The application has a set of core rules it teaches the AI (like not talking over dialogue). Think of the Prompt Preset area on the main screen as the place where you provide your director's notes for that specific video. A good note helps the AI focus on a particular style or detail, while a vague one can lead to unexpected results.

When (and How) to Use a Prompt

Much of the time, the AI can produce excellent results with no special prompt, relying only on its core rules. I recommend using this feature only when you have a specific goal in mind.

Tip #1: The "Focus on Names" Prompt
In a video with many characters where names are important, the AI can sometimes be too hesitant to use a name. To prioritize name tracking, you can create a custom prompt:

For this video, your highest priority is to identify and use the correct character names as soon as they are spoken in the dialogue. This is more important than being overly concise. While focusing on this, try to adhere to all other system rules as best you can.

Tip #2: The "Describe the Atmosphere" Prompt
In visually rich films where the atmosphere is key, you can guide the AI to focus on the environment:

Focus on describing the setting, atmosphere, and environmental details. To create a rich visual world, mention the lighting, colors, and the overall mood of the scene. Focus less on minor character movements unless they are critical.

What to Avoid in Prompts

For best results, it's important to avoid instructions that contradict the AI's core principles. Since the AI always tries to follow instructions, giving it a flawed one can cause it to misinterpret the video.

Bad Prompt Example: "Tell me everything that happens." This can make the AI indiscriminate, causing it to report unimportant details like "(character is talking)" instead of meaningful action.
Bad Prompt Example: "Tell me what the characters are saying." This can cause it to violate the "describe only visuals" rule and describe dialogue, like "the character said to take this."

In short: Use prompts not to change the fundamental rules of good audio description, but to guide the AI on a specific focus.

Frequently Asked Questions (FAQ)

Q: Are my API keys secure?
A: Yes. Your keys are stored only on your computer and are never shared with anyone except to connect to the Google/OpenAI services.

Q: Why does generating descriptions take so long?
A: The time depends on the length of your video, your internet speed (for uploading the video to the AI), the frame rate you've selected, and the current load on the AI services. Using the "Enable Video Chunking" feature is highly recommended for long videos.
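This guide does not describe how chunking works internally. As a general illustration of the idea (an assumption, not Omni Describer's code), chunking splits a long video into consecutive, shorter time ranges that can be uploaded and analyzed one at a time. The Python sketch below computes such ranges; `chunk_boundaries` and the 600-second chunk length in the example are hypothetical.

```python
def chunk_boundaries(duration_seconds, chunk_seconds):
    """Split [0, duration_seconds) into consecutive (start, end) ranges.

    Every range is chunk_seconds long except possibly the last one,
    which ends exactly at duration_seconds.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + chunk_seconds, duration_seconds)
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    for start, end in chunk_boundaries(1500, 600):  # a 25-minute video
        print(f"{start:.0f} s to {end:.0f} s")
```

For a 25-minute video (1,500 seconds) with 600-second chunks, this yields three ranges: 0 to 600, 600 to 1,200, and 1,200 to 1,500 seconds.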

Q: Why didn't the AI describe something I saw on screen?
A: The AI is instructed to prefer silence over making a mistake or talking over dialogue. You can use the "Ask More..." feature to ask about specific moments, or select the "Detailed" verbosity level in Settings.

Keyboard Shortcuts

Ctrl + O: Open Local Video
Ctrl + U: Open from Direct URL
Ctrl + Y: Open from YouTube
Ctrl + ,: Open Settings
F1: View Help

Acknowledgments, Contact, and Contributors

Thank you so much for using Omni Describer! This application reflects my desire to make visual media more accessible and enjoyable for everyone. Knowing that people like you use this tool and share feedback is my greatest motivation to keep developing it.

Feedback and Support

Do you have a question, a bug report, or an idea for a new feature? I would love to hear from you! The best way to reach me is by email. Your feedback is invaluable for making Omni Describer even better.

Email: info [at] audioses [dot] com. (Please replace '[at]' with '@' and '[dot]' with '.' when sending.)
Website: audioses.com
