Omni Describer User Guide
Giving a Voice to the Visual World with AI.
It all started with my love for movies. When I realized how many details in my favorite scenes were lost without good audio description, an idea sparked: "Well, couldn't AI make this easier for us?" I dreamed of a tool that wouldn't just generate descriptions but would also give full control to the user. After months of intense work, countless trials, and overcoming many technical hurdles, I developed Omni Describer as the product of that dream.
Table of Contents
- What's in a Name?
- System Requirements
- Getting Started: Setting Up Your API Keys
- Quick Start: Generating Your First Description
- Main Features
- A Deep Dive into Advanced Settings
- Tips and Tricks for the Best Results
- Frequently Asked Questions (FAQ)
- Keyboard Shortcuts
- Acknowledgments, Contact, and Contributors
What's in a Name?
The "Omni" in the name comes from Latin, meaning "all" or "everything." I chose this name because I didn't want the tool to serve just one purpose. Yes, Omni Describer primarily aims to make media accessible for blind and visually impaired individuals by creating audio descriptions. However, its purpose is not limited to that.
This is also an exploration tool. A film critic, a student, an artist, or anyone curious about visual details can use features like "Scene Explorer" or "Ask More" to delve into the layers of a video like never before. Omni Describer is a window to see the world through the "eyes" of AI and understand it differently. In short, it is "a describer for everything, for everyone."
System Requirements
For the best performance, I recommend a system that meets at least the following requirements:
- Operating System: Windows 10 or newer (64-bit).
- Memory (RAM): At least 4 GB of RAM.
- Storage: At least 500 MB of free disk space for the application and temporary files.
- Internet Connection: An active internet connection is required to connect to AI services (Google Gemini, OpenAI) and download videos.
- Screen Reader: For full accessibility, a screen reader like JAWS, NVDA, or Windows Narrator is recommended.
Getting Started: Setting Up Your API Keys
Omni Describer uses cloud-based AI services to analyze videos and to voice the descriptions, so you need to enter your own API keys before you can start.
- Open Settings: Go to the File menu and select Settings... (or press Ctrl + ,).
- AI Settings Tab:
- Gemini API Key: This is mandatory for video analysis. Paste your key into the "Gemini API Key:" field. You can get a free API key from Google AI Studio.
- OpenAI API Key (for TTS): This is required for high-quality text-to-speech. Paste your key into this field. You can still use the built-in Windows voices without this key, but OpenAI is recommended for the best results. You can get a key from the OpenAI Platform.
- Save: Click Apply or OK to save your settings. You're now ready to go!
Quick Start: Generating Your First Description
Let's get started! Just follow these simple steps:
- Choose a Video: Click a button like "Local Video File" on the main window or select your video source from the File menu.
- Select a Prompt (Optional): The dropdown menu lists pre-made instructions that guide the AI. For your first try, "Standard Description" is a great starting point.
- Start Processing: The application will now begin analyzing your video. You can follow the progress in the "Status Log" at the bottom of the window. This may take a few minutes, depending on the length of the video.
When the process is complete, the Described Video Player will open automatically, and you can start enjoying your newly described video!
Main Features
The Described Video Player
This is your personal, described movie theater. As the video plays normally, your installed screen reader (like JAWS or NVDA) will read the generated audio descriptions at the correct moments.
- Playback Controls: Use the Play/Pause, Rewind/Forward buttons, or the seek slider to navigate the video.
- Current Audio Description Area: You can follow the text of the currently active description here.
- Edit Descriptions: If a description is inaccurate, poorly timed, or you want to remove it, click the "Edit Descriptions..." button to easily correct or delete it.
- AI Token Usage: This area shows you how many AI "tokens" were used during the process, helping you keep track of your API usage.
Managing Prompt Presets
Prompts are powerful instructions that determine what the AI focuses on. By changing the prompt, you can get descriptions in vastly different styles.
- Selecting a Preset: Before processing a video, choose a preset from the dropdown menu on the main window.
- Managing Presets: Go to File -> Manage Prompt Presets.... Here you can Add, Edit, or Delete your own custom prompts. This is perfect for saving instructions you use frequently.
- Language-Specific: Your prompt presets are saved separately for each language you select in the Settings.
Ask More About the Scene
Ever wonder what a character is holding or what a sign in the background says? This feature lets you ask anything that comes to mind about the scene.
- Pause the video at the moment you're curious about.
- Click the Ask More... button.
- Type your question in the "Your New Question:" field (e.g., "What color is the woman's hat?" or "What does the writing on the wall say?").
- Select how many seconds of video the AI should analyze, starting from the current playback position.
- Click "Submit Question." The AI's answer will appear in the "Conversation History" area.
Scene Explorer
Scene Explorer is an interactive way to understand the spatial layout of a scene. It puts you in a virtual room that you can navigate with your keyboard.
- Pause the video on a scene you want to explore in detail.
- Click the Explore Scene... button, then click "Analyze Scene".
You are now in the Scene Explorer. Use your keyboard to explore:
- Arrow Keys: Move your virtual position on a grid.
- D: Provides a detailed description of the overall scene layout.
- L: Announces a list of all objects in the scene.
- Shift + L: Switch to "Jump Mode" to select an object and go directly to it.
- Enter: Get a detailed description of the nearest object.
- Escape: Close the Scene Explorer.
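The navigation model above can be pictured as a small grid with a cursor and a list of labeled objects. The sketch below is only an illustration, not Omni Describer's actual implementation: the class names, the grid size, and the use of straight-line distance to decide which object is "nearest" are all assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """Hypothetical record for one object the AI found in the scene."""
    name: str
    x: int
    y: int


@dataclass
class SceneGrid:
    """Hypothetical model of a scene laid out on a width x height grid."""
    width: int
    height: int
    objects: list = field(default_factory=list)
    x: int = 0  # current virtual position (column)
    y: int = 0  # current virtual position (row)

    def move(self, dx: int, dy: int) -> None:
        # Arrow keys: step by (dx, dy), clamped so the cursor stays on the grid.
        self.x = max(0, min(self.width - 1, self.x + dx))
        self.y = max(0, min(self.height - 1, self.y + dy))

    def list_objects(self) -> list:
        # "L" key: names of all objects in the scene.
        return [obj.name for obj in self.objects]

    def nearest_object(self):
        # "Enter" key: the object closest to the cursor (assumed Euclidean distance).
        if not self.objects:
            return None
        return min(
            self.objects,
            key=lambda o: math.hypot(o.x - self.x, o.y - self.y),
        )

    def jump_to(self, name: str) -> bool:
        # "Jump Mode": move the cursor directly onto the named object.
        for obj in self.objects:
            if obj.name == name:
                self.x, self.y = obj.x, obj.y
                return True
        return False
```

For example, with a lamp at (4, 0) and a sofa at (1, 3), a cursor that moves one step down from the origin is closer to the sofa, so nearest_object() would return the sofa.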
Exporting Your Work
Once you're happy with your descriptions, you can export them from the Player Window in different formats:
- Export to .TXT: A simple text file with timestamps.
- Export to .SRT: A standard subtitle file you can use in video players like VLC.
- Export Audio (MP3): This is perhaps the most exciting feature. It voices your descriptions with the voice you selected in Settings (SAPI5 or OpenAI), mixes it with the original video audio, and automatically lowers the background sound during descriptions to create an MP3 file you can listen to anywhere.
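To show what the .SRT export contains, here is a minimal sketch of how timestamped descriptions map onto the SubRip layout (a counter, a start and end time, then the text). It assumes each description is stored as a (start, end, text) tuple with times in seconds; that storage format and the function names are hypothetical and may differ from what Omni Describer does internally.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(descriptions) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples (assumed format)."""
    blocks = []
    for index, (start, end, text) in enumerate(descriptions, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{text}\n"
        )
    # Joining with a newline leaves one blank line between entries.
    return "\n".join(blocks)
```

A file built this way can be loaded next to the video in a player such as VLC, which will then show each description as a subtitle at the matching time.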
A Deep Dive into Advanced Settings
The Settings window (Ctrl + ,) gives you fine-grained control over Omni Describer's behavior.
AI Settings Tab
- Frame Rate for AI Analysis: Determines the number of frames per second sent to the AI. A lower value (e.g., 5 FPS) can reduce API costs but increases the chance of missing very fast actions.
- Send Video Only (No Audio) to AI: This is a useful option to prevent audio from the video (dialogue, music, sound effects) from confusing the AI. Instead of trying to describe an explosion it hears, the AI will focus only on what's happening visually.
- Disable Safety Filters (Use with caution): This option may allow the AI to process content and generate descriptions that it might normally flag as sensitive. However, this is not an absolute override. The output is still subject to Google's core safety policies, and there is no guarantee that all filters will be bypassed. Please remember that you are responsible for how you use this feature.
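To get a feel for the frame-rate trade-off described above, the arithmetic can be sketched as follows. This is a rough estimate only: it assumes that the amount of data sent grows in proportion to the number of frames, and the function names are made up for this example. Actual token counts and pricing depend on the AI service.

```python
import math


def frames_sent(duration_seconds: float, fps: float) -> int:
    """Approximate number of frames sent for a clip at the chosen frame rate."""
    if duration_seconds < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps must be > 0")
    return math.ceil(duration_seconds * fps)


def relative_load(fps_a: float, fps_b: float) -> float:
    """How many times more frames setting A sends than setting B (same clip)."""
    return fps_a / fps_b
```

Under these assumptions, a two-minute clip analyzed at 5 FPS sends about 600 frames, while the same clip at 1 FPS sends about 120, which is one fifth of the load.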
Audio Output Tab
- Text-to-Speech Engine:
- SAPI5 (Windows Built-in): Uses SAPI5-compliant voices that come with Windows or that you have installed. It does not require an additional API key. The audio quality will vary depending on the quality of the voices installed on your system.
- OpenAI TTS (High Quality): Generally produces more natural and fluent-sounding voices. Using this option requires an OpenAI API key and a payment method associated with your account.
Tips and Tricks for the Best Results
Creating great audio descriptions is an art. While AI is an effective assistant in this art, you'll get the best results when you guide it correctly.
The Power of Prompts: Your Director's Notes
When (and How) to Use a Prompt
Much of the time, the AI can produce excellent results with no special prompt, relying only on its core rules. I recommend using this feature only when you have a specific goal in mind.
In a video with many characters where names are important, the AI can sometimes be too hesitant to use a name. To prioritize name tracking, you can create a custom prompt:
For this video, your highest priority is to identify and use the correct character names as soon as they are spoken in the dialogue. This is more important than being overly concise. While focusing on this, try to adhere to all other system rules as best you can.
In visually rich films where the atmosphere is key, you can guide the AI to focus on the environment:
Focus on describing the setting, atmosphere, and environmental details. To create a rich visual world, mention the lighting, colors, and the overall mood of the scene. Focus less on minor character movements unless they are critical.
What to Avoid in Prompts
For best results, it's important to avoid instructions that contradict the AI's core principles. Since the AI always tries to follow instructions, giving it a flawed one can cause it to misinterpret the video.
- Bad Prompt Example:
"Tell me everything that happens."
This can make the AI indiscriminate, causing it to focus on unimportant details like "(character is talking)" instead of meaningful action.
- Bad Prompt Example:
"Tell me what the characters are saying."
This can cause it to violate the "describe only visuals" rule and describe dialogue, like "the character said to take this."
In short: Use prompts not to change the fundamental rules of good audio description, but to guide the AI on a specific focus.
Frequently Asked Questions (FAQ)
Q: Are my API keys secure?
A: Yes. Your keys are stored only on your computer and are sent only to the Google and OpenAI services when the application connects to them.
Q: Why does generating descriptions take so long?
A: The time depends on the length of your video, your internet speed (for uploading the video to the AI), the frame rate you've selected, and the current load on the AI services. Using the "Enable Video Chunking" feature is highly recommended for long videos.
Q: Why didn't the AI describe something I saw on screen?
A: The AI is trained to prefer silence over making a mistake or talking over dialogue. You can use the "Ask More..." feature to inquire about specific moments or select the "Detailed" verbosity level in Settings.
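The "Enable Video Chunking" option mentioned in the FAQ above is not described in detail in this guide. One common way to chunk a long video is to split it into consecutive fixed-length time ranges and process each range separately. The sketch below illustrates that idea only; the function name and the chunk length are assumptions, not the application's actual behavior.

```python
def chunk_ranges(duration_seconds: float, chunk_seconds: float) -> list:
    """Split a video of the given length into consecutive (start, end) ranges."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    ranges = []
    start = 0.0
    while start < duration_seconds:
        # The last chunk is shortened so it ends exactly at the video's end.
        end = min(start + chunk_seconds, duration_seconds)
        ranges.append((start, end))
        start = end
    return ranges
```

With a hypothetical chunk length of 100 seconds, a 250-second video would be split into three ranges: 0 to 100, 100 to 200, and 200 to 250. Smaller pieces mean smaller uploads and less work to redo if one piece fails.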
Keyboard Shortcuts
- Ctrl + O: Open Local Video
- Ctrl + U: Open from Direct URL
- Ctrl + Y: Open from YouTube
- Ctrl + ,: Open Settings
- F1: View Help
Acknowledgments, Contact, and Contributors
Thank you so much for using Omni Describer! This application is a reflection of my desire to make visual media more accessible and enjoyable for everyone. Knowing that people like you use this tool and share feedback is my greatest motivation to keep developing it.
Feedback and Support
Do you have a question, a bug report, or an idea for a new feature? I would love to hear from you! The best way to reach me is by email. Your feedback is invaluable for making Omni Describer even better.
- Email: info [at] audioses [dot] com. (Please replace '[at]' with '@' and '[dot]' with '.' when sending.)
- Website: audioses.com