Omni Describer User Guide
Giving a Voice to the Visual World with AI.
It all started with my love for movies. When I realized how many details in my favorite scenes were lost without good audio description, an idea took hold: "Well, couldn't AI make this easier for us?" I dreamed of a tool that wouldn't just generate descriptions but would also give the user full control. Omni Describer is the product of that dream, built over months of intense work, countless trials, and many technical hurdles overcome.
What's in a Name?
The "Omni" in the name comes from Latin, meaning "all" or "everything." I chose this name because I didn't want the tool to serve just one purpose. Yes, Omni Describer primarily aims to make media accessible for blind and visually impaired individuals by creating audio descriptions. However, its purpose is not limited to that.
This is also an exploration tool. A film critic, a student, an artist, or anyone curious about visual details can use features like "Scene Explorer" or "Ask More" to delve into the layers of a video like never before. Omni Describer is a window to see the world through the "eyes" of AI and understand it differently. In short, it is "a describer for everything, for everyone."
System Requirements
To run Omni Describer smoothly, make sure your system meets at least the following requirements:
- Operating System: Windows 10 or newer (64-bit).
- Memory (RAM): At least 4 GB of RAM.
- Storage: At least 500 MB of free disk space for the application and temporary files.
- Internet Connection: An active internet connection is required to connect to AI services (Google Gemini, OpenAI) and download videos.
- Screen Reader: For full accessibility, a screen reader like JAWS, NVDA, or Windows Narrator is recommended.
Getting Started: Setting Up Your API Keys
Omni Describer uses cloud-based AI services to analyze videos and voice the descriptions, so you need to enter your own API keys before you can start.
- Open Settings: Go to the File menu and select Settings... (or press Ctrl + ,).
- AI Settings Tab:
  - Gemini API Key: This is mandatory for video analysis. Paste your key into the "Gemini API Key:" field. You can get a free API key from Google AI Studio.
  - OpenAI API Key (for TTS): This is required for high-quality text-to-speech. Paste your key into this field. You can still use the built-in Windows voices without this key, but OpenAI is recommended for the best results. You can get a key from the OpenAI Platform.
- Save: Click Apply or OK to save your settings. You're now ready to go!
Secure Storage: Your API keys are encrypted and stored in the application's settings file on your computer. They are only ever sent directly to the corresponding AI service when the app connects to it.
Quick Start: Generating Your First Description
Let's get started! Just follow these simple steps:
- Choose a Video: Click a button like "Local Video File" on the main window or select your video source from the File menu.
- Select a Prompt (Optional): The dropdown menu lists pre-made instructions that guide the AI. For your first try, leaving it at "(No Preset Selected)" is a great starting point.
- Start Processing: The application will now begin analyzing your video. You can follow the progress in the "Status Log" at the bottom of the window. This may take a few minutes, depending on the length of the video.
When the process is complete, the Described Video Player will open automatically, and you can start enjoying your newly described video!
Main Features
The Described Video Player
This is your personal, described movie theater. As the video plays, your installed screen reader (like JAWS or NVDA) will read the generated audio descriptions aloud at the correct moments.
- Playback Controls: Use the Play/Pause, Rewind/Forward buttons, or the seek slider to navigate the video.
- Current Description Area: You can follow the text of the currently active description here.
- Edit and Refine: Click the "Edit Descriptions..." button to open a powerful editor for full control over your project.
- AI Token Usage: This area shows you how many AI "tokens" were used, helping you keep track of your API usage.
Editing Descriptions: The Power to Refine
The AI provides a fantastic starting point, but true quality comes from refinement. The "Edit Descriptions..." button in the player opens a powerful editor that gives you full control over every aspect of your project.
- Select and Modify: Choose any description from the dropdown list at the top to load its details into the editor.
- Adjust Timestamps: Directly edit the Start Time and End Time fields. When you change the start time, the end time shifts automatically to preserve the description's duration. All times are validated to make sure they fall within your video's length.
- Rewrite Text: Freely edit the description text in the main text box to improve clarity, add detail, or correct inaccuracies.
- Add New Descriptions: Click the "Add New..." button to open a simple dialog where you can create a brand new description from scratch, setting its time and text.
- Delete Descriptions: Select a description and click the "Delete" button to permanently remove it.
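If you're curious how the start-time rule behaves, here is a minimal Python sketch. It is not Omni Describer's actual code: the `Description` class, the `set_start_time` function, and the choice to reject (rather than clamp) out-of-range times are assumptions made for illustration, with all times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Description:
    """Hypothetical record for one description; times are in seconds."""
    start: float
    end: float
    text: str


def set_start_time(desc: Description, new_start: float, video_duration: float) -> Description:
    """Move a description to new_start, keeping its duration unchanged.

    Assumption: times outside the video are rejected with ValueError
    (the real editor may clamp or warn instead).
    """
    duration = desc.end - desc.start
    new_end = new_start + duration  # the end time shifts by the same amount
    if new_start < 0 or new_end > video_duration:
        raise ValueError(
            f"{new_start:.2f}s to {new_end:.2f}s is outside the video (0 to {video_duration:.2f}s)"
        )
    return Description(start=new_start, end=new_end, text=desc.text)


clip = Description(start=12.0, end=15.5, text="A woman opens the door.")
moved = set_start_time(clip, 20.0, video_duration=120.0)
print(moved.start, moved.end)  # 20.0 23.5
```

Returning a new object instead of editing in place keeps the original untouched if validation fails.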
Save Your Work: Changes you make in the editor (adding, deleting, modifying) are instantly applied to your current session. When you are done, simply click the "Close" button to return to the player. Your updated list of descriptions will be used for playback and for any subsequent exports.
Managing Prompt Presets
Prompts are powerful instructions that determine what the AI focuses on. By changing the prompt, you can get descriptions in vastly different styles.
- Selecting a Preset: Before processing a video, choose a preset from the dropdown menu on the main window.
- Managing Presets: Go to File -> Manage Prompt Presets.... Here you can Add, Edit, or Delete your own custom prompts. This is perfect for saving instructions you use frequently.
- Language-Specific: Your prompt presets are saved separately for each language you select in the Settings.
Ask More About the Scene
Ever wonder what a character is holding or what a sign in the background says? This feature lets you ask anything that comes to mind about the scene.
- Pause the video at the moment you're curious about.
- Click the Ask More... button.
- Type your question in the "Your New Question:" field (e.g., "What color is the woman's hat?" or "What does the writing on the wall say?").
- Select how many seconds of video the AI should analyze around the current time.
- Click "Submit Question." The AI's answer will appear in the "Conversation History" area.
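This guide doesn't specify exactly how the window "around the current time" is placed. One plausible approach, sketched below in Python purely as an assumption, centres the window on the paused moment and slides it (rather than shrinking it) when you are near the start or end of the video. The function name `analysis_window` is hypothetical.

```python
def analysis_window(current_time: float, window_seconds: float, video_duration: float) -> tuple[float, float]:
    """Return (start, end) in seconds for a window centred on current_time.

    Hypothetical helper: near either end of the video the window slides
    instead of shrinking, so the AI still sees window_seconds of footage
    whenever the video is long enough.
    """
    window_seconds = min(window_seconds, video_duration)  # can't exceed the video itself
    start = current_time - window_seconds / 2
    start = max(0.0, min(start, video_duration - window_seconds))
    return start, start + window_seconds


print(analysis_window(60.0, 10.0, 600.0))  # (55.0, 65.0)
print(analysis_window(2.0, 10.0, 600.0))   # (0.0, 10.0)
```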
Scene Explorer
Scene Explorer is an interactive way to understand the spatial layout of a scene. It puts you in a virtual room that you can navigate with your keyboard.
- Pause the video on a scene you want to explore in detail.
- Click the Explore Scene... button, then click "Analyze Scene".
You are now in the Scene Explorer. Use your keyboard to explore:
- Arrow Keys: Move your virtual position on a grid.
- D: Provides a detailed description of the overall scene layout.
- L: Announces a list of all objects in the scene.
- Shift + L: Switch to "Jump Mode" to select an object and go directly to it.
- Enter: Get a detailed description of the nearest object.
- Escape: Close the Scene Explorer.
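To picture how the grid, the arrow keys, and the Enter key might fit together, here is a small Python model. Everything in it is hypothetical: the `SceneGrid` and `SceneObject` names, the 5×5 grid, starting in the centre, and choosing the "nearest object" by straight-line (Euclidean) distance are illustrative assumptions, not necessarily how Omni Describer works internally.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical object placed on the explorer grid."""
    name: str
    x: int  # column
    y: int  # row


class SceneGrid:
    """Hypothetical keyboard-driven grid, loosely modelled on Scene Explorer."""

    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, width: int, height: int, objects: list[SceneObject]):
        self.width = width
        self.height = height
        self.objects = objects
        self.x, self.y = width // 2, height // 2  # assumption: start in the centre

    def move(self, direction: str) -> None:
        """Arrow keys: step one cell, staying inside the grid."""
        dx, dy = self.MOVES[direction]
        self.x = max(0, min(self.width - 1, self.x + dx))
        self.y = max(0, min(self.height - 1, self.y + dy))

    def nearest_object(self) -> SceneObject:
        """Enter: the object with the smallest straight-line distance from you."""
        return min(self.objects, key=lambda o: math.hypot(o.x - self.x, o.y - self.y))

    def jump_to(self, name: str) -> None:
        """Jump Mode: move directly onto the named object."""
        target = next(o for o in self.objects if o.name == name)
        self.x, self.y = target.x, target.y


room = SceneGrid(5, 5, [SceneObject("lamp", 0, 0), SceneObject("sofa", 3, 2), SceneObject("window", 4, 4)])
print(room.nearest_object().name)  # sofa
room.move("left")
room.move("left")
print(room.nearest_object().name)  # lamp
```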
Exporting Your Work
Once you're happy with your descriptions, you can export them from the Player Window in different formats:
- Export to .TXT: A simple text file with timestamps.
- Export to .SRT: A standard subtitle file you can use in video players like VLC.
- Export Audio (MP3): This is perhaps the most exciting feature. It voices your descriptions with the voice you selected in Settings (SAPI5 or OpenAI), mixes them with the original video audio, and automatically lowers the original sound while each description plays. The result is an MP3 file you can listen to anywhere.
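For reference, an .SRT file is plain text made of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, the cue text, and a blank line between cues. The Python sketch below writes descriptions in that common SubRip layout; the `(start, end, text)` tuple format and the function names are assumptions for illustration, not Omni Describer's own exporter.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(descriptions: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start, end, text) tuples, numbering cues from 1 in time order."""
    blocks = []
    for index, (start, end, text) in enumerate(sorted(descriptions), start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)  # the extra newline leaves a blank line between cues


print(to_srt([(1.0, 3.5, "A door opens."), (5.25, 7.0, "She smiles.")]))
```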
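Lowering the background sound while a description plays is usually called "ducking." Here is a minimal Python sketch of the idea on raw sample lists, assuming both tracks are already decoded to floats in the range -1 to 1 at the same sample rate. The 0.3 duck level, the 0.25-second fade, and the function names are illustrative assumptions; a real exporter also has to decode the video's audio and encode the MP3, which this sketch leaves out.

```python
def duck_gain(
    t: float,
    intervals: list[tuple[float, float]],
    duck_level: float = 0.3,
    ramp: float = 0.25,
) -> float:
    """Gain (0 to 1) for the original soundtrack at time t, in seconds.

    Inside a description interval the gain is duck_level. Just before and
    after it, the gain moves linearly between 1.0 and duck_level over `ramp`
    seconds so the volume change isn't abrupt.
    """
    gain = 1.0
    for start, end in intervals:
        if start <= t <= end:
            return duck_level
        if start - ramp < t < start:  # fading down before the description
            frac = (start - t) / ramp  # 1.0 where the fade begins, 0.0 at the description
        elif end < t < end + ramp:  # fading back up afterwards
            frac = (t - end) / ramp
        else:
            continue
        gain = min(gain, duck_level + (1.0 - duck_level) * frac)
    return gain


def mix(
    original: list[float],
    narration: list[float],
    sample_rate: int,
    intervals: list[tuple[float, float]],
) -> list[float]:
    """Duck the original samples under the descriptions, add the narration, and clip to [-1, 1]."""
    out = []
    for i, sample in enumerate(original):
        t = i / sample_rate
        voice = narration[i] if i < len(narration) else 0.0
        value = sample * duck_gain(t, intervals) + voice
        out.append(max(-1.0, min(1.0, value)))
    return out
```

The short fade is a design choice: an instant volume drop tends to sound like a glitch, while a quarter-second ramp is barely noticeable.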
A Deep Dive into Settings
The Settings window (Ctrl + ,) gives you fine-grained control over Omni Describer's behavior.
General Tab
- Allow descriptions to interrupt current speech: When checked, a new description starts speaking immediately, even if the previous one hasn't finished. Uncheck this to let each description finish completely before the next one starts. This setting matters most in very fast-paced scenes, where descriptions follow each other closely.
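To see the difference between the two behaviors, here is a small Python timing model. It is a simplification under stated assumptions: each description is reduced to a cue time and a known speech duration, both in seconds, and `speech_schedule` is a hypothetical name. In the app itself, the actual speaking is done by your screen reader or the selected voice.

```python
def speech_schedule(
    cues: list[tuple[float, float]], allow_interrupt: bool
) -> list[tuple[float, float]]:
    """Return (speech_start, speech_end) for each description.

    cues: (cue_time, speech_duration) pairs in seconds, sorted by cue_time.
    allow_interrupt=True  -> a new description cuts off the one still speaking.
    allow_interrupt=False -> a new description waits for the previous one to finish.
    """
    schedule: list[tuple[float, float]] = []
    for cue_time, speech_duration in cues:
        start = cue_time
        if schedule:
            prev_start, prev_end = schedule[-1]
            if prev_end > cue_time:  # the previous description is still speaking
                if allow_interrupt:
                    schedule[-1] = (prev_start, cue_time)  # cut it off now
                else:
                    start = prev_end  # wait until it finishes
        schedule.append((start, start + speech_duration))
    return schedule


cues = [(0.0, 3.0), (2.0, 2.0)]  # the second description is due while the first is still speaking
print(speech_schedule(cues, allow_interrupt=True))   # [(0.0, 2.0), (2.0, 4.0)]
print(speech_schedule(cues, allow_interrupt=False))  # [(0.0, 3.0), (3.0, 5.0)]
```

With waiting enabled, the second description starts one second late; in a fast-paced scene, such delays can accumulate across several descriptions.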
AI Settings Tab
- Frame Rate for AI Analysis: Determines how many video frames per second are sent to the AI. A lower value (e.g., 5 FPS) can significantly reduce API costs and help avoid processing limits, but may cause the AI to miss very fast actions.
- Enable Video Chunking: For long videos (over 10-15 minutes), the AI can sometimes run out of processing capacity. This feature automatically splits the video into smaller parts, analyzes them sequentially, and stitches the results together. It is highly recommended for long-form content.
- Disable Safety Filters (Use with caution): This option may allow the AI to process content that it might normally flag as sensitive. However, this is not an absolute override. The output is still subject to the AI provider's core safety policies. Please remember that you are responsible for how you use this feature.
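The arithmetic behind the frame rate setting is simple: frames sent ≈ video length in seconds × frame rate. A 40-minute video (2,400 seconds) at 5 FPS therefore means about 2,400 × 5 = 12,000 frames, and at 1 FPS about 2,400. The Python sketch below shows that calculation plus one way chunking could work: split the timeline, analyze each piece, then shift each chunk's timestamps back to whole-video time. The 10-minute chunk length, the assumption that each chunk's results come back with chunk-relative timestamps, and all function names are hypothetical.

```python
import math


def frames_sent(duration_s: float, fps: float) -> int:
    """Approximate number of frames sent for a clip: seconds x frames per second."""
    return math.ceil(duration_s * fps)


def plan_chunks(duration_s: float, chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s] into consecutive (start, end) chunks of at most chunk_s seconds."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    duration_s = float(duration_s)
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def stitch(
    chunk_results: list[list[tuple[float, float, str]]],
    chunks: list[tuple[float, float]],
) -> list[tuple[float, float, str]]:
    """Merge per-chunk (start, end, text) results into whole-video time.

    Assumes each chunk's timestamps are relative to the start of that chunk.
    """
    merged = []
    for (chunk_start, _chunk_end), results in zip(chunks, chunk_results):
        for start, end, text in results:
            merged.append((chunk_start + start, chunk_start + end, text))
    return merged


chunks = plan_chunks(40 * 60, 10 * 60)  # 40-minute video, hypothetical 10-minute chunks
print(len(chunks), frames_sent(40 * 60, 5))  # 4 12000
```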
Audio Output Tab
- Text-to-Speech Engine:
  - SAPI5 (Windows Built-in): Uses voices that come with Windows or that you have installed yourself, and requires no additional API key. The app can access both modern 64-bit voices and legacy 32-bit voices. If you have older favorite voices from previous systems, choose the "SAPI5 (32-bit Voices)" option to use them.
  - OpenAI TTS (High Quality): Produces more natural, fluent-sounding voices. This option requires a paid OpenAI account and an API key. You can also create and manage custom voice presets for OpenAI.
Tips and Tricks for the Best Results
Creating great audio descriptions is an art. While AI is an effective assistant, you'll get the best results when you guide it correctly.
The Power of Prompts: Your Director's Notes
The application has a set of core rules it teaches the AI (like not talking over dialogue). Think of the Prompt Preset area on the main screen as the place where you provide your director's notes for that specific video. A good note helps the AI focus on a particular style or detail.
Tip: The "Focus on Names" Prompt
In a video with many characters, you can create a custom prompt to prioritize name tracking:
Your highest priority for this video is to identify and use the correct character names as soon as they are spoken.
Tip: The "Describe the Atmosphere" Prompt
For visually rich films, guide the AI to focus on the environment:
Focus on describing the setting, atmosphere, and environmental details. Mention the lighting, colors, and the overall mood of the scene.
What to Avoid in Prompts
Avoid instructions that contradict the AI's core principles (describing only visuals, not speaking over dialogue). Giving it a flawed instruction can cause poor results.
- Bad Prompt: "Tell me everything that happens." This is too vague and can cause the AI to describe unimportant details.
- Bad Prompt: "Tell me what the characters are saying." This asks the AI to break the "visuals only" rule.
In short: Use prompts to guide the AI's focus, not to change the fundamental rules of good audio description.
Frequently Asked Questions (FAQ)
Q: My API keys are not working. What should I do?
A: First, double-check that you have copied the entire key correctly. For OpenAI, ensure you have set up a payment method in your account, as their TTS service is not free. For Gemini, ensure the API is enabled in your Google Cloud project.
Q: The process failed with a "MAX_TOKENS" error. What does this mean?
A: This means your video is too long or visually complex for the AI to process in a single pass. This is a capacity limit, not a content error. The best solution is to go to Settings -> AI Settings and enable "Video Chunking". This will automatically split the video into smaller, more manageable parts for the AI.
Q: The AI failed to generate descriptions because of "Safety Filters". What can I do?
A: This is a known issue where the AI's safety system can be overly cautious. You have a few options to try, in order:
1) Go to Settings -> AI Settings and enable "Disable Safety Filters" (this often helps).
2) If that fails, try reducing the "Frame Rate for AI Analysis" to send less data to the AI.
3) As a last resort, use the "Gemini Model Override" setting and enter gemini-2.5-pro. This model is more powerful and may handle sensitive content better, but it is slower and may cost more if you are on a paid API tier.
Q: Some of my old voices are missing from the SAPI5 list. Where did they go?
A: Modern Windows systems are 64-bit, but many classic, beloved text-to-speech voices were 32-bit. In Settings under the "Audio Output" tab, you will see separate engine choices for "SAPI5 (64-bit)" and "SAPI5 (32-bit)". To access your older voices, simply select the 32-bit engine.
Q: Why does generating descriptions take so long?
A: The time depends on your video's length, your internet speed (for the upload to the AI), the selected frame rate, and the current load on the AI services. Using "Video Chunking" is highly recommended for long videos.
Keyboard Shortcuts
- Ctrl + O: Open Local Video
- Ctrl + U: Open from Direct URL
- Ctrl + Y: Open from YouTube
- Ctrl + ,: Open Settings
- F1: View Help
Thank you so much for using Omni Describer! This application reflects my desire to make visual media more accessible and enjoyable for everyone. Knowing that people like you use this tool and share feedback is my greatest motivation to keep developing it.
Feedback and Support
Do you have a question, a bug report, or an idea for a new feature? I would love to hear from you! The best way to reach me is by email. Your feedback is invaluable for making Omni Describer even better.