Omni Describer User Guide

Giving a Voice to the Visual World with AI.

It all started with my love for movies. When I realized how many details in my favorite scenes were lost without good audio description, an idea sparked: "Well, couldn't AI make this easier for us?" I dreamed of a tool that wouldn't just generate descriptions but would also give full control to the user. After months of intense work, countless trials, and overcoming many technical hurdles, I developed Omni Describer as the product of that dream.

What's in a Name?

The "Omni" in the name comes from Latin, meaning "all" or "everything." I chose this name because I didn't want the tool to serve just one purpose. Yes, Omni Describer primarily aims to make media accessible for blind and visually impaired individuals by creating audio descriptions. However, its purpose is not limited to that.

This is also an exploration tool. A film critic, a student, an artist, or anyone curious about visual details can use features like "Scene Explorer" or "Ask More" to delve into the layers of a video like never before. Omni Describer is a window to see the world through the "eyes" of AI and understand it differently. In short, it is "a describer for everything, for everyone."

System Requirements

To get the best performance from Omni Describer, I recommend meeting the following minimum system requirements:

Getting Started: Setting Up Your API Keys

Omni Describer uses cloud-based AI services to analyze your video and to voice the resulting descriptions. Therefore, you need to enter your own API keys before you can start.

  1. Open Settings: Go to the File menu and select Settings... (or press Ctrl + ,).
  2. AI Settings Tab:
    • Gemini API Key: This is mandatory for video analysis. Paste your key into the "Gemini API Key:" field. You can get a free API key from Google AI Studio.
    • OpenAI API Key (for TTS): This is optional and is needed only for OpenAI's high-quality text-to-speech. Paste your key into this field. Without it, you can still use the built-in Windows voices, but OpenAI is recommended for the best results. You can get a key from the OpenAI Platform.
  3. Save: Click Apply or OK to save your settings. You're now ready to go!

Secure Storage: Your API keys are encrypted and stored on your computer in the application's settings file. They are sent only to the respective AI services when the application connects to them, and nowhere else.

Quick Start: Generating Your First Description

Let's get started! Just follow these simple steps:

  1. Choose a Video: Click a button like "Local Video File" on the main window or select your video source from the File menu.
  2. Select a Prompt (Optional): The dropdown menu lists pre-made instructions that guide the AI. For your first try, leaving it at "(No Preset Selected)" is a great starting point.
  3. Start Processing: The application will now begin analyzing your video. You can follow the progress in the "Status Log" at the bottom of the window. This may take a few minutes, depending on the length of the video.

When the process is complete, the Described Video Player will open automatically, and you can start enjoying your newly described video!

Main Features

The Described Video Player

This is your personal, described movie theater. As the video plays, your installed screen reader (like JAWS or NVDA) will read the generated audio descriptions aloud at the correct moments.
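Reading a description "at the correct moment" comes down to matching the playback position against each description's start time. The following is only a minimal sketch of that idea, not Omni Describer's actual code; the `Description` class and the `descriptions_due` function are hypothetical names chosen for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Hypothetical record: one audio description and when to speak it."""
    start_seconds: float
    text: str


def descriptions_due(descriptions, previous_position, current_position):
    """Return descriptions starting in (previous_position, current_position].

    `descriptions` must already be sorted by start_seconds.
    """
    starts = [d.start_seconds for d in descriptions]
    first = bisect_right(starts, previous_position)
    last = bisect_right(starts, current_position)
    return descriptions[first:last]


cues = [
    Description(2.0, "A woman in a red hat enters the room."),
    Description(7.5, "She places a letter on the table."),
    Description(15.0, "Rain streaks the window."),
]

# Simulate one player tick that moved playback from 6.0 s to 8.0 s.
for cue in descriptions_due(cues, 6.0, 8.0):
    print(f"[{cue.start_seconds:.1f}s] {cue.text}")
```

Using a half-open interval means a description is handed to the screen reader exactly once, even if the player checks its position many times per second.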

Editing Descriptions: The Power to Refine

The AI provides a fantastic starting point, but true quality comes from refinement. The "Edit Descriptions..." button in the player opens a powerful editor that gives you full control over every aspect of your project.

Save Your Work: Changes you make in the editor (adding, deleting, modifying) are instantly applied to your current session. When you are done, simply click the "Close" button to return to the player. Your updated list of descriptions will be used for playback and for any subsequent exports.

Managing Prompt Presets

Prompts are powerful instructions that determine what the AI focuses on. By changing the prompt, you can get descriptions in vastly different styles.

Ask More About the Scene

Ever wonder what a character is holding or what a sign in the background says? This feature lets you ask anything that comes to mind about the scene.

  1. Pause the video at the moment you're curious about.
  2. Click the Ask More... button.
  3. Type your question in the "Your New Question:" field (e.g., "What color is the woman's hat?" or "What does the writing on the wall say?").
  4. Select how many seconds of video the AI should analyze around the current time.
  5. Click "Submit Question." The AI's answer will appear in the "Conversation History" area.
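To make step 4 concrete, here is a rough sketch of how a number of seconds "around the current time" could translate into a start and end position in the video. It assumes the window is centered on the paused position and clipped to the video's bounds; the application may choose the window differently, and `analysis_window` is a hypothetical helper.

```python
def analysis_window(current_seconds, window_seconds, video_duration):
    """Return (start, end) of a window centered on the paused position.

    Assumption: the window is split evenly before and after the current
    time, then clipped so it never leaves the video.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    half = window_seconds / 2
    start = max(0.0, current_seconds - half)
    end = min(video_duration, current_seconds + half)
    return start, end


# Paused at 42 s in a 2-minute video, asking about 10 seconds of footage.
print(analysis_window(42.0, 10.0, 120.0))
# Paused near the beginning: the window is clipped at 0.
print(analysis_window(2.0, 10.0, 120.0))
```

A smaller window generally means less data is sent to the AI, while a larger one gives it more context for questions about movement or actions.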

Scene Explorer

Scene Explorer is an interactive way to understand the spatial layout of a scene. It puts you in a virtual room that you can navigate with your keyboard.

  1. Pause the video on a scene you want to explore in detail.
  2. Click the Explore Scene... button, then click "Analyze Scene".

You are now in the Scene Explorer. Use your keyboard to explore:

Exporting Your Work

Once you're happy with your descriptions, you can export them from the Player Window in different formats:

A Deep Dive into Settings

The Settings window (Ctrl + ,) gives you fine-grained control over Omni Describer's behavior.

General Tab

AI Settings Tab

Audio Output Tab

Tips and Tricks for the Best Results

Creating great audio descriptions is an art. While AI is an effective assistant, you'll get the best results when you guide it correctly.

The Power of Prompts: Your Director's Notes

The application has a set of core rules it teaches the AI (like not talking over dialogue). Think of the Prompt Preset area on the main screen as the place where you provide your director's notes for that specific video. A good note helps the AI focus on a particular style or detail.

Tip: The "Focus on Names" Prompt
In a video with many characters, you can create a custom prompt to prioritize name tracking: "Your highest priority for this video is to identify and use the correct character names as soon as they are spoken."

Tip: The "Describe the Atmosphere" Prompt
For visually rich films, guide the AI to focus on the environment: "Focus on describing the setting, atmosphere, and environmental details. Mention the lighting, colors, and the overall mood of the scene."
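One way to picture the relationship between the core rules and your director's notes is as two layers of a single instruction. The sketch below is purely illustrative: `CORE_RULES`, its wording, and `build_prompt` are assumptions for this example and do not reflect how Omni Describer actually assembles its prompts.

```python
# Hypothetical stand-ins for the application's built-in rules.
CORE_RULES = (
    "Describe only what is visible on screen.",
    "Do not speak over dialogue.",
)


def build_prompt(director_note=""):
    """Combine the fixed core rules with an optional per-video note."""
    lines = ["Core rules:"]
    lines += [f"- {rule}" for rule in CORE_RULES]
    note = director_note.strip()
    if note:
        # The note is appended after the rules; it adds focus, it does not replace them.
        lines += ["", "Director's notes for this video:", note]
    return "\n".join(lines)


print(build_prompt(
    "Your highest priority for this video is to identify and use the "
    "correct character names as soon as they are spoken."
))
```

Seen this way, a preset only adds a section of its own; the core rules stay in place whichever preset you pick.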

What to Avoid in Prompts

Avoid instructions that contradict the AI's core principles (describing only visuals, not speaking over dialogue). An instruction that conflicts with these rules, such as asking the AI to narrate over speech, tends to produce poor results.

In short: Use prompts to guide the AI's focus, not to change the fundamental rules of good audio description.

Frequently Asked Questions (FAQ)

Q: My API keys are not working. What should I do?
A: First, double-check that you have copied the entire key correctly. For OpenAI, ensure you have set up a payment method in your account, as their TTS service is not free. For Gemini, ensure the API is enabled in your Google Cloud project.

Q: The process failed with a "MAX_TOKENS" error. What does this mean?
A: This means your video is too long or visually complex for the AI to process in a single pass. This is a capacity limit, not a content error. The best solution is to go to Settings -> AI Settings and enable "Video Chunking". This will automatically split the video into smaller, more manageable parts for the AI.
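As an illustration of what splitting a video into smaller parts means, the sketch below cuts a duration into consecutive time ranges. The chunk length of 60 seconds and the `chunk_ranges` helper are assumptions for this example; the guide does not state how long Omni Describer's chunks are or how it builds them.

```python
import math


def chunk_ranges(duration_seconds, chunk_seconds):
    """Split [0, duration_seconds] into consecutive (start, end) ranges."""
    if duration_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("duration must be >= 0 and chunk length must be > 0")
    count = math.ceil(duration_seconds / chunk_seconds)
    ranges = []
    for index in range(count):
        start = index * chunk_seconds
        # The final chunk is shortened so it ends exactly at the video's end.
        end = min(start + chunk_seconds, duration_seconds)
        ranges.append((start, end))
    return ranges


# A 125-second video with hypothetical 60-second chunks.
for start, end in chunk_ranges(125.0, 60.0):
    print(f"{start:6.1f}s to {end:6.1f}s")
```

Each range would be analyzed as its own request, which keeps any single request below the capacity limit that triggers the error.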

Q: The AI failed to generate descriptions because of "Safety Filters". What can I do?
A: This is a known issue where the AI's safety system can be overly cautious. You have a few options to try, in order: 1) Go to Settings -> AI Settings and enable "Disable Safety Filters" (this often helps). 2) If that fails, try reducing the "Frame Rate for AI Analysis" to send less data to the AI. 3) As a last resort, you can use the "Gemini Model Override" setting and enter gemini-2.5-pro. This model is more powerful and may handle sensitive content better, but it is slower and may be more expensive if you are on a paid API tier.

Q: Some of my old voices are missing from the SAPI5 list. Where did they go?
A: Modern Windows systems are 64-bit, but many classic, beloved text-to-speech voices were 32-bit. In Settings under the "Audio Output" tab, you will see separate engine choices for "SAPI5 (64-bit)" and "SAPI5 (32-bit)". To access your older voices, simply select the 32-bit engine.

Q: Why does generating descriptions take so long?
A: The time depends on your video's length, your internet speed (for the upload to the AI), the selected frame rate, and the current load on the AI services. Using "Video Chunking" is highly recommended for long videos.
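To get a feel for why video length and frame rate matter so much, consider a back-of-the-envelope estimate of how many frames would be analyzed. This assumes frames are sampled evenly at the selected rate, which is a simplification, and `estimate_frames` is a hypothetical helper rather than part of the application.

```python
import math


def estimate_frames(duration_seconds, analysis_fps):
    """Rough number of frames sampled from a video at a given rate."""
    if duration_seconds < 0 or analysis_fps <= 0:
        raise ValueError("duration must be >= 0 and frame rate must be > 0")
    return math.ceil(duration_seconds * analysis_fps)


# Compare a few frame rates for a 10-minute (600-second) video.
for fps in (0.5, 1.0, 2.0):
    print(f"{fps} fps: about {estimate_frames(600, fps)} frames")
```

Halving the frame rate roughly halves the amount of visual data to process, at the cost of possibly missing very brief on-screen events.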

Keyboard Shortcuts

Acknowledgments, Contact, and Contributors

Thank you so much for using Omni Describer! This application is a reflection of my desire to make visual media more accessible and enjoyable for everyone. Having users like you use this tool and provide feedback is the greatest motivation to continue developing it.

Feedback and Support

Do you have a question, a bug report, or an idea for a new feature? I would love to hear from you! The best way to reach me is by email. Your feedback is invaluable for making Omni Describer even better.