Year: 2025
Project Name: Vizor
Category: Research
One Liner:

Platform-Agnostic Accessibility Software Leveraging Computer Vision and Speech Processing

Abstract:

The purpose of this project is to showcase how improvements in computer vision and speech processing can be leveraged to make accessibility software that is open, customizable, and free of platform lock-in.

Description:

Accessibility software has historically been built behind the walled gardens of tech giants such as Microsoft and Apple. That was a sensible choice when deep operating-system integration was required for application metadata, keyboard and mouse control, and more. However, as computer vision has greatly improved, building accessibility tools no longer requires these operating-system hooks. I demonstrate this by building an open-source reproduction of Windows Voice Access that leverages a fine-tuned YOLO-v8 model to parse GUI elements and a Whisper Tiny transcription model to process speech commands. Together, this yields a platform-agnostic desktop application that parses and numbers on-screen elements in real time and gives users complete voice-based control of their desktop, using commands like “Click 8,” where “8” corresponds to a GUI element (for example, the Google Chrome icon).
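
To illustrate the pipeline described above, the sketch below shows one way such a loop might fit together using common open-source libraries (mss for screen capture, ultralytics for YOLOv8, openai-whisper for transcription, pyautogui for input control). The weights file "vizor_gui_yolov8.pt" and the single-shot "command.wav" input are hypothetical stand-ins, not the project's actual code or assets.

"""Minimal sketch of a Vizor-style voice-control loop (assumptions noted above)."""
import re

import mss
import numpy as np
import pyautogui
import whisper
from ultralytics import YOLO

gui_detector = YOLO("vizor_gui_yolov8.pt")   # hypothetical fine-tuned GUI-element weights
transcriber = whisper.load_model("tiny")     # Whisper Tiny for speech commands


def detect_gui_elements():
    """Grab the screen, run YOLO, and return numbered element centers: {1: (x, y), ...}."""
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])          # capture the primary monitor
        frame = np.asarray(shot)[:, :, :3]        # drop alpha channel (BGRA -> BGR)
    boxes = gui_detector(frame)[0].boxes.xyxy.tolist()   # [[x1, y1, x2, y2], ...]
    return {
        i + 1: ((x1 + x2) / 2, (y1 + y2) / 2)
        for i, (x1, y1, x2, y2) in enumerate(boxes)
    }


def handle_command(audio_path, elements):
    """Transcribe one recorded utterance and execute commands like 'Click 8'."""
    text = transcriber.transcribe(audio_path)["text"].lower()
    match = re.search(r"click\s+(\d+)", text)
    if match:
        number = int(match.group(1))
        if number in elements:
            x, y = elements[number]
            pyautogui.click(x, y)                 # platform-agnostic mouse click


if __name__ == "__main__":
    numbered = detect_gui_elements()
    # "command.wav" stands in for a real-time microphone capture loop.
    handle_command("command.wav", numbered)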

Video: https://1513041.mediaspace.kaltura.com/media/VizorSeniorProject/1_29j41y67

Team Members

Sam Pagon

sam.pagon@drexel.edu

Advisors

Chad Peiper

chad.e.peiper@drexel.edu