AI products, like most software, need more user-enabled functionality and augmentation.
I wrote the majority of this article using Otter.ai and standard recorded voice memos from the app that comes preloaded on newer iPhones. This post comes out of research I’m doing for a book on design and machine learning. So much of writing this book has been a struggle, not for lack of ideas, but because of COVID. In the middle of the writing process, I became somewhat disabled, suffering from long COVID. Suddenly, all the patterns and tools and software I had come to rely on and use in one way, I needed to use in a completely different and new way.
This article, and the book, are not technically on disability or inclusivity. Those are important topics, but topics I feel I lack the experience to write about; there are plenty of other, much more in-depth and fantastic books and resources you can use to learn about disability and inclusivity. But like other disabled designers, I’ve started to rely much more heavily on AI tools, particularly voice-to-text and transcription tools, and this is what this article covers. AI has a UX problem, but so do most products out there. The issue here is a lack of augmentation. Products are so specific, so ‘useful’, that they are too smooth and too narrow to allow for a wide variety of use cases, and this is a diversity and inclusivity issue. We’ve been in a generation of smooth, minimal, works-with-one-tap design, and that creates an assumption of a singular, hyper-specific idea of who and what a user is, an assumption that falls apart for disabled people (and many others).
This one-button, one-size-fits-none design is now falling apart even more within AI-driven tools.
Where my tools of the trade used to be Figma, Miro, and the Adobe Suite, since contracting long COVID I’ve been relying much more heavily on voice memos and Otter.ai to do my work. This change of tools comes with a multitude of issues, and it’s through highlighting these issues that I want to voice a criticism I have of productivity tools built on top of general-purpose AI software. These tools are not good enough or inclusive enough, even in how the product is designed on top of software that is already problematic. AI is biased, and that bias cannot be removed; but related to that, the limitations of this particular user experience design create spaces of narrow engagement, and that narrow design makes the tool difficult to use.
This brings me to Otter.ai, which, while helpful in allowing me to work, is far from perfect. I can’t adapt or use Otter.ai in the particular, specific ways I need it to function.
My long COVID manifests as exhaustion and intense joint pain. There are some days I feel so similar to my old self, using my arms at 100% capacity. But other days, my joints hurt to a point where functioning is difficult if not impossible. The pain manifests mainly in my elbows, wrists, and knees; I can’t wireframe or design for 8 hours a day anymore. Often this pain is combined with an exhaustion that makes me too tired to work for more than a few hours at a time. The exhaustion has changed my workday and workflow; I have to create spaces for rest. The joint pain changes how I make work as a designer; projects get stretched out, and I’ve learned how to co-design and co-make with collaborators. Like other disabled people, I have created workarounds to be able to do some work even if the exhaustion or pain is intense. I use voice memos and transcription tools that generate voice-to-text. With these tools, I can respond to emails, dictate articles, craft research plans, interview, and respond to general work inquiries.
However, my workarounds come with their own particular issues and limitations. Voice-to-text tools differ from tool to tool. Otter.ai does not function the same way as Siri, for example. Their diction ‘interfaces’ (i.e. the actual voice commands a user uses to communicate with the product) are different. I can say ‘exclamation point’ and Siri generates the symbol !; Otter.ai just writes out ‘exclamation point’ instead of generating the symbol. With Siri, you can dictate and issue a command at the same time. You can’t do that with Otter.ai.
I could take my voice memos and my interviews and run them through something better, like a transcription service using humans. However, those services are much more expensive, and human transcription services come with a whole other suite of problems. Much like the Mechanical Turk workers who function as hidden labor within AI and machine learning systems, human transcribers are being underpaid and mistreated in their work and labor, too.
There are many reasons to use Otter.ai, from equitability to affordability. But the tool isn’t perfect. I can see it laid out in my designer’s mind, the kinds of augmentations I would love to see on top of Otter.ai. I would love to be able to highlight words, commit them to memory in the tool, teach or show the tool the way I pronounce things in my own particular way. My southern accent, flattened after many years living in New York and abroad, still possesses a specific kind of pronunciation and nuance that the tool has a difficult time understanding. In real time, I can see the tool getting something right, and then changing itself to something wrong. No, I shout, and that shouting suddenly becomes part of the transcription as it live-transcribes. Even in making the transcription for this essay, there are strange extra words added from the tool misunderstanding and mishearing, which I remove with line-by-line edits.
Looking at Otter.ai, it’s clear this was designed as a tool for memos, and for asynchronous human collaboration. You can leave notes for each other, comment, provide links, etc. But if the goal is this kind of collaboration, then shouldn’t there be a way for me to teach and correct things, especially if I’m using the same words over and over again?
Otter.ai has a hard time understanding anything outside of standard English or Western-sounding names. My husband is Croatian, and the way his name is pronounced is different from how it’s spelled. I have a collaborator named Bojana (Otter understood this as ‘Briana’ in the transcription); she’s one of my main collaborators. Why can’t I teach this tool what Bojana sounds like? Who wouldn’t want this kind of additional functionality?
We can’t really teach Otter.ai new words (Otter heard this as ‘worlds’), but what UI and UX can do is allow for batch changes and editing, such as selecting a document, or a batch of documents, to change.
Imagine having a dictionary; can you see it in this lower, empty corner in Otter? Imagine if I could just select a word from that dictionary, after highlighting or searching for a word to replace. Boom. Easy. It’s replaced. In all of these documents, when the words ‘boy plus Jana’ and ‘Briana’ appear, please change them to ‘Bojana.’ Problem solved.
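The feature I’m imagining is not technically exotic. A minimal sketch of how such a user dictionary could work, in Python, assuming transcripts are plain text; the names, structure, and example strings here are my own illustration, not anything from Otter.ai’s actual product or API:

```python
# Hypothetical sketch of a user-defined correction dictionary applied
# across a batch of transcripts. Not Otter.ai's real API.
import re

# Each entry maps the word the user actually said to the
# mis-transcriptions the tool keeps producing for it.
corrections = {
    "Bojana": ["boy plus Jana", "Briana"],
}

def apply_corrections(transcript, corrections):
    """Replace every known mis-hearing with the user's corrected word."""
    for correct, misheard_forms in corrections.items():
        for misheard in misheard_forms:
            # Whole-word, case-insensitive match, so a mis-hearing
            # buried inside another word is left alone.
            pattern = re.compile(r"\b" + re.escape(misheard) + r"\b",
                                 re.IGNORECASE)
            transcript = pattern.sub(correct, transcript)
    return transcript

# Batch edit: run the same dictionary over every selected document.
documents = [
    "Meeting with Briana about the research plan.",
    "boy plus Jana will send the interview notes.",
]
fixed = [apply_corrections(doc, corrections) for doc in documents]
# fixed[0] → "Meeting with Bojana about the research plan."
# fixed[1] → "Bojana will send the interview notes."
```

The point of the sketch is that a per-user dictionary plus a batch find-and-replace is ordinary engineering; the barrier is a design philosophy that doesn’t leave room for this kind of user-owned correction.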
This is just one idea of what a magical dashboard could do, with just a few small additions of functionality to a tool that already works fairly well, but not well enough, for me. This is where the kind of augmentation I need butts up against the current trend of overt, narrow minimalism perpetuated inside product design and software.
I took this screenshot as I was dictating an iteration of this exact article, September 8, 2022.
This is the necessary role of UX design in AI: understanding that for part of the population, AI tools that almost, kind of, work are fine, but for others they are not fine enough. We need to move beyond merely okay accuracy toward smarter kinds of design that allow for user augmentation, so that AI tools can really be used and engaged with by the general public. Our tools, AI or not, can be better, and they should be better. We need to allow for spaces of user agency and augmentation.
The tools I wish existed would function almost like a bridge between teaching, training, and refining the model for myself, even though, saying that out loud, I know it’s impossible. A model so specific for an in-browser tool is a dream we’ll perhaps see actualized in ten years. But following the thread of that want and desire, we can use UX and UI to conceptualize and create the kind of functionality I’m describing. We can allow for the kind of human agency and input that lets me, with the precision of a surgeon’s scalpel, change and edit so the tool works for me. This is my use case.
I wrote this in 2018, but it still rings true, at least for me, in how I wish AI product design could function:
“As utilitarian as it is, I dream of dashboards and analytical UX/UI settings. I dream of labels, of warnings. I want to see visualisations of changes in the data corpus, network maps of words, and system prompts for human help in defining as-yet-unanalysed words. What I imagine isn’t lean or material design; it is not minimal, but it is verbose. It’s more a cockpit of dials, buttons and notifications, than a slick slider that works only on mobile. Imagine warnings and prompts, and daily, weekly, and monthly analyses to see what is changing within the data. Imagine an articulation, a visualisation, of what those changes “mean” to the system.
What if all systems had ratings, clear stamps in large font, that say:
“THIS IS INTENDED FOR USE WITH HUMANS, AND NOT INTENDED TO RUN AUTONOMOUSLY. PLEASE CHECK EVERY FEW WEEKS. MISTAKES WILL BE MADE.”
Imagine labels, prompts, how-to’s, and walk-throughs. Imagine an emphasis on what is in the data set, on the origins of the algorithm: knowing where it came from, who created it, and why it was created. Imagine data ingredients, a prompt “I” for an ingredients label that is touchable, and explorable.”