Navigating the Challenges and Opportunities of Synthetic Voices in Dental

We’re sharing lessons from a small-scale preview, conducted by OpenAI, of Voice Engine, a model for creating custom voices, with reference to dental practices.





Zora is committed to developing safe and broadly beneficial AI. Today we are sharing preliminary insights and results from a small-scale preview OpenAI conducted of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. It is notable that a small model with a single 15-second sample can create emotive and realistic voices.


OpenAI first developed Voice Engine in late 2022, and has used it to power the preset voices available in the text-to-speech API as well as ChatGPT Voice and Read Aloud. At the same time, we at Zora are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale in dental practices.


Early applications of Voice Engine


To better understand the potential uses of this technology, late last year OpenAI started privately testing it with a small group of trusted partners. We've been impressed by the applications this group has developed. These small-scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries. A few early examples include:


Providing reading assistance to non-readers and children through natural-sounding, emotive voices representing a wider range of speakers than what's possible with preset voices. Age of Learning, an education technology company dedicated to the academic success of children, has been using this to generate pre-scripted voice-over content. They also use Voice Engine and GPT-4 to create real-time, personalized responses to interact with students. With this technology, Age of Learning has been able to create more content for a wider audience.


  1. Reference Audio


2. Generated audio

Some of the most amazing habitats on Earth are found in the rainforest. A rainforest is a place with a lot of precipitation, and it has many kinds of animals, trees, and other plants. Tropical rainforests are usually not too far from the equator and are warm all year.




Translating content, like videos and podcasts, so creators and businesses can reach more people around the world, fluently and in their own voices. One early adopter of this is HeyGen, an AI visual storytelling platform that works with its enterprise customers to create custom, human-like avatars for a variety of content, from product marketing to sales demos. They use Voice Engine for video translation, so they can translate a speaker's voice into multiple languages and reach a global audience. When used for translation, Voice Engine preserves the native accent of the original speaker: for example, generating English with an audio sample from a French speaker would produce speech with a French accent.



  1. Reference Audio - English


2. Generated Audio - Spanish

La amistad es un tesoro universal: aporta alegría, apoyo y risas a nuestras vidas, sin importar dónde estemos en el mundo. Los verdaderos amigos están con nosotros en las buenas y en las malas, compartiendo nuestras alegrías y aliviando nuestras penas. Celebremos los lazos de amistad que nos conectan a todos a través de cada idioma y cultura.




Supporting people who are non-verbal, such as therapeutic applications for individuals with conditions that affect speech and educational enhancements for those with learning needs. Livox, an AI alternative communication app, powers Augmentative & Alternative Communication (AAC) devices that enable people with disabilities to communicate. By using Voice Engine, they are able to offer people who are non-verbal unique and non-robotic voices across many languages. Their users can choose speech that best represents them, and for multilingual users, maintain a consistent voice across each spoken language.



  1. Reference Audio


2. Generated Audio

Excuse me, can I get your attention? Thank you for your help. Can we watch a movie tonight? Could you please help me find my glasses? Thank you for your understanding, it means a lot to me.




Helping patients recover their voice, for those suffering from sudden or degenerative speech conditions. The Norman Prince Neurosciences Institute at Lifespan, a not-for-profit health system that serves as the primary teaching affiliate of Brown University's medical school, is exploring uses of AI in clinical contexts. They've been piloting a program offering Voice Engine to individuals with oncologic or neurologic etiologies for speech impairment. Since Voice Engine requires such a short audio sample, doctors Fatima Mirza, Rohaid Ali and Konstantina Svokos were able to restore the voice of a young patient who lost her fluent speech due to a vascular brain tumor, using audio from a video recorded for a school project.



  1. Current Voice


2. Reference Audio


3. Generated Audio

Hi everyone, this is what my voice sounds like using OpenAI's new text to speech model called Voice Engine. I was able to use just 15 seconds of a video that I made for a class project to be the reference audio source for the voice you hear right now. What do you think?





How Synthetic Voice Technology Can Be Applied in Dental Practices


Integrating synthetic voice technology into dental practices could improve patient communication and streamline office operations, which in turn can improve the overall patient experience. Potential applications in dental settings range from automating appointment confirmations to providing supportive follow-up calls over the phone.



Automating Routine Communications


One of the most immediate applications of synthetic voice technology in dental practices is in automating routine communications. Traditionally, tasks such as appointment reminders, follow-up calls, and patient education have required substantial time investment from front desk staff. With synthetic voice technology, these communications can be automated, freeing up staff to focus on more critical tasks that require a personal touch.

For instance, a dental office can use a synthetic voice system to send out appointment reminders and confirmations. The system can interact with patients in a natural-sounding voice, asking whether they would like to confirm, reschedule, or cancel their appointment. This interaction is efficient while keeping a personal feel, so the patient experience remains at the forefront.
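
The confirm, reschedule, or cancel flow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of how Voice Engine or any particular phone system works: the names `Intent`, `Appointment`, `reminder_script`, `classify_reply`, and `apply_reply` are hypothetical, the patient's reply is assumed to arrive already transcribed to text, and simple keyword matching stands in for a real language-understanding layer.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    CONFIRM = "confirm"
    RESCHEDULE = "reschedule"
    CANCEL = "cancel"
    UNKNOWN = "unknown"


# Hypothetical keyword lists, checked in this order so that a reply such as
# "yes, I need to cancel" is treated as a cancellation, not a confirmation.
KEYWORDS = {
    Intent.RESCHEDULE: ("reschedule", "another day", "different time"),
    Intent.CANCEL: ("cancel", "can't make it", "cannot make it"),
    Intent.CONFIRM: ("confirm", "yes", "i'll be there"),
}


@dataclass
class Appointment:
    patient_name: str
    date: str
    time: str
    status: str = "pending"


def reminder_script(appt: Appointment) -> str:
    """Text that would be handed to a text-to-speech system."""
    return (
        f"Hello {appt.patient_name}, this is a reminder of your dental "
        f"appointment on {appt.date} at {appt.time}. "
        "Would you like to confirm, reschedule, or cancel?"
    )


def classify_reply(transcript: str) -> Intent:
    """Map a transcribed reply to an intent using substring matching."""
    text = transcript.lower()
    for intent, phrases in KEYWORDS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return Intent.UNKNOWN


def apply_reply(appt: Appointment, transcript: str) -> Intent:
    """Update the appointment status; unclear replies go to staff."""
    intent = classify_reply(transcript)
    if intent is Intent.CONFIRM:
        appt.status = "confirmed"
    elif intent is Intent.CANCEL:
        appt.status = "cancelled"
    elif intent is Intent.RESCHEDULE:
        appt.status = "needs_reschedule"
    else:
        appt.status = "needs_staff_follow_up"
    return intent
```

Replies the sketch cannot classify are flagged for staff follow-up rather than guessed at, which keeps a person in the loop for anything ambiguous.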



Enhanced Patient Support and Education


Synthetic voices can also play a crucial role in patient support and education. By employing AI-driven voice systems, dental practices can provide patients with 24/7 access to pre- and post-care information. For example, after a significant dental procedure, a synthetic voice system can call patients to remind them of care instructions, answer common questions, and provide reassurance about what to expect during the recovery process.


These systems can be programmed to handle a wide range of inquiries, offering detailed explanations that are tailored to the individual needs of each patient. If a question arises that the system cannot handle, it can escalate the call to a human staff member, ensuring that all patient concerns are addressed appropriately.
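
The escalation behaviour described above might look something like the following. It is a sketch only: `Reply`, `FAQ`, `ESCALATION_TERMS`, and `answer_or_escalate` are hypothetical names, the answers are placeholders that the practice would write and review, and the rule that any mention of a possible symptom goes straight to staff is an assumption made for illustration.

```python
from dataclasses import dataclass

# Placeholder answers for illustration only; real content would be
# written and reviewed by the practice.
FAQ = {
    "opening hours": "Placeholder: the practice's opening hours go here.",
    "parking": "Placeholder: parking directions go here.",
    "insurance": "Placeholder: accepted insurance plans go here.",
}

# Assumed policy: any mention of these terms is handed to a human.
ESCALATION_TERMS = ("bleeding", "swelling", "fever", "severe pain")

HANDOFF_MESSAGE = "Let me connect you with a member of our team."


@dataclass
class Reply:
    text: str
    escalate: bool
    reason: str = ""


def answer_or_escalate(question: str) -> Reply:
    """Answer from the FAQ when possible, otherwise hand off to staff."""
    q = question.lower()
    # Check for possible symptoms first so they are never answered by FAQ.
    for term in ESCALATION_TERMS:
        if term in q:
            return Reply(HANDOFF_MESSAGE, True, f"possible urgent symptom: {term}")
    for topic, answer in FAQ.items():
        if topic in q:
            return Reply(answer, False)
    # Nothing matched: escalate instead of improvising an answer.
    return Reply(HANDOFF_MESSAGE, True, "no matching answer")
```

Recording a `reason` with each escalation gives staff some context when they pick up the call.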


Illustrative Implementation Example


Consider a dental clinic that has implemented a synthetic voice system to handle its outbound patient engagement. The system is configured to call patients after dental surgeries, offering care advice based on the specific procedures they underwent. For example, if a patient had an extraction, the system would provide them with detailed instructions on managing pain, avoiding infection, and recognizing signs of complications.

In this scenario, the synthetic voice is calm, empathetic, and informative, making the patient feel cared for and supported without the need for direct human intervention. This not only improves the patient's experience but also reduces the workload on the clinic's staff.
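
As a rough sketch of how such a clinic might select what the voice says, the snippet below builds a follow-up call script from the procedure a patient underwent. The names `CARE_PLANS` and `build_follow_up_script` are hypothetical. The three extraction topics come from the scenario above, while the instruction text is deliberately left as placeholders for clinician-approved wording.

```python
from typing import Optional

# Topics for an extraction follow the scenario above. The instruction text
# is a placeholder that a clinician would supply and approve.
CARE_PLANS = {
    "extraction": [
        ("Managing pain", "<clinician-approved pain guidance>"),
        ("Avoiding infection", "<clinician-approved infection guidance>"),
        ("Recognizing signs of complications", "<clinician-approved warning signs>"),
    ],
}


def build_follow_up_script(patient_name: str, procedure: str) -> Optional[str]:
    """Return the text for a follow-up call, or None if no plan exists."""
    key = procedure.strip().lower()
    plan = CARE_PLANS.get(key)
    if plan is None:
        # No approved plan for this procedure: the caller should route the
        # follow-up to staff rather than generate advice.
        return None
    lines = [f"Hello {patient_name}, we are calling to check in after your {key}."]
    for number, (topic, text) in enumerate(plan, start=1):
        lines.append(f"{number}. {topic}: {text}")
    lines.append("If you have any questions, we can connect you with our team.")
    return "\n".join(lines)
```

Calling `build_follow_up_script("Sam", "extraction")` returns a five-line script whose text could then be passed to a text-to-speech system. A procedure without an approved plan returns `None`, so the follow-up falls back to a staff member.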


Conclusion


The application of synthetic voice technology in dental practices, illustrated here by models such as OpenAI’s Voice Engine, fits the industry’s move towards more digital and patient-centric services. By automating routine tasks and providing a comforting and professional voice interface, dental practices can improve patient satisfaction, reduce operational costs, and allow dental staff to concentrate on delivering high-quality care. As we continue to explore and refine the use of this technology, it holds the promise of setting new standards in how dental care is delivered and experienced.


