ChatGPT Unveils Voice and Image Recognition Features

Voice chat and image input demo from OpenAI

ChatGPT is evolving fast, and its latest update adds voice conversations and image input. These features put a vast store of knowledge at your fingertips and make interactions more intuitive. They also put ChatGPT in direct competition with Google Assistant, Apple’s Siri, and Google Lens. You can now talk to ChatGPT out loud, or show it a picture of what you need help with.

Voice chat and image input demo from OpenAI. Credit: OpenAI

These new features make ChatGPT more versatile and open up many practical applications. Imagine you are travelling and take a photo of a striking monument; you can then discuss its historical significance with ChatGPT. Or say you are hungry, but your refrigerator and pantry hold only a few ingredients. Take a picture of them, and ChatGPT can suggest a meal along with a step-by-step recipe. The features could also help children learn science and mathematics.

Plus and Enterprise users will get these features in the ChatGPT app over the coming weeks. Voice is available on iOS and Android, where you opt in through the app’s settings. Image input is available on all platforms. OpenAI plans to extend access to other groups of users, including developers, soon after.

Have Voice Conversations with ChatGPT

This feature lets users hold spoken, back-and-forth conversations with their digital assistant. ChatGPT is available whenever you want to talk about anything that interests you.

To turn it on, open ‘Settings’ in the mobile app, choose ‘New Features’, and opt in to ‘voice conversations’. Then, on the app’s home screen, tap the headphone icon in the top-right corner and pick your preferred voice from five options.

The voice feature is powered by a new text-to-speech model that can generate lifelike audio from text and a short sample of speech. OpenAI worked with professional voice actors to create each voice. Whisper, OpenAI’s open-source speech recognition system, transcribes your spoken words into text.

Converse Visually with ChatGPT

This feature lets users show one or more images to ChatGPT, which can then help with a range of tasks. For example, it can guide you through fixing a broken grill, suggest meal ideas based on the ingredients you have, or analyze a complex graph from work. To focus on a specific part of an image, you can mark it with the drawing tool in the mobile app.

Image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, including photographs, screenshots, and documents that contain both text and images.

Model Limitations

Voice and image input open up many creative and accessibility-focused applications, but they also carry risks. Realistic synthetic voices, for example, could be misused to impersonate people or to commit fraud. For this reason, OpenAI limits the voice feature to a fixed set of preset voices, each created with a voice actor it worked with directly.

For image input, OpenAI has also taken technical measures to limit ChatGPT’s ability to analyze people in images and make direct statements about them.

ChatGPT is proficient at transcribing English text, but it performs poorly with some other languages, especially those written in non-Roman scripts.

Read more about this announcement on OpenAI’s blog.