Episode 3

Expert-led podcast discussing trending articles and news in the AI and attention spaces.

Exploring the Future of Voice, Vision, and ChatGPT

Everybody is talking about the new updates to OpenAI’s ChatGPT. The Independent shared an overview that summarized the changes and gave examples of how they could be used.
Jeanne Masche, a Solutions Architect with almost 15 years of experience in data and business intelligence, joined Beka Ventham to discuss AI’s influence, security concerns, technical complexities, and use cases in this episode of Game of Attention.

Q: OpenAI recently announced an upgrade to ChatGPT that allows for voice conversations and image-based interactions. How do you see these new capabilities influencing the landscape of AI-powered applications and products, and what opportunities or challenges do they present?

Jeanne: If you look at it from a speech point of view, I don't know how well it handles different accents. For example, if you take our South African accent and try the whole text-to-speech thing, it doesn't pick up half the things you say and thinks you said something else. If they can get that right for everyone's accents around the world, that would be great. I haven't tested the accent thing. I know my accent doesn't work on most things where you can use text-to-speech.
But from the image point of view, I've seen some amazing things that you can do with it. You can draw on a whiteboard and design an app or a website, for example, and say what you want it to do. You can change things around and then send that picture to ChatGPT. ChatGPT will then produce the code that you can put into the application you're creating. It even leaves out the things you've scratched out and accounts for the things you've moved around. It looks at that image and develops the code for you, which you can just copy and paste.
I think for a lot of developers who are trying to do POCs or do small projects or have just started out and they've just actually done this rough diagram on a whiteboard, ChatGPT can already write most of the code for them.

Q: It's such a powerful thing that yes, it's great for efficiency and it's great for kind of making things easier, writing code for you. I use it for research and it halves my research time easily. But what about the people who don't do that and are just trying to use it as a shortcut?

Jeanne: They've designed tools to work out how much was produced by AI and how much is someone's actual work, because now you also get people who are going to hand in CVs. In our field, let's say, we get technical interviews, but some companies send us an assessment that we've got to do to prove that we have the skills. And what if we're using ChatGPT for that? There is a bad side, and some people will be able to get away with it. But there are those cool tools where you can sort of see how much was done by AI.

Q: The voice capability is powered by a text-to-speech model and speech recognition system. On the technical considerations and architectural aspects involved in implementing that, how can those technologies be leveraged to ensure a seamless user experience while maintaining data privacy and security?

Jeanne: They've got to work on the latency side of things, which is something they've got to consider while building it. It also depends on where people are in the world and what their internet is like, because that's going to affect latency too. Then, from a security point of view, they need end-to-end encryption so that no one else can grab that voice data and use it in a bad way, because some people do password-protect things with their voice. If someone gets hold of that, they can use your voice to get past voice-recognition security. What they also tend to do is make you acknowledge that they are going to record when you start sending over your voice, and they tell you what the intended use of your voice data is, so that you're fully aware of what can and can't happen with it. All of this has to be taken into account.

Q: OpenAI's update also includes image-based interactions with ChatGPT. What are your thoughts on the technical complexities of integrating image recognition into a text-based AI system? And how might this enhance the utility of AI applications in various domains? Additionally, what potential use cases do you foresee for this capability?

Jeanne: Let's break that up. First, look at ChatGPT taking an image and that interactivity. That's like the example of drawing on a whiteboard. But let's say doctors start doing this and they use their handwriting. How many people can read a doctor's handwriting? I do know that they use deep learning models and so on to enhance it, but I feel it's going to take a lot of time to get to that level of handwriting recognition in ChatGPT. But it's a large language model. It learns from itself continuously and it will continuously improve.
Let's carry on with the doctor's thread and healthcare. Say a person has had an x-ray taken. You submit this x-ray to ChatGPT and it can help healthcare providers come up with multiple diagnoses based on the x-ray that's provided. Hopefully the healthcare providers won't just use what ChatGPT says, but it could come up with possible diagnoses that the doctor maybe didn't think of, and in that sense it can help them come up with all the possible problems and then solutions. But at the end of the day, the doctor still needs to make the choice.

Q: Do you think these updates are going to have a massive impact on what you do on a day-to-day basis?

Jeanne: Not for me personally, but I think for people starting out in their careers who don't know how to put things in layman's terms, it would be very helpful.

Q: You're in Germany, learning German, and dealing with German clients. In terms of that kind of translation aspect, do you think it could be useful there?

Jeanne: I actually do use ChatGPT to translate from English into German. Also, the nice thing about ChatGPT is that I can say it needs to be in the formal, semi-formal, or informal version of German. And it's the most accurate translator that I've used.

Speaking with Jeanne has been so insightful given her background and the clients she works with. 

Make sure to subscribe on LinkedIn or follow on Spotify so you don’t miss our next episode, which discusses whether AI is smarter than humans.