Catching the stuff up
In recent years, and even months, we have been witnessing amazing progress. Many new technologies that previously could have only been possible in dreams and science-fiction emerged, and boy, we are not stopping! Some things that were considered astonishing, cutting edge, state-of-the-art a year ago, can be considered outdated and legacy today.
Now, let’s just briefly (and I mean it) summarize the most important recent discoveries, before we move to the newies, so we can stay on the same boat.
Yes, you got it, there’s no talking about AI without mentioning GPT… This model has been with us since about 2018, but it got really crazy with the recent release of ChatGPT. Well, in the AI world, it wasn’t so recent, we are talking a history here, although timewise it was like a year ago, at the moment of writing. But that doesn’t matter, since I told you – a history…
What’s this GPT? It’s an AI model capable of doing almost anything. Trained on an enormous amount of data it is supposed to perform very well in any given task, as opposed to other models that usually were fine tuned to achieve a certain task. Simply speaking – ask it anything and it shall do it. Solve your homework, write a code for you? Sure!
Midjourney, DALLE-2, Stable Diffusion
If, for any reason, you haven’t seen any of those, then, oh boy, oh boy, prepare to have your mind blown! We are talking about generating images by AI, from our text prompt. What does it mean? You simply type, in a natural language, what you want to see in the picture, and the AI generates it for you. Simple as that!
What’s remarkable is that these applications get better every day, so if you have seen them a year ago, and they left you unimpressed, then I strongly encourage you to give it another shot.
Sample image generated using Midjourney (yes, generated! That’s not a stock image!)
Santa Claus programming in space. Also made with Midjourney.
Ever wondered what it would be like to do pair programming, but if your colleague would just cut to the chase, without his irritating remarks about your font size, and the way your icons are placed on the desktop? We’ve got you covered! GitHub Copilot is your AI assistant, integrated into your IDE (many integrations available e.g., Visual Studio, Jet Brains, VS Code). It can help write and complete the code with its suggestions. It can be a great time saver. Occasionally, it can even come up with very nice solutions one wouldn’t think of! Definitely worth trying!
Copilot in action
The latest version of GPT model. With its release, it made “old” ChatGPT look like a toy. Improved reasoning, less “hallucinations”, less likely to make things up. It can be used in the same way as the chat model (gpt-3.5) was used, but it’s more secure – it shouldn’t tell a user how to make a bomb, even if asked nicely.
The great change is increased context. GPT-4 comes in two variants, 8k and 32k (tokens per context). What does that really mean? Its “memory” can hold up to about 8 or 32 thousand words, giving a possibility to generate longer outputs or being able to process way more data.
You can literally paste the whole documentation of a library and ask it to generate some code based on it. It will do it, even if that library was created yesterday (remember it has a knowledge cutoff in late 2021) or was simply never published.
OpenAI states that GPT-4 can also understand images, but this feature is still in the preview, and is not publicly available.
GPT-4 showing improvements over GPT-3.5
If you ask someone what’s the biggest limitation of Chat-GPT, I believe they might answer its problem with a knowledge cutoff and the fact it can make up facts. Why can’t it just ‘update’ its knowledge… or something? Well, it can’t, that’s the way it works, but that’s a story for a different occasion.
So, what could we do to overcome this? Microsoft has an answer! Bing AI is using Chat-GPT under the hood (probably the GPT-4 version), but for each request it tries to do a web search to get knowledge required to answer a certain question. Basically, it searches the web for you. What’s really great is that it also lists sources, so you can quickly verify if what it said was true!
To use it, simply head over to the bing.com and go to “chat” and add yourself to the waiting list.
Bing AI answering according to the latest web search results
While Whisper by OpenAI is not the newest kid on the block, surprisingly many people haven’t heard about it, so I feel obliged to show it. Whisper is a speech-to-text neural network that simply works great! It behaves very well. I’ve tested it in some harsh conditions, with mumbling and lisping, with superb results. Moreover, it’s open source! And achieves impressive results with multiple languages (works excellent even with my native language – Polish)! Like, no reasons not to like it, at all.
You can host it yourself, or use it via OpenAI API.
There’s also a space where you can test it, so go give it a shot!
Text to speech… We’ve heard this before. Well, yes, except this time it’s really awesome! You can either create your own voice from existing presets, and adjust it or (this is huge) you can upload a sample of any voice and then use it to convert text to speech! Now, your chatbot can speak in your voice (or any other you can imagine, but please remember about copyrights and ethics in general).
Have a look at this fantastic demo, or try it yourself (there’s a free version, but to unlock the voice cloning option you need a paid subscription).
I hope you enjoyed this journey into the world of AI.
This whole issue has been written without using ChatGPT. Call it old school if you like… 😉