Growing popularity of generative AI
GPT 1.0, BERT, …
After the introduction of transformers, some smart tech people quickly got down to business, working out how to use the new architecture to achieve even better results. In 2018, the OpenAI team dropped a paper called “Improving Language Understanding by Generative Pre-Training”, where they unveiled their GPT 1.0 model. A Generative Pre-trained Transformer, or GPT, is a model pre-trained on a ginormous amount of text — first with unsupervised learning on raw data, then fine-tuned with supervised learning — which makes it capable of generating text that fits the given context. And let me tell you, it was able to spit out texts that sounded so human-like, it was mind-blowing.
Around the same time, people from Google got in on the action, too. They dropped a scientific paper titled “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” and brought their own open-source BERT model to the table, making it accessible to everyone.
GPT-3, ChatGPT, Bing, Bard
Although transformer models had been known for several years, it wasn’t until late 2022 that they caught the attention of a wider audience. They became accessible and helpful to ordinary people and found their way into the discussions of tech enthusiasts and journalists everywhere.
In November 2022 came the big reveal: ChatGPT, a real game-changer in the world of generative AI. It launched on GPT-3.5, a refined version of the GPT-3 model, and later upgraded to GPT-4, opening up an interface for casual conversations with an AI capable of understanding context and responding to user queries. OpenAI’s masterpiece offered unprecedented quality, making conversations with AI feel completely natural, with logical, grammatically correct, and contextually fitting responses.
Though the GPT-3 model itself was known since mid-2020, it was ChatGPT in the form of an AI chatbot that captured the world’s attention. Large language models were primarily used by specialists before, but with ChatGPT, anyone could witness their capabilities and start implementing them in their daily applications.
For many of us, ChatGPT outperformed search engines by presenting information in a user-friendly way. Its main limitation was the knowledge cutoff: it could only access information up to 2021. Despite occasional drawbacks like sporadic incorrect or off-topic responses, OpenAI’s application rapidly gained popularity, becoming one of the fastest-growing apps in history. Even colossal platforms like Instagram, TikTok, or Twitter could only dream of reaching the staggering milestone of 100 million users within just a few months of launch.
The tremendous success of ChatGPT brought AI — and OpenAI — to the forefront of virtually every internet user’s awareness. This popularity inevitably led many companies to consider using AI in their products, particularly the models crafted by OpenAI. OpenAI isn’t the sole beneficiary of this triumph, though; its primary investor, Microsoft, also has reason to celebrate. Their collaboration began in 2016 and deepened over the years, with Microsoft becoming the exclusive cloud provider for OpenAI’s solutions, so interested users have to go through the Azure platform. In turn, Microsoft’s vast hardware resources supplied the computational power needed to train such advanced models.
One of the most buzz-worthy projects arising from OpenAI and Microsoft’s collaboration was the enhancement of the Bing search engine. By combining OpenAI’s models with Bing’s web search and analysis capabilities, Microsoft addressed the knowledge-cutoff problem faced by ChatGPT, letting the model draw on information tied to the user’s current search.
Shortly after rolling out the new version of Bing, Microsoft reached 100 million daily active users. Although still modest compared to Google’s rival search engine, it turned Bing into an attractive alternative for users.
In response to its competitor’s innovations, Google presented its own product: Bard. It’s based on LaMDA (Language Model for Dialogue Applications), announced a couple of years earlier. However, Bard remains limited and unavailable in many countries, and it relies on a lighter version of the LaMDA model, so its capabilities fall short of the competition — something that didn’t escape the notice of many users.
AI solutions entering our everyday life
Before the sudden surge in popularity of AI-based solutions, end-users were generally unaware of their interactions with such technology. It mostly operated in the background, providing better user experiences with various products. However, as large language models gained popularity, awareness of AI’s presence in our lives grew. People realized the immense capabilities it possesses and that anyone can harness it to improve their work, enhance productivity, or simply find more satisfaction in their daily tasks.
Google Trends – AI
Interest in AI has risen sharply in recent months, as shown by the Google Trends chart above with its growing number of AI-related searches. Alongside interest in AI itself, curiosity about its applications emerged. Students and learners quickly found support in OpenAI’s product for homework and problem-solving. Programmers started using it as a tool to assist their work, asking questions or generating code snippets, while content creators found it helpful for writing and editing texts. Beyond text-based models, solutions also appeared that can generate images, music, and videos. This AI boom led to the natural question on everyone’s minds: “Will AI take our jobs?” The impressive capabilities of these solutions sparked concerns that AI might replace humans in many domains.
However, a closer look at how these products work shows that they are far from able to replace humans. AI-generated articles, for example, often deviate from the truth, contain grammatical errors, and cite non-existent sources, reflecting the limits of their training data. Models like GPT and its competitors are susceptible to hallucinations: producing text fragments unrelated to the context or to any factual source. Such mistakes are usually easy for a human to detect, and it doesn’t help that the models can be confidently wrong. These situations significantly hurt user perception, trust, and the credibility of AI models.
Despite their imperfections, only a fool could ignore the potential they bring. AI excels in the role of an assistant, aiding in performing monotonous, repetitive, and time-consuming tasks, allowing individuals to focus on more creative and attention-demanding endeavors.
Open source to the rescue of privacy
The rise in the popularity of artificial intelligence has led many people to pay more attention to their privacy and data security. Training AI models often requires a massive amount of data, which may come from various sources — not all of them obtained legally. Combined with a lack of transparency from AI companies, this can result in the unwanted misuse of user data. Moreover, using AI-powered programs can mean that our interactions with services contribute to further training of the models, potentially leading to the uncontrolled use of sensitive data, such as snippets of internal code inadvertently pasted into a chatbot or the contents of company meetings processed by an assistant to generate notes.
The answer to these issues could lie in utilizing open-source solutions, which, by their nature, should prioritize transparency and well-documented processes, leaving no doubts about how the system operates. Additionally, many open-source AI solutions work entirely locally, meaning that none of our data ever needs to leave our device or local network.
Due to the relatively recent surge in popularity and development of many AI solutions, there is often a lack of good open-source alternatives for them. However, with each passing day, more well-developed projects are emerging, many of which can now serve as viable replacements for their paid, closed counterparts.
Just a few months ago, anyone who wanted to chat with a well-developed AI chatbot had to rely on ChatGPT. The game changed when the team at Meta (formerly known as Facebook) added their contribution to the world of large language models with the LLaMA project — Large Language Model Meta AI. They took a different approach to training the model, theoretically achieving better accuracy with fewer parameters. Fewer parameters, in turn, mean lower hardware requirements, making it possible to run the model even on an average personal computer. LLaMA comes in several sizes: 65B, 33B, 13B, and 7B, where the number indicates the parameter count — a significant reduction compared to GPT-3’s 175B parameters. The reduction is noticeable in the quality of the generated text, but it still remains very high.
The release of this model stirred the entire open-source AI community, because a tool this powerful had never before been so widely accessible. A wave of projects built on it followed, enabling even higher-quality results. One example is llama.cpp, a port of the LLaMA model to C/C++. Its goal was to run the model with 4-bit quantization on a MacBook without external dependencies, using only the CPU (GPU support was added later). The project has already gained nearly 35,000 stars on GitHub, a sign of the tremendous interest it generated. Another equally popular project is Open Assistant, a chat-based assistant that can also interact with external systems.
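To get a feel for why 4-bit quantization is what makes running LLaMA on a laptop plausible, here is a back-of-the-envelope sketch in Python. The numbers are illustrative only — real inference also needs memory for activations and the attention cache:

```python
# Rough memory footprint of LLaMA weights at different precisions.
# Weights alone: parameter count × bits per weight, converted to GiB.

def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Gibibytes needed just to hold the model weights."""
    return n_params * bits_per_weight / 8 / 1024**3

for name, n in [("7B", 7e9), ("13B", 13e9), ("33B", 33e9), ("65B", 65e9)]:
    fp16 = weight_memory_gb(n, 16)   # typical full-precision release format
    q4 = weight_memory_gb(n, 4)      # after 4-bit quantization
    print(f"LLaMA {name}: {fp16:5.1f} GiB at fp16 -> {q4:5.1f} GiB at 4-bit")
```

At 4 bits per weight, the 7B model’s weights shrink from roughly 13 GiB to about 3.3 GiB — small enough to fit in the RAM of an ordinary MacBook.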
LAION AI – Open Assistant
Although open-source models may not directly compete with models like GPT-4, the ability to run them in a local environment, facilitating secure interactions with sensitive data, can be highly appealing to many individuals and companies. As a result, these open-source projects are bound to continue evolving, and with them, their quality is expected to improve as well.
When talking about open-source AI solutions, we can’t ignore projects created by OpenAI. The company was initially meant to be a beacon of hope for the open-source community, but it has largely shifted toward closed, commercial solutions, disappointing many users. Nevertheless, OpenAI still maintains projects like Whisper, an open-source speech recognition system available to everyone, enabling local transcription in multiple languages and translation into English.
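For the curious, a minimal sketch of what local transcription with Whisper looks like, assuming the `openai-whisper` package has been installed with pip and `ffmpeg` is on the PATH (`meeting.wav` is just a placeholder file name):

```shell
# Transcribe a local recording entirely on this machine; nothing is uploaded.
whisper meeting.wav --model base --output_format txt

# Whisper can also translate speech in other languages into English:
whisper meeting.wav --model base --task translate
```

Larger models (`small`, `medium`, `large`) trade speed for accuracy, so the choice depends on your hardware and how sensitive the recording is to errors.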
The combination of its advanced capabilities and open accessibility has unlocked unprecedented opportunities, leading to the creation of projects like WhisperX and Whisper.cpp. Both make it easy to implement and utilize OpenAI’s solution while adding various additional functionalities. By using programs like Whisper.cpp, we can conduct advanced transcription using just our processor’s computational power (partially supported by the GPU). The project incorporates multiple optimizations and can even be combined with other projects like tinydiarize, allowing partial diarization of speech. While not yet perfect, it provides an excellent alternative to cloud-based transcription solutions, granting us full control over our data and facilitating easy analysis, even for sensitive meetings.
Even solutions like DALL-E and Midjourney, which are well-known to anyone interested in AI, have their open-source alternatives in the form of projects developed by Stability AI – Stable Diffusion. Along with available tools like ControlNet, they allow the creation of beautiful images, while also providing an alternative to the increasingly popular Adobe Firefly. But to keep it brief, let me leave you with my most stunning creation using Stable Diffusion – Shrek exploring the world of Harry Potter. With AI, things once deemed impossible become a reality…
Stay tuned for the second part of the article which will cover AI in the world of mobile apps development!