Generative artificial intelligence is progressing at a relentless pace

The first half of 2023 has been marked by an unprecedented computing revolution. Today we have access to artificial intelligence systems, both free and paid, that can sustain any kind of dialogue with humans and assist them (or even replace them) in jobs requiring knowledge and creativity. This is not fleeting hype or a ‘bubble,’ as many thought or perhaps hoped. It is not a dream (or, for some, a nightmare). It is reality.

At the beginning of the year we had ChatGPT as a reference point, along with the widespread adoption of Midjourney version 4, and together they provided a snapshot of the state of the art. Now we have the subsequent GPT-4 model and Midjourney version 5.1, which let us gauge the real speed of generative AI’s evolution.

Let’s start with OpenAI’s recent GPT-4. It is pointless to look for impressive numbers to quantify the increase in power (meaningless figures are circulating); examining the various tests presented by OpenAI is enough to show that the latest evolution of their language model surpasses the previous one in every field. The most significant factor in understanding this evolution lies in Microsoft’s enormous investment and involvement in the project. OpenAI can now use Microsoft’s supercomputers (worth hundreds of millions of dollars) to train and evolve its AI on a scale very different from what researchers and programmers have worked with in recent years. Microsoft has gone even further, announcing the development of still more powerful and expensive systems based on Nvidia technology (another giant deeply engaged in artificial intelligence). The Redmond-based company has decided to dive headfirst into this sector and change the history of computing once again, and forever. A natural consequence is the integration of AI into all of Microsoft’s major productivity tools, including Office and Designer.

The latest evolution of Designer has been seen by many as a challenge to Adobe, and just a few days after Microsoft’s announcements, Adobe finally took the plunge by introducing Firefly, a set of creative models based on proprietary generative artificial intelligence. Naturally, text-to-image generation, currently the most popular form of AI-generated graphic art, could not be missing from the lineup.

Returning to Microsoft: its latest keynote (at the end of May) included numerous announcements regarding artificial intelligence. Previously, Satya Nadella (Chairman and CEO of Microsoft) had stated:

“Today marks the next major step in the evolution of how we interact with computing, which will fundamentally change the way we work and unlock a new wave of productivity growth. With our new copilot for work, we’re giving people more agency and making technology more accessible through the most universal interface — natural language.”

He has now added: “AI is taking the computer age from the bicycle to the steam engine.”

Here are some more statements:

“Generative AI is the next evolution of AI-driven creativity and productivity, transforming the conversation between creator and computer into something more natural, intuitive and powerful,” says David Wadhwani, President of Adobe’s Digital Media Business.

Sam Altman, CEO of OpenAI, looks even further ahead, to the advent of AGI (Artificial General Intelligence, also known as strong AI): an AI with capabilities that match or far exceed human ones. And with the multi-billion-dollar investments made by Microsoft, as well as infrastructure designed specifically for AI development, this goal may not be far off: “As our systems get closer to AGI, we are becoming increasingly cautious with the creation and deployment of our models.” Will they really be cautious, or will fierce competition push them to accelerate even more?

Here is also an excerpt from a long post published by Bill Gates: “The development of artificial intelligence is as important as the creation of the microprocessor, the personal computer, the internet, and the mobile phone. It will change the way people work, learn, travel, receive healthcare, and communicate with each other. Entire industries will reorient around it. Companies will be distinguished by how well they use it.” From the same post we also learn that Gates was meeting with OpenAI researchers as early as 2016; is it conceivable that he had no hand in such a revolution?

After a long period of playing catch-up, Google has finally shown its hand. In its latest keynote, the abbreviation “AI” was reportedly mentioned more than 150 times.

Giants like Meta and Amazon are still missing from the roster, but they have expressed their intention to enhance their products with artificial intelligence.

We cannot predict the future, but some things appear quite clear. The evolution of generative AI is so rapid that this article could become obsolete in just a few weeks. Here is a small example: on this website you can read an experimental work titled “The Last Cyberpunk Novel,” which consists of two science fiction stories created using ChatGPT (GPT-3.5). Given the interest generated by that experiment, I waited for ChatGPT to become available again in Italy, after the suspension imposed by the Italian Data Protection Authority, so that I could finally use the Plus version and the GPT-4 model. Thanks to the remarkable power of this tool, I have created not only a new, more complex story (approximately 15,000 words, or 100,000 characters) but also a guide called “Creative Writing with ChatGPT.” Both will be published on May 31, 2023, in Italian and English.

Before looking at Midjourney’s latest news, I want to address the question many are asking today: will these revolutionary products presented by Microsoft and Adobe (and by others as well) replace humans in graphic design, photography, and computer graphics, but also in journalism, writing, and software development? In my humble opinion, absolutely yes. Certainly not everyone, but many will have to change jobs. The way any digital product that includes text, code, graphics, audio, or video is designed is destined to be revolutionized in a matter of months rather than years. Those who adapt to this change will see their workflow evolve and their productivity grow, while the others may at most carve out a niche of “digital craftsmanship.”

Moreover, I believe many are underestimating the creativity shown by these models, convincing themselves that the models are limited to mere imitation of the images and texts on which they were trained and will therefore never be able to go ‘beyond.’ Nothing could be further from the truth. Just use a tool like Midjourney with a deliberately loose prompt, and you will see images that interpret the request with remarkable creativity. In addition, hundreds of thousands of users experiment every day with new combinations of styles and subjects, often obtaining images that, frankly, I doubt anyone had even conceived before the advent of these tools.

Finally, there is a particular category of graphic designers currently standing aside and pretending that nothing is happening, as if all of this were not destined to overwhelm them as well: 3D artists. Over the last thirty years I have followed the evolution of 3D computer graphics in both theory and practice, and today I can still identify certain strongholds of resistance in the field. But considering the evolution we are witnessing, I believe 3D artists should seriously entertain the idea that these strongholds will fall within months, or at most a few years. I therefore intend to keep maintaining and updating my introductory guide to Blender 3D, starting by giving space to the best plugins based on open-source AI models (there are some excellent ones), but above all by introducing Python scripting and going deeper into parametric modeling (Blender’s geometry nodes), because it is through these doors that AI could enter and radically change 3D computer graphics. With a simple prompt, software like Blender could create highly detailed, animated 3D worlds in no time.
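To give a taste of the parametric approach mentioned above, here is a minimal sketch in plain Python (no Blender required; the `grid` function and its parameters are my own illustration, not part of any guide). It builds the vertex and face lists for a flat quad grid — the same kind of data a Blender script could hand to `Mesh.from_pydata(vertices, [], faces)` to create geometry procedurally, and exactly the sort of structured output an AI could one day generate from a prompt.

```python
def grid(size_x, size_y, spacing=1.0):
    """Return (vertices, faces) for a flat grid of size_x by size_y quads.

    Vertices are (x, y, z) tuples on the z=0 plane; faces are 4-tuples of
    vertex indices, the format Blender's Mesh.from_pydata() expects.
    """
    # One row of vertices more than the number of quads in each direction.
    vertices = [(x * spacing, y * spacing, 0.0)
                for y in range(size_y + 1)
                for x in range(size_x + 1)]
    stride = size_x + 1  # vertices per row
    # Each quad references its four corner vertices counter-clockwise.
    faces = [(y * stride + x,
              y * stride + x + 1,
              (y + 1) * stride + x + 1,
              (y + 1) * stride + x)
             for y in range(size_y)
             for x in range(size_x)]
    return vertices, faces

vertices, faces = grid(2, 2)
print(len(vertices), len(faces))  # 9 vertices, 4 quad faces
```

Change `size_x`, `size_y`, or `spacing` and the whole mesh updates: that is parametric modeling in miniature, and it is why scripting and geometry nodes are such a natural entry point for AI-driven generation.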

Let’s move on to Midjourney’s V5, which among its improvements boasts remarkable photorealism. V4 had already proven extremely capable for any type of illustration, but V5 could challenge photography itself. Let’s start with a comparison of how the various versions, from V1 to V5, handle a simple prompt: “Rome street photography”. The evolution is striking:






With the more advanced versions (V4 and V5), Midjourney mainly produces black-and-white images, since that is the style most commonly associated with street photography. I have also noticed that, by default, it depicts middle-aged or elderly people; if you want younger subjects, you have to ask for them explicitly.

And here is the latest update, V5.1:

If this is the evolution of just one year of Midjourney, one inevitably wonders what we will see in the next five. We are facing colossal investments, so we should expect even more astonishing progress. While looking at these AI-generated images, we must keep in mind that the alleys and all the people depicted do not exist in reality; they are the product of the algorithm’s creativity.

Among the critics of generative artificial intelligence, some believe these images are a kind of collage of actual photographs whose copyrights have been violated. That reveals a misunderstanding of how neural networks work: a model of this kind learns statistical patterns from its training data and synthesizes each image anew; it does not store and recombine the original files. Although these images may resemble photographs, they are not photographs and never were. This innovation in the world of graphics now has a name: Synthography.

To use Midjourney (and similar AIs such as Leonardo.ai), you need to go through the Discord app, which may initially put off those who prefer a web page (like ChatGPT). In reality the system is very simple, and step-by-step guides are available online. It only takes a few hours to master everything.
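For those new to the workflow, a generation is started by typing a slash command in a Discord channel where the Midjourney bot is present. A sketch (the prompt wording is my own illustration; `--v` selects the model version and `--ar` the aspect ratio):

```text
/imagine prompt: Rome street photography, color film, young woman crossing a sunlit alley --v 5.1 --ar 3:2
```

The bot replies with a grid of four candidate images, which you can then upscale or re-roll with variations. Note how the prompt explicitly asks for color and a young subject, working around the defaults discussed above.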

If you want to imagine what the children of the future, today’s newborns, will look like, Midjourney can help you take a leap forward, to 2040:

Have fun and/or good work with AIs, and remember: use them wisely.