ChatGPT and co. – the end of the Graphical User Interface?
October 9, 2023
Generative AI is on the rise. Advances in Natural Language Processing are enabling artificial intelligence to perform tasks of unprecedented complexity. For months, there has been speculation about which jobs will be replaced first and which industries should brace for disruption. Yet the arrival of ChatGPT and co. also implies a quite different, possibly even more significant shift: the immense proliferation of Large Language Models and other forms of generative AI is driving profound changes in the way we interact with computers. Interestingly, these new modes of interaction bear a striking resemblance to ones that have been around since the very first generations of computers. So let's first take a look at the past.
A brief history of human-computer interaction
The triumph of the computer dates back to the last century. Before touchscreens and cursors, users had to type commands into terminals to tell the computer what to do. The similarity to today's interactions with chatbots like ChatGPT is immediately apparent. Unlike Natural Language Interfaces, however, which can understand natural language, the command line has an obvious disadvantage: commands must be unambiguous and leave no room for error - if you mistype, you are no closer to your goal. The command line thus bears little resemblance to human conversation.
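This unforgiving exact-match behavior can be sketched in a few lines. The command table below is purely illustrative, not any real shell's command set:

```python
# Minimal sketch of why command-line interaction is unforgiving: the
# interpreter only accepts exact command names (this command table is
# purely illustrative).
COMMANDS = {"list": "listing files...", "copy": "copying..."}

def run(command: str) -> str:
    # No fuzzy matching, no inference of intent - a typo is simply an error.
    if command not in COMMANDS:
        return f"{command}: command not found"
    return COMMANDS[command]

print(run("list"))   # exact match works
print(run("lsit"))   # one transposition and the parser gives up
```

A language model, by contrast, would likely still recognize "lsit my files" as a request to list files - that tolerance for ambiguity is exactly what the command line lacks.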
The next evolutionary step was the Graphical User Interface (GUI). Typed commands were replaced by graphical interfaces that enabled easy-to-understand interaction even for inexperienced users. This made computers accessible to many people who had previously had little contact with them. Many paradigms from that era persist today: the window, the icon, the menu, and the pointer (WIMP) are still an integral part of any GUI. The GUI itself has also undergone important evolutionary steps over time. Applications on older iOS versions, for example, often mimicked their real-world counterparts much more closely than modern applications do. Abstracting away from the real object opens up new possibilities for designers to create interactions - with the disadvantage that these new interaction patterns must first be learned.
The GUI is undoubtedly one of the most important disruptions in the history of computing. But even if a GUI is designed to be understandable and simple, interacting with an application often still involves learning processes that get in the way of the desired goal - especially in more complex workflows. What if we could simply tell the computer what we want, just as we would tell another human being?
Natural Language Interface - the next evolutionary step?
With LLMs we are one step closer to this goal. We just write what we want and ChatGPT, Bard or LLaMA does it. Even in its basic configuration, ChatGPT is enormously impressive in its capabilities. Plugins allow us to connect to external services that further extend this feature set. Will we communicate with computers exclusively via the Natural Language Interface in the future?
It is true that the NLI offers some compelling advantages over the GUI. The abstraction layer of graphical elements does not exist in NLIs. Instead, LLMs rely on written language, possibly in combination with spoken language - a means of communication with which the vast majority of users are likely to be familiar. This reduces accessibility barriers and enables fast and uncomplicated work.
However, especially for more complex activities that require a high degree of precision or a quick visual grasp of information, ChatGPT and co. still show limitations - here, the classic GUI is often still ahead. For example, most users would probably prefer a simple click on the browser's address bar over a corresponding command ("open the following website in the Safari browser: ..."). Other GUI elements, such as sliders, let users quickly try out and adjust parameters without committing to exact values. Especially when many of these interactions are chained together in multi-step workflows, a GUI can be beneficial: it breaks complex tasks down into simple interaction steps. By following conventions, designers can also make similar interactions look and behave the same across programs, which further increases the comprehensibility of interfaces.
NLI vs. GUI - what can a compromise look like?
So we see that the Natural Language Interface can do certain things better than the Graphical User Interface, and vice versa. Instead of pitting one against the other, wouldn't it make sense to use each interface where it excels?
In the future, users and designers alike will have to figure out what each interface is best suited for. In any case, we don't think pitting the two against each other is the right way forward. Industry leaders like Adobe are already showing what the future of human-computer interaction could look like: in its latest version of Photoshop, Adobe introduced Generative Fill, which makes it easy to "erase" and replace certain elements of an image. First, you select the desired area with the mouse, then you tell the AI in writing how the area should be changed. Further refinements can then be made - either with mouse clicks or with further commands.
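The division of labor in such a hybrid step can be sketched as follows. This is a loose illustration of the idea, not Adobe's API: the `Selection` type and the `generative_fill` function are hypothetical names, and a real implementation would call an inpainting model instead of returning a description.

```python
# Hypothetical sketch of a hybrid NLI+GUI edit step, loosely modeled on
# the Generative Fill workflow described above. All names are illustrative
# assumptions, not a real image-editing API.
from dataclasses import dataclass

@dataclass
class Selection:
    """GUI input: a region chosen precisely with the mouse."""
    x: int
    y: int
    width: int
    height: int

def generative_fill(selection: Selection, prompt: str) -> str:
    """Combine a pointer-based selection with a natural-language instruction.

    A real implementation would invoke a generative model; here we merely
    describe the requested edit to show how the two inputs compose.
    """
    return (f"fill {selection.width}x{selection.height} region at "
            f"({selection.x}, {selection.y}) with: {prompt}")

# The GUI supplies the precise "where", the NLI the flexible "what".
print(generative_fill(Selection(40, 60, 200, 120), "a clear blue sky"))
```

The pointer handles what it is best at (pixel-precise spatial input), while language handles what it is best at (open-ended intent) - which is exactly the compromise the Generative Fill workflow embodies.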
So why not simply use both? The GUI has its advantages, and so does the NLI. Research, science fiction and practice are already sketching out what the future of computer interaction may look like. It will be exciting to see what comes next. Given the pace of current technical development, we are sure that the story of human-computer interaction is not over yet.