Generative AI as APIs
Let the computers talk amongst themselves
Pixar’s “Up” gave us Dug, who answered the age-old question of what dogs would say to us if they could talk.
- Can I have the ball?
- Squirrel!
- Point!
It turns out dogs have nothing to say that we didn’t already know. In 2023, Large Language Models (LLMs), such as ChatGPT, gave computers the ability to mimic human conversation. Pundits tell us that the future of computing is a Star Trek world of humans talking with machines as if they were humans. Frankly, I’ll pass.
Sure, LLMs can summarize a wiki article and save you some reading, but the Internet is made for advertising. It is about grabbing your attention and providing infinite distractions. Chatty computers will be omnipresent carnival barkers shouting from every screen. From smartphones to ATMs, devices will pepper their conversation with product referrals and influencer-style recommendations. No thanks.
I don’t need a computer pretending to be human, spamming more of my cognitive space. However, using LLMs to enable computers to talk to other computers will be a game changer.
Software is built on top of more software
Modern software is made by gluing pieces together. Unless you are doing something very low-level, no software is written from scratch. A simple mobile app can involve hundreds of modules from different suppliers. Enterprise systems are built from tens of thousands of modules connected by millions of lines of code.
To give you an idea of complexity, imagine building a simple Python web app. A common approach would be to use Django as your Python framework and Linux as your OS.
A modest web app might have 10K–20K lines of code and represent 3–6 person-months of effort. Without even counting your databases, logging, reporting, security, etc., your app is pulling in:
- Django: 400K lines of code, 130+ modules, 113 person-years of effort
- Python: 1.4M lines of code, 137K+ modules, and 422 person-years of effort
- Linux (just the kernel): 35.4M lines of code, 1K–2.5K packages, and 12,000 person-years of effort!
Your 10K to 20K lines of code are sitting on more than twelve millennia of work spanning nearly 40M lines of code, most of which you will never see or know about!
APIs, or how things connect and break
Programmers glue all the modules together using Application Programming Interfaces (APIs). These are strict contracts that make sense to computers and some people. They define inputs, outputs, functions, etc. If you imagine code as Legos, APIs are the nubs and sockets that allow all the bricks to be connected. And just like Lego, they have to follow a strict standard. If you buy some generic knockoff Legos, they don’t always fit.
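To make that concrete, here is a minimal Python sketch of an API contract; `Reservation` and `book_table` are invented for illustration. The types and fields are the nubs and sockets: match them exactly, or the bricks don’t snap together.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reservation:
    restaurant_id: str  # must be an ID the provider recognizes
    party_size: int     # must be a positive integer
    start: datetime     # a real datetime, not "next Friday-ish"


def book_table(request: Reservation) -> str:
    """Return a confirmation code, or raise if a field breaks the contract."""
    if request.party_size < 1:
        raise ValueError("party_size must be at least 1")
    return f"CONF-{request.restaurant_id}-{request.start:%Y%m%d%H%M}"


print(book_table(Reservation("r-42", 4, datetime(2024, 5, 1, 19, 0))))
```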
For a human being, using an API is like filling out a web form, the unfriendly kind you find on old government sites, the kind that loads a page of blank fields. The fields are not documented. Their labels are confusing. You have to guess what to fill in. Is that a date field? A text field? Does it take dashes? Slashes? Numbers?
When you make a mistake, the form returns a vague message telling you there was an error. If it is feeling helpful, it might even highlight the bad field with an asterisk. You correct that field and try again. The form finds the next error and stops checking. Once you fix that error, you move on to the next red asterisk. After multiple tries, you either succeed or give up.
APIs, good for software on a CD-ROM
APIs work like those forms. They are up-front agreements about how components can communicate. They have strict rules about what requests and responses the components handle and what data goes into what fields.
APIs are one of the foundational concepts of modular software, invented back when software was built to be packaged and shipped. They work great for software that is built once and runs many times. They guarantee stable and efficient communication between all the components. As an end user, you are probably unaware of all the handoffs and exchanges happening under the hood.
But most software isn’t shipped anymore. It is assembled by combining many services distributed across multiple systems. Even on a single smartphone, your app, OS, and backend services are all built separately. And more importantly, updated and run separately!
No change, no pain
Imagine you have taken the time to learn how to use a complex web form. You understand its idiosyncrasies and know how to make it work. Today, you log in and it has been updated. It looks mostly the same, but now it breaks in new and imaginative ways. You have to relearn it. You have to update your understanding of the contracts. Next week, they update it again!
This is the world of distributed computing. Software now relies on a multitude of remote systems. Each system can upgrade at any time and each upgrade can change a contract. Sometimes a contract is broken by mistake, a bug. Sometimes an old contract is discontinued and you have to migrate to a new version.
In a world where software is not packaged on a CD-ROM, we have the brave new headache of managing a network of independently evolving APIs. The majority of development isn’t writing new code anymore; it is the constant integrating and re-integrating of a rat’s nest of distributed modules. Each one is run by some other organization, gleefully updating and breaking programmers’ sanity.
It is no longer a world of build once and ship. Software is assembled each time you run it, as it connects across the network to whatever versions of the other modules happen to be running at the time of that specific request.
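Here is a toy Python example of how that plays out; the field names are hypothetical. The client did not change a line of code, yet it is broken:

```python
def client_built_against_v1(response: dict) -> str:
    # Written when the contract said responses carry a "date" field.
    return response["date"]


# The provider ships v2 and renames the field; nobody tells the client.
v2_response = {"start_date": "2024-05-01"}

try:
    client_built_against_v1(v2_response)
except KeyError as err:
    print(f"Integration broken by an upstream update: missing field {err}")
```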
Use conversations for computers talking to computers
For over 30 years, programmers have been trying to solve how to manage software that runs in a distributed environment. Each framework has promoted a solution for managing and distributing the API contracts. They have all failed to solve the real problem. Distributing API contracts isn’t the problem; keeping up with all the changes is.
CORBA had Naming and Trader services. SOAP had UDDI and Yellow Pages. REST has OpenAPI and its registries. Each of these requires the providers to keep their system up to date, and that clients stay current with the updates. How far behind are you on updating your OS or smartphone? It is never at a convenient time for you as the consumer. You skip a couple of small updates until you need that one feature. Because of all the interdependencies, you end up having to update dozens of items and figure out how to get them all working like they did before, if that is even possible!
Each time a service updates its contract, a developer has to update their software to match. If you want to switch from one provider’s module to another, it requires a migration project.
LLMs for Discovery and Integration
Manual integration fails terribly because it relies on humans to act like computers. It requires developers to track thousands of modules and do the tedious work of staying on top of everything and keeping it all up to date. Even if humans wanted to, and were good at it, most organizations don’t prioritize updates and tech debt over new development. If only we had invented some sort of machine that could handle automating all this complexity. If only there were some common device that is good at tracking millions of details and carrying out repetitive, tedious, data-centric tasks. If only we had computers for this!
Wait! The computer. If only we had figured out how to make computers understand the ambiguous yet formulaic language we use to describe APIs. What did you say? That is exactly the type of work LLMs excel at?
Future modules should focus on describing themselves to LLM-based agents, which can then act as brokers that automate connecting software. Underneath, modules would still connect via APIs; but instead of requiring humans to do the tedious wiring, LLMs would handle it. We need a Conversational Protocol for Agents to assemble modules into software.
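What might such a self-description look like? Here is one possible shape, sketched in Python; the structure and field names are my assumptions, not an existing standard. Prose for the LLM to read, plus a schema for the eventual API call:

```python
# A hypothetical reservation app describing itself to an assembly agent.
MODULE_DESCRIPTION = {
    "name": "tablefinder",
    "summary": (
        "Books restaurant reservations. Needs a restaurant, a party size, "
        "and a time. Can also search restaurants by area and cuisine."
    ),
    "capabilities": ["search_restaurants", "create_reservation"],
    "request_schema": {
        "create_reservation": {
            "restaurant_id": "string",
            "party_size": "integer >= 1",
            "time": "ISO 8601 datetime",
            "contact": "email or phone number",
        }
    },
}

print(MODULE_DESCRIPTION["summary"])
```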
Conversational Protocol for Assembly Agents
Many modular systems have a concept of a module registry, where all the possible components, their APIs, and their descriptions are available for humans to inspect, select, and assemble. An automated, LLM-based Conversational Protocol would work in several phases:
Discovery
The assembly agent would have a task it needs to complete.
It asks the registry for modules that it could delegate the request to.
For example, if you asked your phone to make reservations, your agent would ask all your installed apps if they handle requests for reservations.
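Sketched in Python, with a made-up `Registry` standing in for the real thing. An LLM would match the task against each module’s self-description; a crude keyword match fakes it here:

```python
class Registry:
    def __init__(self, modules: list[dict]):
        self.modules = modules

    def find(self, task: str) -> list[dict]:
        # Stand-in for an LLM reading each module's self-description.
        return [m for m in self.modules if task in m["summary"].lower()]


registry = Registry([
    {"name": "tablefinder", "summary": "Books restaurant reservations."},
    {"name": "roomly", "summary": "Books hotel rooms and reservations."},
])

candidates = registry.find("reservations")
print([m["name"] for m in candidates])  # both apps claim to handle the task
```

Note that both apps claim to handle reservations, which is exactly why the next phase exists.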
Interrogation
Based on the job and selection criteria your agent has, it will need to ask the candidate modules for additional details.
Can the module make reservations for restaurants? Maybe it is a module that books hotels.
Does it require an account or payment on file?
Does it have an API that the agent can understand?
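A toy version of that filtering, with hard-coded answers standing in for the modules’ real replies:

```python
# Hypothetical answers each candidate gives to the agent's questions.
CANDIDATE_ANSWERS = {
    "tablefinder": {"reserves": "restaurants", "needs_account": False},
    "roomly": {"reserves": "hotels", "needs_account": True},
}


def interrogate(task_domain: str) -> list[str]:
    suitable = []
    for name, answers in CANDIDATE_ANSWERS.items():
        if answers["reserves"] != task_domain:
            continue  # e.g. roomly books hotels, not restaurants
        if answers["needs_account"]:
            continue  # skip modules that demand an account on file
        suitable.append(name)
    return suitable


print(interrogate("restaurants"))  # ['tablefinder']
```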
Negotiation
Once the agent has selected a module or app to work with, it will need to set up the request.
If you imagine the agent working like your personal assistant, it might come back to you as the user with clarifying questions.
Your agent assumes you are looking for the next meal, and you have already specified an area and party size. Your agent also knows your dietary constraints. As part of its interrogation of possible apps to use, it asks them to return a list of available restaurants in your area that meet your criteria.
The agent presents you with a short list to select from. Once selected, it continues to work with the app to negotiate the parameters for the request.
Can the module create a reservation at your chosen restaurant, at the selected time, and pass along your contact information?
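In code, negotiation might look something like this sketch; every field name is hypothetical, and the “user” is simulated:

```python
# The agent fills in what it already knows and asks the user only for the rest.
known = {
    "area": "downtown",       # already specified by the user
    "party_size": 4,          # already specified by the user
    "dietary": "vegetarian",  # known from the user's standing preferences
}

required = ["area", "party_size", "dietary", "restaurant_id", "time"]

# Simulated answers to the agent's clarifying questions (picked from the
# short list of restaurants and the available times).
user_choices = {"restaurant_id": "tf-1138", "time": "2024-05-01T19:00"}

for field in required:
    if field not in known:
        known[field] = user_choices[field]

print("Negotiated request:", known)
```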
Request Confirmation
Once the contract has been negotiated, the agent will confirm the request.
In this case, it would probably confirm it with you as the user. Does this look good? Should I move forward?
Execution
The agent and module have figured out how to connect to the module’s API and have used that to assemble a request.
Once confirmed, the agent will execute the request and delegate the task to the module.
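A minimal sketch of that handoff, with `create_reservation` standing in for the module’s real API endpoint:

```python
def create_reservation(request: dict) -> dict:
    # Stand-in for the module's real API; it enforces the contract's rules.
    assert request["party_size"] >= 1
    return {"status": "confirmed", "code": f"CONF-{request['restaurant_id']}"}


request = {
    "restaurant_id": "tf-1138",
    "party_size": 4,
    "time": "2024-05-01T19:00",
    "contact": "you@example.com",
}

print(create_reservation(request))
# -> {'status': 'confirmed', 'code': 'CONF-tf-1138'}
```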
Response Confirmation
APIs have responses, and the response format was part of the negotiation phase.
The module has replied with a confirmation that the agent can now share with you.
Follow-Up
If there is any cleanup, or there are next steps, the agent can now move forward.
Did you want to kick off another chain to find a calendar to set a reminder? Maybe access a messaging module to forward the invite to your friends?
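A last sketch: the confirmed response seeding the next round of discovery. The task strings are purely illustrative:

```python
def follow_up(response: dict) -> list[str]:
    # Turn a successful result into the next tasks to discover modules for.
    next_tasks = []
    if response.get("status") == "confirmed":
        next_tasks.append("add a reminder via a calendar module")
        next_tasks.append("forward the invite via a messaging module")
    return next_tasks


response = {"status": "confirmed", "code": "CONF-tf-1138"}
for task in follow_up(response):
    print("New discovery round:", task)  # each one restarts the cycle
```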
Automating Personal Automation
These steps are essentially what any human developer does when they integrate a module’s API. You wouldn’t need an assembly agent to go through the whole integration process every time; successful integrations could be cached and reused (see the sketch below). However, there is a whole range of occasional tasks that come up less frequently than the rate at which the components are updated. Personal automation, such as chaining together apps, has always been expensive and brittle. Some techno-fetishists spend hours tinkering with their IFTTT triggers or playing with routines on their Alexas or Google Assistants, but ask any one of them how often those get out of date and break.
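For illustration, a cached integration could be as simple as this sketch, keyed by task, module, and contract version; the key and structure are my assumptions:

```python
INTEGRATION_CACHE: dict[tuple[str, str, str], dict] = {}


def get_integration(task: str, module: str, contract_version: str) -> dict:
    key = (task, module, contract_version)
    if key not in INTEGRATION_CACHE:
        # Cache miss: rerun discovery, interrogation, and negotiation.
        INTEGRATION_CACHE[key] = {
            "endpoint": "create_reservation",
            "fields": ["restaurant_id", "party_size", "time", "contact"],
        }
    return INTEGRATION_CACHE[key]


print(get_integration("book a table", "tablefinder", "v2"))
```

If the module bumps its contract version, the old cache entry is simply never hit again and the agent renegotiates on its own.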
I don’t need LLMs hijacking my UI to pretend that they are humans. I do need reliable integrations that can evolve with their systems. Developers are not computers and don’t stay on top of all the out-of-date libraries and broken integrations. Instead of using LLMs to get computers to talk to me, use them so that computers can take over the grunt work of talking to each other. Use AI to automate assembly and integration. Let programmers be programmers and build new modules. Conversational APIs would enable computers to track and repair outdated modules and broken integrations, and would let users finally benefit from reliable automated integrations.