There has been much ado about OpenAI’s new GPT-3 model, and rightfully so: it is impressive, to say the least. GPT-3 is an autoregressive language model that uses deep learning to produce human-like text. In a nutshell, GPT-3 is a promising exercise in the predictability of generalized language and may even represent a generational leap in natural language processing (NLP). And its biggest accomplishment is, well, it’s really big.

GPT-3 models language with 175 billion parameters, creating a giant network of word, sentence, and phrase proximities that lets it predict the trajectory of a narrative or conversation. This spatial trajectory is derived from statistical probabilities learned over an enormous training dataset. And while we should be enthusiastic about GPT-3’s promise, we first need to understand the difference between spatial and natural language trajectories.
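To make the autoregressive idea concrete, here is a minimal sketch of next-token generation. It uses the smaller, publicly available GPT-2 from Hugging Face’s transformers library as a stand-in, since GPT-3 itself is only reachable through OpenAI’s API; the prompt is invented for illustration.

```python
# Minimal sketch of autoregressive generation, using GPT-2 as a
# publicly available stand-in for GPT-3 (which is API-only).
# Requires: pip install transformers torch
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The patient presented with shortness of breath and"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Each step predicts a distribution over the next token and appends
# the most probable one -- the "trajectory" grows one token at a time.
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits
    next_token = logits[0, -1].argmax()
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```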

Spatial Trajectory
A spatial trajectory is a statistical prediction of the path words will take. Given enough text and permutations of that text, the most probable trajectory can be computed mathematically. It’s akin to recording visitors to New York City and tracking their travel paths from the airport to the hotel, to a city block, to the museum. Record enough visitors and you could create geo-spatial trajectories that predict visitor routes with decent accuracy. A spatial trajectory is great at generalized knowledge, but it is compromised by its biggest asset: its size. There is a big difference between understanding where average tourists go in New York and where discrete cohorts such as doctors, mechanics, or librarians go. The greater the specificity, the less accurate the prediction becomes, because it is biased toward the larger corpus. To overcome this bias you could train a model specific to each domain, but that undercuts GPT-3’s key selling point: a single general model that serves many tasks with little or no task-specific training data.
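Here is a toy sketch of this kind of trajectory prediction: a bigram (Markov) model that always follows the statistically most probable next word. The miniature corpus and cohorts are invented purely for illustration, but they show how a frequent cohort (“tourist”) dominates the transition statistics and pulls predictions for a rare cohort (“doctor”) toward the majority’s paths.

```python
# Toy illustration of "spatial trajectory" prediction: a bigram model
# that always follows the statistically most probable next word.
# The corpus and cohorts are invented purely for illustration.
from collections import Counter, defaultdict

corpus = (
    "the tourist left the airport and took a taxi to the hotel . "
    "the tourist left the hotel and walked to the museum . "
    "the doctor left the hospital and walked to the clinic ."
).split()

# Count word-to-word transitions across the corpus.
transitions = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current][nxt] += 1

def most_probable_path(word, steps=5):
    """Greedily follow the highest-probability transition at each step."""
    path = [word]
    for _ in range(steps):
        if word not in transitions:
            break
        word = transitions[word].most_common(1)[0][0]
        path.append(word)
    return path

# The rare "doctor" cohort gets pulled toward the majority "tourist"
# patterns -- the bias toward the larger corpus described above.
print(most_probable_path("doctor"))
```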

Natural Language Trajectory
A natural language trajectory interprets the meaning of language: what is this conversation about, and where is it going? It has past, present, and future context. By contrast, GPT-3 loses context as a conversation progresses because it can only condition on a short window of preceding text. It does not understand concepts or how concepts change over time. Imagine we wanted to diagnose frailty from a person’s medical records. GPT-3 has no concept of frailty, coronary heart disease, osteoporosis, oxygenation, or accidents. How do these concepts change over time, and what are the semantic relationships between them?
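Here is a toy sketch of that fixed window, with invented numbers and records, and a crude word count standing in for real tokenization (GPT-3’s actual window is 2,048 tokens). Once a record outgrows the window, the earliest observations are silently dropped, and with them any longitudinal signal.

```python
# Toy sketch of a fixed context window. The window size, the record,
# and the word-count "tokenizer" are all invented for illustration;
# GPT-3's real window is 2048 tokens.
MAX_CONTEXT_TOKENS = 15

medical_record = [
    "Patient reports a fall at home last winter.",
    "Bone density scan suggests osteoporosis.",
    "Patient also reports fatigue on exertion.",
    "Follow-up visit: shortness of breath climbing stairs.",
]

def build_prompt(entries, max_tokens):
    """Keep only the most recent entries that fit in the window."""
    kept, used = [], 0
    for entry in reversed(entries):
        n = len(entry.split())  # crude stand-in for real tokenization
        if used + n > max_tokens:
            break
        kept.append(entry)
        used += n
    return list(reversed(kept))

# The earliest facts (the fall, the osteoporosis finding) no longer fit,
# so any frailty signal that spans the whole record is lost.
print(build_prompt(medical_record, MAX_CONTEXT_TOKENS))
```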

What is the significance of GPT-3?
It is a monumental effort and will enhance applications for customer service, FAQ answering, chatbots, sentence completion, writing assistance, and other automated text-progression solutions. For NLP solutions that require semantic understanding of language, GPT-3 (and other large-model-based offerings) may assist in this effort. But keep in mind that GPT-3’s broad generalization may also dilute semantic analytics. If you are investigating GPT-3 as a solution, understand your problem and your goal first, then evaluate whether and how GPT-3 meets your specific requirements.

An entrepreneur and technologist, Dr. Russell Couturier has successfully launched and sold several companies and holds 11 patents for security, artificial intelligence and machine learning inventions. He is a Distinguished Engineer at IBM and technical advisor to high tech startups. He spends his spare time teaching sustainable logging practices.