Executive Summary
Throughout history we have witnessed huge breakthroughs in innovation. The pace at which these innovations spread has typically been throttled by bottlenecks; for instance, the lack of electricity in US homes impeded the adoption of the light bulb. However, the pace of adoption for each new innovation has clearly been accelerating. In this paper we briefly provide historical context and a history of AI. We then go into the quasi-technical aspects of generative AI with respect to LLMs (Large Language Models). We finish by highlighting the areas that are benefiting from this trend today.
It is our belief that the huge investments we are seeing companies make in AI infrastructure are just the beginning. In hindsight, the release of ChatGPT (Chat Generative Pre-Trained Transformer) will likely be seen as the inflection point for AI. The requirements for AI to thrive include data, computing power, efficient infrastructure, and model architecture, which when combined can support massive scale. These all exist today, which leads us to believe in the continued advancement of AI. The pros and cons of what AI will do for our economies, communities, and lives are currently hard to grasp.
The long-term potential for AI that has resonated most with me is the ability to access leading perspectives in any discipline. Imagine getting a guitar lesson from the best teacher in the world or getting legal advice from a top attorney. AI models could bring humanity’s leading-edge perspectives to more people and springboard us into the next wave of innovations. The heights we could reach standing on the shoulders of AI are difficult to conceptualize. As a result, this paper focuses more on the realities of AI today and avoids very worthwhile philosophical discussions of AI in the future.
Taylor Haselgard, Portfolio Manager
Taylor Haselgard is a Portfolio Manager for Cabot Wealth Management. Prior to joining the team in 2021, he spent 3 years as a Portfolio Manager for Cribstone Capital Management where he helped manage their investment strategies across asset classes. In addition, Taylor has worked in several different roles within the financial services industry including multiple positions at a multi-billion-dollar single-family office. Taylor received his MBA from F.W. Olin Graduate School of Business at Babson College and holds a B.B.A. in Finance and a B.A. in Economics from the University of Massachusetts Lowell. Taylor has completed the first 2 levels of the CFA program.
Historical Innovations
Advancements Over Time
Throughout history, there have been foundational innovations that have sparked advancements in knowledge and economic opportunities and reshaped the direction of the world forever. While these “foundational technologies” did not at first result in huge economic opportunities, they usually led to far bigger end markets and impact than what most people could have grasped at the time.
In 1436, a German goldsmith by the name of Johannes Gutenberg created the first European printing press, which over time helped mass produce books and newspapers very inexpensively. This innovation increased literacy rates and had a profound impact on the sharing of knowledge and ideas. By enabling the mass production of books, it helped expand access to printed material, which over time contributed to a more educated and productive workforce. The printing press also created whole new industries and jobs. It helped advance science by allowing for the mass printing of things like textbooks and created an easier means of sharing potentially groundbreaking scientific research. The printing press helped lift Europe out of the Dark Ages and was an influential force in bringing about the Renaissance (“Printing Press” 2023).
Source: https://www.history.com/news/printing-press-renaissance
Gutenberg never got to see the profound impact his invention would have on society. In fact, it is slightly ironic that even though his invention was so influential, he died penniless, and his printing press was impounded by his creditors. Other advances needed to happen before the full benefits of the printing press could be realized. For instance, there was no built-up distribution network for mass-produced books at the time. In the 14th century, one book could cost as much as a house, and books were a possession of only the wealthy elite. Literacy rates were also low, and most news was delivered by paid readers in a local forum (Roos, 2023).
It took two centuries before the printing press was taken to new heights. The first newspaper, the Relation of Strasbourg, was started sometime in the early 1600s in Germany. Newspapers began to appear in other European nations over the following few decades. The point is that oftentimes there are significant bottlenecks that need to be addressed before the full potential of an innovation can be realized (“The First Newspapers” 2024).
Today, we take electricity in the home for granted. In 1879, Thomas Edison patented the light bulb, but no homes during that era had electricity. It took 45 years for half of the US to have electricity, and then, due in part to the Rural Electrification Act of 1936, 85% of American homes had electricity by 1945 (“The History of Electricity Timeline” 2019).

In 1971, Intel came out with its first microprocessor, the Intel 4004. The Intel 4004 was only a 4-bit CPU and was used for things like calculators. The chip was ~2 inches, had 2,300 transistors, and at the time could process 92,000 instructions per second (“Intel 4004” 2024). Chips and software advanced over time, enabling new use cases which could not have been foreseen. Today, Taiwan Semiconductor Manufacturing Company (TSMC) is working towards building the first trillion-transistor chip by 2030 (Shilov, 2023)!
The internet was invented in 1983 but was not opened to the public until 1993-94. At the time, connection speeds were so slow that things like streaming videos, mobile apps, e-commerce, and cloud computing would have seemed like distant fantasies. In 2001, only half of the US used the internet. Today, 92% of people in the US have access to the internet, often via a smartphone (Petrosyan, 2024).
Internet Penetration in the United States (Source: Statista)
The reason for offering so many different examples is to highlight how long a horizon innovation has historically had. For historical innovations dating far back, these changes in technology sometimes took over a century to proliferate through society. Even the adoption of the lightbulb took 65 years with heavy government sponsorship to provide the necessary infrastructure. However, the pace of innovation is faster than ever, with the more recent innovations seeing meaningful traction in a decade.
ARTIFICIAL INTELLIGENCE
What is Artificial Intelligence?
The earliest thoughts around artificial intelligence were conceived during the 1950s-1970s. One of the earliest examples is Alan Turing’s “Computing Machinery and Intelligence,” published in 1950, which begins with the question “Can machines think?” In that paper, Turing set the stage for what is known as the “Turing Test,” the earliest, more philosophical metric for evaluating how well AI can think like a human (Turing, 1950). Keep in mind this was just 3 years after the transistor was invented at Bell Labs, which conceptually is the building block of anything digital. During those decades, concepts such as computer vision, the theoretical basis for things like facial recognition, were conceived. While all these efforts went into how to conceptually mimic various aspects of our intelligence, they remained largely theoretical. Hans Moravec, a doctoral student of prominent American computer scientist John McCarthy, said that “computers were still millions of times too weak to exhibit intelligence;” that reality slowed funding into AI research for some time (Anyoha, 2020).
The resurgence of AI began in the late 80s to early 90s, which coincided with exponential increases in computational power. If you remember, IBM was working on the development of Deep Blue, a purpose-built supercomputer made to play the game of chess. In 1997, this machine managed to beat the world’s reigning chess champion Garry Kasparov in a 6-game match after failing to do so in 1996. Kasparov, a Russian chess grandmaster, was the highest rated chess player ever until Magnus Carlsen surpassed him in 2013 (“Deep Blue (Chess Computer)” 2024).
By the 21st century, AI researchers had all they needed to propel advancements in the space due to several factors: Moore’s Law, which saw the number of transistors on a chip double roughly every two years for over 50 years; huge repositories of data from the internet to use to train AI models; and advancements in AI training techniques which utilized GPUs (Graphics Processing Units) to drastically speed up training computations.
In 2017, AI researchers at Google Brain published what is the most influential research to date in the field of AI titled “Attention Is All You Need.” In this paper, they introduced a new model architecture which drastically changed the quality of output and training time requirements. This new advancement called the “transformer architecture” is the breakthrough that has enabled the huge wave of LLMs we see today.
The step function it provided, which sees consistent gains with increasing scale of training data, set in motion enormous amounts of investments that only big tech companies could afford. This is unique considering that advancements throughout history have typically come from the academic community (Vaswani, 2017).
Fast forward to January of 2023, when OpenAI announced an expanded partnership with Microsoft. The software giant made an additional investment in the company, reportedly increasing its ownership to 49%. This coincided with ChatGPT, released in late November 2022, going viral and starting the AI craze as we know it today. This new conversational model was so adept at providing human-like responses due to its unprecedented size at the time of release. This massive leap forward in model performance made AI technology economically viable for the first time.
The term Artificial Intelligence is a little confusing and has been given varying definitions. Prior to the recent AI frenzy, most AI-related applications would be classified as Machine Learning. In these instances, data is used to make mathematical predictions and insights. This type of AI follows a more structured, rules-based approach which would fit under the more general definition of “a field that combines computer science and robust datasets to solve problems.”
In the context of this paper, we will be focusing on generative AI. Generative AI utilizes vast amounts of data to better contextualize things like text, images, video, audio, etc. By doing so, generative AI can perform tasks associated with human intelligence. This includes things like imitating written language, or human speech.
How Do These AI/LLMs Work?
Large language models are a form of generative AI which can create general purpose written text responses. The inner workings of these models are complex and require vast amounts of data and computational power to train them. These models have been referred to satirically as “Stochastic Parrots” because the actual model does not understand the responses it is generating, and they are potentially random in nature (Bender, 2021). The basis of what these models attempt to do is to predict what the next word should be. Humans are particularly good at this; if I gave you the beginning of a sentence you could often predict what the next word should be.
For instance, if I asked one hundred people to finish the sentence “Are you going to eat __” I’d wager most respondents would answer “that,” with some number of respondents choosing words like “dinner,” “dessert,” etc. The more context given to the group, the narrower the list of likely responses, with the opposite also being true. For example, if I asked the same thing but the sentence was something like “Why is ___?” or “Where is __?” there is no context, making the range of potential responses wider and harder to predict. Perplexity is a measure of how difficult it is to predict the next word; put differently, it is a function of all probable words that could be used next. To simplify: when measured across a wide range of text, a lower average perplexity score is better.
Humans on average have an estimated perplexity score of ~12 across a wide range of text. Prior to the recent advancements in generative AI, the observed perplexity of models trying to generate text was around 110. Those models only looked at the preceding few words to hypothesize what the next word should be. These new LLMs look at thousands of words, and in the case of ChatGPT 4 Turbo, the context window is the size of an entire novel. When models can look at the broader context to calibrate what the next word should be, they become increasingly better. Training on greater levels of contextual information brought perplexity down to ~20, and it has improved substantially with the newest models seen in the market today. As models improve, this will converge closer and closer to human levels of perplexity (Johnson, 2023). The jury is out on whether AI can surpass human levels altogether.
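To make those perplexity scores a little more concrete, here is a minimal Python sketch; the function and the toy probabilities are purely illustrative and are not taken from any of the models discussed. Perplexity is simply the exponential of the average negative log-probability the model assigned to each word that actually appeared.

    import math

    def perplexity(token_probs):
        # token_probs: the probability the model assigned to each word that
        # actually appeared next, e.g. [0.4, 0.05, ...]. Lower is better.
        avg_neg_logprob = -sum(math.log(p) for p in token_probs) / len(token_probs)
        return math.exp(avg_neg_logprob)

    # A model that spreads its guess evenly over 110 candidate words scores 110;
    # one that effectively narrows the choice to ~12 candidates scores ~12.
    print(perplexity([1 / 110] * 50))   # ~110
    print(perplexity([1 / 12] * 50))    # ~12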
WHAT DO YOU NEED TO TRAIN MODELS?
Computing Power
When OpenAI's GPT-2 came out in February of 2019 it had 1.5B parameters. GPT-3, which was completed in June 2020, had 175B parameters. The latest model released by OpenAI is ChatGPT4 Turbo, whose technical details have not been disclosed. Rumor has it that the parameter count could be as high as 1.8T! The size of these models has been expanding at a roughly 10x rate every year, and with it the computing power required to train them.
Source: META Whitepaper: https://arxiv.org/abs/2307.09288 (PDF Download - Page 7)
The ability to ramp up these models has required significant investments in computing power. Based on the numbers for Meta’s Llama 2 training, total GPU hours scale roughly linearly with parameter count (Touvron, 2023). If you look at OpenAI’s GPT4 with an estimated 25,000 GPUs and 95 estimated training days, that equates to 2.375M GPU-days of training (Ludvigsen, 2023). Based on ChatGPT4’s estimated 1750B parameters (1.75T), that works out to ~32,386 GPU hours per billion parameters. In the table on the right, we extrapolate a little on 10x parameter increases and the GPUs required to train these models in a similar amount of time (~100 days). In this scenario, we also assume computational power improves at the same pace as Moore’s Law.
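As a quick sanity check on that arithmetic, here is the back-of-envelope calculation in Python; all of the inputs are the rough public estimates cited above, not confirmed figures.

    gpus = 25_000                # estimated GPUs used to train GPT-4
    training_days = 95           # estimated training duration
    params_billions = 1_750      # rumored parameter count (1.75T)

    gpu_days = gpus * training_days
    gpu_hours = gpu_days * 24
    hours_per_billion = gpu_hours / params_billions

    print(f"{gpu_days / 1e6:.3f}M GPU-days")   # ~2.375M GPU-days
    print(f"~{hours_per_billion:,.0f} GPU-hours per billion parameters")
    # ~32,571, close to the ~32,386 figure cited above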
While this illustration is oversimplified and lacks precision, it helps show how hard it is to know how high long-term demand for GPUs from a company like Nvidia could be over the next 5-7 years. Anecdotally, Meta laid out company AI plans that involve the equivalent of 600K Nvidia GPUs by the end of this year (Eadline, 2024). OpenAI is rumored to be working on a project that will require 10M AI GPUs (Uffindell, 2023)! Comparing this to the amount of AI chip capacity, Nvidia shipped only 500K units in Q3 last year and, based on these numbers, will continue to hit capacity bottlenecks in the near term. Given the expectations for parameters in these models to keep increasing at an alarming pace, this could mean a several-fold increase in demand for AI-related chips. Not to mention, there is increasing sovereign-level demand for AI, and all the other competing base models. All signs point to enormous demand for GPUs for model training.
When comparing these models to the human brain, it is estimated that we have 86-100B neurons, each with thousands of synapses. By one of the more aggressive estimates I have seen, the adult human brain has 600-700T synapses, and a child’s brain is estimated to have 1 quadrillion (1000T) (Wanner, 2018). This is the most relatable comparison between human brain capability and these new AI models. At the rate of change we have observed in LLM parameters, with OpenAI’s models increasing ~10x per year, the number of parameters would surpass the number of synapses in the brain in only 3 years. While parameters and synapses in the brain are not directly comparable, it provides a useful anecdote (Millidge, 2022). It is also important to keep in mind that the human brain utilizes its 100B neurons for a variety of tasks, including things like physical intelligence, which current AI models don’t address. The image of the functional areas of the brain helps put this in better context; more than half our brain’s neurons are in the cerebellum, which is associated with motor functions.
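For what it is worth, the 3-year figure is easy to reproduce with a rough projection (illustrative only; as noted, parameters and synapses are not directly comparable):

    params = 1.8e12          # rumored GPT-4 parameter count
    synapses = 700e12        # upper end of the adult-brain synapse estimate cited
    growth_per_year = 10     # observed ~10x yearly growth in model size

    years = 0
    while params < synapses:
        params *= growth_per_year
        years += 1
    print(years)             # 3 -- parameters pass the synapse count in ~3 years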
Lots and Lots of Data
Training Data
Even if researchers in the 1950s had had the computing power, they would have had no data to train these models. The internet has been a vital source of data to train generative AI to produce human-like text responses. Practitioners in AI call these large repositories of text scraped from the internet, books, and other sources a “corpus.” For these models to get better results, these data repositories will need to continue to expand and be continually scrubbed and scrutinized for data quality. As these models get larger, the amount of data required to train them scales in a similar fashion. Llama 2’s 70B parameter model was trained on the equivalent of 1.5T words of text. To put this into perspective, 1M novels or 1B news articles is equivalent to 1T words.
Source: META Whitepaper: https://arxiv.org/abs/2302.13971 (PDF Download - Page 2)

Clean AI training sets are so impactful to the results that a tremendous amount of time has been spent curating the vast amounts of content published on the internet. Anything from duplicate pages, low-quality text sources, misinformation, and even harmful content needs to be removed from the corpus prior to training. In addition, other things need to be removed, like identifiable personal information or data that you don’t have the rights to use. This is a herculean task, but luckily the work is expedited quite a bit by multiple open-source corpus repositories available to developers, so you don’t need to scrape this seemingly endless amount of content off the internet and simultaneously scrub it. For further context, here is a table from a whitepaper on Meta’s LLaMA model showing where they sourced the data used to train its base model (Touvron, 2023).
Fine Tuning Data
The helpfulness of an LLM that has not been fine-tuned for specific tasks is more limited. Let’s take the example of a doctor who is utilizing an LLM to assist with interactions with his patients. Even after base model training, specialized medical data, and some sample outputs, his partially tuned LLM might produce responses that, while factually correct, are far too technical in nature or even quite objectionable in practice. For example, politely recommending a Mediterranean diet and other resources might be received better by patients than statistical anecdotes about obesity and heart disease. In this situation, the model could be fine-tuned further by taking responses from the model and having humans rate those responses. This is what is known as reward modeling. Incorporating this feedback into the model will shape its future responses toward more appropriate outcomes. The human responses you are incorporating could come from real-time customer feedback loops or from a panel of experts reviewing the content. Both avenues would take considerable resources and time.
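To make the idea of reward modeling a little more concrete, here is a minimal, hypothetical sketch in Python. It assumes each candidate response has already been reduced to two toy feature scores (say, patient-friendliness and technicality) and that human raters have told us which of two responses they preferred; we then fit reward weights so the preferred responses score higher. Production systems instead learn the reward model on top of the LLM itself, so everything below is purely illustrative.

    import numpy as np

    # (preferred_features, rejected_features) pairs from hypothetical raters
    pairs = [
        (np.array([0.9, 0.2]), np.array([0.3, 0.9])),   # friendly beats overly technical
        (np.array([0.8, 0.3]), np.array([0.4, 0.8])),
        (np.array([0.7, 0.1]), np.array([0.2, 0.7])),
    ]

    w = np.zeros(2)                          # reward weights to learn
    lr = 0.5
    for _ in range(200):                     # gradient ascent on the pairwise log-likelihood
        for preferred, rejected in pairs:
            margin = w @ preferred - w @ rejected
            p = 1 / (1 + np.exp(-margin))    # P(preferred is rated above rejected)
            w += lr * (1 - p) * (preferred - rejected)

    def reward(features):
        return float(w @ features)

    print(reward(np.array([0.9, 0.2])))      # higher: patient-friendly response
    print(reward(np.array([0.3, 0.9])))      # lower: overly technical response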
As a current example, Meta practices fine-tuning on Llama, its open-source model, and periodically publishes whitepapers on it in addition to its other AI-related efforts. One paper published in the past year, “Llama 2: Open Foundation and Fine-Tuned Chat Models,” goes into detail about how they fine-tune a Llama 2 base model for assistant-like chat applications. In the study, they used two methods of fine-tuning: multiple batches of Supervised Fine-Tuning data to show the model what the desired output looked like, and a reward model to help improve the helpfulness and safety of their chat model (Touvron, 2023).
Source: META Whitepaper: https://arxiv.org/abs/2307.09288 (PDF Download - Page 18)

As shown by the graphs above, each iteration of rewards training led to significant improvements in model performance. Relative to a base model without fine-tuning, early gains in usefulness come from showing the model what format to use, or how the response should be structured; this is done by training the model on supervised fine-tuning data. From the work done in this study, you can infer that implementing a well-thought-out rewards-based model will be the key ingredient differentiating companies adopting this technology.
If you take a step back and look at the ecosystem, that will mean companies that have an early head start on creating fine-tuning datasets, or feedback loops, for their AI models will likely have a greater moat versus peers. When someone says “data is the advantage” this is certainly true if you have a repository of expert information to train your models. However, creating feedback loops to drive greater model efficacy will improve usefulness and reliability, especially in use cases where the application is trying to mimic human-like responses.
One thing that has not worked is using AI model outputs to train these models further. In one such study, researchers created new iterations of models trained on synthetic data. With each generation removed from the original source data, the model performance got worse, to the point where it eventually just spit out incoherent nonsense. Could this change as models produce more reliable output? Today this seems unlikely (Wooldridge, 2023).
AI Infrastructure
The large amount of company proprietary data for fine-tuning often sits in different areas within the enterprise. It could be residing in applications and devices that could be either in the cloud, on premises, or sitting at the edge in devices like PCs and smartphones. Creating ecosystems that can efficiently store and process new data in real time will be a new requirement for AI in the enterprise. Many companies are already looking to solve the issue with new software that is triggered to ingest new data into their AI
models as it is created. In some leading solutions, software vendors are looking not to pull in and aggregate all this data centrally, but rather to process it incrementally from where it resides. They view this as a more efficient solution than trying to pull in and aggregate all data across the platform.

But how is this possible? At a certain point won’t these models get too large to be distributed for large-scale use? To help explain, let us look at the function of a line: y = mx + b. In this function, “m” is a number that represents the slope of the line. In statistics, if you had a data set where you wanted to explore the relationship between “x” and “y,” you would run a linear regression, which solves for “m,” the slope. The result of model training is solving for however many billion parameters’ worth of slopes, or in their terminology, “weights.” When you think about the actual file for the Llama 2 base model, it is 70B parameters, so that means 70B numbers that have been calculated from the training data. In terms of storage, these numbers range from 2 to 4 bytes each depending on the data type used to store them (16- or 32-bit floating point). As a result, the actual models themselves are significantly smaller than the training data.
In the case of the Llama 2 model, the model parameter download is ~138GB which is consistent with the 2 bytes for each of its 70B parameters (1GB is 1B Bytes). This helps inform us how much memory we need to run the model optimally for AI inference. In this case that would require loading up all 138GB of parameters to memory in addition to other memory requirements. To run this 70B parameter model on Nvidia GPUs it would take 2 A100 or H100s, given they have 80 GBs of high-speed memory (“Why do we need GPU clusters for inferencing/serving LLMs?” 2023).
Based on this understanding, the latest ChatGPT with 1.75T parameters should be between roughly 3.5-7 TB (3,500-7,000 GB). However, applications involving AI can often be served by specialized, condensed versions which may require a much smaller number of parameters. For instance, Microsoft created a specialized LLM for just Python coding tasks which only had 1.3B parameters (Javaheripi, 2023)! So, tying this all back in, the vast insights from millions of novels’ worth of text and copious amounts of fine-tuning data are actually just a massive collection of numbers.
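Here is a quick check of that storage math in Python, under rough assumptions only: 2 bytes per parameter for 16-bit weights, 4 bytes for 32-bit, and roughly 1 GB per billion bytes.

    def model_size_gb(params_billions, bytes_per_param=2):
        # 1B parameters at 1 byte each is roughly 1 GB
        return params_billions * bytes_per_param

    print(model_size_gb(70))             # ~140 GB -- Llama 2 70B at 16-bit
    print(model_size_gb(70) / 80)        # ~1.75  -- needs 2 GPUs with 80 GB of memory each
    print(model_size_gb(1750) / 1000)    # ~3.5 TB -- a rumored 1.75T-parameter model at 16-bit
    print(model_size_gb(1.3))            # ~2.6 GB -- a small 1.3B coding model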
AI LIMITATIONS, RISKS, AND ETHICS
Are We Close to Artificial General Intelligence (AGI)?
When thinking about generalized intelligence, it is hard to come to a consensus on where we are because definitions vary by a wide margin. The earliest test was the Turing Test, created by Alan Turing in 1950. In this test there are three participants: an examiner and two respondents. One of the respondents is a machine and the other is human. Each participant is out of sight, and the examiner asks the two respondents questions through a terminal, with each respondent replying in written text. After asking questions, the examiner then guesses which participant is the human. If the examiner could not reliably tell the machine’s answers from the human’s, the machine would be considered to have human-level intelligence. As you might have guessed, this test has pitfalls and over time has been modified in different ways. One modern application of this idea is CAPTCHA verification (CAPTCHA is an acronym that stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”), where the user must select an image to verify they are not a computer (George, 2023).
However, when people think about general intelligence, they think a little more broadly about the kinds of tasks that humans can do. A Turing Institute lecture series highlighted various aspects of human intelligence where AI can do the same job as a human. The most state-of-the-art models can do natural language processing, simple and even more complex reasoning tasks, and, when programmed to do so, can recall information. However, AI models today are unable to problem solve, plan, or do arithmetic, which are all critical dimensions of human mental intelligence. Aspects of physical intelligence are much farther behind. While these LLMs are a significant leap forward, they do not address other critical gaps in things such as mobility, navigation, hand-eye coordination, and manual dexterity that would be required to meet a stricter definition of AGI (Artificial General Intelligence). The last thing I will mention is the aspect of consciousness, a tricky one since no one today agrees on why, or how, human consciousness exists in the first place (Wooldridge, 2023).
Current Risks and Limitations
Researchers do not fully understand how these models work. Part of the issue is that these models have far greater dimensions than the three-dimensional world in which we live. As a result, it is hard to conceptualize how all these hidden layers of computation result in human-like responses.
Today, the biggest hurdle to higher levels of adoption of this technology is reliability. Because LLMs work by predicting the next word, they can sometimes produce what are known as hallucinations. Hallucinations are when the model gives us false or fabricated information in response to a prompt. One real-world example from Microsoft’s Copilot is seen with its meeting summarizer. This functionality automatically creates a summary of the discussions at a meeting. On one occasion, the summary the AI model provided stated that Bob spoke about product strategy. The only issue is that there was no one named Bob and there was not any conversation about product strategy. The biggest problem with these hallucinations is that in most cases they look incredibly believable.
There are also known ways you can “jailbreak” a model into giving you responses that should be blocked for safety reasons. One of the more popular examples highlighting this risk is “Grandma Mode.” In this exploit you get the system to role-play as your grandmother in order to coax out answers the creators did not want to reveal. Other examples have included the use of images or other objects with white text to obscure a message, which can be used to trick the AI model, or the use of various coding languages to ask for certain information. These work because the guardrails are primarily designed to block such prompts when they are written in plain English.
Ethics and AI
AI ethics will be something that will evolve over time, but the discussions are already ongoing with leaders in AI attempting to create frameworks to help drive continuity and consistency. Here are the more commonly discussed areas:
Bias
The essence of what an AI model will produce is determined by its training data set. Most of the text data in the world comes from the internet and Western cultures, which creates a bias within these models. In addition to less text coming from emerging economies, much of the available text from regions like the Middle East ends up coming from religious texts posted to the internet. Bias will be very present in all these language models given the skew in the data available on the internet.
Carbon Impact
Given the massive amount of compute required, AI generates a huge electricity bill. As new data centers are being built out, more emphasis will be placed on incorporating renewable energy sources. However, due to the power requirements of these data centers, and the need to have no gaps in service, they will rely on other sources of electricity that are more carbon intensive but more dependable.
Privacy and Copyright
Given the nature of LLMs, and the inability to trace individual pieces of training data within a trained model, identifiable personal information or copyrighted material that is included in training data cannot simply be removed later. This lack of transparency into the data used to train these models will create a whole new set of challenges for privacy and copyright law. Given the pace of AI today, these issues will not be sorted out until well after the technology has been put into practice.
Other Considerations
There are other potential long term ethical dilemmas that will stem from continued advancements in AI. These could range from job displacement and accountability to a whole host of other issues we will not know about until we are already implementing these technologies. This area of discussion is already so vast, especially when you layer in people’s sometimes Sci-Fi like interpretation of where AI will bring us in the next decade.
While this is a critical area of discussion regarding AI, there are entire papers written on individual issues. “Computing Power and the Governance of Artificial Intelligence” is one such paper. It concludes that the best way to regulate AI globally is through a required input: compute. Given its detectability, excludability, quantifiability, and concentrated supply chain, compute provides a meaningful lever for regulation. The authors draw an interesting analogy between AI computer chips and uranium enrichment, pointing out that careful management of the latter has reduced the risk of nuclear proliferation for 80 years (Sastry, 2024).
All world-changing innovations have carried risks with them. If you put this into historical context, even the printing press contributed to negative consequences over time, including uprisings and competing ideologies. For example, Pope Alexander VI promised excommunication for any manuscripts printed without church approval. Soon after sending his Ninety-five Theses to his archbishop in 1517, Martin Luther used a printing press to create hundreds of copies to spread his ideas throughout Europe. This led to increased tensions which resulted in multiple wars, including the Cologne War (1583-1588), the War of the Jülich Succession (1609), and eventually the Thirty Years’ War, which was the worst war seen in Europe until World War I.
INVESTING IN AI
Today, we are seeing most Fortune 500 companies at a minimum experimenting with the possibilities of this technology. Plenty of startups are looking to either create their own LLMs or develop tools on top of existing ones. On Microsoft’s latest earnings call, they cited that half of the Fortune 500 are already using their new AI product, which is called Copilot. The company has stated that customers are seeing on average a 29% productivity boost, and in the most productive cases up to 70% greater productivity. However, not all the feedback coming from early users of Copilot has been positive, with users citing that functionality related to Excel and PowerPoint is not particularly additive.
Training these models requires an incredible amount of computational power, which has led to a sort of arms race for AI GPUs. The predominant GPU manufacturer for AI applications is Nvidia. To give better context on the complexity of this AI hardware, the Nvidia H100 system has a total of 35,000 parts and weighs 70 pounds. In Q3 alone, Nvidia sold ~500K H100 chips at an estimated price of around $30K a chip! That alone is $15B of revenue in one quarter, which over time should continue to expand as models get larger. As inference use cases develop, demand for AI chips will be driven more by companies applying the technology than by the companies building base models. In the near term these chips will be supply constrained given the lack of capacity to make leading-edge chips today. To that point, on Nvidia’s most recent earnings call the management team was asked how they think about allocating this scarce product across all their customers who want as many chips as possible.
In the long term, companies like Meta and OpenAI have plans to continue building out massive AI factories. Meta’s target for the end of this year is to have GPUs totaling 600K, with 350K of those being Nvidia H100s. OpenAI has set a much longer-term goal of 10M GPUs. And that is just the two leading companies’ purchasing expectations. During Nvidia’s last earnings call, they estimated that 60% of all AI chip demand is for model training, with the remaining 40% coming from inference demand. Over time, inference demand is expected to be significantly larger. Simply put, the market will be huge and Nvidia will dominate, especially in the near term.
Here are some of the key areas that are benefitting from the proliferation of AI today:
ASML and EUV Technology
Source: https://bits-chips.nl/artikel/what-makes-an-euv-scanner-tick/
The concept of EUV (Extreme Ultraviolet Lithography) technology has been around a long time from a theoretical perspective, but there have been engineering hurdles to overcome to make the technology work. Unlike the prior generation of lithography, EUV required a massive redesign of how the machine manipulates light in order to reduce the wavelength and increase the density of transistors on the modern high-end computer chips required for AI. ASML is the only company in the world that makes the EUV machines necessary for today’s leading-edge process technologies.
The machine is an engineering marvel, releasing microscopic drops of tin that it then shoots with a laser beam, heating them to a temperature of 400,000 degrees Celsius and producing EUV radiation. This light is then reflected by a series of mirrors to make the process work. To give a sense of scale, the prior generation of chips at 5 nanometers creates details which are the width of ~10 large atoms!
ASML’s machines are extraordinarily complex, and no two machines are alike. Like an aircraft manufacturer, ASML makes very few of its own parts and relies on hundreds of suppliers to create the components needed to build its machines. The lead times for building these machines are long, and customers reach out with the specs of what they are looking for well in advance of the production cycles. At the same time, these fabs are coordinating with their own customers, such as Nvidia, on their next-generation products. Given the long lead time for these machines, and limited capacity relative to demand, it seems likely AI chips will be capacity constrained in the near future.
TSMC and Foundries
TSMC is responsible for 90% of the world's leading-edge chip manufacturing. As you would expect, they are one of ASML’s biggest customers but also Nvidia’s biggest supplier. Nvidia is a chip designer, which means that they don’t build any of the chips themselves. Instead, they rely on what are known as foundries: chip-making factories that turn silicon wafers into things like AI chips. Building these foundries for leading-edge chips requires large investments, and operating them profitably requires them to run 24/7. That takes an incredible amount of expertise that very few companies have today.
Over the medium term, TSMC will need to compete heavily with Samsung’s and Intel’s foundry businesses. Samsung, which has historically made its own chips for its smartphones, has made a recent push to compete aggressively in this space. Intel, which historically has failed at creating a sustainable foundry business, is 3 years into a turnaround under CEO Pat Gelsinger that prioritizes regaining its position in the industry. All three of these companies will likely end up building AI chips for Nvidia and other companies as well.
Nvidia and Chip Designers
Nvidia is obviously the poster child of AI with their leading-edge GPUs and AI solutions spanning the entire stack, including software. However, mega-cap technology companies have efforts underway to build their own processors for their AI projects. These chips are called ASICs (Application-Specific Integrated Circuits). There are a couple of players with ASIC programs that work directly with mega-cap tech, specifically Broadcom (AVGO) and Marvell (MRVL). These companies will not compete for the broader AI chip market but will work with some of the largest AI compute businesses. Those customers have the largest budgets, and as a result, creating optimized chips to fit their specifications can generate meaningful returns given their scale.
In addition to ASICs, AMD (Advanced Micro Devices, Inc.) is another legacy GPU designer. The company released its own AI chip in December of 2023, the MI300, which it expects to generate an additional $2B in sales in 2024, selling at $15,000, half the price of Nvidia’s chip. Based on AMD’s launch event, their management team believes their chip has 40-60% faster throughput. However, the comparisons run by AMD are believed to have used alternative software rather than Nvidia’s optimized software stack for the H100, excluding multiple recent software improvements. Since AMD has no comparable software of its own, this was probably meant to show a more like-for-like comparison. Nvidia’s response showed that the H100 with these additional optimized software packages outperformed AMD by 14x. Nvidia will also be releasing its next AI chip, the H200, this year. AMD is working on an open-source software package (ROCm) to better compete with Nvidia, but there is still a lot of work to be done to bridge the gap.
The biggest hurdle for companies adopting non-Nvidia chips is Nvidia’s parallel computing platform called CUDA (Compute Unified Device Architecture). Given the nature of GPUs, having good software to optimize performance has been a huge advantage for the company. Nvidia has invested in the software side, building out its capabilities for the last decade, all of which are backwards compatible. This makes utilizing other kinds of AI chips more difficult. Jim Keller, a prominent microprocessor engineer who has worked for Apple, AMD, Broadcom, and Intel, draws a comparison between CUDA and x86, likening both to a swamp instead of a moat.
To give context, x86 is the dominant instruction set architecture that Intel introduced with the 8086 back in 1978 and has updated several times since. x86’s prevalence in areas like PCs and the datacenter is due to the vast amount of software that has been built for this instruction set. There are more efficient chip architectures that could be used, but the effort and cost of switching is just high enough to keep x86 entrenched. The argument Keller is making is that CUDA is not revolutionary but rather simply entrenched in AI workloads.
AI Model Trainers
In addition to OpenAI, there are other companies developing their own base models. Two of the most obvious examples are Alphabet and Meta. Alphabet has long been a leader in AI and released its newest LLM, Gemini, back in December of 2023. Based on the company’s press releases, the model performs better than ChatGPT 4; however, they do not provide comparisons to the newest model, ChatGPT 4 Turbo. OpenAI and Alphabet will likely continue to lead with the best closed-source AI models.
Meta is taking a different approach and is making all their models open source. Today, they are far behind the size and capabilities of leading companies like OpenAI and Alphabet. However, their position as the leading social media platform could provide a competitive edge over the long term. You could also make the case for the unique data and assets that Alphabet has at its disposal as well.
On pricing, OpenAI utilizes a per-token pricing structure on inputs and outputs, which scales up from their free-to-use ChatGPT model all the way to ChatGPT 4. In these models, you pay based on the number of tokens you use when prompting the system and on the number of tokens generated to answer your prompt. Prices are quoted per 1,000 tokens and are generally a matter of pennies for both input and output. To put this in perspective, 1,000 tokens is roughly equivalent to 750 words. For a paper like this of close to 10,000 words, it would have cost about 40 cents to generate as output from ChatGPT 4!
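For anyone who wants to reproduce that estimate, the arithmetic is simple; the per-token price below is an assumption for illustration, not a current OpenAI list price.

    words = 10_000
    words_per_1k_tokens = 750
    price_per_1k_output_tokens = 0.03    # assumed ~3 cents per 1,000 output tokens

    tokens = words / words_per_1k_tokens * 1000
    cost = tokens / 1000 * price_per_1k_output_tokens
    print(f"{tokens:,.0f} tokens -> ${cost:.2f}")   # ~13,333 tokens -> ~$0.40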
At a minimum, these three companies could see deeper moats surrounding their already attractive core businesses. All three of these companies, including Microsoft as a proxy for OpenAI, constitute some of the best businesses in the world that generate tens of billions of dollars in free cash flow a year. Meta and Google dominate the digital marketing space and AI will further their competitive edge. Microsoft is a leader and operates such a vast ecosystem that it is incredibly hard to displace them given the synergies of everything they provide customers. AI applications could also continue to strengthen the dominance of these tech companies.
Cloud and AI Data Platforms
With generative AI come vast data requirements. What is unique to generative AI is that you can extract meaningful value from unstructured data better than ever before. Companies today have massive amounts of unstructured data they want to store, share, and utilize in an efficient manner. Given its compute requirements, AI is generally hosted in the cloud, which provides further incentives for users to migrate their data and workloads to the large cloud vendors such as AWS, Azure, or Google Cloud. All the major cloud players provide data infrastructure which is critical to supporting AI moving forward. However, most companies operate in a multi-cloud environment, so third-party vendors like Snowflake could also meaningfully benefit from AI.
It is important to note that the way data has historically been managed is not adequate for the data requirements of AI. Legacy data software companies will try to address these new needs, but that leaves room for newer data-oriented software companies to disrupt, given the greater agility and flexibility that come from not being tied to outdated legacy architectures.
AI Inference
AI inference refers to running these trained LLMs to generate responses, whether by the companies that train them or by 3rd parties. The belief is that over time AI inference will be the larger market as AI applications get developed to improve productivity across industries and functions. Early adopters have been software companies, but over time the use of AI will continue to expand to other industries to help improve the productivity of a variety of functions. Even with limited use cases for AI today, inference already accounts for 40% of Nvidia’s GPU demand.
In the long term, the range of companies implementing AI to increase productivity is far too wide to go through in full. It will be a significant competitive advantage for companies that can implement this technology effectively. AI will open whole new use cases in the future, and that is where we believe investors should focus their efforts to understand. This technology will have applicability for all sectors of the economy, with certain industries benefiting more than others. It will move faster than anything we have ever seen before, and we are paying attention.
KEY TAKEAWAYS - IS A “CabotGPT” COMING?!
To invest in complex innovations, we believe it is important to have a conceptual understanding of how everything works. This can inform us about the opportunities and limitations. I hope that from reading this you realize you don’t need to be a PhD mathematician or computer scientist to understand these models.
The push to build out AI infrastructure is already ongoing. Industries are benefiting from these huge investments being made today. There does not seem to be any binding constraint that should limit this technology from continuing to develop in the medium term. We have vast amounts of data, massive amounts of computational power, emerging AI data infrastructures, and a model architecture that has shown that with increasing amounts of scale it will continue to improve its performance. In the short term, the availability of GPUs could be a binding constraint. This will be resolved by more capacity or advancements in model architecture to make training more efficient. Also, if Nvidia’s moat around its software went away (CUDA), other GPU manufacturers would be able to add more supply to the market as well. However, based on our understanding of CUDA, that seems unlikely.
One observation is that the earnings growth related to AI has so far come only from AI infrastructure investment and not from actual products, the one exception being Microsoft’s Copilot. When you listen to executives in the industry, they believe that inference will be the larger market. Once the performance of these models improves across all modalities (text, audio, video, etc.), and enterprises invest in the data infrastructure to support AI, we should see some drastic changes in the way companies do business.
Today, the foundational work has been done so that someone with reasonable technical aptitude can fine-tune a base model for a particular function. Developers share their open-source projects at Hugging Face (huggingface.co), where they upload the latest versions of fine-tuned and base models across various tasks and modalities. These range from chat-based models that use the transformer architecture discussed in this paper to other modalities like video or images with different model structures. In fact, if you are willing to pay fees to OpenAI, they have an easy interface where you can program what you want your model to do through natural written text. You can easily upload a file for the model to use to fine-tune its responses. At their recent dev day, Sam Altman, CEO of OpenAI, highlighted these features during a presentation. It looked simple to set up, but the largest hurdle will likely be aggregating your fine-tuning data. Regarding a CabotGPT, never say never! As much as I would find it personally gratifying to build a Large Language Model to help communicate the findings from our research efforts, I think we will stick to the old-fashioned way for now.
I would like to end with a thought from an interview with OpenAI’s Chief Scientist Ilya Sutskever. Speaking about a post-AGI world, he said: “AI could help us become more enlightened…. as we interact with AGI, it will help us see the world more correctly. Imagine talking to the best meditation teacher in history.” (Patel, 2023). Extrapolating on this, the potential would be for the thought leadership of every discipline to be accessible to everyone with little to no friction. Putting this into historical context, the printing press allowed for the mass sharing of ideas centuries ago. More recently, the internet has allowed for an exponential amount of information sharing, but also a tremendous amount of noise and misinformation. The long-term potential of generative AI would enable things like getting start-up advice from Elon Musk, learning physics from Albert Einstein, or learning the Socratic method from Socrates himself. If that does not excite you, I do not know what will.
REFERENCES
Anyoha, R. (2020, April 23). The history of Artificial Intelligence. Science in the News. https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021, March 1). On the dangers of stochastic parrots: Proceedings of the 2021 ACM Conference on Fairness, accountability, and transparency. ACM Conferences. https://dl.acm.org/doi/10.1145/3442188.3445922
Chowdhury, H. (2023). Nvidia plans to triple production of its $40,000 chips as it races to meet huge demand from AI companies, report says. Business Insider. https://www.businessinsider.com/nvidia-triple-production-h100chips-ai-drives-demand-2023-8
Eadline, D. (2024, January 25). Meta’s Zuckerberg puts its AI future in the hands of 600,000 gpus. HPCwire. https://www.hpcwire.com/2024/01/25/metas-zuckerberg-puts-its-ai-future-in-the-hands-of-600000gpus/#:~:text=Meta’s%20Zuckerberg%20Puts%20Its%20AI%20Future%20in%20the%20Hands%20of%206 00%2C000%20GPUs
The first newspapers. (n.d.). In Britannica. Retrieved February 2024, from https://www.britannica.com/topic/publishing/The-first-newspapers
Freund, K. (2023, December 18). Breaking: AMD is not the fastest GPU; here’s the real data. Forbes. https://www.forbes.com/sites/karlfreund/2023/12/13/breaking-amd-is-not-the-fastest-gpu-heres-the-realdata/?sh=6ef1db443a6f
George, B. St., & Gillis, A. S. (2023, April 25). What is the turing test?: Definition from TechTarget. Enterprise AI. https://www.techtarget.com/searchenterpriseai/definition/Turingtest#:~:text=The%20Turing%20Test%20is%20a,cryptanalyst%2C%20mathematician%20and%20theoretical %20biologist
The history of electricity timeline (2019, September 12) Mr. Electric. https://mrelectric.com/blog/the-history-ofelectricity-history-of-electricitytimeline#:~:text=In%201925%2C%20only%20half%20of,homes%20having%20electricity%20by%201960
Javaheripi, M., & Bubeck, S. (2023, December 16). Phi-2: The surprising power of small language models Microsoft Research.
https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-smalllanguage-models
Johnson, K. (2023, December 7). LLM vs NLP | Kevin Johnson. YouTube.
https://www.youtube.com/watch?v=FmxcmKvkqnE
Ludvigsen, K. G. A. (2023a, July 18). The carbon footprint of GPT-4. Medium. https://towardsdatascience.com/thecarbon-footprint-of-gpt-4-d6c676eb21ae
Ludvigsen, K. G. A. (2023b, July 18). The carbon footprint of GPT-4. Medium. https://towardsdatascience.com/thecarbon-footprint-of-gpt-4d6c676eb21ae#:~:text=According%20to%20unverified%20information%20leaks,%E2%80%93100%20days %20%5B2%5D
M., V. (2023, December 25). Why do we need GPU clusters for inferencing/serving llms?. LinkedIn.
https://www.linkedin.com/pulse/why-do-we-need-gpu-clusters-inferencingserving-llms-vishal-malik-s9mve
Microsoft. (2024, January 30). Press Release & Webcast Earnings Release FY24 Q2
https://www.microsoft.com/en-us/investor/earnings/fy-2024-q2/press-release-webcast
Millidge, B. (2022, August 6). The scale of the brain vs machine learning. Beren’s Blog https://www.beren.io/2022-08-06-The-scale-of-the-brain-vs-machinelearning/#:~:text=Synapses%20are%20not%20identical%20to,way%20to%20represent%20NN%20weights
Nvidia Corporation - Nvidia Q4 FY24 earnings call. Nvidia . (2024, February 21). https://investor.nvidia.com/events-and-presentations/events-and-presentations/event-details/2024/NVIDIAQ4-FY24-Earnings-Call-2024 4s8TN5sqa/default.aspx
Onion, A., Sullivan, M., Mullen, M., & Zapata, C. (Eds.). (2023, June). Printing press. History.com. https://www.history.com/topics/inventions/printing-press
Patel, D. (2023, March 27). Ilya Sutskever (openai chief scientist) - building AGI, alignment, spies, Microsoft, & enlightenment. YouTube.
https://www.youtube.com/watch?v=Yf1o0TQzry8
Petrosyan, A. (2023, February 20). Internet penetration United States 2023. Statista.
https://www.statista.com/statistics/209117/us-internetpenetration/#:~:text=As%20of%202023%2C%20approximately%2092,internet%20users%20in%20the%20c ountry
Pricing. Open AI Pricing. (n.d.). https://openai.com/pricing
Revisiting the nifty fifty (2022, May). Stray Reflections https://strayreflections.com/article/252/Revisiting_the_Nifty_Fifty#:~:text=From%201973%20to%201977%2C%20the,y ears%2C%20in%20November%20of%201982
Roos, D. (2023, March). Seven ways the printing press changed the world. History.com.
https://www.history.com/news/printing-press-renaissance
Sastry, G., Heim, L., Belfield, H., Anderljung, M., Brundage, M., Hazell, J., O’Keefe, C., Hadfield, G. K., Ngo, R., Pilz, K., Gor, G., Bluemke, E., Shoker, S., Egan, J., Trager, R. F., Avin, S., Weller, A., Bengio, Y., & Coyle, D. (2024, February 13). Computing power and the governance of Artificial Intelligence. arXiv.org. https://arxiv.org/abs/2402.08797
Shilov, A. (2023, December 27). TSMC charts a course to trillion-transistor chips, eyes 1NM monolithic chips with 200 billion transistors. Tom’s Hardware. https://www.tomshardware.com/tech-industry/manufacturing/tsmccharts-a-course-to-trillion-transistor-chips-eyes-monolithic-chips-with-200-billion-transistors-built-on-1nmnode
The story of the Intel® 4004. Intel. (n.d.). https://www.intel.com/content/www/us/en/history/museum-story-of-intel4004.html
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023, February 27). Llama: Open and efficient foundation language models. arXiv.org.
https://arxiv.org/abs/2302.13971
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., … Scialom, T. (2023, July 19). Llama 2: Open Foundation and fine-tuned chat models. arXiv.org.
https://arxiv.org/abs/2307.09288
Turing, A. M. (1950). Computing machinery and intelligence. Semantic Scholar.
https://www.semanticscholar.org/paper/Computing-Machinery-and-Intelligence-A.M.-TuringTuring/6e369d363e91a5c09cc33907590b768d52657c0f
U.S. Department of the Interior. (2015). The Electric Light System. National Parks Service. https://www.nps.gov/edis/learn/kidsyouth/the-electric-light-system-phonograph-motionpictures.htm#:~:text=In%201882%20Edison%20helped%20form,the%20U.S.%20have%20electric%20powe r
Uffindell, R. (2023, July 27). OpenAI could power next-gen model with 10 million Nvidia GPUs. Techerati. https://www.techerati.com/news-hub/openai-nvidia-chatgpt-gpu-10-million/
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023, August 2). Attention is all you need. arXiv.org. https://arxiv.org/abs/1706.03762
Verma, S., & Vaidya, N. (2024, January 25). Mastering LLM techniques: Inference optimization. NVIDIA Technical Blog. https://developer.nvidia.com/blog/mastering-llm-techniques-inferenceoptimization/#:~:text=LLM%20memory%20requirement&text=As%20an%20example%2C%20a%20model, tensors%20to%20avoid%20redundant%20computation
Wanner, M. (2018). 600 trillion synapses and Alzheimer’s disease. The Jackson Laboratory.
https://www.jax.org/news-and-insights/jax-blog/2018/December/600-trillion-synapses-and-alzheimersdisease#:~:text=Each%20neuron%20has%2C%20on%20average,as%20high%20as%201%20quadrillion
Warren, T. (2023, January 23). Microsoft extends OpenAI partnership in a “multibillion dollar investment.” The Verge. https://www.theverge.com/2023/1/23/23567448/microsoft-openai-partnership-extension-ai
Wikimedia Foundation. (2024a, January 24). Deep Blue (chess computer). Wikipedia.
https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)
Wikimedia Foundation. (2024b, February 13). Intel 4004. Wikipedia.
https://en.wikipedia.org/wiki/Intel_4004#:~:text=The%204004%20employs%20an%2010,instruction%20cyc le%20is%2010.8%20microseconds
Wooldridge, M. (2023, December 21). The Turing Lectures: The future of generative AI. YouTube.
https://www.youtube.com/watch?v=2kSl0xkq2lM