Engines or plastics? How we talk about LLMs and how we use them.
The Angle Issue #249
Gil Dibner
The prevailing sentiment in the technology world appears to be that LLMs resemble automobile engines: powerful but mostly useless on their own. As Christopher Penn writes, “no one rides down the road on an engine; everyone drives down the road in a vehicle of some kind. And that's where generative AI is today - we're running into the limitations of using an engine directly (through apps like ChatGPT) and needing the rest of the car.”
There are certainly many use cases where LLMs provide extraordinary value on their own. Most of these are consumer use cases built around classic information retrieval. The ability of LLMs from OpenAI, Google, and Perplexity to search vast troves of knowledge, identify salient information, and synthesize it into easily consumed output is impressive. For me personally, LLMs have largely replaced traditional search engines because they offer a much faster path to synthesis, condensing information into exactly the format I want to consume.
As we try to squeeze more juice out of the LLM lemon, we are moving from thin wrappers to ever-thicker ones. This began with prompt engineering. Most recently, we’ve observed a massive wave of agentic orchestration companies (each claiming to be the “Zapier of AI”) pass through our dealflow. Augmenting LLMs with agentic capabilities, computer use, and reasoning models such as OpenAI’s o1 will expand what they can do. From this perspective, today’s world of models and wrappers will evolve into tomorrow’s world of supermodels and superwrappers.
As exciting as the supermodels+superwrappers framework is, I’m focused on another framework. In a famous scene from The Graduate, an older friend offers a young man career advice: “There’s a great future in plastics,” he says, with great sincerity. I am starting to think this might be the right way to think about LLMs at the current moment. There are some applications for which plastics - and plastics alone - are the best solution: disposable forks, Ziploc bags, and beach toys. But there are far more, and far greater, applications where plastics play an integral role as one material among many. Take a car: plastics are built into bumpers, lighting, dashboards, airbags, engine covers, wiring insulation, and more. Hundreds of components of a modern automobile rely heavily on the unique properties of various types of plastic. Yet, altogether, plastic accounts for only about 8-10% of the weight of a modern car. The engine and frame are still made of metal, the tires of rubber, the windows of laminated glass. Plastic is an example of a core technology that enables multiple subsystems of larger, more complex systems. None of us, however, would want to ride in a car made entirely of plastic.
LLMs offer exactly this type of technological edge for enterprise applications. They can weave various types of data and queries together with ease. Used properly, they can validate outputs and check application logic on the fly. They can extend search functionality seamlessly across a wide range of internal and external databases. They are excellent at parsing natural language input and generating natural language output in ways that make applications far more accessible and useful than before. They can generate and validate code, HTML, or XML. As an ingredient in a larger, more complex system, an LLM can be a powerful value driver and differentiator. As the entirety of an approach, LLMs seem to fall short of the mark in too many cases.
One of the smartest people I know (I will leave him anonymous) said it well on a private call earlier this week: “the problem with LLMs is that they make it incredibly easy to build an impressive demo but not a full solution.” We observe this phenomenon over and over again in our dealflow. It’s actually not that hard (for any team) to harness one of the many foundation models to begin to demonstrate value. You want to import hundreds of pages of customer documents and magically generate a completed response to a customer tender or RFP? No problem. The LLM can do that. You want to ingest a massively complex master service agreement (MSA) and chat with a robotic in-house counsel to ask some questions about the terms? No problem. The LLM can do that too. But problems emerge the minute the system is required to handle deterministic data about which humans can and want to reason. What if you are not sure that an arbitration clause exists in that MSA? What if we want to establish a fixed record of the clauses that exist and don’t exist in each MSA we are looking at? What if we want to group MSAs together by the types of clauses they contain and how those clauses have been authored? On their own, LLMs are far from the most efficient way to do all of this - but they are the best way to do parts of this.
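To make the pattern concrete, here is a minimal sketch of the MSA example above. All names are hypothetical, and the extraction function is a keyword-matching stand-in for what would, in a real system, be an LLM call prompted to return structured output. The point is the architecture: the LLM sits only at the extraction edge, producing a fixed, auditable record per MSA, while the record-keeping and grouping are done deterministically in ordinary code.

```python
from dataclasses import dataclass
from collections import defaultdict

# Stand-in for an LLM call. In a real system, this would prompt a model to
# answer, per clause type, "does this clause exist in the document?" and
# return structured JSON. The keyword matching here is illustrative only.
def extract_clauses_llm(msa_text: str) -> dict:
    keywords = {
        "arbitration": "arbitration",
        "indemnification": "indemnif",
        "limitation_of_liability": "liability",
    }
    text = msa_text.lower()
    return {clause: kw in text for clause, kw in keywords.items()}

@dataclass(frozen=True)
class ClauseRecord:
    """A fixed, human-auditable record of which clauses an MSA contains."""
    msa_id: str
    clauses: tuple  # sorted tuple of clause names present in the MSA

def build_record(msa_id: str, msa_text: str) -> ClauseRecord:
    # The LLM is used once, at the edge, to turn prose into structured data.
    found = extract_clauses_llm(msa_text)
    present = tuple(sorted(c for c, ok in found.items() if ok))
    return ClauseRecord(msa_id, present)

def group_by_clause_profile(records):
    # Deterministic step: MSAs sharing the same clause set land together.
    groups = defaultdict(list)
    for rec in records:
        groups[rec.clauses].append(rec.msa_id)
    return dict(groups)
```

Once the records exist, questions like “which MSAs lack an arbitration clause?” become plain database queries rather than repeated, non-deterministic chats with the model.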
Once more advanced and human-auditable reasoning is required, an LLM-first approach often breaks down. It’s no wonder that many of the most interesting applications of LLMs we’ve seen recently have used LLMs as connective tissue between other more traditional techniques. This is not currently the consensus view, but we believe it may become the prevailing approach to leveraging the strengths of LLMs in enterprise applications: not as a primary standalone engine but as a key enabling technology that grants new superpowers to existing systems and techniques.
There will be some great application companies built with the supermodels+superwrappers approach. But I’m increasingly excited about companies that are using LLMs as a superpower additive deep within their architecture. If you are building with this technology, I’m curious how you are approaching these challenges - and I’d love to hear about what you are building.
FROM THE BLOG
Am I Thinking About AI the Right Way?
Gil shares the four AI themes and questions he's thinking about.
The Venture Apocalypse
The venture world is deeply debating its future, but core principles remain unchanged.
No Sleepwalk to Success
Engineering success in a technical startup.
Revenue Durability in the LLM World
Everything about LLMs seems to make revenue durability more challenging than ever.
WORTH READING
ENTERPRISE/TECH NEWS
AI efficiency gains. Klarna’s sales and marketing expense fell 16%, and customer service and operations expense fell 14%, all while revenues grew 23% YoY. Klarna attributes this to its implementation of AI: specifically, the launch of its customer service chatbot and the use of AI to generate images, translations, and similar work it previously paid services firms to do manually.
Debanked. After Marc Andreessen shared stories of debanked startups on the Joe Rogan podcast last week, founders came out of the woodwork to share their stories. One notable story is this one from David Marcus about how Libra, Meta’s effort to get into blockchain payments, was killed by regulators.
Are LLMs just mimics? From Andrej Karpathy, a bit of cold water for the AGI crowd: Large Language Models (LLMs), despite being labeled "AI," essentially function by mimicking the average human data labeler whose work informed their training. While specialized labelers in fields like coding and math enhance LLM proficiency in those areas, the core functionality remains imitative rather than truly intelligent. "Asking an AI" is therefore more akin to querying a collective representation of human knowledge than consulting an independent, reasoning entity. This imitation, while useful, should not be mistaken for genuine artificial intelligence capable of complex tasks like governing.
Enterprise software rebounding? After a significant downturn, the enterprise software market is showing signs of recovery. Renewed customer spending and a focus on efficiency are driving this rebound, particularly for cloud-based solutions. While the recovery is still in its early stages, it signals a potential turning point for the sector.
HOW TO STARTUP
Grant benchmarks. Peter Walker from Carta sharing some early-stage wisdom: early-stage startup equity grants for the first five hires typically range from 0.3% to 2% of fully diluted shares, vesting over four years. The first hire often receives a significantly larger equity stake (median 1.45-2.01%), while subsequent hires receive decreasing percentages. Biotech startups tend to offer slightly higher equity than SaaS companies, particularly for the first hire. Detailed benchmarks available at the link.
Adaptive UIs. An important point from Paul Buchheit, the creator of Gmail, about the design of AI-powered tools.
HOW TO VENTURE
YC embraces dupes. Analysis reveals what we all already knew: Y Combinator frequently invests in startups with overlapping ideas, even within the same cohort. This duplication extends beyond the recent trend of AI code editors to other sectors, suggesting a deliberate strategy rather than coincidence. The data challenges the assumption that YC prioritizes unique concepts, indicating a potential focus on founder strength or market timing over absolute novelty.
AI boom, VC bust. Despite a surge in AI investments, venture capital firms are experiencing record-low profits. This downturn stems from a broader decline in tech valuations. The disconnect between booming investment and meager returns highlights the core tension facing VCs and LPs at this moment.
Fund returners key to VC. Analysis from Dave Clark reveals that while growth funds have fewer outright losses than early-stage funds, they generate a surprisingly similar proportion of high-return investments. Despite lower loss ratios, achieving a 3x return still relies heavily on "fund returners" (investments returning the entire fund's size) for both early and growth stage funds. Further investigation is needed to understand why growth funds produce a higher percentage of these exceptional performers.
PORTFOLIO NEWS
Aquant has been named one of the Next Big Things in Tech by Fast Company.
PORTFOLIO JOBS
Aquant
Accountant (Boston)
groundcover
Account Executive (Remote)
Tensorleap
Algorithm Developer (Tel Aviv)