Understanding the Open Source Tool Stack For LLMs

How to stay adaptive in an emerging AI world

Hi there,

If the recent OpenAI drama has taught us anything, it's that AI is still in its infancy.

In case you missed it, Sam Altman was fired, hired by Microsoft, and rejoined OpenAI - after 95% of the OpenAI staff threatened to leave with him. Meanwhile, in an unprecedented move, Twitch co-founder Emmett Shear joined as interim CEO, only to set a record for the fastest CEO speedrun in history - all while the world watched live on X.

It felt like a season's worth of Netflix drama crammed into a weekend, except it wasn't fiction. And it was a strong reminder of how fragile the AI landscape still is, and how we need to embrace that fact more than ever.

So how do we do that?

Let's find out!

Want to learn how to 10x your data analysis productivity with ChatGPT?

Sign up for my event with O’Reilly on Dec. 12 & 13, 2023! Use this promo code to get free access to the platform and webinar for 30 days. Enjoy!

You don’t stay flexible with open source LLMs alone

Open source software has a long history of providing a lifeline: it ensures that your organization has the tools to pivot and adapt, regardless of industry trends.

However, when it comes to AI, most people think primarily about open source AI models and not so much about the ecosystem around them.

But in my opinion, the most important thing right now is the ecosystem.

There are many alternatives to GPT-4, both commercial ones like Claude, Cohere, and PaLM, and open source ones (which we'll discuss later). These alternatives may not be as good as GPT-4, but they are available.

Besides that, you could also just change the way you access GPT-4 by switching from OpenAI to Azure (which acts like a "drama shield", as someone rightly pointed out on X.)

Choosing an open source LLM is not only about flexibility; that decision is primarily driven by cost and performance. If your main goal is to stay flexible, you need a whole architecture that lets you swap components as needed.
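To make that "swappable components" idea concrete, here's a toy sketch in Python. All names and interfaces here are hypothetical (not any specific library's API): the point is simply that if you code against a thin interface instead of a vendor SDK, switching backends becomes a one-line change.

```python
# A toy sketch of the "swappable components" idea. All names are hypothetical.
from typing import Protocol


class LLM(Protocol):
    """The thin interface the rest of your app codes against."""
    def complete(self, prompt: str) -> str: ...


class OpenAIBackend:
    def complete(self, prompt: str) -> str:
        ...  # call the OpenAI (or Azure OpenAI) API here


class LocalLlamaBackend:
    def complete(self, prompt: str) -> str:
        ...  # call your self-hosted open source model here


def answer(llm: LLM, question: str) -> str:
    # Application logic doesn't know or care which backend it's talking to.
    return llm.complete(f"Answer concisely: {question}")
```

Swapping from `OpenAIBackend()` to `LocalLlamaBackend()` then requires no changes to the application logic itself.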

One final remark:

This decision about open source tools is only relevant if you have a use case that has survived the prototype phase and is ready to go into production or at least a pilot. If you're not sure you want to go forward with that use case, there's no point in future-proofing it.

With that in mind, let’s take a closer look at the open source ecosystem for LLMs.

The Open Source Tool Stack

When we talk about the open source tool stack for LLMs, there are three main layers: the model, the tooling, and the UI.

(Actually, there is also a fourth layer used to serve the LLM, but we won't focus on that for now. Let's assume our LLM is hosted as an API. If you want to learn more about hosting LLMs, you can check this resource.)

Within each of these layers we find different components, which we can think of as Lego blocks: they can be combined, and the whole application is built by stacking them.

Here's a visual overview:

Let’s go through this bit by bit:

Models

The first layer is the actual model.

But what actually is an open source LLM?

Essentially, it’s just two files: a huge file with parameters (weights) and a small code file that actually runs the model using those parameters.

I took this illustration from a great video by Andrej Karpathy:

That’s what an open source LLM looks like
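To make this concrete, here's a hedged sketch of loading and running an open model with the Huggingface transformers library, which wraps the "weights plus a bit of code" idea behind a convenient API. The repo name is just an example (and is gated, so it requires access approval from Meta plus a Huggingface login):

```python
# A minimal sketch of the "two files" idea via Huggingface transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example model; gated, requires Meta's access approval and `huggingface-cli login`.
model_name = "meta-llama/Llama-2-7b-chat-hf"

# The "huge file with parameters": the downloaded weights.
model = AutoModelForCausalLM.from_pretrained(model_name)
# Plus the small bits of code/config needed to run them.
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("The open source LLM stack consists of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```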

A key concept to understand with open source LLMs is the difference between foundation models (sometimes called base models) and fine-tuned models.

For example, Llama 2 is a base model by Meta (available in different parameter sizes), and there’s a fine-tuned variant called Llama 2-Chat, which is specifically designed for dialog applications.

Currently, some of the most popular open source foundation models are:

  • Llama by Facebook / Meta

  • BLOOM by the BigScience project (led by Huggingface)

  • T5 by Google

  • Falcon by the Technology Innovation Institute (TII) in Abu Dhabi

For each of these models, there's already a plethora of fine-tuned versions available from the community. The best place to browse these models is the Huggingface model hub (which currently lists over 400,000 models).
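If you prefer to browse programmatically, here's a small sketch using the huggingface_hub client library. (Hedged: parameter and attribute names may shift between library versions.)

```python
# Browse the Huggingface hub programmatically.
from huggingface_hub import HfApi

api = HfApi()

# List the five most-downloaded models tagged for text generation.
for model in api.list_models(
    filter="text-generation",  # filter by tag
    sort="downloads",
    direction=-1,              # descending
    limit=5,
):
    # `modelId` in older versions; newer versions also expose `.id`.
    print(model.modelId)
```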

Tooling

This layer is probably the most dynamic and exciting one in the open source LLM space. This tooling allows us to quickly build applications with LLMs and easily switch components.

There are different types of tools:

Fine-tuning tools 

In many cases, the fine-tuning code will be written in PyTorch, a general-purpose machine learning framework. If you want more abstraction, you can use tools like HuggingFace AutoTrain, which lets you quickly fine-tune an LLM on your own dataset.

For fine-tuning LLMs, the "dataset" is typically just a large JSON file in a Q&A-style format, roughly as sketched below:
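Here's a hedged illustration of such a file. Field names vary by tool (instruction/input/output is another common convention), and the records below are invented for demonstration:

```python
# A toy Q&A-style fine-tuning dataset. Field names vary by tool.
import json

dataset = [
    {
        "question": "What is an open source LLM?",
        "answer": "Essentially two files: a large file of weights and a small code file that runs them.",
    },
    {
        "question": "Name a popular open source foundation model.",
        "answer": "Llama 2 by Meta.",
    },
]

# Write it out as the kind of JSON file a fine-tuning tool would consume.
with open("train.json", "w") as f:
    json.dump(dataset, f, indent=2)
```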

Integration tools

Integration tools help us do the "plumbing" of LLMs, such as pointing to a model, handling the prompts and conversations, or accessing other data sources.

The most prominent example in this space is Langchain, which recently received a $20 million investment from Sequoia and comes with a lot of templates for common LLM use cases.

While Langchain is a more general-purpose LLM framework, LlamaIndex specializes in deep indexing and retrieval for LLMs, making it a great choice for smart search and retrieval augmented generation (RAG) architectures.
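As a taste of how little plumbing these tools require, here's a minimal RAG sketch using LlamaIndex's high-level API as of late 2023 (hedged: these imports have moved around in later versions, and the defaults assume an OpenAI API key for embeddings and generation):

```python
# A minimal RAG sketch with LlamaIndex's high-level API (late-2023 style).
# Assumes OPENAI_API_KEY is set, since LlamaIndex defaults to OpenAI models.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Index a local folder of documents...
documents = SimpleDirectoryReader("./my_docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# ...then answer questions grounded in those documents.
query_engine = index.as_query_engine()
response = query_engine.query("What does our refund policy say?")
print(response)
```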

LLMOps

LLMOps is a completely new field that offers tools for testing, monitoring, and maintaining LLMs in production. Weights & Biases is a popular example (and has many more features beyond LLMOps). Other, more specialized tools like Promptfoo are emerging: Promptfoo lets you compare LLM outputs against a benchmark and measure their quality, which matters because these outputs tend to vary. Check it out!

LLM performance comparison in promptfoo
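To be clear about what such tools do conceptually, here's a toy sketch of the core idea (this is not promptfoo's actual API): run the same prompts through a model and score the outputs against simple assertions.

```python
# NOT promptfoo's API - just a toy illustration of the core LLMOps idea:
# run prompts through a model and check the outputs against assertions.

def passes(output: str, must_contain: str) -> bool:
    # A naive check; real tools support regex, similarity, model-graded evals, etc.
    return must_contain.lower() in output.lower()


def evaluate(call_model, test_cases):
    # call_model is a hypothetical stand-in for whatever client you use.
    return [
        (prompt, passes(call_model(prompt), expected))
        for prompt, expected in test_cases
    ]


# Demo with a fake "model" so the sketch runs end to end.
fake_model = lambda prompt: "Paris is the capital of France."
print(evaluate(fake_model, [("What is the capital of France?", "Paris")]))
```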

Typically, these tools build on top of one another.

For example:

  • You get a model from Huggingface,

  • fine-tune it if necessary,

  • build an application (like RAG) with LlamaIndex,

  • and use Promptfoo to monitor the quality of the system.

The integration and LLMOps tool space is super dynamic, and there's a new, more specialized tool launching every day - which might indicate a growing maturity of the ecosystem.

Check out this list for an overview of LLM tools in this space.

UI

The final part we need is the front-end, or user interface (UI). While this is not a traditional LLM topic, there are some frameworks built specifically for it.

The reason is that many LLM applications have chat-like interfaces that can be customized with various features (such as displaying sources for document retrieval).

Instead of starting from scratch, frameworks like Vercel AI offer pre-made app templates.

Additionally, established frontend frameworks like Streamlit are making it even simpler to integrate with tools like LangChain.
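For illustration, here's a hedged sketch of a minimal chat UI using Streamlit's chat elements (st.chat_message and st.chat_input, which shipped in 2023). The call_llm function is a hypothetical stand-in for your model or chain call:

```python
# A minimal chat UI with Streamlit's chat elements.
# Run with: streamlit run app.py
import streamlit as st


def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with your model call or LangChain chain.
    return f"You said: {prompt}"


st.title("My LLM Chat")

# Keep the conversation in session state so it survives reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Ask me anything"):
    with st.chat_message("user"):
        st.write(prompt)
    answer = call_llm(prompt)
    with st.chat_message("assistant"):
        st.write(answer)
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.messages.append({"role": "assistant", "content": answer})
```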

Tip: LangChain itself has a great UI component called Langchain Chat.

LangChain open source chat UI

Risks

So many great tools - and all for free and ready to customize! Sounds amazing, right?

But before you get started, be aware of the following risks and limitations (which apply to open source in general, not only LLMs):

People

What looks easy to you may look super intimidating to someone outside the AI bubble (and that often includes Python itself). Finding talent is hard, and open source only makes sense if there's someone in your organization who can handle it. To be clear: if you don't have a strong in-house dev team or an engineering hiring culture, the open source LLM stack is probably not a happy place for you!

Technology

Many of the tools mentioned above are less than 12 months old, and in that short time they have experienced tremendous growth, both in adoption and functionality. This comes with at least two problems:

A) Bugs: Be prepared to find and fix bugs yourself - there is a lot of uncharted territory here.

B) Compatibility: New updates are likely to cause compatibility issues - Langchain is notorious for this. You may find yourself in a situation where it would have been easier to start from scratch.

More on open source LLMs

If you want to learn more about the open source ecosystem (and my take on “If AI was a song, what would it be”), check out my recent talk with Andreas Welsch about the LLM ecosystem:

Conclusion

The open source ecosystem for LLMs has come a long way, and we are just seeing the beginning. For developers, that's good news and bad news. Good in the sense that there's probably a tool for your problem, but bad in the sense that these tools are often still evolving.

That's why it's so important to treat your open source LLM stack as modular. It's still too early to declare a clear "winner" or "best" way of doing things with LLMs.

As Heraclitus said some 2,500 years ago, "The only constant in life is change" – anticipate that from day 1!

I hope you enjoyed today's more technical issue.

Next week I'll be back with a practical use case!

See you next Friday!

Tobias

PS: If you found this newsletter useful, please leave feedback! It would mean so much to me! ❤️