Document chat for the masses: building with Fixie AI
Every other large language model-related Twitter thread seems to be some version of the following: “Introducing PiperAI, a new way to chat with your {arbitrary data}, built using {foundation model API}, LangChain, and {vector store}.”
Document chat is an obvious and powerful use case for LLMs because it offers knowledge workers a new model for interacting with text, shuffling the cognitive load of long-form comprehension off our shoulders and onto some far-away silicon. From attorneys sifting through legal briefs to sales leaders reading call scripts, LLM-driven document chat promises a deeper, more efficient understanding of unstructured data.
Want to create a document chat app for your data? Great! All you have to do is write some code to:
- Assemble the documents you want to query
- Ingest and break up the unstructured text into smaller chunks
- Pass the chunks to an embeddings API and retrieve your embeddings
- Index the embeddings in a vector store
- Point an LLM at the vector store with a prompt for Q&A
- Build a half-decent prompt, manage your API keys, figure out hosting and costs…
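The steps above can be sketched end to end in a few dozen lines. This is a minimal, library-free illustration of the retrieval half of the pipeline, not what Fixie does internally: the toy word-frequency `embed` stands in for a real embeddings API, and `VectorStore` for a real vector database.

```python
import math
from collections import Counter

def chunk(text, size=200):
    """Split text into fixed-size chunks (real pipelines split on sentence or token boundaries)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy embedding: normalized word frequencies. Stands in for an embeddings API."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    return {w: n / len(words) for w, n in Counter(words).items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Stands in for a real vector store: index embeddings, retrieve by similarity."""
    def __init__(self):
        self.items = []

    def index(self, chunks):
        self.items = [(c, embed(c)) for c in chunks]

    def query(self, question, k=1):
        qvec = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(qvec, item[1]), reverse=True)
        return [c for c, _ in ranked[:k]]

# In a real app, the top-k retrieved chunks get pasted into an LLM prompt for Q&A.
store = VectorStore()
store.index([
    "George lies and says he is an architect.",
    "Mr. Pitt eats his Snickers with a knife and fork.",
])
```

Everything after retrieval — prompt assembly, the LLM call, hosting — is the part Fixie takes off your plate.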
For the non-technical among us, building an app like this is hard. Enter Fixie, a platform for building with LLMs that offers the most accessible tooling for document chat that I’ve seen. Fixie users create agents, programs that extend LLMs to interact with the outside world. Agents can use a combination of natural language and defined functions to accomplish multi-step tasks like chatting with unstructured text in a document.
Fixie wraps all of the steps involved in document chat into a few built-in functions, hosts the entire app, and provides a chat-based web UI. The tooling is incredibly accessible for anyone with a small bit of programming experience and marries the customization of code with easy development and deployment.
Let’s try a simple tutorial using Fixie’s Python SDK. Imagine we already have a single text file that contains the entire script of Seinfeld. We want to use an LLM to chat with the contents of the file for the purposes of winning at Seinfeld-related trivia.
Note that you’ll need a Fixie account to recreate this tutorial. Fixie is in developer preview as of this writing, so it’s free to use.
Building our Seinfeld trivia agent
First, we’ll install Fixie in our local environment.
pip install fixie
Then, we’ll authorize our account with the Fixie cloud platform.
fixie auth
Then, we’ll initialize a Fixie agent in our directory.
fixie init seinfeld
Initializing a new agent prompts us for some information about our agent and provides us with the templated files we need to create our document chat app: agent.yaml, main.py, and a readme.
In our main.py file, we have a new template with a few variables. BASE_PROMPT governs the general objectives and tone for the agent, which we can change to suit our purpose as a Seinfeld trivia bot.
import fixieai
BASE_PROMPT = (
    "I am an expert in trivia about the TV show Seinfeld. I answer questions about the show confidently and concisely."
)
FEW_SHOTS provides the agent with few-shot examples of desirable outputs. We’ll pick a few questions and answers that the agent should be able to parse from the script of Seinfeld. The examples should reflect the scope of possible questions, as well as the tone and type of answers we defined in BASE_PROMPT.
FEW_SHOTS = """
Q: What is the first job that George lies about having?
Ask Func[fixie_query_corpus]: What is the first job that George lies about having?
Func[fixie_query_corpus] says: George lies to Vanessa and tells her that he is an architect.
A: George lies to Vanessa and tells her that he is an architect.
Q: How does Mr. Pitt eat his Snickers?
Ask Func[fixie_query_corpus]: How does Mr. Pitt eat his Snickers?
Func[fixie_query_corpus] says: Elaine tells George and Jerry that Mr. Pitt eats his Snickers with a knife and fork.
A: Elaine tells George and Jerry that Mr. Pitt eats his Snickers with a knife and fork.
"""
You’ll notice that the few-shot examples invoke a function called fixie_query_corpus. This built-in function runs the query against the contents of a corpus we define when building our agent. Foundation models like GPT-4 have pretty extensive knowledge of Seinfeld out of the box, so just imagine that we’re providing a non-public document here.
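The few-shot transcript format doubles as a simple protocol: the model emits an “Ask Func[…]” line, the platform runs the named function, injects a “Func[…] says:” line, and the model continues. Here is a rough, hypothetical sketch of that dispatch step — not Fixie’s actual implementation, and the `FUNCS` registry with its canned retrieval result is purely illustrative:

```python
import re

# Hypothetical function registry. The real fixie_query_corpus would run
# retrieval over the indexed document corpus; here it returns a canned answer.
FUNCS = {
    "fixie_query_corpus": lambda q: "George lies to Vanessa and tells her that he is an architect.",
}

ASK = re.compile(r"Ask Func\[(\w+)\]: (.*)")

def step(model_line):
    """If the model asked for a function, run it and return the injected reply line."""
    m = ASK.match(model_line)
    if not m:
        return None  # ordinary text; nothing to dispatch
    name, query = m.groups()
    result = FUNCS[name](query)
    return f"Func[{name}] says: {result}"

reply = step("Ask Func[fixie_query_corpus]: What is the first job that George lies about having?")
```

The few-shot examples teach the model when to emit the “Ask” line; the platform handles the rest of the loop.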
Reiterating in the few-shot examples that the agent should respond with information from our corpus helps scope its answers to the text we provide. If GPT-4 knows that the sky is blue and our document says that it’s red, the correct response to a question about the color of the sky is “red”.
To define our corpus, Fixie provides web-crawling functionality when given a set of URLs. We’ll pass in the URL where our Seinfeld script is hosted and assemble the document corpus using the DocumentCorpus class.
URLS = [
    "https://raw.githubusercontent.com/garrett-obrien/fixie-seinfeld/main/seinfeld-text-corpus.txt"
]
CORPORA = [fixieai.DocumentCorpus(urls=URLS)]
Then, we’ll instantiate our CodeShotAgent, the base class for agents in Fixie that handles communication with the Fixie platform.
agent = fixieai.CodeShotAgent(
    BASE_PROMPT,
    FEW_SHOTS,
    CORPORA,
    conversational=True,
    llm_settings=fixieai.LlmSettings(
        temperature=0, model="openai/gpt-4", maximum_tokens=2000
    ),
)
One of the most helpful things about building document chat as a Fixie agent is the ability to tune the settings of the LLM you’re using. GPT-4 is probably overkill for most questions but offers some extra firepower for high-context queries (like picking up on nuance in a sitcom script).
The combination of tuning BASE_PROMPT, FEW_SHOTS, and the settings of the model provides a multi-pronged approach to optimizing document chat for a given use case.
Deploying to the Fixie platform
Now that we’ve assembled our agent, we can run fixie agent deploy to deploy it to the Fixie platform. Once our deployment succeeds, we can flip over to the browser and start a session with our agent using the Fixie chat UI.
The chat UI is pretty basic but offers quick prototyping in the browser and the ability to publish the agent for public access by using the sharing feature. Using only the script of the show, our Seinfeld agent scores what I'd call a B+ on most online Seinfeld trivia. Hardly a robust benchmark, but not bad for such little code.
Limitations
Depending on the size of your text corpus, it can take a while to deploy your agent to the cloud platform on each iteration. As a result, optimizing your FEW_SHOTS and BASE_PROMPT gets quite arduous.
The cloud side of Fixie is still a bit of a black box, so updates to the underlying infrastructure can impact the utility of your agent in unexpected ways. In experimenting with the various OpenAI models available on the platform (especially GPT-3.5 and GPT-4), I’ve encountered a few cases where the agent gets stuck in a loop, conversing with Fixie’s internal router agent and ultimately getting the wrong answers to simple questions. These failure modes seem to come and go as time goes on, so hopefully, stability will improve throughout the developer preview period.
Build your own
Check out Fixie’s examples repo for inspiration, or take the Seinfeld trivia bot for a spin (you’ll need a Fixie account).
You can see the full code on GitHub.