Langchain Is Great… If You Don’t Care About Latency


Okay, let me clarify my position: Is Langchain a good tool? Yes. Are there scenarios where it will be useful? Absolutely.

Is it suitable for production where low latency is crucial? Maybe not.

Background

Recently at work, I had the chance to use Langchain outside of a personal project (those projects you create to show recruiters that you have skills, hoping they'll hire you).

The task was to create a tool that takes natural language as input and searches a NoSQL database to provide the result.

Example input: "Provide me the email address of the applicant John Doe who applied for the Data Scientist position yesterday."

In standard API endpoints, the response typically includes either all details about John Doe or a predefined subset of information. Alternatively, a GraphQL-based approach allows you to specify the exact data you need in the request itself (e.g., requesting only the email in the example above).

Caveats

At first glance, this problem seems straightforward when approached through the lens of a relational database. One possible solution is to have a large language model (LLM) generate an SQL query to filter data based on specific parameters. Although this approach has limitations—such as limited control over tables, rows, and query types—it remains a feasible option.
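
To make that relational baseline concrete, here is a minimal sketch of the idea, assuming the OpenAI Python SDK. The schema, prompt, and model name are illustrative placeholders, not details from our actual system.

```python
# Minimal sketch: let an LLM translate a natural-language request into SQL.
# The schema, prompt, and model are assumptions made up for this example.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SCHEMA = "applicants(id, name, email, position, applied_at)"  # hypothetical table

def nl_to_sql(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": f"Translate the user's request into one SQL query "
                        f"against this schema:\n{SCHEMA}\nReturn only SQL."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()

sql = nl_to_sql("Give me the email of applicant John Doe who applied "
                "for the Data Scientist position yesterday")
# e.g. SELECT email FROM applicants WHERE name = 'John Doe' AND ...
```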

However, NoSQL databases present a different set of challenges, particularly in scenarios where we need to provide high flexibility for customers. In NoSQL environments, schemas are often inconsistent, and there isn’t a standardized query language like SQL to achieve similar filtering and data retrieval results. This lack of uniformity makes designing a flexible, user-friendly querying system more complex.
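
As a toy illustration of that inconsistency, here are two documents that could easily live in the same NoSQL collection; the field names are invented for the example.

```python
# Two documents from the same hypothetical collection, shaped differently.
doc_a = {
    "name": "John Doe",
    "email": "john@example.com",
    "position": "Data Scientist",
}
doc_b = {
    "applicant": {"first": "Jane", "last": "Roe"},
    "contact": {"mail": "jane@example.com"},
    "role": "ML Engineer",
}
# A single hard-coded filter such as {"position": "Data Scientist"} matches
# doc_a but silently misses doc_b, so a fixed query template is not enough.
```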

Implementation

I cannot go into too much depth for obvious reasons, but I will try my best to give an overview of the approach and then highlight the issues.

We use an LLM to generate an intermediary plan that we use to filter information from the database. A second LLM call then decides which parameters were requested, and only the results with those parameters are returned.

The second LLM call is quite straightforward with very little logic involved.

On the other hand, processing the first LLM's output is quite complicated. Currently, we support all kinds of complex queries: nested AND/OR conditions, searches across multiple tables, almost all operators (==, !=, <, >, <=, >=, in, not in), order by, limit, etc.
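
I can't share the real format, but to make the idea concrete, here is a hedged sketch of what such an intermediary plan and its evaluation could look like. The JSON shape, keys, and evaluator below are my own assumptions, not our production code.

```python
# Sketch of a possible intermediary-plan format and a tiny evaluator for it.
# Everything here (plan shape, keys, operator spellings) is hypothetical.
import operator

OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
    "in": lambda a, b: a in b, "not in": lambda a, b: a not in b,
}

def matches(doc: dict, node: dict) -> bool:
    """Recursively evaluate a nested AND/OR filter tree against one document."""
    if "and" in node:
        return all(matches(doc, child) for child in node["and"])
    if "or" in node:
        return any(matches(doc, child) for child in node["or"])
    return OPS[node["op"]](doc.get(node["field"]), node["value"])

def run_plan(docs: list, plan: dict) -> list:
    """Apply filter, order_by, and limit from a plan to in-memory documents."""
    hits = [d for d in docs if matches(d, plan["filter"])]
    if "order_by" in plan:
        hits.sort(key=lambda d: d.get(plan["order_by"]))
    return hits[: plan.get("limit", len(hits))]

# A plan the first LLM might emit for the John Doe example:
plan = {
    "filter": {"and": [
        {"field": "name", "op": "==", "value": "John Doe"},
        {"field": "position", "op": "==", "value": "Data Scientist"},
    ]},
    "limit": 1,
}
```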

As most of our services depend on results from the database, this step is the bottleneck of the system and hence needs to be optimized for latency.

The Issue

Our initial solution with Langchain (for handling the LLM calls and parsing, using LCEL) took over 30 seconds on average to process a request, including the LLM calls.
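
For context, an LCEL pipeline of this kind is typically composed with the `|` operator, roughly like the sketch below. This is not our production chain; the prompt and model are placeholders.

```python
# Rough shape of an LCEL chain: prompt | model | parser. Illustrative only.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "Turn the user's request into a JSON filter plan."),
    ("user", "{question}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

plan_text = chain.invoke({"question": "Email of applicant John Doe ..."})
```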

We applied different optimizations in our code (pruning strategies, caching, and language-level optimizations) to bring that down to 15 seconds. Then we benchmarked the Langchain code specifically, and it didn't take us long to realize that this package is not made for production-grade software where low latency is a must.

Just by removing Langchain from our code and keeping only the code that actually accomplishes the task, we achieved roughly a 20% reduction in execution time.
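
In spirit, the replacement was as simple as calling the provider's SDK directly with the same prompt and skipping the wrapper layers. Again, a sketch under assumptions, not our actual code:

```python
# The same call without the framework: one request straight to the SDK.
# Model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def plan_from_question(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Turn the user's request into a JSON filter plan."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```

Wrapping both paths in `time.perf_counter()` is enough to see the framework overhead sitting on top of the raw API latency.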

We mostly blame this on the crazy amount of abstraction and checks inside Langchain. Yes, it's useful for early adopters, quick POCs, and chat systems where you can afford the added latency.

Just for reference, look at how many layers of abstraction there are just to call a native LLM API such as OpenAI or Vertex AI (5-6 layers of inheritance).

Even small tasks like chunking a sentence go through multilevel inheritance with various checks. Don't get me wrong, it is flexible, but for many applications it is not suitable. We achieved the same results after removing Langchain entirely from the project. And this is not the story of just one of our tools; other tools of ours face a similar issue with Langchain.

Conclusion

Use Langchain with caution. It's an evolving package with updates every day, so be mindful of what you are trading off: ease of use, development speed, latency, etc.

I would love to hear your opinions, and in the future, I plan to share more articles about AI and the real challenges we face in building our infrastructure with multi-agent workflows. No nonsense—just one developer talking to others.

Note: ChatGPT was used to generate the cover image.