Liquid AI Unveils LFM2.5 Models: Powering Fast, Multilingual Search Across 11 Languages

June 19, 2026

Revolutionizing Multilingual Information Retrieval

In today’s globalized digital landscape, the ability to search and retrieve information across multiple languages quickly and accurately is more critical than ever. To address this need, Liquid AI has introduced two retrieval models: LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M. Each has 350 million parameters and is designed for high-performance multilingual and cross-lingual search across 11 languages, with a footprint small enough for a wide range of deployment environments.

Built on the LFM2.5-350M-Base model released earlier this year, the two models are the first bidirectional members of the LFM family. They are available on Hugging Face under the LFM Open License v1.0, so developers and enterprises can integrate multilingual search into their applications.

Meet the LFM2.5 Retrievers: Embedding vs. ColBERT

While both LFM2.5 models share a common architectural backbone, they employ distinct strategies for text representation, catering to different optimization priorities:

LFM2.5-Embedding-350M (Dense Bi-Encoder): This model is engineered for speed and efficiency. It transforms an entire document into a single dense vector. The primary advantages here are the fastest search times and the smallest, most cost-effective index. If your priority is lightning-fast retrieval with minimal storage overhead, the Embedding model is your go-to choice.
LFM2.5-ColBERT-350M (Late-Interaction Model): When accuracy and nuanced understanding are paramount, the ColBERT model shines. Instead of a single document vector, it converts each token (word or sub-word unit) into its own vector. This allows for a more granular, word-by-word matching process, leading to higher accuracy and better generalization, especially in complex cross-lingual scenarios. The trade-off is a larger index size. Additionally, this model can efficiently rerank results from an initial retriever without needing to build a dedicated index.
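
To make the difference between the two scoring styles concrete, here is a minimal sketch in plain Python. It is an illustration, not Liquid AI's implementation: the token vectors are hand-made, mean pooling is an assumed way of producing the single dense vector, and the late-interaction score uses the MaxSim rule commonly associated with ColBERT-style models.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pool(token_vectors):
    """Collapse per-token vectors into one vector (assumed pooling choice)."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def dense_score(query_tokens, doc_tokens):
    """Bi-encoder style: compare one pooled vector per text."""
    return cosine(mean_pool(query_tokens), mean_pool(doc_tokens))


def late_interaction_score(query_tokens, doc_tokens):
    """MaxSim: each query token takes its best-matching document token."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


# Hand-made 2-D token vectors, purely illustrative.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    dense = round(dense_score(query, doc), 3)
    late = round(late_interaction_score(query, doc), 3)
    print(name, "dense:", dense, "late interaction:", late)
```

The dense path stores one vector per document, while the late-interaction path has to keep every token vector, which is where the larger index comes from. Because the late-interaction score needs only the token vectors of the candidates it is given, the same function can rerank a shortlist from another retriever without a dedicated index.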

Both models are optimized for short-context search, making them well suited to product catalogs, FAQ knowledge bases, and support documentation. Liquid AI positions them as drop-in replacements for the retrieval step in existing Retrieval-Augmented Generation (RAG) pipelines.

Architectural Evolution: From Causal to Bidirectional

A key innovation behind the LFM2.5 retrievers is a significant architectural shift. Starting from the LFM2.5-350M-Base, a general-purpose checkpoint, Liquid AI applied specialized bidirectional patches to the LFM2 architecture. This transformed it from a causal decoder, which processes text sequentially (left-to-right), into a bidirectional encoder.

In a causal setup, each token can only attend to itself and preceding tokens, which is well-suited for text generation.
For retrieval, however, understanding the full context is crucial. The bidirectional design enables every token to attend to both its left and right context, allowing for a more comprehensive representation of the text.
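
The difference between the two attention patterns can be shown with a small mask-building sketch. This covers only the attention visibility rule described above; the function names are illustrative, and the actual patches Liquid AI applied to the LFM2 architecture are not described in this article.

```python
def causal_mask(n):
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every token may attend to every other token, left or right."""
    return [[True] * n for _ in range(n)]


def render(mask):
    """Format a mask as rows of 1s and 0s for quick inspection."""
    return "\n".join(" ".join("1" if ok else "0" for ok in row) for row in mask)


print("causal:")
print(render(causal_mask(4)))
print("bidirectional:")
print(render(bidirectional_mask(4)))
```

In the causal mask the first token sees only itself, so its representation cannot reflect anything that follows it. In the bidirectional mask every row is fully populated, so each token's representation can draw on the whole text.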

This adaptation preserves the efficiency of the LFM2 backbone while producing the full-context representations that retrieval tasks demand. Each model has 17 layers (10 convolution, 6 attention, and 1 pooling or dense layer) and supports a context length of up to 32,768 tokens, though the models are tuned for documents of 512 tokens.
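
The 512-token document length also gives a rough feel for how the two index sizes diverge. The back-of-envelope sketch below assumes uncompressed FP16 storage and uses hypothetical vector dimensions (1,024 for the dense vector, 128 per token for late interaction) and a hypothetical corpus size; none of these figures come from Liquid AI, and production late-interaction indexes typically apply compression that this estimate ignores.

```python
def index_bytes(num_docs, vectors_per_doc, dim, bytes_per_value=2):
    """Raw storage for an uncompressed vector index (FP16 = 2 bytes per value)."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


NUM_DOCS = 100_000      # hypothetical corpus size
DENSE_DIM = 1024        # hypothetical dense embedding width
TOKEN_DIM = 128         # hypothetical per-token width for late interaction
TOKENS_PER_DOC = 512    # upper bound taken from the tuned document length

dense_size = index_bytes(NUM_DOCS, 1, DENSE_DIM)
late_size = index_bytes(NUM_DOCS, TOKENS_PER_DOC, TOKEN_DIM)

print(f"dense index:            {dense_size / 1e6:.1f} MB")
print(f"late-interaction index: {late_size / 1e6:.1f} MB")
print(f"ratio: {late_size / dense_size:.0f}x")
```

Under these assumed numbers the per-token index is dozens of times larger than the dense one, which is the trade-off to weigh against the accuracy gains of late interaction.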

Training Methodology and Data

The development of these high-performing models followed a rigorous three-stage training process:

Stage One: Large-scale contrastive pretraining conducted primarily in English.
Stage Two: Multilingual and cross-lingual distillation, where knowledge was transferred from a powerful teacher model across all 11 target languages.
Stage Three: Final fine-tuning using carefully curated hard-mined negatives to further refine performance.
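
The article does not name the loss functions used in these stages. As a hedged illustration of why hard negatives matter, the sketch below uses InfoNCE, a common contrastive objective that is assumed here rather than confirmed; the similarity scores and the temperature value are made up.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.05):
    """Contrastive loss: -log softmax of the positive among all candidates."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    peak = max(logits)  # subtract the max for numerical stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Same positive similarity, different negatives (illustrative values).
easy_loss = info_nce(0.9, [0.10, 0.00])   # negatives are clearly irrelevant
hard_loss = info_nce(0.9, [0.85, 0.10])   # one negative is nearly as similar

print(f"loss with easy negatives: {easy_loss:.6f}")
print(f"loss with a hard negative: {hard_loss:.6f}")
```

With easy negatives the loss is close to zero, so the model receives almost no training signal. A negative that scores nearly as high as the positive produces a much larger loss, which is the motivation for mining hard negatives in a final fine-tuning stage.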

The Embedding model received slightly more cross-lingual data during training, while ColBERT’s late-interaction setup naturally facilitates cross-lingual retrieval. The training data combined Liquid AI’s internally curated datasets with open-source English retrieval datasets, and LLM-based translation was used to create multilingual and cross-lingual pairs.

Leading the Class in Benchmarks

Liquid AI rigorously evaluated the LFM2.5 models across two critical capabilities: multilingual retrieval using NanoBEIR and cross-lingual open-domain Question Answering (QA) with MKQA-11. Both benchmarks reported results across all 11 supported languages: Arabic, German, English, Spanish, French, Italian, Japanese, Korean, Norwegian, Portuguese, and Swedish.

On average, both LFM2.5 models led their size class, and they outperformed larger competitors such as Qwen3-Embedding-0.6B. ColBERT generally led on both average metrics, with the Embedding model a close second on MKQA-11.

Blazing Fast Latency and Edge Deployment

A significant advantage of the LFM2.5 models is their remarkable efficiency and versatility in deployment. Liquid AI has released GGUF variants for `llama.cpp`, enabling these powerful models to run effectively on CPUs, laptops, and even edge devices.

When document embeddings are pre-computed, median (p50) query latency stays under 10 milliseconds on a MacBook Pro M4 Max at FP16. On enterprise GPU stacks, such as an H100 at FP16, latency can drop to as low as 1 millisecond.

This capability makes advanced multilingual search accessible in environments where resources might be limited, or data privacy necessitates on-device processing.
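
For readers who want to reproduce this kind of measurement on their own hardware, the sketch below shows one way to compute a p50 latency over pre-computed document vectors. It is a simplified stand-in: the vectors are random, the search is a brute-force dot product in pure Python, and it times only the search step, whereas a real measurement would also include encoding the query with the model.

```python
import random
import statistics
import time


def search(query_vec, doc_matrix, top_k=3):
    """Brute-force search: score every pre-computed document vector."""
    scores = [
        (sum(q * d for q, d in zip(query_vec, doc_vec)), index)
        for index, doc_vec in enumerate(doc_matrix)
    ]
    return sorted(scores, reverse=True)[:top_k]


def p50_latency_ms(run_query, queries):
    """Median wall-clock time per query, in milliseconds."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


rng = random.Random(0)  # fixed seed so the toy data is reproducible
DIM = 32                # hypothetical vector width
doc_matrix = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(1000)]
queries = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(50)]

p50 = p50_latency_ms(lambda q: search(q, doc_matrix), queries)
print(f"p50 search latency: {p50:.3f} ms")
```

Reporting the median rather than the mean keeps a few slow outliers, such as a cold cache on the first query, from distorting the headline number.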

Practical Use Cases and Examples

The LFM2.5 retrievers suit a wide range of applications:

E-commerce: Imagine a single product catalog that can be searched in multiple languages. A shopper in Korea can input a query in Korean and instantly receive relevant product listings originally written in English, thanks to seamless cross-lingual retrieval.
FAQ and Support Knowledge Bases: Ensure customers receive accurate answers regardless of the language they use. A French support question can reliably map to an English help article.
On-Device Semantic Search: Empower users to search their personal files, emails, and notes directly on their consumer hardware. The GGUF builds facilitate this with near-zero operational cost and enhanced data privacy.
Enterprise Knowledge Assistants: Facilitate the retrieval of critical internal documents (legal, financial, or technical) across language barriers. The ColBERT model’s superior accuracy is particularly beneficial in these high-stakes scenarios, where precision matters more than index size.

These models integrate easily into existing workflows, with the Embedding model supported via `sentence-transformers` and the ColBERT model through `PyLate`.
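
As a rough idea of what the dense path looks like in application code, the sketch below ranks documents with any encoder that exposes an `encode` method returning one vector per input text, which is the shape of the `sentence-transformers` interface. The `ToyEncoder` class is a hypothetical keyword-counting stand-in so the example runs offline; it is not multilingual, and the Hugging Face model ID in the comment is an assumption rather than something stated in this article.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(encoder, query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    vectors = encoder.encode([query] + documents)
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, documents)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


class ToyEncoder:
    """Hypothetical stand-in: keyword counts over a tiny vocabulary."""

    vocabulary = ("password", "shipping", "refund")

    def encode(self, texts):
        return [
            [float(text.lower().count(word)) for word in self.vocabulary]
            for text in texts
        ]


# With the real model, the encoder would be loaded roughly like this
# (model ID is an assumption, check the Hugging Face listing):
#   from sentence_transformers import SentenceTransformer
#   encoder = SentenceTransformer("LiquidAI/LFM2.5-Embedding-350M")
encoder = ToyEncoder()

documents = ["Resetting your password", "Shipping times and costs", "Refund policy"]
ranked = rank_documents(encoder, "How do I change my password?", documents)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

Keeping the ranking logic independent of the encoder makes it straightforward to swap the stand-in for a real model, or to compare two encoders on the same documents.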

Expert Perspective

From an industry angle, the clearest signal around multilingual search models is how this release may influence the rest of the LFM2 family. The story reads less like a one-day spike and more like a marker of broader movement.

The next phase will depend on how quickly teams, regulators, or customers react. In practice, that gives multilingual search models room to reshape expectations for small retrieval models over the near term.

For readers focused on practical impact, the best next step is to watch what changes around retrieval once attention turns into execution.

Key Takeaways

Liquid AI’s LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M are the first bidirectional models in the LFM family, designed for fast, accurate multilingual and cross-lingual search across 11 languages.
On average, both 350M-parameter models lead their class on NanoBEIR and MKQA-11, surpassing larger models such as Qwen3-Embedding-0.6B.
The Embedding model offers the smallest, most cost-effective index and fastest search, ideal for speed-critical applications.
The ColBERT model provides higher accuracy through per-token matching and better generalization, making it suitable for scenarios where retrieval precision is crucial, at the expense of a larger index.
Thanks to GGUF builds, these models can run efficiently on CPUs, laptops, and edge devices; with pre-computed document embeddings, median query latency is under 10 ms on a MacBook Pro M4 Max.
They are designed for easy integration into existing RAG pipelines through sentence-transformers and PyLate, and are released under the LFM Open License v1.0.

Source: https://www.marktechpost.com/2026/06/19/liquid-ai-introduces-lfm2-5-embedding-350m-and-lfm2-5-colbert-350m-dense-bi-encoder-and-late-interaction-models-for-fast-multilingual-search-across-11-languages/