Superlinked: The Next Evolution in Retrieval & RAG Systems
Have you ever wondered why your search results or RAG (retrieval-augmented generation) pipeline sometimes feels... flat? That's because traditional vector search often focuses only on unstructured text embeddings, ignoring the structured metadata (like recency, categories, ratings, numbers, or images) that could massively improve relevance. That's where Superlinked comes in.
It's an open-source framework that lets you combine structured + unstructured data into embeddings, and dynamically tune retrieval at query time.
Let's break it all down.
High-Level Idea
Superlinked is all about spaces:
Each field of your data (text, number, category, timestamp, image) is mapped into its own embedding space.
At query time, you compose these spaces together with weights and parameters to get a final, ranked result.
This means you can run multi-signal retrieval that is context-aware, tunable, and doesn't require re-indexing.
The repo puts it simply: "Improve your vector search relevance by encoding metadata together with your unstructured data into vectors."
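To make that idea concrete, here is a minimal, library-free sketch of multi-space fusion: each field contributes its own score, and a weighted sum produces the final ranking. The vectors, ratings, and weights below are invented purely for illustration, not taken from Superlinked.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two toy documents: an unstructured text embedding plus a structured rating field.
docs = [
    {"id": "a", "text_vec": [0.9, 0.1], "rating": 4.8},
    {"id": "b", "text_vec": [0.8, 0.3], "rating": 2.1},
]

def score(doc, query_vec, weights):
    # Each field acts as its own "space"; the final score is a weighted sum.
    text_score = cosine(doc["text_vec"], query_vec)
    rating_score = doc["rating"] / 5.0  # normalize the rating into [0, 1]
    return weights["text"] * text_score + weights["rating"] * rating_score

query_vec = [1.0, 0.0]
ranked = sorted(docs,
                key=lambda d: score(d, query_vec, {"text": 1.0, "rating": 0.5}),
                reverse=True)
```

Turning the `rating` weight up or down re-ranks the same documents without touching the embeddings, which is the core trick the framework generalizes.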
Core Building Blocks
From the repo's examples, here are the main components you'll work with:
- Schema: defines your data fields (e.g., id, text, rating, timestamp, category).
- Spaces: encoders for each field:
  - TextSimilaritySpace for text embeddings (sentence-transformers, etc.)
  - NumberSpace for numeric values (ratings, prices, priorities)
  - CategoricalSpace for discrete categories
  - RecencySpace / EventEffectSpace for time-based signals
  - ImageSimilaritySpace for visual embeddings
- Index: groups one or more spaces and decides which fields to store.
- Source & Executor: handle ingestion (InMemorySource and InMemoryExecutor in the examples).
- Query: declarative search builder with .find(), .similar(), .limit(), .select_all().
- Param: query-time knobs you can adjust: weights, queries, limits.
- Optional LLM parsing: extract query params from natural language with .with_natural_query(...).
How It Works (Step by Step)
1. Define a Schema

```python
import superlinked.framework as sl

class Review(sl.Schema):
    id: sl.IdField
    text: sl.String

review = Review()  # schema instance used by the source and query below
```

This declares the structure of your documents.

2. Create Spaces

```python
space = sl.TextSimilaritySpace(text=review.text, model="all-MiniLM-L6-v2")
```

Each field gets its own encoder.

3. Create an Index

```python
index = sl.Index(space)
```

This binds your space(s) together.

4. Ingest Data

```python
source = sl.InMemorySource(review)
app = sl.InMemoryExecutor(sources=[source], indices=[index]).run()
source.put([{"id": "1", "text": "Amazing acting"},
            {"id": "2", "text": "Boring plot"}])
```

Documents are embedded and stored.

5. Build & Run a Query

```python
query = sl.Query(index).find(review).similar(space, sl.Param("search")).select_all()
result = app.query(query, search="excellent performance")
```

Parameters (sl.Param) let you tune search at runtime.
Retrieval = fusion of embeddings across multiple spaces, combined with query-time weights.
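A toy illustration of why this needs no re-indexing: the per-space similarity scores are computed once when documents are embedded, and only the fusion weights change between queries. The scores and weights below are made up for the example.

```python
# Per-space similarity scores for one query, computed once at embedding time.
candidates = {
    "doc1": {"text": 0.92, "recency": 0.20},
    "doc2": {"text": 0.75, "recency": 0.95},
}

def fuse(scores, weights):
    # Weighted sum across spaces; only the weights change between queries.
    return sum(weights[space] * s for space, s in scores.items())

def rank(weights):
    return sorted(candidates, key=lambda doc: fuse(candidates[doc], weights),
                  reverse=True)

# Same stored vectors, two different retrieval strategies.
relevance_first = rank({"text": 1.0, "recency": 0.1})
recency_first = rank({"text": 0.5, "recency": 1.0})
```

Switching from a relevance-heavy to a recency-heavy weighting flips the ranking without re-embedding or re-indexing anything.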
Example Test Cases (from the repo)
The repo shows how to test everything with pytest. Example:
Text-only retrieval: check that semantic matching works.
Text + rating weighting: bias results towards highly rated products.
Limit & metadata: ensure filters and limits apply correctly.
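A sketch of what such pytest cases might look like, using a toy keyword-overlap retriever rather than the repo's actual Superlinked fixtures; the product data and scoring are invented for illustration.

```python
# Toy in-memory retriever, standing in for a Superlinked app under test.
PRODUCTS = [
    {"id": "p1", "text": "wireless mouse", "rating": 3.0},
    {"id": "p2", "text": "wireless mouse pro", "rating": 4.9},
    {"id": "p3", "text": "desk lamp", "rating": 4.5},
]

def retrieve(query, rating_weight=0.0, limit=10):
    def score(p):
        # Crude "semantic" match: count of shared words, plus optional rating bias.
        text_score = len(set(query.split()) & set(p["text"].split()))
        return text_score + rating_weight * p["rating"] / 5.0
    return sorted(PRODUCTS, key=score, reverse=True)[:limit]

def test_text_only_retrieval():
    assert retrieve("desk lamp")[0]["id"] == "p3"

def test_rating_weight_biases_results():
    # With a strong rating weight, the higher-rated "pro" mouse wins the tie.
    assert retrieve("wireless mouse", rating_weight=5.0)[0]["id"] == "p2"

def test_limit_applies():
    assert len(retrieve("wireless mouse", limit=1)) == 1
```

Running `pytest` on a file like this exercises each retrieval behavior in isolation, which mirrors the structure of the repo's tests.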
(Full runnable code examples are provided in the Superlinked repo.)
Why It Matters for RAG & Embedding Systems
Traditional RAG = one embedding per doc.
Superlinked RAG = multiple embeddings per doc (spaces), fused together with weights.
That unlocks new retrieval powers:
Recency bias for fast-changing data (support, news, medical).
Personalization with user-specific weights.
Multi-modal fusion (text + images + numbers).
Hierarchical retrieval (section → paragraph → sentence).
LLM-driven adaptive retrieval (auto-extracted params).
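Recency bias, the first of these, is typically implemented as a time-decay score fused with the other signals. Here is one common variant, exponential half-life decay; the 7-day half-life is an arbitrary choice for the example, not a Superlinked default.

```python
import time

def recency_score(created_at, now=None, half_life_days=7.0):
    """Exponential time decay: a document exactly half_life_days old scores 0.5,
    a brand-new document scores 1.0."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - created_at) / 86400.0)
    return 0.5 ** (age_days / half_life_days)

NOW = 1_700_000_000  # fixed "now" so the example is deterministic
fresh = recency_score(NOW - 1 * 86400, now=NOW)    # 1 day old
stale = recency_score(NOW - 30 * 86400, now=NOW)   # 30 days old
```

The resulting 0-to-1 score slots directly into a weighted fusion alongside text similarity, so "prefer newer" becomes just another tunable weight.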
Use Cases in RAG / Advanced Retrieval
1. Context-Aware Retrieval for Support Bots
Schema: ticket body + category + priority + created_at.
Spaces: text, categorical, number, recency.
Prioritize recent, high-priority, same-category tickets.
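A minimal sketch of that prioritization as a weighted score; the field names, 1-to-5 priority scale, and weights are assumptions for illustration.

```python
def ticket_score(ticket, query_category, recency, w_cat=1.0, w_pri=0.5, w_rec=1.0):
    # recency: a precomputed 0..1 time-decay score for the ticket's created_at.
    same_category = 1.0 if ticket["category"] == query_category else 0.0
    priority = ticket["priority"] / 5.0  # assume priorities run 1 (low) to 5 (urgent)
    return w_cat * same_category + w_pri * priority + w_rec * recency

urgent = ticket_score({"category": "billing", "priority": 5}, "billing", recency=0.9)
old_misc = ticket_score({"category": "shipping", "priority": 1}, "billing", recency=0.2)
```

A text-similarity term would normally be added as a fourth signal; it is left out here to keep the structured part of the fusion visible.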
2. Personalized Shopping Assistant
Schema: description + price + rating + category + stock.
Spaces: text + number(price) + number(rating) + event(stock).
Query: "Best affordable laptops under $500" → retrieval balances affordability, rating, and availability.
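One way to sketch that balance in plain Python; the affordability function, catalog, and equal default weights are invented for the example.

```python
def affordability(price, budget=500.0):
    """1.0 for free, falling linearly to 0.0 at the budget; 0 beyond it."""
    return max(0.0, 1.0 - price / budget)

LAPTOPS = [
    {"id": "l1", "price": 450.0, "rating": 4.6, "in_stock": True},
    {"id": "l2", "price": 300.0, "rating": 3.2, "in_stock": True},
    {"id": "l3", "price": 480.0, "rating": 4.8, "in_stock": False},
]

def score(item, w_price=1.0, w_rating=1.0, w_stock=1.0):
    # Fuse price, rating, and availability signals with per-query weights.
    return (w_price * affordability(item["price"])
            + w_rating * item["rating"] / 5.0
            + w_stock * (1.0 if item["in_stock"] else 0.0))

best = max(LAPTOPS, key=score)
```

With these weights the much cheaper, in-stock laptop narrowly beats the better-rated one; raising `w_rating` at query time would flip that without re-indexing.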
3. Nested / Hierarchical Retrieval
Schema: doc with sections → paragraphs → sentences.
Spaces: section titles (broad), sentence text (fine-grained), recency.
Retrieve the relevant snippet inside the relevant section. Perfect for long-doc RAG.
4. Adaptive Query Understanding
Feature: .with_natural_query(...).
Example: "Show me the cheapest recent smartphones like iPhone 14".
LLM parses into:
- description_query = "iPhone 14"
- price_weight = 10
- recency_weight = 5
Retrieval adapts dynamically.
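A toy stand-in for that parsing step: in the real framework .with_natural_query(...) hands this job to an LLM, while this sketch keys off keywords just to show the parameter shape. The parameter names mirror the example above; the keyword rules are invented.

```python
def parse_query(natural_query):
    # Toy parser: a real pipeline would have an LLM produce these parameters.
    params = {
        "description_query": natural_query,
        "price_weight": 1.0,
        "recency_weight": 1.0,
    }
    if "cheapest" in natural_query.lower():
        params["price_weight"] = 10.0   # user cares a lot about price
    if "recent" in natural_query.lower():
        params["recency_weight"] = 5.0  # user cares about freshness
    return params

params = parse_query("Show me the cheapest recent smartphones like iPhone 14")
```

The extracted dict plugs straight into query-time weights, so each natural-language query effectively configures its own retrieval strategy.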
5. Multi-Modal Retrieval
Schema: manuals + product_image + numeric_specs.
Spaces: text + image + number.
User uploads a photo and asks: "Find manuals for this device under $200".
Fusion across modalities → the right match.
Key Advantages
No re-indexing needed when you change retrieval strategy.
Query-time personalization & parameter tuning.
Multi-modal fusion (text + image + numbers).
Structured + unstructured fusion in one framework.
Seamless LLM integration for natural query parsing.
Final Takeaway
Superlinked isn't just another vector search tool;
it's a composable retrieval framework for the next generation of RAG systems.
Instead of forcing one-size-fits-all embeddings, you get:
multi-space embeddings (per field)
query-time parameter control (weights, filters, recency)
fusion across modalities (text, numbers, images, time)
In other words: RAG that's smarter, faster, and way more context-aware.
Conclusion
This means Superlinked becomes your retriever layer in a RAG pipeline, replacing "vanilla vector search" with a composable, multi-signal retriever.
So, in short: Superlinked enables structured + unstructured fusion at retrieval time, which is a huge leap over vanilla RAG. You can think of it as "parametric retrieval over multiple embedding spaces", which allows things like nested retrieval, personalization, multimodal RAG, and adaptive context selection.
Superlinked gives you the ability to treat structured + unstructured fields as first-class signals and fuse them at retrieval time.
This makes it a game-changer for:
RAG pipelines
Recommendation systems
Multi-modal assistants
Dynamic search engines
If you care about relevance in AI-powered retrieval, this is technology you want to watch closely.