Hybrid Rule + LLM Recommendation Engines, Explained

There’s a tempting but wrong idea in the market: that you can hand a large language model your catalogue, let it chat with shoppers, and call it a recommendation engine. It will produce fluent answers — and occasionally recommend a product you don’t sell, ignore a compatibility rule, or quietly suggest something out of stock. The engines that actually perform in retail are hybrid: deterministic business rules for the things that must be true, and an LLM for the things that need language and judgement. This article explains how that architecture fits together and why.

The two failure modes you’re avoiding

Understanding why hybrid wins starts with the weaknesses of each approach on its own.

Pure rules are predictable and safe but brittle. They can’t interpret “something cosy for a draughty old cottage”, and every new nuance means a developer editing a decision tree. They don’t scale with catalogue complexity.
Pure LLM is flexible and great with language but non-deterministic. It can hallucinate products, miss hard constraints, and give different answers to the same question. For anything involving money, compatibility or stock, that’s unacceptable.

Hybrid architecture assigns each technology the job it’s good at: rules guarantee correctness; the LLM provides understanding and explanation. If you’re new to the broader concept, our introduction to AI guided selling sets the scene.

The division of labour

A clean mental model: the LLM understands and explains, the rules constrain and rank, and a retrieval layer grounds everything in your real catalogue.

What the rules layer owns

These are non-negotiable, deterministic checks that should never be left to a probabilistic model:

Hard constraints — budget ceilings, size, compatibility, regional availability.
Stock and lifecycle — never recommend an out-of-stock, discontinued or unpublished SKU.
Commercial logic — margin floors, promoted ranges, “do not pair” exclusions.
Compliance — age-restricted items, regulated categories.

If a rule says no, the product is out. Full stop. This is what makes the engine trustworthy enough to put in front of customers.
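As a minimal sketch of that "if a rule says no, the product is out" behaviour, the rules layer can be a set of named predicates that every candidate must pass. The `Product` and `ShopperContext` fields and the rule names below are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Product:  # hypothetical catalogue record
    sku: str
    price: float
    in_stock: bool
    published: bool
    regions: frozenset[str]
    age_restricted: bool = False


@dataclass(frozen=True)
class ShopperContext:  # hypothetical per-session facts
    budget: float
    region: str
    age_verified: bool = False


# A rule returns True when the product is allowed for this shopper.
Rule = Callable[[Product, ShopperContext], bool]

HARD_RULES: dict[str, Rule] = {
    "within_budget": lambda p, c: p.price <= c.budget,
    "purchasable": lambda p, c: p.in_stock and p.published,
    "available_in_region": lambda p, c: c.region in p.regions,
    "age_compliant": lambda p, c: c.age_verified or not p.age_restricted,
}


def violated_rules(product: Product, ctx: ShopperContext) -> list[str]:
    """Names of every hard rule the product fails (empty list means allowed)."""
    return [name for name, rule in HARD_RULES.items() if not rule(product, ctx)]


def apply_hard_rules(products: list[Product], ctx: ShopperContext) -> list[Product]:
    """Keep only products that pass every hard rule; there is no partial credit."""
    return [p for p in products if not violated_rules(p, ctx)]
```

Returning the names of the failed rules, rather than a bare yes or no, makes it easier to log why a product was excluded.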

What the LLM layer owns

Language understanding — turning “quiet machine for a small flat” into structured requirements.
Disambiguation — asking a sensible follow-up when an answer is unclear.
Explanation — generating the plain-language reason a product fits, which is a major driver of conversion.
Soft preference handling — interpreting fuzzy, subjective inputs that no rule could enumerate.
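One way to picture "structured requirements" is a small object that the model's JSON output is validated into before anything downstream uses it. The sketch below assumes the model has been asked to reply in JSON; the field names and allowed values are hypothetical, and no real model is called.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical vocabularies; a real engine would derive these from catalogue attributes.
ALLOWED_NOISE = {"quiet", "any"}
ALLOWED_SIZE = {"compact", "standard", "any"}
DEFAULT_QUESTION = "Could you tell me a bit more about what you need?"


@dataclass
class Requirements:
    noise: str = "any"
    size: str = "any"
    max_price: Optional[float] = None
    soft_preferences: list[str] = field(default_factory=list)
    clarifying_question: Optional[str] = None


def _pick(value, allowed: set[str], default: str) -> str:
    """Accept a value only if it is a string from the allowed vocabulary."""
    return value if isinstance(value, str) and value in allowed else default


def parse_requirements(llm_json: str) -> Requirements:
    """Validate the model's JSON rather than trusting it; unknown values fall back to defaults."""
    try:
        raw = json.loads(llm_json)
    except json.JSONDecodeError:
        return Requirements(clarifying_question=DEFAULT_QUESTION)
    if not isinstance(raw, dict):
        return Requirements(clarifying_question=DEFAULT_QUESTION)

    req = Requirements(
        noise=_pick(raw.get("noise"), ALLOWED_NOISE, "any"),
        size=_pick(raw.get("size"), ALLOWED_SIZE, "any"),
    )
    price = raw.get("max_price")
    if isinstance(price, (int, float)) and not isinstance(price, bool) and price > 0:
        req.max_price = float(price)
    prefs = raw.get("soft_preferences", [])
    if isinstance(prefs, list):
        req.soft_preferences = [p for p in prefs if isinstance(p, str)]
    question = raw.get("clarifying_question")
    if isinstance(question, str) and question.strip():
        req.clarifying_question = question.strip()
    return req
```

Under these assumptions, "quiet machine for a small flat" might come back from the model as `{"noise": "quiet", "size": "compact"}`, and anything outside the vocabulary is dropped rather than passed on.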

How retrieval keeps the LLM honest

The component that prevents hallucination is grounding: the LLM never invents products, it only ever selects from a candidate set fetched from your real catalogue.

A common pattern:

1. The LLM converts the shopper’s answers into a structured query (attributes, constraints, preferences).
2. A retrieval step, often vector or semantic search, pulls a candidate set of real, in-stock products that match. Our primer on vector search explains how that works under the hood.
3. The rules layer filters the candidates against hard constraints.
4. A ranking step scores the survivors against the soft preferences.
5. The LLM explains the top few in natural language, but only products that survived steps 2–4 can appear.

Because the LLM chooses from a vetted list rather than generating freely, a product you don’t sell has no route into the response, provided the output is checked against that list. This is the single most important design decision in a production engine.
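The pattern above can be sketched as a single function in which retrieval, filtering, ranking and explanation are injected as callables. Everything here is an illustrative assumption: the dictionary-shaped `Product`, the callable signatures and the idea that the explainer returns a mapping from SKU to reason. The point is only that the returned list is built from the vetted set, never from whatever the model mentions.

```python
from typing import Callable, Iterable

Product = dict  # hypothetical shape: {"sku": str, "price": float, ...}


def recommend(
    query: dict,
    retrieve: Callable[[dict], Iterable[Product]],
    passes_hard_rules: Callable[[Product, dict], bool],
    score: Callable[[Product, dict], float],
    explain: Callable[[list[Product], dict], dict[str, str]],
    top_n: int = 3,
) -> list[tuple[Product, str]]:
    """Return (product, reason) pairs drawn only from retrieved, filtered, ranked candidates."""
    # Retrieval: candidates come from the real catalogue.
    candidates = list(retrieve(query))
    # Rules: drop anything that violates a hard constraint.
    allowed = [p for p in candidates if passes_hard_rules(p, query)]
    # Ranking: order survivors by soft-preference score and keep the top few.
    ranked = sorted(allowed, key=lambda p: score(p, query), reverse=True)[:top_n]
    vetted = {p["sku"]: p for p in ranked}
    # Explanation: the model (or a stub) supplies reasons keyed by SKU.
    reasons = explain(ranked, query)
    # Grounding: iterate over the vetted SKUs, so any extra SKU the
    # explainer mentions is ignored rather than shown to the shopper.
    return [(vetted[sku], reasons.get(sku, "")) for sku in vetted]
```

A quick usage example with stub callables, including an explainer that mentions a SKU that was never retrieved:

```python
catalogue = [
    {"sku": "A", "price": 50, "fit": 0.9},
    {"sku": "B", "price": 500, "fit": 1.0},
    {"sku": "C", "price": 40, "fit": 0.5},
]
results = recommend(
    {"max_price": 100},
    retrieve=lambda q: catalogue,
    passes_hard_rules=lambda p, q: p["price"] <= q["max_price"],
    score=lambda p, q: p["fit"],
    explain=lambda ps, q: {"A": "Fits the budget.", "GHOST": "Invented."},
)
# "B" is over budget and "GHOST" was never a candidate, so neither appears.
```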

A reference architecture

Putting it together, a request flows like this:

Input — structured answers plus optional free text from the guided flow or quiz.
Interpretation (LLM) — normalise into a requirements object; ask a clarifying question if confidence is low.
Retrieval (semantic/vector + keyword) — fetch candidate products from the live catalogue.
Constraint filtering (rules) — drop anything violating a hard rule.
Ranking (rules + scoring, optionally ML) — order by fit, with commercial weights applied.
Explanation (LLM) — generate reasons for the top results, grounded in the actual product attributes.
Guardrails (rules) — final validation that every returned item is real, in stock and compliant.

This mirrors the patterns we use across AI search and recommendations projects — the same grounding-and-guardrails discipline applies whether the entry point is a search box or a guided flow.
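To make the ranking stage concrete, here is a minimal sketch of "order by fit, with commercial weights applied". The weights, the `Candidate` fields and the simple overlap-based fit score are assumptions chosen for illustration; a production engine might use learned scores instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:  # hypothetical: a product that already passed constraint filtering
    sku: str
    attributes: frozenset[str]
    margin: float  # normalised to the range 0..1
    promoted: bool = False


# Illustrative weights: fit dominates, commercial signals only nudge the order.
FIT_WEIGHT = 0.8
MARGIN_WEIGHT = 0.15
PROMO_WEIGHT = 0.05


def fit_score(candidate: Candidate, preferences: set[str]) -> float:
    """Fraction of the shopper's soft preferences the product satisfies."""
    if not preferences:
        return 0.0
    return len(candidate.attributes & preferences) / len(preferences)


def total_score(candidate: Candidate, preferences: set[str]) -> float:
    """Weighted sum of fit, margin and a flat boost for promoted products."""
    promo = 1.0 if candidate.promoted else 0.0
    return (
        FIT_WEIGHT * fit_score(candidate, preferences)
        + MARGIN_WEIGHT * candidate.margin
        + PROMO_WEIGHT * promo
    )


def rank(candidates: list[Candidate], preferences: set[str]) -> list[Candidate]:
    """Highest total score first; ties are broken by SKU so ordering is deterministic."""
    return sorted(candidates, key=lambda c: (-total_score(c, preferences), c.sku))
```

Keeping the commercial weights small relative to fit is a design choice in this sketch: it lets margin and promotions break near-ties without pushing a poor match to the top.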

Guardrails are not optional

Even with grounding, you need explicit safety nets:

Output validation — confirm every SKU in the response exists, is purchasable and matches the stated constraints before it reaches the shopper.
Fallbacks — if the LLM is unavailable or low-confidence, degrade gracefully to a rules-only recommendation rather than failing.
Tone and claim controls — prevent the explanation from inventing specifications or making claims your product copy doesn’t support.
Logging — store inputs, candidates and outputs so you can audit and improve. This also feeds your data insights and helps debug poor recommendations.
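Output validation and fallback can be combined in a thin wrapper around the model path. The sketch below assumes a catalogue lookup keyed by SKU with hypothetical `price` and `in_stock` fields, and treats both recommenders as plain callables returning SKU lists; the logger name is also an assumption.

```python
import logging
from typing import Callable

logger = logging.getLogger("recommender")  # hypothetical logger name


def validate_output(skus: list[str], catalogue: dict[str, dict], max_price: float) -> list[str]:
    """Keep only SKUs that exist, are in stock and respect the stated budget."""
    valid = []
    for sku in skus:
        product = catalogue.get(sku)
        if product is None:
            logger.warning("dropped unknown SKU %s", sku)
            continue
        if not product["in_stock"] or product["price"] > max_price:
            logger.warning("dropped SKU %s: fails stock or budget check", sku)
            continue
        valid.append(sku)
    return valid


def recommend_with_fallback(
    llm_recommend: Callable[[], list[str]],
    rules_only_recommend: Callable[[], list[str]],
    catalogue: dict[str, dict],
    max_price: float,
) -> list[str]:
    """Try the LLM path; degrade to rules-only if it errors or nothing survives validation."""
    try:
        skus = llm_recommend()
    except Exception:
        logger.exception("LLM path failed; using rules-only fallback")
        skus = []
    valid = validate_output(skus, catalogue, max_price)
    if not valid:
        # The fallback output goes through the same validation as the LLM output.
        valid = validate_output(rules_only_recommend(), catalogue, max_price)
    return valid
```

The warnings double as a lightweight audit trail: every dropped SKU is recorded with the reason it was rejected.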

The data foundation

None of this works without product data the engine can reason over. Before architecture, check that:

Attributes used for constraints and ranking are populated and consistent across the catalogue.
Compatibility and exclusion relationships are modelled explicitly, not buried in descriptions.
Stock status is live, not a nightly export.

In practice, data quality tends to determine recommendation quality far more than model choice. A modest model on clean data will usually beat a frontier model on messy data.
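A quick way to test the first item on that checklist is to measure how often each attribute the engine depends on is actually populated. This is a rough sketch over dictionary-shaped products; the attribute names in the usage example are hypothetical.

```python
def attribute_coverage(products: list[dict], required: list[str]) -> dict[str, float]:
    """Share of products with a non-empty value for each required attribute."""
    if not products:
        return {attr: 0.0 for attr in required}
    coverage = {}
    for attr in required:
        # Missing keys, None, empty strings and empty lists all count as unpopulated.
        filled = sum(1 for p in products if p.get(attr) not in (None, "", []))
        coverage[attr] = filled / len(products)
    return coverage


def attributes_below(coverage: dict[str, float], threshold: float) -> list[str]:
    """Attributes too sparsely populated to be safe for constraints or ranking."""
    return sorted(attr for attr, share in coverage.items() if share < threshold)
```

For example, with four products of which two carry a `noise_db` value and three carry `width_cm`, coverage comes out at 0.5 and 0.75, and a threshold of 0.7 flags `noise_db` as not yet reliable enough to filter on.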

Cost, latency and when to call the LLM

LLM calls add latency and cost, so use them where they earn it:

Cache interpretations of common answer patterns.
Consider splitting the work between models: a smaller, faster one for whichever of interpretation or explanation is simpler in your catalogue, and a stronger one only for the step that needs it.
Don’t call the LLM at all for shoppers who pick standard options — a deterministic path is cheaper and faster, with the LLM reserved for free text and ambiguity.

This keeps response times in the range shoppers tolerate while controlling per-session cost.

Common pitfalls

Letting the LLM pick from the whole catalogue. Always retrieve and constrain first.
No fallback when the model errors or times out.
Skipping output validation and trusting the model’s SKUs.
Over-engineering — many catalogues need only light LLM involvement; don’t add it for novelty.

Conclusion

The best guided-selling engines aren’t “an LLM with a catalogue”. They’re a pipeline where rules guarantee what must be true, retrieval grounds the model in real products, and the LLM supplies the language understanding and explanations that make recommendations feel genuinely helpful. Get the division of labour right and you get the flexibility of AI with the reliability of deterministic systems.

If you’re weighing build-versus-buy or architecting a recommendation engine, book a free consultation and we’ll talk through your data, catalogue and the right level of AI for your case.
