Monday, October 13, 2025

How RAG Works in GenAI

Referred Link - https://www.linkedin.com/posts/the-gen-academy_genacademy-genai-rag-activity-7374314296928899072-m3Z5

 


๐—ง๐—ต๐—ถ๐—ป๐—ธ ๐—ผ๐—ณ ๐—ฅ๐—”๐—š ๐—ฎ๐˜€ ๐—ด๐—ถ๐˜ƒ๐—ถ๐—ป๐—ด ๐˜†๐—ผ๐˜‚๐—ฟ ๐—”๐—œ ๐—ฝ๐—ฒ๐—ฟ๐—บ๐—ถ๐˜€๐˜€๐—ถ๐—ผ๐—ป ๐˜๐—ผ “๐—ผ๐—ฝ๐—ฒ๐—ป ๐—ฎ ๐—ฏ๐—ผ๐—ผ๐—ธ” ๐—ฏ๐—ฒ๐—ณ๐—ผ๐—ฟ๐—ฒ ๐—ถ๐˜ ๐—ฎ๐—ป๐˜€๐˜„๐—ฒ๐—ฟ๐˜€.

If you’ve bumped into Retrieval-Augmented Generation (RAG) and wondered what it really is (and when you actually need it), this mini-primer is for you.

๐—ช๐—ต๐—ฎ๐˜ ๐—ฅ๐—”๐—š ๐—ถ๐˜€ — ๐—ถ๐—ป ๐—ผ๐—ป๐—ฒ ๐—ฏ๐—ฟ๐—ฒ๐—ฎ๐˜๐—ต

RAG pairs a language model with an external knowledge source so answers are grounded in real, up-to-date information instead of just whatever the model remembers from training. That means fewer made-up facts and more verifiable responses.

๐—ช๐—ต๐—ฒ๐—ป ๐˜†๐—ผ๐˜‚ ๐˜€๐—ต๐—ผ๐˜‚๐—น๐—ฑ ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต ๐—ณ๐—ผ๐—ฟ ๐—ฅ๐—”๐—š
✅You want a domain-specific assistant (HR policy bot, clinical FAQ, internal IT helper).
✅You need current info beyond a model’s training cutoff.
✅You care about citations and traceability.

๐—ง๐—ต๐—ฒ ๐—ฝ๐—ถ๐—ฝ๐—ฒ๐—น๐—ถ๐—ป๐—ฒ (๐˜€๐—ถ๐—บ๐—ฝ๐—น๐—ฒ ๐˜ƒ๐—ฒ๐—ฟ๐˜€๐—ถ๐—ผ๐—ป)
✅๐—œ๐—ป๐—ฑ๐—ฒ๐˜…๐—ถ๐—ป๐—ด – Gather your sources (PDFs, sites, databases). Split long docs into smaller, meaningful “chunks,” turn each chunk into an embedding (a numeric vector), and store them in a vector database for fast similarity search.

✅๐—ฅ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฒ๐˜ƒ๐—ฎ๐—น – Convert the user’s question into an embedding and fetch the closest chunks from the vector store.

✅ Generation – Feed the question + retrieved chunks to the LLM to produce a grounded answer (and optionally add citations).

Why chunk? Models don’t magically use long context well; narrowing to the most relevant bits improves precision and keeps prompts lean.
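To make the three steps concrete, here is a minimal sketch in plain Python. Everything in it is a stand-in: a bag-of-words counter plays the role of an embedding model, a plain list plays the vector database, and the final step just builds the prompt you would hand to an LLM.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a
    # sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1) Indexing: chunk documents and store (embedding, chunk) pairs.
docs = [
    "Employees accrue 20 vacation days per year. Unused days roll over.",
    "The VPN requires multi-factor authentication for all remote logins.",
]

def chunk(doc: str, size: int = 8) -> list[str]:
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

index = [(embed(c), c) for d in docs for c in chunk(d)]  # the "vector DB"

# 2) Retrieval: embed the question, fetch the closest chunks.
def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# 3) Generation: hand question + chunks to the LLM via a prompt.
def build_prompt(question: str) -> str:
    context = "\n".join(f"- {c}" for c in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How many vacation days do employees get?"))
```

Swapping in a real embedding model and vector store changes only `embed` and `index`; the shape of the pipeline stays the same.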

Helpful add-ons (use as needed)

✅ Query translation (HyDE, multi-query): Rewrite or expand the question so retrieval finds better matches. HyDE, for instance, has the model draft a hypothetical answer, embed it, and search with that to boost recall.
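A rough illustration of the HyDE idea, with the LLM call stubbed out (the canned `draft_answer` below is an assumption, not a real model): the question itself shares no words with the corpus, but its hypothetical answer does, so searching with the draft’s embedding recovers the right chunk.

```python
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def overlap(a: Counter, b: Counter) -> int:
    # Unnormalized token overlap; a real system would use cosine similarity.
    return sum(a[t] * b[t] for t in a)

index = [(embed(c), c) for c in [
    "employees accrue 20 vacation days per year",
    "the vpn requires multi-factor authentication",
]]

def draft_answer(question: str) -> str:
    # Assumption: a real HyDE step would prompt an LLM with something like
    # "Write a short passage that answers: {question}". Stubbed here.
    return "Employees accrue some number of vacation days every year"

def hyde_retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(draft_answer(question))  # embed the draft, not the question
    ranked = sorted(index, key=lambda p: overlap(q, p[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# "PTO?" matches nothing directly, but its hypothetical answer does.
print(hyde_retrieve("PTO?"))
```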

✅๐—ฅ๐—ผ๐˜‚๐˜๐—ถ๐—ป๐—ด & ๐—ฐ๐—ผ๐—ป๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜๐—ถ๐—ผ๐—ป: If you have multiple stores (policies, product docs, web search), route the query to the best source and add filters (e.g., “last 90 days”).

๐—•๐˜‚๐—ถ๐—น๐—ฑ๐—ถ๐—ป๐—ด ๐—ฏ๐—น๐—ผ๐—ฐ๐—ธ๐˜€ (๐˜„๐—ถ๐˜๐—ต๐—ผ๐˜‚๐˜ ๐˜๐—ต๐—ฒ ๐—ต๐—ฒ๐—ฎ๐—ฑ๐—ฎ๐—ฐ๐—ต๐—ฒ)
✅ ๐—Ÿ๐—ฎ๐—ป๐—ด๐—–๐—ต๐—ฎ๐—ถ๐—ป (๐—ฐ๐—ต๐—ฎ๐—ถ๐—ป๐˜€): Wire steps like “translate → retrieve → generate → parse” into a clear sequence you can swap and test.
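This is not the actual LangChain API, just a plain-Python sketch of the chain idea it implements: each stage is a small swappable function, and the pipeline is simple composition, so any stage can be replaced or unit-tested on its own.

```python
# Each stage is a plain function with one job; the "chain" is composition.
def translate(question: str) -> str:
    # Stand-in for query rewriting/expansion.
    return question.lower().rstrip("?")

def retrieve(query: str) -> list[str]:
    # Stand-in for a vector-store lookup.
    corpus = ["rag grounds answers in retrieved chunks"]
    return [c for c in corpus if any(word in c for word in query.split())]

def generate(query: str, chunks: list[str]) -> str:
    # Stand-in for the LLM call.
    return f"Answer to '{query}' based on {len(chunks)} chunk(s)."

def parse(answer: str) -> dict:
    # Final output parser.
    return {"answer": answer}

def chain(question: str) -> dict:
    query = translate(question)            # translate →
    chunks = retrieve(query)               # retrieve →
    return parse(generate(query, chunks))  # generate → parse

print(chain("What is RAG?"))
```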

✅๐—Ÿ๐—ฎ๐—ป๐—ด๐—ฆ๐—บ๐—ถ๐˜๐—ต (๐—ผ๐—ฏ๐˜€๐—ฒ๐—ฟ๐˜ƒ๐—ฎ๐—ฏ๐—ถ๐—น๐—ถ๐˜๐˜†): Trace every run, see timings and inputs/outputs, and debug failures—super handy once you go beyond demos.

๐—ฆ๐˜‚๐—บ๐—บ๐—ฎ๐—ฟ๐˜† ๐˜†๐—ผ๐˜‚ ๐—ฐ๐—ฎ๐—ป ๐˜๐—ฎ๐—ธ๐—ฒ ๐˜๐—ผ ๐˜„๐—ผ๐—ฟ๐—ธ
✅Start simple: good chunking + a solid vector DB + a clear prompt template.
✅Measure what matters (accuracy on real tasks, not vibes).
✅Iterate: logs and traces will tell you where the bottleneck is.
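In the spirit of “measure what matters,” here is a tiny retrieval eval: a handful of (question, expected chunk) pairs scored by hit rate. The dataset and toy scorer are illustrative assumptions; the point is that the check runs against real questions, not vibes.

```python
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # toy embedding

def overlap(a: Counter, b: Counter) -> int:
    return sum(a[t] * b[t] for t in a)    # toy similarity

chunks = [
    "employees accrue 20 vacation days per year",
    "the vpn requires multi-factor authentication",
]
index = [(embed(c), c) for c in chunks]

def top1(question: str) -> str:
    q = embed(question)
    return max(index, key=lambda p: overlap(q, p[0]))[1]

# Each pair: a realistic question and the chunk that should answer it.
eval_set = [
    ("How many vacation days do I get each year", chunks[0]),
    ("Does the vpn need multi-factor auth", chunks[1]),
]
hits = sum(top1(q) == expected for q, expected in eval_set)
print(f"retrieval hit rate: {hits}/{len(eval_set)}")
```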

Tags: genai, rag
