Enterprise GraphRAG - Sohaib Sultan

The Challenge

A legal tech startup found that standard vector-based RAG was failing on "multi-hop" queries: questions that require connecting distant pieces of evidence (e.g., "How does Clause 5 in Contract A affect the liability defined in Addendum B?").

The Solution

I built a GraphRAG system that pairs vector search with graph traversal, so retrieval follows explicit entities and relationships rather than semantic similarity alone:

Knowledge Graph Construction: Used Llama-3 to extract entities (Persons, Clauses, Dates) and relationships (AMENDS, REFERENCES) from legal documents and load them into Neo4j.
Graph Traversal: Implemented Cypher queries to "walk" the graph and retrieve connected context up to 3 hops away.
Hybrid Retrieval: Merged vector search hits with the traversed subgraphs into a single context for the generation model.
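The retrieval path described above can be illustrated with a minimal, self-contained sketch. It is not the production code: plain dictionaries stand in for Neo4j and Qdrant, and the node ids, toy 2-D embeddings, and ranking rule (vector hits first, then graph neighbours by hop distance) are all assumptions made for illustration. `vector_search` picks seed entities, `expand` walks up to `max_hops` hops from them (3, matching the limit above), and `hybrid_retrieve` merges both into one context list.

```python
from collections import deque
from math import sqrt

# Hypothetical in-memory stand-ins: a dict adjacency list plays the role of Neo4j
# and a dict of embeddings plays the role of Qdrant. Against Neo4j, the walk in
# expand() corresponds roughly to a variable-length Cypher pattern such as
#   MATCH (seed)-[*1..3]-(n) WHERE seed.id IN $seed_ids RETURN DISTINCT n
# (the `id` property name is an assumption).
Graph = dict[str, list[tuple[str, str]]]  # node id -> [(relation, neighbour id)]


def add_edge(graph: Graph, src: str, relation: str, dst: str) -> None:
    """Store the edge in both directions so the walk ignores edge direction."""
    graph.setdefault(src, []).append((relation, dst))
    graph.setdefault(dst, []).append((relation, src))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query: list[float], embeddings: dict[str, list[float]], k: int) -> list[str]:
    """Ids of the k nodes most similar to the query vector (brute force)."""
    return sorted(embeddings, key=lambda nid: cosine(query, embeddings[nid]), reverse=True)[:k]


def expand(graph: Graph, seeds: list[str], max_hops: int) -> dict[str, int]:
    """Breadth-first walk from the seeds; maps each reached node to its hop distance."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not walk past the hop limit
        for _relation, nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def hybrid_retrieve(query, embeddings, graph, k=1, max_hops=3) -> list[str]:
    """Vector hits first (in similarity order), then graph neighbours by hop distance."""
    seeds = vector_search(query, embeddings, k)
    dist = expand(graph, seeds, max_hops)
    neighbours = sorted((n for n in dist if n not in seeds), key=lambda n: (dist[n], n))
    return seeds + neighbours


# Toy chain: Clause 5 of Contract A sits 3 hops from the liability clause in Addendum B.
example_graph: Graph = {}
add_edge(example_graph, "A:clause5", "AMENDS", "A:section2")
add_edge(example_graph, "B:addendum", "REFERENCES", "A:section2")
add_edge(example_graph, "B:addendum", "AMENDS", "B:liability")
example_embeddings = {
    "A:clause5": [1.0, 0.0],
    "A:section2": [0.0, 1.0],
    "B:addendum": [0.0, 1.0],
    "B:liability": [0.1, 1.0],
}

if __name__ == "__main__":
    # Prints ['A:clause5', 'A:section2', 'B:addendum', 'B:liability']
    print(hybrid_retrieve([1.0, 0.1], example_embeddings, example_graph, k=1, max_hops=3))
```

In the toy chain, vector search with `k=1` returns only Clause 5 of Contract A; the 3-hop walk adds the section and addendum that connect it to Addendum B's liability clause. Lowering `max_hops` to 2 drops the liability clause from the result, which shows why the hop limit is a real tuning knob.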

Technical Architecture

Database: Neo4j (Graph), Qdrant (Vector)
Framework: LlamaIndex, LangChain
Models: Fine-tuned Llama-3-70B for entity and relationship extraction
Deployment: Docker, AWS Lambda
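The extraction model's output has to be written into Neo4j safely. The following is a minimal sketch, assuming the fine-tuned Llama-3 model is prompted to emit a JSON array of triples; the field names (`subject`, `subject_label`, `relation`, `object`, `object_label`) and the `name` node property are hypothetical. Because Cypher does not accept labels or relationship types as query parameters, the sketch validates them against an allow-list built from the entity and relationship types listed above before formatting them into the query, while entity names always travel as parameters.

```python
import json
from dataclasses import dataclass

# Hypothetical schema mirroring the entity and relationship types named above.
# Labels and relationship types cannot be Cypher parameters, so only values in
# these allow-lists are ever formatted into the query text.
ALLOWED_LABELS = {"Person", "Clause", "Date"}
ALLOWED_RELATIONS = {"AMENDS", "REFERENCES"}
FIELDS = ("subject", "subject_label", "relation", "object", "object_label")


@dataclass(frozen=True)
class Triple:
    subject: str
    subject_label: str
    relation: str
    object: str
    object_label: str


def parse_triples(llm_output: str) -> list[Triple]:
    """Parse a JSON array of triples, dropping malformed or off-schema items."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    triples = []
    for item in items:
        # Require exactly the expected keys, all with string values.
        if not isinstance(item, dict) or set(item) != set(FIELDS):
            continue
        if not all(isinstance(item[f], str) for f in FIELDS):
            continue
        t = Triple(**item)
        if (t.subject_label in ALLOWED_LABELS and t.object_label in ALLOWED_LABELS
                and t.relation in ALLOWED_RELATIONS):
            triples.append(t)
    return triples


def to_cypher(t: Triple) -> tuple[str, dict[str, str]]:
    """Build an idempotent MERGE statement and its parameters for one triple."""
    query = (
        f"MERGE (s:{t.subject_label} {{name: $subject}}) "
        f"MERGE (o:{t.object_label} {{name: $object}}) "
        f"MERGE (s)-[:{t.relation}]->(o)"
    )
    return query, {"subject": t.subject, "object": t.object}


if __name__ == "__main__":
    raw = json.dumps([
        {"subject": "Addendum B", "subject_label": "Clause", "relation": "AMENDS",
         "object": "Clause 5", "object_label": "Clause"},
        {"subject": "Clause 5", "subject_label": "Clause", "relation": "SUPERSEDES",
         "object": "Clause 4", "object_label": "Clause"},  # off-schema: dropped
    ])
    for triple in parse_triples(raw):
        print(to_cypher(triple))
```

Each `(query, params)` pair could then be run with the official Neo4j Python driver, e.g. `session.run(query, params)`. Using `MERGE` rather than `CREATE` keeps re-extraction of the same document from duplicating nodes or relationships.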

The Result

Retrieval accuracy on complex multi-hop queries rose from 45% with the baseline vector RAG to 85%. The system also surfaced contradictions across document sets that the standard vector search had failed to reveal.
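How retrieval accuracy was scored is not specified. One strict definition suited to multi-hop queries, assumed here purely for illustration, counts a query as correct only when every evidence chunk it needs appears in the retrieved context. The function name and the toy chunk ids below are hypothetical.

```python
def multi_hop_accuracy(retrieved: dict[str, list[str]], gold: dict[str, set[str]]) -> float:
    """Fraction of queries whose retrieved chunks include every gold evidence chunk.

    A query with only part of its evidence chain retrieved counts as a miss,
    since a multi-hop answer needs every link in the chain.
    """
    if not gold:
        return 0.0
    hits = sum(1 for qid, needed in gold.items() if needed <= set(retrieved.get(qid, [])))
    return hits / len(gold)


# Illustrative toy data only (not the evaluation behind the reported figures).
gold = {
    "q1": {"contract_a_clause_5", "addendum_b_liability"},
    "q2": {"addendum_b_liability"},
    "q3": {"contract_a_clause_5", "addendum_b_section_2"},
}
retrieved = {
    "q1": ["contract_a_clause_5", "addendum_b_liability"],
    "q2": ["addendum_b_liability", "contract_a_clause_5"],
    "q3": ["contract_a_clause_5"],  # second hop missing, so this query is a miss
}

if __name__ == "__main__":
    print(multi_hop_accuracy(retrieved, gold))  # 2 of 3 queries fully covered
```

Under this all-or-nothing rule, partial credit for a half-retrieved chain is deliberately withheld, which makes the metric sensitive to exactly the failure mode that motivates GraphRAG.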
