The Challenge
Users needed a single, unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, etc.) without switching platforms. The system had to handle varying API schemas, maintain long-term conversational memory, and efficiently process file uploads for RAG-based queries.
The Solution
I architected a platform that integrates multiple LLMs into a seamless conversational UI:
- Unified Backend: Abstracted the different LLM APIs behind a single, standardized interface (see the provider sketch after this list).
- Contextual Memory: Implemented a vector-store memory system to retrieve relevant past interactions (a retrieval sketch also follows).
- Advanced RAG: Built a Retrieval-Augmented Generation pipeline for file analysis and context-driven answers.
- Autonomous Agents: Deployed dedicated agents for complex tasks such as "Deep Research".
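A minimal sketch of what such a unified interface can look like in TypeScript. The `ChatProvider` interface, class names, and model IDs are illustrative assumptions, not the platform's actual code:

```ts
// Hypothetical provider-agnostic types; names are illustrative.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatProvider {
  // Every vendor's API is normalized into this one call signature.
  complete(messages: ChatMessage[]): Promise<string>;
}

class OpenAIProvider implements ChatProvider {
  constructor(private apiKey: string) {}

  async complete(messages: ChatMessage[]): Promise<string> {
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "gpt-4o", messages }),
    });
    const data = await res.json();
    return data.choices[0].message.content;
  }
}

class AnthropicProvider implements ChatProvider {
  constructor(private apiKey: string) {}

  async complete(messages: ChatMessage[]): Promise<string> {
    // Anthropic keeps the system prompt outside the messages array,
    // one of the schema differences the abstraction hides.
    const system = messages.find((m) => m.role === "system")?.content;
    const res = await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "x-api-key": this.apiKey,
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "claude-3-5-sonnet-20241022",
        max_tokens: 1024,
        system,
        messages: messages.filter((m) => m.role !== "system"),
      }),
    });
    const data = await res.json();
    return data.content[0].text;
  }
}

// The rest of the app picks a provider by name and never sees
// vendor-specific schemas.
const providers: Record<string, ChatProvider> = {
  openai: new OpenAIProvider(process.env.OPENAI_API_KEY!),
  anthropic: new AnthropicProvider(process.env.ANTHROPIC_API_KEY!),
};
```

With this shape, supporting a new vendor means adding one new class; the chat UI and memory layers stay untouched.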
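The contextual-memory lookup can be illustrated the same way. Cosine similarity over precomputed embeddings is a standard scoring choice and an assumption here, not a confirmed detail of the platform:

```ts
// Illustrative vector-memory lookup over precomputed embeddings.
type MemoryRecord = { text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k stored interactions most similar to the query,
// which are then prepended to the prompt as conversational context.
function retrieveRelevant(
  queryEmbedding: number[],
  memory: MemoryRecord[],
  k = 3,
): MemoryRecord[] {
  return memory
    .map((m) => ({ m, score: cosineSimilarity(queryEmbedding, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ m }) => m);
}
```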
Data Storage Architecture
- Chat History (JSON blob via Turso): All user-generated chats are stored as structured JSON blobs in Turso, a distributed edge database optimized for low-latency reads and writes (a write-path sketch follows this list).
  - Format: chat_id, user_id, name, messages[], timestamp
  - Justification: the JSON format allows flexibility in storing metadata such as token usage, model responses, and user edits.
- Users & Sessions: SQL with Clerk:
User authentication, session persistence,
and metadata (like roles and permissions) are managed
via Clerk over a traditional SQL-based database
-
- Schema: Users, Sessions, API keys, OAuth tokens
-
- Security: Built-in 2FA, passwordless auth, session encryption.
- Files & Images: Object Storage via UploadThing: User-uploaded assets (files,
images, media) are stored using UploadThing on object storage
systems (e.g., AWS S3 under the hood).
-
- CDN-enabled for fast access
-
- Metadata (file type, size, access history) linked to SQL
- Supporting Infrastructure: MongoDB and Redis as additional data stores; AWS (EC2, S3) and Docker for hosting and deployment.
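A sketch of the chat-history write path using the @libsql/client SDK for Turso. The table and column names follow the format listed above; the code itself is illustrative:

```ts
import { createClient } from "@libsql/client";

const db = createClient({
  url: process.env.TURSO_DATABASE_URL!,
  authToken: process.env.TURSO_AUTH_TOKEN!,
});

// One row per chat; the messages column holds the JSON blob.
await db.execute(`
  CREATE TABLE IF NOT EXISTS chats (
    chat_id   TEXT PRIMARY KEY,
    user_id   TEXT NOT NULL,
    name      TEXT,
    messages  TEXT NOT NULL, -- JSON: [{role, content, tokens, ...}]
    timestamp INTEGER NOT NULL
  )
`);

// Upsert keeps the latest version of an edited conversation.
async function saveChat(
  chatId: string,
  userId: string,
  name: string,
  messages: object[],
) {
  await db.execute({
    sql: `INSERT OR REPLACE INTO chats (chat_id, user_id, name, messages, timestamp)
          VALUES (?, ?, ?, ?, ?)`,
    args: [chatId, userId, name, JSON.stringify(messages), Date.now()],
  });
}
```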
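For the auth layer, a minimal sketch of gating an API route with Clerk, assuming a Next.js App Router setup; the route and response body are illustrative:

```ts
import { auth } from "@clerk/nextjs/server";

// Clerk resolves the session from the request's cookies/headers;
// in recent SDK versions auth() is async.
export async function GET() {
  const { userId } = await auth();
  if (!userId) {
    return new Response("Unauthorized", { status: 401 });
  }
  // userId keys into the Users/Sessions tables described above.
  return Response.json({ userId });
}
```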
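And a sketch of the upload path as an UploadThing file router; the route name and size limits are assumptions:

```ts
import { createUploadthing, type FileRouter } from "uploadthing/next";

const f = createUploadthing();

export const fileRouter = {
  // Hypothetical route accepting the file types used for RAG queries.
  chatAttachment: f({
    pdf: { maxFileSize: "16MB" },
    image: { maxFileSize: "8MB" },
  }).onUploadComplete(async ({ file }) => {
    // Persist metadata (name, size, URL) to SQL, as described above;
    // the asset itself lands on S3-backed object storage behind a CDN.
    console.log("stored", file.name, "at", file.url);
  }),
} satisfies FileRouter;
```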
Challenges
- Integrating multiple LLMs with varying APIs and response formats.
- Optimizing real-time communication for low latency (see the streaming sketch below).
- Managing contextual memory across long user sessions.
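On the latency point, one common mitigation (an assumption about this platform rather than a confirmed detail) is streaming model tokens to the browser over Server-Sent Events as they arrive. The completeStreaming helper below is a hypothetical stand-in for the provider call:

```ts
// Hypothetical streaming helper standing in for the provider SDK call.
async function* completeStreaming(messages: unknown[]): AsyncGenerator<string> {
  yield "partial ";
  yield "response";
}

// Route handler that forwards each chunk as soon as it arrives,
// instead of waiting for the full completion.
export async function POST(req: Request) {
  const { messages } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of completeStreaming(messages)) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`));
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```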
The Result
The FlowGPT platform achieved a 95% user satisfaction rate in beta testing, with an average
response time of under 200ms. It successfully handled 10,000+ concurrent users and supported
multilingual interactions, proving the scalability of the custom architecture.