The Challenge
Users needed a single, unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, etc.) without switching platforms. The system had to handle varying API schemas, maintain long-term conversational memory, and efficiently process file uploads for RAG-based queries.
The Solution
I architected a platform that integrates multiple LLMs into a seamless conversational UI:
- Unified Backend: Abstracted the different LLM APIs behind a single, standardized interface (see the provider sketch after this list).
- Contextual Memory: Implemented a vector-store memory system to retrieve relevant past interactions (a retrieval sketch also follows).
- Advanced RAG: Built a Retrieval-Augmented Generation pipeline for file analysis and context-driven answers.
- Autonomous Agents: Deployed dedicated agents for complex tasks such as "Deep Research".
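A minimal sketch of what such a unified interface can look like in TypeScript. The `ChatProvider` interface, class names, and model IDs are illustrative assumptions, not the platform's actual code:

```ts
// Hypothetical provider-agnostic types; names are illustrative.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatProvider {
  // Every vendor's API is normalized into this one call signature.
  complete(messages: ChatMessage[]): Promise<string>;
}

class OpenAIProvider implements ChatProvider {
  constructor(private apiKey: string) {}

  async complete(messages: ChatMessage[]): Promise<string> {
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "gpt-4o", messages }),
    });
    const data = await res.json();
    return data.choices[0].message.content;
  }
}

class AnthropicProvider implements ChatProvider {
  constructor(private apiKey: string) {}

  async complete(messages: ChatMessage[]): Promise<string> {
    // Anthropic keeps the system prompt outside the messages array,
    // one of the schema differences the abstraction hides.
    const system = messages.find((m) => m.role === "system")?.content;
    const res = await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "x-api-key": this.apiKey,
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "claude-3-5-sonnet-20241022",
        max_tokens: 1024,
        system,
        messages: messages.filter((m) => m.role !== "system"),
      }),
    });
    const data = await res.json();
    return data.content[0].text;
  }
}

// The rest of the app picks a provider by name and never sees
// vendor-specific schemas.
const providers: Record<string, ChatProvider> = {
  openai: new OpenAIProvider(process.env.OPENAI_API_KEY!),
  anthropic: new AnthropicProvider(process.env.ANTHROPIC_API_KEY!),
};
```

With this shape, supporting a new vendor means adding one new class; the chat UI and memory layers stay untouched.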
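The contextual-memory lookup can be illustrated the same way. Cosine similarity over precomputed embeddings is a standard scoring choice and an assumption here, not a confirmed detail of the platform:

```ts
// Illustrative vector-memory lookup over precomputed embeddings.
type MemoryRecord = { text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k stored interactions most similar to the query,
// which are then prepended to the prompt as conversational context.
function retrieveRelevant(
  queryEmbedding: number[],
  memory: MemoryRecord[],
  k = 3,
): MemoryRecord[] {
  return memory
    .map((m) => ({ m, score: cosineSimilarity(queryEmbedding, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ m }) => m);
}
```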
Data Storage Architecture
- Chat History (JSON blob via Turso): All user-generated chats are stored as structured JSON blobs in Turso, a distributed edge database optimized for low-latency reads and writes (a write-path sketch follows this list).
  - Format: chat_id, user_id, name, messages[], timestamp
  - Justification: the JSON format allows flexibility in storing metadata such as token usage, model responses, and user edits.
- Users & Sessions: SQL with Clerk:
User authentication, session persistence,
and metadata (like roles and permissions) are managed
via Clerk over a traditional SQL-based database
-
- Schema: Users, Sessions, API keys, OAuth tokens
-
- Security: Built-in 2FA, passwordless auth, session encryption.
- Files & Images: Object Storage via UploadThing: User-uploaded assets (files,
images, media) are stored using UploadThing on object storage
systems (e.g., AWS S3 under the hood).
-
- CDN-enabled for fast access
-
- Metadata (file type, size, access history) linked to SQL
- Supporting Infrastructure: MongoDB and Redis as additional data stores; AWS (EC2, S3) and Docker for hosting and deployment.
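A sketch of the chat-history write path using the @libsql/client SDK for Turso. The table and column names follow the format listed above; the code itself is illustrative:

```ts
import { createClient } from "@libsql/client";

const db = createClient({
  url: process.env.TURSO_DATABASE_URL!,
  authToken: process.env.TURSO_AUTH_TOKEN!,
});

// One row per chat; the messages column holds the JSON blob.
await db.execute(`
  CREATE TABLE IF NOT EXISTS chats (
    chat_id   TEXT PRIMARY KEY,
    user_id   TEXT NOT NULL,
    name      TEXT,
    messages  TEXT NOT NULL, -- JSON: [{role, content, tokens, ...}]
    timestamp INTEGER NOT NULL
  )
`);

// Upsert keeps the latest version of an edited conversation.
async function saveChat(
  chatId: string,
  userId: string,
  name: string,
  messages: object[],
) {
  await db.execute({
    sql: `INSERT OR REPLACE INTO chats (chat_id, user_id, name, messages, timestamp)
          VALUES (?, ?, ?, ?, ?)`,
    args: [chatId, userId, name, JSON.stringify(messages), Date.now()],
  });
}
```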
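For the auth layer, a minimal sketch of gating an API route with Clerk, assuming a Next.js App Router setup; the route and response body are illustrative:

```ts
import { auth } from "@clerk/nextjs/server";

// Clerk resolves the session from the request's cookies/headers;
// in recent SDK versions auth() is async.
export async function GET() {
  const { userId } = await auth();
  if (!userId) {
    return new Response("Unauthorized", { status: 401 });
  }
  // userId keys into the Users/Sessions tables described above.
  return Response.json({ userId });
}
```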
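And a sketch of the upload path as an UploadThing file router; the route name and size limits are assumptions:

```ts
import { createUploadthing, type FileRouter } from "uploadthing/next";

const f = createUploadthing();

export const fileRouter = {
  // Hypothetical route accepting the file types used for RAG queries.
  chatAttachment: f({
    pdf: { maxFileSize: "16MB" },
    image: { maxFileSize: "8MB" },
  }).onUploadComplete(async ({ file }) => {
    // Persist metadata (name, size, URL) to SQL, as described above;
    // the asset itself lands on S3-backed object storage behind a CDN.
    console.log("stored", file.name, "at", file.url);
  }),
} satisfies FileRouter;
```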
Challenges
- Integrating multiple LLMs with varying APIs and response formats.
- Optimizing real-time communication for low latency (see the streaming sketch below).
- Managing contextual memory across long user sessions.
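On the latency point, one common mitigation (an assumption about this platform rather than a confirmed detail) is streaming model tokens to the browser over Server-Sent Events as they arrive. The completeStreaming helper below is a hypothetical stand-in for the provider call:

```ts
// Hypothetical streaming helper standing in for the provider SDK call.
async function* completeStreaming(messages: unknown[]): AsyncGenerator<string> {
  yield "partial ";
  yield "response";
}

// Route handler that forwards each chunk as soon as it arrives,
// instead of waiting for the full completion.
export async function POST(req: Request) {
  const { messages } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of completeStreaming(messages)) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`));
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```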
The Result
The FlowGPT platform achieved a 95% user satisfaction rate in beta testing, with an average
response time of under 200ms. It successfully handled 10,000+ concurrent users and supported
multilingual interactions, proving the scalability of the custom architecture.