Building a Full-Stack AI SaaS with FastAPI, React and Supabase
Building an AI SaaS means combining LLM integration with user authentication, subscription billing, usage tracking, and a production frontend. Each piece has its own complexity; the challenge is making them work together without over-engineering.
This guide covers the architecture and implementation patterns for a full-stack AI application using FastAPI as the backend, React as the frontend, and Supabase for authentication and data.
Architecture Overview
```
React Frontend
    ↓ (auth via Supabase)
Supabase Auth + Database
    ↓ (JWT forwarding)
FastAPI Backend
    ↓ (LLM calls, usage tracking)
OpenAI / Anthropic API
```
Supabase handles authentication and stores user data. FastAPI handles the AI logic, rate limiting, and external API calls. React handles the UI.
Authentication Flow
Users authenticate through Supabase. The JWT is forwarded to FastAPI for backend calls:
```javascript
// Frontend: sign in and get a session
const { data, error } = await supabase.auth.signInWithPassword({
  email,
  password,
});
if (error) throw error;

// Include the JWT in API calls to the backend
const response = await fetch("/api/generate", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${data.session.access_token}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ prompt }),
});
```
FastAPI validates the Supabase JWT:
```python
import os

import jwt
from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

security = HTTPBearer()
SUPABASE_JWT_SECRET = os.environ["SUPABASE_JWT_SECRET"]

async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(security),
) -> dict:
    try:
        payload = jwt.decode(
            credentials.credentials,
            SUPABASE_JWT_SECRET,
            algorithms=["HS256"],
            audience="authenticated",  # Supabase sets aud to "authenticated"
        )
        return payload
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")
```
LLM Integration with Streaming
Stream responses for a responsive UI:
```python
import json

from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

client = AsyncOpenAI()

@app.post("/api/generate")
async def generate(
    request: GenerateRequest,
    user=Depends(get_current_user),
):
    # Check usage limits before spending any tokens
    usage = await get_usage(user["sub"])
    if usage.tokens_used >= usage.token_limit:
        raise HTTPException(429, "Token limit exceeded")

    async def stream():
        total_tokens = 0
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": request.prompt}],
            stream=True,
        )
        async for chunk in response:
            if chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                # Word count is a rough proxy for the true token count
                total_tokens += len(content.split())
                yield f"data: {json.dumps({'content': content})}\n\n"
        # Track usage only after the stream completes
        await update_usage(user["sub"], total_tokens)
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")
```
Usage Tracking
Track tokens per user per billing period:
```sql
CREATE TABLE usage_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES auth.users(id),
    tokens_used INTEGER NOT NULL,
    model TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_usage_user_period ON usage_records(user_id, created_at);
```
```python
async def get_usage(user_id: str) -> UsageStats:
    row = await pool.fetchrow(
        """
        SELECT COALESCE(SUM(tokens_used), 0) AS total
        FROM usage_records
        WHERE user_id = $1
          AND created_at >= date_trunc('month', now())
        """,
        user_id,
    )
    plan_limit = await get_plan_limit(user_id)
    return UsageStats(tokens_used=row["total"], token_limit=plan_limit)
```
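The streaming endpoint also calls `update_usage`, which isn't shown above. A minimal sketch, assuming the same asyncpg `pool` used in `get_usage` (the default `model` argument mirrors the model hard-coded in the endpoint):

```python
async def update_usage(user_id: str, tokens: int, model: str = "gpt-4o") -> None:
    # Insert one row per completed request; get_usage sums these rows
    # over the current billing month. Assumes the asyncpg `pool` from above.
    await pool.execute(
        """
        INSERT INTO usage_records (user_id, tokens_used, model)
        VALUES ($1, $2, $3)
        """,
        user_id,
        tokens,
        model,
    )
```

Because each request is a separate row, the schema also supports per-model breakdowns later without a migration.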
React Frontend with Streaming
Consume the SSE stream in React:
```typescript
function useGenerate() {
  const [output, setOutput] = useState("");
  const [loading, setLoading] = useState(false);

  const generate = async (prompt: string) => {
    setLoading(true);
    setOutput("");

    const session = await supabase.auth.getSession();
    const response = await fetch("/api/generate", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${session.data.session?.access_token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ prompt }),
    });

    const reader = response.body?.getReader();
    const decoder = new TextDecoder();

    while (reader) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters split across chunks intact
      const text = decoder.decode(value, { stream: true });
      const lines = text.split("\n").filter((l) => l.startsWith("data: "));
      for (const line of lines) {
        const data = line.slice(6);
        if (data === "[DONE]") continue;
        const parsed = JSON.parse(data);
        setOutput((prev) => prev + parsed.content);
      }
    }
    setLoading(false);
  };

  return { output, loading, generate };
}
```
Rate Limiting by Plan
```python
from enum import Enum

class Plan(str, Enum):
    FREE = "free"
    PRO = "pro"
    TEAM = "team"

PLAN_LIMITS = {
    Plan.FREE: {"tokens_per_month": 10_000, "requests_per_minute": 5},
    Plan.PRO: {"tokens_per_month": 500_000, "requests_per_minute": 60},
    Plan.TEAM: {"tokens_per_month": 2_000_000, "requests_per_minute": 120},
}
```
Common Mistakes
Mistake 1: Storing API keys in the frontend. LLM API keys must live on the server only. The frontend calls your backend, which calls the LLM.
Mistake 2: Not streaming responses. LLM responses take 2-10 seconds. Without streaming, the user stares at a loading spinner. With streaming, they see words appear in real-time.
Mistake 3: No usage limits. Without per-user limits, one user can drain your entire API budget.
Mistake 4: Blocking LLM calls. LLM API calls take seconds. Use async clients and streaming to avoid blocking your event loop.
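A quick way to see what async buys you: with an async client, slow calls overlap instead of queueing behind each other. A toy sketch, where the hypothetical `fake_llm_call` coroutine stands in for a real LLM request:

```python
import asyncio
import time

async def fake_llm_call(delay: float) -> str:
    # Stand-in for a slow LLM request; real code awaits an async client here.
    await asyncio.sleep(delay)
    return "ok"

async def run_ten_concurrently() -> float:
    start = time.perf_counter()
    # Ten 0.1 s "calls" run concurrently, so the batch takes ~0.1 s, not ~1 s.
    await asyncio.gather(*(fake_llm_call(0.1) for _ in range(10)))
    return time.perf_counter() - start

elapsed = asyncio.run(run_ten_concurrently())
```

With a blocking client, the same ten requests would serialize and take roughly the sum of their latencies, stalling every other request on the worker in the meantime.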
Takeaways
A full-stack AI SaaS combines Supabase for auth and data, FastAPI for backend logic and LLM integration, and React for the UI. Stream responses for a responsive experience. Track usage per user per billing period. Rate limit by plan. Keep API keys on the server. This stack scales from prototype to production without rewrites.