Building a Full-Stack AI SaaS with FastAPI, React and Supabase
Building an AI SaaS means combining LLM integration with user authentication, subscription billing, usage tracking, and a production frontend. Each piece has its own complexity; the challenge is making them work together without over-engineering.
This guide covers the architecture and implementation patterns for a full-stack AI application using FastAPI as the backend, React as the frontend, and Supabase for authentication and data.
Architecture Overview
```
React Frontend
    ↓ (auth via Supabase)
Supabase Auth + Database
    ↓ (JWT forwarding)
FastAPI Backend
    ↓ (LLM calls, usage tracking)
OpenAI / Anthropic API
```
Supabase handles authentication and stores user data. FastAPI handles the AI logic, rate limiting, and external API calls. React handles the UI.
Authentication Flow
Users authenticate through Supabase. The JWT is forwarded to FastAPI for backend calls:
```javascript
// Frontend: sign in and get a session
const { data, error } = await supabase.auth.signInWithPassword({
  email,
  password,
});
if (error) throw error;

// Include the JWT in API calls to the backend
const response = await fetch("/api/generate", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${data.session.access_token}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ prompt }),
});
```
FastAPI validates the Supabase JWT:
```python
import os

import jwt
from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

security = HTTPBearer()
SUPABASE_JWT_SECRET = os.environ["SUPABASE_JWT_SECRET"]

async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(security),
) -> dict:
    try:
        payload = jwt.decode(
            credentials.credentials,
            SUPABASE_JWT_SECRET,
            algorithms=["HS256"],
            audience="authenticated",  # Supabase sets aud to "authenticated"
        )
        return payload
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")
```
LLM Integration with Streaming
Stream responses for a responsive UI:
```python
import json

from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

client = AsyncOpenAI()

@app.post("/api/generate")
async def generate(
    request: GenerateRequest,
    user=Depends(get_current_user),
):
    # Check usage limits before spending any tokens
    usage = await get_usage(user["sub"])
    if usage.tokens_used >= usage.token_limit:
        raise HTTPException(429, "Token limit exceeded")

    async def stream():
        total_tokens = 0
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": request.prompt}],
            stream=True,
        )
        async for chunk in response:
            if chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                # Word count is a rough proxy for the true token count
                total_tokens += len(content.split())
                yield f"data: {json.dumps({'content': content})}\n\n"
        # Track usage only after the stream completes
        await update_usage(user["sub"], total_tokens)
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")
```
Usage Tracking
Track tokens per user per billing period:
```sql
CREATE TABLE usage_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES auth.users(id),
    tokens_used INTEGER NOT NULL,
    model TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_usage_user_period ON usage_records(user_id, created_at);
```
```python
async def get_usage(user_id: str) -> UsageStats:
    row = await pool.fetchrow(
        """
        SELECT COALESCE(SUM(tokens_used), 0) AS total
        FROM usage_records
        WHERE user_id = $1
          AND created_at >= date_trunc('month', now())
        """,
        user_id,
    )
    plan_limit = await get_plan_limit(user_id)
    return UsageStats(tokens_used=row["total"], token_limit=plan_limit)
```
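The streaming endpoint also calls `update_usage`, which isn't shown above. A minimal sketch, assuming the same asyncpg `pool` used in `get_usage` (the default `model` argument mirrors the model hard-coded in the endpoint):

```python
async def update_usage(user_id: str, tokens: int, model: str = "gpt-4o") -> None:
    # Insert one row per completed request; get_usage sums these rows
    # over the current billing month. Assumes the asyncpg `pool` from above.
    await pool.execute(
        """
        INSERT INTO usage_records (user_id, tokens_used, model)
        VALUES ($1, $2, $3)
        """,
        user_id,
        tokens,
        model,
    )
```

Because each request is a separate row, the schema also supports per-model breakdowns later without a migration.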
React Frontend with Streaming
Consume the SSE stream in React:
```typescript
function useGenerate() {
  const [output, setOutput] = useState("");
  const [loading, setLoading] = useState(false);

  const generate = async (prompt: string) => {
    setLoading(true);
    setOutput("");

    const session = await supabase.auth.getSession();
    const response = await fetch("/api/generate", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${session.data.session?.access_token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ prompt }),
    });

    const reader = response.body?.getReader();
    const decoder = new TextDecoder();

    while (reader) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters split across chunks intact
      const text = decoder.decode(value, { stream: true });
      const lines = text.split("\n").filter((l) => l.startsWith("data: "));
      for (const line of lines) {
        const data = line.slice(6);
        if (data === "[DONE]") continue;
        const parsed = JSON.parse(data);
        setOutput((prev) => prev + parsed.content);
      }
    }
    setLoading(false);
  };

  return { output, loading, generate };
}
```
Rate Limiting by Plan
```python
from enum import Enum

class Plan(str, Enum):
    FREE = "free"
    PRO = "pro"
    TEAM = "team"

PLAN_LIMITS = {
    Plan.FREE: {"tokens_per_month": 10_000, "requests_per_minute": 5},
    Plan.PRO: {"tokens_per_month": 500_000, "requests_per_minute": 60},
    Plan.TEAM: {"tokens_per_month": 2_000_000, "requests_per_minute": 120},
}
```
Common Mistakes
Mistake 1: Storing API keys in the frontend. LLM API keys must live on the server only. The frontend calls your backend, which calls the LLM.
Mistake 2: Not streaming responses. LLM responses take 2-10 seconds. Without streaming, the user stares at a loading spinner. With streaming, they see words appear in real-time.
Mistake 3: No usage limits. Without per-user limits, one user can drain your entire API budget.
Mistake 4: Blocking LLM calls. LLM API calls take seconds. Use async clients and streaming to avoid blocking your event loop.
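A quick way to see what async buys you: with an async client, slow calls overlap instead of queueing behind each other. A toy sketch, where the hypothetical `fake_llm_call` coroutine stands in for a real LLM request:

```python
import asyncio
import time

async def fake_llm_call(delay: float) -> str:
    # Stand-in for a slow LLM request; real code awaits an async client here.
    await asyncio.sleep(delay)
    return "ok"

async def run_ten_concurrently() -> float:
    start = time.perf_counter()
    # Ten 0.1 s "calls" run concurrently, so the batch takes ~0.1 s, not ~1 s.
    await asyncio.gather(*(fake_llm_call(0.1) for _ in range(10)))
    return time.perf_counter() - start

elapsed = asyncio.run(run_ten_concurrently())
```

With a blocking client, the same ten requests would serialize and take roughly the sum of their latencies, stalling every other request on the worker in the meantime.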
Takeaways
A full-stack AI SaaS combines Supabase for auth and data, FastAPI for backend logic and LLM integration, and React for the UI. Stream responses for a responsive experience. Track usage per user per billing period. Rate limit by plan. Keep API keys on the server. This stack scales from prototype to production without rewrites.