FlagForge

Developer

Original Idea

Feature Flag Service: a backend API with a lightweight web console for targeting, rollouts, and audit logs.

Product Requirements Document (PRD): FlagForge

1. Executive Summary

FlagForge is a high-performance, developer-first feature flag and toggle management platform. By decoupling code deployments from feature releases, FlagForge lets engineering teams run lower-risk canary rollouts, target specific user segments, and trigger kill switches in real time. The system is designed for low-latency evaluation (P99 under 20ms) and enterprise-grade governance through immutable audit logs and multi-tenant RBAC.

2. Problem Statement

Modern engineering teams face high risks during production deployments because code changes and feature activations are coupled. Currently, teams struggle with:

  • Deployment Anxiety: Inability to "turn off" a buggy feature without a full rollback.
  • Rigid Targeting: Hardcoded logic for beta testers or regional rollouts requires code redeployments.
  • Lack of Visibility: No central source of truth or audit trail for who changed what configuration and when.
  • Performance Bottlenecks: Existing solutions often add significant latency to application load times.

3. Goals & Success Metrics

  • Goal 1: Sub-20ms P99 flag evaluation latency at the edge.
  • Goal 2: 99.99% availability for the evaluation API.
  • Goal 3: 100% traceability for configuration changes via immutable logs.
  • Metrics:
    • MTTR (Mean Time To Recovery): Reduced by 50% using real-time kill switches.
    • Deployment Frequency: Increased by 3x by allowing dark launches.
    • SDK Overhead: Less than 5 KB added to client bundles.

4. User Personas

  • Software Engineer (Devin): Wants easy-to-integrate SDKs and local caching so the app doesn't slow down.
  • Product Manager (Patty): Wants to toggle features for specific "Beta" users without asking engineers for help.
  • SRE/DevOps (Sam): Needs to see audit logs during an incident and perform percentage-based rollouts to monitor system health.
  • QA Engineer (Quincy): Needs to force-enable specific flag variants for testing purposes in staging environments.

5. User Stories

  • As an Engineer, I want to define a flag in the console so that I can wrap my new code in a conditional block before it is finished.
  • As a PM, I want to roll out a feature to 10% of users in the US so that I can measure impact before a full release.
  • As an SRE, I want an immutable audit log of all changes so that I can identify the root cause of a configuration-induced outage.
  • As an Admin, I want to require a "four-eyes" approval for any production flag change to prevent accidental "kill-switch" triggers.
  • As a Developer, I want real-time updates via SSE so that my application reacts to flag changes instantly without polling.

6. Functional Requirements

6.1 Flag Management

  • Flag Types: Support for Boolean (On/Off) and Multivariate (Strings, JSON, Numbers).
  • Environments: Default support for Development, Staging, and Production with unique API keys.
  • Kill Switch: Global override to disable any flag instantly across all segments.
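A minimal sketch of how the flag types and the kill switch above could be modeled in the TypeScript SDK. The type names, the field names (`defaultVariant`, `offVariant`, `killSwitch`), and `resolveVariant` are illustrative assumptions, not part of a defined FlagForge schema.

```typescript
// Hypothetical shapes: names are assumptions made for illustration only.
type FlagValue = boolean | string | number | Record<string, unknown>;

interface Variant {
  key: string; // e.g. "on", "off", "variant-a"
  value: FlagValue;
}

interface FlagConfig {
  key: string;
  type: "boolean" | "multivariate";
  variants: Variant[];
  defaultVariant: string; // served when no targeting rule matches
  offVariant: string; // served to everyone while the kill switch is engaged
  killSwitch: boolean;
}

// Picks the variant to serve. The kill switch overrides any rule match,
// which is what makes it a global override across all segments.
function resolveVariant(flag: FlagConfig, matchedVariant?: string): Variant {
  const wanted = flag.killSwitch
    ? flag.offVariant
    : (matchedVariant ?? flag.defaultVariant);
  const found = flag.variants.find((v) => v.key === wanted);
  if (!found) {
    throw new Error(`flag ${flag.key}: unknown variant "${wanted}"`);
  }
  return found;
}
```

Keeping the kill switch as a separate boolean, rather than rewriting the rules, means targeting can be restored unchanged once the switch is released.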

6.2 Targeting & Rollouts

  • Attribute-Based Rules: Target users by email, region, plan type, or custom attributes.
  • Percentage Rollouts: Deterministic canary releases using MurmurHash3 to ensure user stickiness.
  • Scheduled Toggles: Ability to schedule a flag to turn on/off at a specific UTC timestamp.
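The percentage rollout above can be sketched as follows, assuming the 32-bit x86 variant of MurmurHash3 over the UTF-8 bytes of a `flagKey:userId` string and 10,000 buckets. The salt format, bucket count, and function names are assumptions; the PRD names only the hash family.

```typescript
// MurmurHash3 x86_32 over the UTF-8 bytes of `key`; returns an unsigned 32-bit integer.
function murmur3_32(key: string, seed = 0): number {
  const data = new TextEncoder().encode(key);
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  let h = seed | 0;

  // Body: mix each complete 4-byte little-endian block.
  const blockCount = data.length >>> 2;
  for (let i = 0; i < blockCount; i++) {
    const o = i * 4;
    let k = data[o] | (data[o + 1] << 8) | (data[o + 2] << 16) | (data[o + 3] << 24);
    k = Math.imul(k, c1);
    k = (k << 15) | (k >>> 17);
    k = Math.imul(k, c2);
    h ^= k;
    h = (h << 13) | (h >>> 19);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }

  // Tail: mix the remaining 1 to 3 bytes, if any.
  const tail = blockCount * 4;
  const rem = data.length & 3;
  let tailK = 0;
  if (rem === 3) tailK ^= data[tail + 2] << 16;
  if (rem >= 2) tailK ^= data[tail + 1] << 8;
  if (rem >= 1) {
    tailK ^= data[tail];
    tailK = Math.imul(tailK, c1);
    tailK = (tailK << 15) | (tailK >>> 17);
    tailK = Math.imul(tailK, c2);
    h ^= tailK;
  }

  // Finalization: fold in the length, then avalanche the bits.
  h ^= data.length;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Maps a user to a stable bucket in [0, 9999]. Salting with the flag key
// (an assumption) keeps the same users from always landing in every canary.
function rolloutBucket(flagKey: string, userId: string): number {
  return murmur3_32(`${flagKey}:${userId}`) % 10000;
}

// True when the user falls inside the rollout; `percentage` is 0 to 100.
function inRollout(flagKey: string, userId: string, percentage: number): boolean {
  return rolloutBucket(flagKey, userId) < Math.round(percentage * 100);
}
```

Because a user's bucket never changes, raising a rollout from 10% to 25% only adds users; everyone already included stays included, which is the stickiness property the requirement asks for.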

6.3 Governance & Security

  • Audit Logs: Immutable record of actor, action, timestamp, old_state, and new_state.
  • Four-Eyes Approval: Optional workflow requiring a second authorized user to approve production changes.
  • RBAC: Define roles (Admin, Editor, Viewer) at the Project and Environment levels.
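The governance rules above could be expressed roughly as below. This is a sketch under assumptions: the `ChangeRequest` shape, the `change_request.approved` action string, and the rule that any non-Viewer may approve are all hypothetical. `Object.freeze` only guards the in-memory object; durable immutability still has to be enforced in the database.

```typescript
type Role = "admin" | "editor" | "viewer";

// Hypothetical change request awaiting a second pair of eyes.
interface ChangeRequest {
  authorId: string;
  environment: string;
  oldState: unknown;
  newState: unknown;
}

// Mirrors the audit fields listed in the requirement.
interface AuditEntry {
  readonly actor: string;
  readonly action: string;
  readonly timestamp: string; // ISO 8601, UTC
  readonly old_state: unknown;
  readonly new_state: unknown;
}

// Assumption: only production changes go through the approval workflow.
function needsApproval(request: ChangeRequest): boolean {
  return request.environment === "production";
}

// Four-eyes: the approver must be a different user with more than read access.
function canApprove(request: ChangeRequest, approverId: string, approverRole: Role): boolean {
  return approverId !== request.authorId && approverRole !== "viewer";
}

// Approves the request and returns the audit entry to persist.
function approveChange(
  request: ChangeRequest,
  approverId: string,
  approverRole: Role,
  at: Date,
): AuditEntry {
  if (!canApprove(request, approverId, approverRole)) {
    throw new Error("four-eyes check failed");
  }
  // Shallow freeze: top-level fields cannot be reassigned afterwards.
  return Object.freeze({
    actor: approverId,
    action: "change_request.approved",
    timestamp: at.toISOString(),
    old_state: request.oldState,
    new_state: request.newState,
  });
}
```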

7. Technical Requirements

7.1 Tech Stack (2026 Standards)

  • Frontend: React v19.2.1 (utilizing Server Components and the use hook), Tailwind CSS v4.x (Oxide engine), TanStack Query v5.90.19.
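  • Backend: Go (Golang) v1.26, using the "Green Tea" garbage collector and encoding/json/v2 for lower-allocation JSON parsing. Note that json/v2 was introduced as an experiment behind GOEXPERIMENT=jsonv2 in Go 1.25, so confirm its status in the target release before depending on it.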
  • Database:
    • PostgreSQL: Primary store for configuration and Audit Logs (using Row-Level Security).
    • Redis: Low-latency caching and Pub/Sub for real-time flag invalidation.
  • Auth/Multi-tenancy: Clerk (Organization Management for B2B multi-tenancy).

7.2 Architecture Patterns

  • Real-time Updates: Server-Sent Events (SSE) over HTTP/3 for unidirectional push.
  • Deterministic Hashing: murmur3 for consistent bucket assignment in canary rollouts.
  • Edge Delivery: CloudFront Functions for <1ms evaluation logic using CloudFront KeyValueStore.
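To make the SSE push concrete, here is a sketch of how a flag-change notification could be framed on the wire and read back by a client. The `id:`, `event:`, and `data:` fields and the blank-line terminator come from the SSE event stream format; the `flag.updated` event name and the payload fields are assumptions.

```typescript
// Hypothetical payload pushed when a flag changes.
interface FlagChangeEvent {
  flagKey: string;
  environment: string;
  version: number;
}

// Serializes one SSE frame. JSON.stringify without indentation never emits
// raw newlines, so the payload always fits on a single data: line.
function formatSse(eventName: string, id: string, payload: object): string {
  return `id: ${id}\nevent: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

interface SseFrame {
  id?: string;
  event?: string;
  data: string;
}

// Parses a single frame. Simplification: only "\n" line endings are handled.
function parseSse(frame: string): SseFrame {
  const result: SseFrame = { data: "" };
  const dataLines: string[] = [];
  for (const line of frame.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // terminator or comment
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one optional space
    if (field === "data") dataLines.push(value);
    else if (field === "event") result.event = value;
    else if (field === "id") result.id = value;
  }
  result.data = dataLines.join("\n"); // multiple data lines join with newlines
  return result;
}
```

Sending a monotonically increasing `id` lets a reconnecting client report the last event it saw, so the server can replay anything missed.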

8. Data Model

| Entity | Attributes | Relationships |
| :--- | :--- | :--- |
| Project | id, name, org_id | Has many Environments, Flags. |
| Environment | id, name, api_key_hash, project_id | Belongs to Project. |
| FeatureFlag | id, key, type, description | Belongs to Project. |
| TargetingRule | id, flag_id, env_id, priority, conditions (JSONB) | Belongs to Flag + Env. |
| AuditLog | id, env_id, actor_id, changes (JSONB), created_at | Immutable; Belongs to Env. |
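For reference, the entities above could be mirrored in TypeScript as shown below. This is a sketch: ID types are assumed to be strings, `project_id` on `FeatureFlag` is inferred from the "Belongs to Project" relationship rather than listed as an attribute, and `rulesFor` assumes that a lower `priority` number is evaluated first.

```typescript
interface Project {
  id: string;
  name: string;
  org_id: string;
}

interface Environment {
  id: string;
  name: string;
  api_key_hash: string; // only the hash of the API key is stored
  project_id: string;
}

interface FeatureFlag {
  id: string;
  key: string;
  type: "boolean" | "multivariate";
  description: string;
  project_id: string; // assumption: implied by the relationship column
}

interface TargetingRule {
  id: string;
  flag_id: string;
  env_id: string;
  priority: number;
  conditions: unknown; // JSONB column, shape defined by the rule engine
}

interface AuditLog {
  id: string;
  env_id: string;
  actor_id: string;
  changes: unknown; // JSONB column
  created_at: string; // ISO 8601 timestamp
}

// Returns the rules for one flag in one environment, in evaluation order.
function rulesFor(rules: TargetingRule[], flagId: string, envId: string): TargetingRule[] {
  return rules
    .filter((r) => r.flag_id === flagId && r.env_id === envId)
    .sort((a, b) => a.priority - b.priority);
}
```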

9. API Specification

9.1 Evaluation API (SDK Consumption)

POST /v1/evaluate

  • Request: { "flagKey": "new-ui", "context": { "userId": "123", "region": "us-east-1" } }
  • Response: { "value": true, "variant": "on", "reason": "rule_match" }
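A sketch of the evaluation logic that could sit behind this endpoint: rules are checked in priority order and the first full match wins. Only the `rule_match` reason appears in the specification; the `default` reason, the operator names (`equals`, `ends_with`, `in`), and all type names are assumptions.

```typescript
type EvalContext = Record<string, string | number | boolean>;

interface Condition {
  attribute: string;
  op: "equals" | "ends_with" | "in";
  value: string | string[];
}

interface Rule {
  priority: number; // assumption: lower numbers are evaluated first
  conditions: Condition[]; // all must match; an empty list matches everyone
  variant: string;
  value: unknown;
}

interface EvalResult {
  value: unknown;
  variant: string;
  reason: string;
}

// Compares one context attribute, as a string, against a condition.
function conditionMatches(c: Condition, ctx: EvalContext): boolean {
  const actual = ctx[c.attribute];
  if (actual === undefined) return false; // missing attributes never match
  const text = String(actual);
  if (c.op === "equals") return text === c.value;
  if (c.op === "ends_with") return typeof c.value === "string" && text.endsWith(c.value);
  return Array.isArray(c.value) && c.value.indexOf(text) !== -1;
}

// First matching rule wins; otherwise the fallback variant is returned.
function evaluateFlag(
  rules: Rule[],
  ctx: EvalContext,
  fallback: { value: unknown; variant: string },
): EvalResult {
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  for (const rule of ordered) {
    if (rule.conditions.every((c) => conditionMatches(c, ctx))) {
      return { value: rule.value, variant: rule.variant, reason: "rule_match" };
    }
  }
  return { value: fallback.value, variant: fallback.variant, reason: "default" };
}
```

Returning a `reason` with every result makes it possible to explain in the console or the logs why a given user saw a given variant.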

9.2 Management API (Console)

PATCH /v1/flags/:id/target

  • Request: { "rollout_percentage": 25, "rules": [...] }
  • Security: Requires Clerk JWT + Environment Editor permissions.
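Before a targeting change is persisted, the request body needs validation. The sketch below assumes `rollout_percentage` is a number from 0 to 100 and `rules` is an array; the function name and error messages are hypothetical, and authentication (Clerk JWT and Editor permission checks) is deliberately left out.

```typescript
// Returns a list of validation errors; an empty list means the body is acceptable.
function validateTargetPatch(body: unknown): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body must be a JSON object"];
  }
  const fields = body as Record<string, unknown>;
  const errors: string[] = [];

  if ("rollout_percentage" in fields) {
    const pct = fields["rollout_percentage"];
    if (typeof pct !== "number" || !Number.isFinite(pct) || pct < 0 || pct > 100) {
      errors.push("rollout_percentage must be a number between 0 and 100");
    }
  }

  if ("rules" in fields && !Array.isArray(fields["rules"])) {
    errors.push("rules must be an array");
  }

  // PATCH semantics: an update that changes nothing is most likely a client bug.
  if (!("rollout_percentage" in fields) && !("rules" in fields)) {
    errors.push("at least one of rollout_percentage or rules is required");
  }
  return errors;
}
```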

10. UI/UX Requirements

  • Dashboard: A list view of flags with status indicators (Active, Inactive, Stale).
  • Rule Builder: A "natural language" style UI for creating rules (e.g., "If email ends with @company.com → show Variant A").
  • Diff View: When requesting an approval, show a JSON/Visual diff of the rule changes.
  • Performance Metrics: Small sparklines in the UI showing the evaluation frequency of each flag.
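The Diff View needs a structured comparison of the old and new rule configuration. One way to compute it is sketched below, assuming plain JSON values; `diffJson` and `DiffEntry` are hypothetical names, and arrays are compared as whole values rather than element by element.

```typescript
interface DiffEntry {
  path: string; // dotted path to the changed value, e.g. "rules" or "meta.owner"
  before: unknown; // undefined when the key was added
  after: unknown; // undefined when the key was removed
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Walks two JSON values and lists every path whose value differs.
// Limitation: objects nested inside arrays are compared via JSON.stringify,
// so a change in key order inside an array element is reported as a change.
function diffJson(before: unknown, after: unknown, path = ""): DiffEntry[] {
  if (isPlainObject(before) && isPlainObject(after)) {
    const keys = Array.from(
      new Set([...Object.keys(before), ...Object.keys(after)]),
    ).sort();
    const entries: DiffEntry[] = [];
    for (const key of keys) {
      const childPath = path ? `${path}.${key}` : key;
      entries.push(...diffJson(before[key], after[key], childPath));
    }
    return entries;
  }
  return JSON.stringify(before) === JSON.stringify(after)
    ? []
    : [{ path, before, after }];
}
```

The same entries could feed both the approval screen and the audit log viewer, so reviewers and incident responders see an identical description of the change.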

11. Non-Functional Requirements

  • Latency: Evaluation endpoint P99 < 20ms.
  • Scalability: Support up to 100,000 concurrent SSE connections per Go node.
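  • Immutability: Audit logs must use PostgreSQL RLS to block UPDATE and DELETE operations, with FORCE ROW LEVEL SECURITY enabled (table owners otherwise bypass RLS) and UPDATE/DELETE privileges revoked from the application role.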
  • SDK Safety: SDKs must fail open, returning the flag's default value, if the API is unreachable.
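The SDK safety requirement could be met with a small wrapper like the one below. It is a sketch, not the SDK's defined API: `evaluateOrDefault`, the injected `fetchValue` callback, and the 50 ms default timeout are all assumptions.

```typescript
// Resolves with the fetched value, or with `defaultValue` when the call
// rejects, throws, or takes longer than `timeoutMs`.
async function evaluateOrDefault<T>(
  fetchValue: () => Promise<T>,
  defaultValue: T,
  timeoutMs = 50,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(defaultValue), timeoutMs);
  });
  try {
    // Whichever settles first wins: the real value or the timeout default.
    return await Promise.race([fetchValue(), timeout]);
  } catch {
    // Network errors, non-2xx handling, and parse failures all land here.
    return defaultValue;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Bounding the wait as well as catching errors matters here: an API that hangs is as harmful to the host application as one that fails outright.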

12. Out of Scope

  • Native Mobile SDKs (Swift/Kotlin) for Phase 1 (Web SDK only).
  • Automatic flag cleanup (detecting unused code in GitHub).
  • Advanced ML-driven automated rollbacks (manual kill-switch only for MVP).

13. Risks & Mitigations

  • Risk: Flag evaluation adds latency to client apps.
    • Mitigation: Local in-memory caching in SDKs + SSE for background updates.
  • Risk: Database load from millions of SDK calls.
    • Mitigation: Redis cache layer + CloudFront KeyValueStore for edge evaluation.
  • Risk: Accidental production outages via misconfiguration.
    • Mitigation: "Four-Eyes" approval enforced for production changes (an optional workflow per Section 6.3) and environment-scoped API keys.
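The local caching mitigation can be sketched as a stale-while-revalidate cache: reads always return the cached value immediately, and a stale read triggers a background refresh. The `FlagCache` class, its method names, and the injected clock are assumptions made for illustration.

```typescript
interface CacheHit<T> {
  value: T;
  stale: boolean;
}

class FlagCache<T> {
  private readonly entries = new Map<string, { value: T; storedAt: number }>();
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can control time; it defaults to Date.now.
  constructor(maxAgeMs: number, now: () => number = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // Stores a fresh value, e.g. after a fetch or an SSE update.
  set(key: string, value: T): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }

  // Returns the cached value without waiting on the network. When the entry
  // is older than maxAgeMs, `onStale` is invoked so the caller can refresh
  // in the background while the old value keeps being served.
  get(key: string, onStale?: (key: string) => void): CacheHit<T> | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    const stale = this.now() - entry.storedAt > this.maxAgeMs;
    if (stale && onStale) onStale(key);
    return { value: entry.value, stale };
  }
}
```

Serving stale values keeps flag reads off the request path entirely; the trade-off is that a change can take up to one refresh cycle to appear unless a push update arrives first.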

14. Implementation Tasks

Phase 1: Project Setup & Core API

  • [ ] Initialize Go 1.26 Backend with Fiber v3.
  • [ ] Initialize React 19.2.1 Frontend with Tailwind 4.x.
  • [ ] Configure Clerk for Multi-tenant Organization management.
  • [ ] Set up PostgreSQL schema with audit schema isolation.

Phase 2: Flag Evaluation Engine

  • [ ] Implement MurmurHash3 logic for percentage rollouts in Go.
  • [ ] Build Rule Evaluation engine (Bexpr or RuleGo).
  • [ ] Create /v1/evaluate high-concurrency endpoint.
  • [ ] Implement Redis caching strategy for flag definitions.

Phase 3: Dashboard & Governance

  • [ ] Build Flag Management UI (CRUD operations).
  • [ ] Implement "Four-Eyes" approval workflow (Change Requests).
  • [ ] Build Immutable Audit Log viewer with JSONB diffing.
  • [ ] Integrate SSE for real-time dashboard updates.

Phase 4: Edge & SDKs

  • [ ] Develop a lightweight TypeScript SDK with stale-while-revalidate caching.
  • [ ] Deploy CloudFront Functions for Edge Evaluation logic.
  • [ ] Implement Redis Pub/Sub to trigger SSE events on flag changes.
  • [ ] Run end-to-end load tests to verify that P99 evaluation latency stays under 20ms.