Original Idea
Feature Flag Service: A backend API with a lightweight web console for targeting, rollouts, and audit logs.
Product Requirements Document (PRD): FlagForge
1. Executive Summary
FlagForge is a high-performance, developer-first feature flagging and toggle management platform. By decoupling code deployments from feature releases, FlagForge lets engineering teams run lower-risk canary rollouts, target specific user segments, and trigger real-time kill switches. The system is designed for low-latency evaluation (<20ms) and enterprise-grade governance through immutable audit logs and multi-tenant RBAC.
2. Problem Statement
Modern engineering teams face high risks during production deployments because code changes and feature activations are coupled. Currently, teams struggle with:
- Deployment Anxiety: Inability to "turn off" a buggy feature without a full rollback.
- Rigid Targeting: Hardcoded logic for beta testers or regional rollouts requires code redeployments.
- Lack of Visibility: No central source of truth or audit trail for who changed what configuration and when.
- Performance Bottlenecks: Existing solutions often add significant latency to application load times.
3. Goals & Success Metrics
- Goal 1: Sub-20ms flag evaluation latency (P99) at the edge.
- Goal 2: 99.99% availability for the evaluation API.
- Goal 3: 100% traceability for configuration changes via immutable logs.
- Metrics:
  - MTTR (Mean Time To Recovery): Reduced by 50% using real-time kill switches.
  - Deployment Frequency: Increased by 3x by allowing dark launches.
  - SDK Overhead: Less than 5 KB added to client bundles.
4. User Personas
- Software Engineer (Devin): Wants easy-to-integrate SDKs and local caching so the app doesn't slow down.
- Product Manager (Patty): Wants to toggle features for specific "Beta" users without asking engineers for help.
- SRE/DevOps (Sam): Needs to see audit logs during an incident and perform percentage-based rollouts to monitor system health.
- QA Engineer (Quincy): Needs to force-enable specific flag variants for testing purposes in staging environments.
5. User Stories
- As an Engineer, I want to define a flag in the console so that I can wrap my new code in a conditional block before it is finished.
- As a PM, I want to roll out a feature to 10% of users in the US so that I can measure impact before a full release.
- As an SRE, I want an immutable audit log of all changes so that I can identify the root cause of a configuration-induced outage.
- As an Admin, I want to require a "four-eyes" approval for any production flag change to prevent accidental "kill-switch" triggers.
- As a Developer, I want real-time updates via SSE so that my application reacts to flag changes instantly without polling.
6. Functional Requirements
6.1 Flag Management
- Flag Types: Support for Boolean (On/Off) and Multivariate (Strings, JSON, Numbers).
- Environments: Default support for Development, Staging, and Production with unique API keys.
- Kill Switch: Global override to disable any flag instantly across all segments.
6.2 Targeting & Rollouts
- Attribute-Based Rules: Target users by email, region, plan type, or custom attributes.
- Percentage Rollouts: Deterministic canary releases using MurmurHash3 to ensure user stickiness.
- Scheduled Toggles: Ability to schedule a flag to turn on/off at a specific UTC timestamp.
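
The percentage rollout above can be sketched as follows. This is a minimal illustration, not the definitive implementation: it hand-rolls the 32-bit x86 variant of MurmurHash3, and the hash input (`flagKey:userID`), the seed of 0, the 10,000-bucket resolution, and the names `Bucket` and `InRollout` are all assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"math/bits"
)

// murmur3Sum32 computes the 32-bit x86 variant of MurmurHash3.
func murmur3Sum32(data []byte, seed uint32) uint32 {
	const (
		c1 uint32 = 0xcc9e2d51
		c2 uint32 = 0x1b873593
	)
	h := seed
	nblocks := len(data) / 4
	// Body: mix each 4-byte little-endian block into the state.
	for i := 0; i < nblocks; i++ {
		k := binary.LittleEndian.Uint32(data[i*4:])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
		h = bits.RotateLeft32(h, 13)
		h = h*5 + 0xe6546b64
	}
	// Tail: mix the remaining 1-3 bytes, if any.
	tail := data[nblocks*4:]
	var k uint32
	switch len(tail) {
	case 3:
		k ^= uint32(tail[2]) << 16
		fallthrough
	case 2:
		k ^= uint32(tail[1]) << 8
		fallthrough
	case 1:
		k ^= uint32(tail[0])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
	}
	// Finalization: force avalanche of the remaining bits.
	h ^= uint32(len(data))
	h ^= h >> 16
	h *= 0x85ebca6b
	h ^= h >> 13
	h *= 0xc2b2ae35
	h ^= h >> 16
	return h
}

// Bucket maps a (flag, user) pair to a stable bucket in [0, 10000).
// Including the flag key (an assumption) keeps a user's bucket
// independent across different flags.
func Bucket(flagKey, userID string) uint32 {
	return murmur3Sum32([]byte(flagKey+":"+userID), 0) % 10000
}

// InRollout reports whether the user falls inside a rollout of
// `percentage` percent (0-100). Raising the percentage only adds users,
// so anyone already included stays included.
func InRollout(flagKey, userID string, percentage float64) bool {
	return float64(Bucket(flagKey, userID)) < percentage*100
}
```

For example, `InRollout("new-ui", "123", 25)` returns the same answer on every node and every call, which is what gives users a sticky experience during a canary release.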
6.3 Governance & Security
- Audit Logs: Immutable record of `actor`, `action`, `timestamp`, `old_state`, and `new_state`.
- Four-Eyes Approval: Optional workflow requiring a second authorized user to approve production changes.
- RBAC: Define roles (Admin, Editor, Viewer) at the Project and Environment levels.
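
One way the four-eyes check and the audit record could fit together is sketched below. The type names, the `approve_change` action string, the role ordering, and the rule that an approver needs at least the Editor role are assumptions for illustration; the requirements above only state that a second authorized user must approve.

```go
package main

import (
	"errors"
	"time"
)

// Role mirrors the Admin/Editor/Viewer roles. The ordering (Viewer lowest)
// is an assumption made so roles can be compared.
type Role int

const (
	Viewer Role = iota
	Editor
	Admin
)

// ChangeRequest is a hypothetical pending production change.
type ChangeRequest struct {
	FlagKey     string
	RequestedBy string
	OldState    string
	NewState    string
	ApprovedBy  string
}

// AuditEntry carries the fields listed for audit logs.
type AuditEntry struct {
	Actor     string
	Action    string
	Timestamp time.Time
	OldState  string
	NewState  string
}

var (
	ErrSelfApproval = errors.New("approver must differ from requester")
	ErrForbidden    = errors.New("approver lacks the required role")
)

// Approve enforces the four-eyes rule and returns the approved request
// together with the audit entry that should be appended to the log.
func Approve(cr ChangeRequest, approver string, role Role) (ChangeRequest, AuditEntry, error) {
	if approver == cr.RequestedBy {
		return cr, AuditEntry{}, ErrSelfApproval
	}
	if role < Editor {
		return cr, AuditEntry{}, ErrForbidden
	}
	cr.ApprovedBy = approver
	entry := AuditEntry{
		Actor:     approver,
		Action:    "approve_change", // hypothetical action name
		Timestamp: time.Now().UTC(),
		OldState:  cr.OldState,
		NewState:  cr.NewState,
	}
	return cr, entry, nil
}
```

Returning the audit entry from the same function that makes the decision keeps the two from drifting apart: a change cannot be approved without a matching record being produced.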
7. Technical Requirements
7.1 Tech Stack (2026 Standards)
- Frontend: React v19.2.1 (utilizing Server Components and the `use` hook), Tailwind CSS v4.x (Oxide engine), TanStack Query v5.90.19.
- Backend: Go (Golang) v1.26 (utilizing `encoding/json/v2` for zero-alloc parsing and the "Green Tea" GC).
- Database:
  - PostgreSQL: Primary store for configuration and Audit Logs (using Row-Level Security).
  - Redis: Low-latency caching and Pub/Sub for real-time flag invalidation.
- Auth/Multi-tenancy: Clerk (Organization Management for B2B multi-tenancy).
7.2 Architecture Patterns
- Real-time Updates: Server-Sent Events (SSE) over HTTP/3 for unidirectional push.
- Deterministic Hashing: `murmur3` for consistent bucket assignment in canary rollouts.
- Edge Delivery: CloudFront Functions for <1ms evaluation logic using CloudFront KeyValueStore.
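
The SSE push pattern can be illustrated with a small handler. This sketch uses the standard `net/http` package rather than Fiber, and it is independent of the HTTP version in use; the `flag_updated` event name and the `updates` channel (imagined here as being fed by a Redis Pub/Sub subscriber) are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// formatSSE renders one event in the text/event-stream wire format:
// an "event:" line, one "data:" line per payload line, then a blank line.
func formatSSE(event, data string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "event: %s\n", event)
	for _, line := range strings.Split(data, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n")
	return b.String()
}

// streamHandler pushes flag-change payloads to a single connected client.
// updates is a hypothetical per-connection channel; a real server would
// need a fan-out hub that gives every client its own channel.
func streamHandler(updates <-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return // client disconnected
			case payload, open := <-updates:
				if !open {
					return
				}
				fmt.Fprint(w, formatSSE("flag_updated", payload))
				flusher.Flush() // send the event immediately instead of buffering
			}
		}
	}
}
```

Because the stream is unidirectional, the SDK only has to hold one long-lived response open and apply each event to its local cache as it arrives.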
8. Data Model
| Entity | Attributes | Relationships |
| :--- | :--- | :--- |
| Project | id, name, org_id | Has many Environments, Flags. |
| Environment | id, name, api_key_hash, project_id | Belongs to Project. |
| FeatureFlag | id, key, type, description | Belongs to Project. |
| TargetingRule | id, flag_id, env_id, priority, conditions (JSONB) | Belongs to Flag + Env. |
| AuditLog | id, env_id, actor_id, changes (JSONB), created_at | Immutable; Belongs to Env. |
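
To show how `priority` and `conditions` on a TargetingRule might drive evaluation, here is a minimal sketch. The condition shape, the operator names (`equals`, `ends_with`), the convention that a lower priority number is evaluated first, and the `default` reason string are assumptions; the data model above does not fix the JSONB layout.

```go
package main

import (
	"sort"
	"strings"
)

// Condition is a hypothetical decoded form of one entry in the
// TargetingRule `conditions` JSONB column.
type Condition struct {
	Attribute string // e.g. "email", "region"
	Operator  string // "equals" or "ends_with" (assumed operator names)
	Value     string
}

// TargetingRule holds the fields needed for evaluation.
type TargetingRule struct {
	Priority   int
	Conditions []Condition
	Variant    string
}

// matches reports whether a single condition holds for the given context.
// Missing attributes and unknown operators never match.
func (c Condition) matches(ctx map[string]string) bool {
	actual, ok := ctx[c.Attribute]
	if !ok {
		return false
	}
	switch c.Operator {
	case "equals":
		return actual == c.Value
	case "ends_with":
		return strings.HasSuffix(actual, c.Value)
	default:
		return false
	}
}

// Evaluate returns the variant of the first rule, in ascending priority
// order, whose conditions all match. A rule with no conditions matches
// every context. If nothing matches, the default variant is returned.
func Evaluate(rules []TargetingRule, ctx map[string]string, defaultVariant string) (variant, reason string) {
	ordered := append([]TargetingRule(nil), rules...) // copy so the caller's slice is not reordered
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority < ordered[j].Priority
	})
	for _, rule := range ordered {
		matched := true
		for _, c := range rule.Conditions {
			if !c.matches(ctx) {
				matched = false
				break
			}
		}
		if matched {
			return rule.Variant, "rule_match"
		}
	}
	return defaultVariant, "default"
}
```

First-match-wins evaluation keeps the outcome easy to explain in an audit: the `reason` always points at a single rule or at the default.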
9. API Specification
9.1 Evaluation API (SDK Consumption)
`POST /v1/evaluate`
- Request: `{ "flagKey": "new-ui", "context": { "userId": "123", "region": "us-east-1" } }`
- Response: `{ "value": true, "variant": "on", "reason": "rule_match" }`
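
The request and response shapes above map onto Go types roughly as follows. This is a sketch that uses the stable `encoding/json` package for portability; the type and function names, and the rule that an empty `flagKey` is rejected, are assumptions.

```go
package main

import (
	"encoding/json"
	"errors"
)

// EvaluateRequest mirrors the request body of the evaluation endpoint.
type EvaluateRequest struct {
	FlagKey string         `json:"flagKey"`
	Context map[string]any `json:"context"`
}

// EvaluateResponse mirrors the response body. Value is `any` because
// flags may be boolean or multivariate (string, number, JSON).
type EvaluateResponse struct {
	Value   any    `json:"value"`
	Variant string `json:"variant"`
	Reason  string `json:"reason"`
}

// ParseEvaluateRequest decodes and minimally validates a request body.
func ParseEvaluateRequest(body []byte) (EvaluateRequest, error) {
	var req EvaluateRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return EvaluateRequest{}, err
	}
	if req.FlagKey == "" {
		return EvaluateRequest{}, errors.New("flagKey is required")
	}
	return req, nil
}

// EncodeEvaluateResponse serializes a response to its JSON wire form.
func EncodeEvaluateResponse(resp EvaluateResponse) (string, error) {
	out, err := json.Marshal(resp)
	return string(out), err
}
```

Keeping `Context` as a free-form map lets callers send custom attributes without a schema change, at the cost of type checks moving into the rule engine.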
9.2 Management API (Console)
`PATCH /v1/flags/:id/target`
- Request: `{ "rollout_percentage": 25, "rules": [...] }`
- Security: Requires Clerk JWT + Environment Editor permissions.
10. UI/UX Requirements
- Dashboard: A list view of flags with status indicators (Active, Inactive, Stale).
- Rule Builder: A "natural language" style UI for creating rules (e.g., "If `email` ends with `@company.com` → show `Variant A`").
- Diff View: When requesting an approval, show a JSON/Visual diff of the rule changes.
- Performance Metrics: Small sparklines in the UI showing the evaluation frequency of each flag.
11. Non-Functional Requirements
- Latency: Evaluation endpoint P99 < 20ms.
- Scalability: Support up to 100,000 concurrent SSE connections per Go node.
- Immutability: Audit logs must use PostgreSQL RLS to prevent `DELETE` or `UPDATE` operations.
- SDK Safety: SDKs must fail-open (return default value) if the API is unreachable.
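
The fail-open requirement can be sketched as a client wrapper that never surfaces an error to the caller. The Phase 1 SDK is a Web SDK, so this Go version only illustrates the logic; the `Fetcher` type, the `Client` API, and the choice to prefer the last known value over the default are assumptions.

```go
package main

import (
	"errors"
	"sync"
)

// ErrUnreachable is a hypothetical error a Fetcher returns when the
// evaluation API cannot be reached.
var ErrUnreachable = errors.New("evaluation API unreachable")

// Fetcher retrieves the current value of a boolean flag from the API.
type Fetcher func(flagKey string) (bool, error)

// Client wraps a Fetcher with an in-memory cache of last known values.
type Client struct {
	mu    sync.RWMutex
	cache map[string]bool
	fetch Fetcher
}

// NewClient builds a Client around the given Fetcher.
func NewClient(fetch Fetcher) *Client {
	return &Client{cache: make(map[string]bool), fetch: fetch}
}

// BoolValue never returns an error. It prefers a fresh value, then the
// last known value, then the caller-supplied default.
func (c *Client) BoolValue(flagKey string, defaultValue bool) bool {
	value, err := c.fetch(flagKey)
	if err == nil {
		c.mu.Lock()
		c.cache[flagKey] = value
		c.mu.Unlock()
		return value
	}
	c.mu.RLock()
	cached, ok := c.cache[flagKey]
	c.mu.RUnlock()
	if ok {
		return cached
	}
	return defaultValue
}
```

A production SDK would more likely read from the cache first and refresh it in the background, so that evaluation never waits on the network; the synchronous fetch here keeps the fallback order easy to follow.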
12. Out of Scope
- Native Mobile SDKs (Swift/Kotlin) for Phase 1 (Web SDK only).
- Automatic flag cleanup (detecting unused code in GitHub).
- Advanced ML-driven automated rollbacks (manual kill-switch only for MVP).
13. Risks & Mitigations
- Risk: Flag evaluation adds latency to client apps.
  - Mitigation: Local in-memory caching in SDKs + SSE for background updates.
- Risk: Database load from millions of SDK calls.
  - Mitigation: Redis cache layer + CloudFront KeyValueStore for edge evaluation.
- Risk: Accidental production outages via misconfiguration.
  - Mitigation: Mandatory "Four-Eyes" approval and environment-scoped API keys.
14. Implementation Tasks
Phase 1: Project Setup & Core API
- [ ] Initialize Go 1.26 Backend with Fiber v3.
- [ ] Initialize React 19.2.1 Frontend with Tailwind 4.x.
- [ ] Configure Clerk for Multi-tenant Organization management.
- [ ] Set up PostgreSQL schema with `audit` schema isolation.
Phase 2: Flag Evaluation Engine
- [ ] Implement MurmurHash3 logic for percentage rollouts in Go.
- [ ] Build Rule Evaluation engine (Bexpr or RuleGo).
- [ ] Create `/v1/evaluate` high-concurrency endpoint.
- [ ] Implement Redis caching strategy for flag definitions.
Phase 3: Dashboard & Governance
- [ ] Build Flag Management UI (CRUD operations).
- [ ] Implement "Four-Eyes" approval workflow (Change Requests).
- [ ] Build Immutable Audit Log viewer with JSONB diffing.
- [ ] Integrate SSE for real-time dashboard updates.
Phase 4: Edge & SDKs
- [ ] Develop TypeScript Lightweight SDK with stale-while-revalidate caching.
- [ ] Deploy CloudFront Functions for Edge Evaluation logic.
- [ ] Implement Redis Pub/Sub to trigger SSE events on flag changes.
- [ ] Run end-to-end load tests to verify that P99 evaluation latency stays under 20ms.