Original Idea
DNS Propagation Checker API: A backend API that monitors DNS changes across global resolvers and webhooks you when propagation is complete.
Product Requirements Document: GlobalDNS Propagation Monitor API
1. Executive Summary
The GlobalDNS Propagation Monitor API is a high-concurrency, distributed backend service for infrastructure teams that need real-time verification of DNS record changes. Unlike manual "refresh-and-wait" tools, it uses a global network of 20+ regional resolver nodes to track propagation and triggers automated workflows via signed webhooks once a user-defined consistency threshold is met. It relies on durable execution for the job lifecycle and goroutine-based concurrency in the Go worker, targeting less than 5 seconds between global resolution and webhook delivery and 99.99% availability.
2. Problem Statement
DNS migrations and record updates are high-stakes operations where "stale" records lead to service downtime or security vulnerabilities. Developers currently lack a programmatic way to wait for global consistency. Manual polling via browser-based tools is inefficient for CI/CD pipelines, and existing APIs often lack geographic granularity, DNSSEC validation, or secure webhook delivery, leading to "guess-and-check" deployment strategies.
3. Goals & Success Metrics
- Accuracy: 100% parity between API-reported status and actual regional DNS resolution.
- Latency: Average time from global resolution to webhook trigger < 5 seconds.
- Scalability: Support 10,000+ concurrent "Watch Jobs" without performance degradation.
- Success Metric: 30% reduction in DNS-related incident duration for teams using the automated webhook trigger.
- Success Metric: Achieve a 95% "First-Time Success" rate for automated CI/CD rollouts using the API.
4. User Personas
- SRE/DevOps Engineer: Needs to automate the "verification" step in a Terraform or GitHub Actions pipeline.
- Cloud Architect: Needs to ensure global traffic is hitting new endpoints after a multi-region failover.
- Full-stack Developer: Needs an easy-to-integrate API to notify them when their personal site migration is complete.
5. User Stories
- As a DevOps Engineer, I want to specify a "target consistency" (e.g., 90% of nodes) so that my automated migration can proceed as soon as the majority of the world sees the new IP.
- As an SRE, I want to receive a signed HMAC webhook so that my internal services can trust the propagation signal and update load balancer weights.
- As a System Administrator, I want to monitor DNSSEC status during a key rollover to ensure I haven't broken resolution for validating clients.
6. Functional Requirements
6.1 Core Monitoring Engine
- Record Support: A, AAAA, CNAME, MX, TXT, NS, PTR, SOA.
- Global Resolution: Query from 20+ regions simultaneously (AWS/GCP/Edge nodes).
- Recursive Check: Option to query authoritative name servers directly (non-recursive) or to go through standard public recursive resolvers (8.8.8.8, 1.1.1.1).
- DNSSEC Validation: Cryptographic verification of RRSIGs and chain-of-trust walking.
6.2 Job Management API
- CRUD Operations: Create, Read, Pause, and Delete "Watch Jobs."
- Custom Polling: User-defined intervals (minimum 30 seconds).
- Persistence: Historical resolution logs stored for 30 days.
6.3 Notification System
- Webhooks: Outbound POST requests with Secure Webhook Token (SWT) headers.
- Integrations: Native Slack/PagerDuty alerts for job completion or timeouts.
7. Technical Requirements
7.1 Backend (Go 1.25.6 & Node.js 22+)
- DNS Worker (Go 1.25.6):
  - Use GOEXPERIMENT=greenteagc to enable the experimental Green Tea garbage collector and reduce GC overhead under high throughput.
  - Library: miekg/dns v2 for wire-format parsing.
  - Concurrency: Worker pool pattern using golang.org/x/sync/semaphore.
- API Orchestrator (Node.js 22+):
  - Framework: Fastify with native Web Streams.
  - Durable Execution: Inngest for managing the "Wait-Poll-Notify" lifecycle.
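The worker-pool item for the DNS Worker can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: it swaps golang.org/x/sync/semaphore for a buffered channel used as a counting semaphore so that it runs on the standard library alone, and `RegionResult`, `LookupFunc`, `QueryRegions`, and `demo` are hypothetical names. The DNS query itself is injected as a stub, so no network access is assumed.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// RegionResult is a hypothetical record of what one resolver node answered.
type RegionResult struct {
	Region string
	Value  string
	Err    error
}

// LookupFunc is a hypothetical hook that performs the DNS query for one region.
// A real worker would call its DNS library here; the sketch injects a stub.
type LookupFunc func(ctx context.Context, region, domain string) (string, error)

// QueryRegions starts one goroutine per region but lets at most maxInFlight
// lookups run at the same time. The buffered channel is the semaphore.
func QueryRegions(ctx context.Context, regions []string, domain string, maxInFlight int, lookup LookupFunc) []RegionResult {
	if maxInFlight < 1 {
		maxInFlight = 1
	}
	results := make([]RegionResult, len(regions))
	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup

	for i, region := range regions {
		wg.Add(1)
		go func(i int, region string) {
			defer wg.Done()
			// Acquire a slot, or give up if the context is cancelled first.
			select {
			case sem <- struct{}{}:
				defer func() { <-sem }()
			case <-ctx.Done():
				results[i] = RegionResult{Region: region, Err: ctx.Err()}
				return
			}
			value, err := lookup(ctx, region, domain)
			// Each goroutine writes only to its own index, so no lock is needed.
			results[i] = RegionResult{Region: region, Value: value, Err: err}
		}(i, region)
	}
	wg.Wait()
	return results
}

// demo runs the fan-out against a stub in which one region still lags behind.
func demo() []RegionResult {
	stub := func(_ context.Context, region, _ string) (string, error) {
		if region == "eu-central-1" {
			return "198.51.100.7", nil // old address (documentation range)
		}
		return "203.0.113.10", nil // new address (documentation range)
	}
	regions := []string{"us-east-1", "eu-central-1", "ap-southeast-1"}
	return QueryRegions(context.Background(), regions, "api.example.com", 2, stub)
}

func main() {
	for _, r := range demo() {
		fmt.Printf("%s -> %s (err=%v)\n", r.Region, r.Value, r.Err)
	}
}
```

Results come back in the same order as the input regions, which keeps the later consistency calculation simple; a weighted semaphore would serve the same purpose if individual lookups had different costs.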
7.2 Frontend (React 19.2+, Vite 6)
- Framework: React 19.2+ using useEffectEvent for stable WebSocket connections.
- Styling: Tailwind CSS 4 (Oxide Engine) with native container queries.
- State: Zustand for atomic updates of real-time monitoring cards.
7.3 Infrastructure & Database
- Database: PostgreSQL 18 with UUIDv7 primary keys and Range Partitioning on logs.
- Cache/Queue: Redis using ZSET for job scheduling and Streams for result ingestion.
- Networking: Anycast-native routing via AWS Global Accelerator.
8. Data Model
8.1 WatchJob (PostgreSQL)
- id: UUIDv7 (Primary Key)
- user_id: UUIDv7 (FK)
- domain: String (e.g., "api.example.com")
- record_type: Enum (A, MX, etc.)
- expected_value: String
- consistency_threshold: Integer (0-100)
- status: Enum (PENDING, ACTIVE, COMPLETED, FAILED)
8.2 ResolverNode (PostgreSQL)
- id: String (e.g., "aws-us-east-1")
- geo_region: String
- provider: String
- is_active: Boolean
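On the Go worker side, the two entities above might map to types along these lines. This is a sketch under assumptions: UUIDv7 values are held as plain strings to avoid assuming a UUID library, the `Validate` method and its error messages are hypothetical, and the accepted record types are copied from the Record Support list in section 6.1.

```go
package main

import (
	"errors"
	"fmt"
)

// JobStatus mirrors the status enum listed for WatchJob.
type JobStatus string

const (
	StatusPending   JobStatus = "PENDING"
	StatusActive    JobStatus = "ACTIVE"
	StatusCompleted JobStatus = "COMPLETED"
	StatusFailed    JobStatus = "FAILED"
)

// WatchJob mirrors the WatchJob fields in section 8.1.
type WatchJob struct {
	ID                   string // UUIDv7, kept as a string in this sketch
	UserID               string // UUIDv7 foreign key
	Domain               string
	RecordType           string
	ExpectedValue        string
	ConsistencyThreshold int // percentage, 0-100
	Status               JobStatus
}

// ResolverNode mirrors the ResolverNode fields in section 8.2.
type ResolverNode struct {
	ID        string // e.g. "aws-us-east-1"
	GeoRegion string
	Provider  string
	IsActive  bool
}

// supportedTypes is taken from the Record Support list in section 6.1.
var supportedTypes = map[string]bool{
	"A": true, "AAAA": true, "CNAME": true, "MX": true,
	"TXT": true, "NS": true, "PTR": true, "SOA": true,
}

// Validate is a hypothetical check applied before a job is persisted.
func (j WatchJob) Validate() error {
	switch {
	case j.Domain == "":
		return errors.New("domain is required")
	case !supportedTypes[j.RecordType]:
		return fmt.Errorf("unsupported record type %q", j.RecordType)
	case j.ExpectedValue == "":
		return errors.New("expected_value is required")
	case j.ConsistencyThreshold < 0 || j.ConsistencyThreshold > 100:
		return fmt.Errorf("consistency_threshold %d is outside 0-100", j.ConsistencyThreshold)
	}
	return nil
}

func main() {
	job := WatchJob{
		Domain:               "api.example.com",
		RecordType:           "A",
		ExpectedValue:        "203.0.113.10",
		ConsistencyThreshold: 90,
		Status:               StatusPending,
	}
	fmt.Println("valid:", job.Validate() == nil)
}
```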
9. API Specification
9.1 Create a Watch Job
POST /v1/jobs
{
"domain": "google.com",
"type": "A",
"expected": "142.250.190.46",
"threshold": 90,
"webhook_url": "https://hooks.myapi.com/dns-updates",
"webhook_secret": "whsec_..."
}
Response: 201 Created with job_id.
9.2 Get Job Status
GET /v1/jobs/{id}
Response:
{
"job_id": "018e1234-...",
"global_consistency": 85,
"regional_results": [
{ "region": "us-east-1", "value": "142.250.190.46", "status": "MATCH" },
{ "region": "eu-central-1", "value": "1.1.1.1", "status": "MISMATCH" }
]
}
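One way the global_consistency figure in the response above could be derived from the regional results is sketched here. The rules are assumptions rather than part of the specification: a region counts as a match only on an exact string comparison with the expected value, the percentage is rounded down, and a job is considered complete once the percentage reaches the job's threshold. `Evaluate` and `ThresholdMet` are hypothetical names.

```go
package main

import "fmt"

// RegionalResult mirrors one entry of regional_results in the status response.
type RegionalResult struct {
	Region string `json:"region"`
	Value  string `json:"value"`
	Status string `json:"status"` // "MATCH" or "MISMATCH"
}

// Evaluate sets Status on every result and returns the share of matching
// regions as an integer percentage, rounded down. An empty slice yields 0.
func Evaluate(results []RegionalResult, expected string) int {
	if len(results) == 0 {
		return 0
	}
	matches := 0
	for i := range results {
		if results[i].Value == expected {
			results[i].Status = "MATCH"
			matches++
		} else {
			results[i].Status = "MISMATCH"
		}
	}
	return matches * 100 / len(results)
}

// ThresholdMet reports whether the observed consistency satisfies the job.
func ThresholdMet(consistency, threshold int) bool {
	return consistency >= threshold
}

func main() {
	expected := "142.250.190.46"
	results := make([]RegionalResult, 0, 20)
	for i := 0; i < 20; i++ {
		value := expected
		if i >= 17 {
			value = "1.1.1.1" // three regions still return a different answer
		}
		results = append(results, RegionalResult{Region: fmt.Sprintf("region-%02d", i), Value: value})
	}
	consistency := Evaluate(results, expected)
	// 17 of 20 regions match, so this prints: global_consistency=85 threshold_met=false
	fmt.Printf("global_consistency=%d threshold_met=%t\n", consistency, ThresholdMet(consistency, 90))
}
```

Record types that return several values (MX, NS, TXT) would need a set comparison instead of a single string comparison; that case is left out of the sketch.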
10. UI/UX Requirements
- Live Propagation Map: A 3D globe (using Three.js) showing real-time pings from 20+ nodes.
- Data Islands: Individual widgets for DNSSEC status, TTL countdown, and Latency charts.
- Real-time Log Stream: A terminal-style output showing raw DNS responses as they arrive.
- Theme: "Obsidian Dark" by default, utilizing Tailwind 4's P3 color palette for high-contrast status indicators.
11. Non-Functional Requirements
- Security: HMAC-SHA256 signatures for webhooks; signed timestamps outside a 5-minute tolerance window are rejected to limit replay attacks.
- Performance: API response time (99th percentile) < 150ms.
- Availability: 99.99% via multi-cloud (AWS + GCP) deployment.
- Compliance: GDPR-compliant audit logs for all DNS queries.
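The webhook security requirement can be illustrated from the receiver's point of view. The sketch below assumes a signing scheme that the document does not specify: the signature is the hex-encoded HMAC-SHA256 of `<unix timestamp>.<raw body>`, and the timestamp travels alongside the signature. `Sign`, `Verify`, and the secret value are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"time"
)

// toleranceSeconds is the 5-minute window from the security requirement.
const toleranceSeconds = 300

// mac computes HMAC-SHA256 over "<timestamp>.<body>" (an assumed payload layout).
func mac(secret string, timestamp int64, body []byte) []byte {
	h := hmac.New(sha256.New, []byte(secret))
	h.Write([]byte(strconv.FormatInt(timestamp, 10)))
	h.Write([]byte("."))
	h.Write(body)
	return h.Sum(nil)
}

// Sign returns the hex-encoded signature the sender would attach.
func Sign(secret string, timestamp int64, body []byte) string {
	return hex.EncodeToString(mac(secret, timestamp, body))
}

// Verify rejects timestamps outside the tolerance window, then compares the
// signature in constant time. Timestamps and now are Unix seconds.
func Verify(secret string, timestamp, now int64, body []byte, signature string) bool {
	age := now - timestamp
	if age < 0 {
		age = -age
	}
	if age > toleranceSeconds {
		return false
	}
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	return hmac.Equal(got, mac(secret, timestamp, body))
}

func main() {
	secret := "whsec_example" // hypothetical secret for illustration only
	body := []byte(`{"status":"COMPLETED"}`)
	sentAt := time.Now().Unix()
	sig := Sign(secret, sentAt, body)

	fmt.Println("fresh delivery accepted:", Verify(secret, sentAt, sentAt+10, body, sig))
	fmt.Println("replayed after 10 minutes accepted:", Verify(secret, sentAt, sentAt+600, body, sig))
}
```

Including the timestamp in the signed payload is what makes the window meaningful: an attacker who replays an old delivery cannot refresh the timestamp without invalidating the signature.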
12. Out of Scope
- DNS Registrar services (we do not sell domains).
- Managed DNS Hosting (we do not host zones).
- General Website Uptime Monitoring (HTTP checks).
13. Risks & Mitigations
- Risk: DNS Amplification attacks using our API.
- Mitigation: Strict rate limiting per API key and truncation of responses > 512 bytes.
- Risk: Root KSK Rollover (Oct 2026).
- Mitigation: Use gopkg.in/n.v0/dnssec/trust to automate trust anchor updates.
- Risk: High infrastructure costs for 20+ regions.
- Mitigation: Use Lambda/GCF serverless nodes that only spin up during active polling intervals.
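The per-API-key rate limiting named as the amplification mitigation could take the form of a token bucket per key. The sketch below is one possible shape, not the chosen design: `KeyLimiter`, its parameters, and the in-memory map are hypothetical, and a multi-node deployment would need shared state (for example in Redis) rather than process memory. The clock is passed in explicitly so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// bucket holds the remaining tokens for one API key.
type bucket struct {
	tokens float64
	last   time.Time
}

// KeyLimiter is a hypothetical in-memory token bucket keyed by API key.
type KeyLimiter struct {
	mu      sync.Mutex
	rate    float64 // tokens added per second
	burst   float64 // maximum tokens a key can hold
	buckets map[string]*bucket
}

func NewKeyLimiter(ratePerSecond float64, burst int) *KeyLimiter {
	return &KeyLimiter{rate: ratePerSecond, burst: float64(burst), buckets: make(map[string]*bucket)}
}

// Allow reports whether apiKey may make one request at time now.
func (l *KeyLimiter) Allow(apiKey string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	b, ok := l.buckets[apiKey]
	if !ok {
		// A new key starts with a full bucket.
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[apiKey] = b
	}
	// Refill in proportion to the time elapsed, capped at the burst size.
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(l.burst, b.tokens+elapsed*l.rate)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// demo sends five requests at once, waits two seconds, then sends three more.
func demo() []bool {
	limiter := NewKeyLimiter(1, 3) // 1 request per second, burst of 3
	start := time.Unix(0, 0)
	var out []bool
	for i := 0; i < 5; i++ {
		out = append(out, limiter.Allow("key-a", start))
	}
	later := start.Add(2 * time.Second)
	for i := 0; i < 3; i++ {
		out = append(out, limiter.Allow("key-a", later))
	}
	return out
}

func main() {
	// Prints: [true true true false false true true false]
	fmt.Println(demo())
}
```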
14. Implementation Tasks
Phase 1: Project Setup & Core Worker
- [ ] Initialize Go worker project with Go 1.25.6.
- [ ] Implement DNS query logic using miekg/dns v2.
- [ ] Set up PostgreSQL 18 with UUIDv7 support and asynchronous I/O (io_method = worker or io_uring).
- [ ] Configure Redis with ZSET for the primary scheduling loop.
Phase 2: Orchestration & API
- [ ] Initialize Node.js 22 Fastify project with ESM.
- [ ] Integrate Inngest for durable webhook retry logic.
- [ ] Implement HMAC-SHA256 signing using the Web Crypto API.
- [ ] Build the /v1/jobs CRUD endpoints.
Phase 3: Frontend & Real-time UI
- [ ] Scaffold React 19.2 / Vite 6 project.
- [ ] Configure Tailwind CSS 4 with Oxide engine.
- [ ] Build the Global Map component using useEffectEvent for WebSocket data.
- [ ] Implement Dashboard "Data Islands" with Suspense boundaries.
Phase 4: Security & DevOps
- [ ] Deploy regional worker nodes to 20+ AWS/GCP regions using Terraform.
- [ ] Implement Anycast routing via AWS Global Accelerator.
- [ ] Set up automated Root KSK trust anchor management.
- [ ] Finalize rate-limiting logic to prevent DNS amplification.