GlobalDNS Propagation Monitor API

Original Idea

DNS Propagation Checker API: a backend API that monitors DNS changes across global resolvers and sends a webhook when propagation is complete.

Product Requirements Document: GlobalDNS Propagation Monitor API

1. Executive Summary

The GlobalDNS Propagation Monitor API is a high-concurrency, distributed backend service for infrastructure teams that need real-time verification of DNS record changes. Unlike manual "refresh-and-wait" tools, it tracks propagation from a global network of 20+ regional resolver nodes and triggers automated workflows through signed webhooks once a user-defined consistency threshold is met. It relies on durable execution for the job lifecycle and goroutine-based concurrency in the Go worker, targeting a 99th-percentile API response time under 150 ms, webhook delivery within 5 seconds of global resolution, and 99.99% availability.

2. Problem Statement

DNS migrations and record updates are high-stakes operations where "stale" records lead to service downtime or security vulnerabilities. Developers currently lack a programmatic way to wait for global consistency. Manual polling via browser-based tools is inefficient for CI/CD pipelines, and existing APIs often lack geographic granularity, DNSSEC validation, or secure webhook delivery, leading to "guess-and-check" deployment strategies.

3. Goals & Success Metrics

  • Accuracy: 100% parity between API-reported status and actual regional DNS resolution.
  • Latency: Average time from global resolution to webhook trigger < 5 seconds.
  • Scalability: Support 10,000+ concurrent "Watch Jobs" without performance degradation.
  • Success Metric: 30% reduction in DNS-related incident duration for teams using the automated webhook trigger.
  • Success Metric: Achieve a 95% "First-Time Success" rate for automated CI/CD rollouts using the API.

4. User Personas

  • SRE/DevOps Engineer: Needs to automate the "verification" step in a Terraform or GitHub Actions pipeline.
  • Cloud Architect: Needs to ensure global traffic is hitting new endpoints after a multi-region failover.
  • Full-stack Developer: Needs an easy-to-integrate API to notify them when their personal site migration is complete.

5. User Stories

  • As a DevOps Engineer, I want to specify a "target consistency" (e.g., 90% of nodes) so that my automated migration can proceed as soon as the majority of the world sees the new IP.
  • As an SRE, I want to receive a signed HMAC webhook so that my internal services can trust the propagation signal and update load balancer weights.
  • As a System Administrator, I want to monitor DNSSEC status during a key rollover to ensure I haven't broken resolution for validating clients.

6. Functional Requirements

6.1 Core Monitoring Engine

  • Record Support: A, AAAA, CNAME, MX, TXT, NS, PTR, SOA.
  • Global Resolution: Query from 20+ regions simultaneously (AWS/GCP/Edge nodes).
  • Resolution Mode: Option to query the zone's authoritative servers directly (non-recursive) or standard public recursive resolvers (e.g., 8.8.8.8, 1.1.1.1).
  • DNSSEC Validation: Cryptographic verification of RRSIGs and chain-of-trust walking.
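
Deciding whether a resolver's answer counts as propagated requires comparing record values in a canonical form, since the same IPv6 address or host name can be written several ways. The following is a minimal sketch under that assumption; the helper names `normalize` and `Matches` are hypothetical and not part of this specification, and TXT and SOA values are compared verbatim for simplicity.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// normalize returns a canonical form of a record value so that equivalent
// answers compare equal. Hypothetical helper, not part of the spec.
func normalize(recordType, value string) string {
	v := strings.TrimSpace(value)
	switch strings.ToUpper(recordType) {
	case "A", "AAAA":
		// Parse so that different spellings of one address compare equal.
		if addr, err := netip.ParseAddr(v); err == nil {
			return addr.String()
		}
		return v
	case "CNAME", "NS", "MX", "PTR":
		// Host names are case-insensitive; drop the trailing root dot.
		return strings.ToLower(strings.TrimSuffix(v, "."))
	default:
		// TXT and SOA are compared verbatim in this sketch.
		return v
	}
}

// Matches reports whether any answer from a resolver equals the expected value.
func Matches(recordType, expected string, answers []string) bool {
	want := normalize(recordType, expected)
	for _, a := range answers {
		if normalize(recordType, a) == want {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(Matches("AAAA", "2001:db8::1", []string{"2001:DB8:0:0:0:0:0:1"}))
	fmt.Println(Matches("CNAME", "edge.example.net", []string{"Edge.Example.NET."}))
	fmt.Println(Matches("A", "192.0.2.10", []string{"192.0.2.11"}))
}
```

Whether one matching answer in a multi-value record set is sufficient, as assumed here, is a product decision the requirements leave open.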

6.2 Job Management API

  • CRUD Operations: Create, Read, Pause, and Delete "Watch Jobs."
  • Custom Polling: User-defined intervals (minimum 30 seconds).
  • Persistence: Historical resolution logs stored for 30 days.
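
The polling rule above can be sketched as a small loop that enforces the 30-second floor and stops on success, error, or timeout. This is an illustrative sketch only: `CheckFunc`, `ClampInterval`, and `PollUntil` are hypothetical names, and in the planned architecture the durable-execution layer would own this loop.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// MinPollInterval mirrors the 30-second minimum stated above.
const MinPollInterval = 30 * time.Second

// ClampInterval raises any shorter user-supplied interval to the minimum.
func ClampInterval(d time.Duration) time.Duration {
	if d < MinPollInterval {
		return MinPollInterval
	}
	return d
}

// CheckFunc reports whether propagation is complete. Hypothetical signature.
type CheckFunc func(ctx context.Context) (bool, error)

// PollUntil runs check immediately and then once per interval until it
// returns true, returns an error, or ctx is cancelled (the job timeout).
// It returns the number of checks performed. interval must be positive.
func PollUntil(ctx context.Context, interval time.Duration, check CheckFunc) (int, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for attempts := 1; ; attempts++ {
		done, err := check(ctx)
		if err != nil {
			return attempts, err
		}
		if done {
			return attempts, nil
		}
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-ticker.C:
		}
	}
}

// runDemo polls a fake check that succeeds on its third call.
// A millisecond interval is used only to keep the demo fast.
func runDemo() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	calls := 0
	return PollUntil(ctx, 10*time.Millisecond, func(context.Context) (bool, error) {
		calls++
		return calls >= 3, nil
	})
}

func main() {
	attempts, err := runDemo()
	fmt.Println("attempts:", attempts, "err:", err)
	fmt.Println("effective interval for a 5s request:", ClampInterval(5*time.Second))
}
```

In production the caller would pass `ClampInterval(userInterval)` as the interval, so that no job polls more often than every 30 seconds.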

6.3 Notification System

  • Webhooks: Outbound POST requests signed with HMAC-SHA256 using the job's webhook_secret, with the signature carried in a request header.
  • Integrations: Native Slack/PagerDuty alerts for job completion or timeouts.

7. Technical Requirements

7.1 Backend (Go 1.25.6 & Node.js 22+)

  • DNS Worker (Go 1.25.6):
    • Build with GOEXPERIMENT=greenteagc to enable the experimental Green Tea garbage collector and reduce GC overhead under high throughput.
    • Library: miekg/dns v2 for wire-format parsing.
    • Concurrency: Worker pool pattern using golang.org/x/sync/semaphore.
  • API Orchestrator (Node.js 22+):
    • Framework: Fastify with native Web Streams.
    • Durable Execution: Inngest for managing the "Wait-Poll-Notify" lifecycle.
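
The worker-pool requirement amounts to a fan-out across resolver nodes with a cap on in-flight queries. The sketch below shows that pattern using a buffered channel as a counting semaphore, a standard-library stand-in for golang.org/x/sync/semaphore chosen so the example has no external dependencies. `LookupFunc`, `Result`, `QueryAll`, and the node names are hypothetical; a real worker would issue the DNS query inside the lookup function.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// LookupFunc resolves domain via one resolver node. Hypothetical signature.
type LookupFunc func(ctx context.Context, node, domain string) (string, error)

// Result is one node's answer.
type Result struct {
	Node  string
	Value string
	Err   error
}

// QueryAll queries every node, with at most maxInFlight lookups running at once.
// Results are returned in the same order as nodes.
func QueryAll(ctx context.Context, nodes []string, domain string, maxInFlight int, lookup LookupFunc) []Result {
	if maxInFlight < 1 {
		maxInFlight = 1
	}
	results := make([]Result, len(nodes))
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup
	for i, node := range nodes {
		wg.Add(1)
		sem <- struct{}{} // acquire; blocks while maxInFlight lookups are running
		go func(i int, node string) {
			defer wg.Done()
			defer func() { <-sem }() // release
			v, err := lookup(ctx, node, domain)
			results[i] = Result{Node: node, Value: v, Err: err} // each goroutine owns one slot
		}(i, node)
	}
	wg.Wait()
	return results
}

// fakeLookup stands in for a real DNS query so the sketch runs offline.
func fakeLookup(_ context.Context, node, _ string) (string, error) {
	if node == "gcp-europe-west1" {
		return "198.51.100.7", nil // this node still serves the old record
	}
	return "192.0.2.10", nil
}

func runDemo() []Result {
	nodes := []string{"aws-us-east-1", "aws-eu-central-1", "gcp-europe-west1"}
	return QueryAll(context.Background(), nodes, "api.example.com", 2, fakeLookup)
}

func main() {
	for _, r := range runDemo() {
		fmt.Printf("%s -> %s (err=%v)\n", r.Node, r.Value, r.Err)
	}
}
```

Switching to the weighted semaphore from x/sync would mainly add context-aware acquisition, which matters once lookups can be cancelled mid-queue.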

7.2 Frontend (React 19.2+, Vite 6)

  • Framework: React 19.2+ using useEffectEvent so WebSocket message handlers read the latest state without re-running the connection Effect.
  • Styling: Tailwind CSS 4 (Oxide Engine) with native container queries.
  • State: Zustand for atomic updates of real-time monitoring cards.

7.3 Infrastructure & Database

  • Database: PostgreSQL 18 with UUIDv7 primary keys and Range Partitioning on logs.
  • Cache/Queue: Redis using ZSET for job scheduling and Streams for result ingestion.
  • Networking: Anycast-native routing via AWS Global Accelerator.

8. Data Model

8.1 WatchJob (PostgreSQL)

  • id: UUIDv7 (Primary Key)
  • user_id: UUIDv7 (FK)
  • domain: String (e.g., "api.example.com")
  • record_type: Enum (A, MX, etc.)
  • expected_value: String
  • consistency_threshold: Integer (0-100; percentage of resolver nodes that must return expected_value)
  • status: Enum (PENDING, ACTIVE, COMPLETED, FAILED)
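
As an illustration of how the WatchJob fields might map onto the Go worker, here is a minimal sketch of the record with basic validation. The struct layout and the `Validate` method are assumptions rather than a prescribed schema; the allowed record types are taken from section 6.1.

```go
package main

import (
	"errors"
	"fmt"
)

// JobStatus mirrors the status enum listed above.
type JobStatus string

const (
	StatusPending   JobStatus = "PENDING"
	StatusActive    JobStatus = "ACTIVE"
	StatusCompleted JobStatus = "COMPLETED"
	StatusFailed    JobStatus = "FAILED"
)

// WatchJob mirrors the data model fields; IDs are UUIDv7 strings.
type WatchJob struct {
	ID                   string
	UserID               string
	Domain               string
	RecordType           string
	ExpectedValue        string
	ConsistencyThreshold int // percent, 0-100
	Status               JobStatus
}

// supportedTypes lists the record types from section 6.1.
var supportedTypes = map[string]bool{
	"A": true, "AAAA": true, "CNAME": true, "MX": true,
	"TXT": true, "NS": true, "PTR": true, "SOA": true,
}

// Validate checks the fields a job needs before it can be scheduled.
func (j WatchJob) Validate() error {
	switch {
	case j.Domain == "":
		return errors.New("domain is required")
	case !supportedTypes[j.RecordType]:
		return fmt.Errorf("unsupported record type %q", j.RecordType)
	case j.ExpectedValue == "":
		return errors.New("expected_value is required")
	case j.ConsistencyThreshold < 0 || j.ConsistencyThreshold > 100:
		return fmt.Errorf("consistency_threshold %d outside 0-100", j.ConsistencyThreshold)
	}
	return nil
}

func main() {
	job := WatchJob{
		Domain:               "api.example.com",
		RecordType:           "A",
		ExpectedValue:        "192.0.2.10",
		ConsistencyThreshold: 90,
		Status:               StatusPending,
	}
	fmt.Println("valid job:", job.Validate())
	job.ConsistencyThreshold = 150
	fmt.Println("bad threshold:", job.Validate())
}
```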

8.2 ResolverNode (PostgreSQL)

  • id: String (e.g., "aws-us-east-1")
  • geo_region: String
  • provider: String
  • is_active: Boolean

9. API Specification

9.1 Create a Watch Job

POST /v1/jobs

{
  "domain": "google.com",
  "type": "A",
  "expected": "142.250.190.46",
  "threshold": 90,
  "webhook_url": "https://hooks.myapi.com/dns-updates",
  "webhook_secret": "whsec_..."
}

Response: 201 Created with job_id.
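
A client call to this endpoint could be built as follows. This is a sketch under assumptions: the base URL is a placeholder, and the Bearer-token Authorization header is hypothetical, since this document mentions API keys but does not define the authentication scheme.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// CreateJobRequest mirrors the JSON body shown above.
type CreateJobRequest struct {
	Domain        string `json:"domain"`
	Type          string `json:"type"`
	Expected      string `json:"expected"`
	Threshold     int    `json:"threshold"`
	WebhookURL    string `json:"webhook_url"`
	WebhookSecret string `json:"webhook_secret"`
}

// NewCreateJobRequest builds the POST /v1/jobs request without sending it.
func NewCreateJobRequest(baseURL, apiKey string, job CreateJobRequest) (*http.Request, error) {
	body, err := json.Marshal(job)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/jobs", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Assumption: the auth scheme is not specified in this document.
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	req, err := NewCreateJobRequest("https://api.example.invalid", "test-key", CreateJobRequest{
		Domain:        "api.example.com",
		Type:          "A",
		Expected:      "192.0.2.10",
		Threshold:     90,
		WebhookURL:    "https://hooks.example.invalid/dns-updates",
		WebhookSecret: "whsec_example",
	})
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do` would send it; a 201 status with a job_id in the body indicates success.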

9.2 Get Job Status

GET /v1/jobs/{id}

Response:

{
  "job_id": "018e1234-...",
  "global_consistency": 85,
  "regional_results": [
    { "region": "us-east-1", "value": "142.250.190.46", "status": "MATCH" },
    { "region": "eu-central-1", "value": "1.1.1.1", "status": "MISMATCH" }
  ]
}
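
A consumer of this response will typically decode it and compare the share of matching regions against the job's threshold. The sketch below assumes that global_consistency is the percentage of regional results with status MATCH, rounded down; the document does not define the formula, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RegionalResult mirrors one entry of regional_results.
type RegionalResult struct {
	Region string `json:"region"`
	Value  string `json:"value"`
	Status string `json:"status"`
}

// JobStatusResponse mirrors the response body shown above.
type JobStatusResponse struct {
	JobID             string           `json:"job_id"`
	GlobalConsistency int              `json:"global_consistency"`
	RegionalResults   []RegionalResult `json:"regional_results"`
}

// ParseStatus decodes a GET /v1/jobs/{id} response body.
func ParseStatus(data []byte) (JobStatusResponse, error) {
	var resp JobStatusResponse
	err := json.Unmarshal(data, &resp)
	return resp, err
}

// Consistency returns the percentage of results with status MATCH, rounded down.
// Assumption: this is how global_consistency is derived.
func Consistency(results []RegionalResult) int {
	if len(results) == 0 {
		return 0
	}
	matches := 0
	for _, r := range results {
		if r.Status == "MATCH" {
			matches++
		}
	}
	return matches * 100 / len(results)
}

// ThresholdMet reports whether the computed consistency reaches the threshold.
func ThresholdMet(results []RegionalResult, threshold int) bool {
	return Consistency(results) >= threshold
}

func main() {
	body := []byte(`{
	  "job_id": "job-1",
	  "global_consistency": 50,
	  "regional_results": [
	    {"region": "us-east-1", "value": "192.0.2.10", "status": "MATCH"},
	    {"region": "eu-central-1", "value": "198.51.100.7", "status": "MISMATCH"}
	  ]
	}`)
	resp, err := ParseStatus(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("consistency:", Consistency(resp.RegionalResults))
	fmt.Println("meets 90% threshold:", ThresholdMet(resp.RegionalResults, 90))
}
```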

10. UI/UX Requirements

  • Live Propagation Map: A 3D globe (using Three.js) showing real-time pings from 20+ nodes.
  • Data Islands: Individual widgets for DNSSEC status, TTL countdown, and Latency charts.
  • Real-time Log Stream: A terminal-style output showing raw DNS responses as they arrive.
  • Theme: "Obsidian Dark" by default, utilizing Tailwind 4's P3 color palette for high-contrast status indicators.

11. Non-Functional Requirements

  • Security: HMAC-SHA256 signatures for webhooks, with a 5-minute timestamp tolerance window to limit replay attacks.
  • Performance: API response time (99th percentile) < 150ms.
  • Availability: 99.99% via multi-cloud (AWS + GCP) deployment.
  • Compliance: GDPR-compliant audit logs for all DNS queries.

12. Out of Scope

  • DNS Registrar services (we do not sell domains).
  • Managed DNS Hosting (we do not host zones).
  • General Website Uptime Monitoring (HTTP checks).

13. Risks & Mitigations

  • Risk: DNS Amplification attacks using our API.
    • Mitigation: Strict rate limiting per API key and truncation of responses > 512 bytes.
  • Risk: Root KSK Rollover (Oct 2026).
    • Mitigation: Automate trust anchor updates (RFC 5011), e.g., via gopkg.in/n.v0/dnssec/trust.
  • Risk: High infrastructure costs for 20+ regions.
    • Mitigation: Use serverless nodes (AWS Lambda / Google Cloud Functions) that only spin up during active polling intervals.
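
The per-key rate limiting named in the first mitigation could take the form of a token bucket kept per API key. The following is a minimal in-memory sketch; the `KeyLimiter` type, the rate, and the burst size are illustrative assumptions, and a multi-instance deployment would need shared state (for example in Redis) rather than a local map.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// bucket holds the remaining tokens for one API key.
type bucket struct {
	tokens float64
	last   time.Time
}

// KeyLimiter is a token-bucket limiter keyed by API key.
type KeyLimiter struct {
	mu      sync.Mutex
	rate    float64 // tokens added per second
	burst   float64 // maximum tokens a bucket can hold
	buckets map[string]*bucket
}

// NewKeyLimiter creates a limiter allowing rate requests per second per key,
// with bursts of up to burst requests.
func NewKeyLimiter(rate float64, burst int) *KeyLimiter {
	return &KeyLimiter{rate: rate, burst: float64(burst), buckets: make(map[string]*bucket)}
}

// Allow reports whether a request for key at time now is within the limit.
func (l *KeyLimiter) Allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.burst, last: now} // new keys start with a full bucket
		l.buckets[key] = b
	}
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(l.burst, b.tokens+elapsed*l.rate) // refill
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// runDemo exercises one key past its burst, then after a refill, then a second key.
func runDemo() []bool {
	l := NewKeyLimiter(1, 2) // illustrative: 1 request/second, burst of 2
	t0 := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	return []bool{
		l.Allow("key-a", t0),
		l.Allow("key-a", t0),
		l.Allow("key-a", t0), // burst exhausted
		l.Allow("key-a", t0.Add(time.Second)), // one token refilled
		l.Allow("key-b", t0), // other keys are unaffected
	}
}

func main() {
	fmt.Println(runDemo())
}
```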

14. Implementation Tasks

Phase 1: Project Setup & Core Worker

  • [ ] Initialize Go worker project with Go 1.25.6.
  • [ ] Implement DNS query logic using miekg/dns v2.
  • [ ] Set up PostgreSQL 18 with UUIDv7 support and asynchronous I/O (io_method = worker, or io_uring on Linux).
  • [ ] Configure Redis with ZSET for the primary scheduling loop.

Phase 2: Orchestration & API

  • [ ] Initialize Node.js 22 Fastify project with ESM.
  • [ ] Integrate Inngest for durable webhook retry logic.
  • [ ] Implement HMAC-SHA256 signing using the Web Crypto API.
  • [ ] Build the /v1/jobs CRUD endpoints.

Phase 3: Frontend & Real-time UI

  • [ ] Scaffold React 19.2 / Vite 6 project.
  • [ ] Configure Tailwind CSS 4 with Oxide engine.
  • [ ] Build the Global Map component using useEffectEvent for WebSocket data.
  • [ ] Implement Dashboard "Data Islands" with Suspense boundaries.

Phase 4: Security & DevOps

  • [ ] Deploy regional worker nodes to 20+ AWS/GCP regions using Terraform.
  • [ ] Implement Anycast routing via AWS Global Accelerator.
  • [ ] Set up automated Root KSK trust anchor management.
  • [ ] Finalize rate-limiting logic to prevent DNS amplification.