VerifyPro Contractor API

Developer

Original Idea

Contractor License Verifier API: a backend API that checks contractor license status, insurance, and complaint history across state databases.

Product Requirements Document: VerifyPro Contractor API

1. Executive Summary

VerifyPro is a high-performance, developer-first API designed to automate the fragmented and manual process of verifying contractor licenses, insurance coverage, and disciplinary histories. By orchestrating real-time scrapes of state-level databases and utilizing AI-driven document extraction, VerifyPro provides insurance, real estate, and construction platforms with a single, standardized source of truth for contractor compliance.

2. Problem Statement

Contractor verification is currently a "fragmented data" nightmare. There are over 50 state-level licensing boards in the U.S., each with disparate websites, zero standardized APIs, and varying anti-bot protections. Businesses (InsurTech, PropTech) currently hire manual "compliance officers" to click through these portals or build fragile scrapers that break weekly. This leads to high administrative costs, increased liability from expired policies, and slow onboarding for subcontractors.

3. Goals & Success Metrics

  • Accuracy: Maintain a >99.8% accuracy rate for data extraction compared to manual portal lookups.
  • Latency: Achieve <2 seconds response time for cached lookups and <30 seconds end-to-end completion time for live (uncached) asynchronous "Deep Verification" jobs.
  • Coverage: Support 45+ U.S. states within the first 12 months.
  • Developer Experience: Achieve a "Time to First Hello World" (first successful API call) of under 5 minutes via the Tyk-powered developer portal.

4. User Personas

  • Devin (Software Engineer at PropTech Startup): Needs a reliable JSON output to automate the "Onboard Subcontractor" flow in their app.
  • Sarah (Insurance Underwriter): Needs a white-labeled PDF report to prove a contractor was compliant at the time a policy was issued.
  • Compliance Carlos (General Contractor): Needs automated alerts (webhooks) when a subcontractor's license is suspended or insurance expires.

5. User Stories

  • As a Developer, I want a unified JSON schema for all 50 states so that I don't have to write custom logic for different state board formats.
  • As an Underwriter, I want to verify the specific limits of a Workers’ Comp policy so that I can ensure the contractor meets our minimum liability requirements.
  • As a Marketplace Operator, I want to receive a webhook notification the moment a contractor’s license status changes from "Active" to "Expired", "Suspended", or "Revoked" so that I can temporarily delist them.
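The webhook alerts in Carlos's persona and the marketplace story depend on detecting when a license's normalized status changes between two verifications. A minimal sketch, assuming the previous status is stored from the last verification and treating any move out of Active (to Expired, Suspended, or Revoked, the statuses in the data model) as the trigger. The event fields and the `license.status_changed` event name are hypothetical.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class LicenseStatus(str, Enum):
    ACTIVE = "Active"
    EXPIRED = "Expired"
    SUSPENDED = "Suspended"
    REVOKED = "Revoked"


@dataclass
class StatusChangeEvent:
    event_id: str          # hypothetical: would also be sent as X-VerifyPro-Event-ID
    event_type: str        # hypothetical event name
    license_number: str
    state: str
    previous_status: LicenseStatus
    current_status: LicenseStatus
    detected_at: str


def detect_status_change(
    license_number: str,
    state: str,
    previous: Optional[LicenseStatus],
    current: LicenseStatus,
) -> Optional[StatusChangeEvent]:
    """Return an event only when a previously Active license leaves Active."""
    # First sighting (None), no change, or a non-Active starting point: nothing to send.
    if previous is not LicenseStatus.ACTIVE or current is LicenseStatus.ACTIVE:
        return None
    return StatusChangeEvent(
        event_id=str(uuid.uuid4()),
        event_type="license.status_changed",
        license_number=license_number,
        state=state,
        previous_status=previous,
        current_status=current,
        detected_at=datetime.now(timezone.utc).isoformat(),
    )


event = detect_status_change("123456", "CA", LicenseStatus.ACTIVE, LicenseStatus.SUSPENDED)
print(event.event_type, event.current_status.value)  # license.status_changed Suspended
```

Reactivations (for example Expired back to Active) return `None` here; a relisting flow would need its own event type.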

6. Functional Requirements

  • Unified Lookup Engine: A single endpoint accepting license_number and state (two-letter state code) to return normalized contractor data.
  • Deep Insurance Verification: Automated extraction of policy numbers, carriers, and coverage limits from state insurance registries.
  • Asynchronous Processing: Support for long-running state portal queries using a "Submit-and-Poll" or Webhook pattern.
  • Audit Logging: Every verification must generate a unique audit_id with a hash of the raw source data for legal defensibility.
  • White-Label Reports: Generation of branded PDF compliance certificates via the API.
  • Developer Dashboard: A self-service portal for API key management, usage analytics, and billing.
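The audit_id requirement can be met by hashing the raw portal response before any parsing, so later parser changes cannot alter the recorded digest. A minimal sketch, assuming SHA-256 (the hash the data model already names) and a random UUID for audit_id; `AuditRecord`, the helper names, and the example URL are hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    audit_id: str
    source_url: str
    source_sha256: str   # hex digest of the exact bytes fetched from the portal
    captured_at: str


def make_audit_record(source_url: str, raw_source: bytes) -> AuditRecord:
    """Hash the raw portal response before any parsing or normalization."""
    return AuditRecord(
        audit_id=str(uuid.uuid4()),
        source_url=source_url,
        source_sha256=hashlib.sha256(raw_source).hexdigest(),
        captured_at=datetime.now(timezone.utc).isoformat(),
    )


def matches_archived_capture(record: AuditRecord, archived_bytes: bytes) -> bool:
    """Re-hash an archived capture to show it is byte-for-byte what was verified."""
    return hashlib.sha256(archived_bytes).hexdigest() == record.source_sha256


html = b"<html>License 123456: ACTIVE</html>"
rec = make_audit_record("https://state-portal.example/lookup?license=123456", html)
print(matches_archived_capture(rec, html))         # True
print(matches_archived_capture(rec, html + b" "))  # False
```

Because the digest lives in the audit record, it remains available even after the raw HTML capture is purged under the 30-day retention policy.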

7. Technical Requirements

Backend Stack

  • Framework: FastAPI v0.128.x (Python 3.9+ minimum) using Lifespan context managers for resource management.
  • Validation: Pydantic v2.7+ for Rust-powered high-speed data serialization.
  • Scraping Engine: Playwright (Async) for JS-heavy portals and HTTPX for static HTML boards.
  • Anti-Bot: curl_cffi for JA3/JA4 TLS fingerprint spoofing and residential proxy rotation.
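Much of the validation layer's work under the unified schema is mapping each board's status vocabulary onto the four-value status enum. The sketch below shows that mapping step in plain Python (a Pydantic field validator could wrap the same function). The per-state labels are hypothetical placeholders, not the boards' actual wording, and unknown labels raise instead of defaulting, so the >99.8% accuracy goal is not eroded by silent guesses.

```python
from enum import Enum
from typing import Dict


class LicenseStatus(str, Enum):
    ACTIVE = "Active"
    EXPIRED = "Expired"
    SUSPENDED = "Suspended"
    REVOKED = "Revoked"


# Hypothetical raw labels; real values must be collected from each board's portal.
STATUS_ALIASES: Dict[str, Dict[str, LicenseStatus]] = {
    "CA": {"active": LicenseStatus.ACTIVE, "expired": LicenseStatus.EXPIRED,
           "suspended": LicenseStatus.SUSPENDED, "revoked": LicenseStatus.REVOKED},
    "NV": {"current": LicenseStatus.ACTIVE, "lapsed": LicenseStatus.EXPIRED},
}


class UnmappedStatusError(ValueError):
    """Raised instead of guessing, so unknown labels surface for human review."""


def normalize_status(state_code: str, raw_label: str) -> LicenseStatus:
    key = " ".join(raw_label.split()).lower()  # collapse whitespace, ignore case
    try:
        return STATUS_ALIASES[state_code.upper()][key]
    except KeyError:
        raise UnmappedStatusError(f"{state_code}: unmapped status {raw_label!r}") from None


print(normalize_status("NV", "  Current ").value)  # Active
```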

Infrastructure & Orchestration

  • Orchestration: AWS Step Functions (Standard Workflow) for multi-step portal navigation.
  • Compute: AWS Lambda (ARM64/Graviton) running container images for headless browser execution.
  • Scaling: Step Functions Distributed Map to handle up to 10,000 concurrent state queries.
  • OCR: Azure AI Document Intelligence (prebuilt ID document model) for processing physical license photos.
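A "Submit-and-Poll" portal maps naturally onto a Standard Workflow as a Task, Wait, Task, Choice loop. Below is an Amazon States Language definition built as a Python dict. The two Lambda function names, the `status` values ("complete", "failed"), the 5-second poll interval, and the 300-second timeout are assumptions for illustration.

```python
import json


def lambda_task(function_name: str, next_state: str) -> dict:
    """Task state that invokes a Lambda and keeps only its returned payload."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "OutputPath": "$.Payload",
        "Next": next_state,
    }


def submit_and_poll_definition(poll_seconds: int = 5, timeout_seconds: int = 300) -> dict:
    return {
        "Comment": "Submit a state-portal query, then poll until it finishes",
        "StartAt": "SubmitQuery",
        "TimeoutSeconds": timeout_seconds,  # hard stop so a stuck portal cannot loop forever
        "States": {
            # Hypothetical functions; each must return its job handle plus a "status"
            # field so the next CheckStatus call receives what it needs.
            "SubmitQuery": lambda_task("submit-portal-query", "WaitForPortal"),
            "WaitForPortal": {"Type": "Wait", "Seconds": poll_seconds, "Next": "CheckStatus"},
            "CheckStatus": lambda_task("check-portal-query", "IsFinished"),
            "IsFinished": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "complete", "Next": "Done"},
                    {"Variable": "$.status", "StringEquals": "failed", "Next": "PortalFailed"},
                ],
                "Default": "WaitForPortal",  # still pending: wait, then poll again
            },
            "Done": {"Type": "Succeed"},
            "PortalFailed": {
                "Type": "Fail",
                "Error": "PortalQueryFailed",
                "Cause": "State portal reported a failure",
            },
        },
    }


definition_json = json.dumps(submit_and_poll_definition(), indent=2)
```

The serialized JSON is what Step Functions accepts as a state machine definition (for example, the `definition` argument of boto3's `create_state_machine`). A Distributed Map could run this loop once per item in a batch of lookups.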

API Management & Billing

  • Gateway: Tyk (Self-managed) for developer portal and API key lifecycle.
  • Billing: Stripe integration via Tyk webhooks for usage-based (per-call) monetization.

8. Data Model

Entity: Contractor

  • id: UUID
  • legal_name: String
  • tax_id_hash: String (SHA-256)
  • primary_state: Enum (US_STATES)

Entity: License (One-to-Many with Contractor)

  • license_number: String
  • status: Enum (Active, Expired, Suspended, Revoked)
  • issue_date: ISO Date
  • expiry_date: ISO Date
  • classifications: List[String]

Entity: InsurancePolicy

  • policy_type: Enum (General_Liability, Workers_Comp)
  • carrier: String
  • limit_amount: Decimal
  • is_active: Boolean
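The entities above can be sketched with stdlib dataclasses so the example runs without dependencies; the stack calls for Pydantic v2 models, which would add validation on top of the same fields. Field names follow the data model. Assumptions: the `US_STATES` enum is truncated to three members, InsurancePolicy is attached to Contractor as one-to-many (the model does not state that relationship), and `meets_minimum` is a hypothetical helper for the underwriter's minimum-limit check.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from enum import Enum
from typing import List


class USState(str, Enum):
    # Truncated for brevity; the real US_STATES enum would list every state.
    CA = "CA"
    FL = "FL"
    NV = "NV"


class LicenseStatus(str, Enum):
    ACTIVE = "Active"
    EXPIRED = "Expired"
    SUSPENDED = "Suspended"
    REVOKED = "Revoked"


class PolicyType(str, Enum):
    GENERAL_LIABILITY = "General_Liability"
    WORKERS_COMP = "Workers_Comp"


@dataclass
class License:
    license_number: str
    status: LicenseStatus
    issue_date: date
    expiry_date: date
    classifications: List[str] = field(default_factory=list)


@dataclass
class InsurancePolicy:
    policy_type: PolicyType
    carrier: str
    limit_amount: Decimal  # Decimal, not float, so coverage limits compare exactly
    is_active: bool


@dataclass
class Contractor:
    legal_name: str
    tax_id_hash: str  # hex digest only; the raw tax ID is never stored
    primary_state: USState
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    licenses: List[License] = field(default_factory=list)           # one-to-many
    policies: List[InsurancePolicy] = field(default_factory=list)   # assumed one-to-many


def meets_minimum(policies: List[InsurancePolicy], kind: PolicyType, minimum: Decimal) -> bool:
    """True if any active policy of the given type meets the required limit."""
    return any(p.is_active and p.policy_type is kind and p.limit_amount >= minimum
               for p in policies)


wc = InsurancePolicy(PolicyType.WORKERS_COMP, "Example Mutual", Decimal("1000000"), True)
print(meets_minimum([wc], PolicyType.WORKERS_COMP, Decimal("500000")))  # True
```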

9. API Specification

POST /v1/verify/deep

Starts an asynchronous verification job. Request:

{
  "license_number": "123456",
  "state": "CA",
  "include_insurance": true,
  "callback_url": "https://client-site.com/webhook"
}

Response: 202 Accepted

{
  "job_id": "job_98765",
  "status": "processing",
  "estimated_seconds": 15
}
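A client that polls instead of relying on callback_url needs a loop that respects estimated_seconds and does not hammer the API. This spec does not define a status endpoint, so the sketch takes an injected `fetch_status` coroutine (standing in for a hypothetical `GET /v1/jobs/{job_id}`) and assumes "completed" and "failed" are the terminal statuses; only "processing" appears above.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict

# Hypothetical: wraps an HTTP call such as GET /v1/jobs/{job_id}; not defined in this spec.
FetchStatus = Callable[[str], Awaitable[Dict]]

TERMINAL_STATUSES = {"completed", "failed"}  # assumed terminal values


async def wait_for_job(
    job_id: str,
    fetch_status: FetchStatus,
    estimated_seconds: float,
    max_interval: float = 10.0,
    deadline_seconds: float = 120.0,
) -> Dict:
    """Poll until the job reaches a terminal status or the deadline passes."""
    start = time.monotonic()
    await asyncio.sleep(estimated_seconds)  # the 202 response says how long to expect
    interval = 1.0
    while True:
        body = await fetch_status(job_id)
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() - start + interval > deadline_seconds:
            raise TimeoutError(f"{job_id} still {body.get('status')!r} at deadline")
        await asyncio.sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped


async def demo() -> None:
    responses = iter([{"status": "processing"}, {"status": "completed"}])

    async def fake_fetch(job_id: str) -> Dict:
        return next(responses)

    result = await wait_for_job("job_98765", fake_fetch, estimated_seconds=0)
    print(result["status"])  # completed


asyncio.run(demo())
```

Starting at estimated_seconds avoids polls that are almost certain to return "processing", and the backoff cap keeps a slow state portal from stretching the gap between polls indefinitely.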

10. UI/UX Requirements

  • Dashboard: Built with React and Tremor (Tailwind-based components).
  • Command Palette: Implement cmdk for quick navigation between API keys and documentation.
  • Log Viewer: A monospace, searchable table showing real-time request/response payloads for debugging.
  • Visual Status: Animated "Pulse" badges for system health and "Progress Rings" for monthly API quota usage.

11. Non-Functional Requirements

  • Security: AES-256 encryption for all PII; SOC 2 Type II compliance readiness.
  • Idempotency: Every outgoing webhook must carry a unique X-VerifyPro-Event-ID header so receivers can detect and discard duplicate deliveries.
  • Resiliency: Circuit breakers to automatically skip state portals currently undergoing scheduled maintenance.
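The circuit-breaker requirement can be sketched as one breaker per state portal with the usual closed, open, and half-open states. The thresholds (3 consecutive failures, 300-second cooldown) are illustrative assumptions, the clock is injectable for testing, and `open_until` is a hypothetical hook for forcing a breaker open through an announced maintenance window.

```python
import time
from collections import defaultdict
from typing import Callable, Optional


class PortalCircuitBreaker:
    """Closed: requests flow. Open: requests are skipped. Half-open: one trial request."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0                         # consecutive failures while closed
        self.retry_after: Optional[float] = None  # None means closed
        self.trial_in_flight = False

    def allow_request(self) -> bool:
        if self.retry_after is None:
            return True                           # closed
        if self.clock() < self.retry_after or self.trial_in_flight:
            return False                          # open, or a trial is already running
        self.trial_in_flight = True               # half-open: let exactly one trial through
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.retry_after = None                   # success closes the breaker
        self.trial_in_flight = False

    def record_failure(self) -> None:
        self.failures += 1
        if self.trial_in_flight or self.failures >= self.failure_threshold:
            self.retry_after = self.clock() + self.cooldown_seconds  # (re)open
            self.trial_in_flight = False
            self.failures = 0

    def open_until(self, end_time: float) -> None:
        """Force the breaker open until end_time (same time base as `clock`)."""
        self.retry_after = end_time if self.retry_after is None else max(self.retry_after, end_time)
        self.trial_in_flight = False


# One breaker per state portal, created on first use.
breakers = defaultdict(PortalCircuitBreaker)
```

A failed half-open trial reopens the breaker immediately, so a portal that is still down receives one probe per cooldown instead of a full burst of queued lookups.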

12. Out of Scope

  • Criminal background checks for individual employees (focus is on business-level licensing).
  • International license verification (MVP limited to U.S. states).
  • Direct payment of license renewal fees (Read-only verification).

13. Risks & Mitigations

  • Risk: State portals blocking scraper IPs.
    • Mitigation: Use of rotating residential proxies and TLS fingerprint spoofing via curl_cffi.
  • Risk: UI changes on state websites breaking parsers.
    • Mitigation: Amazon Bedrock (GenAI) agents to dynamically locate "Download" or "Search" buttons if CSS selectors fail.
  • Risk: PII data leakage.
    • Mitigation: Immediate hashing of Tax IDs and strict 30-day data retention policy for raw HTML captures.
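The "immediate hashing" mitigation needs one caution: a federal tax ID has only nine digits, so an unkeyed SHA-256 over it can be reversed by hashing all 10^9 candidates. This sketch therefore assumes a keyed HMAC-SHA256 with a server-side secret (a hypothetical `TAX_ID_PEPPER` environment variable) and normalizes formatting first so "12-3456789" and "123456789" produce the same hash.

```python
import hashlib
import hmac
import os
import re


def normalize_tax_id(raw: str) -> str:
    """Strip dashes and spaces so formatting differences do not change the hash."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9:
        raise ValueError("expected a nine-digit tax ID")  # never echo the raw value
    return digits


def hash_tax_id(raw: str, pepper: bytes) -> str:
    """Keyed SHA-256 (HMAC): without the pepper, enumerating all IDs reveals nothing."""
    return hmac.new(pepper, normalize_tax_id(raw).encode("ascii"), hashlib.sha256).hexdigest()


# Hypothetical secret source; production would load this from a secrets manager.
PEPPER = os.environ.get("TAX_ID_PEPPER", "dev-only-secret").encode()

print(hash_tax_id("12-3456789", PEPPER) == hash_tax_id("123456789", PEPPER))  # True
```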

14. Implementation Tasks

Phase 1: Project Setup

  • [ ] Initialize FastAPI v0.128.x project with Python 3.9+
  • [ ] Configure Pydantic v2.7+ schemas for normalized license data
  • [ ] Set up Tyk Gateway and Developer Portal local environment
  • [ ] Configure CI/CD pipeline for AWS Lambda (ARM64) Container Images

Phase 2: Scraping & Orchestration

  • [ ] Build Playwright-based scraper for California (CSLB) as the pilot state
  • [ ] Implement curl_cffi middleware to spoof browser TLS handshakes
  • [ ] Create AWS Step Functions workflow for "Submit-and-Poll" state portals
  • [ ] Integrate Azure AI Document Intelligence for physical ID card OCR

Phase 3: API & Webhooks

  • [ ] Implement HMAC-SHA256 signing of outgoing webhooks so receivers can verify their authenticity
  • [ ] Build the /v1/verify/deep asynchronous endpoint logic
  • [ ] Set up Redis Streams for task decoupling and event buffering
  • [ ] Integrate Stripe with Tyk for per-request usage billing
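HMAC-SHA256 webhook signing and the X-VerifyPro-Event-ID idempotency rule work together on the receiving side. A sketch of both ends, assuming the signature travels as hex in a hypothetical `X-VerifyPro-Signature` header, covers only the raw body, and that the receiver keeps processed event IDs in memory (a real receiver would use a durable store).

```python
import hashlib
import hmac
import json
from typing import Dict, Set

SIGNATURE_HEADER = "x-verifypro-signature"  # hypothetical header name
EVENT_ID_HEADER = "x-verifypro-event-id"


def sign_payload(secret: bytes, body: bytes) -> str:
    """Sender side: hex HMAC-SHA256 over the exact bytes that will be POSTed."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


class WebhookReceiver:
    """Receiver side: verify the signature, then skip event IDs already processed."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret
        self.processed_ids: Set[str] = set()  # in memory for the sketch only

    def handle(self, headers: Dict[str, str], body: bytes) -> str:
        h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
        expected = sign_payload(self.secret, body).encode()
        received = h.get(SIGNATURE_HEADER, "").encode()
        if not hmac.compare_digest(expected, received):  # constant-time comparison
            return "rejected"
        event_id = h.get(EVENT_ID_HEADER)
        if not event_id:
            return "rejected"
        if event_id in self.processed_ids:
            return "duplicate"  # redelivery: acknowledge without reprocessing
        event = json.loads(body)
        # Application logic would act on `event` here (e.g. delist the contractor).
        self.processed_ids.add(event_id)  # record only after processing succeeds
        return "processed"


secret = b"example-shared-secret"  # hypothetical per-client secret
body = json.dumps({"type": "license.status_changed"}).encode()
headers = {"X-VerifyPro-Event-ID": "evt_1", "X-VerifyPro-Signature": sign_payload(secret, body)}
receiver = WebhookReceiver(secret)
print(receiver.handle(headers, body))  # processed
print(receiver.handle(headers, body))  # duplicate
```

Because only the body is signed here, repeating the event ID inside the body would also place it under the signature.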

Phase 4: Developer Experience (Frontend)

  • [ ] Build Dashboard using React + Tailwind CSS v4 + Tremor
  • [ ] Implement API Key generation and revocation UI
  • [ ] Create WeasyPrint-based PDF generator for branded compliance reports
  • [ ] Launch OpenAPI 3.1 interactive documentation site via Scalar