Original Idea
Contractor License Verifier API: A backend API that checks contractor license status, insurance, and complaint history across state databases.
Product Requirements Document: VerifyPro Contractor API
1. Executive Summary
VerifyPro is a high-performance, developer-first API designed to automate the fragmented and manual process of verifying contractor licenses, insurance coverage, and disciplinary histories. By orchestrating real-time scrapes of state-level databases and utilizing AI-driven document extraction, VerifyPro provides insurance, real estate, and construction platforms with a single, standardized source of truth for contractor compliance.
2. Problem Statement
Contractor verification is currently a "fragmented data" nightmare. There are over 50 state-level licensing boards in the U.S., each with disparate websites, zero standardized APIs, and varying anti-bot protections. Businesses (InsurTech, PropTech) currently hire manual "compliance officers" to click through these portals or build fragile scrapers that break weekly. This leads to high administrative costs, increased liability from expired policies, and slow onboarding for subcontractors.
3. Goals & Success Metrics
- Accuracy: Maintain a >99.8% accuracy rate for data extraction compared to manual portal lookups.
- Latency: Achieve a response time under 2 seconds for cached data and under 30 seconds for real-time asynchronous "Deep Verification" jobs.
- Coverage: Support 45+ U.S. states within the first 12 months.
- Developer Experience: Achieve a "Time to First Hello World" (API call) of under 5 minutes via the Tyk-powered developer portal.
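One way to make the accuracy goal measurable is a field-level comparison against a manually verified sample. The sketch below assumes that "accuracy" means matched fields divided by compared fields, pairs records by position, and ignores case and surrounding whitespace. The function names `field_accuracy` and `meets_target` are hypothetical.

```python
"""Field-level extraction accuracy against manually verified portal lookups.

Assumption: accuracy = matched fields / compared fields, where the manual
(human-read) record is the ground truth.
"""
from typing import Dict, List

Record = Dict[str, str]


def field_accuracy(extracted: List[Record], manual: List[Record]) -> float:
    """Return matched / compared fields across records paired by position.

    Only fields present in the manual record are compared; a field missing
    from the extracted record counts as a mismatch.
    """
    if len(extracted) != len(manual):
        raise ValueError("extracted and manual samples must be the same length")
    compared = matched = 0
    for got, truth in zip(extracted, manual):
        for name, expected in truth.items():
            compared += 1
            # Ignore case and surrounding whitespace so formatting noise is not scored as an error.
            if got.get(name, "").strip().lower() == expected.strip().lower():
                matched += 1
    if compared == 0:
        raise ValueError("no fields to compare")
    return matched / compared


def meets_target(accuracy: float, target: float = 0.998) -> bool:
    # The goal is strictly greater than 99.8%.
    return accuracy > target
```

With 1,000 compared fields, one mismatch gives 0.999 and clears the bar; two mismatches give exactly 0.998, which does not.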
4. User Personas
- Devin (Software Engineer at PropTech Startup): Needs a reliable JSON output to automate the "Onboard Subcontractor" flow in their app.
- Sarah (Insurance Underwriter): Needs a white-labeled PDF report to prove a contractor was compliant at the time a policy was issued.
- Compliance Carlos (General Contractor): Needs automated alerts (webhooks) when a subcontractor's license is suspended or insurance expires.
5. User Stories
- As a Developer, I want a unified JSON schema for all 50 states so that I don't have to write custom logic for different state board formats.
- As an Underwriter, I want to verify the specific limits of a Workers’ Comp policy so that I can ensure the contractor meets our minimum liability requirements.
- As a Marketplace Operator, I want to receive a webhook notification the moment a contractor’s license status changes to "Inactive" so that I can temporarily delist them.
6. Functional Requirements
- Unified Lookup Engine: A single endpoint accepting `license_number` and `state_code` to return normalized contractor data.
- Deep Insurance Verification: Automated extraction of policy numbers, carriers, and coverage limits from state insurance registries.
- Asynchronous Processing: Support for long-running state portal queries using a "Submit-and-Poll" or Webhook pattern.
- Audit Logging: Every verification must generate a unique `audit_id` with a hash of the raw source data for legal defensibility.
- White-Label Reports: Generation of branded PDF compliance certificates via the API.
- Developer Dashboard: A self-service portal for API key management, usage analytics, and billing.
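The audit-logging requirement can be sketched with the standard library: a random UUID serves as the `audit_id`, and a SHA-256 digest fingerprints the exact bytes that were scraped. The extra record fields (`state`, `license_number`, `captured_at`) and the helper names are assumptions, not part of the spec.

```python
"""Audit record: a unique audit_id plus a SHA-256 digest of the raw source.

Sketch only; fields other than audit_id are illustrative assumptions.
"""
import hashlib
import uuid
from datetime import datetime, timezone
from typing import Dict


def make_audit_record(raw_source: bytes, state: str, license_number: str) -> Dict[str, str]:
    return {
        "audit_id": str(uuid.uuid4()),  # unique per verification
        "source_sha256": hashlib.sha256(raw_source).hexdigest(),  # fingerprint of the scraped bytes
        "state": state,
        "license_number": license_number,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def matches_source(record: Dict[str, str], raw_source: bytes) -> bool:
    """Re-hash an archived capture and compare it with the logged digest."""
    return hashlib.sha256(raw_source).hexdigest() == record["source_sha256"]
```

Because the digest covers raw bytes, any re-encoding or whitespace change in an archived capture fails the check, which is what makes the log useful as evidence.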
7. Technical Requirements
Backend Stack
- Framework: FastAPI v0.128.x (Python 3.9+ minimum) using Lifespan context managers for resource management.
- Validation: Pydantic v2.7+ for Rust-powered high-speed data serialization.
- Scraping Engine: Playwright (Async) for JS-heavy portals and HTTPX for static HTML boards.
- Anti-Bot: curl_cffi for JA3/JA4 TLS fingerprint spoofing and residential proxy rotation.
Infrastructure & Orchestration
- Orchestration: AWS Step Functions (Standard Workflow) for multi-step portal navigation.
- Compute: AWS Lambda (ARM64/Graviton) running container images for headless browser execution.
- Scaling: Step Functions Distributed Map to handle up to 10,000 concurrent state queries.
- OCR: Azure AI Document Intelligence (prebuilt ID document model) for processing physical license photos.
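The Distributed Map fan-out can be expressed in Amazon States Language. The sketch below builds the definition as a Python dict. The ASL field names (`ItemProcessor`, `ProcessorConfig`, `MaxConcurrency`, `Retry`) are standard, while the Lambda function name, state names, `$.lookups` input path, and retry settings are hypothetical.

```python
"""Sketch of an ASL definition: a Distributed Map that runs one child
workflow per (state, license_number) item.

Assumptions: function name, state names, input shape, and retry policy.
"""
import json
from typing import Any, Dict

SCRAPER_FUNCTION = "verifypro-state-scraper"  # hypothetical Lambda name


def build_batch_definition(max_concurrency: int = 10_000) -> Dict[str, Any]:
    if not 1 <= max_concurrency <= 10_000:
        raise ValueError("Distributed Map concurrency is capped at 10,000")
    return {
        "Comment": "Fan out license lookups across state portals",
        "StartAt": "VerifyEach",
        "States": {
            "VerifyEach": {
                "Type": "Map",
                "ItemsPath": "$.lookups",  # e.g. [{"state": "CA", "license_number": "123456"}]
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
                    "StartAt": "ScrapePortal",
                    "States": {
                        "ScrapePortal": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {"FunctionName": SCRAPER_FUNCTION, "Payload.$": "$"},
                            # Retry transient scraper failures with exponential backoff.
                            "Retry": [{
                                "ErrorEquals": ["States.TaskFailed"],
                                "IntervalSeconds": 5,
                                "MaxAttempts": 3,
                                "BackoffRate": 2.0,
                            }],
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_batch_definition(), indent=2))
```

Passing items inline through `ItemsPath` is subject to the Step Functions payload size limit (256 KiB), so very large batches would more likely be read from S3 with `ItemReader`.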
API Management & Billing
- Gateway: Tyk (Self-managed) for developer portal and API key lifecycle.
- Billing: Stripe integration via Tyk webhooks for usage-based (per-call) monetization.
8. Data Model
Entity: Contractor
- id: UUID
- legal_name: String
- tax_id_hash: String (SHA-256)
- primary_state: Enum (US_STATES)
Entity: License (One-to-Many with Contractor)
- license_number: String
- status: Enum (Active, Expired, Suspended, Revoked)
- issue_date: ISO Date
- expiry_date: ISO Date
- classifications: List[String]
Entity: InsurancePolicy
- policy_type: Enum (General_Liability, Workers_Comp)
- carrier: String
- limit_amount: Decimal
- is_active: Boolean
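A runnable sketch of the three entities follows. The spec calls for Pydantic v2 models; plain dataclasses are substituted here only so the example needs no third-party packages. Two choices go beyond the spec and are assumptions: the raw-status mapping table (the portal strings shown are hypothetical), and hashing tax IDs with keyed HMAC-SHA256 instead of bare SHA-256, because a 9-digit identifier has only 10^9 possible values and an unkeyed hash of it can be brute-forced. The `is_current` helper is likewise hypothetical.

```python
"""Stdlib stand-in for the Contractor / License / InsurancePolicy entities."""
import hashlib
import hmac
import uuid
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from enum import Enum
from typing import List


class LicenseStatus(str, Enum):
    ACTIVE = "Active"
    EXPIRED = "Expired"
    SUSPENDED = "Suspended"
    REVOKED = "Revoked"


class PolicyType(str, Enum):
    GENERAL_LIABILITY = "General_Liability"
    WORKERS_COMP = "Workers_Comp"


# Hypothetical raw strings from state boards, folded into the unified enum.
RAW_STATUS_MAP = {
    "active": LicenseStatus.ACTIVE,
    "clear": LicenseStatus.ACTIVE,
    "expired": LicenseStatus.EXPIRED,
    "suspended": LicenseStatus.SUSPENDED,
    "revoked": LicenseStatus.REVOKED,
}


def normalize_status(raw: str) -> LicenseStatus:
    try:
        return RAW_STATUS_MAP[raw.strip().lower()]
    except KeyError:
        raise ValueError(f"unmapped portal status: {raw!r}") from None


def hash_tax_id(tax_id: str, key: bytes) -> str:
    # Keyed HMAC-SHA256 over the digits only, so "12-3456789" and "123456789" match.
    digits = "".join(ch for ch in tax_id if ch.isdigit())
    return hmac.new(key, digits.encode(), hashlib.sha256).hexdigest()


@dataclass
class License:
    license_number: str
    status: LicenseStatus
    issue_date: date
    expiry_date: date
    classifications: List[str] = field(default_factory=list)

    def is_current(self, today: date) -> bool:
        """Active on the portal and not past its expiry date."""
        return self.status is LicenseStatus.ACTIVE and today <= self.expiry_date


@dataclass
class InsurancePolicy:
    policy_type: PolicyType
    carrier: str
    limit_amount: Decimal  # Decimal, not float, so coverage limits compare exactly
    is_active: bool


@dataclass
class Contractor:
    legal_name: str
    tax_id_hash: str
    primary_state: str  # two-letter code; the PRD models this as Enum (US_STATES)
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    licenses: List[License] = field(default_factory=list)  # one-to-many
```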
9. API Specification
POST /v1/verify/deep
Starts an asynchronous verification job.
Request:
{
"license_number": "123456",
"state": "CA",
"include_insurance": true,
"callback_url": "https://client-site.com/webhook"
}
Response: 202 Accepted
{
"job_id": "job_98765",
"status": "processing",
"estimated_seconds": 15
}
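On the client side, the Submit-and-Poll pattern is a loop that waits roughly `estimated_seconds` before the first check and then backs off. The PRD does not define the status endpoint or the terminal status values, so `fetch_status` is an injected callable and the `completed`/`failed` names are assumptions; `sleep` and `clock` are injectable so the loop can be tested without real waiting.

```python
"""Client-side Submit-and-Poll loop for a job returned by POST /v1/verify/deep."""
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}  # hypothetical; the PRD only shows "processing"


def wait_for_job(
    fetch_status: Callable[[str], Dict],
    job_id: str,
    estimated_seconds: float,
    timeout: float = 120.0,
    max_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until the job reaches a terminal status or `timeout` elapses."""
    deadline = clock() + timeout
    delay = max(estimated_seconds, 0.0)  # first check at the server's estimate
    backoff = 1.0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(min(delay, remaining))
        result = fetch_status(job_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        delay = backoff  # then poll after 1s, 2s, 4s, ... capped at max_interval
        backoff = min(backoff * 2, max_interval)
```

Clients that supply `callback_url` can skip polling and rely on the webhook instead.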
10. UI/UX Requirements
- Dashboard: Built with React and Tremor (Tailwind-based components).
- Command Palette: Implement `cmdk` for quick navigation between API keys and documentation.
- Log Viewer: A monospace, searchable table showing real-time request/response payloads for debugging.
- Visual Status: Animated "Pulse" badges for system health and "Progress Rings" for monthly API quota usage.
11. Non-Functional Requirements
- Security: AES-256 encryption for all PII; SOC 2 Type II compliance readiness.
- Idempotency: Every webhook must carry an `X-VerifyPro-Event-ID` header so receivers can detect and discard duplicate deliveries.
- Resiliency: Circuit breakers that automatically skip state portals currently undergoing scheduled maintenance.
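The resiliency requirement maps onto a classic circuit breaker kept per state portal. In this sketch the breaker opens after a number of consecutive failures and also refuses calls during known maintenance windows; the threshold, cooldown, class name, and the way maintenance windows are supplied are all assumptions.

```python
"""Per-portal circuit breaker with scheduled-maintenance windows."""
import time
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Optional, Tuple


@dataclass
class PortalBreaker:
    """Circuit breaker for one state portal (hypothetical thresholds)."""

    failure_threshold: int = 3
    cooldown_seconds: float = 300.0
    # Known scheduled maintenance windows as (start, end) pairs.
    maintenance: List[Tuple[datetime, datetime]] = field(default_factory=list)
    clock: Callable[[], float] = time.monotonic
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: datetime) -> bool:
        """Return False during maintenance or while the breaker is open."""
        if any(start <= now < end for start, end in self.maintenance):
            return False
        if self.opened_at is None:
            return True
        # Half-open after the cooldown: calls are tried again, and because the
        # failure count is not reset, one more failure re-opens the breaker.
        return self.clock() - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) and restart the cooldown
```

Keeping one breaker per portal means a single misbehaving state board does not slow down lookups against the others.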
12. Out of Scope
- Criminal background checks for individual employees (focus is on business-level licensing).
- International license verification (MVP limited to U.S. states).
- Direct payment of license renewal fees (Read-only verification).
13. Risks & Mitigations
- Risk: State portals blocking scraper IPs.
- Mitigation: Use of rotating residential proxies and TLS fingerprint spoofing via `curl_cffi`.
- Risk: UI changes on state websites breaking parsers.
- Mitigation: Amazon Bedrock (GenAI) agents to dynamically locate "Download" or "Search" buttons if CSS selectors fail.
- Risk: PII data leakage.
- Mitigation: Immediate hashing of Tax IDs and strict 30-day data retention policy for raw HTML captures.
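The 30-day retention rule for raw HTML captures reduces to selecting captures older than a cutoff. The `(key, captured_at)` tuple shape and the function name are hypothetical. If captures are stored in S3, a lifecycle expiration rule on the bucket or prefix can enforce the same window without application code.

```python
"""Select raw HTML captures that have aged past the 30-day retention window."""
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

RETENTION = timedelta(days=30)


def expired_capture_keys(captures: Iterable[Tuple[str, datetime]], now: datetime) -> List[str]:
    """Return storage keys captured more than 30 days before `now`.

    Timestamps must be timezone-aware so captures from different regions compare correctly.
    """
    if now.tzinfo is None:
        raise ValueError("use timezone-aware timestamps")
    cutoff = now - RETENTION
    return [key for key, captured_at in captures if captured_at < cutoff]
```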
14. Implementation Tasks
Phase 1: Project Setup
- [ ] Initialize FastAPI v0.128.x project with Python 3.9+
- [ ] Configure Pydantic v2.7 schemas for normalized license data
- [ ] Set up Tyk Gateway and Developer Portal local environment
- [ ] Configure CI/CD pipeline for AWS Lambda (ARM64) Container Images
Phase 2: Scraping & Orchestration
- [ ] Build Playwright-based scraper for California (CSLB) as the pilot state
- [ ] Implement `curl_cffi` middleware to spoof browser TLS handshakes
- [ ] Create AWS Step Functions workflow for "Submit-and-Poll" state portals
- [ ] Integrate Azure AI Document Intelligence for physical ID card OCR
Phase 3: API & Webhooks
- [ ] Implement HMAC-SHA256 signing for outgoing webhooks so receivers can verify their authenticity
- [ ] Build the `/v1/verify` asynchronous endpoint logic
- [ ] Set up Redis Streams for task decoupling and event buffering
- [ ] Integrate Stripe with Tyk for per-request usage billing
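The webhook tasks fit together as sign-on-send and verify-then-dedupe-on-receive. This sketch assumes a hypothetical `X-VerifyPro-Signature` header carrying `t=<unix seconds>,v1=<hex digest>`, signs the timestamp together with the body to limit replay, and keeps seen `X-VerifyPro-Event-ID` values in an in-memory set where a real receiver would use a shared store. Only the event-ID header name comes from the spec; the tolerance and function names are assumptions.

```python
"""HMAC-SHA256 webhook signing (sender) and verification plus dedupe (receiver)."""
import hashlib
import hmac
import time
from typing import Dict, Optional, Set

EVENT_ID_HEADER = "X-VerifyPro-Event-ID"    # from the PRD
SIGNATURE_HEADER = "X-VerifyPro-Signature"  # hypothetical
TOLERANCE_SECONDS = 300                     # hypothetical replay window


def _mac(secret: bytes, timestamp: int, body: bytes) -> str:
    # Signing "<timestamp>.<body>" binds the signature to the delivery time.
    return hmac.new(secret, f"{timestamp}.".encode() + body, hashlib.sha256).hexdigest()


def sign(secret: bytes, event_id: str, body: bytes, now: Optional[int] = None) -> Dict[str, str]:
    """Sender side: headers to attach to an outgoing webhook POST."""
    ts = int(time.time()) if now is None else now
    return {EVENT_ID_HEADER: event_id, SIGNATURE_HEADER: f"t={ts},v1={_mac(secret, ts, body)}"}


class WebhookReceiver:
    """Receiver side: verify the signature, then drop already-seen event IDs."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret
        self.seen: Set[str] = set()  # in-memory here; a shared store in practice

    def accept(self, headers: Dict[str, str], body: bytes, now: Optional[int] = None) -> bool:
        """True for a new authentic event, False for a duplicate; ValueError if invalid."""
        try:
            fields = dict(part.split("=", 1) for part in headers[SIGNATURE_HEADER].split(","))
            ts, received = int(fields["t"]), fields["v1"]
            event_id = headers[EVENT_ID_HEADER]
        except (KeyError, ValueError) as exc:
            raise ValueError("malformed webhook headers") from exc
        current = int(time.time()) if now is None else now
        if abs(current - ts) > TOLERANCE_SECONDS:
            raise ValueError("webhook timestamp outside tolerance")
        expected = _mac(self.secret, ts, body)
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(expected.encode(), received.encode()):
            raise ValueError("signature mismatch")
        if event_id in self.seen:
            return False  # duplicate delivery: acknowledge, do not reprocess
        self.seen.add(event_id)
        return True
```

Verifying before deduplicating matters: otherwise a forged request carrying a real event ID could mark that event as already processed.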
Phase 4: Developer Experience (Frontend)
- [ ] Build Dashboard using React + Tailwind CSS v4 + Tremor
- [ ] Implement API Key generation and revocation UI
- [ ] Create WeasyPrint-based PDF generator for branded compliance reports
- [ ] Launch OpenAPI 3.1 interactive documentation site via Scalar