The Hidden Cost of DIY Tenant Provisioning

you can build per-tenant infra yourself, sure. but the cost isn't just engineering time. it's the features you don't ship, the customers you don't onboard, and the incidents you debug at 2am.

"we'll just build it ourselves."

famous last words for literally every team that tries to provision per-tenant infra from scratch. every single one.

it starts simple enough. spin up a container per customer. route some traffic. done, right?

then you need persistent storage. then environment isolation. then wildcard DNS. then SSL per subdomain. then a provisioning API. then monitoring per tenant. then independent scaling.

three months later your infra engineer is maintaining a custom orchestration system instead of shipping the features your customers actually asked for. sounds fun, right?

the iceberg of DIY provisioning

what looks like "just spin up a container" is actually a massive stack of interconnected problems that keep multiplying.

container orchestration

you need to create, start, stop, restart, and destroy containers on demand. you need health checks. automatic restarts on failure. resource limits so one tenant can't eat the entire host machine.

if you're on kubernetes, that's writing Deployments, Services, Ingresses, PersistentVolumeClaims, ConfigMaps, and Secrets per tenant. if you're NOT on kubernetes, congrats: you're building your own orchestrator. neither option is quick or fun.
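to make the multiplication concrete, here's a rough python sketch of the minimal objects one tenant needs on kubernetes, built as plain dicts. the names, labels, image, and resource limits are made up for illustration, not a real convention:

```python
def tenant_manifests(slug: str) -> list[dict]:
    """Generate the minimal kubernetes objects a single tenant needs.

    Every new signup means another copy of each of these — and a real
    setup also adds an Ingress, PersistentVolumeClaim, ConfigMap, and
    Secret per tenant on top.
    """
    labels = {"app": "tenant-runtime", "tenant": slug}
    deployment = {
        "kind": "Deployment",
        "metadata": {"name": f"runtime-{slug}", "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "runtime",
                        "image": "yourco/runtime:latest",  # illustrative image
                        # resource limits so one tenant can't eat the host
                        "resources": {
                            "limits": {"cpu": "500m", "memory": "512Mi"},
                        },
                    }],
                },
            },
        },
    }
    service = {
        "kind": "Service",
        "metadata": {"name": f"runtime-{slug}", "labels": labels},
        "spec": {
            "selector": labels,
            "ports": [{"port": 80, "targetPort": 8080}],
        },
    }
    return [deployment, service]
```

and this is the trimmed-down version. keeping hundreds of these object sets templated, applied, and garbage-collected as tenants come and go is where the real work lives.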

networking and routing

each tenant needs a unique URL. that means wildcard DNS, a reverse proxy that routes based on subdomain, and SSL certs per tenant (or a wildcard cert with proper management, which is its own headache).

oh, and custom domains. enterprise customers will absolutely want ai.theircompany.com instead of slug.yourdomain.com. that means DNS verification, cert provisioning, and proxy reconfiguration per custom domain. it's a whole thing.

persistent storage

containers are ephemeral. your customers' data is not. you need persistent volumes that survive restarts and redeployments. backup strategies. and you gotta make sure one tenant's volume is never accessible to another. ever.
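the "never accessible to another tenant" part is where DIY setups get bitten. a minimal sketch of defensive path resolution for per-tenant volumes — the slug format and volume root here are assumptions for illustration:

```python
import re
from pathlib import Path

# assumed slug format: lowercase alphanumerics and hyphens, no dots or slashes
TENANT_SLUG = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")

def tenant_volume(root: str, slug: str) -> Path:
    """Resolve a tenant's data directory, refusing anything that could
    escape the volume root (path traversal, absolute paths, weird slugs)."""
    if not TENANT_SLUG.fullmatch(slug):
        raise ValueError(f"invalid tenant slug: {slug!r}")
    path = (Path(root) / slug).resolve()
    if Path(root).resolve() not in path.parents:
        raise ValueError("path escapes volume root")
    return path
```

note this only guards the path math. real isolation also needs enforcement at the mount and permission level, so a compromised tenant process can't read a sibling's volume even with a valid path.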

environment isolation

API keys, model configs, feature flags, rate limits. each tenant needs their own set of env vars injected at runtime. you need a secure way to store, update, and rotate these without redeploying the whole thing.
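a toy in-memory version of that per-tenant config store, just to show the shape of the problem. a real one would sit on something encrypted at rest (kubernetes Secrets, Vault, etc.); the key names here are illustrative:

```python
class TenantEnvStore:
    """Sketch: per-tenant env config with rotation tracking.

    Each tenant's env is the shared defaults overlaid with its own
    overrides. Rotating a value bumps only that tenant's version, so
    only that tenant's runtime needs a restart — not the whole fleet.
    """

    def __init__(self, defaults: dict[str, str]):
        self.defaults = defaults
        self.tenants: dict[str, dict[str, str]] = {}
        self.version: dict[str, int] = {}

    def set(self, slug: str, key: str, value: str) -> None:
        self.tenants.setdefault(slug, {})[key] = value
        self.version[slug] = self.version.get(slug, 0) + 1

    def env_for(self, slug: str) -> dict[str, str]:
        return {**self.defaults, **self.tenants.get(slug, {})}
```

the version counter is the interesting bit: it's what lets you rotate one tenant's API key without redeploying everyone else.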

provisioning automation

someone has to create all of this when a new customer signs up. manually? that doesn't scale past 5 customers. you need a provisioning API that creates the container, volume, network config, DNS record, SSL cert, and env vars in one operation. building that API is a project in itself.
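the part that bites is partial failure: the DNS record got created but the cert didn't, and now you have a half-provisioned tenant. a sketch of the rollback logic a provisioning API needs — the step names and create/destroy callbacks are hypothetical:

```python
def provision_tenant(slug: str, steps) -> list[str]:
    """Run provisioning steps in order; each step is (name, create, destroy).

    If any step fails, tear down the steps that already succeeded, in
    reverse order, so a failed signup never leaves half a tenant behind.
    """
    completed = []
    try:
        for name, create, destroy in steps:
            create(slug)
            completed.append((name, destroy))
    except Exception:
        for name, destroy in reversed(completed):
            destroy(slug)  # best-effort rollback of this step
        raise
    return [name for name, _ in completed]
```

in practice each create/destroy pair hides a whole subsystem (container runtime, DNS provider, cert issuer), and rollback itself can fail — which is why this "one operation" is a project, not an afternoon.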

monitoring and observability

with 10 tenants you can maybe manage things manually. 100? no chance. you need per-tenant metrics: CPU, memory, disk, network. health status. uptime tracking. alerting. you need to know when tenant 47 is running hot BEFORE they file a support ticket. not after.
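the threshold check itself is the easy part — here's a sketch, with made-up limits (the metrics dict shape is an assumption; collecting those numbers per tenant is the actual work):

```python
def hot_tenants(
    metrics: dict[str, dict[str, float]],
    cpu_limit: float = 0.8,
    mem_limit: float = 0.9,
) -> list[str]:
    """Return tenant slugs breaching a resource limit, sorted.

    `metrics` maps tenant slug -> utilization fractions, e.g.
    {"tenant-47": {"cpu": 0.92, "mem": 0.40}}.
    """
    return sorted(
        slug
        for slug, m in metrics.items()
        if m.get("cpu", 0.0) > cpu_limit or m.get("mem", 0.0) > mem_limit
    )
```

the hard part is everything upstream of this function: scraping per-tenant CPU, memory, disk, and network reliably, storing it, and paging the right person when this list is non-empty at 2am.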

the real cost here is opportunity

engineering time is expensive, yeah, but it's not the biggest cost. the biggest cost is opportunity.

every week your team spends building infrastructure is a week they don't spend on:

  • features that actually differentiate your product
  • onboarding flows that convert trial users to paying customers
  • integrations that unlock new markets
  • the actual AI capabilities your customers are paying you for

your customers don't care about your provisioning system. they care about what your AI does for them. infrastructure is necessary, but it's not differentiating. the less time you spend on it, the more time you spend on what actually matters.

the maintenance tax (it's permanent, btw)

building it is only the beginning. and honestly it's the easy part.

infrastructure requires ongoing maintenance forever:

  • container runtime updates and security patches
  • SSL certificate renewals (they expire. they always expire at the worst time.)
  • storage capacity planning
  • network config changes
  • monitoring system upkeep
  • incident response procedures
  • documentation for the team (that nobody reads but you still gotta write it)

this is a permanent tax on your engineering capacity. it doesn't shrink as your product matures. it grows as your tenant count increases. more tenants = more things to break = more time maintaining instead of building.

when DIY actually makes sense

to be fair building your own makes sense in specific situations:

  • you have a dedicated infrastructure team with nothing else on their plate
  • your isolation requirements are unusual enough that no platform supports them
  • infrastructure itself is your competitive advantage
  • you're operating at scale where platform costs genuinely exceed build costs

for most teams shipping AI products? none of these apply. you need isolation, not an infrastructure science project.

what ShipClaw replaces

ShipClaw replaces the entire stack above with a visual builder and a deploy button. that's not marketing speak; it's literally what it does.

DIY component → ShipClaw equivalent:

  • container orchestration → drag a Runtime node onto the canvas
  • networking and routing → Gateway node handles wildcard routing + SSL
  • persistent storage → Volume node mounts at /data automatically
  • environment isolation → Env Config node per tenant
  • provisioning automation → click deploy. platform does everything.
  • monitoring → built-in dashboard with per-tenant metrics
  • custom domains → Custom Domain node with automatic SSL

no Dockerfiles. no kubernetes manifests. no terraform. no provisioning API to build and maintain.

design the topology visually. deploy. get back to building the thing your customers actually pay for.

the math (since nobody does it)

rough estimate for a team of two engineers building DIY tenant provisioning:

  • initial build: 8-12 weeks of focused engineering (assuming nothing goes wrong lol)
  • ongoing maintenance: 10-20 hours per week, every week, forever
  • incident response: unpredictable, but guaranteed to happen at the worst possible time
  • opportunity cost: ~3 months of product development just gone

ShipClaw gets you to the same outcome in an afternoon. the rest of the quarter is yours for actual product work.

bottom line

DIY tenant provisioning is a trap. looks like a weekend project. turns into a permanent engineering commitment that slowly eats your team alive.

the question isn't "can we build this?" you can. every team can. the question is "should we?"

for most teams the answer is no. use the platform. ship the product. stop spending your best engineers' time on problems that are already solved.

deploy your first tenant runtime in minutes, not months.