The Hidden Cost of DIY Tenant Provisioning

you can build per-tenant infra yourself, sure. but the cost isn't just engineering time. it's the features you don't ship, the customers you don't onboard, and the incidents you debug at 2am.

"we'll just build it ourselves."

famous last words for literally every team that tries to provision per-tenant infra from scratch. every single one.

it starts simple enough. spin up a container per customer. route some traffic. done, right?

then you need persistent storage. then environment isolation. then wildcard DNS. then SSL per subdomain. then a provisioning API. then monitoring per tenant. then independent scaling.

three months later your infra engineer is maintaining a custom orchestration system instead of shipping the features your customers actually asked for. sounds fun, right?

the iceberg of DIY provisioning

what looks like "just spin up a container" is actually a massive stack of interconnected problems that keep multiplying.

container orchestration

you need to create, start, stop, restart, and destroy containers on demand. you need health checks. automatic restarts on failure. resource limits so one tenant can't eat the entire host machine.

if you're on kubernetes, that's writing Deployments, Services, Ingresses, PersistentVolumeClaims, ConfigMaps, and Secrets per tenant. if you're NOT on kubernetes, congrats: you're building your own orchestrator. neither option is quick or fun.
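to make the multiplication concrete, here's a rough python sketch of the minimal objects one tenant needs on kubernetes, built as plain dicts. the names, labels, image, and resource limits are made up for illustration, not a real convention:

```python
def tenant_manifests(slug: str) -> list[dict]:
    """Generate the minimal kubernetes objects a single tenant needs.

    Every new signup means another copy of each of these — and a real
    setup also adds an Ingress, PersistentVolumeClaim, ConfigMap, and
    Secret per tenant on top.
    """
    labels = {"app": "tenant-runtime", "tenant": slug}
    deployment = {
        "kind": "Deployment",
        "metadata": {"name": f"runtime-{slug}", "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "runtime",
                        "image": "yourco/runtime:latest",  # illustrative image
                        # resource limits so one tenant can't eat the host
                        "resources": {
                            "limits": {"cpu": "500m", "memory": "512Mi"},
                        },
                    }],
                },
            },
        },
    }
    service = {
        "kind": "Service",
        "metadata": {"name": f"runtime-{slug}", "labels": labels},
        "spec": {
            "selector": labels,
            "ports": [{"port": 80, "targetPort": 8080}],
        },
    }
    return [deployment, service]
```

and this is the trimmed-down version. keeping hundreds of these object sets templated, applied, and garbage-collected as tenants come and go is where the real work lives.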

networking and routing

each tenant needs a unique URL. that means wildcard DNS, a reverse proxy that routes based on subdomain, and SSL certs per tenant (or a wildcard cert with proper management, which is its own headache).

oh, and custom domains. enterprise customers will absolutely want ai.theircompany.com instead of slug.yourdomain.com. that means DNS verification, cert provisioning, and proxy reconfiguration per custom domain. it's a whole thing.

persistent storage

containers are ephemeral. your customers' data is not. you need persistent volumes that survive restarts and redeployments. backup strategies. and you gotta make sure one tenant's volume is never accessible to another. ever.
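the "never accessible to another tenant" part is where DIY setups get bitten. a minimal sketch of defensive path resolution for per-tenant volumes — the slug format and volume root here are assumptions for illustration:

```python
import re
from pathlib import Path

# assumed slug format: lowercase alphanumerics and hyphens, no dots or slashes
TENANT_SLUG = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")

def tenant_volume(root: str, slug: str) -> Path:
    """Resolve a tenant's data directory, refusing anything that could
    escape the volume root (path traversal, absolute paths, weird slugs)."""
    if not TENANT_SLUG.fullmatch(slug):
        raise ValueError(f"invalid tenant slug: {slug!r}")
    path = (Path(root) / slug).resolve()
    if Path(root).resolve() not in path.parents:
        raise ValueError("path escapes volume root")
    return path
```

note this only guards the path math. real isolation also needs enforcement at the mount and permission level, so a compromised tenant process can't read a sibling's volume even with a valid path.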

environment isolation

API keys, model configs, feature flags, rate limits. each tenant needs their own set of env vars injected at runtime. you need a secure way to store, update, and rotate these without redeploying the whole thing.
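a toy in-memory version of that per-tenant config store, just to show the shape of the problem. a real one would sit on something encrypted at rest (kubernetes Secrets, Vault, etc.); the key names here are illustrative:

```python
class TenantEnvStore:
    """Sketch: per-tenant env config with rotation tracking.

    Each tenant's env is the shared defaults overlaid with its own
    overrides. Rotating a value bumps only that tenant's version, so
    only that tenant's runtime needs a restart — not the whole fleet.
    """

    def __init__(self, defaults: dict[str, str]):
        self.defaults = defaults
        self.tenants: dict[str, dict[str, str]] = {}
        self.version: dict[str, int] = {}

    def set(self, slug: str, key: str, value: str) -> None:
        self.tenants.setdefault(slug, {})[key] = value
        self.version[slug] = self.version.get(slug, 0) + 1

    def env_for(self, slug: str) -> dict[str, str]:
        return {**self.defaults, **self.tenants.get(slug, {})}
```

the version counter is the interesting bit: it's what lets you rotate one tenant's API key without redeploying everyone else.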

provisioning automation

someone has to create all of this when a new customer signs up. manually? that doesn't scale past 5 customers. you need a provisioning API that creates the container, volume, network config, DNS record, SSL cert, and env vars in one operation. building that API is a project in itself.
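the part that bites is partial failure: the DNS record got created but the cert didn't, and now you have a half-provisioned tenant. a sketch of the rollback logic a provisioning API needs — the step names and create/destroy callbacks are hypothetical:

```python
def provision_tenant(slug: str, steps) -> list[str]:
    """Run provisioning steps in order; each step is (name, create, destroy).

    If any step fails, tear down the steps that already succeeded, in
    reverse order, so a failed signup never leaves half a tenant behind.
    """
    completed = []
    try:
        for name, create, destroy in steps:
            create(slug)
            completed.append((name, destroy))
    except Exception:
        for name, destroy in reversed(completed):
            destroy(slug)  # best-effort rollback of this step
        raise
    return [name for name, _ in completed]
```

in practice each create/destroy pair hides a whole subsystem (container runtime, DNS provider, cert issuer), and rollback itself can fail — which is why this "one operation" is a project, not an afternoon.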

monitoring and observability

with 10 tenants you can maybe manage things manually. 100? no chance. you need per-tenant metrics: CPU, memory, disk, network. health status. uptime tracking. alerting. you need to know when tenant 47 is running hot BEFORE they file a support ticket. not after.
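the threshold check itself is the easy part — here's a sketch, with made-up limits (the metrics dict shape is an assumption; collecting those numbers per tenant is the actual work):

```python
def hot_tenants(
    metrics: dict[str, dict[str, float]],
    cpu_limit: float = 0.8,
    mem_limit: float = 0.9,
) -> list[str]:
    """Return tenant slugs breaching a resource limit, sorted.

    `metrics` maps tenant slug -> utilization fractions, e.g.
    {"tenant-47": {"cpu": 0.92, "mem": 0.40}}.
    """
    return sorted(
        slug
        for slug, m in metrics.items()
        if m.get("cpu", 0.0) > cpu_limit or m.get("mem", 0.0) > mem_limit
    )
```

the hard part is everything upstream of this function: scraping per-tenant CPU, memory, disk, and network reliably, storing it, and paging the right person when this list is non-empty at 2am.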

the real cost here is opportunity

engineering time is expensive, yeah, but it's not the biggest cost. the biggest cost is opportunity.

every week your team spends building infrastructure is a week they don't spend on:

  • features that actually differentiate your product
  • onboarding flows that convert trial users to paying customers
  • integrations that unlock new markets
  • the actual AI capabilities your customers are paying you for

your customers don't care about your provisioning system. they care about what your AI does for them. infrastructure is necessary, but it's not differentiating. the less time you spend on it, the more time you spend on what actually matters.

the maintenance tax (it's permanent, btw)

building it is only the beginning. and honestly it's the easy part.

infrastructure requires ongoing maintenance forever:

  • container runtime updates and security patches
  • SSL certificate renewals (they expire. they always expire at the worst time.)
  • storage capacity planning
  • network config changes
  • monitoring system upkeep
  • incident response procedures
  • documentation for the team (that nobody reads but you still gotta write it)

this is a permanent tax on your engineering capacity. it doesn't shrink as your product matures. it grows as your tenant count increases. more tenants = more things to break = more time maintaining instead of building.

when DIY actually makes sense

to be fair building your own makes sense in specific situations:

  • you have a dedicated infrastructure team with nothing else on their plate
  • your isolation requirements are unusual enough that no platform supports them
  • infrastructure itself is your competitive advantage
  • you're operating at scale where platform costs genuinely exceed build costs

for most teams shipping AI products? none of these apply. you need isolation, not an infrastructure science project.

what ShipClaw replaces

ShipClaw replaces the entire stack above with a visual builder and a deploy button. that's not marketing speak; it's literally what it does.

DIY component → ShipClaw equivalent:

  • container orchestration → drag a Runtime node onto the canvas
  • networking and routing → Gateway node handles wildcard routing + SSL
  • persistent storage → Volume node mounts at /data automatically
  • environment isolation → Env Config node per tenant
  • provisioning automation → click deploy. platform does everything.
  • monitoring → built-in dashboard with per-tenant metrics
  • custom domains → Custom Domain node with automatic SSL

no Dockerfiles. no kubernetes manifests. no terraform. no provisioning API to build and maintain.

design the topology visually. deploy. get back to building the thing your customers actually pay for.

the math (since nobody does it)

rough estimate for a team of two engineers building DIY tenant provisioning:

  • initial build: 8-12 weeks of focused engineering (assuming nothing goes wrong lol)
  • ongoing maintenance: 10-20 hours per week, every week, forever
  • incident response: unpredictable, but guaranteed to happen at the worst possible time
  • opportunity cost: ~3 months of product development just gone

ShipClaw gets you to the same outcome in an afternoon. the rest of the quarter is yours for actual product work.

bottom line

DIY tenant provisioning is a trap. looks like a weekend project. turns into a permanent engineering commitment that slowly eats your team alive.

the question isn't "can we build this?" you can. every team can. the question is "should we?"

for most teams the answer is no. use the platform. ship the product. stop spending your best engineers' time on problems that are already solved.

deploy your first tenant runtime in minutes, not months.