Stack8s is a Kubernetes automation platform built around data sovereignty. It deploys vanilla Kubernetes clusters on hardware that customers own and control, whether that is a bare metal server in their own data centre, a private cloud, or any combination. Customers link their own hardware into clusters, control which workloads run in which environment, and are never locked into a managed cloud provider's Kubernetes offering. The vision was right. The infrastructure holding it together was not. Eprecisio joined as the founding engineering partner, rebuilt the entire platform from the infrastructure layer up, and has been the core delivery team through the product's growth to its current position as a recognised player at KubeCon.
How the relationship started
The engagement did not start with a Kubernetes platform. It started with a healthcare project.
Ehtisham joined Dr. Jeremy Murray's team to work on a healthcare compliance project. The team was small, the stack was complex, and managing the infrastructure to the standard that healthcare compliance demands was proving difficult in-house. Ehtisham stepped in individually to clear those blockers.
That engagement built the trust that led to Stack8s. When Dr. Murray started building his vision for a Kubernetes automation product, Eprecisio was the partner he turned to. The relationship that started with one engineer on a healthcare project is now a team of 5 to 6 engineers working full-time on a product that is being presented at KubeCon.
Stack8s is a commercially ambitious product. Its differentiator is data sovereignty. Customers bring their own hardware, register it into the platform, and get production-grade Kubernetes without handing their workloads to a cloud provider's managed service. They control what runs where. The platform also ships a marketplace of plugins that teams can deploy directly into their clusters: AI Architect for AI workflow orchestration, Kubeflow for ML pipelines, Laravel stack integrations, and 100+ other open source tools. All of this is available through the Stack8s interface without the customer needing Kubernetes expertise. To deliver that experience credibly, the platform itself has to be faultless.
The state of the platform when active development began
Seven months ago, when the current active engagement began in earnest, the platform was failing repeatedly. Not occasionally. Continuously.
The core problem was that the architecture had accumulated instability at every layer. Networking was unreliable when customers connected their own hardware from different environments. State management did not exist in any meaningful form, so the platform had no consistent picture of what was running, what had failed, or what needed attention.
| Area | State at the start | Impact on customers |
|---|---|---|
| Platform stability | Continuously failing, no root cause tracking | Customers could not trust clusters they provisioned |
| State management | No unified state layer | Node status, provisioning state, and cluster health were inconsistent across views |
| Networking | Unreliable when customers connected hardware from different environments | Workload connectivity failed silently when hardware was registered from mixed environments |
| GPU management | No operator-level control over GPU allocation | ML teams could not rely on GPU provisioning |
| Marketplace | Charts deployed inconsistently, no deployment framework | 100+ open source charts had no reliable install path |
| Customer onboarding | Node registration and NACL creation unreliable | New customer setup required manual intervention |
| Alerting | No structured alerting or status notifications | Failures went undetected until customers reported them |
| Pricing | Connectivity issues with external cloud provider billing APIs | Cost data was inaccurate or unavailable |
When funding came in and the product needed to scale, the architecture underneath it was not ready. The decision was made to stop patching and do a full rebuild.
The team and how the engagement evolved
The engagement grew the way most of our strongest relationships do. It started with one person, proved its value, and expanded as the scope became clear.
| Role | What Eprecisio owns |
|---|---|
| Platform engineering lead | Infrastructure architecture, Kubernetes operator design, cross-cloud networking |
| DevOps engineers (x2) | CI/CD, cluster lifecycle management, ArgoCD GitOps, Terraform modules |
| Full-stack engineer | Platform UI, customer-facing APIs, marketplace frontend |
| Product manager | Roadmap, PRDs, delivery process, sprint management |
| AI-native development | AI-assisted feature development and code quality processes |
This is not a vendor relationship. Eprecisio owns the roadmap process, manages delivery, writes the PRDs, and makes architecture decisions. Dr. Murray focuses on business development, partnerships, and product vision. The engineering execution is ours.
The rebuild: what we actually did
The 2-month rebuild was not a rewrite of features. It was a reconstruction of the foundation the features run on.
Infrastructure and state management layer. The platform had no consistent state model. We designed and implemented a state management architecture that tracks every cluster, node, and workload across all three cloud environments in real time. Every provisioning operation now has defined state transitions with persistence and recovery paths.
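To make the idea of defined state transitions with recovery paths concrete, here is a minimal sketch in Python. The state names, the transition table, and the recovery rule are illustrative assumptions, not the actual Stack8s operator code (which runs as Kubernetes operators, not application-level Python).

```python
# Illustrative sketch (not Stack8s source): a provisioning state machine
# with an explicit transition table and a defined recovery path.
# State names and the "failed -> provisioning" retry rule are assumptions.

ALLOWED = {
    "pending":      {"provisioning", "failed"},
    "provisioning": {"ready", "failed"},
    "ready":        {"draining", "failed"},
    "draining":     {"removed"},
    "failed":       {"provisioning"},  # recovery path: retry provisioning
}

class NodeState:
    def __init__(self, node_id: str, state: str = "pending"):
        self.node_id = node_id
        self.state = state
        self.history = [state]  # persisted transition log

    def transition(self, new_state: str) -> None:
        # Reject anything not in the transition table, so the platform
        # can never hold an inconsistent picture of a node.
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

node = NodeState("node-a1")
node.transition("provisioning")
node.transition("failed")
node.transition("provisioning")   # the defined recovery path
node.transition("ready")
print(node.state)  # -> ready
```

The point of the table is that every reachable state is enumerated up front, which is what makes persistence and recovery tractable.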
Networking for bring-your-own-hardware. Stack8s does not provision managed Kubernetes services. It deploys vanilla Kubernetes clusters on hardware that customers register from wherever that hardware lives. That means the networking layer has to handle arbitrary hardware from arbitrary environments connecting into a single control plane. We rebuilt the networking layer to handle hardware registration from any environment, normalise the connectivity model across mixed infrastructure, and maintain stable cluster networking as customers add or remove nodes from different physical or virtual locations.
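Normalising the connectivity model means mapping whatever each environment reports onto one schema before the node joins the cluster. The sketch below illustrates that step; the field names (`private_ip`, `management_ip`, the `stack8s.io/environment` label) are assumptions for illustration, not the real registration API.

```python
# Illustrative sketch: normalising node registrations from mixed
# environments (bare metal, VMware, private cloud) into one schema.
# All field names here are assumptions, not the Stack8s API.

def normalise_registration(raw: dict) -> dict:
    """Map environment-specific registration data onto a single
    connectivity model the control plane can reason about."""
    env = raw.get("environment", "bare-metal")
    # Different environments report routable addresses under different keys.
    address = (
        raw.get("private_ip")
        or raw.get("vpc_ip")
        or raw.get("management_ip")
    )
    if address is None:
        raise ValueError(f"no routable address for node {raw.get('name')}")
    return {
        "name": raw["name"],
        "environment": env,
        "address": address,
        "labels": {"stack8s.io/environment": env},
    }

nodes = [
    {"name": "bm-01", "environment": "bare-metal", "management_ip": "10.0.0.5"},
    {"name": "vm-02", "environment": "vmware", "private_ip": "192.168.4.10"},
]
print([normalise_registration(n)["address"] for n in nodes])
# -> ['10.0.0.5', '192.168.4.10']
```

Once every node looks the same to the control plane, adding or removing nodes from different physical or virtual locations stops being a special case per environment.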
GPU operator and compute management. We integrated the NVIDIA GPU Operator with custom resource allocators that give the platform real control over GPU scheduling, allocation, and monitoring across customer clusters.
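The allocation decision a custom allocator makes on top of the GPU Operator's device inventory can be sketched in a few lines. The inventory format and the best-fit policy below are assumptions chosen for illustration; they are not a claim about how the Stack8s allocators are implemented.

```python
# Illustrative sketch: a best-fit GPU allocation decision over a node
# inventory of free GPU counts. The inventory shape and the best-fit
# policy are assumptions for illustration.

def allocate_gpus(inventory: dict, requested: int):
    """Pick the node whose free GPU count fits the request most tightly
    (best fit), to avoid fragmenting large nodes with small jobs."""
    candidates = [(free, node) for node, free in inventory.items()
                  if free >= requested]
    if not candidates:
        return None  # no node can satisfy the request
    _, node = min(candidates)  # smallest sufficient free count wins
    return node

inventory = {"gpu-node-a": 8, "gpu-node-b": 2, "gpu-node-c": 4}
print(allocate_gpus(inventory, 2))   # -> gpu-node-b (tightest fit)
print(allocate_gpus(inventory, 16))  # -> None (nothing fits)
```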
Service mesh. We designed and implemented the service mesh layer for inter-cluster communication, traffic management, and observability, resolving the connectivity issues that had made the platform unpredictable.
Marketplace and plugin framework. Stack8s ships a marketplace of plugins that customers deploy directly into their clusters from within the platform. This includes AI Architect for AI workflow orchestration, Kubeflow for ML pipelines, Laravel stack integrations, and 100+ other open source Helm charts. We rebuilt the framework that governs how plugins are packaged, versioned, deployed, and updated across customer clusters, so every chart in the marketplace installs reliably regardless of what hardware the cluster is running on.
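A framework that governs packaging, versioning, and updates needs a gate that decides whether a chart should roll out to a given cluster at all. The sketch below shows one such gate; the manifest fields and the no-automatic-downgrade policy are assumptions for illustration, not the real marketplace schema.

```python
# Illustrative sketch: a version gate a chart deployment framework
# might apply before rolling a plugin out to a cluster. Manifest
# fields and the downgrade policy are assumptions.

def parse_semver(v: str) -> tuple:
    """Turn '1.8.0' into (1, 8, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def should_deploy(manifest: dict, installed_version) -> bool:
    """Deploy when the plugin is new to the cluster or strictly newer
    than what is installed; never downgrade automatically."""
    for field in ("name", "chart", "version"):
        if field not in manifest:
            raise ValueError(f"manifest missing required field: {field}")
    if installed_version is None:
        return True
    return parse_semver(manifest["version"]) > parse_semver(installed_version)

kubeflow = {"name": "kubeflow", "chart": "kubeflow/kubeflow", "version": "1.8.0"}
print(should_deploy(kubeflow, None))      # -> True  (fresh install)
print(should_deploy(kubeflow, "1.7.0"))   # -> True  (upgrade)
print(should_deploy(kubeflow, "1.9.0"))   # -> False (would downgrade)
```

Applying the same gate to every chart is what makes 100+ heterogeneous open source charts behave uniformly across customer clusters.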
Customer onboarding infrastructure. Node registration and NACL creation for new customers were manual and error-prone. We automated the full onboarding flow so new customer environments provision without manual intervention.
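The property that makes an onboarding flow safe to automate is idempotence: a failed run can be retried from the start without duplicating work. The sketch below illustrates that shape; the step names and the NACL record are assumptions for illustration, not the Stack8s implementation.

```python
# Illustrative sketch: an idempotent customer onboarding pipeline.
# Step names and the NACL rule shape are assumptions for illustration.

def onboard_customer(customer: str, nodes: list, state: dict) -> dict:
    """Run each onboarding step only if it has not already completed,
    so a partially failed run can be safely retried end to end."""
    env = state.setdefault(customer, {"nodes": set(), "nacl": None})
    for node in nodes:
        if node not in env["nodes"]:       # skip already-registered nodes
            env["nodes"].add(node)
    if env["nacl"] is None:                # create the NACL exactly once
        env["nacl"] = {"customer": customer,
                       "rules": ["allow cluster-internal"]}
    return env

state = {}
onboard_customer("acme", ["node-1", "node-2"], state)
# A retry after a partial failure adds only what is missing:
result = onboard_customer("acme", ["node-2", "node-3"], state)
print(sorted(result["nodes"]))  # -> ['node-1', 'node-2', 'node-3']
```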
| Component | What we rebuilt | Technology |
|---|---|---|
| State management | Unified state layer across all cloud providers | Custom Kubernetes operators, etcd |
| Networking for BYOH | Stable cluster networking across hardware registered from any environment | Vanilla Kubernetes networking, custom node registration layer |
| GPU management | Operator-level GPU provisioning and allocation | NVIDIA GPU Operator, custom allocators |
| Service mesh | Fast, stable inter-cluster communication | Custom service mesh implementation |
| Plugin marketplace | Deployment framework for AI Architect, Kubeflow, Laravel stack, 100+ charts | Helm, ArgoCD, custom chart operator |
| Customer onboarding | Automated node registration and NACL creation | Terraform, Kubernetes admission controllers |
| Alerting | Structured cluster and node health alerting | Prometheus, Alertmanager |
| GitOps pipeline | Automated cluster lifecycle management | ArgoCD, GitHub Actions |
| Pricing integration | Reliable connectivity to cloud provider billing APIs | CAST AI integration, custom billing adapters |
The hardest parts
Redesigning the infrastructure layer without taking the product offline. Stack8s had paying customers during the rebuild. The platform could not simply go dark for 2 months. The approach was to build the new infrastructure layer in parallel, migrate workloads incrementally, and cut over component by component.
Networking for arbitrary hardware configurations. Because Stack8s registers customer-owned hardware rather than provisioning managed cloud nodes, the networking layer has to handle a much wider range of physical and virtual configurations. Customers were registering nodes from bare metal, from private clouds, from VMware environments, and from various provider setups. Getting the control plane to maintain stable connectivity across all of these took significantly longer than a more constrained networking model would have.
State recovery for existing clusters. When we introduced the new state management layer, existing customer clusters had no state history. Building a reconciliation process that reconstructed accurate state for live clusters without disrupting them was the most technically delicate work of the rebuild. A single error would have made existing deployments unmanageable.
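The reason this was safe at all is that reconciliation can be built as a read-only pass: derive state purely from what the cluster reports, issue no mutating calls. A minimal sketch of that idea, with an assumed observation format, not the actual reconciliation process:

```python
# Illustrative sketch: reconstructing state for a live cluster from
# observations alone, with no mutating calls. The observed record
# format is an assumption for illustration.

def reconstruct_state(observed: list) -> dict:
    """Derive a state record per node purely from what the cluster
    reports, so existing deployments are never touched."""
    state = {}
    for obs in observed:
        if obs["ready"]:
            state[obs["node"]] = "ready"
        elif obs["registered"]:
            state[obs["node"]] = "provisioning"
        else:
            state[obs["node"]] = "unknown"  # flagged for operator review
    return state

observed = [
    {"node": "node-1", "registered": True, "ready": True},
    {"node": "node-2", "registered": True, "ready": False},
    {"node": "node-3", "registered": False, "ready": False},
]
print(reconstruct_state(observed))
# -> {'node-1': 'ready', 'node-2': 'provisioning', 'node-3': 'unknown'}
```

Anything the pass cannot classify lands in an explicit "unknown" bucket rather than being guessed at, which is what keeps a reconciliation error from making live deployments unmanageable.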
Dead code and architectural debt. The AI-assisted rebuild process surfaced a significant amount of duplicate and dead code. Removing it required understanding which code was genuinely unused versus which was reached through uncommon paths not obvious from static analysis. This took longer than a clean codebase would have, but it was the right call.
Results
| Metric | Before | After |
|---|---|---|
| Platform stability | Continuously failing | Stable. No recurring systemic failures since rebuild. |
| State management | No consistent state | Real-time state tracking across all clusters and nodes |
| Customer onboarding | Manual intervention required | Fully automated node registration and environment setup |
| ML setup time | Weeks of manual GPU cluster configuration | Hours with automated GPU provisioning |
| Release velocity | Blocked by instability | Regular feature releases on structured sprint cadence |
| Chart deployment | Inconsistent, manual troubleshooting | Reliable across all 100+ open source charts |
| Team model | 1 embedded engineer | 5 to 6 engineers, PM, roadmap ownership |
| Product positioning | Pre-funding, unstable product | KubeCon presence, CAST AI partnership |
"Their deep understanding of GPU infrastructure and MLOps made them the right choice for our project. Reduced our ML setup time by 60%."
Dr. Jeremy Murray, Founder at Stack8s
Where the product is now
Stack8s is no longer a product that is struggling to be stable. It is a product built on a clear and defensible position in the market: organisations that need production-grade Kubernetes without surrendering data sovereignty to a managed cloud provider. That means your hardware, your environment, your rules on where workloads run. Dr. Murray is now taking that product to KubeCon, presenting at Kubernetes automation working groups, and building partnerships with infrastructure players like CAST AI around it.
The Eprecisio team is not winding down. The engagement is actively growing. Dr. Murray has explicitly asked to scale the Pakistan-based engineering team further rather than continuing to hire in the UK, where previous direct hires did not work out.
For how we structure and manage production Kubernetes infrastructure at this scale, see our InfraOps service.
If you are building an infrastructure platform and need a team that can work at this level of technical depth and own the delivery process, book a free 30-minute call.

