
One Month SaaS Revamp: AI-Assisted Dev, Test-First Policy, Weekly Releases

April 4, 2026 · 8 min read

Three years of a production SaaS accumulates a lot. Code written under deadline pressure that nobody went back to clean. Features layered on top of features. Tests that were meant to be written later. Business logic that lives in the head of the engineer who wrote it, not in the codebase.

Songplace and its companion platform Curator had all of this. The product worked, the business was growing, but the engineering team was spending a disproportionate amount of time on staging bugs, QA cycles that caught the same class of error repeatedly, and feature releases that required careful surgery rather than confident shipping.

One of our engineers took this on as a focused one-month project. Here is what happened.

The starting point

Songplace is a music distribution and playlist management platform serving artists, labels, and curators. Curator is its companion tool for playlist curators to manage submissions and handle their inboxes at scale. Both were live, had real users, and were generating revenue.

The codebase was approximately three years old. It had grown organically, which is another way of saying the architecture decisions from month one were still load-bearing in ways nobody fully understood.

| Symptom | Impact |
| --- | --- |
| Manual QA cycles taking 3 to 5 days per release | Monthly release cadence at best |
| Staging diverged from production regularly | Bugs surfaced only after deployment |
| Inconsistent test coverage across modules | No confidence in changing anything |
| Business logic scattered across layers | Same validation duplicated in 3 places, each slightly different |
| PRDs written in Slack and Notion notes | Features built from ambiguous specs, scope creep on every release |

The result: a team that could ship, but could not ship fast. Every release felt like a negotiation with the codebase.

Week 1: audit and map before touching anything

The first week was entirely diagnostic. No code was changed.

The engineer used Claude to systematically review every module and build a dependency map of the codebase. Every file was categorised: what it does, what depends on it, what it depends on, where undocumented business logic lives.

This produced three outputs by end of week one:

  1. A module map showing which parts were stable and well-understood versus fragile and undocumented
  2. A list of the 12 highest-risk areas, places where a change was most likely to cause an unexpected regression
  3. A refactoring priority list ordered by impact versus risk

The AI loop started here. Instead of one engineer making gut-feel calls on what to fix first, every prioritisation decision was structured: here is the module, here is its current state, here are its dependencies, here is the usage pattern. What should we address first and why? The output was documented, not just decided.
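To make that concrete, here is a minimal sketch of what one of those structured records might look like, assuming a TypeScript codebase. The shape and field names (`ModuleRecord`, `changeRisk`, and so on) are illustrative, not the schema the team actually used:

```typescript
// Hypothetical shape for one entry in the week-one module map.
interface ModuleRecord {
  path: string;                 // e.g. "src/services/submissions.ts"
  purpose: string;              // one-line summary of what the module does
  dependsOn: string[];          // modules this one imports
  dependents: string[];         // modules that import this one
  undocumentedLogic: string[];  // business rules found only in the code
  state: "stable" | "fragile";  // well-understood vs. fragile and undocumented
  impact: number;               // 1-5: how much the product leans on this module
  changeRisk: number;           // 1-5: likelihood a change causes a regression
}

// Refactoring priority: fragile modules only, highest impact first,
// lowest change risk as the tie-breaker.
function prioritise(modules: ModuleRecord[]): ModuleRecord[] {
  return [...modules]
    .filter((m) => m.state === "fragile")
    .sort((a, b) => b.impact - a.impact || a.changeRisk - b.changeRisk);
}
```

Sorting fragile modules by impact first and change risk second is one way to reproduce the impact-versus-risk ordering of the priority list.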

Week 2: tests before any refactoring

The most important decision of the whole project was made in week two: no code would be refactored until it had test coverage.

This sounds obvious. It almost never happens in practice because it feels like slowing down. It is actually what made the rest of the month possible.

The engineer wrote tests for the 12 high-risk modules before touching a single line of production logic. For each module: unit tests for core business logic, integration tests for adjacent module interactions, and edge case tests specifically for inputs that had historically caused staging bugs.

This is where AI-assisted development made a real time difference. Using Claude to generate test scaffolding from the existing code, then reviewing and refining those tests rather than writing from scratch, compressed what would have been two weeks of test writing into four days. Every test was reviewed. Every edge case was validated against real bug reports from the staging history.
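As an illustration of what one of those edge-case tests might look like, here is a sketch in Vitest style. `validateSubmission` and its rules are hypothetical stand-ins, not Songplace's actual API; the point is that each test encodes a failure staging had already seen:

```typescript
import { describe, it, expect } from "vitest";
import { validateSubmission } from "../services/submissions"; // hypothetical module

describe("validateSubmission", () => {
  it("accepts a well-formed submission", () => {
    const result = validateSubmission({ trackUrl: "https://example.com/t/1", curatorId: "c_1" });
    expect(result.ok).toBe(true);
  });

  it("rejects an empty track URL (historical staging bug)", () => {
    const result = validateSubmission({ trackUrl: "", curatorId: "c_1" });
    expect(result.ok).toBe(false);
  });

  it("rejects a whitespace-only curator ID (historical staging bug)", () => {
    const result = validateSubmission({ trackUrl: "https://example.com/t/1", curatorId: "   " });
    expect(result.ok).toBe(false);
  });
});
```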

By end of week two, the 12 highest-risk modules had test coverage. Every subsequent change had a safety net.

Week 3: systematic refactoring with the AI loop

With test coverage in place, the refactoring sprint began. The process for each module:

  1. Run existing tests, confirm they pass. This is the baseline.
  2. Ask: given this code, this test suite, and this dependency map, what are the specific refactoring opportunities and what is the risk of each?
  3. Address highest-value, lowest-risk improvements first.
  4. Run tests after each change. If anything breaks, understand why before continuing.
  5. Document the change and reasoning in the commit message.

The three biggest categories of improvement from week three:

Redundant database queries. Several modules were making 3 to 5 database calls for data that could be fetched in one query or served from cache. The extra round trips were invisible in development but showed up as latency under real load. Refactoring brought these down to single queries with appropriate caching.
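An illustrative before-and-after sketch of that pattern, assuming a Postgres-style client with a `query` method; the tables, columns, and `Db` interface are hypothetical:

```typescript
interface Db {
  query(sql: string, params: unknown[]): Promise<any>;
}

// Before: three round trips for data one JOIN can return.
async function getPlaylistPageBefore(db: Db, playlistId: string) {
  const playlist = await db.query("SELECT * FROM playlists WHERE id = $1", [playlistId]);
  const tracks = await db.query("SELECT * FROM tracks WHERE playlist_id = $1", [playlistId]);
  const curator = await db.query("SELECT * FROM curators WHERE id = $1", [playlist.curator_id]);
  return { playlist, tracks, curator };
}

// After: one query, fronted by a short-lived in-memory cache.
const cache = new Map<string, { value: unknown; expires: number }>();

async function getPlaylistPageAfter(db: Db, playlistId: string) {
  const hit = cache.get(playlistId);
  if (hit && hit.expires > Date.now()) return hit.value;

  const page = await db.query(
    `SELECT p.*, c.name AS curator_name, json_agg(t.*) AS tracks
       FROM playlists p
       JOIN curators c ON c.id = p.curator_id
  LEFT JOIN tracks t ON t.playlist_id = p.id
      WHERE p.id = $1
      GROUP BY p.id, c.name`,
    [playlistId],
  );
  cache.set(playlistId, { value: page, expires: Date.now() + 30_000 }); // 30s TTL
  return page;
}
```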

Inconsistent error handling. Different parts of the codebase had different conventions for catching, logging, and returning errors. Some used try/catch consistently, some swallowed errors silently. Standardising this removed an entire class of hard-to-diagnose staging issues.
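One way to standardise this, sketched for an Express-style app; `AppError`, the `Res` shape, and the wrapper are illustrative assumptions, not the convention the team actually adopted:

```typescript
class AppError extends Error {
  constructor(message: string, public statusCode = 500, public cause?: unknown) {
    super(message);
  }
}

interface Res {
  status(code: number): { json(body: unknown): void };
}

// Every route goes through one wrapper: errors are caught, logged, and
// returned the same way everywhere, so nothing is swallowed silently.
function withErrorHandling<Req>(handler: (req: Req, res: Res) => Promise<void>) {
  return async (req: Req, res: Res) => {
    try {
      await handler(req, res);
    } catch (err) {
      const e = err instanceof AppError ? err : new AppError("Internal error", 500, err);
      console.error(e.message, { cause: e.cause });
      res.status(e.statusCode).json({ error: e.message });
    }
  };
}
```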

Business logic in the wrong layer. Validation logic that should have lived in the service layer was scattered across controllers and components. The same validation was sometimes duplicated, sometimes skipped. Moving it to the right layer and writing tests for it specifically eliminated a category of bug that had appeared in staging repeatedly.
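A small sketch of the shape of that change, with a hypothetical validation rule; the module path and function names are made up for illustration:

```typescript
// services/submissions.ts: the rule lives in one place, in the service layer.
export function validateTrackUrl(url: string): string | null {
  if (!url.trim()) return "Track URL is required";
  if (!url.startsWith("https://")) return "Track URL must use https";
  return null; // valid
}

export async function submitTrack(url: string, curatorId: string): Promise<void> {
  const error = validateTrackUrl(url);
  if (error) throw new Error(error);
  // persist the submission for curatorId...
}

// Controllers and UI components now call submitTrack / validateTrackUrl
// instead of re-implementing (or forgetting) the check locally.
```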

Week 4: PRD process and release policy

The last week was not about code. It was about putting a process around the clean codebase so it stayed clean.

Bi-weekly PRD rhythm. Every new feature now starts with a structured PRD before any code is touched. The template covers: what problem this solves for which user, acceptance criteria expressed as testable conditions, edge cases to handle, and a definition of done that includes test coverage. This prevents the same type of debt from accumulating again.
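As a rough illustration, the skeleton of such a template might look like the sketch below; the exact headings are an assumption, not the team's actual document:

```
Feature PRD (hypothetical template sketch)

Problem
  Which user, what problem this solves, why now.

Acceptance criteria
  Expressed as testable conditions, one per line.

Edge cases
  Inputs and states the feature must handle.

Definition of done
  Acceptance criteria covered by tests; tests pass in CI.
```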

Test-first release policy. No feature goes to staging without test coverage for its acceptance criteria. Enforced as a CI check, not a code review suggestion. If the tests do not exist, the pipeline does not pass.
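A minimal sketch of what that gate could look like, as a small script the pipeline runs over the list of changed files; the path conventions (`src/` sources mapping to `tests/*.test.ts`) are assumptions, not the team's actual layout:

```typescript
// Fail the pipeline if a changed source file has no corresponding test file.
import { existsSync } from "node:fs";

const changed = process.argv.slice(2); // file list supplied by the CI job
const missing = changed
  .filter((f) => f.startsWith("src/") && f.endsWith(".ts") && !f.endsWith(".test.ts"))
  .filter((f) => !existsSync(f.replace(/^src\//, "tests/").replace(/\.ts$/, ".test.ts")));

if (missing.length > 0) {
  console.error("No test coverage for:", missing.join(", "));
  process.exit(1); // the pipeline does not pass
}
```

In CI this might run as something like `node scripts/check-tests.js $(git diff --name-only origin/main...HEAD)`; the real check would use whatever changed-file list the CI provider exposes.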

What changed

| Metric | Before | After |
| --- | --- | --- |
| Release cadence | Monthly at best | Weekly, consistently |
| Manual QA cycle | 3 to 5 days per release | Near zero; the automated suite catches what QA was finding manually |
| Staging bugs | Regular, same class of error repeatedly | Effectively zero since the refactor |
| Feature planning horizon | Ad hoc, decided in Slack | 2 weeks out, with clear acceptance criteria |
| Confidence in shipping | Every release felt like surgery | Engineers ship without second-guessing the codebase |

The change that matters most to the business: the roadmap is now a plan the team executes against rather than an aspiration they negotiate with the codebase.

What the AI-assisted approach actually contributed

Being specific here because "AI-assisted development" can mean anything:

Test scaffolding. Generating initial test file structure from existing code saved significant time. The engineer reviewed and refined every test. Starting from a scaffold rather than a blank file changed the speed of the test-writing phase.

Code review at scale. No single engineer holds an entire three-year codebase in their head. Using Claude to review individual modules with specific questions ("what are the failure modes of this function given these inputs?") surfaced issues a manual review at that scale would have missed.

Refactoring with justification. The value was not "rewrite this." The value was "here are three specific ways this could be improved, here is why each one matters, here is the risk of each change." That framing kept every change justified and traceable.

What AI did not replace: engineering judgment on what to prioritise, product context that determines which edge cases matter, and the code review process that validates correctness.

Technical debt in a SaaS product is not a moral failing. It is the natural result of building something real under real constraints. The reactive version of dealing with it looks like: bugs reach production, staging becomes unreliable, feature velocity drops, the team gets frustrated. The proactive version looks like: one month, one engineer, a structured approach, and a codebase that comes out clean with a process that keeps it that way.

Our Development service covers AI-assisted SaaS revamps, full-stack product builds, and embedding with engineering teams to ship features faster.

If your SaaS has accumulated the kind of debt that is starting to slow your team down, book a free 30-minute call.
