Platform rewrites, lessons learned

At some point in your product career, you will do a systems migration or a system rewrite. In this two-part series, let’s explore some lessons learned. This first post is about the things to avoid before starting a systems migration. The second post will walk through some practical tips for organizations on the most efficient way to pull this off.

To set the stage: how do you get to the point of needing a rewrite in the first place? The typical startup arc begins with an MVP that is designed quickly. Everything is lean in the beginning; this is the MVP phase. Most backend processes are manual. It makes sense: you don't know whether there is product-market fit, so you don't over-engineer a full end-to-end software solution. As you find product-market fit, your user base starts growing exponentially and you enter the bolt-on phase. There isn't enough time to slow down and add the incremental features to the core product that would make scaling easier. So you start bolting on hacks on top of hacks, pseudo-automation built on Excel macros, and more policies and procedures to keep up with the growth. Your software system reaches spaghetti-code status.

Then you hit a wall.

The entire organization slows down and software features take longer and longer to deploy, so testing new things to get out of the plateau becomes much harder. You cannot do basic things because the bolt-on system has become super complicated. You enter legacy code hell: your software system is now the inhibitor of your growth. You have a pissed-off internal organization because their lives aren't getting easier; they are still stuck in manual hack land. You have a pissed-off engineering and product team because the pace of delivery and innovation has slowed. Enter the rewrite. You have learned a lot through your operations so far and want to design the entire system differently, learning from your past mistakes instead of just continuing to iterate on the current software stack. You want to retire the legacy stack, rebuild from scratch, and restart your growth.

So how do you pull off a rewrite? Here are some lessons learned from doing these migrations a couple of times at a fast-growing startup.

Aiming for parity is dead wrong

The obvious goal for the new system is parity with the old system. There will be an overwhelming push by every stakeholder (except engineering) to define feature parity with the legacy system as the goal line. Fight this urge, because parity is almost universally impossible. The legacy system got so gnarly precisely because it was built quickly with a bolt-on approach.

The individual pieces that got bolted on probably made sense in isolation, but there was no holistic system design; you built the train tracks while the train was moving. Why would you want to rebuild the new system with the same feature set? Building a new system from scratch gives you the ability to design it holistically with the benefit of hindsight. Don't aim for parity in features; aim for solving the user need. Only replicate the features that solve user problems the right way, and take a mulligan on the implementations that were hacked together in the first place!

Resist the system hedge

Everybody wants optionality, and hedges give you just that. Hedging is the strategy of running two systems in parallel. One team is allocated to build new features on the legacy system to keep the business growing, while another team is allocated to build the new system. The calculus is that if the new system doesn't pan out, we can always flip back to the old one, so we gain optionality.

This is a problem on multiple fronts. Maintaining two systems is super hard. The chance of converging the feature sets of the new and legacy systems is slim to none: how can the new team ever catch up if the legacy system is still under active development? The team effectively has to implement everything twice! Organizationally this gets super challenging as well. Working on the new stuff is cool; working on legacy is boring. Who gets to be on which team? Nobody wants to be on the legacy team!

Resist the urge to quit

System migrations always take longer than you think. Just like any large project, the last 10% takes the longest. Since there is an existing legacy system in place, the urge to give up when times get hard is real. There is a backup, so the business will not collapse if we give up. The team needs to keep the big picture in view and hold its nerve. Pushing through is more a matter of will than anything else.

This isn't the sunk cost fallacy. The reason you are doing a migration is that your business simply cannot scale to the next level on the legacy system; you need to make this happen to win.

What do you think? What lessons have you learned in your career about system migrations? Comment away!
