(Almost) Every technical decision I endorse or regret, scaling a health tech startup from bootstrap to enterprise
I was the co-founder and Chief Technology Officer of a startup. I built and scaled a health-tech SaaS from bootstrap to its sale over four years, and then, after joining the acquiring company, I worked for three years to expand the service to support enterprise clients. I made some core decisions that the company had to stick to, for better or worse, these past seven years. This post will outline the major technical decisions I made—what worked, what didn’t, and what I’d do differently. The idea was inspired by Jack Lindamood; I read his post and thought it was a creative way to share some lessons learned.
AWS
Picked AWS over other cloud services
🟩 Endorse
When starting, I had significant experience with AWS from past roles, and I didn't have the time to ramp up on another platform. AWS was the default choice. Since then, I have found that AWS was a great choice. The tooling is superb. Support is reliable. Hiring engineers with AWS experience is easy. AWS provided great stability: in seven years, we had only two downtime incidents, both of which could have been avoided with better multi-AZ replication.
DynamoDB as our primary database
🟥 Regret
I wanted to design a completely serverless service, and serverless SQL databases were not available seven years ago. After dealing with MongoDB’s manual server management, sharding, and scaling issues, I decided to give DynamoDB a chance. I had never used DynamoDB, but I built a POC and saw that performance was good. That was without even using DynamoDB Accelerator (DAX), which I knew was available if we ever needed to accelerate the database further. DynamoDB it was.
DynamoDB has proven to be a powerful and performant database. When optimized with the correct indexes, DynamoDB is lightning fast, and for most of our application it has worked well. It completely fails in use cases where you cannot create predetermined, optimized indexes for your queries. Supporting data tables with advanced filtering and searching is nearly impossible at scale with DynamoDB, and any use of a scan in production is impossible to scale. To support more complex searching, we added an ElasticSearch service and hooked into DynamoDB Streams to keep ElasticSearch up to date.
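The stream-to-search pipeline is simple to sketch. Below is an illustrative Lambda (not our production code) subscribed to the table's DynamoDB stream that mirrors every change into a search index; the index name, endpoint variable, and v7-style ElasticSearch client are all assumptions:

```ts
import type { DynamoDBStreamEvent } from "aws-lambda";
import { unmarshall } from "@aws-sdk/util-dynamodb";
import { Client } from "@elastic/elasticsearch"; // v7-style client API

const es = new Client({ node: process.env.ES_ENDPOINT! });

// Mirror each table change into ElasticSearch so complex filtering and
// searching can run there instead of against DynamoDB scans.
export async function handler(event: DynamoDBStreamEvent): Promise<void> {
  for (const record of event.Records) {
    const keys = unmarshall(record.dynamodb!.Keys as any);
    if (record.eventName === "REMOVE") {
      await es.delete({ index: "patients", id: keys.id });
    } else {
      // INSERT and MODIFY both carry the full new item in NewImage.
      const item = unmarshall(record.dynamodb!.NewImage as any);
      await es.index({ index: "patients", id: keys.id, body: item });
    }
  }
}
```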
DynamoDB’s official SDK from AWS is also limited. It is similar to most direct database libraries and does not attempt to enter the realm of an ORM. We originally used Dynamoose as our ORM, but its TypeScript support was limited, so I eventually created and open-sourced a more powerful DynamoDB library designed for TypeScript, called Dyngoose.
With a useful ORM and ElasticSearch alongside DynamoDB, we were able to handle almost every use case. The main issue became development time: many features that would be simple in SQL using a JOIN became difficult with DynamoDB. It often required loading a record just to get a pointer to another record you needed to load. This drags down DynamoDB's performance significantly, and the benefits evaporate. One use case that completely failed with DynamoDB is reporting.
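To make the pointer-chasing concrete, here is a hedged sketch of what a one-line SQL JOIN turns into; the table and attribute names are hypothetical:

```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// SQL: SELECT * FROM visits JOIN patients ON visits.patient_id = patients.id
// DynamoDB: two sequential reads, because the second key is only known
// after the first record has been loaded.
async function getVisitWithPatient(visitId: string) {
  const visit = await doc.send(
    new GetCommand({ TableName: "visits", Key: { id: visitId } })
  );
  const patient = await doc.send(
    new GetCommand({ TableName: "patients", Key: { id: visit.Item!.patientId } })
  );
  return { ...visit.Item, patient: patient.Item };
}
```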
Reporting isn’t always the highest-priority feature for a startup; delivering features is typically considered more vital. But businesses want to see the value your product and service delivers, and without appropriate reporting capabilities you cannot truly show your clients that value. DynamoDB and ElasticSearch are both terrible for reporting. Eventually, you will likely need to build reporting capabilities.
DynamoDB does not position itself as the primary database for large SaaS products. This decision came down to my preference for a serverless database, and today, weighing it all, I do regret it. DynamoDB continues to perform admirably for us, but I believe PostgreSQL would have been the better choice, and today there are great services like Neon, Supabase, and AWS Aurora that make scaling a database easier than ever. I believe an optimized Postgres database, with appropriate caching in something like DynamoDB, would have performed just as well and allowed us to build a more useful product for our customers.
Using Lambdas for all of our APIs
🟩 Endorse
Although the Serverless framework did not work out (more below), the serverless approach of using Lambda for all of our API endpoints did. Lambda is AWS’s Function-as-a-Service offering. Every one of our APIs is processed by a Lambda behind an API Gateway. Our service does not run many CPU-intensive processes (our biggest is our ETL process), and most of the service was automated via events (e.g., webhooks and schedules). It was a perfect use case for Lambda, and to this day, even while supporting hundreds of thousands of patients every month, the operating cost for the service is minimal and performance is great.
Using Lambdas provided several key benefits beyond reduced operating cost:
- Significantly lower infrastructure management costs. I didn’t need to spend time managing servers or optimizing load balancers. I never had to deal with memory leaks or runaway processes bringing down a server.
- Consistent performance. Our application performed reliably, for all users, under almost every workload we ever encountered. The service scaled instantly; it was never down for 10 minutes waiting for another server to come up and join the load balancer’s group of healthy targets.
Certain technical considerations and limitations are imposed on you when creating an API backed entirely by API Gateway and Lambda, such as:
- Maximum of 30 seconds to respond. API Gateway restricts your Lambdas to responding within 30 seconds, known as the “Maximum integration timeout”. That is plenty of time for 99% of requests; however, it means certain processes (e.g., exports) become difficult. Generally, this timeout forced us to design performant processes. We rarely had timeouts, but when we did, it was nearly always a mistake in how we performed a DynamoDB query, where it fell back to a database scan. This timeout prevented runaway processes and forced us to review performance issues early on. To handle tasks longer than 30 seconds, we designed asynchronous workflows—either queueing jobs or using WebSockets for real-time communication (see the sketch after this list).
- Maximum of 15 minutes runtime. When you have a Lambda that runs from a non-API Gateway event, you typically have 900 seconds to respond. This is plenty of time for typical workloads. To handle tasks longer than 15 minutes, we utilized ECS Tasks on Fargate.
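As a sketch of the asynchronous pattern mentioned above (the queue name and payload shape are illustrative): the API Lambda accepts the job, queues it, and returns immediately, while a worker with a longer timeout does the actual work.

```ts
import { randomUUID } from "node:crypto";
import type { APIGatewayProxyHandler } from "aws-lambda";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

// Accept the export request, hand it off, and respond well within 30 seconds.
export const startExport: APIGatewayProxyHandler = async (event) => {
  const jobId = randomUUID();
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: process.env.EXPORT_QUEUE_URL!,
      MessageBody: JSON.stringify({ jobId, params: event.queryStringParameters }),
    })
  );
  // A worker Lambda (up to 15 minutes) or an ECS task processes the job;
  // the client polls a status endpoint or gets notified over a WebSocket.
  return { statusCode: 202, body: JSON.stringify({ jobId }) };
};
```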
Designing our service to be entirely FaaS has proven to be a good call: our service continues to scale efficiently, run reliably, and remain easy to maintain.
API Gateway for WebSockets
🟥 Regret
Several years into building the application, we kept finding use cases that needed asynchronous requests running longer than API Gateway's 30-second HTTP timeout: processes whose duration we could not directly control, such as calling a third-party API or generating export files.
To support these requests, we added WebSockets via API Gateway. We had been using API Gateway for all our HTTP/REST API endpoints and thought it would be up to the challenge of supporting WebSockets. API Gateway implements WebSockets in a Request → Response format, similar to a typical HTTP request. When you want to push a message to an open WebSocket connection, though, API Gateway falls short.
For simple request → response uses of WebSockets, API Gateway can kick off a Lambda, and you can respond much like you would to a regular HTTP request. This works well when the response only goes to the user who made the request. Of course, WebSockets can be used for so much more. We wanted to push updates to users’ browsers in realtime to support new features.
API Gateway maintains the WebSocket connections for you. To asynchronously send a message to an open connection, you use the API Gateway Management API, which requires one HTTP request for every WebSocket message you want to send. The API Gateway Management API also doesn’t support sending a message to a batch of connections, which makes for a very slow “realtime” messaging system when you have to notify a few thousand users of a single event.
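This is roughly what the fan-out looks like with the v3 SDK; where the connection IDs come from and the message shape are assumptions:

```ts
import {
  ApiGatewayManagementApiClient,
  PostToConnectionCommand,
} from "@aws-sdk/client-apigatewaymanagementapi";

// The endpoint is your WebSocket API's https URL, e.g.
// https://{api-id}.execute-api.{region}.amazonaws.com/{stage}
const client = new ApiGatewayManagementApiClient({
  endpoint: process.env.WS_API_ENDPOINT,
});

// One HTTPS request per connection; there is no batch API, so notifying
// thousands of users means thousands of individual calls.
async function broadcast(connectionIds: string[], message: object): Promise<void> {
  const data = Buffer.from(JSON.stringify(message));
  await Promise.allSettled(
    connectionIds.map((ConnectionId) =>
      client.send(new PostToConnectionCommand({ ConnectionId, Data: data }))
    )
  );
}
```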
WebSockets are great, and I will continue to utilize them in the future. However, I’d most likely self-host a WebSocket server using ws as an ECS Service behind an NLB.
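For contrast, here is a minimal self-hosted sketch with ws, where broadcasting is an in-memory loop rather than an HTTP call per connection:

```ts
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

// Every open connection lives in this process, so a broadcast is just a loop.
function broadcast(message: object): void {
  const data = JSON.stringify(message);
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) client.send(data);
  }
}

wss.on("connection", (socket) => {
  socket.on("message", (raw) => {
    // Echo-style handler; real routing and auth are omitted for brevity.
    broadcast({ received: raw.toString() });
  });
});
```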
Elastic Container Service
🟩 Endorse
Our serverless application eventually needed longer-running services with more intensive processing. To support these processes, we looked at ECS. While Kubernetes is spectacular for deploying services, we wanted a simple solution for running jobs; the obvious choice for us was ECS.
We started running ECS on Fargate, which allowed us to scale easily and get an idea of our usage as we continued to deploy more functionality. Eventually, we moved to using EC2-backed ECS clusters, which offered several benefits for us.
One thing we did have to build was a custom solution for long-running ECS tasks: tasks that fail to shut down properly can hang indefinitely. We introduced a scheduled Lambda that looks for stalled tasks, kills them, and raises an alarm.
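A sketch of that watchdog (the cluster name and stall threshold are illustrative, and the alarm side is left out):

```ts
import {
  ECSClient,
  ListTasksCommand,
  DescribeTasksCommand,
  StopTaskCommand,
} from "@aws-sdk/client-ecs";

const ecs = new ECSClient({});
const MAX_AGE_MS = 6 * 60 * 60 * 1000; // anything running over 6 hours is "stalled"

// Runs on a schedule (e.g., EventBridge every 15 minutes).
export async function handler(): Promise<void> {
  const { taskArns = [] } = await ecs.send(
    new ListTasksCommand({ cluster: "jobs", desiredStatus: "RUNNING" })
  );
  if (taskArns.length === 0) return;

  const { tasks = [] } = await ecs.send(
    new DescribeTasksCommand({ cluster: "jobs", tasks: taskArns })
  );
  for (const task of tasks) {
    const age = Date.now() - (task.startedAt?.getTime() ?? Date.now());
    if (age > MAX_AGE_MS) {
      await ecs.send(
        new StopTaskCommand({ cluster: "jobs", task: task.taskArn!, reason: "Stalled task reaper" })
      );
      // ...then emit a CloudWatch metric here so an alarm fires.
    }
  }
}
```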
ECS made task-based Docker containers incredibly easy to deploy, manage, and scale.
A single AWS Account
🟥 Regret
We remained on one AWS account for far too long. We always had strong IaC, but even with the ability to automate your infrastructure, it was easier to simply keep a shared AWS account for our various test, QA, and production environments.
I now strongly believe the benefits of isolating environments into separate AWS accounts (tighter security boundaries, simpler permissions, cleaner cost attribution) are too great to pass up.
Systems Manager
🟩 Endorse
AWS Systems Manager allowed us, with incredible ease, to enable remote management of and access to our servers. While our main API was entirely serverless, we eventually began to utilize ECS backed by an EC2 cluster. Systems Manager allows secure access to those servers, which made debugging significantly easier when an ECS Task failed.
CloudFormation
🟥 Regret
At first, we utilized Serverless to deploy all our cloud resources, and under the hood it used CloudFormation. Eventually, we ran into issues with the way Serverless handled deployments (occasionally it decides it needs to destroy a stack and recreate it), so we had to move our persistent resources to an independent CloudFormation stack. We never really looked into better options, and I have regretted it many times since.
When we ported our resources out of our serverless.yml file, I found a library called cloudform that allowed us to build resources with strict typing. This was before tools like CDK were available, but Terraform was very much an option. We continue to live with the restrictions imposed by CloudFormation.
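For flavor, here's roughly what a persistent resource looked like in cloudform's typed style; this is a sketch based on my reading of the library's docs, and the table definition is illustrative:

```ts
import cloudform, { DynamoDB, Fn } from "cloudform";

// Persistent resources live in their own stack, safely out of reach of
// Serverless's occasional urge to destroy and recreate things.
export default cloudform({
  Description: "Persistent resources (managed outside Serverless)",
  Resources: {
    PatientsTable: new DynamoDB.Table({
      TableName: "patients",
      BillingMode: "PAY_PER_REQUEST",
      AttributeDefinitions: [{ AttributeName: "id", AttributeType: "S" }],
      KeySchema: [{ AttributeName: "id", KeyType: "HASH" }],
    }),
  },
  Outputs: {
    PatientsTableArn: { Value: Fn.GetAtt("PatientsTable", "Arn") },
  },
});
```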
CloudFormation is an unforgiving black box. I am not the only one to realize this. With limited tools for testing or validating deployments locally, you often rely heavily on deploying to a staging environment to test changes. Make a mistake though and your stack will have to roll back, a fun process that can take anywhere between five minutes and five hours with very little rationale as to why.
Today, I would utilize Pulumi, which builds on Terraform's provider ecosystem. I find Terraform itself a tedious endeavor; Pulumi, however, provides useful constructs for managing resources and intelligent defaults you do not get with CloudFormation or native Terraform. sst.dev v3 is built on top of Pulumi and provides even more developer-friendly constructs, making it a great choice for managing applications.
Frameworks
Angular
🟩 Endorse
I picked Angular over the alternatives early on for much the same reason I picked AWS: I knew Angular. Seven years ago, Angular and React had similar popularity, and there wasn’t an obvious choice like there is today. React was a great utility; however, it didn’t come packaged with all the tooling and support Angular did, making React harder to build with when starting from scratch. React felt better suited to large enterprises that could put a lot of manpower into building applications (e.g., Facebook), while Angular felt better suited to smaller teams. That made my choice easier: Angular came out of the box with great tooling for testing, building, deploying, routing, and internationalization, so I stuck with Angular.
Angular proved to be a powerful tool and has gotten better with each subsequent release. I believe it is still a powerful framework and incredibly easy to get started with. While I would very much like to say I’d start my next project with Angular, the industry has changed. Angular has fallen in popularity and has seen many of its third-party extensions stagnate and lose support for newer versions. In the past seven years I’ve seen the rise and fall of Vue.js, and today Next.js has come to dominate. Next.js provides similar tooling to what Angular comes packaged with, and Next.js and React enjoy widespread support.
Material
🟧 Regret-ish
I am willing to admit that my design skillset is limited. I opted for Material because it worked well with Angular and looked fine to me. Material was a great choice for our patient-facing applications; it is familiar to users, allowing them to flow through the experiences smoothly. However, Material is a mobile-first responsive design system, which does not work well for our clinician-facing consoles, which are used on desktops and deserve an optimized experience as well.
I would likely have regretted using multiple UI libraries for different applications; that would have complicated the experience for developers working across several Angular applications. Instead, I’d look for a UI framework that adapts well to both desktop and mobile.
Serverless
🟥 Regret
Serverless is a utility to help build entirely serverless applications and deploy them to AWS Lambda. It has expanded significantly over the past seven years; however, almost everything it does has restrictions we've had to work around. This is partially due to the limitations of the technology Serverless relies on, specifically CloudFormation. Serverless attempts to help you define your API Gateway routes and connect them to Lambdas; it then tries to package your code into zip archives and manage deployments and updates for you.
Serverless doesn't offer official local development support; a popular third-party plugin, serverless-offline, provides most of the essential functionality and became vital to our development process. It lacks file-watch support, which we implemented as a gulp task. Early on, our watch process worked well, but at some point updates to Serverless affected serverless-offline, and our watch process had to restart the serverless-offline process with every file change, significantly slowing the development experience.
Serverless doesn't use esbuild or webpack to build your Lambdas. It zips up your entire project and tries to determine the appropriate dependencies to include, doing so per function. This was too slow, and we eventually had to build our own packaging system to generate the zips for each function, optimizing build times and reducing deployment package sizes.
Serverless doesn't deploy custom resources well. To reduce the risk of Serverless attempting to destroy persistent resources (i.e., our database) we created our own CloudFormation templates and handled the deployments of custom resources entirely outside the purview of Serverless.
Today, I would utilize sst.dev for developing and deploying Lambdas. It provides a practical solution for development, it packages functions with esbuild, and it manages resources with Pulumi.
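Roughly, an sst v3 config looks like the following; this is a sketch based on sst's documented style, not code we ran:

```ts
/// <reference path="./.sst/platform/config.d.ts" />

export default $config({
  app(input) {
    return { name: "my-app", home: "aws" };
  },
  async run() {
    // A Lambda with a function URL; sst packages it with esbuild and
    // provisions the underlying resources through Pulumi.
    new sst.aws.Function("Api", {
      handler: "src/api.handler",
      url: true,
    });
  },
});
```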
TypeScript for our APIs
🟩 Endorse
When starting, I was a strong JavaScript and Python developer, having utilized both for years working on machine learning R&D contracts for the Marines and Navy. I thought hard about what to use for our APIs and decided to run them all on Node.js, written in TypeScript. My biggest reason was that I knew this would be a frontend-heavy service; we'd be designing patient portals and wanted superb patient experiences, so the first engineers I'd be hiring would have to be strong frontend developers. I wanted to ensure anyone I hired who had only frontend experience could still work as a full-stack engineer on the team.

I was not significantly worried about application performance. I planned to design the application to be serverless and knew that even if Node.js was slower than Python, it wouldn't make the application feel slow. TypeScript gave us the same language for the frontend and backend of our application.

My theory held true: several engineers who came in with heavy frontend experience found it easy to work on the full application stack, in large part because they were already extremely familiar with the language. While I still love Python, the benefits for a smaller team of empowering everyone to work as a full-stack engineer are significant. TypeScript has grown in popularity and is now significantly more popular than vanilla JavaScript, and JavaScript performance keeps improving with runtimes like Deno and Bun, which approach or beat Python performance in many use cases.
There is significant value in a less complicated stack that empowers more engineers to grow into efficient full-stack contributors.
Process
Prioritizing building the product
🟩 Endorse
Many of our decisions, both technical and business, focused on delivering value to our customers and onboarding new ones. Time spent on infrastructure maintenance and technical debt was a waste of resources. Many decisions, such as utilizing TypeScript and going for an entirely serverless application architecture, were made to reduce the time I’d spend onboarding new hires and managing infrastructure. To this day, most of the decisions where I did not prioritize delivering value, such as using Gitlab over GitHub, are ones I reject. In a bootstrapped startup, you have very limited resources, and the most limited of all is your time.
My suggestion is to make decisions that will drive team efficiency.
Monorepo
🟩 Endorse
Early on, we used a monorepo; eventually, we decided to split the projects into independent repositories. The added complexity of managing releases and end-to-end testing, plus the additional developer overhead, was not worth it. After several years, we moved back to a monorepo using Nx. Today, I would use Turborepo over Nx; I prefer how it manages project dependencies independently.
A RESTful API
🟧 Regret-ish
As our Angular applications needed data, we needed an API. Since we were using Lambdas behind API Gateway, the obvious option was a REST API. REST APIs are wonderful, but it is difficult to create endpoints that can fully bootstrap a user’s session; as a result, our applications often ended up making too many requests on startup. There are times for REST APIs, and there are times for alternatives.
Today, I would utilize GraphQL where appropriate, alongside REST APIs, to allow for more flexibility.
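The appeal is bootstrapping a session in one round trip instead of several REST calls. A hypothetical schema makes the point:

```graphql
# One request replaces separate calls to /me, /organization, /notifications.
# The schema here is illustrative, not our actual API.
query BootstrapSession {
  me {
    id
    name
    role
  }
  organization {
    id
    settings {
      locale
      timezone
    }
  }
  unreadNotificationCount
}
```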
Kanban
🟩 Endorse
Early on, we briefly tried sprint planning meetings; I came from a team of a dozen engineers and fell into the same routine. Sprints didn’t work well for us when we were one or two engineers, and even as we grew to a whopping four engineers, sprint planning would still have been a lot of overhead for little value.
In a fast-paced startup, where client support, sales demos, and feature development compete for attention, priorities shift constantly. Unlike Scrum, which requires sprint planning, Kanban lets us adapt instantly, ensuring we could respond to changing needs without the overhead of excessive planning.
Kanban was the right approach, and I have come to believe it a superior system for smaller teams. Maintain priorities diligently so you always know what to do next, and stay focused on your current assignment if at all possible. My primary goal when using Kanban is to minimize the times you need to pull engineers out of work they’ve begun, and to ensure they always know what they should pick up next. Continue to run retrospectives and incorporate feedback from the team into your process and product.
Cost tracking and resource budgets
🟥 Regret
Early on, our costs were minimal. We designed an entirely serverless application, and production costs remained low for the first two years. Then, as we rapidly expanded, our costs suddenly spiked to $6,000 a month, quickly eating into our limited financial resources—something that could have been easily avoided. At one point, our biggest expense was the SSM Parameter Store, due to a poor implementation of how we loaded parameters. We should have been using AWS Secrets Manager instead.
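The likely culprit in cases like ours is fetching parameters on every request. A hedged sketch of the cheap alternative, caching per Lambda container (the path and config shape are illustrative):

```ts
import { SSMClient, GetParametersByPathCommand } from "@aws-sdk/client-ssm";

const ssm = new SSMClient({});
let cached: Record<string, string> | undefined;

// Fetch once per container (cold start), not once per request.
// Pagination via NextToken is omitted for brevity.
export async function getConfig(): Promise<Record<string, string>> {
  if (cached) return cached;
  const { Parameters = [] } = await ssm.send(
    new GetParametersByPathCommand({ Path: "/app/production/", WithDecryption: true })
  );
  cached = Object.fromEntries(Parameters.map((p) => [p.Name!, p.Value!]));
  return cached;
}
```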
Later, adding a NAT within our VPC escalated costs again, as we hadn’t implemented the proper VPC Endpoints.
We also incurred massive, unnecessary costs due to a poorly configured AWS Backup plan, which backed up our entire DynamoDB database daily—even though we had PITR enabled and our only tested recovery process relied on DynamoDB’s native restoration capabilities rather than AWS Backup.
The key lessons here:
- Implement a monthly budget review process for cloud expenses.
- Investigate unexpected cost increases early.
- Set up and use AWS Budgets to track spending and prevent surprises.
- Always analyze new infrastructure costs before scaling.
Continuous Delivery
🟩 Endorse
We invested early in a fully automated CI/CD. A good CI/CD ensures deployments are consistently reliable. We hosted the CI/CD on Gitlab and eventually refactored it to GitHub Actions when we migrated to GitHub. Our strategy was simple: deploy early, deploy often.
We wanted to deploy features the moment they were partially visible, usable, or functional. This allowed us to incorporate feedback as we built. Due to limited development resources, we usually delivered minimal viable features and moved on to other priorities for a time, returning to enhance features after gathering additional feedback from several clients.
Feedback is a vital part of a good SDLC. Clients typically like being part of the process.
Life without QA
🟧 Endorse-ish
We didn’t have a QA analyst for the first five years. We maintained continuous delivery, regularly with numerous deployments to production a day. While we managed to produce high-quality work and rarely had outages, it was due to the diligence of every developer testing their own work. It was difficult and risky at times.
Unit testing and integration tests are essential; invest in automated testing early. Write an appropriate amount of test coverage. Set a high bar for developers to test their changes thoroughly. Review every pull request diligently. Hire a QA.
Project Management
🟩 Endorse
We didn’t have a Project Manager for the first six years. Yet we built a product that delivered exceptional value to our clients; I typically filled the project manager role myself. In my opinion, engineers make good project managers, but they rarely want to do the work.
Clients and Sales rarely ask for a feature without a purpose, but they typically ask for a solution without explaining the reasoning behind it. Often, the request is what they believe to be an “easy to implement” feature. A project manager with the appropriate knowledge of the technology can help determine the right approach. It is vital to understand the issue actually being solved by the request.
Good project managers should be technical. Engineers who want to build products that delight users need to understand the reason behind the request.
SaaS
Slack
🟩 Endorse
For development teams, in my opinion, Slack remains superior to Microsoft Teams. Slack has added many useful features over the past seven years and has continued to prioritize its integrations and ease of collaborative communication.
Jira
🟥 Regret
We used Jira for our development and support tickets. I have never liked it, and I didn’t like it going in; it was simply what I was familiar with. Jira remains bloated and costly software, providing little value over many alternatives on the market today.
I would likely try Linear for our development tickets. Although I’ve not used it, I am impressed by the feedback I’ve heard, the technical design, and the user experience they focus on delivering. Linear offers a more modern UI, faster performance, and a streamlined workflow compared to Jira.
For client support tickets, I would select an omnichannel support system. Clients would rather not file tickets in a ticketing system; they want their issues heard and replied to. You need to respond to clients to tell them you are looking into the issue, and then you need to actually follow up. Customers are everything to a SaaS, and through diligent customer service we grew our company. We used Jira and made it work, but it had significant overhead. A simpler tool that ensures all emails are received, issues reported during phone conversations are logged, and issues get assigned to the appropriate team (e.g., implementation or development) is vital.
Confluence
🟥 Regret
We used Confluence since we used Jira. Confluence, like Jira, is bloated. It provides a complex organization system when all you want is to keep information ready at your team’s fingertips. It is vital to have a place to store company information, but it needs to be incredibly easy to add documents and information to.
Gitlab
🟥 Regret
I opted for Gitlab at the start because it was cheaper than GitHub (this was before GitHub lowered their prices to match Gitlab) and because I could self-host it to improve security and privacy; I knew this application would be dealing with protected health information. The issue was managing Gitlab. We were a small startup; for several years we had only two engineers, and managing Gitlab servers and runners was more overhead than it was worth. When GitHub dropped their prices, I immediately said, "yes please," and we moved to GitHub. Overall, GitHub is great, its popularity is unquestionable, and GitHub Actions has grown into a CI/CD platform more powerful than most I've worked with.
Today, I would pick GitHub from the start.
Software
JavaScript Standard Style
🟩 Endorse
Early on, I added strict linters to our codebase. My goal was not to be pedantic, but to improve the long-term maintainability of the codebase. I had little opinion about the linter rules themselves, so I picked JavaScript Standard Style as our base set. The one change I made was to strictly require trailing commas; I have found git diffs are easier to read when they show only what is meaningfully changed.
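In ESLint terms, the setup amounts to something like this (a sketch; eslint-config-standard provides the base rules):

```js
// .eslintrc.cjs
module.exports = {
  extends: 'standard',
  rules: {
    // Standard forbids trailing commas; we require them on multiline
    // constructs so diffs show only the lines that meaningfully changed.
    'comma-dangle': ['error', 'always-multiline'],
  },
}
```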
I believe having appropriate tools in place to manage an ever-growing codebase is vital, doing it from day one will avoid the need to clean up the code in the future. Pick some rules and enforce them.
BabelEdit
🟩 Endorse
BabelEdit is a great internationalization tool and has worked well for us.
Dependabot
🟩 Endorse
Dependabot is a tool to manage your dependencies and help keep them up to date. Continuously updating dependencies is incredibly helpful; if they stagnate, upgrades become much harder to handle. Having an automated tool for this has become a must.
Snyk
🟩 Endorse
Security was always important to us; my background was in military contracts, and I had expertise in information security. Working with PHI, we needed to take security seriously from day one. Snyk was a great tool to help with that: it integrated into developer IDEs to help engineers avoid mistakes in the first place, integrated into our pull requests to catch mistakes before they were merged, and monitored our dependencies for vulnerabilities. We eventually added its infrastructure and container scanning tools. Put these kinds of tools in place early on.
Hardware
Apple MacBooks
🟩 Endorse
Apple MacBooks are a fantastic product and a great device for developers.