How we maintain our staging environment with rich data

Learn more about how we use automation to prepare our staging environment and improve the confidence of our engineers in the products they build.

Before diving into the orchestration, it helps to understand why having the right data in the staging environment matters to us.

Our test cases depend heavily on the routes our buses run, the stops we serve, and the times at which a bus arrives at and departs from each stop. Whenever we built a new feature and had to test a specific case tied to a particular timing and route, we had to rely on luck. 🙂

This meant we needed a staging environment with data about our latest routes and stops.

For that, we needed a way to keep our staging data up to date with our changing business requirements. Our previous approach was to take a manual snapshot of our production database and then remove the customer-sensitive data from it. This involved a lot of manual intervention and was repeated every 2-3 months.

We needed a convenient solution that is easy to customize and does not have so many moving parts that a simple change requires hundreds of edits. We also did not want to over-engineer what is a simple orchestration task.

We at Cityflo care a great deal about our customers. With a growing application user base, making sure our customers' data is not exposed to any unwanted party is a top priority. Automating this process means no manual intervention is needed, and customer data is masked by an automated task so that it is visible to no one but that particular customer.

Requirements

Cost-effective: The orchestration should keep costs in line with our needs.
Automated: As mentioned above, we wanted to automate the manual process so that no human intervention is needed.
Easy and Scalable: The solution should scale, but should not be overly complex or overkill for our current requirements.

AWS Services Used

RDS: Amazon Relational Database Service is a distributed relational database service by Amazon Web Services. It is a web service running "in the cloud" designed to simplify the setup, operation, and scaling of a relational database for use in applications.

EC2: Amazon Elastic Compute Cloud is a part of Amazon.com's cloud-computing platform, Amazon Web Services, that allows users to rent virtual computers on which to run their own computer applications.

Lambda: AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code.

EventBridge: Amazon EventBridge is a serverless event bus that makes it easier to build event-driven applications at scale using events generated from your applications, integrated Software-as-a-Service (SaaS) applications, and AWS services.

Route 53: Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service. It is designed to give developers and businesses an extremely reliable and cost-effective way to route end users to Internet applications by translating names like www.example.com into the numeric IP addresses like 192.0.2.1 that computers use to connect to each other.

The Process

Let's get down to business now:

“Multiple services, relatively large data all managed beautifully by one simple process”

EventBridge triggers a Lambda function that starts the EC2 instance.
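
The trigger step can be sketched as a small Lambda handler. This is a minimal sketch rather than the exact production function: the helper name start_worker and the STAGING_WORKER_INSTANCE_ID environment variable are hypothetical, and the EC2 client is passed in so the helper can be exercised locally with a stub.

```python
import os


def start_worker(ec2_client, instance_id):
    """Start the stopped EC2 instance and return its reported state."""
    response = ec2_client.start_instances(InstanceIds=[instance_id])
    # start_instances returns one entry per instance it was asked to start.
    return response["StartingInstances"][0]["CurrentState"]["Name"]


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; importing it here keeps
    # start_worker easy to try out locally with a stub client.
    import boto3

    # STAGING_WORKER_INSTANCE_ID is a hypothetical variable name.
    instance_id = os.environ["STAGING_WORKER_INSTANCE_ID"]
    state = start_worker(boto3.client("ec2"), instance_id)
    return {"instance_id": instance_id, "state": state}
```

The handler only asks EC2 to start the instance and returns straight away, so it stays well within Lambda's limits; the long-running work happens on the instance itself.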

Once the EC2 instance is up and running, it takes a snapshot of our production database and restores it, so that we have a fresh, production-like database to start with.
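
A snapshot-and-restore step of this kind could look roughly like the sketch below. It assumes the database is an RDS instance and that a manual snapshot is taken through boto3's RDS client; the function name and all identifiers passed to it are hypothetical.

```python
def refresh_from_production(rds_client, source_db, snapshot_id, target_db):
    """Snapshot source_db, then restore the snapshot as a new instance."""
    rds_client.create_db_snapshot(
        DBInstanceIdentifier=source_db,
        DBSnapshotIdentifier=snapshot_id,
    )
    # Block until the snapshot is usable; large databases can take a while.
    rds_client.get_waiter("db_snapshot_available").wait(
        DBSnapshotIdentifier=snapshot_id
    )
    rds_client.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=target_db,
        DBSnapshotIdentifier=snapshot_id,
    )
    # Block until the restored instance reports as available.
    rds_client.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=target_db
    )
    return target_db
```

On the instance this would be called with a real client, for example refresh_from_production(boto3.client("rds"), ...). Both waits can take many minutes on a large database, which is one more reason to run the task on a machine without a hard execution limit.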

Wait a minute, you must be wondering: why not use Lambda for the whole task?

Good question. We considered it, but for the following reasons we decided to go with an EC2 instance instead.

A Lambda function cannot run for more than 15 minutes, and a synchronous invocation payload is capped at 6 MB.
Django does not enforce some model field options at the database level (the on_delete behaviour of a foreign key, for example); it emulates them itself. That makes it difficult to do the job with a few quick raw SQL queries!
ref: https://docs.djangoproject.com/en/4.1/ref/models/fields/#arguments

Hence we decided to run these tasks on an EC2 instance. We use multi-threading to keep the task fast.
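
To illustrate the idea of masking in parallel, here is a minimal, self-contained sketch that masks rows in batches on a thread pool. The field names (name, email, phone), the hashing scheme, and the batch and worker counts are all assumptions for illustration; a real task would read from and write to the database inside each batch.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def mask_customer(row):
    """Return a copy of row with personal fields replaced by stable fakes."""
    token = hashlib.sha256(str(row["id"]).encode()).hexdigest()[:10]
    masked = dict(row)
    masked["name"] = f"user-{token}"
    masked["email"] = f"{token}@example.com"
    masked["phone"] = "0000000000"
    return masked


def chunked(rows, size):
    """Split rows into consecutive batches of at most size items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def mask_all(rows, batch_size=500, workers=8):
    """Mask every row, processing batches on a pool of worker threads."""
    def mask_batch(batch):
        # In a real task this is where each thread would write to the DB.
        return [mask_customer(row) for row in batch]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so row order is preserved.
        batches = pool.map(mask_batch, chunked(rows, batch_size))
        return [row for batch in batches for row in batch]
```

Threads pay off here when each batch spends most of its time waiting on the database, which is why the worker count has to be tuned against the instance size.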

To strike the right balance between server specifications and the number of threads, we had to run this task and monitor it a few times. (As if your code ever gets merged into master without review comments.)

Once the task has masked all the customer data, it points our staging domain at the new database using Route 53 and deletes the old database.
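
The cut-over can be sketched as a DNS record update followed by deleting the old instance. This is a sketch under assumptions: it presumes the staging database is reached through a CNAME record, and the function names, the 60-second TTL, and the choice to skip the final snapshot are illustrative rather than confirmed details.

```python
def build_cname_upsert(record_name, target_endpoint, ttl=60):
    """Build a Route 53 change batch pointing record_name at target_endpoint."""
    return {
        "Comment": "Point staging at the freshly masked database",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target_endpoint}],
                },
            }
        ],
    }


def cut_over(route53_client, rds_client, zone_id, record_name, new_endpoint, old_db):
    """Repoint DNS at the new database, then delete the old instance."""
    route53_client.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch=build_cname_upsert(record_name, new_endpoint),
    )
    # Assumption: staging data is disposable, so no final snapshot is kept.
    rds_client.delete_db_instance(
        DBInstanceIdentifier=old_db,
        SkipFinalSnapshot=True,
    )
```

Updating the record before deleting the old instance means staging services always have a database to resolve to; a short TTL keeps the switch quick to take effect.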

With that done, the instance shuts itself down and waits peacefully for its next boot. We run this task every week, during a period when our production database is not under peak load.
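
The weekly schedule and the self-shutdown could be expressed along these lines. The rule name, the day and hour, and the helper names are hypothetical; EventBridge cron expressions are evaluated in UTC, and attaching the Lambda function as the rule's target is left out of this sketch.

```python
def weekly_cron(day_of_week, hour_utc, minute=0):
    """Build an EventBridge cron expression for a weekly run (times in UTC)."""
    # Field order: minutes hours day-of-month month day-of-week year.
    # Day-of-month is "?" because day-of-week is specified.
    return f"cron({minute} {hour_utc} ? * {day_of_week} *)"


def schedule_weekly_refresh(events_client, rule_name, day_of_week, hour_utc):
    """Create or update the rule that triggers the weekly refresh."""
    return events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=weekly_cron(day_of_week, hour_utc),
        State="ENABLED",
    )


def stop_worker(ec2_client, instance_id):
    """Stop (not terminate) the instance so it can be started again next week."""
    response = ec2_client.stop_instances(InstanceIds=[instance_id])
    return response["StoppingInstances"][0]["CurrentState"]["Name"]
```

Stopping rather than terminating the instance keeps its disk and configuration in place, so the next scheduled run only has to start it again.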

Conclusion

With this orchestration, our team finds it much easier to develop and test features in an environment that closely resembles production.