- OS: Ubuntu
- RDS (MySQL) used: 5.7
- Requirements:
- AWS account with RDS MySQL instance
- GitLab account
- Terraform installation
- Docker installation
- Vault installation
Introduction:
The General Data Protection Regulation (GDPR) is a European regulation that sets uniform rules for the processing of personal data throughout the European Union. It has applied since May 25, 2018.

It was designed around 3 objectives:
- strengthen individuals’ rights
- make data processors more accountable
- enhance the credibility of regulation through closer cooperation between data protection authorities.
Process:
To summarize a classic need: testers want to work with Production data, but in an environment they have access to and which is less critical, namely Staging. However, if we want to comply with the GDPR, we need to anonymize this data before exporting it to another environment.
To do this, we want to create a backup of a database in the Production environment, anonymize this backup and then restore it to a database in the Staging environment.
To ensure that the backup is carried out without any impact on our employees, we’re going to launch it at 4:00 a.m. with a CloudWatch event.
Here’s a diagram illustrating the anonymization process:

Technical Part:
When I declare my “aws” provider, I specify the role to assume, which gives me access to the AWS account in question:
Then I create my Lambda function, which runs my Python code. Its environment variables are set from data sources that retrieve 2 variables from Vault: GITLAB_PIPELINE_TOKEN and GITLAB_ANONYMIZE_PIPELINE_URL.
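To illustrate how the Lambda code might consume these two variables, here is a minimal sketch. It assumes that GITLAB_ANONYMIZE_PIPELINE_URL points to a GitLab pipeline trigger endpoint, which accepts the trigger token and a target ref as form fields. The function names and the default ref `main` are hypothetical and not taken from the original script.

```python
import os
import urllib.parse
import urllib.request


def build_pipeline_request(ref="main"):
    """Build (but do not send) the POST request that triggers the GitLab pipeline."""
    # Both values are injected into the Lambda environment by Terraform.
    token = os.environ["GITLAB_PIPELINE_TOKEN"]
    url = os.environ["GITLAB_ANONYMIZE_PIPELINE_URL"]

    # Assumption: the URL is a pipeline trigger endpoint that expects
    # the trigger token and the branch or tag to run as form fields.
    body = urllib.parse.urlencode({"token": token, "ref": ref}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def trigger_pipeline(ref="main", timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_pipeline_request(ref), timeout=timeout) as resp:
        return resp.status
```

Building the request separately from sending it keeps the token handling easy to check locally without calling GitLab.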
I then configure my 3 CloudWatch events, each of which triggers a specific action. Here’s an example, in 3 parts, of how to configure an event:
Above, in order:
- I create an event that runs every day at 4:00 a.m.
- I authorize this event to invoke my Lambda function rds_snapshot
- I create my event_target, which targets my Lambda function rds_snapshot and passes it action and lang as arguments.
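On the Lambda side, the arguments passed by the event target arrive in the `event` dictionary. A minimal sketch of how a handler could dispatch on `action` follows. The action names (`snapshot`, `restore`, `anonymize`) and the placeholder steps are assumptions made for illustration. The source does not say how `lang` is used, so it is simply left in the event for the step to read.

```python
def _placeholder(name):
    """Return a stand-in step; the real one would call RDS or GitLab."""
    def step(event):
        return f"{name} step called"
    return step


# Hypothetical action names, one per CloudWatch event.
ACTIONS = {name: _placeholder(name) for name in ("snapshot", "restore", "anonymize")}


def lambda_handler(event, context, actions=None):
    """Run the step named by event["action"], rejecting unknown actions."""
    actions = ACTIONS if actions is None else actions
    action = event.get("action")
    if action not in actions:
        raise ValueError(f"unsupported action: {action!r}")
    # The whole event is forwarded so a step can also read "lang".
    return {"action": action, "result": actions[action](event)}
```

Rejecting unknown actions early means a misconfigured event target fails visibly in the Lambda logs instead of doing nothing.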
As you can see, this set of 3 Terraform resources is repeated 3 times, giving 3 events:
- A schedule-based (cron) event for snapshot creation (example above)
- An event that fires when the snapshot becomes available, to invoke the restoration of the snapshot (as an RDS instance) via my Python script.
- A final event that fires at the end of the restoration, to send a POST request to GitLab, which anonymizes the restored instance, dumps it and restores the dump into the Staging database.
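The first two steps of this chain can be sketched with the boto3 RDS client calls `create_db_snapshot` and `restore_db_instance_from_db_snapshot`. This is an illustrative sketch, not the original script: the function names, the identifier format and the instance names are assumptions. The client is passed in as a parameter so the logic can be exercised without AWS access.

```python
from datetime import datetime, timezone


def create_snapshot(rds, instance_id, now=None):
    """Start a manual snapshot of the Production instance and return its identifier."""
    now = now or datetime.now(timezone.utc)
    # Snapshot identifiers must be unique, so a timestamp is appended.
    snapshot_id = f"{instance_id}-{now:%Y-%m-%d-%H-%M}"
    rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
    )
    return snapshot_id


def restore_snapshot(rds, snapshot_id, target_instance_id):
    """Restore the snapshot as a new RDS instance, to be anonymized afterwards."""
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=target_instance_id,
        DBSnapshotIdentifier=snapshot_id,
    )
    return target_instance_id
```

In the Lambda function, `rds` would typically be `boto3.client("rds")`. Both calls return before the operation finishes, which is why the next step is driven by an event and not by waiting inside the function.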
Here’s a diagram explaining how to automate the RDS Snapshot backup:
