Data Warehouse and BigQuery
In the second module of Data Engineering Zoomcamp, I learned about Kestra, an open-source workflow orchestrator, for automating and provisioning data engineering tasks. Today, data engineers inevitably work with cloud platforms such as Google Cloud Platform (GCP), Microsoft Azure, and Amazon Web Services (AWS). That means we need to pass our credentials to Kestra, or to any other workflow orchestrator, to streamline our work. I use GitHub for version control and write code in its Codespaces, and I don't think putting my credentials in a public repository is a good idea.
I followed Manage Secrets in Kestra | How-to Guide from Kestra to keep my credentials secret and use them as variables in my workflows. Since I use the open-source version, I cannot add my credentials as secrets directly via the Kestra UI. Instead, I store my credentials in a .env file, convert them to Base64, save the result in a .env_encoded file, and make sure that both .env and .env_encoded are listed in .gitignore to prevent leaks. However, with this approach I have to re-add my credentials every time I re-open the Codespace, which gets annoying 😅. So I ended up storing the credentials as a secret in my GitHub Codespaces settings.
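For reference, the .env to .env_encoded conversion described above could look roughly like the sketch below. It is not copied from the Kestra guide: the function name encode_env is made up for illustration, and it assumes every line in .env is a simple single-line KEY=VALUE pair.

```shell
#!/bin/bash
# encode_env: hypothetical helper (the name is an assumption, not part of Kestra).
# Reads KEY=VALUE lines from the file in $1 and writes
# SECRET_KEY=<base64 of VALUE> lines to the file in $2.
encode_env() {
  local src="$1" dst="$2" key value
  : > "$dst"   # start with an empty output file
  while IFS='=' read -r key value || [ -n "$key" ]; do
    # Skip blank lines and comment lines
    case "$key" in ''|\#*) continue ;; esac
    # "tr -d '\n'" keeps the Base64 output on one line without relying on GNU-only flags
    printf 'SECRET_%s=%s\n' "$key" "$(printf '%s' "$value" | base64 | tr -d '\n')" >> "$dst"
  done < "$src"
}
```

Calling it as encode_env .env .env_encoded would produce the file that docker-compose later loads. Both files still need to stay in .gitignore.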
{
"name": "Sam",
"city": "Bangkok"
}
{"name":"Sam","city":"Bangkok"}
Once the secret is added, go to the working directory in the Codespace and create a new file, let's say setup_secrets.sh. Open that file and put the following code into it.
#!/bin/bash
# setup_secrets.sh
# 1. Create the .env_encoded file (which Kestra will read)
# We take the GitHub Secret, Base64 encode it, and prefix it with SECRET_
echo "SECRET_GCP_CREDS=$(echo -n "$GCP_SERVICE_ACCOUNT_JSON" | base64 -w 0)" > .env_encoded
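To check that the encoded value really decodes back to the original JSON, something like the following sketch may help. The helper name decode_secret is hypothetical, and the sketch assumes the file contains lines of the form NAME=<base64>, as written by the script above.

```shell
#!/bin/bash
# decode_secret: hypothetical helper for inspecting the generated file.
# Prints the decoded value of the variable named in $2 (e.g. SECRET_GCP_CREDS)
# found in the file given as $1.
decode_secret() {
  local file="$1" name="$2" encoded
  # Keep everything after the first "=", because Base64 padding can contain "=" too
  encoded=$(grep "^${name}=" "$file" | head -n 1 | cut -d= -f2-)
  printf '%s' "$encoded" | base64 -d
}
```

Running decode_secret .env_encoded SECRET_GCP_CREDS should print the same JSON that was stored in the Codespaces secret. Avoid doing this in a shared terminal or a recorded session, since it prints the credential in plain text.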
Then update docker-compose.yaml, following what Will said in the YouTube video.
services:
  kestra:
    image: kestra/kestra:latest
    env_file:
      - .env_encoded # Kestra reads the Base64 secrets from here
    environment:
      KESTRA_CONFIGURATION: |
        kestra:
          secret:
            type: ENVIRONMENT
Lastly, we want this .env_encoded file to be regenerated every time we re-open our Codespace. Add the following line to .devcontainer.json:
"postStartCommand": "bash setup_secrets.sh"
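Because postStartCommand runs the script on every start, it may be worth making the script fail loudly when the Codespaces secret is missing, instead of silently writing an empty SECRET_GCP_CREDS. Here is a minimal sketch of such a variant. The function name write_gcp_secret is an assumption for illustration, and the variable name GCP_SERVICE_ACCOUNT_JSON is the one used in setup_secrets.sh.

```shell
#!/bin/bash
# write_gcp_secret: hypothetical defensive variant of setup_secrets.sh.
# Writes SECRET_GCP_CREDS to the file given as $1 (default: .env_encoded)
# and refuses to write anything if GCP_SERVICE_ACCOUNT_JSON is empty or unset.
write_gcp_secret() {
  local out="${1:-.env_encoded}"
  if [ -z "${GCP_SERVICE_ACCOUNT_JSON:-}" ]; then
    echo "GCP_SERVICE_ACCOUNT_JSON is not set; not writing $out" >&2
    return 1
  fi
  # Base64-encode the secret on a single line and prefix it with SECRET_
  printf 'SECRET_GCP_CREDS=%s\n' \
    "$(printf '%s' "$GCP_SERVICE_ACCOUNT_JSON" | base64 | tr -d '\n')" > "$out"
}
```

With this in setup_secrets.sh, a final line that calls write_gcp_secret would keep the behaviour of the original script when the secret exists, and return a non-zero status otherwise.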
Now we can use the credentials as a secret environment variable in Kestra without re-creating .env and .env_encoded every time we re-open the Codespace.