Posts

Showing posts from January, 2026

Data Warehouse and BigQuery

A data warehouse is a repository where data from multiple sources is stored in a structured way: raw data, metadata, and summary data. The data serves a range of users, from data analysts and data scientists to business analysts, and it may need to be cleaned to ensure quality before use. Subsets of a data warehouse aggregated for a specific need (e.g., marketing, sales, inventory) are called data marts.
What is Google BigQuery?
Google BigQuery is part of Google Cloud Platform (GCP). It is "the autonomous data to AI platform, automating the entire data life cycle, from ingestion to AI-driven insights, so you can go from data to AI to action faster" [1]. BigQuery is serverless, meaning users don't have to manage infrastructure. It separates storage from compute, scales automatically to handle large datasets, and runs complex, petabyte-scale queries quickly. BigQuery s...
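To make the query side concrete, here is a minimal sketch of my own (not from the post) that runs SQL with the google-cloud-bigquery Python client against a public dataset; it assumes the library is installed and GCP credentials are already configured.

    # Minimal sketch: run a SQL query with the BigQuery Python client.
    # Assumes `pip install google-cloud-bigquery` and that GCP credentials
    # are already set up (e.g., via GOOGLE_APPLICATION_CREDENTIALS).
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up the default project and credentials

    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE state = 'TX'
        GROUP BY name
        ORDER BY total DESC
        LIMIT 5
    """

    # BigQuery runs the query on its own compute; we only receive the result rows.
    for row in client.query(query).result():
        print(row.name, row.total)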

Using Secrets as Environment Variables in GitHub Codespaces

My Problem with Secret Management
In the second module of the Data Engineering Zoomcamp, I learned about Kestra, an open-source workflow orchestrator, to automate and provision data engineering tasks. Today it is inevitable for data engineers to work with cloud platforms like Google Cloud Platform (GCP), Microsoft Azure, Amazon Web Services (AWS), etc. Thus, we need to pass our credentials into Kestra, as into any other workflow orchestrator, to streamline our work. I use GitHub for version control and code via its Codespaces, and putting my credentials in a public repository is not a good decision. I followed Manage Secrets in Kestra | How-to Guide from Kestra to keep my credentials secret and use them as variables in my workflows. Since I use the open-source version, I cannot add my credentials as secrets directly via the Kestra UI. Instead, I opted for another method: store my credentials in a .env file, then convert them to base-64 an...
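As a rough sketch of the approach described here (the exact file names and script are my assumption, not the post's), the snippet below reads KEY=VALUE pairs from a local .env file and writes base64-encoded copies prefixed with SECRET_, the form the open-source Kestra edition reads from environment variables.

    # Minimal sketch: base64-encode secrets from .env for open-source Kestra.
    # Kestra (open source) picks up environment variables named SECRET_<NAME>
    # whose values are base64-encoded, usable in flows as {{ secret('NAME') }}.
    import base64
    from pathlib import Path

    src = Path(".env")           # assumed input file with KEY=VALUE lines
    dst = Path(".env_encoded")   # assumed output file to pass to the Kestra container

    lines = []
    for raw in src.read_text().splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = raw.partition("=")
        encoded = base64.b64encode(value.strip().encode()).decode()
        lines.append(f"SECRET_{key.strip()}={encoded}")

    dst.write_text("\n".join(lines) + "\n")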

Docker Basics for Data Engineering

On this page: What is Docker · Docker Volume · Multiple Containers
Hello there! I have started the Data Engineering Zoomcamp 2026 cohort led by Alexey Grigorev. I want to share what I am learning as I go; my goal is to explain how things work and why we use them. The code for each learning module is in my GitHub.
Knowing what to do as a data engineer
A data engineer builds and maintains what is called a "data pipeline": a program that ingests data from sources, cleans and transforms it, and puts it into a database. The goal is to make sure the pipeline works, whatever it takes. Now imagine that you build a pipeline using Python 3.11 and the production server is on 3.8: your pipeline cannot run on the server. What we need is software that lets us run the same versions and dependencies on every machine. That is what Docker does.
What is Docker
Docker is a containerization software; it isolates an environment from the ...
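The post works with the Docker CLI; purely to illustrate the version-pinning idea in Python, here is a minimal sketch using the Docker SDK for Python (my own choice, not the post's method) that runs a command inside a python:3.11 container regardless of which interpreter the host has installed.

    # Minimal sketch using the Docker SDK for Python (pip install docker).
    # The container, not the host machine, decides which Python version runs.
    import docker

    client = docker.from_env()  # connects to the local Docker daemon

    # Run a throwaway container pinned to Python 3.11 and capture its output.
    output = client.containers.run(
        "python:3.11",
        ["python", "--version"],
        remove=True,  # clean up the container when it exits
    )
    print(output.decode().strip())  # e.g. "Python 3.11.x", whatever the host has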
