ETL Pipeline: Automating COVID Data Loads from Google Cloud Storage to BigQuery Using Airflow
Retry logic and logging
Modularized Python code
Secure credentials via an Airflow connection
Designed for scale and future scheduling (an illustrative DAG sketch covering retries, logging, and the GCP connection follows below)
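To make the wiring concrete, here is a minimal DAG sketch, assuming Airflow 2.x with the Google provider installed. The DAG id `covid_etl_daily`, the schedule, the retry settings, and the callable name `run_covid_etl` are illustrative placeholders rather than the project's actual values, and GCP credentials are assumed to come from an Airflow connection (or application-default credentials on the worker) instead of being hard-coded.

```python
# Illustrative DAG wiring -- names and settings are placeholders, not the project's actual code.
import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

log = logging.getLogger(__name__)

default_args = {
    "owner": "data-eng",
    "retries": 2,                        # retry logic: re-run a failed task twice
    "retry_delay": timedelta(minutes=5),
}


def run_covid_etl(**context):
    """Placeholder body; the extract/transform/load steps are sketched after the step list below.

    Credentials are expected to come from an Airflow GCP connection or application-default
    credentials on the worker, so no key material lives in this file.
    """
    log.info("Starting COVID ETL run for %s", context["ds"])


with DAG(
    dag_id="covid_etl_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",          # triggers the pipeline once per day
    catchup=False,
    default_args=default_args,
) as dag:
    etl_task = PythonOperator(
        task_id="fetch_transform_load",
        python_callable=run_covid_etl,
    )
```

Task-level retries plus standard Python logging give each daily run a clear audit trail in the Airflow UI without any extra infrastructure.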
Problem: Manual data ingestion of COVID case data was error-prone and inefficient.
Solution: Designed an automated ETL pipeline, triggered daily, that fetches Excel data from GCS, transforms it, and loads it into BigQuery for downstream reporting.
Data stored in GCS — Raw Excel files uploaded automatically or manually.
Airflow DAG scheduled — Triggers Python task daily.
Download from GCS — Uses google-cloud-storage Python SDK.
Transform with Pandas — Clean and structure data.
Load into BigQuery — Table updated using bigquery.Client.load_table_from_dataframe (the three steps are sketched after this list).
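The task body itself might look roughly like the sketch below, split into small functions to mirror the extract/transform/load steps. The bucket, object path, table id, and column cleanup are placeholder assumptions, not values from the original pipeline.

```python
# Illustrative, modularized ETL steps -- bucket, object, and table names are assumptions.
import logging

import pandas as pd
from google.cloud import bigquery, storage

log = logging.getLogger(__name__)

BUCKET_NAME = "covid-raw-data"                 # assumed GCS bucket holding the raw Excel files
OBJECT_NAME = "daily/covid_cases.xlsx"         # assumed object path
TABLE_ID = "my-project.reporting.covid_cases"  # assumed BigQuery destination table


def download_from_gcs(local_path: str = "/tmp/covid_cases.xlsx") -> str:
    """Extract: fetch the raw Excel file using the google-cloud-storage SDK."""
    client = storage.Client()  # picks up application-default / connection-provided credentials
    client.bucket(BUCKET_NAME).blob(OBJECT_NAME).download_to_filename(local_path)
    log.info("Downloaded gs://%s/%s", BUCKET_NAME, OBJECT_NAME)
    return local_path


def transform(local_path: str) -> pd.DataFrame:
    """Transform: clean and structure the data with Pandas (illustrative steps only)."""
    df = pd.read_excel(local_path)  # requires an Excel engine such as openpyxl
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df.dropna(how="all")


def load_to_bigquery(df: pd.DataFrame) -> None:
    """Load: append the DataFrame to the BigQuery table and wait for the job to finish."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
    client.load_table_from_dataframe(df, TABLE_ID, job_config=job_config).result()
    log.info("Loaded %d rows into %s", len(df), TABLE_ID)


def run_covid_etl(**context):
    """Daily task body wired into the DAG sketched above."""
    load_to_bigquery(transform(download_from_gcs()))
```

WRITE_APPEND keeps earlier days' rows in place; WRITE_TRUNCATE would be the natural alternative if each Excel file is a full snapshot of the dataset.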