Building Lightweight ETL Pipelines for Small Projects

Enterprise ETL tools like Informatica, Talend, and Apache Airflow are powerful but often overkill for small projects. When you’re building a startup MVP, automating internal reports, or aggregating data for a side project, you don’t need heavyweight infrastructure with dedicated servers, complex configuration, and steep learning curves. What you need is a lightweight ETL pipeline …
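To make "lightweight" concrete, here is a minimal sketch of such a pipeline in plain Python: three functions and the standard library, no scheduler or server required. The function names, sample records, and table schema are all illustrative assumptions, not from the article.

```python
# A hypothetical three-step ETL pipeline using only the standard library.
import json
import sqlite3

def extract():
    # Stand-in for reading an API response or a CSV export.
    raw = '[{"name": "ada", "signups": "3"}, {"name": "bob", "signups": "5"}]'
    return json.loads(raw)

def transform(records):
    # Normalize field names and coerce types before loading.
    return [(r["name"].title(), int(r["signups"])) for r in records]

def load(rows, conn):
    # Load into a local SQLite table; swap in any destination you like.
    conn.execute("CREATE TABLE IF NOT EXISTS signups (name TEXT, count INTEGER)")
    conn.executemany("INSERT INTO signups VALUES (?, ?)", rows)
    conn.commit()

def run_pipeline(conn):
    load(transform(extract()), conn)

conn = sqlite3.connect(":memory:")
run_pipeline(conn)
total = conn.execute("SELECT SUM(count) FROM signups").fetchone()[0]  # 8
```

A cron entry or a GitHub Actions schedule is usually all the orchestration a pipeline this size needs.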

ETL vs ELT in CockroachDB for Modern Data Stacks

The debate between ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) has evolved significantly with the emergence of distributed SQL databases like CockroachDB. Traditional wisdom held that data warehouses were for ELT while operational databases required ETL, but CockroachDB’s unique architecture—combining transactional capabilities with analytical performance and horizontal scalability—blurs these boundaries. Organizations building modern …

ETL vs ELT: Which One Should You Use and Why?

In today’s data-driven world, organizations are drowning in information from countless sources—customer databases, social media feeds, IoT sensors, transaction logs, and more. The challenge isn’t just collecting this data; it’s transforming raw information into actionable insights. This is where ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) come into play. These data integration approaches …

Understanding Change Data Capture (CDC) Data Pipelines for Modern ETL

The evolution of data engineering has fundamentally shifted from batch-oriented Extract, Transform, Load (ETL) processes to continuous, event-driven architectures. Change Data Capture (CDC) sits at the heart of this transformation, enabling organizations to move beyond scheduled data transfers to real-time synchronization. Understanding CDC isn’t just about knowing that it captures database changes—it’s about grasping how …
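The core idea — emit only the rows that changed rather than re-copying the whole table — can be sketched with a simple snapshot diff. This is a toy illustration under assumed names: production CDC tools such as Debezium read the database's write-ahead log instead of comparing snapshots.

```python
# Toy illustration of CDC semantics: diff two {primary_key: row}
# snapshots and emit insert/update/delete change events.
def capture_changes(before, after):
    events = []
    for key, row in after.items():
        if key not in before:
            events.append(("insert", key, row))      # new row appeared
        elif before[key] != row:
            events.append(("update", key, row))      # existing row changed
    for key in before:
        if key not in after:
            events.append(("delete", key, before[key]))  # row vanished
    return events

before = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"}}
after  = {1: {"email": "a@new.com"},     3: {"email": "c@example.com"}}
events = capture_changes(before, after)
# One update (key 1), one insert (key 3), one delete (key 2).
```

Downstream consumers apply these events to keep a replica, cache, or search index in sync without ever rescanning the source table.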

Building an ETL Pipeline Example with Databricks

Building an ETL pipeline in Databricks transforms raw data into actionable insights through a structured approach that leverages distributed computing, Delta Lake storage, and Python or SQL transformations. This guide walks through a complete ETL pipeline example, demonstrating practical implementation patterns that data engineers can adapt for their own projects. We’ll build a pipeline that …

Hybrid Data Pipeline vs Traditional ETL

The data landscape has transformed dramatically over the past decade. Organizations that once relied exclusively on traditional Extract, Transform, Load (ETL) processes are now exploring hybrid data pipelines to meet modern business demands. This shift isn’t just a technological trend—it represents a fundamental rethinking of how data moves, transforms, and delivers value across enterprises. Understanding …