streaming Archives - ML Journey

Monitoring Kinesis Data Stream Performance

December 29, 2025 by Peter Song

Amazon Kinesis Data Streams has become the backbone of real-time data processing for organizations handling millions of events per second. Whether you’re tracking user behavior, processing IoT sensor data, or aggregating log files, the performance of your Kinesis streams directly impacts your application’s reliability and user experience. Yet, many teams struggle with identifying bottlenecks, optimizing … Read more

Comparing Tools for Big Data and Real-Time Analytics: Kafka vs Flink vs Spark Streaming

October 27, 2025 by Peter Song

Apache Kafka, Apache Flink, and Apache Spark Streaming dominate conversations about real-time big data processing, yet confusion persists about their roles and relationships. Teams evaluating these technologies often frame the question incorrectly as “which one should we use?” when the reality is more nuanced. These tools occupy different positions in the streaming architecture stack and often work together … Read more

Batch vs Streaming Feature Pipelines

October 6, 2025 by Peter Song

In the world of machine learning operations, feature pipelines serve as the critical infrastructure that transforms raw data into the features your models consume. The architecture you choose—batch or streaming—fundamentally shapes your system’s capabilities, performance characteristics, and operational complexity. Understanding the nuances between these two approaches is essential for building ML systems that meet your … Read more

Understanding the Difference Between Batch and Stream Processing

September 8, 2025 / July 13, 2025 by Peter Song

In today’s data-driven world, organizations process massive volumes of information daily to make informed decisions and drive business outcomes. Two fundamental approaches dominate the data processing landscape: batch processing and stream processing. Understanding the difference between batch and stream processing is crucial for data engineers, architects, and business leaders who need to choose the right … Read more

Building Real-Time Data Pipelines with Apache Kafka

July 4, 2025 / July 4, 2024 by Peter Song

Building real-time data pipelines with Apache Kafka is essential for processing large volumes of data efficiently and ensuring that businesses can respond to changes in real time. This comprehensive guide will help you understand how to create and manage real-time data pipelines using Apache Kafka, focusing on integration with Apache Spark for machine learning applications. We’ll … Read more