Chapter 1
Introduction to the Big Data Ecosystem
Chapter 2
Apache Spark Framework
Spark architecture and Resilient Distributed Datasets (RDDs)
Transformations (map, filter) vs Actions (reduce, collect)
Spark DataFrames and Spark SQL
Catalyst Optimizer and Tungsten execution engine
Performance tuning in Spark (Caching, Partitioning)
Chapter 3
Machine Learning with Spark (MLlib)
Overview of Spark MLlib library
Building ML Pipelines in Spark
Feature extraction, transformation, and selection in Spark
Distributed classification and regression with Spark ML
Clustering large datasets using Spark K-Means
Chapter 4
Real-Time Streaming Analytics
Batch processing vs Stream processing
Apache Kafka architecture (Producers, Consumers, Brokers, Topics)
Spark Structured Streaming basics
Window operations in streaming data
Applications of real-time analytics (Fraud detection, IoT)
Chapter 5
Cloud Analytics and Big Data
Big Data Services on AWS (EMR, Redshift, Athena)
Big Data Services on Azure (Synapse, Databricks) and GCP (BigQuery)
Serverless architectures for data analytics
Designing scalable data lakes
Data governance and security in cloud big data