In stock
Easy Return, Quick Refund.
QABETE ENTERPRISES
62% Seller Score
64 Followers
Shipping speed: Excellent
Quality Score: Very Poor
Customer Rating: Good
"Databricks and Apache Spark in Action: A Practical Guide to Building Scalable Data Pipelines and Advanced Analytics Workflows" by Jeffrey Tromp serves as a hands-on manual for leveraging Databricks' unified platform atop Apache Spark to engineer robust data systems. It demystifies Spark's distributed computing for ETL pipelines, real-time analytics, and ML workflows, using Python, Scala, and SQL examples tailored to cloud environments.
The book progresses from Spark fundamentals (RDDs, DataFrames, lazy evaluation) to Databricks-specific tools: Delta Lake for ACID transactions, Unity Catalog for governance, and Workflows for orchestration. Practical labs cover ingestion from diverse sources, transformations with Spark SQL and MLlib, and optimization techniques such as caching, partitioning, and Adaptive Query Execution (AQE) for production-scale performance.
Later sections cover scalable pipelines with Structured Streaming, feature stores, and end-to-end MLOps, including model training on massive datasets and deployment via MLflow. The book also addresses common pitfalls such as shuffle spills and executor tuning, helping data engineers build resilient systems.
The book suits readers with Python experience and digital marketing analytics needs. Possible applications include SEO data lakes and campaign optimization in Nairobi's tech scene, where big data insights can inform business strategy.
1 BOOK
This product has no ratings yet.