Personal Blog – MS Awareness Week 24th – 30th April 2023

Introduction: The next few days (24th-30th April) mark Multiple Sclerosis Awareness Week, and as I have been living with the condition for a long time now, I’d like to talk about it and share my thoughts, my experience and my feelings so that you can understand me a little better. Multiple Sclerosis: For anyone…

From Warehouse to Lakehouse Pt.3 – Slowly Changing Dimensions (SCD) with Delta

SCD Type 3 in SQL and Python. Introduction: After recently designing a few Slowly Changing Dimensions with a client, I thought it would be good to revisit the theme of an earlier post and expand on the SCD Types. For more information on this blog series and Slowly Changing Dimensions with Databricks and Delta Lake, check out SCD Type…

Tips for the Databricks Certified Associate Developer for Apache Spark 3.0 – Python – Pt.2

Following on from my previous post, I wanted to cover some more key topics that can really help your understanding of Spark as you dive into the Databricks Certified Associate Developer for Apache Spark 3.0 exam. For more information on general assessment tips, great practice exams to take and other core topics, please see…

Tips for the Databricks Certified Associate Developer for Apache Spark 3.0 – Python – Pt.1

After recently diving into (and passing!) the Associate Developer for Apache Spark 3.0 certification exam from Databricks, I thought it would be useful to go over some quick points to remember and some potential ‘gotcha’ topics for anyone considering the challenge. The majority of the exam (72% in fact) features the use of the…

From Warehouse to Lakehouse Pt.2 – Slowly Changing Dimensions (SCD) with Delta

SCD Type 2 in SQL and Python. Introduction: For more information on this blog series and Slowly Changing Dimensions with Databricks and Delta Lake, check out SCD Type 1 from part 1 of the ‘From Warehouse to Lakehouse’ series: https://headinthecloud.blog/2021/08/17/from-warehouse-to-lakehouse-slowly-changing-dimensions-scd-with-delta-and-sql/ All code examples are available in SQL and Python (PySpark) from my GitHub repo so you…

From Warehouse to Lakehouse Pt.1 – Slowly Changing Dimensions (SCD) with Delta

SCD Type 1 in SQL and Python. Introduction: With the move to cloud-based Data Lake platforms, there has often been criticism from the more traditional Data Warehousing community. A Data Lake, offering cheap, almost endlessly scalable storage in the cloud, is hugely appealing to a platform administrator; however, over the number of years that…