An Exciting New Era

This post is about why I quit Data Analysis.

My Comfort Zone

The majority of my career has been dedicated to leading a team of data analysts. I have been fortunate enough to work towards a meaningful cause and have had the flexibility to engage in various other tasks as needed, such as systems development or training. For the longest time, Data Analysis was my primary focus and data engineering supported me and my team in conducting analysis.

What Changed?

Then, Microsoft announced that within a year, they would discontinue a tool that connected our system to our Azure SQL Database. This tool had facilitated a significant shift in our work process and helped us to create an array of reports. To their credit, Microsoft provided a 12-month notice and presented us with two options: use a data lake or use a data lake with Synapse. I pulled together an options paper based on all the available information and flagged the risks and my recommendations to the business within a couple of weeks of the announcement.

Crunch Point

Without delving into the details, time passed and the 12 months became 6, then 3, and finally 1. Although the options paper was approved early on, the necessary resources for implementation were not readily available. As a result, the implementation became a last-minute endeavor. With only one month remaining, my proposal to use Synapse was put into action. Two weeks later, it was set up and handed over to me. Thus, I had two weeks to implement a Data Lake solution that would facilitate our core operational reporting.

My Response

Choosing Synapse turned out to be a wise decision. Microsoft had recommended configuring a Data Lake linked to Synapse, which included storage and Data Factory. Setting up the Synapse link was straightforward, but then the question arose: how do we get the data to the reports?

Microsoft Learn proved to be invaluable during this time, and I cannot stress enough how important it was to have access to it. I had to quickly learn and implement a whole new way of working with data, including concepts like the Data Lakehouse, Serverless SQL, and Data Factory.

Initially, I considered switching all reporting to Serverless SQL via the Data Lake, as the tools had changed but the data schema and structure had not. However, I soon realized that this was not feasible given the time constraints. So, my solution was to push new and changed data from the Data Lake into the old SQL database on a nightly basis using Data Factory within Synapse. This ensured that no changes were required for the existing reports, except that any changes made during the current day would not be included (which was not a significant issue for the majority of reporting).
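To illustrate the "insert new, update changed" idea behind that nightly push, here is a minimal sketch in plain Python. It is not the Data Factory pipeline itself: it uses an in-memory SQLite database as a stand-in for the SQL database, and the table name, column names, and the `upsert_rows` helper are all hypothetical.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert new rows and update existing ones, matched on the primary key.

    Assumes a hypothetical `account` table with columns id, name, modified_on.
    """
    conn.executemany(
        "INSERT INTO account (id, name, modified_on) "
        "VALUES (:id, :name, :modified_on) "
        "ON CONFLICT(id) DO UPDATE SET "
        "name = excluded.name, modified_on = excluded.modified_on",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    # In-memory database standing in for the reporting database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id TEXT PRIMARY KEY, name TEXT, modified_on TEXT)"
    )
    conn.execute("INSERT INTO account VALUES ('a1', 'Old name', '2023-01-01')")

    # One changed record and one brand-new record, as a nightly batch might hold.
    nightly_batch = [
        {"id": "a1", "name": "New name", "modified_on": "2023-06-01"},
        {"id": "a2", "name": "Second account", "modified_on": "2023-06-01"},
    ]
    upsert_rows(conn, nightly_batch)
    for row in conn.execute("SELECT id, name FROM account ORDER BY id"):
        print(row)
```

Because the existing tables keep their shape and simply receive the latest rows, reports that read from them do not need to change.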

With just a couple of days left until the deprecation, I was able to turn off the old link at midnight one evening. From that point onwards, Data Factory started inserting new records and updating existing ones through a scheduled pipeline run using a data flow that I built. It is worth mentioning that this data flow was dynamic from day one, running for each entity in a loop with passed parameters. I take particular pride in adding an if statement that skips entities without any new or altered records. It does this by examining the metadata of the files in each data lake folder, which saves a few minutes of processing time each day.
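The skip logic can be sketched outside of Synapse in a few lines of Python. This is an illustration under assumptions, not the pipeline I built: it assumes one folder per entity on a local file system, uses file modification times as the "metadata", and the `entities_with_changes` helper and all folder names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def entities_with_changes(lake_root, entities, last_run):
    """Return the entities whose folder holds a file modified after last_run.

    Entities with a missing or empty folder, or with only older files,
    are skipped so no processing time is spent on them.
    """
    cutoff = last_run.timestamp()
    changed = []
    for entity in entities:
        folder = Path(lake_root) / entity
        if not folder.is_dir():
            continue  # nothing has ever landed for this entity
        newest = max(
            (f.stat().st_mtime for f in folder.iterdir() if f.is_file()),
            default=None,
        )
        # The "if statement": only keep entities with new or altered files.
        if newest is not None and newest > cutoff:
            changed.append(entity)
    return changed


if __name__ == "__main__":
    # Hypothetical lake location, entity names, and last run time.
    last_run = datetime(2023, 6, 1, tzinfo=timezone.utc)
    for entity in entities_with_changes("datalake", ["account", "contact"], last_run):
        print(f"Would run the data flow for: {entity}")
```

Each entity name returned here would then be passed as a parameter to the same load step, which is what keeps the loop dynamic.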

Next Steps

In the short term, I used PySpark and Notebooks to connect with complex APIs and integrate existing pipelines that were using Power Automate. I completed another data challenge for another department on very short notice, ensuring continuity in reporting. I also started exploring the power of Delta Tables and the architecture and structure that a Delta Lakehouse can facilitate.

In the medium term, I took a risk during a restructure and applied for an engineering and architecture role, leaving data analysis behind. I was successful, and the new role has now become my day job and passion: enabling Data Analysis, Insight, and Science to flourish.

Looking ahead to the long term, I have numerous data consolidation projects and other exciting work lined up for both myself and the team. I am eager to expand our use of Synapse and prepare for the introduction of Fabric, which is likely to transform data in much the same way that SharePoint changed file management and collaboration.

Conclusion

That is how I made the decision to quit data analysis. The past year has presented significant challenges but also incredible opportunities for growth. Along the way, I have discovered new tricks and tips that I intend to share through this blog. For now, the title will remain the same, as Data Analysis (and Science) is what the engineering and architecture work enables.