
Formula 1

Data Engineering Project on Azure Cloud

This project is a big data initiative built on Azure cloud services, centred on a data lake architecture. It ingests Formula 1 race data from an external API, stores it as Delta tables in Azure Data Lake Storage (ADLS) Gen2, and processes it for reporting and analysis.
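Before Databricks can write Delta tables to ADLS, access is typically configured through Spark settings using a service principal, with its credentials pulled from a Key Vault-backed secret scope. A minimal sketch follows; the storage account, secret scope, and key names are placeholders, and `spark`/`dbutils` are the globals available inside a Databricks notebook:

```python
# Sketch: OAuth access to ADLS Gen2 via a service principal (Databricks notebook).
# Storage account, secret scope, and secret key names are hypothetical.
storage_account = "formula1dl"

# Credentials come from a Key Vault-backed secret scope, never from plain text.
tenant_id = dbutils.secrets.get(scope="f1-scope", key="tenant-id")
client_id = dbutils.secrets.get(scope="f1-scope", key="client-id")
client_secret = dbutils.secrets.get(scope="f1-scope", key="client-secret")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net",
               "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
               client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
               client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
```

Once set, paths like `abfss://raw@formula1dl.dfs.core.windows.net/...` are readable and writable from the notebook session.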

Key components and concepts utilized in the project include:


  1. Batch Load: Full and incremental data ingestion strategies are employed.

  2. Delta Tables: Delta tables are used for efficient storage and management of data in ADLS.

  3. Azure Service Principal: Identity and access management is handled using an Azure Service Principal.

  4. Key Vault: Sensitive information such as credentials and secrets is stored and managed securely using Azure Key Vault.

  5. Libraries: Data processing and manipulation are facilitated using libraries such as the PySpark API and pandas.

  6. ETL/ELT Workflow: The workflow for Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) operations is orchestrated and automated using Azure Data Factory.

  7. Analysis & Visualization: After data preparation, analysis is conducted to identify significant patterns, particularly focusing on dominant Formula-1 drivers and teams throughout history.
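The difference between the two batch strategies in step 1 is in how the target table is written: a full load replaces it with the latest snapshot, while an incremental load upserts only new or changed records by key (Delta Lake's MERGE does this natively). The keyed-upsert idea is sketched here in plain Python with hypothetical race records:

```python
def full_load(source: list[dict]) -> dict:
    """Full load: replace the target entirely with the source snapshot."""
    return {row["race_id"]: row for row in source}

def incremental_load(target: dict, source: list[dict]) -> dict:
    """Incremental load: upsert source rows into the target by key,
    mirroring the semantics of a Delta Lake MERGE on race_id."""
    merged = dict(target)
    for row in source:
        merged[row["race_id"]] = row  # insert new keys, overwrite changed ones
    return merged

# Hypothetical race records: an initial snapshot, then an incremental batch
# that corrects race 2 and adds race 3.
initial = [{"race_id": 1, "winner": "Hamilton"},
           {"race_id": 2, "winner": "Verstappen"}]
delta   = [{"race_id": 2, "winner": "Leclerc"},
           {"race_id": 3, "winner": "Norris"}]

table = full_load(initial)
table = incremental_load(table, delta)
```

In the actual pipeline this logic lives in a Delta Lake `MERGE INTO` statement, so only the changed partitions are rewritten.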

Technologies used in the project include:

  • Azure Databricks: For data processing, analytics, and collaboration.

  • Azure Key Vault: For secure storage of keys, secrets, and certificates.

  • Azure Data Factory: For orchestrating and automating data workflows.

  • Azure Data Lake Gen2: As the storage solution for structured and unstructured data.

  • Power BI: For visualization and reporting.

  • Delta Lake: For efficient and reliable data lake storage and management.
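The dominance analysis that feeds Power BI can be prototyped in pandas before the report is built. The toy results data and column names below are hypothetical; in the project, the input would come from the prepared Delta tables in ADLS:

```python
import pandas as pd

# Toy race-results data standing in for the presentation-layer tables.
results = pd.DataFrame({
    "driver": ["Hamilton", "Hamilton", "Schumacher", "Verstappen", "Schumacher"],
    "team":   ["Mercedes", "Mercedes", "Ferrari", "Red Bull", "Ferrari"],
    "points": [25, 18, 25, 25, 15],
    "wins":   [1, 0, 1, 1, 0],
})

# Rank drivers by total points, then total wins, across all races.
driver_standings = (
    results.groupby("driver", as_index=False)
           .agg(total_points=("points", "sum"), total_wins=("wins", "sum"))
           .sort_values(["total_points", "total_wins"], ascending=False)
           .reset_index(drop=True)
)
```

The same groupby-and-rank shape, applied per team and per decade, is what surfaces the historically dominant drivers and constructors in the final dashboards.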

