
Formula 1

Data Engineering Project on Azure Cloud

This project is a big data initiative built on Azure cloud services, centred on a data lake architecture. It ingests Formula 1 race data from an external API, stores it as Delta tables in Azure Data Lake Storage (ADLS) Gen2, and processes it for reporting and analysis.
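Before Databricks can write Delta tables to ADLS, access is typically configured through Spark settings using a service principal, with its credentials pulled from a Key Vault-backed secret scope. A minimal sketch follows; the storage account, secret scope, and key names are placeholders, and `spark`/`dbutils` are the globals available inside a Databricks notebook:

```python
# Sketch: OAuth access to ADLS Gen2 via a service principal (Databricks notebook).
# Storage account, secret scope, and secret key names are hypothetical.
storage_account = "formula1dl"

# Credentials come from a Key Vault-backed secret scope, never from plain text.
tenant_id = dbutils.secrets.get(scope="f1-scope", key="tenant-id")
client_id = dbutils.secrets.get(scope="f1-scope", key="client-id")
client_secret = dbutils.secrets.get(scope="f1-scope", key="client-secret")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net",
               "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
               client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
               client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
```

Once set, paths like `abfss://raw@formula1dl.dfs.core.windows.net/...` are readable and writable from the notebook session.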

Key components and concepts utilized in the project include:


  1. Batch Load: Full and incremental data ingestion strategies are employed.

  2. Delta Tables: Delta tables are used for efficient storage and management of data in ADLS.

  3. Azure Service Principal: Identity and access management is handled using an Azure Service Principal.

  4. Key Vault: Sensitive information such as credentials and secrets is stored and managed securely using Azure Key Vault.

  5. Libraries: Data processing and manipulation are facilitated using libraries such as the PySpark API and pandas.

  6. ETL/ELT Workflow: The workflow for Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) operations is orchestrated and automated using Azure Data Factory.

  7. Analysis & Visualization: After data preparation, analysis is conducted to identify significant patterns, particularly focusing on dominant Formula-1 drivers and teams throughout history.
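The difference between the two batch strategies in step 1 is in how the target table is written: a full load replaces it with the latest snapshot, while an incremental load upserts only new or changed records by key (Delta Lake's MERGE does this natively). The keyed-upsert idea is sketched here in plain Python with hypothetical race records:

```python
def full_load(source: list[dict]) -> dict:
    """Full load: replace the target entirely with the source snapshot."""
    return {row["race_id"]: row for row in source}

def incremental_load(target: dict, source: list[dict]) -> dict:
    """Incremental load: upsert source rows into the target by key,
    mirroring the semantics of a Delta Lake MERGE on race_id."""
    merged = dict(target)
    for row in source:
        merged[row["race_id"]] = row  # insert new keys, overwrite changed ones
    return merged

# Hypothetical race records: an initial snapshot, then an incremental batch
# that corrects race 2 and adds race 3.
initial = [{"race_id": 1, "winner": "Hamilton"},
           {"race_id": 2, "winner": "Verstappen"}]
delta   = [{"race_id": 2, "winner": "Leclerc"},
           {"race_id": 3, "winner": "Norris"}]

table = full_load(initial)
table = incremental_load(table, delta)
```

In the actual pipeline this logic lives in a Delta Lake `MERGE INTO` statement, so only the changed partitions are rewritten.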

Technologies used in the project include:

  • Azure Databricks: For data processing, analytics, and collaboration.

  • Azure Key Vault: For secure storage of keys, secrets, and certificates.

  • Azure Data Factory: For orchestrating and automating data workflows.

  • Azure Data Lake Gen2: As the storage solution for structured and unstructured data.

  • Power BI: For visualization and reporting.

  • Delta Lake: For efficient and reliable data lake storage and management.
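The dominance analysis that feeds Power BI can be prototyped in pandas before the report is built. The toy results data and column names below are hypothetical; in the project, the input would come from the prepared Delta tables in ADLS:

```python
import pandas as pd

# Toy race-results data standing in for the presentation-layer tables.
results = pd.DataFrame({
    "driver": ["Hamilton", "Hamilton", "Schumacher", "Verstappen", "Schumacher"],
    "team":   ["Mercedes", "Mercedes", "Ferrari", "Red Bull", "Ferrari"],
    "points": [25, 18, 25, 25, 15],
    "wins":   [1, 0, 1, 1, 0],
})

# Rank drivers by total points, then total wins, across all races.
driver_standings = (
    results.groupby("driver", as_index=False)
           .agg(total_points=("points", "sum"), total_wins=("wins", "sum"))
           .sort_values(["total_points", "total_wins"], ascending=False)
           .reset_index(drop=True)
)
```

The same groupby-and-rank shape, applied per team and per decade, is what surfaces the historically dominant drivers and constructors in the final dashboards.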

