Skip to content

This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.

License

Notifications You must be signed in to change notification settings

madhurimarawat/Data-Warehousing

Repository files navigation

Data-Warehousing

This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.

Repo Size GitHub Stars GitHub Forks GitHub Issues Closed Issues Open Pull Requests Closed Pull Requests GitHub Discussions GitHub Contributors License Last Commit GitHub Watchers


Tools and Technologies βš™οΈπŸ’»

  1. MySQL: Β  An open-source relational database management system for managing and organizing structured data using SQL.
  2. Python: Β  A high-level, interpreted programming language known for its readability and versatility. It supports multiple programming paradigms and is widely used for web development, data analysis, automation, and scientific computing.
  3. Pandas: Β  An open-source data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, enabling efficient handling and analysis of structured data.
  4. NumPy: Β  A fundamental package for numerical computing in Python. It offers support for multi-dimensional arrays and matrices, along with a collection of mathematical functions for performing efficient operations on these data structures.
  5. MySQL Connector: Β  A Python library that enables connecting to a MySQL database server. It allows developers to execute SQL queries, manage database connections, and interact with MySQL databases directly from Python applications.

Directory Structure πŸ“‚

Data-Warehousing/
β”‚
β”œβ”€β”€ Experiment 1/
β”‚   β”œβ”€β”€ Documentation/ πŸ“
|   β”‚   β”œβ”€β”€ Explanation of methods and key observations from Experiment 1.
β”‚
β”œβ”€β”€ Experiment 2/
β”‚   β”œβ”€β”€ Codes/ πŸ’»
β”‚   β”‚   └── Contains the MySQL script for input and output in Experiment 2.
β”‚   β”œβ”€β”€ Documentation/ πŸ“
β”‚   β”‚   β”œβ”€β”€ Detailed documentation explaining the methodology and analysis for Experiment 2.
β”‚   β”œβ”€β”€ Output/ πŸ“Š
β”‚   β”‚   └── Contains the results and analysis of Experiment 2.
β”œβ”€β”€ Experiment 3/
β”‚   β”œβ”€β”€ Codes/ πŸ’»
β”‚   β”‚   └── Contains the MySQL script for input and output in Experiment 3.
β”‚   β”œβ”€β”€ Documentation/ πŸ“
β”‚   β”‚   β”œβ”€β”€ Detailed documentation explaining the methodology and analysis for Experiment 3.
β”‚   β”œβ”€β”€ Output/ πŸ“Š
β”‚   β”‚   └── Contains the results and analysis of Experiment 3.
.....

Project Folder Structure

  • Codes πŸ’» (If applicable)
    Contains the source code files used for data processing and analysis in each experiment. These scripts are essential for executing tasks within the experiment. Additionally, the following files are included:

    • MySQL Commands and Output (TXT): This text file contains the specific MySQL command-line operations used in the experiment, documenting both the input commands and their corresponding outputs. A detailed explanation of these commands and their results can be found in the Documentation folder, available in both MD and PDF formats.
  • Dataset πŸ“ (If applicable)
    Stores datasets used in experiments, ensuring easy access and organization.

    • e.g., data.csv, stream_data.json
  • Output πŸ“Š
    Stores results generated from experiments, including visualizations, processed data, logs, and analysis reports. Each experiment's output is stored separately with a relevant name.

    • e.g., Experiment_X_Output (where "X" refers to the relevant experiment number)
  • Documentation πŸ“
    Contains detailed documentation for each experiment, covering methodology, analysis, and insights. Documentation is provided in both Markdown (.md) and PDF formats for easy reference.

    • documentation.md (Markdown version)
    • documentation.pdf (PDF version, converted from Markdown)
  • Commands File (πŸ“‹)
    A text file stored in the Codes folder, documenting specific commands, steps, and MySQL output used in the experiment. This is especially useful for tracking command-line operations and database interactions.

    • MySQL_Commands_Output.txt

Table Of Contents πŸ“” πŸ”– πŸ“‘

This experiment introduces the fundamental concepts and architecture of data warehousing, including ETL processes, data modeling techniques, and OLAP functionalities.

This experiment focuses on designing and implementing a star schema data model for a specified business scenario, emphasizing the creation of fact and dimension tables.

In this experiment, the Snowflake Schema was implemented to achieve a more normalized data structure than the Star Schema.

In this experiment, an ETL process was designed and implemented to migrate data from operational databases to a data warehouse.

In this experiment, OLAP operations such as slicing, dicing, drill-down, drill-up, and pivoting were applied to analyze predefined data in a data warehouse.

This experiment involved cleaning and transforming raw data before loading it into the data warehouse, ensuring consistency, accuracy, and completeness.

SQL queries were optimized for large-scale data warehouse applications using techniques like indexing, partitioning, and query tuning to improve performance.

This experiment implemented data aggregation techniques to generate summarized views of large datasets, enhancing reporting and analytical efficiency.

This experiment involves generating business reports from a MySQL data warehouse using SQL queries and Python for data extraction and processing.

A real-time data pipeline is implemented with Python, continuously ingesting streaming data into a MySQL data warehouse for immediate analysis.

This experiment applies Slowly Changing Dimensions (SCD) techniques in a MySQL data warehouse, developed using Python to maintain historical data accuracy.


Thanks for Visiting πŸ˜„

Releases

No releases published

Packages

No packages published