Advanced SQL Data Analysis

This project demonstrates advanced SQL skills that can transform raw data into strategic insights.

CTEs and Window Functions for Business Intelligence

Project Overview

This project demonstrates my expertise in advanced SQL techniques by analyzing a comprehensive sales database using Common Table Expressions (CTEs) and Window Functions. The analysis transforms raw operational data into strategic insights that drive business decisions.

Key Business Questions Addressed:

  • Customer segmentation based on purchasing behavior
  • Product performance analysis across multiple dimensions
  • Sales team performance evaluation
  • Data quality validation and integrity checks
  • Time-series analysis of sales trends
SQL Data Analysis CTEs Window Functions Business Intelligence Data Quality

Database Schema

The analysis was performed on a relational database with the following complete tables:

Orders Table

OrderID ProductID CustomerID SalesPersonID OrderDate ShipDate OrderStatus ShipAddress BillAddress Quantity Sales CreationTime
1 101 2 3 2025-01-01 2025-01-05 Delivered 9833 Mt. Dias Blv. 1226 Shoe St. 1 10 2025-01-01 12:34:56.0000000
2 102 3 3 2025-01-05 2025-01-10 Shipped 250 Race Court NULL 1 15 2025-01-05 23:22:04.0000000
3 101 1 5 2025-01-10 2025-01-25 Delivered 8157 W. Book 8157 W. Book 2 20 2025-01-10 18:24:08.0000000
4 105 1 3 2025-01-20 2025-01-25 Shipped 5724 Victory Lane NULL 2 60 2025-01-20 05:50:33.0000000
5 104 2 5 2025-02-01 2025-02-05 Delivered NULL NULL 1 25 2025-02-01 14:02:41.0000000
6 104 3 5 2025-02-05 2025-02-10 Delivered 1792 Belmont Rd. NULL 2 50 2025-02-06 15:34:57.0000000
7 102 1 1 2025-02-15 2025-02-27 Delivered 136 Balboa Court NULL 2 30 2025-02-16 06:22:01.0000000
8 101 4 3 2025-02-18 2025-02-27 Shipped 2947 Vine Lane 4311 Clay Rd 3 90 2025-02-18 10:45:22.0000000
9 101 2 3 2025-03-10 2025-03-15 Shipped 3768 Door Way NULL 2 20 2025-03-10 12:59:04.0000000
10 102 3 5 2025-03-15 2025-03-20 Shipped NULL NULL 0 60 2025-03-16 23:25:15.0000000

Customers Table

CustomerID FirstName LastName Country Score
1 Jossef Goldberg Germany 350
2 Kevin Brown USA 900
3 Mary NULL USA 750
4 Mark Schwarz Germany 500
5 Anna Adams USA NULL

Products Table

ProductID Product Category Price
101 Bottle Accessories 10
102 Tire Accessories 15
103 Socks Clothing 20
104 Caps Clothing 25
105 Gloves Clothing 30

Employees Table

EmployeeID FirstName LastName Department BirthDate Gender Salary ManagerID
1 Frank Lee Marketing 1988-12-05 M 55000 NULL
2 Kevin Brown Marketing 1972-11-25 M 65000 1
3 Mary NULL Sales 1986-01-05 F 75000 1
4 Michael Ray Sales 1977-02-10 M 90000 2
5 Carol Baker Sales 1982-02-11 F 55000 3

OrdersArchive Table

OrderID ProductID CustomerID SalesPersonID OrderDate ShipDate OrderStatus ShipAddress BillAddress Quantity Sales CreationTime
1 101 2 3 2024-04-01 2024-04-05 Shipped 123 Main St 456 Billing St 1 10 2024-04-01 12:34:56.0000000
2 102 3 3 2024-04-05 2024-04-10 Shipped 456 Elm St 789 Billing St 1 15 2024-04-05 23:22:04.0000000
3 101 1 4 2024-04-10 2024-04-25 Shipped 789 Maple St 789 Maple St 2 20 2024-04-10 18:24:08.0000000
4 105 1 3 2024-04-20 2024-04-25 Shipped 987 Victory Lane NULL 2 60 2024-04-20 05:50:33.0000000
4 105 1 3 2024-04-20 2024-04-25 Delivered 987 Victory Lane NULL 2 60 2024-04-20 14:50:33.0000000
5 104 2 5 2024-05-01 2024-05-05 Shipped 345 Oak St 678 Pine St 1 25 2024-05-01 14:02:41.0000000
6 104 3 5 2024-05-05 2024-05-10 Delivered 543 Belmont Rd. NULL 2 50 2024-05-06 15:34:57.0000000
6 104 3 5 2024-05-05 2024-05-10 Delivered 543 Belmont Rd. 3768 Door Way 2 50 2024-05-07 13:22:05.0000000
6 101 3 5 2024-05-05 2024-05-10 Delivered 543 Belmont Rd. 3768 Door Way 2 50 2024-05-12 20:36:55.0000000
7 102 3 5 2024-06-15 2024-06-20 Shipped 111 Main St 222 Billing St 0 60 2024-06-16 23:25:15.0000000

CTE Implementation

Common Table Expressions (CTEs) were used to create modular, readable queries that build upon each other to answer complex business questions.

Customer Sales Analysis

This query calculates total sales per customer, identifies their last order date, ranks them by sales performance, and segments them into categories for targeted marketing.

/*
=======================================================================
                 CTE (NON RECURSIVE) MINI PROJECT 
=======================================================================
*/

--CTE (Standalone)
--Find the total sales per customer

WITH TotalSalesPerCustomer AS
(SELECT 
    CustomerID,
    SUM(Sales) TotalSales
FROM [Sales].[Orders]
GROUP BY CustomerID
),

--CTE (Multiple Standalone) USE a comma to separate each standalone CTE and do not repeat the with
--Find the last order date for each customers

LastOrderDate AS
(SELECT 
    CustomerID,
    MAX(OrderDate) LastOrderDate
FROM [Sales].[Orders]
GROUP BY CustomerID

--Note, you can use any sql statement within the CTE except the ORDER BY
),

--CTE (Nested) 
--Rank the customers based on Total Sales per customer

CustomerRanking AS
(SELECT 
    CustomerID,
    TotalSales,
    RANK() OVER(ORDER BY TotalSales DESC) CustomerRank
FROM TotalSalesPerCustomer
),

--Segment the customers based on Total Sales (CTE (Nested) )
CustomerSegmentation AS
(SELECT 
    CustomerID,
    CASE WHEN TotalSales > 110 THEN 'HighSales'
         WHEN TotalSales BETWEEN 90 AND 110 THEN 'MediumSales'
         ELSE 'LowSales' END Segment
FROM TotalSalesPerCustomer
)


SELECT 
    C.CustomerID,
    C.FirstName,
    C.LastName,
    T.TotalSales,
    L.LastOrderDate,
    CR.CustomerRank,
    CS.Segment
FROM [Sales].[Customers] C
LEFT JOIN TotalSalesPerCustomer T ON T.CustomerID = C.CustomerID
LEFT JOIN LastOrderDate L ON L.CustomerID = C.CustomerID
LEFT JOIN CustomerRanking CR ON CR.CustomerID = C.CustomerID
LEFT JOIN CustomerSegmentation CS ON CS.CustomerID = C.CustomerID
ORDER BY TotalSales DESC;

Business Impact:

This analysis enables the marketing team to:

  • Identify top customers for loyalty programs
  • Target reactivation campaigns to customers who haven't ordered recently
  • Allocate resources effectively based on customer value segments
  • Personalize marketing messages based on customer value tier

Window Functions Implementation

Window functions were used to perform complex calculations across sets of rows while retaining detailed data.

Sales Performance Analysis

This comprehensive query provides multiple perspectives on sales performance including product totals, rankings, and comparisons to averages.

/*
========================================
          WINDOW FUNCTIONS
========================================
*/
USE SalesDB;
GO

SELECT 
    OrderID,
    ProductID,
    OrderDate,
    OrderStatus,
    Sales,
    SUM(Sales) OVER(PARTITION BY ProductID) ProductTotal,
    SUM(Sales) OVER(PARTITION BY ProductID, OrderStatus) ProductTotalbyStatus,
    SUM(Sales) OVER() TotalSales,
    ROUND(CAST(Sales AS FLOAT)/SUM(Sales) OVER() * 100, 2) PercentageOfTotal,
    AVG(Sales) OVER() AverageSales,
    AVG(Sales) OVER(PARTITION BY ProductID) AverageSalesbyProduct,
    MAX(Sales) OVER() HighestSales,
    MIN(Sales) OVER() LowestSales
FROM [Sales].[Orders]

Business Impact:

This comprehensive view helps business leaders:

  • Understand which products contribute most to revenue
  • Identify performance differences by order status
  • Benchmark individual orders against averages
  • Spot outliers in the sales distribution
  • Make data-driven inventory decisions

Data Quality Check

Window functions can also be used to identify potential data quality issues, as shown in this duplicate detection query:

--Check whether the table 'OrdersArchive' contains any duplicates
--Checking data quality

SELECT *
FROM (
    SELECT 
        OrderID,
        COUNT(OrderID) OVER(PARTITION BY OrderID) IDCheck
    FROM [Sales].[OrdersArchive]
) T
WHERE IDCheck != 1

Business Impact:

Maintaining data integrity is crucial for accurate reporting. This query helps:

  • Identify potential duplicate orders
  • Ensure the reliability of analytics
  • Maintain trust in business metrics
  • Improve data governance processes

Advanced Analytics Examples

Additional window function implementations demonstrating advanced analytical capabilities:

--Rank each of the orders based on their sales from highest to lowest
SELECT 
    OrderID,
    OrderDate,
    Sales,
    RANK() OVER(ORDER BY Sales DESC) SalesRank
FROM [Sales].[Orders]

--Calculate the moving average of sales for each product over time
SELECT 
    OrderID,
    ProductID,
    OrderDate,
    Sales,
    AVG(Sales) OVER(PARTITION BY ProductID) AverageSales,
    AVG(Sales) OVER(PARTITION BY ProductID ORDER BY OrderDate) MovingAverage
FROM [Sales].[Orders]

--Find the deviation of each sales from the minimum and maximum sales amount
SELECT 
    OrderID,
    Sales,
    MAX(Sales) OVER() HighestSales,
    MIN(Sales) OVER() LowestSales,
    ABS(Sales - MAX(Sales) OVER()) SalesDeviationFromMax,
    (Sales - MIN(Sales) OVER()) SalesDeviationFromMin
FROM [Sales].[Orders]

Business Impact:

These advanced techniques enable:

  • Trend analysis and forecasting
  • Performance benchmarking
  • Anomaly detection
  • Seasonality analysis

Key Takeaways

Technical Achievements

  • Demonstrated mastery of advanced SQL features including CTEs and Window Functions
  • Created modular, maintainable queries that can be easily adapted as business needs change
  • Implemented comprehensive data quality checks
  • Delivered actionable business intelligence from raw data
  • Showcased ability to handle complex analytical requirements

Business Value Delivered

  • Enabled data-driven customer segmentation for targeted marketing
  • Provided visibility into product performance across multiple dimensions
  • Established benchmarks for evaluating sales performance
  • Identified potential data integrity issues before they impact decision-making
  • Created foundation for predictive analytics and forecasting