SQL Saturday #226 in South Florida, USA

I am excited yet again as I will be speaking at SQL Saturday #226 in sunny South Florida, USA. I will be speaking in the BI and Reporting Track in the afternoon. If you are from the South Florida region, please don’t miss out on this exclusive training day on everything SQL Server.

The event schedule is listed here.

sqlsat226

Cheers.

Update: I had a great time speaking at Fort Lauderdale, Florida. Below is a farewell photo with other speakers (happy faces)

SQL Saturday South Florida Speakers

Advertisement

Statistical Median with T-SQL (2012)

I have been interacting with my statistician colleagues lately and there was a reporting requirement to calculate medians over a spread of continuously distributed values. Thankfully with SQL Server 2012, you do not have to use the ROW_NUMBER() function as you had to with earlier T-SQL versions.

PERCENTILE_CONT and PERCENTILE_DISC are now available in T-SQL 2012. According to the SQL BOL, these functions are described as:

PERCENTILE_CONT: Calculates a percentile based on a continuous distribution of the column value in SQL Server 2012. The result is interpolated and might not be equal to any of the specific values in the column.

PERCENTILE_DISC: Computes a specific percentile for sorted values in an entire rowset or within distinct partitions of a rowset in SQL Server 2012. PERCENTILE_DISC calculates the percentile based on a discrete distribution of the column values; the result is equal to a specific value in the column.

Let us put these functions to test and understand them a bit more in detail:

I have a sample table(ServiceCost) with ServiceIDs (that will be the data element to group by) and Cost (the field for which medians have to be calculated).

MedianSampleTable

I calculated the medians using PERCENTILE_CONT and PERCENTILE_DISC as below:

</code>

SELECT DISTINCT ServiceID,

              PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Cost)

              OVER (PARTITION BY ServiceID) AS MedianValue,

              PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Cost)

              OVER (PARTITION BY ServiceID) AS MedianDisc

  FROM ServiceCost

  order by ServiceID<code>

 

The result of this query is:

MedianOutputTable

The difference between the way these two functions work is clear from the output for ServiceID: SA456. The PERCENTILE_CONT function calculated the mean of the values {62,73} to result in 67.5; while the PERCENTILE_DISC function resulted in the value 62 which is one of the column values in the data set.

We can now show off the power of T-SQL for statistical analysis… Hurray!