How to Count Distinct Values in SQL with Multiple Columns, GROUP BY and Examples

By Cristian G. Guasch • Updated: 06/28/23 • 12 min read

When working with databases, it’s often essential to analyze and summarize information efficiently. One common task is to count distinct values in SQL, which allows users to determine how many unique items appear within a specific dataset. By mastering this technique, individuals can gain valuable insights into their data, uncovering trends and patterns that may not be immediately apparent.

To tackle this challenge, SQL provides the COUNT function and the DISTINCT keyword, which, when combined, enable users to generate an accurate count of distinct items in a record set. Whether it’s for calculating the number of unique products sold, keeping track of user activity, or analyzing customer demographics, understanding how to count distinct values in SQL is an indispensable skill for data professionals.

To count distinct values in a table, a simple SQL query can be used, combining the COUNT function and the DISTINCT keyword with the desired column name. Various SQL implementations, such as MySQL, PostgreSQL, and Microsoft SQL Server, differ in the syntax they accept, particularly when counting across multiple columns. It’s important for users to be aware of these differences and select the appropriate method for their particular database platform.

Understanding Distinct Values in SQL

When dealing with databases, it’s common to encounter duplicate data. Sometimes, these duplicates can lead to inaccurate results, such as counting the same customer several times. This is where the concept of distinct values in SQL becomes crucial. By combining COUNT with the DISTINCT keyword, one can count only the unique values in a dataset.

In SQL, DISTINCT is a keyword that eliminates duplicate rows from query results. This is particularly useful when dealing with large datasets, ensuring that each value is considered only once. To count distinct values in a specific column, the COUNT function and the DISTINCT keyword are used in combination, resulting in a query that looks like:

SELECT COUNT(DISTINCT column_name)
FROM table_name;

By applying the COUNT DISTINCT query, users can gather valuable insights from their data, such as:

  • Identifying the number of unique products in a store’s inventory.
  • Discovering the total number of distinct clients for a business.
  • Determining how many different categories exist within a content platform.

There are other techniques to achieve similar results while working with SQL. Some of them include using GROUP BY, HAVING, and JOIN. These methods enable users to filter data, aggregate it, or retrieve related data from multiple tables. However, the COUNT DISTINCT option is often the most direct and efficient way to obtain the desired information.

While COUNT DISTINCT is a powerful and valuable tool, it does have certain limitations:

  • In SQL Server, it cannot be applied to the legacy text, ntext, or image data types, because they are not comparable.
  • Performance may decrease in scenarios where there’s a high percentage of distinct values.
  • It’s not recommended for use with large datasets on systems with limited memory.

However, these constraints shouldn’t deter users from leveraging the COUNT DISTINCT functionality in their analysis. With proper understanding and appropriate application, this technique can greatly enhance the overall efficiency and accuracy of data processing in SQL.

In short, COUNT DISTINCT is a direct way to count the unique values in a column. It does not remove any data from the table; it simply ignores duplicates while counting, so the result reflects how many different values exist rather than how many rows there are.

Applying the COUNT DISTINCT Function

To count distinct values in SQL, one can make use of the COUNT DISTINCT function. This versatile function allows users to find the number of unique occurrences within a specified column. Let’s dive into some practical examples to better understand how to apply the COUNT DISTINCT function.

Imagine having a table called Orders with the following columns: OrderID, CustomerID, OrderDate, and ProductName. The objective is to count the number of unique customers who placed orders.

Using the COUNT DISTINCT function, the SQL query will look like:

SELECT COUNT(DISTINCT CustomerID) as UniqueCustomers
FROM Orders;

In this example, the function counts the distinct CustomerID values in the Orders table. The result will be the total number of unique customers who placed orders.
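To see the query run against actual rows, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database. The sample rows are hypothetical and exist only for illustration; the table and column names follow the Orders example above.

```python
import sqlite3

# In-memory database with hypothetical sample rows for the Orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, "
    "OrderDate TEXT, ProductName TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2023-01-05", "Keyboard"),
        (2, 102, "2023-01-06", "Mouse"),
        (3, 101, "2023-01-09", "Monitor"),
        (4, 103, "2023-01-11", "Mouse"),
        (5, 102, "2023-01-12", "Keyboard"),
    ],
)

# COUNT(CustomerID) counts every non-NULL value, duplicates included.
total_orders = conn.execute(
    "SELECT COUNT(CustomerID) FROM Orders"
).fetchone()[0]

# COUNT(DISTINCT CustomerID) counts each customer only once.
unique_customers = conn.execute(
    "SELECT COUNT(DISTINCT CustomerID) AS UniqueCustomers FROM Orders"
).fetchone()[0]

print(total_orders, unique_customers)  # 5 3
```

Five orders were placed, but only three different customers placed them.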

It’s worth noting that the COUNT DISTINCT function works with various data types, including:

  • Numeric values (e.g., integers or decimals)
  • Text (e.g., names, email addresses, or product descriptions)
  • Dates and times

There is one behavior to keep in mind, though. COUNT DISTINCT ignores NULL: rows where the column is NULL don’t contribute to the count, so a column that contains only NULLs returns 0.

Here’s a list of key takeaways for using the COUNT DISTINCT function in SQL:

  • It enables users to count unique values within a specified column.
  • The function works with various data types, such as numeric, text, or dates/times.
  • Remember: NULL values won’t be taken into account when counting distinct values.
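The NULL behavior can be checked with a small sketch. It assumes a simplified, hypothetical Orders table with only two columns, run in SQLite through Python’s sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER)")
# Two of the five hypothetical orders have no customer recorded (NULL).
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 101), (2, None), (3, 102), (4, None), (5, 101)],
)

row_count, non_null_count, distinct_count = conn.execute(
    "SELECT COUNT(*), COUNT(CustomerID), COUNT(DISTINCT CustomerID) "
    "FROM Orders"
).fetchone()

# COUNT(*) counts all rows; the other two skip the NULL CustomerID values.
print(row_count, non_null_count, distinct_count)  # 5 3 2
```

If NULL should be treated as its own category, it has to be counted separately, for example by comparing COUNT(*) with COUNT(CustomerID).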

An alternative to the COUNT DISTINCT function is to use a combination of COUNT and GROUP BY. Take a look at the following example:

SELECT CustomerID, COUNT(OrderID) as NumberOfOrders
FROM Orders
GROUP BY CustomerID;

This query returns one row per customer along with that customer’s order count, so the number of rows in the result equals the number of distinct CustomerID values. This approach is useful when the per-customer breakdown is also needed; when only the total is required, COUNT DISTINCT is more direct.
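As a quick check of how the two approaches relate, the sketch below runs both against hypothetical sample rows in SQLite. The GROUP BY query produces one row per customer, and the number of those rows matches the COUNT DISTINCT result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 101), (2, 102), (3, 101), (4, 103), (5, 101)],
)

# One row per customer, each with that customer's number of orders.
orders_per_customer = conn.execute(
    "SELECT CustomerID, COUNT(OrderID) AS NumberOfOrders "
    "FROM Orders GROUP BY CustomerID ORDER BY CustomerID"
).fetchall()

distinct_customers = conn.execute(
    "SELECT COUNT(DISTINCT CustomerID) FROM Orders"
).fetchone()[0]

print(orders_per_customer)   # [(101, 3), (102, 1), (103, 1)]
print(distinct_customers)    # 3
```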

COUNT DISTINCT with Multiple Columns

When working with SQL databases, COUNT DISTINCT is a common operation used to find the number of unique values in a column. In some cases, you may need to count distinct combinations of values across multiple columns. This section discusses how to achieve that.

To begin, let’s consider a scenario where you have a database table named ‘orders’, which stores data about customer orders. The table has three columns: ‘order_id’, ‘customer_id’, and ‘product_id’. Now, let’s say you want to know how many unique combinations of ‘customer_id’ and ‘product_id’ exist in the table.

To accomplish this, you’d utilize the COUNT DISTINCT function along with the CONCAT function in SQL. The CONCAT function concatenates multiple columns into a single text string, and the COUNT DISTINCT function counts the unique instances of those concatenated strings. Here’s an example query to achieve the desired result:

SELECT COUNT(DISTINCT CONCAT(customer_id, '_', product_id))
FROM orders;

The underscore in the CONCAT function acts as a separator. Without it, different pairs can produce the same string: customer_id 1 with product_id 23 and customer_id 12 with product_id 3 would both become '123' and be counted as a single combination.
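The effect of the separator can be demonstrated with a short sketch. One assumption to flag: it runs in SQLite, where the || operator is used for concatenation because CONCAT is only available in recent SQLite versions; the idea is the same as with CONCAT. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "product_id INTEGER)"
)
# Two different pairs, (1, 23) and (12, 3), plus one repeat of (1, 23).
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 23), (2, 12, 3), (3, 1, 23)],
)

# Without a separator, both pairs collapse into the string '123'.
without_separator = conn.execute(
    "SELECT COUNT(DISTINCT customer_id || product_id) FROM orders"
).fetchone()[0]

# With a separator, the strings are '1_23' and '12_3'.
with_separator = conn.execute(
    "SELECT COUNT(DISTINCT customer_id || '_' || product_id) FROM orders"
).fetchone()[0]

print(without_separator, with_separator)  # 1 2
```

The separator should be a character that cannot appear in the column values themselves; otherwise collisions remain possible.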

In PostgreSQL, you can perform this task more directly by passing a row value (tuple) to COUNT DISTINCT. The resulting query would look like this:

SELECT COUNT(DISTINCT (customer_id, product_id))
FROM orders;

However, this syntax isn’t supported across all database systems. SQL Server doesn’t accept it, so there you’ll need the CONCAT method mentioned earlier or a subquery over SELECT DISTINCT. MySQL accepts a comma-separated list without the inner parentheses, as in COUNT(DISTINCT customer_id, product_id).
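When neither concatenation nor tuple syntax is a good fit, one widely supported alternative is to count the rows of a SELECT DISTINCT subquery. The sketch below is an illustration under assumptions: it runs in SQLite with hypothetical sample rows, and the alias pairs is just a name chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "product_id INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 10), (5, 2, 10)],
)

# The inner query keeps one row per (customer_id, product_id) pair;
# the outer query counts those rows.
unique_pairs = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT customer_id, product_id FROM orders"
    ") AS pairs"
).fetchone()[0]

print(unique_pairs)  # 3
```

This form avoids separator collisions entirely. Note that SELECT DISTINCT keeps rows containing NULL, so combinations with a NULL column are counted here.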

To recap, here are the key points covered in this section:

  • To count distinct values across multiple columns, combine the COUNT DISTINCT function with the CONCAT function in your SQL query.
  • Use a separator, such as an underscore, in the CONCAT function to avoid incorrect counts.
  • Alternative approaches, such as using tuples in the COUNT DISTINCT query, might be applicable in certain database systems.

Remember, the appropriate method depends on your specific database system, so it’s essential to consider compatibility when writing your queries.

GROUP BY for Categorical Data

When handling large datasets in SQL, one may need to count distinct values in a particular column of a table, which is often an essential part of data analysis. This process can be simplified using the GROUP BY clause for categorical data.

The GROUP BY clause is quite efficient when it comes to counting distinct values within categorical columns. It groups the data based on unique values of a certain column followed by applying the COUNT() function with the DISTINCT keyword to obtain the desired result. Here’s an example of this process:

SELECT column_name1, COUNT(DISTINCT column_name2) as count_distinct_column
FROM table_name
GROUP BY column_name1;

In this query, replace column_name1 with the categorical column, column_name2 with the distinct value column, and table_name with the name of the table you are working with.
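As a concrete instance of this template, the sketch below counts unique visitors per country. The page_views table, its columns, and the sample rows are all hypothetical, and the query runs in SQLite through Python’s sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, visitor_id TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [
        ("US", "a"),
        ("US", "a"),  # repeat visit by the same visitor
        ("US", "b"),
        ("DE", "c"),
        ("DE", "c"),
    ],
)

# country plays the role of column_name1, visitor_id of column_name2.
visitors_per_country = conn.execute(
    "SELECT country, COUNT(DISTINCT visitor_id) AS unique_visitors "
    "FROM page_views GROUP BY country ORDER BY country"
).fetchall()

print(visitors_per_country)  # [('DE', 1), ('US', 2)]
```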

Here are a few scenarios that GROUP BY can be efficiently applied to:

  • Counting the number of unique visitors per country on a website
  • Finding the distinct product varieties sold by different vendors
  • Listing down the number of different services rendered by technicians

GROUP BY itself works with any data type, but it is most useful when the grouping column holds a limited set of categories. Grouping on continuous numeric values, such as exact prices or timestamps, tends to produce one group per row, which rarely yields meaningful results.

If the unique values to be counted are spread across multiple columns, COUNT DISTINCT can be combined with the CONCAT() function inside each group. As before, include a separator so that different pairs don’t collapse into the same string:

SELECT column_name1, COUNT(DISTINCT CONCAT(column_name2, '_', column_name3)) as count_distinct_combination
FROM table_name
GROUP BY column_name1;

This method allows the user to count unique combinations of values from multiple columns, providing more flexibility and functionality.

In conclusion, whenever the need arises to count distinct values within categorical columns, SQL’s powerful GROUP BY clause proves to be a valuable tool for data analysis. By following the outlined techniques, one can effortlessly obtain the desired counts and make informed decisions based on the derived information.

Using DISTINCT with Aggregate Functions

When working with databases, it’s essential to understand how to count distinct values in SQL. One approach to achieve this is by using the DISTINCT keyword in combination with aggregate functions. This section delves into the power of this combination and demonstrates how it can be useful in various scenarios.

In SQL, aggregate functions are used to perform calculations on a set of values and return a single value. Some common aggregate functions include SUM, AVG, MIN, MAX, and COUNT. The DISTINCT keyword, when paired with these functions, removes duplicate values so the calculation runs only on unique data. It has no effect on MIN and MAX, since duplicates never change the smallest or largest value.

To illustrate DISTINCT used with aggregate functions, let’s consider a few examples:

  1. Using COUNT and DISTINCT: This combination can help obtain the number of unique values in a specific column. For example, imagine a table named orders with columns order_id, customer_id, and product_id. To find the number of unique customers, the following SQL query could be employed: SELECT COUNT(DISTINCT customer_id) FROM orders;
  2. Using SUM and DISTINCT: When dealing with numerical data, it may be necessary to calculate the sum of distinct values. Consider a table named sales with columns product_id, price, and quantity. To add up each distinct price value once, the following query can be used: SELECT SUM(DISTINCT price) FROM sales; Note that this deduplicates price values, not products: two different products that share a price contribute that price only once.
  3. Using AVG and DISTINCT: In some cases, determining the average of distinct values is essential. Given the same sales table, the average of the distinct price values could be calculated with the following query: SELECT AVG(DISTINCT price) FROM sales;
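To make the effect of DISTINCT inside SUM and AVG visible, here is a small sketch with hypothetical rows for the sales table, run in SQLite. Products 1 and 2 deliberately share the same price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_id INTEGER, price REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 10.0, 2), (2, 10.0, 1), (3, 20.0, 5), (1, 10.0, 3)],
)

sum_all, sum_distinct, avg_all, avg_distinct = conn.execute(
    "SELECT SUM(price), SUM(DISTINCT price), "
    "       AVG(price), AVG(DISTINCT price) "
    "FROM sales"
).fetchone()

# The distinct prices are 10.0 and 20.0, regardless of how many rows
# or products carry them.
print(sum_all, sum_distinct)  # 50.0 30.0
print(avg_all, avg_distinct)  # 12.5 15.0
```

Because the deduplication applies to the price values themselves, SUM(DISTINCT price) is rarely the right tool for totals per product; it answers the narrower question of what the distinct price points add up to.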

These examples demonstrate the effectiveness of combining the DISTINCT keyword with aggregate functions in SQL. To summarize, here are some key takeaways:

  • Using the DISTINCT keyword eliminates duplicate values, ensuring aggregate functions work only with unique data.
  • Aggregate functions like COUNT, SUM, and AVG can be used alongside DISTINCT for various purposes.
  • The general syntax is: <aggregate function>(DISTINCT <column_name>).

By incorporating these concepts, users can effectively harness the capabilities of SQL to perform calculations on unique values and derive valuable insights from their data.

Combining COUNT DISTINCT and CASE

Combining COUNT DISTINCT and CASE in SQL can be incredibly useful when querying a database for specific data insights. This powerful method enables one to achieve more complex results that aren’t possible by using each function individually. Let’s discuss how the combination works, the benefits it offers, and some practical examples to illustrate its utility.

In a typical SQL query involving COUNT DISTINCT, the main goal is to determine the number of unique values within a specific column. By incorporating a CASE expression, we can further refine the results by applying certain conditions, such as focusing only on specific subsets of data. Consequently, this combination allows for a higher degree of flexibility and customization when querying a database.

There are several benefits to using COUNT DISTINCT with CASE:

  • Greater precision in data analysis by isolating subsets of data within a table
  • Streamlined queries for more efficient data extraction
  • Improved readability and maintainability, since several conditional counts fit in one query

To illustrate how this combination might work in practice, let’s take an example involving a table named orders with columns order_id, customer_id, product_id, and price. Our goal for this query is to find the number of unique customers who placed orders for a specific range of products (let’s say product IDs 100-200).

SELECT COUNT(DISTINCT customer_id) as unique_customers
FROM orders
WHERE product_id BETWEEN 100 AND 200;

Now, assume we want to further refine our results and break these customers down by the price band of the items they ordered. To accomplish this, we can combine COUNT DISTINCT and CASE:

SELECT
  COUNT(DISTINCT CASE WHEN price BETWEEN 1 AND 50 THEN customer_id ELSE NULL END) as low_spenders,
  COUNT(DISTINCT CASE WHEN price BETWEEN 51 AND 150 THEN customer_id ELSE NULL END) as mid_spenders,
  COUNT(DISTINCT CASE WHEN price > 150 THEN customer_id ELSE NULL END) as high_spenders
FROM orders
WHERE product_id BETWEEN 100 AND 200;

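This query returns a single row with three counts: the number of unique customers with at least one order in the low, mid, and high price bands within the specified product range. Each CASE expression returns customer_id only when the row’s price falls in the band and NULL otherwise, and COUNT DISTINCT ignores the NULLs. Note that the bands are evaluated per order row rather than on each customer’s total spend, so a customer with orders in several bands is counted in each of them, and fractional prices between the bands (such as 50.50) fall into none.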
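The following sketch runs the conditional counts on hypothetical sample rows in SQLite. It is meant to show how the CASE expressions feed COUNT DISTINCT, and in particular that one customer can appear in more than one price band.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "product_id INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, 20.0),   # customer 1, low band
        (2, 1, 150, 200.0),  # customer 1 again, high band
        (3, 2, 120, 75.0),   # customer 2, mid band
        (4, 3, 300, 30.0),   # product outside 100-200, filtered out
        (5, 2, 110, 80.0),   # customer 2 again, mid band
        (6, 4, 200, 10.0),   # customer 4, low band
    ],
)

low, mid, high, total = conn.execute(
    """
    SELECT
      COUNT(DISTINCT CASE WHEN price BETWEEN 1 AND 50 THEN customer_id END),
      COUNT(DISTINCT CASE WHEN price BETWEEN 51 AND 150 THEN customer_id END),
      COUNT(DISTINCT CASE WHEN price > 150 THEN customer_id END),
      COUNT(DISTINCT customer_id)
    FROM orders
    WHERE product_id BETWEEN 100 AND 200
    """
).fetchone()

print(low, mid, high, total)  # 2 1 1 3
```

The three band counts add up to 4 while only 3 unique customers exist in the range, because customer 1 has orders in both the low and high bands. Omitting ELSE NULL is equivalent here, since a CASE expression without ELSE returns NULL when no branch matches.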

Dealing with NULL Values

When working with COUNT DISTINCT queries, it’s essential to understand how to deal with NULL values. NULL represents missing or unknown data in a database, which can affect the accuracy of the results.
