Recently, we were working on a stored procedure which involved combining similar result sets from various queries into a single result set with the help of the UNION operator. As I was reviewing the query written by one of the team members, I noticed that they had explicitly specified column aliases for all the result-sets being consumed in the UNION.
In order to test the team’s understanding about the fundamentals, I simply mentioned that the same column name across all result sets in the UNION was not required. The conversation that followed went something like this:
Sample query being reviewed:
USE tempdb; GO --Table variables to simulate the result sets --being combiend using UNION DECLARE @productSet1 TABLE (ProductId INT NOT NULL IDENTITY(1,1), ProductName VARCHAR(50) NULL ); DECLARE @productSet2 TABLE (Id INT NOT NULL IDENTITY(1,1), Product VARCHAR(50) NULL ); --Insert some test data INSERT INTO @productSet1 (ProductName) VALUES ('Camshaft'), ('Connecting Rod'), ('Starter Motor'); --Insert some test data INSERT INTO @productSet2 (Product) VALUES ('Piston'), ('Distributor'), ('10A Fuse'), ('16A Fuse'); --Query that was written by the developers SELECT ps1.ProductId AS ProductId, ps1.ProductName AS ProductName FROM @productSet1 AS ps1 UNION SELECT ps2.Id AS ProductId, ps2.Product AS ProductName FROM @productSet2 AS ps2; GO
Me: What has been done is ideal in the sense that ensuring the same column names in all the result sets makes the query much more readable. But, UNION does not need that column names are the same across all result sets.
Team: No? But then how will SQL Server combine the result sets if it does not know which column in the subsequent result sets to map with which column from the first result set?
Me: All SQL Server needs is that the number and order (i.e. sequence) of columns in the result sets being consumed by the UNION are the same and the data types are compatible. It is not required that the column names are the same.
Team: In that case, what will be the column names in the result set?
Me: That’s quite simple – the final result set follows the column names from the first result set.
Team: Hmm… can you show an example?
Me: Sure! Let’s manipulate the given script itself for the quick demo.
--Assumption: Table variables referenced by the query already exist. SELECT ps1.ProductId AS ProductId, ps1.ProductName AS ProductName FROM @productSet1 AS ps1 UNION SELECT ps2.Id, ps2.Product FROM @productSet2 AS ps2; GO
If we run this statement, we will see that the UNION succeeds and the results follow the column names from the first result set:
Team: This is very interesting! So we can name the columns of result sets involved in a UNION to names of our choice.
Me: Yes. However, it may have an impact on the readability of the query. It’s better to have the name, number, sequence and data-types of the columns involved in a UNION same, but it’s not necessary to do so.
Team: Ok, now we get it. Thank-you, Nakul!
Me: You’re welcome!
Whenever two or more result sets are being combined using a UNION or UNION ALL, all one has to ensure is that:
- The number & order of the columns being consumed in the UNION are the same across all queries
- The data types of the columns across the queries must be compatible
It is NOT required (but is recommended from a readability perspective) to have the same column names across all queries.
- UNION operator [Books On Line, Link]
Until we meet next time,
Here’s another interesting item:
While the column names will come from the first resultset, it appears that the data type will come from the largest datatype for a given column.
DECLARE @table_1 TABLE ( [id_1] [int] )
DECLARE @table_2 TABLE ( [id_2] [bigint] )
INSERT INTO @table_1 ([id_1])
VALUES (1), (2), (3)
INSERT INTO @table_2 ([id_2])
VALUES (1), (2), (3), (10000000000)
FROM ( SELECT [id_1] FROM @table_1
SELECT [id_2] FROM @table_2
FROM [tempdb].[sys].[columns] [c]
INNER JOIN [tempdb].[sys].[types] [t] ON [c].[user_type_id] = [t].[user_type_id]
WHERE [c].[object_id] = OBJECT_ID(‘tempdb..#tempresult’)
DROP TABLE #tempresult
Here you see that @table_2 has a value that is too large for an [int].
Querying the system view [tempdb].[sys].[columns] shows that the column [id_1] is of type [bigint]
That’s really interesting! It does make sense for SQL Server to choose the largest data-size applicable to the column.
I did not know this. Thank-you very much for sharing.