Correlated Subquery Versus Non-Correlated Subquery
There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?
DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)
Thank you!
sql-server subquery
add a comment |
There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?
DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)
Thank you!
sql-server subquery
add a comment |
There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?
DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)
Thank you!
sql-server subquery
There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?
DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)
Thank you!
sql-server subquery
sql-server subquery
asked 10 hours ago
Shams Observer
202
202
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, that's a really contrived example, but hopefully it helps.
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
1
ffs go to bed man
– sp_BlitzErik
9 hours ago
Thanks for that hint! Being an SQL learner I can't really get why we have to alias the table inside the subquery, if there's no correlation. I mean, there's just one table there, so there's no space for any ambiguity. Can you explain, please?
– Shams Observer
8 hours ago
1
@sp_BlitzErik SQL never sleeps
– jadarnel27
3 hours ago
1
Thanks, @MartinSmith. I've updated my answer (again). Maybe SQL should sleep =P
– jadarnel27
2 hours ago
1
BTW Your "contrived" example does happpen a lot. e.g. see stackoverflow.com/questions/4594733/… and the linked questions bar off that for plenty of examples
– Martin Smith
1 hour ago
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f226095%2fcorrelated-subquery-versus-non-correlated-subquery%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, that's a really contrived example, but hopefully it helps.
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
1
ffs go to bed man
– sp_BlitzErik
9 hours ago
Thanks for that hint! Being an SQL learner I can't really get why we have to alias the table inside the subquery, if there's no correlation. I mean, there's just one table there, so there's no space for any ambiguity. Can you explain, please?
– Shams Observer
8 hours ago
1
@sp_BlitzErik SQL never sleeps
– jadarnel27
3 hours ago
1
Thanks, @MartinSmith. I've updated my answer (again). Maybe SQL should sleep =P
– jadarnel27
2 hours ago
1
BTW Your "contrived" example does happpen a lot. e.g. see stackoverflow.com/questions/4594733/… and the linked questions bar off that for plenty of examples
– Martin Smith
1 hour ago
add a comment |
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, that's a really contrived example, but hopefully it helps.
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
1
ffs go to bed man
– sp_BlitzErik
9 hours ago
Thanks for that hint! Being an SQL learner I can't really get why we have to alias the table inside the subquery, if there's no correlation. I mean, there's just one table there, so there's no space for any ambiguity. Can you explain, please?
– Shams Observer
8 hours ago
1
@sp_BlitzErik SQL never sleeps
– jadarnel27
3 hours ago
1
Thanks, @MartinSmith. I've updated my answer (again). Maybe SQL should sleep =P
– jadarnel27
2 hours ago
1
BTW Your "contrived" example does happpen a lot. e.g. see stackoverflow.com/questions/4594733/… and the linked questions bar off that for plenty of examples
– Martin Smith
1 hour ago
add a comment |
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, that's a really contrived example, but hopefully it helps.
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, that's a really contrived example, but hopefully it helps.
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
edited 1 hour ago
answered 9 hours ago
jadarnel27
3,6301330
3,6301330
1
ffs go to bed man
– sp_BlitzErik
9 hours ago
Thanks for that hint! Being an SQL learner I can't really get why we have to alias the table inside the subquery, if there's no correlation. I mean, there's just one table there, so there's no space for any ambiguity. Can you explain, please?
– Shams Observer
8 hours ago
1
@sp_BlitzErik SQL never sleeps
– jadarnel27
3 hours ago
1
Thanks, @MartinSmith. I've updated my answer (again). Maybe SQL should sleep =P
– jadarnel27
2 hours ago
1
BTW Your "contrived" example does happpen a lot. e.g. see stackoverflow.com/questions/4594733/… and the linked questions bar off that for plenty of examples
– Martin Smith
1 hour ago
add a comment |
1
ffs go to bed man
– sp_BlitzErik
9 hours ago
Thanks for that hint! Being an SQL learner I can't really get why we have to alias the table inside the subquery, if there's no correlation. I mean, there's just one table there, so there's no space for any ambiguity. Can you explain, please?
– Shams Observer
8 hours ago
1
@sp_BlitzErik SQL never sleeps
– jadarnel27
3 hours ago
1
Thanks, @MartinSmith. I've updated my answer (again). Maybe SQL should sleep =P
– jadarnel27
2 hours ago
1
BTW Your "contrived" example does happpen a lot. e.g. see stackoverflow.com/questions/4594733/… and the linked questions bar off that for plenty of examples
– Martin Smith
1 hour ago
1
1
ffs go to bed man
– sp_BlitzErik
9 hours ago
ffs go to bed man
– sp_BlitzErik
9 hours ago
Thanks for that hint! Being an SQL learner I can't really get why we have to alias the table inside the subquery, if there's no correlation. I mean, there's just one table there, so there's no space for any ambiguity. Can you explain, please?
– Shams Observer
8 hours ago
Thanks for that hint! Being an SQL learner I can't really get why we have to alias the table inside the subquery, if there's no correlation. I mean, there's just one table there, so there's no space for any ambiguity. Can you explain, please?
– Shams Observer
8 hours ago
1
1
@sp_BlitzErik SQL never sleeps
– jadarnel27
3 hours ago
@sp_BlitzErik SQL never sleeps
– jadarnel27
3 hours ago
1
1
Thanks, @MartinSmith. I've updated my answer (again). Maybe SQL should sleep =P
– jadarnel27
2 hours ago
Thanks, @MartinSmith. I've updated my answer (again). Maybe SQL should sleep =P
– jadarnel27
2 hours ago
1
1
BTW Your "contrived" example does happpen a lot. e.g. see stackoverflow.com/questions/4594733/… and the linked questions bar off that for plenty of examples
– Martin Smith
1 hour ago
BTW Your "contrived" example does happpen a lot. e.g. see stackoverflow.com/questions/4594733/… and the linked questions bar off that for plenty of examples
– Martin Smith
1 hour ago
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f226095%2fcorrelated-subquery-versus-non-correlated-subquery%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown