Why is this stream aggregate necessary?












11














Check out this query. It's pretty simple (see the end of the post for table and index definitions, and a repro script):



SELECT MAX(Revision)
FROM dbo.TheOneders
WHERE Id = 1 AND 1 = (SELECT 1);



Note: the "AND 1 = (SELECT 1) is just to keep this query from being auto-parameterized, which I felt like was confusing the issue - it actually gets the same plan with or without that clause though




And here's the plan (paste the plan link):



plan with a stream agg



Since there is a "top 1" there, I was surprised to see the stream aggregate operator. It doesn't seem necessary to me, since there is guaranteed to only be one row.



To test that theory, I tried out this logically equivalent query:



SELECT MAX(Revision)
FROM dbo.TheOneders
WHERE Id = 1
GROUP BY Id;


Here's the plan for that one (paste the plan link):



plan without a stream agg



Sure enough, the group by plan is able to get by without the stream aggregate operator.



Notice that both queries read "backwards" from the end of the index and do a "top 1" to get the max revision.



What am I missing here? Is the stream aggregate actually doing work in the first query, or should it be able to be eliminated (and it's just a limitation of the optimizer that it's not)?



By the way, I realize this is not an incredibly practical problem (both queries report 0 ms of CPU and elapsed time), I'm just curious about the internals / behavior being exhibited here.





Here's the setup code I ran before running the two queries above:



DROP TABLE IF EXISTS dbo.TheOneders;
GO

CREATE TABLE dbo.TheOneders
(
Id INT NOT NULL,
Revision SMALLINT NOT NULL,
Something NVARCHAR(23),

CONSTRAINT PK_TheOneders PRIMARY KEY NONCLUSTERED (Id, Revision)
);
GO

INSERT INTO dbo.TheOneders
(Id, Revision, Something)
SELECT DISTINCT TOP 1000
1, m.message_id, 'Do...'
FROM sys.messages m
ORDER BY m.message_id
OPTION (MAXDOP 1);

INSERT INTO dbo.TheOneders
(Id, Revision, Something)
SELECT DISTINCT TOP 100
2, m.message_id, 'Do that thing you do...'
FROM sys.messages m
ORDER BY m.message_id
OPTION (MAXDOP 1);
GO









share|improve this question





























    11














    Check out this query. It's pretty simple (see the end of the post for table and index definitions, and a repro script):



    SELECT MAX(Revision)
    FROM dbo.TheOneders
    WHERE Id = 1 AND 1 = (SELECT 1);



    Note: the "AND 1 = (SELECT 1) is just to keep this query from being auto-parameterized, which I felt like was confusing the issue - it actually gets the same plan with or without that clause though




    And here's the plan (paste the plan link):



    plan with a stream agg



    Since there is a "top 1" there, I was surprised to see the stream aggregate operator. It doesn't seem necessary to me, since there is guaranteed to only be one row.



    To test that theory, I tried out this logically equivalent query:



    SELECT MAX(Revision)
    FROM dbo.TheOneders
    WHERE Id = 1
    GROUP BY Id;


    Here's the plan for that one (paste the plan link):



    plan without a stream agg



    Sure enough, the group by plan is able to get by without the stream aggregate operator.



    Notice that both queries read "backwards" from the end of the index and do a "top 1" to get the max revision.



    What am I missing here? Is the stream aggregate actually doing work in the first query, or should it be able to be eliminated (and it's just a limitation of the optimizer that it's not)?



    By the way, I realize this is not an incredibly practical problem (both queries report 0 ms of CPU and elapsed time), I'm just curious about the internals / behavior being exhibited here.





    Here's the setup code I ran before running the two queries above:



    DROP TABLE IF EXISTS dbo.TheOneders;
    GO

    CREATE TABLE dbo.TheOneders
    (
    Id INT NOT NULL,
    Revision SMALLINT NOT NULL,
    Something NVARCHAR(23),

    CONSTRAINT PK_TheOneders PRIMARY KEY NONCLUSTERED (Id, Revision)
    );
    GO

    INSERT INTO dbo.TheOneders
    (Id, Revision, Something)
    SELECT DISTINCT TOP 1000
    1, m.message_id, 'Do...'
    FROM sys.messages m
    ORDER BY m.message_id
    OPTION (MAXDOP 1);

    INSERT INTO dbo.TheOneders
    (Id, Revision, Something)
    SELECT DISTINCT TOP 100
    2, m.message_id, 'Do that thing you do...'
    FROM sys.messages m
    ORDER BY m.message_id
    OPTION (MAXDOP 1);
    GO









    share|improve this question



























      11












      11








      11


      1





      Check out this query. It's pretty simple (see the end of the post for table and index definitions, and a repro script):



      SELECT MAX(Revision)
      FROM dbo.TheOneders
      WHERE Id = 1 AND 1 = (SELECT 1);



      Note: the "AND 1 = (SELECT 1) is just to keep this query from being auto-parameterized, which I felt like was confusing the issue - it actually gets the same plan with or without that clause though




      And here's the plan (paste the plan link):



      plan with a stream agg



      Since there is a "top 1" there, I was surprised to see the stream aggregate operator. It doesn't seem necessary to me, since there is guaranteed to only be one row.



      To test that theory, I tried out this logically equivalent query:



      SELECT MAX(Revision)
      FROM dbo.TheOneders
      WHERE Id = 1
      GROUP BY Id;


      Here's the plan for that one (paste the plan link):



      plan without a stream agg



      Sure enough, the group by plan is able to get by without the stream aggregate operator.



      Notice that both queries read "backwards" from the end of the index and do a "top 1" to get the max revision.



      What am I missing here? Is the stream aggregate actually doing work in the first query, or should it be able to be eliminated (and it's just a limitation of the optimizer that it's not)?



      By the way, I realize this is not an incredibly practical problem (both queries report 0 ms of CPU and elapsed time), I'm just curious about the internals / behavior being exhibited here.





      Here's the setup code I ran before running the two queries above:



      DROP TABLE IF EXISTS dbo.TheOneders;
      GO

      CREATE TABLE dbo.TheOneders
      (
      Id INT NOT NULL,
      Revision SMALLINT NOT NULL,
      Something NVARCHAR(23),

      CONSTRAINT PK_TheOneders PRIMARY KEY NONCLUSTERED (Id, Revision)
      );
      GO

      INSERT INTO dbo.TheOneders
      (Id, Revision, Something)
      SELECT DISTINCT TOP 1000
      1, m.message_id, 'Do...'
      FROM sys.messages m
      ORDER BY m.message_id
      OPTION (MAXDOP 1);

      INSERT INTO dbo.TheOneders
      (Id, Revision, Something)
      SELECT DISTINCT TOP 100
      2, m.message_id, 'Do that thing you do...'
      FROM sys.messages m
      ORDER BY m.message_id
      OPTION (MAXDOP 1);
      GO









      share|improve this question















      Check out this query. It's pretty simple (see the end of the post for table and index definitions, and a repro script):



      SELECT MAX(Revision)
      FROM dbo.TheOneders
      WHERE Id = 1 AND 1 = (SELECT 1);



      Note: the "AND 1 = (SELECT 1) is just to keep this query from being auto-parameterized, which I felt like was confusing the issue - it actually gets the same plan with or without that clause though




      And here's the plan (paste the plan link):



      plan with a stream agg



      Since there is a "top 1" there, I was surprised to see the stream aggregate operator. It doesn't seem necessary to me, since there is guaranteed to only be one row.



      To test that theory, I tried out this logically equivalent query:



      SELECT MAX(Revision)
      FROM dbo.TheOneders
      WHERE Id = 1
      GROUP BY Id;


      Here's the plan for that one (paste the plan link):



      plan without a stream agg



      Sure enough, the group by plan is able to get by without the stream aggregate operator.



      Notice that both queries read "backwards" from the end of the index and do a "top 1" to get the max revision.



      What am I missing here? Is the stream aggregate actually doing work in the first query, or should it be able to be eliminated (and it's just a limitation of the optimizer that it's not)?



      By the way, I realize this is not an incredibly practical problem (both queries report 0 ms of CPU and elapsed time), I'm just curious about the internals / behavior being exhibited here.





      Here's the setup code I ran before running the two queries above:



      DROP TABLE IF EXISTS dbo.TheOneders;
      GO

      CREATE TABLE dbo.TheOneders
      (
      Id INT NOT NULL,
      Revision SMALLINT NOT NULL,
      Something NVARCHAR(23),

      CONSTRAINT PK_TheOneders PRIMARY KEY NONCLUSTERED (Id, Revision)
      );
      GO

      INSERT INTO dbo.TheOneders
      (Id, Revision, Something)
      SELECT DISTINCT TOP 1000
      1, m.message_id, 'Do...'
      FROM sys.messages m
      ORDER BY m.message_id
      OPTION (MAXDOP 1);

      INSERT INTO dbo.TheOneders
      (Id, Revision, Something)
      SELECT DISTINCT TOP 100
      2, m.message_id, 'Do that thing you do...'
      FROM sys.messages m
      ORDER BY m.message_id
      OPTION (MAXDOP 1);
      GO






      sql-server group-by aggregate sql-server-2017 database-internals






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 26 at 6:19









      Joe Obbish

      20.6k32880




      20.6k32880










      asked Dec 18 at 15:42









      jadarnel27

      3,5751330




      3,5751330






















          1 Answer
          1






          active

          oldest

          votes


















          15














          You can see the role of this aggregate if no rows match the WHERE clause.



          SELECT MAX(Revision)
          FROM dbo.TheOneders
          WHERE Id = 1
          AND 1 = 1 /*To avoid auto parameterisation*/
          AND Id%3 = 4 /*always false*/


          In that case zero rows go into the aggregate but it still emits one as the correct semantics are to return NULL in this case.



          enter image description here



          This is a scalar aggregate as opposed to a vector one.



          Your "logically equivalent" query is not equivalent. Adding GROUP BY Id would make it a vector aggregate and then the correct behaviour would be to return no rows.



          See Fun with Scalar and Vector Aggregates for more about this.






          share|improve this answer























            Your Answer








            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "182"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f225279%2fwhy-is-this-stream-aggregate-necessary%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            15














            You can see the role of this aggregate if no rows match the WHERE clause.



            SELECT MAX(Revision)
            FROM dbo.TheOneders
            WHERE Id = 1
            AND 1 = 1 /*To avoid auto parameterisation*/
            AND Id%3 = 4 /*always false*/


            In that case zero rows go into the aggregate but it still emits one as the correct semantics are to return NULL in this case.



            enter image description here



            This is a scalar aggregate as opposed to a vector one.



            Your "logically equivalent" query is not equivalent. Adding GROUP BY Id would make it a vector aggregate and then the correct behaviour would be to return no rows.



            See Fun with Scalar and Vector Aggregates for more about this.






            share|improve this answer




























              15














              You can see the role of this aggregate if no rows match the WHERE clause.



              SELECT MAX(Revision)
              FROM dbo.TheOneders
              WHERE Id = 1
              AND 1 = 1 /*To avoid auto parameterisation*/
              AND Id%3 = 4 /*always false*/


              In that case zero rows go into the aggregate but it still emits one as the correct semantics are to return NULL in this case.



              enter image description here



              This is a scalar aggregate as opposed to a vector one.



              Your "logically equivalent" query is not equivalent. Adding GROUP BY Id would make it a vector aggregate and then the correct behaviour would be to return no rows.



              See Fun with Scalar and Vector Aggregates for more about this.






              share|improve this answer


























                15












                15








                15






                You can see the role of this aggregate if no rows match the WHERE clause.



                SELECT MAX(Revision)
                FROM dbo.TheOneders
                WHERE Id = 1
                AND 1 = 1 /*To avoid auto parameterisation*/
                AND Id%3 = 4 /*always false*/


                In that case zero rows go into the aggregate but it still emits one as the correct semantics are to return NULL in this case.



                enter image description here



                This is a scalar aggregate as opposed to a vector one.



                Your "logically equivalent" query is not equivalent. Adding GROUP BY Id would make it a vector aggregate and then the correct behaviour would be to return no rows.



                See Fun with Scalar and Vector Aggregates for more about this.






                share|improve this answer














                You can see the role of this aggregate if no rows match the WHERE clause.



                SELECT MAX(Revision)
                FROM dbo.TheOneders
                WHERE Id = 1
                AND 1 = 1 /*To avoid auto parameterisation*/
                AND Id%3 = 4 /*always false*/


                In that case zero rows go into the aggregate but it still emits one as the correct semantics are to return NULL in this case.



                enter image description here



                This is a scalar aggregate as opposed to a vector one.



                Your "logically equivalent" query is not equivalent. Adding GROUP BY Id would make it a vector aggregate and then the correct behaviour would be to return no rows.



                See Fun with Scalar and Vector Aggregates for more about this.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Dec 19 at 10:14

























                answered Dec 18 at 15:59









                Martin Smith

                61.7k10166247




                61.7k10166247






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Database Administrators Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f225279%2fwhy-is-this-stream-aggregate-necessary%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Morgemoulin

                    Scott Moir

                    Souastre