SQL Aggregation for a smaller result set

Asked
Viewd399

3

I have a database for which I need to aggregate records into another smaller set. This result set should contain the difference between maximum and minumum of specific columns of the original records where they add up to certain SUM, a closed interval constant C.

The constant C determines how the original records are aggregated and no entry in the resulting set ever exceeds it. Naturally I am supposed to run this in natural primary key order..

To illustrate: table has:

  • [key]
  • [a]
  • [b]
  • [minColumn]
  • [maxColumn]
  • [N]

...all are int datatype.

I am after a result set that has entries where the MAX(maxColumn) - MIN(minColumn) for that group such that when their difference is summed up it is less or equal to constant C.

Apart from the MAX(maxColumn) and MIN(minColumn) value I also need the FIRST record column [a] and LAST record column [b] values before creating a new entry in this result set. Finally, the N column should be SUMmed for all original records in a group.

Is there an efficient way to do this without cursors?

-----[Trivial Sample]------------------------------------------------------------

I am attempting to group-by a slightly complicated form of a running sum, constant C.

There is only one table, columns are all of int type and sample data

declare @t table (
  PK int primary key
    , int a, int b, int minColumn, int maxColumn, int N 
)

insert @t values (1,5,6,100,200,1000)
insert @t values (2,7,8,210,300,2000)
insert @t values (3,9,10,420,600,3000)
insert @t values (4,11,12,640,800,4000)

Thus for:

key, a,   b, minColumn, maxColumn,    N
---------------------------------------
1,   5,   6,       100,       200, 1000 
2,   7,   8,       210,       300, 2000 
3,   9,  10,       420,       600, 3000 
4,   11, 12,       640,       800, 4000 

I need the result set to look like, for a constant C of 210 :

firstA | lastB | MIN_minColumn | MAX_maxColumn | SUM_N
5       8                  100             300    3000 
9       10                 420             600    3000 
11      12                 640             800    4000 

[ Adding the bounty and sample as discussed below]

For C = 381, It should contain 2 rows:

firstA | lastB | MIN_minColumn | MAX_maxColumn | SUM_N
5            8             100             300    3000 
9           12             420             800    7000

Hope this demonstrates the problem better.. and for a constant C say 1000 you would get 1 record in the result:

firstA | lastB | MIN_minColumn | MAX_maxColumn | SUM_N
5           12             100             800   10000
  • simple example added, please let me know if it requires further explanation of the problem.. cheers.

    rama-jka toti06 октября 2009, 20:29
  • latest one if required, but 2005 would be ideal..

    rama-jka toti06 октября 2009, 20:57
  • What do you want to group by?

    RedFilter06 октября 2009, 19:49
  • I am trying to follow your examples, but I am just not clear on how you are doing the grouping. Why does the number of rows in the result set vary depending on the value of C?

    RMorrisey09 октября 2009, 00:20
  • Какая у вас версия SQL Server?

    KM.06 октября 2009, 20:33
  • в вашем вопросе не хватает практически всего, что необходимо для решения этой проблемы: таблиц, столбцов, примеров данных, особенностей группировки и ожидаемых результатов

    KM.06 октября 2009, 19:51
  • a set of example data and the example results would clarify this.

    dnagirl06 октября 2009, 19:45

3 ответов

2
 DECLARE @c int
SELECT @c = 210

SELECT MIN(a) firstA,
       MAX(b) lastB, 
       MIN(minColumn) MIN_minColumn, 
       MAX(maxColumn) MAX_maxColumn, 
       SUM(N) SUM_N
FROM @t t 
JOIN (SELECT key, floor(sum/@c) as rank
        FROM (SELECT key, 
                     (SELECT SUM(t2.maxColumn - t2.minColumn) 
                        FROM @t t2 
                       WHERE t2.key <= t1.key 
                    GROUP BY t1.key) as sum
               FROM @t t1) A
     ) B on B.key = t.key
GROUP BY B.rank

/*

Table A: for each key, calculating SUM[maxColumn-minColumn] of all keys below it.
Table B: for each key, using the sum in A, calculating a rank so that:
  sum = (rank + y)*@c where 0 <= y < 1. 
  ex: @c=210, rank(100) = 0, rank(200) = 0, rank(220) = 1, ...
finally grouping by rank, you'll have what you want.

*/
 
  • Have narrowed it down to difference dating back all the way to SQL 2000 which accepted group-by on single aggregation clause or similar. The problem is with created/real tables, where GROUP BY will not pass the parsing.. otherwise on the sample and the table variable it works.

    rama-jka toti09 октября 2009, 18:25
  • Спасибо. Я не понимаю, что ты хочешь делать. Я протестировал этот запрос на SQL Server 2005, и он сработал. Вы хотите изменить его, чтобы добавить другие столбцы, или использовать его в качестве таблицы соединений в другом запросе?

    manji09 октября 2009, 08:58
  • Ok, just to confirm that for the problem as presented (and it was a confusing exercise) the solution DOES work. For a slightly modified data it requires minor changes but that’s to be expected. Wonderful work Najm, thanks for additional help and your time. I wish I could upvote on the solution again or some option SO doesn’t have yet :)..

    rama-jka toti10 октября 2009, 17:05
  • Wonderful.. The only problem I am having is with the GROUP BY not containing a column that is an outer reference, at least MS SQL reports this parsing error. I presume it was written on another DB? Removing it will still produce the correct results but in practice I would like it to be there for [key] changes (ie. key is often compound in production). Any suggestions on that? Will accept the answer regardless…

    rama-jka toti09 октября 2009, 08:40
1

Я немного запутался в логике группировки результатов, которые вы пытаетесь получить, но, судя по описанию того, что вы ищете, я думаю, вам нужно предложение HAVING. Вы должны уметь делать что-то вроде:

 SELECT groupingA, groupingB, MAX(a) - MIN(b)
FROM ...
GROUP BY groupingA, groupingB
HAVING (MAX(a) - MIN(b)) < C
 

... чтобы отфильтровать разницу между максимальным и минимальным значениями после определения группировки. Надеюсь, это будет полезно

  • The editors were kind enough to format the post to perfection :) The samples above cater for 4 scenarios that illustrate the problem, which I admit is a bit odd.

    I am open to suggestions on the simplest, shortest, elegant way of doing this, I will gladly repost with optimisation-heading with a bounty for it and reference to an accepted solution.. but I’m not sure HAVING would produce the correct results for the samples above.

    rama-jka toti09 октября 2009, 08:46
1

объявить @c int

выберите @c = 210

выберите firstA = min (a), lastB = max (b), MIN_minColumn = min (minColumn), MAX_maxColumn = max (maxColumn), SUM_N = sum (N) от @t где minColumn <= @c

объединить все

выберите a, b, minColumn, maxColumn, N от @t где minColumn> @c

  • the condition “MAX_maxcolumn - MIN_mincolumn ) <= C” is it always relative to row 1 or can be anywhere ? what is the expected result for C = 381 for following samples ? { 1, 5, 6, 100, 200, 1000 } { 2, 7, 8, 210, 300, 2000 } { 3, 9, 10, 420, 600, 3000 } { 4, 11,12, 640, 800, 4000 } { 5, 13,14, 810, 850, 4000 } { 6, 15,16, 860, 870, 4000 }

    will record 5 & 6 also “group” into 1 record ?

    Squirrel07 октября 2009, 22:56
  • Thanks.. It is definitely not relative to row 1, but more like it can be ‘anywhere’, ie. it is relative to the last record of the ‘grouping’, or better put, latest record of result.

    Your sample would produce { 5, 6, 100, 300, 3000 }, { 9, 12, 420, 800, 7000 }, { 13, 16, 810, 870, 8000 }…

    So yes, 5 & 6 would be grouped into 1 record.

    rama-jka toti08 октября 2009, 09:02
  • Довольно близко, спасибо, но не получается правильный набор результатов для C = 381. Он должен содержать 2 строки:

    {5, 8, 100, 300, 3000} {9, 12, 420, 800, 7000}

    Но вместо этого он генерирует 3, как в примере выше.

    rama-jka toti07 октября 2009, 18:46
  • Not sure how to pull it off, but I’m looking for a WHERE clause to have the ‘grouping’ such that ( MAX_maxcolumn - MIN_mincolumn ) <= C.. or perhaps there is another way in order to achieve handling such as the one for C=381 above.

    rama-jka toti07 октября 2009, 21:04