Possibly the worst analogy in the world – columnstore and ordering

This post is about a topic that everybody who uses columnstore should think about. And that is about ordering. The analogy I will be using is flawed in so many ways, but it help to get my point through. I hope.

You now say to yourself that columnstore indexes aren’t sorted, so what am I yapping about? Hear me out.

Imagine that I will organize my underpants. I don’t know if the word underpants is, sort of, reserved for female underwear, but I’ll use that term anyhow. (I’m Swedish and don’t get all nuances of the English language.) Also Imagine that I have a bunch of drawers. Say I fit about 50 underpants in each drawer and I have some 1000 underpants. How should I do this? The drawer is the rowgroup but it would be ridiculous to say you have about 1,000,000 underpants in each drawer (the approx number of rows in a col-store rowgroup). The 1000 underpants is the number of rows in my “underpants” table, and even 1000 underpants seems like a ridiculous number. Anyhow…

Now imagine that I’m very picky with what underpants I’m wearing. The type (boxer, briefs, string, and whatnot), color, size, material etc. So how do I decide what underpants goes into each drawer? I could just shove them in there without much though about that process. I think that is how many handle their col-store indexes.

Also imagine that we have a device that register attributes about the underpants as we add them to the drawer. Lets call this device u47. So, the u47 will label the drawer with some info on what briefs are in there. Like range of sizes (xxs to xxl). OK, strange perhaps but say that my size vary a lot over time. And also range of colors. And range of models. You have now figured ut that I like to classify things in ranges. I.e., this particular drawer has m-xl; white-green, string to briefs and cotton to satin. This would be the meta-data that SQL Server has for each rowgroup/segment, as seen in sys.column_store_segments.

Lets go back to how you populated your drawers. Did you just shove the underpants there? Then you probably have close to the full range of all attribute values in each drawer. I.e., every drawer has underpants of all sizes, all colors, etc.

Now I want to wear all black underpants. Let’s say I’m a bit strange in that sense. Since there are black underpants in all drawers, it will take a while to find them all. I have to go through each drawer. Or I want all satin underpants. Or perhaps all medium size black satin briefs. I still have to go through each drawer.

But what if I took the time to sort the underpants before adding them to the drawers? I can only sort them by one attribute. I should pick the one attribute that I most often have in my “what underpants should I use today?” decision process. You achieve this by having a row-clustered index on that column when building the col-store clustered index and build the col-store index using WITH DROP_EXISTING. Also, instead of having my friends over and each of us grab some underwear and add them to each person’s drawer (parallelism when building the index), I would do this myself. Yes, that means MAXDOP = 1. Without it, say you have 4 threads/friends and sort by size when when populating the drawers/building the index, then you have 4 drawers with xs-s, 4 drawers with m-l, etc.

For non-clustered col-store index, it would be enough with a non-clustered index that covers the clustered index when converting it from row to col-store index (DROP_EXISTING).

The underpants inside the drawer isn’t sorted; this is a col-store index we’re talking about, remember? But, you might have some fancy vacuum-tech that allow you to fit many underpants in each drawer (the various columnstore compression technologies), something you probably ordered when you bought the u47, btw.

Now, imagine me walking from the shower to my dressing room and thinking about how to find the underpants of today. I.e., the execution plan. “Today, I want to wear underpants based on color, size and material – model doesn’t matter”. I can’t say what drawers to go through, since that information isn’t available at this stage. It is a run-time decision to do drawer-elimination based on the notes on each drawer (what ranges it contains for each attribute). I.e., rowgroup/segment elimination is a run-time decision.

The underwear will only be partitioned over the drawers in an effective manner for one attribute, the one I sorted them by before populating my drawers. If that is size, then one drawer has only xs, another has only x, etc. But the “xs drawer” has all range of colors, materials etc. There might be some correlation between attributes (like rubber underpants tend to be black), but that correlation isn’t usable for us.

How would row indexes fit in this analogy? You have a bunch of post-it notes with which you build a tree-like structure that you lay out on the floor. You use this tree (over size, for instance) to navigate to the ones you are interested in (medium size for instance), and finally you reach the sub-section in the drawer. You can have several such trees, for various attributes, of course. But, the smart vacuum-tech you bought with the u47 isn’t available for these indexes. If you are very selective (remember the rubber pants?), then you can very quickly find that pair using the tree structure. But it is quicker to find all black underpants using the col-store index because you have many of those and the vacuum-tech allow you to fit so many in each drawer.

And the vacuum-tech also magically allow you to put on as many as some 900 pairs of underpants as a time. Remember that I want to wear all my black underpants. This is batch-mode in the execution plan.. (Rumors has it that we at the end of 2019 will have the option to put on more then one pair at a time even without this magical vacuum tech. That would be batch mode on row-store. We’ll see. )

The bottom line? Think about how the data is sorted when building your col-store indexes and consider rebuilding them if your loading/modification routines causes them to degrade over time. Oh, I should add that you shouldn’t mess about with col-store indexes without having Niko’s blog post series handy and also have read Hugo’s series first.

Disclaimer 1: I don’t own any rubber underpants.
Disclaimer 2: If I did own any rubber underpants, I would still have disclaimer 1 in there…

35 Replies to “Possibly the worst analogy in the world – columnstore and ordering”

  1. Thanks a lot for sharing this with all of us you actually know what you’re talking approximately!
    Bookmarked. Kindly additionally talk over with my site =).
    We may have a hyperlink trade arrangement between us

  2. Hi there i am kavin, its my first time to commenting anyplace,
    when i read this article i thought i could also make comment due to this good piece of writing.

  3. Thanks for one’s marvelous posting! I really enjoyed reading it, you could be a great author.I will ensure that I bookmark
    your blog and definitely will come back later on. I
    want to encourage yourself to continue your great posts, have a nice evening!

  4. Your style is really unique compared to other people I have
    read stuff from. Thank you for posting when you’ve got
    the opportunity, Guess I’ll just book mark this site.

    my webpage; CBD for dogs

  5. Hi! Someone in my Facebook group shared this site with us so I came
    to check it out. I’m definitely loving the information. I’m bookmarking and will be tweeting this to my followers!
    Fantastic blog and fantastic style and design.

  6. Great beat ! I would like to apprentice while you
    amend your web site, how could i subscribe for a blog site?

    The account helped me a acceptable deal. I
    had been tiny bit acquainted of this your broadcast provided bright
    clear concept

    Check out my website :: CBD gummies for sale

  7. Howdy, i read your blog occasionally and i own a similar one
    and i was just curious if you get a lot of spam remarks?
    If so how do you stop it, any plugin or anything you can suggest?
    I get so much lately it’s driving me crazy
    so any support is very much appreciated.

  8. I’m curious to find out what blog system you’re working with?
    I’m experiencing some small security issues with my latest site and I’d like
    to find something more secure. Do you have any recommendations?

    Visit my site … delta 8 thc

  9. First off I want to say great blog! I had a quick question that I’d like to ask if
    you don’t mind. I was curious to know how you center yourself and clear your thoughts prior to writing.
    I’ve had difficulty clearing my mind in getting my ideas out there.
    I truly do enjoy writing however it just seems like the first 10 to
    15 minutes are usually lost just trying to figure out how to begin. Any ideas or tips?
    Many thanks!

    Here is my web-site delta 8 thc

  10. This is very interesting, You are a very skilled blogger.
    I’ve joined your rss feed and look forward to seeking more of your fantastic
    post. Also, I’ve shared your site in my social networks!

    Look into my homepage :: best delta 8 thc

  11. Hiya! Quick question that’s totally off topic.
    Do you know how to make your site mobile friendly?
    My blog looks weird when viewing from my iphone.
    I’m trying to find a template or plugin that might be able to correct this problem.
    If you have any recommendations, please share.

    Appreciate it!

    My blog post :: CBD gummies for sale

  12. Good post. I learn something totally new and challenging on blogs I stumbleupon on a daily basis.
    It will always be exciting to read through content from other writers and
    practice a little something from their sites.

    Also visit my web site cbd

  13. Tremendous issues here. I am very satisfied to look your
    post. Thank you a lot and I’m having a look ahead to
    contact you. Will you please drop me a e-mail?

    Also visit my web blog: cbd for sale

  14. Pretty nice post. I just stumbled upon your blog and wished to say that I’ve really enjoyed
    surfing around your blog posts. In any case I’ll be subscribing to your rss feed and I hope you write again soon!

    Also visit my webpage; delta 8 carts

  15. When I originally left a comment I seem to have clicked the -Notify me when new comments are added- checkbox and
    now each time a comment is added I receive four emails
    with the same comment. There has to be an easy
    method you are able to remove me from that service?
    Cheers!

  16. Thanks , I have recently been looking for information about
    this topic for a while and yours is the best I have found out
    so far. However, what in regards to the bottom line?
    Are you certain about the source?

    Here is my website delta 8 thc vape

Leave a Reply

Your email address will not be published.