gist JS

Saturday, May 11, 2013

Postgres Common Table Expression Super Example (with a hot little window function action too)

Common table expressions are a PostgreSQL user's best friend. Let me show you them.

The Example Problem:

Let's take a look at how to build SymptomsLikeMe. This is something that would look at a bunch of health reports and figure out who has symptoms that are most like mine.

The data model is pretty simple. A user has many symptom surveys. Each symptom survey has a symptom_id (ie pain, nausea, fatigue) and a severity (1,2,3,4).

So, here's the challenge in more details. Given all of a user's latest symptom surveys (1 for fatigue, 4 for pain), compare that to all other latest symptom surveys and produce a similarity score (least square). Return the top 10 closest symptom reports.

So what needs to happen?

  1. Get all of the latests symptom surveys
  2. Get just this users's latest symptom survey
  3. Compare that to all other latest symptom surveys and produce a similarity score (using least squares).
  4. Return the top 10 closest symptom reports.

How to do it?

Well, that's a lot to do, right? Let's look at a couple approaches.
  • In Ruby? Wow. There are ~= 1,788,141 symptom reports eh? So we serialize them all into ruby and...  no.
  • We'll denormalize it!   Urm... I'm not really sure what that would mean. We explode the combinatorial space of all symptoms to symptoms....  ouch.
  • ??????


Let's forget trying to do the whole problem in one massive statement and just do the four steps we listed out above. 

Step 1 "Get all of the latest symptom surveys"

huh? Why not use a not exist well, because not exists have a bad habit of returning more than one row when things starts colliding and we have to tie break collisions or suffer the consequences of duplicate counts. rank = 1 guarantees us that we'll only get 1 result, no matter how many dupes there are.

Step 2 "Given this user's latest symptom survey"

easy peasy.

Step 3 "compare that to all other latest symptom surveys and produce a similarity score"

So to compute the similarity, we need to compare our user, vs each user. If only there were a way to JOIN our user's info onto each row of a another user's data... oo a join!

Ok, so that's the join and then we just want the difference of the two columns 'minus' and we square that (for least squares) and um, sum(). 

Easy peasy!!

Step 4 "Return the top 10 closest symptom reports."

All together now

But then we need all 4 pieces of this to work together and... omg we are done.


  • CommonTableExpressions work to decompose SQL problems into manageable hunks.
  • Read Steps 1-4 backwards and it's just a bunch of unfunded mandates that you need to fill in.
  • I really didn't tell you how window functions work. Sorry. They're great. But they need another blog post to really explain. See http://www.postgresql.org/docs/9.1/static/tutorial-window.html which is really pretty good.


Anonymous said...

Is there a code example implementing the prose?

Anonymous said...

Jeff, your gists are loading... (at least in Chrome)

Here are the gist code examples that are in the post in order of appearance:


Jeff Dwyer said...

hmph. Thanks for letting me know. Working for me (no surprise) in chrome. shout if you have a chance to let me know what error you see.

mama said...

i was waitin' for this type of article and i have gained some useful information from this site. thanks for sharing this information.


Cindy Dy said...

Awesome there, Great work, keep it up. I love returning back to this site and reading the quality content you always have on offer.


nadia hncisn said...

Extraordinary writing is one in all the foremost very important skills to accumulate if you essay writing plan to make a career as a author. Having good writing skills may be a ought to. There are not any short cuts to superb writing.

nadia hncisn said...

To maximize your edges with this cash saving tip, you will discovered automatic bill procure your monthly expenses (gas and groceries as well). therefore on manufacture this work for you, though, nstallment loan Des Plaines your gain and budget got to be compelled to support paying off your credit cards monthly. the aim is to form cash at the aspect of your a compensation rewards.