gist JS

Saturday, May 11, 2013

Postgres Common Table Expression Super Example (with a hot little window function action too)

Common table expressions are a PostgreSQL user's best friend. Let me show you them.

The Example Problem:

Let's take a look at how to build SymptomsLikeMe. This is something that would look at a bunch of health reports and figure out who has symptoms that are most like mine.

The data model is pretty simple. A user has many symptom surveys. Each symptom survey has a symptom_id (ie pain, nausea, fatigue) and a severity (1,2,3,4).

So, here's the challenge in more details. Given all of a user's latest symptom surveys (1 for fatigue, 4 for pain), compare that to all other latest symptom surveys and produce a similarity score (least square). Return the top 10 closest symptom reports.

So what needs to happen?

  1. Get all of the latests symptom surveys
  2. Get just this users's latest symptom survey
  3. Compare that to all other latest symptom surveys and produce a similarity score (using least squares).
  4. Return the top 10 closest symptom reports.

How to do it?

Well, that's a lot to do, right? Let's look at a couple approaches.
  • In Ruby? Wow. There are ~= 1,788,141 symptom reports eh? So we serialize them all into ruby and...  no.
  • We'll denormalize it!   Urm... I'm not really sure what that would mean. We explode the combinatorial space of all symptoms to symptoms....  ouch.
  • ??????


Let's forget trying to do the whole problem in one massive statement and just do the four steps we listed out above. 

Step 1 "Get all of the latest symptom surveys"

huh? Why not use a not exist well, because not exists have a bad habit of returning more than one row when things starts colliding and we have to tie break collisions or suffer the consequences of duplicate counts. rank = 1 guarantees us that we'll only get 1 result, no matter how many dupes there are.

Step 2 "Given this user's latest symptom survey"

easy peasy.

Step 3 "compare that to all other latest symptom surveys and produce a similarity score"

So to compute the similarity, we need to compare our user, vs each user. If only there were a way to JOIN our user's info onto each row of a another user's data... oo a join!

Ok, so that's the join and then we just want the difference of the two columns 'minus' and we square that (for least squares) and um, sum(). 

Easy peasy!!

Step 4 "Return the top 10 closest symptom reports."

All together now

But then we need all 4 pieces of this to work together and... omg we are done.


  • CommonTableExpressions work to decompose SQL problems into manageable hunks.
  • Read Steps 1-4 backwards and it's just a bunch of unfunded mandates that you need to fill in.
  • I really didn't tell you how window functions work. Sorry. They're great. But they need another blog post to really explain. See http://www.postgresql.org/docs/9.1/static/tutorial-window.html which is really pretty good.


Anonymous said...

Is there a code example implementing the prose?

Anonymous said...

Jeff, your gists are loading... (at least in Chrome)

Here are the gist code examples that are in the post in order of appearance:


Jeff Dwyer said...

hmph. Thanks for letting me know. Working for me (no surprise) in chrome. shout if you have a chance to let me know what error you see.

mama said...

i was waitin' for this type of article and i have gained some useful information from this site. thanks for sharing this information.


Cindy Dy said...
This comment has been removed by a blog administrator.
nadia hncisn said...
This comment has been removed by a blog administrator.
camrondiaz said...
This comment has been removed by a blog administrator.
An Binh said...

Sử dụng và setup văn phòng làm việc bao giờ bạn cũng nên chú ý đến hướng và vật dụng trong văn phòng cho hợp phong thủy và lĩnh vực hoạt động nghành nghề của mình. Chính vì thế việc chọn hướng phong thủy bàn làm việc là điều quan trọng mà bạn nên quan tâm đến vì nó giúp nhân viên làm việc được thoải mái thuận lợi tạo nguồn cảm hứng cũng là bộ mặt công ty trước ánh mắt khách hàng.
Có hai nhóm hướng là Đông tứ trạch và Tây tứ trạch phù hợp tương ứng với người Đông tứ mệnh hay Tây tứ mệnh. Cũng không kém phần quan trọng hơn là lựa chọn trang trí các vật dụng trên bàn làm việc như cây xanh để bàn làm việc nên có để giúp bạn làm việc thoải mái hơn. Cho nên khi bạn có nhu cầu lựa chọn văn phòng làm việc, setup hoàn thành thì nên quan tâm thêm các vấn đề về hướng bàn làm việc để giúp bạn gặp nhiều may mắn trong công việc và thành công hơn!