Tuesday, February 16, 2016

Ruby on Rails on Docker on Amazon ECS w/ Terraform & DataDog

Buzzword buzzword buzzword.

But no seriously, buzzword. This post is a follow-up to https://segment.com/blog/rebuilding-our-infrastructure/ which is a great post about how a real company is doing something awesome.

My real company (ezCater) is interested in doing awesome things too, but since, a month ago, I didn't really know how to plug a Docker into a cloud, I thought I'd start off simple with my smaller project whatsize.is ("So the gift always fits"... catchy, right?)



As I said, the Segment post is great, but it did leave a bit to the imagination and I'm a real literal fellow. So here come the gritty details.


Dockerfile

Nothing too too amazing. But you can see we're also installing Traceview. And then, maybe not too gracefully, working around Docker's one-process-per-container convention by running start.sh and launching 2 things in there. You would probably want to use supervisord if you were being smarter.


Terraform

Do you like the AWS UI?  Do you get a warm fuzzy feeling using a UI to click around and reconfigure your VPCs & ELBs?  I sure don't.  Never do that again. Terraform fixes this whole problem and makes me feel warm and fuzzy that I'm not going to end up with some bespoke AWS setup that I can no longer comprehend.

Check out rails-docker-ecs-datadog-traceview-terraform for an AWS setup that gets you:
  • ELB
  • ECS
  • VPC
  • Roles & Policies
  • DataDog agent running in a container on each of your instances
  • Traceview running next to your rails apps.
Do NOT blindly put ^ in production. I am bad at security and bad at AWS.  The above functions afaict. But I punched security & common decency in the nose a couple times.  


You don't realize how blind most devs are flying until you change companies and no longer have your opentsdb stack, your custom monitoring and alerting solution, your lisp-ish TSDB explorer, your custom Stats-On-The-TV service. Impressively, DataDog seems to solve almost all of this. You want instance metrics? Docker metrics? ELB metrics? RDS metrics? Custom metrics? Super customizable alerting? Seriously, afaict it's game over and the Dawg won. You get a ridiculous amount for free out of the box. And with the setup above you have a statsd sink listening on udp:8125 that your app can kick any metric it wants to and instantly have it available in the Dawg.
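
For example, from the Rails app the simplest possible way to kick a metric at that udp:8125 sink is raw statsd over UDP. This is just a sketch (the metric names are made up, and in real life you'd probably reach for the dogstatsd-ruby gem instead):

require 'socket'

# statsd wire format is "name:value|type" (dogstatsd also accepts |#tag:value tags)
statsd = UDPSocket.new
statsd.send("whatsize.signups:1|c", 0, "localhost", 8125)           # a counter
statsd.send("whatsize.render_time_ms:42|ms", 0, "localhost", 8125)  # a timer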


TraceView is a lot like New Relic, but has the advantage of being able to differentiate your ass from your elbow in the UI.  Seriously, look at this.

For every single request you get a little blue mark. Great, you say, but get this! Drag a rectangle around them and you can slice into those requests, and for each of them you get a full freaking overview of just what happened in that request. Even the SQL. So freaking good.

Also, fwiw, if it sounds too good to be true, or like maybe it only works for my piddly little app, I can verify that this also works at scale, with microservices.








That's all I've got. ECS is pretty awesome so far & something I feel decent about taking a bet on, but it's definitely still young. The big todos left in this stack imho are:

  1. How do I run rake tasks from cron?
  2. Where do I get a rails console in prod?
  3. A deploy script to twiddle my ECS task versions.

Interested in making some of this real life? Come work with me! ezCater Jobs.


Tuesday, July 07, 2015

Dealing with Git Merge Revisions

Zen and the Art of Git Chainsaw Maintenance

Git is pretty awesome and mind expanding, but I don't think that anything is quite as mind-blowing as the first time you learn how to revert a merge. There's a great explanation here: http://stackoverflow.com/questions/1078146/re-doing-a-reverted-merge-in-git

Basically, if you’ve merged a feature in by mistake, you can simply revert the merge to get back to a happy state. The mind bending comes when you decide that the feature is ready to be merged for real.  While intuition might tell you to just git merge feature again, what you really want to do is “revert the revert”.

It makes sense, and it’s awesome, and it’s righteously scary in that wonderful way that only git can be.

At PatientsLikeMe we were familiar with this, but we hadn’t had a chance to do it in practice until last week. A branch was merged in prematurely and we successfully reverted.

The new wrinkle that I faced today was that the branch that was merged in and since reverted is actually a long-running branch and development continues on it. I wanted to merge master into this long-running branch to stay up to date, but when I did, I noticed pretty quickly that something was awry. 27 conflicts and a bucket of fail. Worse, my mergetool was consistently picking the wrong side of the merge.


Let’s go to the simulated instant replay, sponsored by GitX.
 

This is the state of the repo after we've reverted our accidental merge.

Here is the repo with more work on both the feature and the master branch.



So, I'd like to merge master into the feature branch, but that leads to all sorts of erroneous conflicts. Why? Well, because git merge master asks git to merge the revert commit into my feature branch, that's why. The revert commit essentially contains diffs that say to remove all the early work on the feature branch.

It makes sense to apply this to master, but it is a terrible commit to apply to the feature branch. It’s a bit like time travelling and accidentally killing your parents.


So what do we do? We know that we need to ‘revert the revert’ at some point. It turns out that there’s no reason not to do that immediately; we just need a new ‘shim’ branch. We’ll call it master_w_revert_reverted.

git checkout master
git checkout -b master_w_revert_reverted
git revert SHA_OF_THE_MERGE_REVERSION


Now let's merge this shim into our feature branch.

git checkout feature
git merge master_w_revert_reverted


Voila! We've successfully merged our master branch into the long-running feature and we've taken care of the 'revert the revert' going forward.



The next time we want to merge from master to the long-running feature branch, or indeed when we're ready to merge the feature into master, we just merge as we would normally. Our days of reversion are over.

Until next time, happy merging!



Like this post? Take a look at my side project, ForceRank.it. Help my group make a decision

Monday, December 22, 2014

What every CEO needs to know about Distributed Version Control Systems.

Dear Executive Suite,

You may think that you don't need to know what "distributed version control systems" (DVCS) means... but you would be wrong to do so.

Human collaboration is the most critical work of a business, whether that means high strategic imperatives or the intricacies of interpersonal negotiation and tactics. Surprisingly, the nuts-and-bolts logistics of how we collaborate, whether by video chat, meetings, Word docs, email threads or Google Docs, play a massive role in shaping the quality and speed of how our teams collaborate.

These same "communication logistics" are also core to the way software engineers work and the good news is that software engineers have been working on improving these systems for decades. Many of the basic concepts of collaboration have already been adopted by business at large, but the biggest, most important shift we've experienced is just at the point where it will start to cross the chasm from tech to business at large. The seismic shift that Git an GitHub have had on the logistics of software engineering & Open Source development in the past 5 years presages a dramatic revolution in the way successful companies are going to operate in the next decade. By understanding the revolution that has recently occurred in DVCS you will be able to get a glimpse of how your business should be operating in 2025.

The essential insight is that by relinquishing a bit of control over process and focussing on the atomization of good ideas, we're starting to move away from 'the cathedral' and fulfill the promise of 'the bazaar'. This essay will seek to explain the principles of DVCS and give non-technical readers a sense of how this has revitalized open source development and what that means for the future of business.

The Evolution of Collaboration: In the Beginning


In the beginning, there was no version control. This is probably the system that you grew up on as well. Files were either on your local machine or on the server. If they were local, you needed to send them out as attachments when you wanted others to see them. If you sent it to two people and they each edited it and sent it back, you were left to reconcile the changes yourself and send out the revised edition. Alternatively, the document may have been on a server, in which case everyone could edit the same file, but woe be to those who tried to edit it at the same time, for it was easy to wipe out someone's changes. Worse, there was no real way to go back. There was no long-term undo. If Mary & John open the file at the same time, Mary writes 3 pages and John fixes a typo, but Mary saves before John, then John wipes out everything Mary's done.

The "solution" here was to create "read-only" documents and or "lock" documents while they were being edited. While this works, it took all parallelism out of your process, reducing efficiency and limiting your ability to scale.

The business analog to this is when you had a single author of your new RFP or 2015 Strategy Document. Sure, you could incorporate small feedback, or even perform massive rewrites of sections, but the cost of doing this was high. Eventually inertia wins and everyone just wants the damn thing to be finished, resulting in a subpar product and a stifling of great ideas, since a late-breaking great idea will require enormous logistical effort.

And then there was light.


For our purposes, CVS can be thought of as the first version control system. CVS is a pretty straightforward system in which the server saves a snapshot of each file every time it is checked in. It is easy to go back in time with this system, as every version of every file is saved forever. It is also simple to compare what changed between version 1.9 and 1.14, or in a file between two dates.

For many years this served software developers well. We could all get the canonical version of files from one place and we could easily bring our files up to date. We could then work away locally and, once finished, commit things back to the central repository. When workflows happen in parallel it will always be possible to conflict with others' work, but these conflicts were now readily apparent and nice interfaces grew up around CVS to allow us to see the differences and perform a merge in an easy and sane way. If your company is using Google Docs or SharePoint, this is about where you are now. It seems pretty great, doesn't it? Anyone who has collaborated on a Google Doc can attest to the almost infinite superiority of the collaboration in comparison to "the-bad-old-way".

But not everyone was happy.

In fact, one of the more esteemed members of our community, Linus Torvalds (creator of Linux), simply hated version control of the sort we've discussed so far and refused to use it. The reliance on a monolithic centralized repository was 'stupid (and ugly)', to paraphrase his thoughts. Most of us mere mortals were still overjoyed not to be overwriting each other's changes accidentally and delighted that there was ONE central place to get our information. So what were we missing? Why did Linus Torvalds hate source control?

Branching


Branching is what happens when a user wants to pursue an idea that is not directly reconcilable with the mainstream. As an example, let's pretend that you're tasked with doing a comprehensive, interconnected rewrite of all the company's procedures. It is going to be a long and difficult process. It will take weeks, and will be in a half-baked, not-ready-for-prime-time state for much of that time. That means we can't just commit it to the central repository, because then the unbaked efforts would be all mixed in with the current procedures. But unfortunately that was the only way we had to collaborate with others...

This is a great time for a branch. A branch is basically an extra sandbox for working on a side project, which we hope will be eventually merged back into the trunk, or master of our system.

Branching was possible in CVS, but it wasn't simple (and indeed is not a feature of SharePoint or Google Docs). Because it was possible, however, many of us didn't realize just what we were missing when Linus and friends came down from the mountaintop claiming that they had made branching a million times better. It sounded ok, but not revolutionary. But we were wrong.

Easy branching frees your mind to think creatively, because there's far less pain in taking a thought and expounding on it in a new branch. This happens in two forms, local branches and distributed branches, and this is where things get interesting.

Local branches


I have a great idea in the middle of the night to change our marketing and communications theme. Awake at 3 in the morning, I create a local branch and get the thoughts out, boldly cutting out projects, rewording the slogan and changing the branding.  The next day I tweak it some more. On the third day I come to my senses and realize that it's terrible and I'm able to delete the branch without anyone knowing about my wild idea. Reputation saved. Creativity fostered.

Distributed branches


Alternatively, on day three my wild idea might seem like it's still got legs, but I'd like to get feedback on it. I can easily push the local branch out to colleagues I'd like to have take a look. This branch has now become a distributed project, de-centralized from the main line of day-to-day operations. If a co-worker likes what they see, but wants to take things in another direction, they can easily branch this branch. In fact there is NO center. That 'main central branch' I told you about? That's simply a convention. No branch is inherently better than any other. Readers of 'The Cathedral and the Bazaar' will recognize that we are firmly in bazaar territory now.

Trust > Authority


Without a distributed system, the normal way of controlling things is through something we call Access Control Lists. These are pretty straightforward tools which store a list of permissions for every user, eg "Bob can edit Sales Documents". While they're conceptually simple, it may be fair to say that they're the single underlying failure behind innumerable inefficiencies in business and Open Source to date. Access control is a crude instrument. A user can change something or they can't. And therein lies the trouble. What if an employee spots a typo in the HR docs? Should they have write access to the centralized system? Certainly not. Should they be forced to find the appropriate change request form and file it with their manager? Certainly not. If they can fix it, they should be able to go ahead and do that right then. An access control list simply can't be fine-grained enough to allow spelling fixes but not policy changes. Distributed systems stop this problem dead. I find an error and change it. I don't have access, so I simply request a 'pull' to the maintainer, who can easily see that this is a simple change and merge it in.

This is a small example, but it underlies everything. When we rely on trust (with transparency) we gain huge efficiencies over simple coarse authority based systems.

The best and brightest: Management of a Distributed System


This may sound like chaos. Amazingly it isn't. While it would seem that without a central arbiter many projects would fall into a chaos of conflicting ideas, in fact the nature of the process seems to have led to fewer fractures in the software community, not more. I would argue that this is because distributed systems make the process more transparent and meritocratic. If you have a good idea you can create a branch to show it to the world, which can then collaborate on it. When it is difficult to reconcile and merge ideas, creating change within an organization requires dramatic action. Think about the creation of czars, skunk works & splinter groups. A process that requires end-runs like these is an admission of a process that is failing to allow creativity.

The fact that ad hoc 'skunk works' are a mode of failure is not a new concept, but the central insight of the DVCS alternative is that we need not create a robust 'structure of change' or a 'continual improvement' system, but rather that we ought to make everything a skunk work. With distributed systems, everyone's work is reduced to splinters, but the operating principle is that the best splinters should be picked up and merged back into the trunk.

On the Linux project, the master branch, the source of authority for all, is simply Linus Torvalds' master branch. He relies on a number of lieutenants to pick up the best splinters from their subdomain. The master branch becomes more of an all-star team of ideas than a singular overarching concept.

Please do not make the mistake of thinking that software is somehow a special case for this sort of collaboration. It may seem crazy to think that meaningful coordination could be achieved if your corporation were to throw open the gates to this sort of bazaar-like collaboration, but consider that software engineering is the very epitome of an instance in which all parts must work seamlessly together. A miscue between sales & marketing may result in sub-optimal performance, but even a single byte of miscue between software components can easily spell a total system failure and this is the ancestral environment from which the bazaar comes.

The CEO of the future will perform much the same role that Linus does for Linux. He is the final quality checker and reviewer of branches as they work their way into the corporate master. He relies on the community to create nuggets of insight and brilliance. He trusts his lieutenants to find these gems and elevate them to his attention.


Take away: What we've learned. 


  • Lowering the barrier to experiment expands creativity.
  • Allow experiments to see the light of day and they will accrete adherents & spur still other ideas into existence.
  • The key to reaping benefits from the splinters of ideas is the ability to merge piecemeal concepts efficiently.
  • Authority-based systems are poor at integrating insights from the Bazaar.
  • Trust-based relationships, openness, and transparency are good frameworks for merging concepts and insights.
  • Management of distributed systems is transformed from an authoritarian role to that of "collector & curator" of the organization's best ideas.


Checklist for success today:


Easily access and edit information:

Can a curious manager pull P&L statements on a Saturday afternoon in order to better understand their department's role in the overall business? Business secrets keep competitors and employees in the same dark. Erecting barriers to information takes effort and has no chance of producing value.

Public experimentation:

Can an employee easily distribute a proposal to the rest of the company? Can experiments effectively snowball or do they require the innovator to push the ball all the way up the hill? If your sales manager in Topeka has an idea in the middle of the night for an advertisement / promotion, can they add this quickly to the list of concepts in the marketing department?

Good merging:

What are the barriers between an employee with a good idea and a change in the process? Do managers regularly integrate the work and experimentation of subordinates, and can subordinates collaborate together without supervision?


What to read next:  Counting Votes is Hard


Thursday, September 11, 2014

CORS for Rails / Heroku & Cloudfront (for dummies)

So starting today all your fonts don't load. Because CORS.

No 'Access-Control-Allow-Origin' header is present on the requested resource.

Here's what to do.

1) Remember your CloudFront password.


2) Edit your behavior

 

3) Forwarding headers -> whitelist



4) Add "Origin"

5) Add rack-cors to Gemfile

gem 'rack-cors'


6) Add stuff to config.ru

require 'rack/cors'
use Rack::Cors do

  # allow all origins in development
  allow do
    origins '*'
    resource '*',
        :headers => :any,
        :methods => [:get, :post, :delete, :put, :options]
  end
end
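
If leaving origins wide open makes you nervous, rack-cors will also take specific origins and path patterns. A rough sketch for config.ru, with placeholder domains (not my real config):

require 'rack/cors'

use Rack::Cors do
  allow do
    origins 'whatsize.is', 'www.whatsize.is'   # placeholder domains
    resource '/assets/*',                      # just the asset paths CloudFront pulls
        :headers => :any,
        :methods => [:get, :options]
  end
end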

7) Deploy


8) Try to forget about CORS


Did that work? Why not try my tool ForceRank.it, a prioritization tool for product teams that lets you rank choices. It even lets you import a Trello board!

Saturday, August 31, 2013

The Statistics of Monopoly with Respect to Cornish Game Hen Provisioning: Part 2 "Probability is a bitch"

In part one we figured out the average likelihood of a guest ending up on any particular square. So what's the problem with that?

The problem can be summed up in one, easy to remember, phrase: "YOLO".

YOLO

So we did 100,000 simulations; that seems like it should be enough, right? Maybe we should do one million to be more accurate? Nope, that's not the problem. The problem is that we're not throwing one million dinner parties. Or even 100,000. We're only throwing one dinner party. And frankly, anything could happen.

Just because the expected value over the long run says that we'll need 5.44x the cornish game hens, this doesn't mean that the actual dinner party won't have 30 guests haphazardly roll 12s on their first roll, throwing our expectations into turmoil.

So what is a Culinary Experience creator to do?

It turns out that Monte Carlo works really well here too. Since we recorded all 100,000 simulations, we can ask the question "How many game hens do I need to buy in order to have enough in 95% of simulations?" Obviously we can change the percentage we use here too. The average is roughly just saying, "How many game hens do I need to buy in order to have enough in 50% of simulations?" Which is pretty much like saying "How can I run out of game hens HALF OF THE TIME?!"

Gimme Code:

Get an array of the pretty names for the squares.


SQUARES is something like:
What we really want is to put all the Baltic Avenues together. Put all the B&O Railroads together. You know, kinda 'zip' each of these arrays together.


Now zipped is:


Then we process the results:


And:

These results have the average in column one, the 95th percentile in column two and the max observed in column three. So what's the result? Well, say we run 4 moves. The average on Chance was 5.44x. But if we want to provision enough food to be 95% certain there will be enough, we're going to need 9x. And out of 100,000 simulations, one simulation had 15x the number of cornish game hens on Chance. That sure doesn't make it easy to plan the menu.
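
The percentile math itself is tiny. Here's a rough Ruby sketch with made-up names, where counts holds the per-simulation tallies for one square (say, Chance):

# toy data: how many guests hit this square in each simulation
counts = [4, 7, 5, 9, 3, 6, 8, 5]

def percentile(values, pct)
  sorted = values.sort
  sorted[(pct * (sorted.length - 1)).round]
end

avg = counts.reduce(:+) / counts.length.to_f
p95 = percentile(counts, 0.95)
max = counts.max
p [avg, p95, max]   # the three columns: average, 95th percentile, max observed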


PS

But what if we want to play by the monopoly rules? Well, then we just change our move function and run things again. This time you can see the super high prevalence of Jail and a bit of a secondary bump ~7 squares after Jail.




Friday, August 30, 2013

The Statistics of Monopoly with Respect to Cornish Game Hen Provisioning


Let's pretend that you need to throw a once-in-a-lifetime culinary spectacle in Panama. If you're @ashinyknife, this will be no problem.


Let's pretend you decide upon a Monopoly theme. Generally, N guests start out on Go, roll dice and end up on a Monopoly square.

Let's pretend that each square has a wholly different gastronomic creation on it.

Given the above, how many cornish game hens should we expect to buy for St. Charles Place? How much caviar will we need to supply the B&O Railroad?

These are the important questions that we will set out to answer today.


Probability

Our first approach might look something like this: http://statistics.about.com/od/ProbHelpandTutorials/a/Probability-And-Monopoly.htm

Basic probability: round 1 is reasonable. Round 2 makes sense... oh gawd, round 3 starts to get hard to keep track of.

Monte Carlo

So what should we do? It seems to me that the appropriate technique to use here is Monte Carlo simulation. What is Monte Carlo? Honestly, Monte Carlo should be pretty attractive to those of us for whom Probability 101 was a long time ago. Basically "Monte Carlo simulation" means "let's just see what really happens". Say I ask you to figure out the probability that when flipping a coin 100 times I get at least one run of 10 heads. You've got two choices:

1) Figure out the appropriate math.
2) Flip a coin 100 times. Figure out if you get 10 heads in a row. Do this 1 million times and calculate the percentage of times when it was true.

Option 2 is Monte Carlo.
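
To make that concrete, option 2 for the coin example is only a few lines of Ruby (a quick sketch; bump trials up for more accuracy):

# P(at least one run of 10 heads in 100 flips), the Monte Carlo way
trials = 10_000
hits = trials.times.count do
  flips = Array.new(100) { [:heads, :tails].sample }
  flips.each_cons(10).any? { |run| run.all? { |f| f == :heads } }
end
puts hits.to_f / trials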


Time for computers

This is really pretty easy to code up. Create a two-dimensional array. Dimension one will keep track of each simulation. Dimension two will track each of the 40 Monopoly squares.

For each simulation, for each user in the simulation, for each of the moves, move them around the board.

To move them around the board we just roll two dice and move along.


Finally, it's just a matter of averaging up the values for each square in our simulations, and voila.
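
Put together, the whole thing boils down to something like this sketch. The guest, move and simulation counts here are arbitrary stand-ins, and we're ignoring Chance cards, Jail and friends for now:

SIMULATIONS = 10_000   # the post ran 100,000; fewer keeps the sketch quick
GUESTS      = 30
MOVES       = 4

# counts[simulation][square] = how many guests landed on that square
counts = Array.new(SIMULATIONS) { Array.new(40, 0) }

SIMULATIONS.times do |sim|
  GUESTS.times do
    position = 0                       # everyone starts on Go
    MOVES.times do
      position = (position + rand(1..6) + rand(1..6)) % 40   # roll two dice
      counts[sim][position] += 1
    end
  end
end

# average landings per square across all simulations
averages = counts.transpose.map { |per_square| per_square.reduce(:+) / SIMULATIONS.to_f }
p averages.each_with_index.max   # the most popular square (value, index)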



Summary

So now the big question: Did we answer our original question? Do we know how much food to buy?
Say we're planning on serving 4 courses. Do we feel good about figuring out how many hens we would need for an even distribution, then buying 5.44x the cornish game hens for 'Chance' and 3.8x the caviar for the B&O Railroad?

What do you think?

See my answer in The Statistics of Monopoly with Respect to Cornish Game Hen Provisioning: Part 2 "Probability is a bitch"



Friday, August 16, 2013

hbase scan: batch vs cache

Here's today's contribution to the Internet: tl;dr when it comes to HBase scanner settings, you want caching, not batch size. Maybe this is totally clear to everyone else, but for those of us who are 'newer to hbase' I can never quite remember what I'm doing.
Say you've got this code:
Scan s = new Scan(startKey);
s.setCaching(foo);
s.setBatch(bar);
ResultScanner scanner = table.getScanner(s);  // table is the HTable you're scanning
for (final Result r : scanner) {
  // stuff
}
But you're clever and you don't want to do RPC calls to HBase for every row. You might even say you'd like to 'batch' the results from your scanner. 

So you read http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html public void setBatch(int batch)
Set the maximum number of values to return for each call to next()
public void setCaching(int caching)
Set the number of rows for caching that will be passed to scanners. If not set, the default setting from HTable.getScannerCaching() will apply. Higher caching values will enable faster scanners but will use more memory.
Annnd.... not sure. I mean, I only want one Result every time I call next() in my iterator, right? What would a number >1 even mean?

And I'm sure I shouldn't set 'caching' that sounds like it will 'cache' something. I want to read the real stuff.

But you do want caching. Caching is how many rows come back in each RPC from your scanner.

Ok. Fine. Caching got named poorly.  What is batch?

Batch is in case you have super wide rows. Say you have 250 columns. Batch of 100 would give your iterator:
  • Iteration 1: Result id 0. Columns 0-99
  • Iteration 2: Result id 0. Columns 100-199
  • Iteration 3: Result id 0. Columns 200-249
  • Iteration 4: Result id 1. Columns 0-99
  • Iteration 5: Result id 1. Columns 100-199
Or at least that's what http://twitter.com/monkeyatlarge told me.

Wednesday, June 12, 2013

Github gists on blogger




I've been using https://github.com/moski/gist-Blogger/ to display gists in blogger like:



The only problem with this was that I was including a link to the raw github content, which was getting a mime type of text/plain, which caused some browsers to not load the JS. The solution is to use github pages apparently, but that's a small pita to set up, so I hereby share the results of my toils.


Step 1: Create your gist

Step 2: Add a div to your blog post

<div class="gistLoad" data-id="5561359" id="gist-5561359"> </div>

Step 3: Add this script to your blog post
<script src="http://jdwyah.github.io/gist-Blogger/javascript/gistLoader.js" type="text/javascript"></script>


Step 4: Profit