Improved Nice, Quick, Effective Sampler in R

August 1st, 2014

I found a problem with the sampler I was using last week.  It is an uncontrolled random sampler, which presents some problems when doing choice modeling.

To deal with the issue, I created a better one.  This version is a stratified random sampler that samples from each income and auto group (which I need to have samples in all income/auto groups in order to develop and test my auto ownership model).  The sampling process works like this:

  1. If there are more than 10 samples for an income/auto group, sample using last week’s random method
  2. If there is at least 2 but not more than 10 samples, pick a random sample to be tested, put the rest into the develop group
  3. If there is 1 sample, put it in both
  4. If there are no samples in the group, print an error to the screen

The code is below:


-A-

Nice, Quick, Effective Sampler in R

July 25th, 2014

It’s always good to test models.  In the best case, we’d test models against a different dataset than what we used to develop data.  In not-the-best-but-still-not-that-bad of cases, we’d use 80% of a dataset to estimate a model, and 20% of it to test against.

In a lot of cases, someone uses a Gibbs Sampler to get that 80%.  I didn’t feel like over-complicating things, so I decided to do a simple random sampler and added in some checks to check that the sample is good.  The following worked well for me.

So to show that it worked.  Click on the pictures to see them large enough to be legible.  The pctHH is the input percentage, the PctHHS is the sample for model estimation, and PctHHT is the sample for model testing.

Screenshot 2014-07-25 11.22.18

Screenshot 2014-07-25 11.22.31

Screenshot 2014-07-25 11.22.46

Screenshot 2014-07-25 11.22.56

 

These will ultimately be part of something larger, but so far, so good.

-A-