Experiment with changes in an R project

Inspired by the Ruby gem scientist. Whereas that gem targets web apps, this project is aimed at researchers and others who want to compare changes in their code rigorously.

How is this different from X tool?

  • git/version control: scientist plays no part in managing versions of your project. Use git for that.
  • unit tests: tests make sure that changes to your code don’t change its outputs. scientist is roughly the opposite: it aims to tell you how changes in your code do change the outputs.
  • benchmarking/profiling: scientist isn’t purely a tool for comparing how fast your code runs, but it does include run-time comparison as an aid for deciding which version to use.
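
To make the contrast with unit tests concrete, here is a minimal base R sketch that does not use scientist at all. The helpers old_center() and new_center() are hypothetical names invented for illustration. A test-style check gives a single pass/fail answer, while an experiment-style comparison records what each version returned and how the outputs differ.

```r
# Hypothetical old and new versions of a helper (illustrative names only)
old_center <- function(x) mean(x)
new_center <- function(x) mean(x, trim = 0.2)  # trims 20% of values from each end

x <- c(1, 2, 3, 4, 100)

# Unit-test view: did the output stay the same? One logical answer.
unchanged <- isTRUE(all.equal(old_center(x), new_center(x)))

# Experiment view: keep both outputs and a description of the difference
change <- list(
  control   = old_center(x),
  candidate = new_center(x),
  summary   = all.equal(old_center(x), new_center(x))
)

print(unchanged)
str(change)
```

The second form is closer in spirit to what an experiment reports: the change is something to inspect, not a failure.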

Use cases

Code block

You have some code. You want to make a change to the code, and you have a few different ideas about what you’d like to do. For example, you want to pre-allocate the size of the data.frame to see if that saves time.

scientist can help you sort out such changes by comparing how long each version takes and by visually diffing the results.

Using scientist you can compare the two versions like this:

a <- Experiment$new(name = "compare_code")
a$control(v1 = {
  out <- data.frame(letter = NA_character_, LETTER = NA_character_,
    both = NA_character_, stringsAsFactors = FALSE)
  for (i in 1:26)
    out[i,] <- c(letters[i], LETTERS[i], paste0(letters[i], LETTERS[i]))
  out
})
a$candidate(v2 = {
  out <- data.frame(letter = rep(NA_character_, times = 26),
    LETTER = rep(NA_character_, times = 26),
    both = rep(NA_character_, times = 26),
    stringsAsFactors = FALSE)
  for (i in 1:26)
    out[i,] <- c(letters[i], LETTERS[i], paste0(sample(letters, 1), LETTERS[i]))
  out
})
a

Then we can run the “experiment”

a$run()

Then compare results

a$diff()
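
For intuition about what such a comparison involves, the following is a rough base R sketch of the same idea done by hand: time each version, then check whether the outputs agree. It is not how scientist is implemented. The function names grow() and prealloc() and the repetition count are assumptions made for this example, and both versions here are deterministic so the outputs can be compared column by column.

```r
# Version 1: start with one row and grow the data.frame inside the loop
grow <- function() {
  out <- data.frame(letter = NA_character_, LETTER = NA_character_,
    both = NA_character_, stringsAsFactors = FALSE)
  for (i in 1:26)
    out[i, ] <- c(letters[i], LETTERS[i], paste0(letters[i], LETTERS[i]))
  out
}

# Version 2: pre-allocate all 26 rows, then fill them in
prealloc <- function() {
  out <- data.frame(letter = rep(NA_character_, times = 26),
    LETTER = rep(NA_character_, times = 26),
    both = rep(NA_character_, times = 26),
    stringsAsFactors = FALSE)
  for (i in 1:26)
    out[i, ] <- c(letters[i], LETTERS[i], paste0(letters[i], LETTERS[i]))
  out
}

# Time several repetitions of each version (elapsed seconds)
reps <- 50
t_grow <- system.time(for (k in seq_len(reps)) r1 <- grow())[["elapsed"]]
t_pre  <- system.time(for (k in seq_len(reps)) r2 <- prealloc())[["elapsed"]]

# Compare the outputs column by column
same_cols <- mapply(identical, as.list(r1), as.list(r2))

print(c(grow = t_grow, prealloc = t_pre))
print(same_cols)
```

Timings from a single machine and a small number of repetitions are noisy, so treat the numbers as a rough guide only.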


Scripts

You have an R script, let’s call it code.R. As with the code block example above, you want to make a change to the script. Instead of passing code blocks as input, you can pass file names. (NOTE: file names are not supported yet; see issue #7.)

Using scientist you can compare these two scripts with:

b <- Experiment$new(name = "compare_scripts")
b$control(file = "code.R")
b$candidate(file = "code_new.R")

Note: the code above doesn’t work yet.

Packages

You have a package, let’s call it foobar. You want to change a function in foobar called stuff(). You make a new version of that function called stuff_new(). (NOTE: functions are not yet supported per se, see issue #8, although you can pass function calls just as you would code blocks.)

Using scientist you can compare these two functions with:

res <- Experiment$new(name = "compare_stuff")
res$control(foobar::stuff(x = 5))
res$candidate(foobar::stuff_new(x = 5))

Note: the above is pseudocode, as foobar is not a real package, though you can try functions from a real package.

Install

remotes::install_github("sckott/scientist")

Usage

Initialize an experiment

res <- Experiment$new(name = "jane")

Set your control code block

res$control({
  x <- 5
  x^2
})

Set your candidate code block. You can have 1 or more candidates, which are compared against the control.

res$candidate({
  y <- 5
  y^3
})

Now you can see some control and candidate details

res
#> <Experiment> jane
#>  error on mismatch?: FALSE
#>  waiting?: TRUE
#>  progress?: FALSE
#>   control: <unnamed>
#>   candidate: <unnamed>

Run the experiment

res$run()

Get the results

res$control_result
#> [[1]]
#> [1] 25
res$candidate_results
#> [[1]]
#> [1] 125

Get all results plus timing data

res$result()
#> $name
#> [1] "jane"
#> 
#> $control
#> $control$result
#> $control$result[[1]]
#> [1] 25
#> 
#> 
#> $control$time
#> $control$time$start
#> [1] "2020-09-18 01:05:12 GMT"
#> 
#> $control$time$end
#> [1] "2020-09-18 01:05:13 GMT"
#> 
#> $control$time$duration
#> [1] 0.680455
#> 
#> 
#> 
#> $candidates
#> $candidates[[1]]
#> $candidates[[1]]$result
#> [1] 125
#> 
#> $candidates[[1]]$time
#> $candidates[[1]]$time$start
#> [1] "2020-09-18 01:05:12 GMT"
#> 
#> $candidates[[1]]$time$end
#> [1] "2020-09-18 01:05:13 GMT"
#> 
#> $candidates[[1]]$time$duration
#> [1] 0.7593031
#> 
#> 
#> $candidates[[1]]$name
#> [1] NA
#> 
#> 
#> 
#> $comparison
#> [1] FALSE
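
Because the result is a plain nested list, the timing data can be pulled out with ordinary list indexing. The sketch below works on a hand-built list that mirrors the structure printed above (start and end times omitted); with a real experiment you would presumably start from res$result() instead. The variable names are illustrative, and the sketch assumes that duration is a plain number of seconds, as the printed output suggests.

```r
# Hand-built stand-in for the list printed above (start/end times omitted)
result <- list(
  name = "jane",
  control = list(
    result = list(25),
    time = list(duration = 0.680455)
  ),
  candidates = list(
    list(result = 125, time = list(duration = 0.7593031), name = NA)
  ),
  comparison = FALSE
)

# Duration of the control and of each candidate
control_duration <- result$control$time$duration
candidate_durations <- vapply(
  result$candidates,
  function(cand) cand$time$duration,
  numeric(1)
)

# Ratio above 1 means the candidate took longer than the control
ratio <- candidate_durations / control_duration
print(ratio)
```

The same vapply() call works unchanged when there are several candidates, since it returns one duration per element of candidates.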

Publish results - opens a page in your default browser

res$publish()


Meta