Include custom validation functions
Source: vignettes/articles/custom-functions.Rmd
Custom validation functions can be included and configured within standard hubValidations workflows by including a validations.yml file in the hub-config directory. Alternatively, an appropriately structured file can be stored at a different location and its path provided through the validations_cfg_path argument.
hubValidations uses the config package to get validation configuration. This allows for configuration inheritance and the ability to include executable R code. See the config package vignette on inheritance and R expressions for more details.
validations.yml structure
validations.yml files should follow the nested structure described below:
Default configuration
The top of any validations.yml file, under the required default: top-level property, should contain default custom validation configurations that will be executed regardless of round ID.
Within the default configuration, individual checks can be configured for each of the 3 validation functions run as part of validate_submission(), using the following structure for each validation function:
- <name-of-caller-function>: One of validate_model_data, validate_model_metadata and validate_model_file, depending on the function the custom check is to be included in.
  - <name-of-check>: The name of the check. This is the name of the element containing the result of the check when hub_validations is returned (required).
    - fn: The name of the check function to be run, as a character string (required).
    - pkg: The name of the package namespace from which to get the check function. Must be supplied if the function is distributed as part of a package.
    - source: Path to the .R script containing the function code to be sourced. If relative, it should be relative to the hub's directory root. Must be supplied if the function is not part of a package and only exists as a script.
    - args: A YAML dictionary of key/value pairs of arguments to be passed to the custom function. Values can be YAML lists or even executable R code (optional).
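As an illustration of the structure above, the following sketch configures a check that lives in a script rather than a package. The check name, function name, script path and argument names are all hypothetical and only show where each property sits; none of them are part of hubValidations.

```yaml
default:
  validate_model_data:                # <name-of-caller-function>
    no_negative_values:               # <name-of-check> (hypothetical)
      fn: "check_no_negative_values"  # hypothetical function name
      # Hypothetical script path, relative to the hub's directory root.
      # pkg is omitted because the function only exists as a script.
      source: "src/validations/check_no_negative_values.R"
      args:
        value_colname: "value"              # hypothetical argument
        output_types: ["quantile", "mean"]  # a YAML list as an argument value
```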
Note that each of the validate_*() functions contains standard objects in its call environment, which are passed automatically to any custom check function and therefore do not need to be included in the args configuration.
- validate_model_file:
  - file_path: character string of path to file being validated relative to the model-output directory.
  - hub_path: character string of path to hub.
  - round_id: character string of round_id.
  - file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.
- validate_model_data:
  - tbl: a tibble of the model output data being validated.
  - file_path: character string of path to file being validated relative to the model-output directory.
  - hub_path: character string of path to hub.
  - round_id: character string of round_id.
  - file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.
  - round_id_col: character string of name of tbl column containing round_id information.
- validate_model_metadata:
  - file_path: character string of path to file being validated relative to the model-output directory.
  - hub_path: character string of path to hub.
  - round_id: character string of round_id.
  - file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.
The args configuration can be used to override objects from the caller environment as well as the default argument values of the check function.
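For instance, a custom check that accepts a round_id_col argument could be pointed at a different column by setting that name in args. This is only a sketch: the check name, function name, script path and the "origin_date" column are hypothetical.

```yaml
default:
  validate_model_data:
    round_id_values:                       # hypothetical check name
      fn: "check_round_id_values"          # hypothetical function name
      source: "src/validations/check_round_id_values.R"  # hypothetical path
      args:
        # Same name as an object in the validate_model_data() call
        # environment, so this value is used instead of the one that
        # would otherwise be passed automatically.
        round_id_col: "origin_date"
```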
Here’s an example configuration for a single check (opt_check_tbl_horizon_timediff()) to be run as part of the validate_model_data() validation function, which checks the content of the model data submission files.
default:
  validate_model_data:
    horizon_timediff:
      fn: "opt_check_tbl_horizon_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
The above configuration file relies on default values for arguments horizon_colname ("horizon") and timediff (lubridate::weeks()). We can use the validations.yml args list to override the default values. Here’s an example that includes executable R code as the value of an argument.
default:
  validate_model_data:
    horizon_timediff:
      fn: "opt_check_tbl_horizon_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
        horizon_colname: "horizons"
        timediff: !expr lubridate::weeks(2)
Round specific configuration
Additional round-specific configurations can be included in validations.yml that can add to or override default configurations.
For example, in the following validations.yml, which deploys the opt_check_tbl_col_timediff() optional check, if the file being validated is being submitted to a round with round ID "2023-08-15", the default col_timediff check configuration will be overridden by the 2023-08-15 configuration.
default:
  validate_model_data:
    col_timediff:
      fn: "opt_check_tbl_col_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"

2023-08-15:
  validate_model_data:
    col_timediff:
      fn: "opt_check_tbl_col_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
        timediff: !expr lubridate::weeks(1)
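The config package also supports an inherits field, so a later round could in principle reuse an earlier round's configuration instead of repeating it. The fragment below is a sketch under that assumption: the 2023-08-22 round ID is hypothetical, and how inherited values are resolved should be confirmed against the config package documentation before relying on it.

```yaml
2023-08-15:
  validate_model_data:
    col_timediff:
      fn: "opt_check_tbl_col_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
        timediff: !expr lubridate::weeks(1)

# Hypothetical later round that reuses the 2023-08-15 settings
# through the config package's inherits field.
2023-08-22:
  inherits: "2023-08-15"
```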
Available optional functions
hubValidations includes a number of optional checks, or checks that require administrator configuration to be run, detailed below.
For more detail on each function and its configuration parameters, consult the function documentation.
For deploying through validate_model_data
| Check function | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| opt_check_tbl_col_timediff | Time difference between values in two date columns equals a defined period. | FALSE | check_failure | |
| opt_check_tbl_counts_lt_popn | Predicted values per location are less than total location population. | FALSE | check_failure | |
| opt_check_tbl_horizon_timediff | Time difference between values in two date columns equals a time period defined by values in a horizon column. | FALSE | check_failure | |
For deploying through validate_model_metadata
| Check function | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| opt_check_metadata_team_max_model_n | The number of metadata files submitted by a single team does not exceed the maximum number allowed. | FALSE | check_failure | |
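A check from the table above is deployed under the validate_model_metadata caller using the same structure as the model data examples. The following is a minimal sketch: the check name max_model_n is an arbitrary label chosen for illustration, and no args are set, so the function's own defaults apply. Consult the function documentation for the arguments it accepts.

```yaml
default:
  validate_model_metadata:
    max_model_n:          # arbitrary check name chosen for illustration
      fn: "opt_check_metadata_team_max_model_n"
      pkg: "hubValidations"
```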