Check input taxonomy and site data for required information
chkinp(taxa, station, getval = FALSE)
taxa | input taxonomy data |
---|---|
station | input station (site) data |
getval | logical; if TRUE, return a vector of the values that failed the checks, useful for data prep |
A two-element list of the original data (named taxa) and removed taxa by SampleID (named txrmv) if all checks are met. The original data also includes a new column for SampleID. If the datasets do not meet requirements, an error message is returned, or a vector of the values that caused the error if getval = TRUE.
Site data will include only those sites in the taxonomic data.
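The site filtering described above can be pictured with a small base R sketch. This is only an illustration on toy data, not the package's internal code: the data frames and their values are hypothetical, and matching on StationCode is an assumption based on the required columns listed for both inputs.

```r
# Toy data (hypothetical values, not the package's demo datasets)
taxa <- data.frame(
  StationCode = c("A1", "A1", "B2"),
  FinalID = c("taxon1", "taxon2", "taxon1"),
  stringsAsFactors = FALSE
)
station <- data.frame(
  StationCode = c("A1", "B2", "C3"),
  CondQR50 = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# Keep only the stations that also appear in the taxonomy data
# (assumption: the two tables are matched on StationCode)
station_kept <- station[station$StationCode %in% taxa$StationCode, , drop = FALSE]
station_kept$StationCode
```

Station C3 has no taxonomy records in this toy example, so it is dropped from the site data.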
The following are checked:
Required columns in taxonomy data: StationCode, SampleDate, Replicate, SampleTypeCode, BAResult, Result, FinalID
Taxonomic names are present in the STE reference file
Sites include both diatom and soft-bodied algae data (warning if not)
No missing abundance values for diatoms (for rarefaction)
One of CondQR50 or all predictors for the conductivity model in the station data
One of XerMtn or PSA6C in the station data
Additional required columns for the station data: StationCode, CondQR50, SITE_ELEV, TEMP_00_09, KFCT_AVE, AtmCa, PPT_00_09, MAX_ELEV
No missing data in additional required columns for station data
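As a rough illustration of the required-column check and of what getval = TRUE is meant to provide, the sketch below implements a stand-alone version in base R. The helper check_required_cols() is hypothetical and is not part of the package; the error wording is also made up, and chkinp itself may be implemented differently.

```r
# Required taxonomy columns, as listed in the checks above
taxa_cols <- c("StationCode", "SampleDate", "Replicate", "SampleTypeCode",
               "BAResult", "Result", "FinalID")

# Hypothetical helper (not part of the package): stops with an error when
# required columns are missing, or returns the missing names if getval = TRUE
check_required_cols <- function(dat, required, getval = FALSE) {
  missing_cols <- setdiff(required, names(dat))
  if (getval) {
    return(missing_cols)
  }
  if (length(missing_cols) > 0) {
    stop("Required columns not found: ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(dat)
}

# Toy input with only two of the seven required columns
tmp <- data.frame(StationCode = "A1", FinalID = "taxon1",
                  stringsAsFactors = FALSE)

# Vector of missing column names, handy for data prep
missing_found <- check_required_cols(tmp, taxa_cols, getval = TRUE)

# Default behaviour: an error; captured here so the script keeps running
res <- tryCatch(check_required_cols(tmp, taxa_cols), error = function(e) e)
```

Returning the offending values instead of stopping lets a data-prep script fix or drop them programmatically, which is the use case the getval argument describes.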
# all checks passed, data returned with SampleID
chkinp(demo_algae_tax, demo_station)
#> $taxa
#> # A tibble: 213 x 8
#>    SampleID StationCode SampleDate          Replicate SampleTypeCode BAResult
#>    <chr>    <chr>       <dttm>                  <dbl> <chr>             <int>
#>  1 909M249… 909M24937   2016-06-22 00:00:00         1 Macroalgae           NA
#>  2 909M249… 909M24937   2016-06-22 00:00:00         1 Epiphyte             15
#>  3 909M249… 909M24937   2016-06-22 00:00:00         1 Epiphyte             83
#>  4 909M249… 909M24937   2016-06-22 00:00:00         1 Epiphyte              2
#>  5 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            5
#>  6 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated           53
#>  7 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            4
#>  8 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            8
#>  9 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            6
#> 10 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            9
#> # … with 203 more rows, and 2 more variables: Result <dbl>, FinalID <chr>
#>
#> $txrmv
#> # A tibble: 0 x 2
#> # … with 2 variables: SampleID <chr>, UnrecognizedTaxa <chr>
#>

# errors
if (FALSE) {

# dplyr provides %>% and filter(), used in one example below
library(dplyr)

# missing columns in taxa data
tmp <- demo_algae_tax[, 1, drop = FALSE]
chkinp(tmp, demo_station)
chkinp(tmp, demo_station, getval = TRUE)

# incorrect taxonomy
tmp <- demo_algae_tax
tmp[1, 'FinalID'] <- 'asdf'
chkinp(tmp, demo_station)
chkinp(tmp, demo_station, getval = TRUE)

# sites lacking one assemblage (only diatoms are kept here), returns only a warning
tmp <- merge(demo_algae_tax, STE, all.x = TRUE) %>%
  filter(Class %in% 'Bacillariophyceae')
chkinp(tmp, demo_station)

# missing abundance data for diatoms
tmp <- demo_algae_tax
tmp$BAResult <- NA
chkinp(tmp, demo_station)
chkinp(tmp, demo_station, getval = TRUE)

# stations not shared between taxa and station
tmp <- demo_station[-1, ]
chkinp(demo_algae_tax, tmp)

# missing both of XerMtn and PSA6C in station
tmp <- demo_station[, !names(demo_station) %in% c('XerMtn', 'PSA6C')]
chkinp(demo_algae_tax, tmp)

# missing CondQR50 and incomplete predictor fields
tmp <- demo_station[, !names(demo_station) %in% c('CondQR50', 'TMAX_WS')]
chkinp(demo_algae_tax, tmp)

# missing remaining station predictors
tmp <- demo_station[, !names(demo_station) %in% c('AtmCa')]
chkinp(demo_algae_tax, tmp)

# missing data in remaining station predictors
tmp <- demo_station
tmp$AtmCa[2] <- NA
chkinp(demo_algae_tax, tmp)

}