Check input taxonomy and site for required information

chkinp(taxa, station, getval = FALSE)

Arguments

taxa

data.frame for input taxonomy data

station

data.frame for input station data

getval

logical; if TRUE, return a vector of the values that failed the checks instead of an error message, useful for data prep

Value

A two-element list containing the original data (named taxa) and the taxa removed by SampleID (named txrmv) if all checks are met. The original data also include a new column for SampleID. If the datasets do not meet requirements, an error message is returned, or a vector of the values that caused the error if getval = TRUE. Site data will include only those sites in the taxonomic data.
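
The two failure modes described above (an error by default, or the offending values when getval = TRUE) can be illustrated with a minimal base R sketch. The helper check_required() and the toy table below are hypothetical and are not the package's implementation; they only mimic the pattern for a single required-columns check.

```r
# Hypothetical helper (not part of the package): mimics the error-or-values
# behaviour described for getval, using a required-columns check only.
check_required <- function(dat, required, getval = FALSE) {
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) == 0) {
    return(invisible(dat))  # check met, data passed through unchanged
  }
  if (getval) {
    return(missing_cols)    # values that failed the check, for data prep
  }
  stop("Required columns not found: ",
       paste(missing_cols, collapse = ", "), call. = FALSE)
}

# toy taxonomy table (made-up values) lacking most required columns
toy_taxa <- data.frame(StationCode = "A1", FinalID = "taxon_a")
tax_cols <- c("StationCode", "SampleDate", "Replicate", "SampleTypeCode",
              "BAResult", "Result", "FinalID")

# getval = TRUE returns the missing column names rather than stopping
missing_found <- check_required(toy_taxa, tax_cols, getval = TRUE)

# default behaviour raises an error; capture its message for inspection
err_msg <- tryCatch(check_required(toy_taxa, tax_cols),
                    error = function(e) conditionMessage(e))
```

Returning the failing values rather than stopping makes it possible to fix the input programmatically, for example by adding or renaming columns before running the check again.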

Details

The following are checked:

  • Required columns in taxonomy data: StationCode, SampleDate, Replicate, SampleTypeCode, BAResult, Result, FinalID

  • Taxonomic names are present in the STE reference file

  • Sites include both diatom and soft-bodied algae data (warning if not)

  • No missing abundance values for diatoms (for rarefaction)

  • One of CondQR50 or all predictors for the conductivity model in the station data

  • One of XerMtn or PSA6C in the station data

  • Additional required columns for the station data: StationCode, CondQR50, SITE_ELEV, TEMP_00_09, KFCT_AVE, AtmCa, PPT_00_09, MAX_ELEV

  • No missing data in additional required columns for station data
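
Some of the station checks listed above can be pre-screened in base R before calling chkinp. The following is a rough sketch only, not the package's code: the toy table and all of its values are made up for illustration, and only the column names come from the list above.

```r
# Illustrative pre-screen of station data; a sketch, not the package's code.
station_cols <- c("StationCode", "CondQR50", "SITE_ELEV", "TEMP_00_09",
                  "KFCT_AVE", "AtmCa", "PPT_00_09", "MAX_ELEV")

# toy station table (all values are made up for illustration)
toy_station <- data.frame(
  StationCode = c("A1", "A2"),
  CondQR50    = c(300, 450),
  SITE_ELEV   = c(120, 800),
  TEMP_00_09  = c(230, 180),
  KFCT_AVE    = c(0.2, 0.3),
  AtmCa       = c(0.05, NA),
  PPT_00_09   = c(40000, 60000),
  MAX_ELEV    = c(900, 2100),
  PSA6C       = c("region_a", "region_b")
)

# required columns that are absent from the table
absent <- setdiff(station_cols, names(toy_station))

# at least one of XerMtn or PSA6C must be present
has_region <- any(c("XerMtn", "PSA6C") %in% names(toy_station))

# required columns that are present but contain missing values
present <- intersect(station_cols, names(toy_station))
has_na <- present[vapply(toy_station[present], anyNA, logical(1))]
```

In this toy table, absent is empty, has_region is TRUE, and has_na picks out AtmCa because of the NA in its second row.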

Examples

# all checks passed, data returned with SampleID
chkinp(demo_algae_tax, demo_station)
#> $taxa
#> # A tibble: 213 x 8
#>    SampleID StationCode SampleDate          Replicate SampleTypeCode BAResult
#>    <chr>    <chr>       <dttm>                  <dbl> <chr>             <int>
#>  1 909M249… 909M24937   2016-06-22 00:00:00         1 Macroalgae           NA
#>  2 909M249… 909M24937   2016-06-22 00:00:00         1 Epiphyte             15
#>  3 909M249… 909M24937   2016-06-22 00:00:00         1 Epiphyte             83
#>  4 909M249… 909M24937   2016-06-22 00:00:00         1 Epiphyte              2
#>  5 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            5
#>  6 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated           53
#>  7 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            4
#>  8 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            8
#>  9 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            6
#> 10 909M249… 909M24937   2016-06-22 00:00:00         1 Integrated            9
#> # … with 203 more rows, and 2 more variables: Result <dbl>, FinalID <chr>
#>
#> $txrmv
#> # A tibble: 0 x 2
#> # … with 2 variables: SampleID <chr>, UnrecognizedTaxa <chr>
#>
# errors
if (FALSE) {
  # missing columns in taxa data
  tmp <- demo_algae_tax[, 1, drop = FALSE]
  chkinp(tmp, demo_station)
  chkinp(tmp, demo_station, getval = TRUE)

  # incorrect taxonomy
  tmp <- demo_algae_tax
  tmp[1, 'FinalID'] <- 'asdf'
  chkinp(tmp, demo_station)
  chkinp(tmp, demo_station, getval = TRUE)

  # missing diatom data at sites, returns only a warning
  tmp <- merge(demo_algae_tax, STE, all.x = TRUE) %>%
    filter(Class %in% 'Bacillariophyceae')
  chkinp(tmp, demo_station)

  # missing abundance data for diatoms
  tmp <- demo_algae_tax
  tmp$BAResult <- NA
  chkinp(tmp, demo_station)
  chkinp(tmp, demo_station, getval = TRUE)

  # stations not shared between taxa and station
  tmp <- demo_station[-1, ]
  chkinp(demo_algae_tax, tmp)

  # missing both of XerMtn and PSA6C in station
  tmp <- demo_station[, !names(demo_station) %in% c('XerMtn', 'PSA6C')]
  chkinp(demo_algae_tax, tmp)

  # missing CondQR50 and incomplete predictor fields
  tmp <- demo_station[, !names(demo_station) %in% c('CondQR50', 'TMAX_WS')]
  chkinp(demo_algae_tax, tmp)

  # missing remaining station predictors
  tmp <- demo_station[, !names(demo_station) %in% c('AtmCa')]
  chkinp(demo_algae_tax, tmp)

  # missing data in remaining station predictors
  tmp <- demo_station
  tmp$AtmCa[2] <- NA
  chkinp(demo_algae_tax, tmp)
}