Function to find or remove errors in BMI data

cleanData(data, purge = FALSE, msgs = FALSE)

Arguments

data

A data frame with BMI data (see details)

purge

logical, if TRUE a data frame is returned with problematic rows removed (see details)

msgs

logical, if FALSE a purged or non-purged data frame is returned, if TRUE a two-element list is returned with the data frame and a concatenated string of messages (see the return value)

Value

If msgs = FALSE (default), a data frame is returned that is either the same as the input if all checks have passed, or a purged (purge = TRUE) or non-purged (purge = FALSE) dataset with additional columns for FinalID and LifeStageCode. If msgs = TRUE, a two-element list is returned. The first element, data, is the data frame that would be returned if msgs = FALSE. The second element, msg, is a concatenated character string of messages indicating whether all checks have passed and, if not, which issues were encountered. In the latter case, row numbers in the messages indicate which observations in the input data had issues.

Details

This function checks for several types of common errors: incorrect case in FinalID names, FinalIDs that are missing from the internal database, and FinalIDs with inappropriate life stage codes (e.g., non-insects with a LifeStageCode other than 'X').

This function requires that the data frame contains at least two columns: FinalID and LifeStageCode.

The default value purge = FALSE keeps rows where the FinalIDs are incorrect; with purge = TRUE those rows are removed. In the former case, a new column problemFinalID is added as a T/F vector indicating which rows are incorrect. For both purge = FALSE and purge = TRUE, rows with correct FinalID values are also checked for correct life stage codes in the LifeStageCode column. Incorrect values are replaced with default values from a lookup table provided with the package. A new column fixedLifeStageCode is added as a T/F vector indicating which rows were fixed for an incorrect life stage code.
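
The flag, purge, and fix steps described above can be illustrated with a small base R sketch. This is not the package's implementation: the function sketch_clean, the table lookup, and its column DefaultLifeStageCode are hypothetical names chosen for illustration, and the sketch treats any code other than the default as incorrect, which is likely stricter than the real check.

```r
# Hypothetical lookup table; the package's internal table may differ.
lookup <- data.frame(
  FinalID = c("Baetis", "Oligochaeta"),
  DefaultLifeStageCode = c("L", "X"),
  stringsAsFactors = FALSE
)

# Toy input with one unknown FinalID and one wrong life stage code.
bugs <- data.frame(
  FinalID = c("Baetis", "idwrong1", "Oligochaeta"),
  LifeStageCode = c("L", "L", "A"),
  stringsAsFactors = FALSE
)

# Illustrative stand-in for the checks, not cleanData itself.
sketch_clean <- function(data, lookup, purge = FALSE) {
  # Flag FinalIDs that are absent from the lookup table.
  bad_id <- !data$FinalID %in% lookup$FinalID
  if (purge) {
    # Drop flagged rows; no problemFinalID column is needed afterwards.
    data <- data[!bad_id, , drop = FALSE]
    bad_id <- rep(FALSE, nrow(data))
  } else {
    data$problemFinalID <- bad_id
  }
  # For rows with a known FinalID, compare against the default code.
  # Simplifying assumption: any non-default code counts as incorrect.
  default <- lookup$DefaultLifeStageCode[match(data$FinalID, lookup$FinalID)]
  fix <- !bad_id & data$LifeStageCode != default
  data$LifeStageCode[fix] <- default[fix]
  data$fixedLifeStageCode <- fix
  data
}

flagged <- sketch_clean(bugs, lookup)
purged <- sketch_clean(bugs, lookup, purge = TRUE)
```

In this sketch, flagged keeps all three rows and marks the unknown FinalID in problemFinalID, while purged drops that row. In both results the 'A' code for the non-insect row is replaced by 'X' and marked in fixedLifeStageCode.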

Examples

# load bug, station data
data(bugs_stations)

# NOT RUN {
# function returns input data
cleanData(bugs_stations[[1]])

# same as above but retrieve msgs
cleanData(bugs_stations[[1]], msgs = TRUE)

# create some wrong FinalID values in bug data
wrongdata <- bugs_stations[[1]]
wrongdata$FinalID <- as.character(wrongdata$FinalID)
wrongdata$FinalID[c(1, 15, 30)] <- c('idwrong1', 'idwrong2', 'idwrong3')

# default, purge nothing
# new columns fixedLifeStageCode, ProblemFinalID with T/F for wrong/right
cleanData(wrongdata)

# purge
# removes from output
cleanData(wrongdata, purge = TRUE)

# create some wrong lifestagecodes, only applies if purge is T
wrongdata$LifeStageCode <- as.character(wrongdata$LifeStageCode)
wrongdata$LifeStageCode[c(2, 16, 31)] <- c('lscwrong1', 'lscwrong2', 'lscwrong3')

# no purge
cleanData(wrongdata)

# compare with purge
cleanData(wrongdata, purge = TRUE)

# with messages
cleanData(wrongdata, purge = TRUE, msgs = TRUE)
# }