cleanData: Function to find or remove errors in BMI data
cleanData(data, purge = FALSE, msgs = FALSE)
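| Argument | Description |
|---|---|
| data | A data frame with BMI data (see details). |
| purge | Logical; if TRUE, a data frame is returned with problematic rows removed (see details). |
| msgs | Logical; if TRUE, a two-element list is returned with the data frame and a character string of messages describing the checks. |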
If msgs = FALSE (default), a data frame is returned. It is the same as the
input if all checks have passed; otherwise it is a purged (purge = TRUE) or
non-purged (purge = FALSE) dataset with additional columns for FinalID and
LifeStageCode. If msgs = TRUE, a two-element list is returned: the first
element, data, is the data frame that would be returned if msgs = FALSE, and
the second element, msg, is a concatenated character string of messages
indicating whether all checks have passed and, if not, which issues were
encountered. In the latter case, row numbers in the messages indicate which
observations in the input data had issues.
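Because the return type depends on msgs, downstream code may want to handle both shapes in one place. The following is a minimal sketch in base R, assuming the list elements are named data and msg as described above. The helper unpack_clean_result and the example values are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): normalise the two return
# shapes described above into a list with `data` and `msg` elements.
unpack_clean_result <- function(res) {
  # Check for a data frame first, because a data frame is also a list
  if (is.data.frame(res)) {
    # msgs = FALSE: only a data frame came back, so there is no message
    return(list(data = res, msg = NA_character_))
  }
  # msgs = TRUE: assumed to be a list with elements named `data` and `msg`
  stopifnot(is.list(res), all(c("data", "msg") %in% names(res)))
  list(data = res$data, msg = res$msg)
}

# Stand-in objects that mimic the two shapes (made-up values for illustration)
df <- data.frame(FinalID = "Baetis", LifeStageCode = "L")
unpack_clean_result(df)$msg
unpack_clean_result(list(data = df, msg = "example message"))$msg
```

In real use, the argument would be the result of cleanData(...) called with either value of msgs.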
This function checks for several types of common errors: incorrect case in FinalID names, FinalIDs that are missing from the internal database, and FinalIDs with inappropriate life stage codes (e.g., non-insects with a LifeStageCode other than 'X').
This function requires that the data frame contains at least two columns:
FinalID and LifeStageCode.
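The column requirement can be verified before calling the function. This is a minimal sketch in base R; missing_required_cols is a hypothetical helper, not part of the package, and the example data frames are made up.

```r
# Hypothetical pre-check (not part of the package): report required columns
# that are missing from a data frame before it is passed to cleanData().
missing_required_cols <- function(data, required = c("FinalID", "LifeStageCode")) {
  stopifnot(is.data.frame(data))
  setdiff(required, names(data))
}

ok  <- data.frame(FinalID = "Baetis", LifeStageCode = "L", Count = 3)
bad <- data.frame(FinalID = "Baetis")

missing_required_cols(ok)   # character(0): both columns present
missing_required_cols(bad)  # "LifeStageCode"
```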
With the default purge = FALSE, rows where the FinalIDs are incorrect are
kept; with purge = TRUE they are removed. In the former case, a new
column problemFinalID is added as a T/F vector indicating which
rows are incorrect. For both purge = FALSE and purge = TRUE,
rows with correct FinalID values are also checked for correct life stage codes
in the LifeStageCode column. Incorrect values are replaced with default values
from a lookup table provided with the package. A new
column fixedLifeStageCode is added as a T/F vector indicating which
rows were fixed for an incorrect life stage code.
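The flag, purge, and fix behaviour described above can be illustrated on toy data. The sketch below is not the package's implementation. It assumes a made-up lookup table with a single default life stage code per FinalID, and the names lookup, DefaultLifeStageCode, and flag_and_fix are hypothetical.

```r
# Illustrative sketch only: NOT the package's implementation.
# `lookup` is a made-up stand-in for the lookup table shipped with the package.
lookup <- data.frame(
  FinalID = c("Baetis", "Oligochaeta"),
  DefaultLifeStageCode = c("L", "X"),
  stringsAsFactors = FALSE
)

flag_and_fix <- function(data, lookup, purge = FALSE) {
  data$FinalID <- as.character(data$FinalID)
  data$LifeStageCode <- as.character(data$LifeStageCode)

  # FinalIDs absent from the lookup are treated as problematic
  bad_id <- !data$FinalID %in% lookup$FinalID
  if (purge) {
    data <- data[!bad_id, , drop = FALSE]   # remove problematic rows
  } else {
    data$problemFinalID <- bad_id           # keep rows, but flag them
  }

  # For rows with a known FinalID, compare the code against the default
  default <- lookup$DefaultLifeStageCode[match(data$FinalID, lookup$FinalID)]
  wrong <- !is.na(default) &
    (is.na(data$LifeStageCode) | data$LifeStageCode != default)
  data$LifeStageCode[wrong] <- default[wrong]
  data$fixedLifeStageCode <- wrong
  data
}

toy <- data.frame(
  FinalID = c("Baetis", "Oligochaeta", "idwrong1"),
  LifeStageCode = c("L", "A", "L"),
  stringsAsFactors = FALSE
)

flag_and_fix(toy, lookup)                # keeps 3 rows, flags row 3, fixes row 2
flag_and_fix(toy, lookup, purge = TRUE)  # drops row 3, fixes row 2
```

The single-default assumption keeps the sketch short; the package's own lookup table may allow more than one valid code per FinalID.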
# NOT RUN {
# function returns input data
cleanData(bugs_stations[[1]])

# same as above but retrieve msgs
cleanData(bugs_stations[[1]], msgs = TRUE)

# create some wrong FinalID values in bug data
wrongdata <- bugs_stations[[1]]
wrongdata$FinalID <- as.character(wrongdata$FinalID)
wrongdata$FinalID[c(1, 15, 30)] <- c('idwrong1', 'idwrong2', 'idwrong3')

# default, purge nothing
# new columns fixedLifeStageCode, ProblemFinalID with T/F for wrong/right
cleanData(wrongdata)

# purge
# removes from output
cleanData(wrongdata, purge = TRUE)

# create some wrong lifestagecodes, only applies if purge is T
wrongdata$LifeStageCode <- as.character(wrongdata$LifeStageCode)
wrongdata$LifeStageCode[c(2, 16, 31)] <- c('lscwrong1', 'lscwrong2', 'lscwrong3')

# no purge
cleanData(wrongdata)

# compare with purge
cleanData(wrongdata, purge = TRUE)

# with messages
cleanData(wrongdata, purge = TRUE, msgs = TRUE)
# }