cleanData.Rd
Function to find or remove errors in BMI data
cleanData(data, purge = FALSE, msgs = FALSE)
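Argument | Description |
---|---|
data | A data frame with BMI data (see details). |
purge | logical; if TRUE, a data frame is returned with problematic rows removed (see details). |
msgs | logical; if TRUE, a two-element list containing the data frame and a character string of messages is returned instead of the data frame alone. |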
If msgs = FALSE (the default), a data frame is returned. It is identical to the input if all checks pass; otherwise it is the purged (purge = TRUE) or non-purged (purge = FALSE) dataset with additional T/F columns relating to FinalID and LifeStageCode (see details). If msgs = TRUE, a two-element list is returned: the first element, data, is the data frame that would be returned if msgs = FALSE, and the second element, msg, is a concatenated character string of messages indicating whether all checks passed and, if not, which issues were encountered. In the latter case, row numbers in the messages indicate which observations in the input data had issues.
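The two return shapes can be illustrated with a minimal base-R sketch, assuming a logical vector of flagged rows has already been computed. The helper build_output() and its message wording are hypothetical and not part of the package; cleanData() composes its own messages, whose text may differ.

```r
# Hypothetical sketch of the return value: a data frame when msgs = FALSE,
# or list(data = ..., msg = ...) when msgs = TRUE. Message text is invented.
build_output <- function(data, flagged, msgs = FALSE) {
  if (!msgs) return(data)
  msg <- if (any(flagged)) {
    # report the input row numbers that had issues
    paste0("Issues found in input rows: ", paste(which(flagged), collapse = ", "))
  } else {
    "All checks passed"
  }
  list(data = data, msg = msg)
}

bugs <- data.frame(FinalID = c("Baetis", "idwrong1", "idwrong2"))
res <- build_output(bugs, flagged = c(FALSE, TRUE, TRUE), msgs = TRUE)
names(res)   # "data" "msg"
res$msg      # "Issues found in input rows: 2, 3"
```

With msgs = FALSE the same call would simply return bugs unchanged.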
This function checks for several types of common errors: incorrect case in FinalID names, FinalIDs that are missing from the internal database, and FinalIDs with inappropriate life stage codes (e.g., non-insects with a LifeStageCode other than 'X').
This function requires that the data frame contain at least two columns: FinalID and LifeStageCode.
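The required-column check and the first two FinalID checks can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: lookup is a hypothetical stand-in for the internal database, its three taxa are illustrative only, and check_finalid() is a hypothetical helper that only reports problems rather than fixing them.

```r
# Hypothetical sketch, not the package's internal code. 'lookup' stands in
# for the internal taxonomy table; its contents are illustrative only.
lookup <- data.frame(
  FinalID = c("Baetis", "Hyalella", "Chironomidae"),
  LifeStageCode = c("L", "X", "L")
)

check_finalid <- function(data, lookup) {
  # the two columns cleanData() requires
  missing_cols <- setdiff(c("FinalID", "LifeStageCode"), names(data))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  ids <- as.character(data$FinalID)
  exact <- ids %in% lookup$FinalID
  # case-only error: no exact match, but a match once both sides are lower-cased
  case_only <- (!exact) & (tolower(ids) %in% tolower(lookup$FinalID))
  data.frame(
    row = seq_along(ids),
    FinalID = ids,
    caseProblem = case_only,
    notInLookup = (!exact) & (!case_only)
  )
}

bugs <- data.frame(
  FinalID = c("Baetis", "baetis", "idwrong1"),
  LifeStageCode = c("L", "L", "L")
)
res <- check_finalid(bugs, lookup)
res
```

Here row 2 is flagged as a case error and row 3 as missing from the lookup; a data frame lacking LifeStageCode stops with an error.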
The default value purge = FALSE does not remove rows with incorrect FinalIDs; with purge = TRUE, those rows are removed. In the former case, a new column problemFinalID is added as a T/F vector indicating which rows are incorrect. For both purge = FALSE and purge = TRUE, rows with correct FinalID values are also checked for correct life stage codes in the LifeStageCode column. Incorrect values are replaced with default values from a lookup table provided with the package, and a new column fixedLifeStageCode is added as a T/F vector indicating which rows had their life stage code fixed.
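The flag, fix, and purge steps described above can be sketched in base R. This is a hypothetical illustration, not the package's implementation: lookup stands in for the package's lookup table, sketch_clean() is an invented helper, and it assumes a code is "incorrect" when it is missing or differs from the taxon's default (the package's actual rule may be more permissive). The Details describe problemFinalID only for purge = FALSE, so dropping that column on purge is also an assumption.

```r
# Hypothetical sketch of the purge / flag / fix steps, not the package's code.
# 'lookup' stands in for the package's table of default life stage codes.
lookup <- data.frame(
  FinalID = c("Baetis", "Hyalella", "Chironomidae"),
  LifeStageCode = c("L", "X", "L")
)

sketch_clean <- function(data, lookup, purge = FALSE) {
  data$FinalID <- as.character(data$FinalID)
  data$LifeStageCode <- as.character(data$LifeStageCode)

  # TRUE for rows whose FinalID is not in the lookup table
  data$problemFinalID <- !(data$FinalID %in% lookup$FinalID)

  # default code for each row's taxon (NA where the FinalID is unknown)
  default_lsc <- lookup$LifeStageCode[match(data$FinalID, lookup$FinalID)]

  # assumption: a code is incorrect if missing or different from the default;
  # only rows with a valid FinalID are checked
  fix <- (!data$problemFinalID) &
    (is.na(data$LifeStageCode) | data$LifeStageCode != default_lsc)
  data$LifeStageCode[fix] <- default_lsc[fix]
  data$fixedLifeStageCode <- fix

  if (purge) {
    # drop rows with unknown FinalIDs, then the now-redundant flag column
    data <- data[!data$problemFinalID, , drop = FALSE]
    data$problemFinalID <- NULL
  }
  data
}

bugs <- data.frame(
  FinalID = c("Baetis", "Hyalella", "idwrong1"),
  LifeStageCode = c("L", "L", "L")
)
out <- sketch_clean(bugs, lookup)
out2 <- sketch_clean(bugs, lookup, purge = TRUE)
```

In this illustrative table the Hyalella row has its 'L' replaced by 'X' and is flagged in fixedLifeStageCode, while the unknown 'idwrong1' row is flagged by problemFinalID, or dropped when purge = TRUE.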
# NOT RUN {
# function returns input data
cleanData(bugs_stations[[1]])

# same as above but retrieve msgs
cleanData(bugs_stations[[1]], msgs = TRUE)

# create some wrong FinalID values in bug data
wrongdata <- bugs_stations[[1]]
wrongdata$FinalID <- as.character(wrongdata$FinalID)
wrongdata$FinalID[c(1, 15, 30)] <- c('idwrong1', 'idwrong2', 'idwrong3')

# default, purge nothing
# new columns fixedLifeStageCode, problemFinalID with T/F for wrong/right
cleanData(wrongdata)

# purge: rows with wrong FinalIDs are removed from the output
cleanData(wrongdata, purge = TRUE)

# create some wrong lifestagecodes, only applies if purge is T
wrongdata$LifeStageCode <- as.character(wrongdata$LifeStageCode)
wrongdata$LifeStageCode[c(2, 16, 31)] <- c('lscwrong1', 'lscwrong2', 'lscwrong3')

# no purge
cleanData(wrongdata)

# compare with purge
cleanData(wrongdata, purge = TRUE)

# with messages
cleanData(wrongdata, purge = TRUE, msgs = TRUE)
# }