---
title: "Flagging aducust records using `aducust_flag`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Flagging aducust records using `aducust_flag`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Foreword

`WAACHShelp::aducust_flag()` automates the flagging of `aducust` data (custodial record(s) for a set of individuals).

+ Often, we would like to flag whether a record exists between two dates.
+ Of particular interest is the case of carer `aducust` data, where we might want to flag whether a record exists when a child is between 0 and 18, or otherwise.

This vignette steps through how data should be structured to use the function, and its general use. **The examples presented flag, at a child level, whether any associated carer has an aducust record when the child is between age *x* and *y***. The function can be adapted to suit other applications.

To demonstrate this, data sets are simulated below to mimic the required structure.

```{r setup, echo=TRUE, message=FALSE, warning=FALSE}
# Load the package
library(WAACHShelp)
# dplyr supplies tibble(), %>%, mutate() and friends used below;
# attached explicitly in case WAACHShelp does not already attach it
library(dplyr)

# Set seed for reproducibility
set.seed(123)
```

# Simulate Data

+ The function requires the following data sets as inputs:
    + `data` --- aducust data set.
        + Will be at the carer level.
    + `dobmap` --- dobmap file.
        + In this instance, at the child level.
        + Only two columns are required: an ID variable and a DOB for the unit of interest.
    + `carer_map`
        + A mapping file which dictates how carer(s) are related to children.
        + Can generally be formulated using a linkage/family connections file.

## 1) Formulate `dobmap`

+ Variables
    + `rootnum` --- child ID variable.
    + `dob` --- A randomly selected set of DOBs **of class Date**.
+ Note
    + 100 children will be simulated.

```{r echo=TRUE}
# Function to create random 6-character alphanumeric rootnums
# (collisions are possible in principle, but very unlikely at this sample size)
make_rootnum <- function(n){
  replicate(n, paste0(sample(c(LETTERS, 0:9), 6, replace = TRUE), collapse = ""))
}

# Formulate rootnums
n_children <- 100
rootnums <- make_rootnum(n_children)

# dobmap: rootnum + dob
dobmap <- tibble(rootnum = rootnums,
                 dob = as.Date('2010-01-01') + sample(0:3650, n_children, replace = TRUE))
```

Now previewing the first few rows:

```{r}
head(dobmap, n = 10) %>%
  waachs_table()
```

## 2) Formulate `carer_map`

+ Variables
    + `rootnum` --- child ID variable.
    + `carer_type` --- variable denoting "type" of carer.
        + Necessary if multiple carers per child.
    + `carer_rootnum` --- carer ID variable.
        + Must be named differently from the child ID variable (i.e., cannot be called `rootnum`).
+ Note
    + Any individual can have any number of carers associated with them.

```{r}
carer_types <- c("carer1id", "carer2id", "NEWBMID")

# For each child, randomly assign 1 to 3 carers
carers_per_child <- sample(1:3, n_children, replace = TRUE)

# Create carer_map rows by repeating rootnum as per carers_per_child
carer_map <- tibble(rootnum = rep(rootnums, times = carers_per_child)) %>%
  mutate(carer_type = sample(carer_types, n(), replace = TRUE)) %>%
  # Keep one row per child and carer type
  distinct(rootnum, carer_type) %>%
  # Use random 8-character alphanumeric strings for carer_rootnum
  mutate(carer_rootnum = replicate(n(), paste0(sample(c(LETTERS, 0:9), 8, replace = TRUE), collapse = "")))
```

Now previewing the first few rows:

```{r}
head(carer_map, n = 10) %>%
  waachs_table()
```

## 3) Formulate `aducust`

+ Variables
    + `carer_rootnum` --- carer ID variable.
        + Carer ID variable name must be identical in `carer_map` and `aducust`.
    + `ReceptionDate` --- aducust start date.
    + `DischargeDate` --- aducust end date.
+ Note
    + In this simulation, each carer has between 0 and 10 aducust records.
    + Discharge date must be after reception date.

```{r warning=FALSE}
# data: multiple aducust records per carer_rootnum with start/end dates

# Initialize empty list to store records (one slot per carer)
aducust_list <- vector("list", length = nrow(carer_map))

for (i in seq_len(nrow(carer_map))) {
  n_records <- sample(0:10, 1)

  # No records for this carer: leave the slot as NULL (bind_rows() drops NULLs)
  if (n_records == 0) next

  rec_dates <- as.Date('2020-01-01') + sample(0:1000, n_records, replace = TRUE)
  # Discharge falls 1 to 30 days after reception
  dis_dates <- rec_dates + sample(1:30, n_records, replace = TRUE)

  aducust_list[[i]] <- tibble(
    carer_rootnum = carer_map$carer_rootnum[i],
    ReceptionDate = rec_dates,
    DischargeDate = dis_dates
  )
}

# Tidy up loop variables
rm(i, n_records, rec_dates, dis_dates)

# Combine all rows into one dataframe
aducust <- bind_rows(aducust_list)
```

Now previewing the first few rows:

```{r}
head(aducust, n = 10) %>%
  waachs_table()
```

```{r echo=FALSE}
rm(aducust_list, carer_types, carers_per_child, n_children, rootnums, make_rootnum)
```

# Examples

Now that we have our data, we can apply the function to some examples.

## Example 1: Default use

Applying the function using all defaults:

+ Flags aducust records when the child is between 0 and 18.
+ Does not collapse records in any way.
+ Results in a "many-to-many" join between `carer_map` and `aducust`.

```{r message=TRUE}
eg1 <- aducust_flag(data = aducust,
                    dobmap = dobmap,
                    carer_map = carer_map)
```

Previewing this:

```{r echo=FALSE}
head(eg1) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                              ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

## Example 2: Changing default age

### Example 2.1: Different ages

+ Flag record when child is between 10 and 14:

```{r message=TRUE}
eg2.1 <- aducust_flag(data = aducust,
                      dobmap = dobmap,
                      carer_map = carer_map,
                      child_start_age = 10,
                      child_end_age = 14)
```

Previewing this:

```{r echo=FALSE}
head(eg2.1) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.)
|| inherits(., "Date")),
                              ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

### Example 2.2: "Negative" ages

The function also works when `child_start_age` or `child_end_age` is negative.

+ Flag aducust records that exist between "1 year before the child was born" and age 5.

```{r message=TRUE}
eg2.2 <- aducust_flag(data = aducust,
                      dobmap = dobmap,
                      carer_map = carer_map,
                      child_start_age = -1,
                      child_end_age = 5)
```

Previewing this:

```{r echo=FALSE}
head(eg2.2) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                              ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

## Example 3: Non-default variable names

### Example 3.1: Different `data` date variables

+ Simply change `data_start_date` and `data_end_date` to suit.

```{r message=TRUE}
# Rename ReceptionDate and DischargeDate
eg3.1 <- aducust_flag(data = aducust %>% rename(StartDate = ReceptionDate,
                                                EndDate = DischargeDate),
                      dobmap = dobmap,
                      carer_map = carer_map,
                      child_start_age = 10,
                      child_end_age = 14,
                      data_start_date = "StartDate",
                      data_end_date = "EndDate")
```

Previewing this:

```{r echo=FALSE}
head(eg3.1) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                              ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

### Example 3.2: Different ID variable names

+ Simply change `carer_id_var` to suit.

```{r message=TRUE}
# Rename the carer ID in both aducust and carer_map so the names still match
eg3.2 <- aducust_flag(data = aducust %>% rename(OtherID = carer_rootnum),
                      dobmap = dobmap,
                      carer_map = carer_map %>% rename(OtherID = carer_rootnum),
                      carer_id_var = "OtherID")
```

Previewing this:

```{r echo=FALSE}
head(eg3.2) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                              ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

### Example 3.3: Different DOB variable

+ Simply change `dobmap_dob_var` to suit.

```{r message=TRUE}
eg3.3 <- aducust_flag(data = aducust,
                      dobmap = dobmap %>% rename(dateofbirth = dob),
                      carer_map = carer_map,
                      dobmap_dob_var = "dateofbirth")
```

Previewing this:

```{r echo=FALSE}
head(eg3.3) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                              ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

## Example 4: Summarising Records

### Example 4.1: Summarising within carer

+ `carer_summary=TRUE`
    + Within each carer, returns:
        + "Yes" if flag is "Yes" across any aducust record.
        + "No" if flag is "No" across all aducust records.

```{r message=TRUE}
eg4.1 <- aducust_flag(data = aducust,
                      dobmap = dobmap,
                      carer_map = carer_map,
                      carer_summary = TRUE)
```

Previewing this:

```{r echo=FALSE}
head(eg4.1) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                              ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

### Example 4.2: Summarising across carers

+ `any_carer_summary=TRUE`
    + Across all aducust records for all carers for a child, returns:
        + "Yes" if any flags are "Yes".
        + "No" if all flags are "No".

```{r message=TRUE, warning=TRUE}
eg4.2 <- aducust_flag(data = aducust,
                      dobmap = dobmap,
                      carer_map = carer_map,
                      any_carer_summary = TRUE)
```

The warning above notes that `carer_summary` is ignored and that records are collapsed to the child level.

Previewing this:

```{r echo=FALSE}
head(eg4.2) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                              ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

And to check that the collapse is correct (one row per child):

```{r}
nrow(eg4.2) == length(unique(dobmap$rootnum))
```

# Conclusion

The `aducust_flag` function prototype is useful for consistently flagging aducust data sets.
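
For intuition about what the flag represents, the age window can be turned into a date window and each record checked for overlap with it. The chunk below is a minimal base R sketch under stated assumptions, not the `aducust_flag()` implementation: `add_years()` is a hypothetical helper, the toy `records` data are made up for illustration, and treating any overlap between a record and the window as a "Yes" is an assumed rule that may differ from the package's definition.

```{r}
# Hypothetical helper (not part of WAACHShelp): shift a Date by whole years
add_years <- function(date, years) {
  lt <- as.POSIXlt(date)
  lt$year <- lt$year + years
  as.Date(lt)
}

# Toy inputs: one child and three custodial records for a single carer
dob <- as.Date("2012-06-15")
records <- data.frame(
  ReceptionDate = as.Date(c("2011-01-10", "2020-03-01", "2031-01-01")),
  DischargeDate = as.Date(c("2011-02-01", "2020-03-20", "2031-02-01"))
)

# Age window 0 to 18 expressed as dates
window_start <- add_years(dob, 0)
window_end   <- add_years(dob, 18)

# Assumed rule: a record is flagged if it overlaps the window at all,
# i.e. it starts on or before the window ends and ends on or after it starts
records$flag <- ifelse(records$ReceptionDate <= window_end &
                         records$DischargeDate >= window_start,
                       "Yes", "No")

# Collapsing across records: "Yes" if any record is flagged
any_flag <- if (any(records$flag == "Yes")) "Yes" else "No"

records
any_flag
```

In this sketch, negative ages work the same way: `add_years(dob, -1)` gives the date one year before birth, which can then serve as `window_start`.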