---
title: "Flagging aducust records using `aducust_flag`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Flagging aducust records using `aducust_flag`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Foreword

`WAACHShelp::aducust_flag()` automates the flagging of `aducust` data (custodial record(s) for a set of individuals).

+ Often, we would like to flag whether a record exists between two dates.
+ Of particular interest is carer `aducust` data, where we might want to flag whether a carer has a record while a child is aged between 0 and 18, or within some other age range.
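At its core, this is an interval check. The chunk below is a minimal base R sketch, assuming a record is flagged when its reception-to-discharge interval overlaps the child's age window, and that ages are converted to dates by adding `age * 365.25` days. The helpers `child_age_window()` and `record_overlaps()` are hypothetical and not part of WAACHShelp; `aducust_flag()` may define the window differently.

```{r}
# Hypothetical helpers (not part of WAACHShelp) sketching the date-window logic.

# Convert a start and end age (in years) into a date window relative to DOB.
# Assumption: one year is approximated as 365.25 days, rounded to whole days.
child_age_window <- function(dob, start_age, end_age) {
  list(from = dob + round(start_age * 365.25),
       to   = dob + round(end_age * 365.25))
}

# Two intervals overlap when each one starts on or before the other ends.
record_overlaps <- function(rec_start, rec_end, from, to) {
  rec_start <= to & rec_end >= from
}

win <- child_age_window(as.Date("2012-06-01"), start_age = 0, end_age = 18)

# First record ends before the child is born; second falls inside the window
flags <- record_overlaps(
  rec_start = as.Date(c("2011-01-01", "2020-03-01")),
  rec_end   = as.Date(c("2011-02-01", "2020-03-15")),
  from = win$from, to = win$to
)
flags
```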


This vignette steps through how the input data should be structured and how to use the function. **The examples presented flag, at a child level, whether any associated carer has an aducust record when the child is between age *x* and *y***. The function can be adapted to suit other applications.

To illustrate this, we simulate data sets that mimic the required structure.

```{r setup, echo=TRUE, message=FALSE, warning=FALSE}
# Load the package (and dplyr, used below for tibble(), mutate(), %>%, etc.)
library(WAACHShelp)
library(dplyr)

# Set seed for reproducibility
set.seed(123)
```

# Simulate Data

+ The function requires the following data sets as inputs:
  + `data` --- the aducust data set.
    + In this instance, at the carer level.
  + `dobmap` --- dobmap file (IDs and dates of birth).
    + In this instance, at the child level.
    + In practice, only two columns are required: an ID variable and a DOB for the unit of interest.
  + `carer_map` --- a mapping file that dictates how carer(s) are related to children.
    + Can generally be formulated using a linkage/family connections file.
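Before calling the function on real data, it can help to confirm that each input has the expected columns. The sketch below uses a hypothetical helper, `has_cols()`, on three tiny toy data frames whose column names follow this vignette's simulated data (`rootnum`, `carer_rootnum`, `dob`, `ReceptionDate`, `DischargeDate`). It is not the validation that `aducust_flag()` itself performs.

```{r}
# Hypothetical helper (not part of WAACHShelp): TRUE if df has every column in cols
has_cols <- function(df, cols) all(cols %in% names(df))

# Toy inputs using the column names from this vignette's simulated data
toy_data <- data.frame(carer_rootnum = "C1",
                       ReceptionDate = as.Date("2020-01-01"),
                       DischargeDate = as.Date("2020-01-10"))
toy_dobmap <- data.frame(rootnum = "K1", dob = as.Date("2015-05-05"))
toy_carer_map <- data.frame(rootnum = "K1", carer_type = "carer1id",
                            carer_rootnum = "C1")

input_checks <- c(
  data      = has_cols(toy_data, c("carer_rootnum", "ReceptionDate", "DischargeDate")),
  # The DOB must be of class Date, not character
  dobmap    = has_cols(toy_dobmap, c("rootnum", "dob")) && inherits(toy_dobmap$dob, "Date"),
  # carer_map links children (rootnum) to carers (carer_rootnum)
  carer_map = has_cols(toy_carer_map, c("rootnum", "carer_rootnum"))
)
input_checks
```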

## 1) Formulate `dobmap`

+ Variables
  + `rootnum` --- child ID variable
  + `dob` --- A randomly selected set of DOBs **of class Date**.
+ Note
  + 100 children will be simulated

```{r echo=TRUE}
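# Function to create n unique random 6-character alphanumeric rootnums
make_rootnum <- function(n){
  ids <- character(0)
  # Redraw until there are n distinct IDs (collisions are rare but possible)
  while (length(ids) < n) {
    new_ids <- replicate(n - length(ids),
                         paste0(sample(c(LETTERS, 0:9), 6, replace = TRUE), collapse = ""))
    ids <- unique(c(ids, new_ids))
  }
  ids
}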

# Formulate rootnums
n_children <- 100 
rootnums <- make_rootnum(n_children)

# dobmap: rootnum + dob
dobmap <- tibble(rootnum = rootnums,
                 dob = as.Date('2010-01-01') + sample(0:3650, n_children, replace = TRUE))
```

Now previewing the first few rows:

```{r}
head(dobmap, n = 10) %>%
  waachs_table()
```
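The simulated `dob` is already of class Date. If DOBs in your own `dobmap` are stored as text (for example, after reading a CSV), they can be converted with base R `as.Date()`. The date formats below are examples only and should be matched to your data.

```{r}
# DOBs read from a text file often arrive as character strings
raw_iso <- c("2010-03-15", "2014-11-02")   # ISO format: year-month-day
raw_dmy <- c("15/03/2010", "02/11/2014")   # day/month/year format

dob_iso <- as.Date(raw_iso)                       # ISO is the default format
dob_dmy <- as.Date(raw_dmy, format = "%d/%m/%Y")  # other formats need `format`

class(dob_dmy)
dob_iso == dob_dmy
```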


## 2) Formulate `carer_map`

+ Variables
  + `rootnum` --- child ID variable.
  + `carer_type` --- variable denoting "type" of carer.
    + Necessary if multiple carers per child.
  + `carer_rootnum` --- carer ID variable.
    + Must be named differently from the child ID variable (i.e., cannot be called `rootnum`).
+ Note
  + Any individual can have any number of carers associated with them.
    
```{r}
carer_types <- c("carer1id", "carer2id", "NEWBMID")

# For each child, draw 1 to 3 carer types (duplicates are dropped by distinct() below,
# so a child may end up with fewer carers than drawn)
carers_per_child <- sample(1:3, n_children, replace = TRUE)

# Create carer_map rows by repeating rootnum as per carers_per_child
carer_map <- tibble(rootnum = rep(rootnums, times = carers_per_child)) %>%
  mutate(carer_type = sample(carer_types, n(), replace = TRUE)) %>%
  distinct(rootnum, carer_type) %>%
  # Assign each carer a random 8-character alphanumeric ID (collisions are very unlikely but not ruled out)
  mutate(carer_rootnum = replicate(n(), paste0(sample(c(LETTERS, 0:9), 8, replace = TRUE), collapse = "")))
```

Now previewing the first few rows:

```{r}
head(carer_map, n = 10) %>%
  waachs_table()
```
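As noted above, `carer_map` can generally be formulated from a linkage/family connections file. Assuming such a file is "wide", with one row per child and one column per carer type (an assumption for illustration; real linkage files may differ), the following base R sketch reshapes it into the long `rootnum`/`carer_type`/`carer_rootnum` structure. The object names `wide_links` and `long_links` are hypothetical.

```{r}
# Hypothetical wide linkage file: one row per child, one column per carer type
wide_links <- data.frame(
  rootnum  = c("K1", "K2"),
  carer1id = c("C1", "C3"),
  carer2id = c("C2", NA),
  NEWBMID  = c(NA, "C4")
)
carer_cols <- c("carer1id", "carer2id", "NEWBMID")

# Stack one block of rows per carer-type column
long_links <- do.call(rbind, lapply(carer_cols, function(col) {
  data.frame(rootnum       = wide_links$rootnum,
             carer_type    = col,
             carer_rootnum = wide_links[[col]])
}))

# Drop child/carer-type combinations with no carer recorded
long_links <- long_links[!is.na(long_links$carer_rootnum), ]
long_links
```

Dropping the `NA` rows means each child contributes one row per recorded carer, so children can have different numbers of carers.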


## 3) Formulate `aducust`

+ Variables
  + `carer_rootnum` --- carer ID variable.
    + Carer ID variable name must be equivalent between `carer_map` and `aducust`.
  + `ReceptionDate` --- aducust start date.
  + `DischargeDate` --- aducust end date.
+ Note
  + In this simulation, each carer has between 0 and 10 aducust records.
  + Discharge date must be after reception date.

```{r warning=FALSE}
# data: a variable number (0-10) of aducust records per carer, with start/end dates
# Initialise an empty list to store each carer's records
aducust_list <- vector("list", length = nrow(carer_map))

for (i in seq_len(nrow(carer_map))) {
  n_records <- sample(0:10, 1)

  # Carers with no records keep a NULL entry, which bind_rows() drops
  if (n_records == 0) next

  rec_dates <- as.Date('2020-01-01') + sample(0:1000, n_records, replace = TRUE)
  # Discharge is 1-30 days after reception, so it is always after reception
  dis_dates <- rec_dates + sample(1:30, n_records, replace = TRUE)

  aducust_list[[i]] <- tibble(
    carer_rootnum = carer_map$carer_rootnum[i],
    ReceptionDate = rec_dates,
    DischargeDate = dis_dates
  )
}

# Combine all records into one data frame and tidy up loop variables
aducust <- bind_rows(aducust_list)
rm(i, n_records, rec_dates, dis_dates)
```

Now previewing the first few rows:

```{r}
head(aducust, n = 10) %>%
  waachs_table()
```
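Since the discharge date must be after the reception date, it is worth checking real aducust data before flagging. A minimal base R check on a made-up toy data frame:

```{r}
# Toy aducust records; the second one has its dates the wrong way round
toy_aducust <- data.frame(
  carer_rootnum = c("C1", "C1", "C2"),
  ReceptionDate = as.Date(c("2020-01-01", "2020-05-01", "2021-02-01")),
  DischargeDate = as.Date(c("2020-01-20", "2020-04-01", "2021-02-10"))
)

# Flag records where discharge precedes reception
bad_dates <- toy_aducust$DischargeDate < toy_aducust$ReceptionDate
toy_aducust[bad_dates, ]

# Number of records for each carer that has at least one record
table(toy_aducust$carer_rootnum)
```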


```{r echo=FALSE}
rm(aducust_list, carer_types, carers_per_child, n_children, rootnums, make_rootnum)
```


# Examples

Now that we have simulated data, we can apply `aducust_flag()` to it.

## Example 1: Default use

Applying the function using all defaults:

+ Flags aducust records when the child is between 0 and 18.
+ Does not collapse records in any way.
+ Results in a "many-to-many" join between `carer_map` and `aducust`.

```{r message=TRUE}
eg1 <- aducust_flag(data = aducust,
                    dobmap = dobmap,
                    carer_map = carer_map)
```

Previewing this:

```{r echo=FALSE}
head(eg1) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                                    ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```
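To see why the default output can contain several rows per child, the sketch below reproduces the joining step on toy data with base R `merge()`, then applies a 0 to 18 age window. It assumes overlap-based flagging and a 365.25-day year, and is an illustration rather than the internal implementation of `aducust_flag()`.

```{r}
# Toy inputs: one child (K1) with two carers; carer C1 has two records
toy_carer_map <- data.frame(rootnum       = c("K1", "K1"),
                            carer_type    = c("carer1id", "carer2id"),
                            carer_rootnum = c("C1", "C2"))
toy_aducust <- data.frame(
  carer_rootnum = c("C1", "C1", "C2"),
  ReceptionDate = as.Date(c("2016-01-01", "2019-06-01", "2014-01-01")),
  DischargeDate = as.Date(c("2016-01-15", "2019-06-20", "2014-02-01"))
)
toy_dobmap <- data.frame(rootnum = "K1", dob = as.Date("2015-05-05"))

# Join carers to their records, then attach each child's DOB:
# every carer-record pair for the child becomes its own row
joined <- merge(merge(toy_carer_map, toy_aducust, by = "carer_rootnum"),
                toy_dobmap, by = "rootnum")

# Assumed 0-18 window, approximating a year as 365.25 days
window_from <- joined$dob
window_to   <- joined$dob + round(18 * 365.25)
joined$flag <- ifelse(joined$ReceptionDate <= window_to &
                        joined$DischargeDate >= window_from, "Yes", "No")

joined[, c("rootnum", "carer_rootnum", "ReceptionDate", "DischargeDate", "flag")]
```

Here carer C2's only record ends before the child's birth, so it is flagged "No", while both of C1's records fall inside the window.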

## Example 2: Changing default age

### Example 2.1: Different ages

+ Flag record when child is between 10 and 14:

```{r message=TRUE}
eg2.1 <- aducust_flag(data = aducust,
                      dobmap = dobmap,
                      carer_map = carer_map, 
                      child_start_age = 10, 
                      child_end_age = 14)
```

Previewing this:

```{r echo=FALSE}
head(eg2.1) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                                    ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

### Example 2.2: "Negative" ages

The function also works when `child_start_age` and/or `child_end_age` are negative.

+ Flag aducust records that exist between 1 year before the child was born and age 5.
  
```{r message=TRUE}
eg2.2 <- aducust_flag(data = aducust,
                      dobmap = dobmap,
                      carer_map = carer_map, 
                      child_start_age = -1, 
                      child_end_age = 5)
```

Previewing this:

```{r echo=FALSE}
head(eg2.2) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                                    ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```
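A negative age simply places the start of the window before the child's date of birth. Assuming ages are converted to dates by adding `age * 365.25` days (an assumption; `aducust_flag()` may use a different conversion), the window for `child_start_age = -1` and `child_end_age = 5` can be sketched as:

```{r}
toy_dob <- as.Date("2015-05-05")

# Assumed conversion: age in years * 365.25 days, rounded to whole days
window_start <- toy_dob + round(-1 * 365.25)  # roughly one year before birth
window_end   <- toy_dob + round(5 * 365.25)   # roughly the 5th birthday

c(window_start, window_end)
```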


## Example 3: Non-default variable names

### Example 3.1: Different `data` date variables

+ Set `data_start_date` and `data_end_date` to the names of the start and end date columns in `data`.

```{r message=TRUE}
# Rename ReceptionDate and DischargeDate
eg3.1 <- aducust_flag(data = aducust %>% rename(StartDate = ReceptionDate,
                                                EndDate = DischargeDate),
                      dobmap = dobmap,
                      carer_map = carer_map, 
                      child_start_age = 10, 
                      child_end_age = 14,
                      data_start_date = "StartDate",
                      data_end_date = "EndDate")
```

Previewing this:

```{r echo=FALSE}
head(eg3.1) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                                    ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```
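Passing column names as strings is what lets one function accept differently named inputs. A minimal sketch of the idea, using a hypothetical `flag_window()` function (not part of WAACHShelp) that looks columns up by name with `[[`:

```{r}
# Hypothetical function: column names are supplied as strings, with defaults
flag_window <- function(data, from, to,
                        start_col = "ReceptionDate", end_col = "DischargeDate") {
  # data[[name]] extracts a column by its (string) name
  data[[start_col]] <= to & data[[end_col]] >= from
}

toy_renamed <- data.frame(StartDate = as.Date(c("2016-01-01", "2040-01-01")),
                          EndDate   = as.Date(c("2016-01-15", "2040-01-05")))

# Point the function at the renamed columns instead of the defaults
renamed_flags <- flag_window(toy_renamed,
                             from = as.Date("2015-05-05"), to = as.Date("2033-05-04"),
                             start_col = "StartDate", end_col = "EndDate")
renamed_flags
```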

### Example 3.2: Different ID variable names

+ Set `carer_id_var` to the name of the carer ID column, which must be the same in `data` and `carer_map`.

```{r message=TRUE}
eg3.2 <- aducust_flag(data = aducust %>% rename(OtherID = carer_rootnum),
                      dobmap = dobmap,
                      carer_map = carer_map %>% rename(OtherID = carer_rootnum),
                      carer_id_var = "OtherID")
```

Previewing this:

```{r echo=FALSE}
head(eg3.2) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                                    ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

### Example 3.3: Different DOB variable

+ Set `dobmap_dob_var` to the name of the DOB column in `dobmap`.

```{r message=TRUE}
eg3.3 <- aducust_flag(data = aducust,
                      dobmap = dobmap %>% rename(dateofbirth = dob),
                      carer_map = carer_map,
                      dobmap_dob_var = "dateofbirth")
```

Previewing this:

```{r echo=FALSE}
head(eg3.3) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                                    ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

## Example 4: Summarising Records

### Example 4.1: Summarising within carer

+ `carer_summary=TRUE`
+ Within each carer, returns: 
  + "Yes" if flag is "Yes" across any aducust record.
  + "No" if flag is "No" across all aducust records.

```{r message=TRUE}
eg4.1 <- aducust_flag(data = aducust,
                      dobmap = dobmap,
                      carer_map = carer_map,
                      carer_summary = TRUE)
```

Previewing this:

```{r echo=FALSE}
head(eg4.1) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                                    ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```
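The "any Yes" rule described above amounts to a grouped `any()`. A dplyr sketch on toy flags, assuming one flag per child/carer/record row (the column names are illustrative and not necessarily those returned by `aducust_flag()`):

```{r}
suppressPackageStartupMessages(library(dplyr))

# Toy record-level flags: child K1 has carers C1 and C2; child K2 has carer C3
toy_flags <- data.frame(rootnum       = c("K1", "K1", "K1", "K2"),
                        carer_rootnum = c("C1", "C1", "C2", "C3"),
                        flag          = c("No", "Yes", "No", "No"))

# Within each child-carer pair: "Yes" if any record is "Yes", otherwise "No"
carer_level <- toy_flags %>%
  group_by(rootnum, carer_rootnum) %>%
  summarise(flag = if_else(any(flag == "Yes"), "Yes", "No"), .groups = "drop")

carer_level
```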

### Example 4.2: Summarising across carers

+ `any_carer_summary=TRUE`
+ Across all aducust records for all carers for a child, returns: 
  + "Yes" if any flags are "Yes".
  + "No" if all flags are "No".

```{r message=TRUE, warning=TRUE}
eg4.2 <- aducust_flag(data = aducust,
                      dobmap = dobmap,
                      carer_map = carer_map,
                      any_carer_summary = TRUE)
```

The warning above notes that `carer_summary` is ignored when `any_carer_summary = TRUE`, and that records are collapsed to the child level.

Previewing this:

```{r echo=FALSE}
head(eg4.2) %>%
  dplyr::mutate(dplyr::across(tidyselect::where(~is.character(.) || inherits(., "Date")),
                                    ~gsub("-", "\u2011", as.character(.)))) %>%
  waachs_table()
```

To check that the collapse is correct, i.e., that there is one row per child:

```{r}
nrow(eg4.2) == length(unique(dobmap$rootnum))
```
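Summarising across carers is the same grouped `any()`, grouped by child only. Using the same kind of hypothetical toy flags, the result has one row per child, mirroring the check above:

```{r}
suppressPackageStartupMessages(library(dplyr))

toy_flags <- data.frame(rootnum       = c("K1", "K1", "K1", "K2"),
                        carer_rootnum = c("C1", "C1", "C2", "C3"),
                        flag          = c("No", "Yes", "No", "No"))

# Across all carers and records for a child: "Yes" if any flag is "Yes"
child_level <- toy_flags %>%
  group_by(rootnum) %>%
  summarise(flag = if_else(any(flag == "Yes"), "Yes", "No"))

child_level
nrow(child_level) == n_distinct(toy_flags$rootnum)
```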

# Conclusion

The `aducust_flag()` function prototype provides a consistent way to flag aducust data sets.