| Title: | Various helper functions to aid reproducibility in WAACHS analysis-related tasks |
|---|---|
| Description: | Data analysis requirements for the WAACHS project often involve a range of complex and nuanced functions. This is compounded by different analysts working on similar tasks at different points in time. This (local) package aims to ameliorate this with a set of functions applicable to all analysts (who use R), aiding reproducibility. |
| Authors: | Zac Dempsey [aut, cre] |
| Maintainer: | Zac Dempsey <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.5.3 |
| Built: | 2026-05-28 08:29:05 UTC |
| Source: | https://github.com/The-Kids-Biostats/WAACHShelp |
This function was deprecated because it was no longer required by analysts.
add_rows2(..., id = NULL)

| Argument | Description |
|---|---|
| ... | Other parameters to pass to the function. |
| id | Default NULL |

Almost identical to the original dplyr::add_row, but also looks for the format.sas attribute that haven provides when loading a SAS dataset.
Created by PV (2023).
SAS labelled dataframe
This function flags whether a carer record (custodial, per its name) exists when a child is of a certain age.
aducust_flag(data, dobmap, carer_map, flag_name = "carer_aducust", child_id_var = "rootnum", carer_id_var = "carer_rootnum", data_start_date = "ReceptionDate", data_end_date = "DischargeDate", dobmap_dob_var = "dob", child_start_age = 0, child_end_age = 18, carer_summary = FALSE, any_carer_summary = FALSE)

| Argument | Description |
|---|---|
| data | Input dataset (carer aducust). |
| dobmap | DOBmap file at the child level. |
| carer_map | Mapping file with columns "child ID", "carer ID". Can have multiple rows per child (e.g., one per carer 1, carer 2, NEWBMID). |
| flag_name | Name of flagging variable to return. Default |
| child_id_var | Variable denoting "child ID". Must exist and be called the same thing in |
| carer_id_var | Variable denoting "carer ID". Must exist in |
| data_start_date | Start date to consider in |
| data_end_date | End date to consider in |
| dobmap_dob_var | Date of birth (DOB) variable in |
| child_start_age | Numeric. Start (minimum) age (years) to consider for flagging (default |
| child_end_age | Numeric. End (maximum) age (years) to consider for flagging (default |
| carer_summary | Collapse aducust flags within carer (i.e., for each |
| any_carer_summary | Collapse aducust flags across carers. Default |

Although it is designed for flagging carer custodial records, it can be applied in any situation where one needs to flag whether a carer (or other) record exists while a child (or other individual) is within a certain age range.
For more details, see the vignette.
Flagged dataframe.
If carer_summary is TRUE, we flag whether a specific carer has any aducust records.
If any_carer_summary is TRUE, we flag whether any carer (if there are multiple) has any aducust records.
If any_carer_summary is TRUE, carer_summary is ignored.
    ## Not run:
    # Example 1: Basic use
    aducust_flag(data = carer_aducust %>% rename(carer_rootnum = root),
                 dobmap = dobmap,
                 carer_map = child_carer_map,
                 child_id_var = "NEWUID",
                 carer_id_var = "carer_rootnum",
                 carer_summary = FALSE,
                 any_carer_summary = TRUE
    )
    ## End(Not run)
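The age-window flagging described for aducust_flag can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the overlap rule (a record counts if its date range overlaps the child's age window) and the use of 365.25-day years are all assumptions made for illustration.

```r
# Toy inputs; column names follow the defaults shown in the usage above.
dobmap <- data.frame(rootnum = c(1, 2),
                     dob = as.Date(c("2000-01-01", "2005-06-15")))
carer_map <- data.frame(rootnum = c(1, 2),
                        carer_rootnum = c(101, 102))
aducust_records <- data.frame(
  carer_rootnum = c(101, 102),
  ReceptionDate = as.Date(c("2010-03-01", "2030-01-01")),
  DischargeDate = as.Date(c("2010-09-01", "2030-06-01"))
)

child_start_age <- 0
child_end_age <- 18

# Link child -> carer -> carer records.
linked <- merge(merge(dobmap, carer_map, by = "rootnum"),
                aducust_records, by = "carer_rootnum", all.x = TRUE)

# Age window expressed as dates (assumption: 365.25-day years).
window_start <- linked$dob + child_start_age * 365.25
window_end <- linked$dob + child_end_age * 365.25

# Assumption: a record counts if its date range overlaps the age window.
overlaps <- !is.na(linked$ReceptionDate) &
  linked$ReceptionDate <= window_end &
  linked$DischargeDate >= window_start
linked$carer_aducust <- ifelse(overlaps, "Yes", "No")

linked[, c("rootnum", "carer_rootnum", "carer_aducust")]
```

In this toy data the first child's carer has a record while the child is under 18, whereas the second carer's record falls after the child's 18th birthday.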
This function serves to flag whether an individual is expected to have education data for every year within a "visibility window" based on their date of birth. The fact that "staggered" (June/July) years were introduced in Western Australia in 1997 has been built into the function.
check_att(dob, visibility_min = 2008, visibility_max = 2014, show_expectation = TRUE)

| Argument | Description |
|---|---|
| dob | Date of birth of the individual. Must be class "Date". |
| visibility_min | Minimum year in visibility window. Default |
| visibility_max | Maximum year in visibility window. Default |
| show_expectation | Logical. Show flagged dataframe with expected flag per year in visibility window. Default |

Flagged dataframe.
check_att(dob = as.Date("2000-05-30"))
This function creates an HTML or Word template in the current working directory
using a specified Quarto extension. It copies the template files to the
_extensions/ directory and generates a new Quarto markdown (.qmd) file.
create_markdown(file_name = NULL, directory = "reports", ext_name = "html")

| Argument | Description |
|---|---|
| file_name | A string. The name of the new Quarto markdown (.qmd) file. This must be provided. |
| directory | A string. The name of the directory in which to place the files. Default "reports". |
| ext_name | A string. The extension type to create. Default "html" (alternatives: "word"). |

Adapted from create_template from https://github.com/The-Kids-Biostats/thekidsbiostats.
The function first checks whether a _extensions/ directory exists in the current working
directory. If not, it creates one. It then copies the necessary extension files from the
package's internal data to the _extensions/ directory. Finally, it creates
a new Quarto markdown file based on the extension template.
By default, the reports folder will be selected to house the report.
For more details, see the vignette.
The function assumes that the package WAACHShelp contains the necessary extension files
under ext_qmd/_extensions/.
    ## Not run:
    create_markdown(file_name = "my_doc", ext_name = "word")
    ## End(Not run)
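The three steps described for create_markdown (check for _extensions/, copy the extension files, create the .qmd) can be sketched with base R file utilities. This is an illustration only: the ext_source folder and template.qmd file are hypothetical stand-ins for the files shipped with the package, and everything is written to a temporary directory.

```r
# Work in a throwaway directory so no real project is touched.
project_dir <- file.path(tempdir(), "demo_project")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical stand-in for the extension files shipped with a package.
ext_source <- file.path(tempdir(), "ext_source", "html")
dir.create(ext_source, recursive = TRUE, showWarnings = FALSE)
writeLines(c("---", "title: \"Untitled\"", "---"),
           file.path(ext_source, "template.qmd"))

# Step 1: make sure _extensions/ exists.
ext_dir <- file.path(project_dir, "_extensions")
if (!dir.exists(ext_dir)) dir.create(ext_dir)

# Step 2: copy the extension folder into _extensions/.
file.copy(ext_source, ext_dir, recursive = TRUE)

# Step 3: create the new .qmd in the reports folder from the template.
reports_dir <- file.path(project_dir, "reports")
dir.create(reports_dir, showWarnings = FALSE)
new_file <- file.path(reports_dir, "my_doc.qmd")
file.copy(file.path(ext_dir, "html", "template.qmd"), new_file)

file.exists(new_file)
```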
Adapted from create_project from https://github.com/The-Kids-Biostats/thekidsbiostats.
create_project(project_name = "standard", data = TRUE, reports = TRUE, output = TRUE, documentation = TRUE, other_folders = NULL, R = TRUE)

| Argument | Description |
|---|---|
| project_name | String. Default |
| data | Logical. If |
| reports | Logical. If |
| output | Logical. If |
| documentation | Logical. If |
| other_folders | Vector of strings that contain any other folders that should also be created. Elements should be unique. Default |
| R | Logical. If |

This function creates a directory structure for a new project based on a specified extension.
It can also create additional folders such as data, reports, output and documentation.
The function copies specific files and folders from the chosen extension to the project directory.
For more details, see the vignette.
An interactive window will appear prompting the user to select the folder where the project structure should be created. The default is the current working directory.
    ## Not run:
    create_project(project_name = "investigation_x",
                   other_folders = c("folder1", "folder2"))
    ## End(Not run)
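The folder-creation part of create_project can be sketched in base R as follows. The make_folders helper is hypothetical, the folder names are assumed to match the argument names, and the sketch writes to a temporary directory rather than prompting for a location.

```r
# Hypothetical helper: create the requested folders under a project root.
make_folders <- function(root, data = TRUE, reports = TRUE, output = TRUE,
                         documentation = TRUE, R = TRUE,
                         other_folders = NULL) {
  wanted <- c(
    if (data) "data",
    if (reports) "reports",
    if (output) "output",
    if (documentation) "documentation",
    if (R) "R",
    unique(other_folders)  # extra folders, de-duplicated
  )
  for (folder in wanted) {
    dir.create(file.path(root, folder), recursive = TRUE,
               showWarnings = FALSE)
  }
  invisible(wanted)
}

root <- file.path(tempdir(), "investigation_x")
created <- make_folders(root, other_folders = c("folder1", "folder2"))
list.dirs(root, full.names = FALSE, recursive = FALSE)
```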
Data set specifying the ICD (9 & 10) codes for different events in the morbidity data set.
data(icd_dat)
A data frame where rows correspond to an event and columns correspond to the variable name, morbidity search parameters (diagnosis/ediag, ecode) and ICD code breakdown
Counter variable representing the number of rows associated with any given var
Classification type
Variable name
Type of diagnosis field this ICD code corresponds to (diagnosis & ediag = "diag_ediag", ecode = "ecode")
Letter of ICD code (purely numeric is empty string "")
Lower bound of numeric element of ICD code
Upper bound of numeric element of ICD code
...
Generated internally by package
This function adds flags to an input data set (morbidity at this stage) based on ICD codes. Flags can be added for general categories based on pre-established ICD codes (e.g., any mental health contact, any substance-related contact, any alcohol/tobacco-related contact, etc.) or for a custom set of ICD codes. The file containing these pre-established ICD codes is saved as an .RData file and is trivial to change.
icd_morb_flag(data, dobmap = NULL, flag_category, flag_other_varname, diag_type, diag_type_custom_vars = NULL, diag_type_custom_params, under_age = FALSE, age = 18, person_summary = FALSE, id_var = "rootnum", morb_date_var = "subadm", dobmap_dob_var = "dob", dobmap_other_vars = NULL)

| Argument | Description |
|---|---|
| data | Input dataset (morbidity). |
| dobmap | DOBmap file corresponding to input dataset. |
| flag_category | Type of flag to generate. Takes values from reference file (e.g., MH_morb, Sub_morb, etc.) or "Other" for custom ICD specification and flagging. |
| flag_other_varname | Flag variable name (specified only when |
| diag_type | Diagnosis type. Select from "principal diagnosis", (all) "additional diagnoses", "external cause of injury", "custom". |
| diag_type_custom_vars | Variables to search across when |
| diag_type_custom_params | Search parameters to search across when |
| under_age | Return additional variables corresponding to when participant was strictly under |
| age | Numeric. Age (years) to consider for the |
| person_summary | Summarise results at a person-level. |
| id_var | Joining (ID) variable consistent between |
| morb_date_var | Hospital morbidity date variable in |
| dobmap_dob_var | Date of birth (DOB) variable in |
| dobmap_other_vars | Other variables to carry across from DOBmap file when joining to |

For more details, see the vignette.
Flagged dataframe.
    ## Not run:
    # Example 1: Basic use
    ## Create any mental health or substance-related morbidity flag, "MH_morb"
    ## Searches "principal diagnosis", "additional diagnoses", "external cause of injury".
    ## Create additional flag for whether admission occurred when under 18 years of age
    icd_morb_flag(
      data = morb,
      dobmap = dob,
      flag_category = "MH_morb",
      under_age = TRUE,
      age = 18,
      dobmap_other_vars = c("xyz123", "abc456") # Also join `xyz123`, `abc456` from DOBmap
    )

    # Example 2: Basic use
    ## Create any substance-related morbidity flag, "Sub_morb"
    icd_morb_flag(data = morb,
                  flag_category = "Sub_morb" # Create any substance-related contact flag
    )

    # Example 3: Search *principal diagnosis* and *first additional diagnosis*
    # for a custom set of ICD codes
    ## Call this variable "test_var"
    icd_morb_flag(data = morb,
                  flag_category = "Other",
                  diag_type = "custom",
                  diag_type_custom_vars = c("diagnosis", "ediag20"),
                  diag_type_custom_params = list(
                    "diagnosis" = list("letter" = "F", "lower" = 0, "upper" = 99.9999),
                    "ediag20" = list("letter" = "", "lower" = 0, "upper" = 99.9999)),
                  flag_other_varname = "test_var"
    )

    # Example 4:
    # Search only across principal diagnosis and
    # (all) additional diagnosis fields for a custom set of ICD codes.
    ## Call this variable "test_var2"
    icd_morb_flag(data = morb,
                  flag_category = "Other",
                  diag_type = c("principal diagnosis", "additional diagnoses"),
                  diag_type_custom_params = list(
                    "principal diagnosis" = list("letter" = "F", "lower" = 0, "upper" = 99.9999),
                    "additional diagnoses" = list("letter" = "", "lower" = 0, "upper" = 99.9999)),
                  flag_other_varname = "test_var2"
    )

    # Example 5: Search across (all) additional diagnosis fields and another random field, `dagger`
    ## Call this variable "test_var3"
    icd_morb_flag(data = morb,
                  flag_category = "Other",
                  diag_type = c("custom", "additional diagnoses"),
                  diag_type_custom_vars = "dagger",
                  diag_type_custom_params = list(
                    "dagger" = list("letter" = "F", "lower" = 0, "upper" = 99.9999)),
                  flag_other_varname = "test_var3"
    )

    # Example 6: Searching across multiple ICD code types within a variable
    ## Call this variable "test_var4" -- replicating MH_morb flag
    icd_morb_flag(data = morb,
                  flag_category = "Other",
                  diag_type = c("principal diagnosis", "additional diagnoses", "external cause of injury"),
                  flag_other_varname = "test_var4",
                  diag_type_custom_params = list(
                    "principal diagnosis" = list(
                      list("letter" = "F", "lower" = 0, "upper" = 99.9999),
                      list("letter" = "", "lower" = 290, "upper" = 319.9999)),
                    "additional diagnoses" = list(
                      list("letter" = "F", "lower" = 0, "upper" = 99.9999),
                      list("letter" = "", "lower" = 290, "upper" = 319.9999)),
                    "external cause of injury" = list(
                      list("letter" = "E", "lower" = 950, "upper" = 959.9999),
                      list("letter" = "X", "lower" = 60, "upper" = 84.9999))))
    ## End(Not run)
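To make the letter/lower/upper search parameters concrete, the following base R sketch flags rows where any chosen diagnosis column matches any of a set of ICD ranges. It is a simplified illustration, not the package's code: the icd_in_range helper, the toy morb data and the ediag1 column name are hypothetical.

```r
# Hypothetical helper: does an ICD code match a letter and numeric range?
icd_in_range <- function(code, letter, lower, upper) {
  code <- toupper(trimws(code))
  # Leading letter, or "" when the code is purely numeric.
  prefix <- gsub("[^A-Z]", "", substr(code, 1, 1))
  number <- suppressWarnings(as.numeric(sub("^[A-Z]", "", code)))
  !is.na(number) & prefix == letter & number >= lower & number <= upper
}

# Toy morbidity data (hypothetical values and column names).
morb <- data.frame(
  rootnum = c(1, 1, 2, 3),
  diagnosis = c("F32.1", "J45", "S72", "295.3"),
  ediag1 = c(NA, "F10.2", "X61", NA),
  stringsAsFactors = FALSE
)

# ICD-10 F00-F99 or ICD-9 290-319, searched across both columns.
params <- list(list(letter = "F", lower = 0, upper = 99.9999),
               list(letter = "", lower = 290, upper = 319.9999))
search_cols <- c("diagnosis", "ediag1")

hit <- rep(FALSE, nrow(morb))
for (col in search_cols) {
  for (p in params) {
    hit <- hit | icd_in_range(morb[[col]], p$letter, p$lower, p$upper)
  }
}
morb$test_var <- ifelse(hit, "Yes", "No")
morb
```

Missing codes never match, so a row is flagged "Yes" only when at least one non-missing field falls inside one of the ranges.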
In cases where we have a long data frame (i.e., multiple rows of data per participant) with a flag against each record (e.g., variable x = "Yes"/"No"), this function will collapse this dataframe to a participant level.
person_summary(data, flag_category, flag_category_val = "Yes", grouping_var)

| Argument | Description |
|---|---|
| data | Input dataframe. |
| flag_category | Name of the variable to perform classification on. |
| flag_category_val | String value of the |
| grouping_var | Grouping ID variable that identifies potentially multiple records per participant. |

The collapsing is based on the value of flag_category_val and a grouping variable (participant ID) grouping_var.
Specifically, records will be collapsed such that:
If any record(s) for a participant is equal to flag_category_val, then return "Yes".
If all record(s) for a participant are not equal to flag_category_val, then return "No".
    ## Not run:
    person_summary(data = dat,
                   flag_category = "variable_x",
                   flag_category_val = "Yes",
                   grouping_var = "record_id"
    )
    ## End(Not run)
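The collapsing rule above ("Yes" if any record matches, otherwise "No") can be reproduced in a few lines of base R. This is a sketch of the rule only, using a made-up dat data frame; it is not the package's implementation.

```r
# Toy long-format data: several records per participant.
dat <- data.frame(
  record_id = c(1, 1, 2, 2, 3),
  variable_x = c("No", "Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# TRUE if any record for the participant equals the flag value.
any_yes <- tapply(dat$variable_x == "Yes", dat$record_id, any)

collapsed <- data.frame(
  record_id = as.numeric(names(any_yes)),
  variable_x = ifelse(as.vector(any_yes), "Yes", "No"),
  stringsAsFactors = FALSE
)
collapsed
```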
This function was deprecated because it was no longer required by analysts.
proc_contents(df)

| Argument | Description |
|---|---|
| df | Dataframe to input. |

This little function pulls out the labels and formats from a dataframe and compiles this metadata as a dataframe.
Imitates the "proc contents" function of SAS.
Created by PV (2023).
Dataframe
proc_contents(iris)
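As a rough idea of how label and format metadata can be gathered into a dataframe, the sketch below reads the "label" and "format.sas" attributes that haven attaches to imported columns. The toy data, the get_attr helper and the output column names are assumptions; this is not the deprecated function's code.

```r
# Toy data frame carrying the attributes haven attaches to SAS columns.
df <- data.frame(age = c(10, 12), sex = c("F", "M"),
                 stringsAsFactors = FALSE)
attr(df$age, "label") <- "Age at interview (years)"
attr(df$age, "format.sas") <- "BEST12"
attr(df$sex, "label") <- "Sex of child"

# Hypothetical helper: return NA when an attribute is absent.
get_attr <- function(x, which) {
  value <- attr(x, which, exact = TRUE)
  if (is.null(value)) NA_character_ else as.character(value)
}

contents <- data.frame(
  variable = names(df),
  class = vapply(df, function(x) class(x)[1], character(1),
                 USE.NAMES = FALSE),
  label = vapply(df, get_attr, character(1), which = "label",
                 USE.NAMES = FALSE),
  format = vapply(df, get_attr, character(1), which = "format.sas",
                  USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
contents
```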
This function was deprecated because it was no longer required by analysts.
proc_freq(var1, data = NULL, sort = NULL, min.frq = 0)

| Argument | Description |
|---|---|
| var1 | Name of the variable |
| data | Dataset that contains |
| sort | Optional argument which can take on "asc" or "desc" to indicate the type of sort required. |
| min.frq | Minimum frequency |

Renders an HTML table ready for insertion into an HTML or Word document.
Renders output similar to the "proc freq" function of SAS.
Created by PV (2023).
Dataframe
    test <- iris
    proc_freq(Species, test)
    test[1:4, "Species"] <- NA
    proc_freq(Species, test)
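For readers without the package, a comparable frequency table (counts and percentages, with missing values shown as their own row) can be built in base R. This sketch returns a plain data frame rather than a rendered HTML table, and the column names are assumptions.

```r
test <- iris
test[1:4, "Species"] <- NA

# Counts including missing values.
counts <- table(test$Species, useNA = "ifany")

freq <- data.frame(
  value = names(counts),
  n = as.vector(counts),
  percent = round(100 * as.vector(counts) / sum(counts), 1),
  stringsAsFactors = FALSE
)

# Optional descending sort, analogous to sort = "desc".
freq <- freq[order(-freq$n), ]
freq
```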
General function to save a dataset in usable formats across different platforms. Specifically, will return .csv, .sas7bdat, and .RDS files.
save_waachs(dataframe, path, filename)

| Argument | Description |
|---|---|
| dataframe | Input dataset to save. |
| path | Path to save file to. |
| filename | Name of the file to save. |

Saved object
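The idea of writing one dataset to several formats can be sketched with base R alone. The save_both helper below is hypothetical and covers only the .csv and .RDS outputs; the .sas7bdat output is omitted because it needs a package beyond base R.

```r
# Hypothetical helper: write the same data frame as .csv and .RDS.
save_both <- function(dataframe, path, filename) {
  csv_file <- file.path(path, paste0(filename, ".csv"))
  rds_file <- file.path(path, paste0(filename, ".RDS"))
  write.csv(dataframe, csv_file, row.names = FALSE)
  saveRDS(dataframe, rds_file)
  invisible(c(csv = csv_file, rds = rds_file))
}

out_dir <- tempdir()
files <- save_both(head(iris), out_dir, "iris_head")
file.exists(files)
```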
General tabling function to standardise output and formatting across analysts and projects. Based on gtsummary::tbl_summary
sum_tab(data, ...)

| Argument | Description |
|---|---|
| data | Input dataset. |
| ... | Any other argument relevant to |

Summary table.
This function was deprecated because it was no longer required by analysts.
sumfun(x, na.rm = TRUE, ...)

| Argument | Description |
|---|---|
| x | Vector from input dataframe to summarise. |
| na.rm | (Default TRUE) to remove NA (missing) observations from summary. |
| ... | Any other arguments passed to the component base R functions. |

Created by PV (2023).
Summary table with n (length), miss (number of missing observations), mean, sd (standard deviation), med (median), q25 (first quartile), q75 (third quartile), min (minimum value), max (maximum value).
sumfun(iris$Sepal.Length)
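The summary listed above can be approximated with base R functions. The summarise_vec helper is a hypothetical re-creation, assuming n counts all elements including missing ones; it is not the deprecated function itself.

```r
# Hypothetical re-creation of the summary described above.
summarise_vec <- function(x, na.rm = TRUE) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = na.rm, names = FALSE)
  data.frame(
    n = length(x),             # assumption: counts missing values too
    miss = sum(is.na(x)),
    mean = mean(x, na.rm = na.rm),
    sd = sd(x, na.rm = na.rm),
    med = median(x, na.rm = na.rm),
    q25 = q[1],
    q75 = q[2],
    min = min(x, na.rm = na.rm),
    max = max(x, na.rm = na.rm)
  )
}

res <- summarise_vec(c(iris$Sepal.Length, NA))
res
```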
This function applies a custom theme to ggplot2 plots, incorporating colours to align with the project's visual identity.
theme_waachs(base_size = 12, base_line_size = base_size/22, base_rect_size = base_size/22)

| Argument | Description |
|---|---|
| base_size | Base font size |
| base_line_size | Base line size (default |
| base_rect_size | Base rectangle size (default |

The function determines the operating system and selects appropriate font names for Windows or other systems. It also adjusts colour scales.
A list of ggplot2 theme elements and scale adjustments.
ggplot(mtcars, aes(x = mpg, y = wt, col = factor(cyl))) + geom_point() + theme_waachs()
Quick helper function to easily transpose gtsummary::tbl_summary tables. Best used in conjunction with WAACHShelp::waachs_table.
transpose_tblsum(tbl, ...)

| Argument | Description |
|---|---|
| tbl | tbl_summary input (of class |
| ... | Other parameters (not currently in use). |

data.frame with transposed summary table.
    mtcars %>%
      mutate(cyl = as_factor(cyl)) %>%
      select(cyl, mpg, disp, hp, wt, am) %>%
      tbl_summary(by = cyl) %>%
      modify_header(label ~ "cyl") %>% # Re-label "Characteristic" by stratification variable ("cyl")
      transpose_tblsum() %>%
      waachs_table()
This function was deprecated because it was no longer required by analysts.
twoway(var1, var2, data = NULL, var2lab = NULL)

| Argument | Description |
|---|---|
| var1 | Vector of first variable |
| var2 | Vector of second variable |
| data | Dataset containing |
| var2lab | Label for |

Two-way table similar to the "proc freq" function of SAS with two variables
Created by PV (2023).
Two-way table
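No example ships with this deprecated function. As a loose base R analogue (not the function's own output format), a two-way table with totals and row percentages can be produced as follows, using the built-in mtcars data.

```r
# Two-way counts of cylinders by transmission type.
counts <- table(cyl = mtcars$cyl, am = mtcars$am, useNA = "ifany")

# Row percentages, rounded to one decimal place.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

addmargins(counts)  # counts with row and column totals
row_pct
```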
This function is already used by the team. It filters alphanumeric ICD-9 and ICD-10 codes by letter and by the numeric range of the code.
val_filt(input_vec, letter, lower, upper)

| Argument | Description |
|---|---|
| input_vec | Vector of all admissible ICD codes to filter on. |
| letter | Letter to base filtration on (if purely numeric use empty string "") |
| lower | Lower bound on the numeric element of the ICD code (includes numerics >=lower). |
| upper | Upper bound on the numeric element of the ICD code (includes numerics <=upper). |

Vector with filtered ICD codes.
    # Filter ICD codes in val3 to those between F10 and F10.9 (inclusive).
    ## Not run:
    val_filt(val3, "F", 10.0, 10.9)
    ## End(Not run)
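Since val3 is not defined in the example, the following self-contained sketch shows the kind of filtering the arguments describe. The filter_icd function is a hypothetical stand-in written for illustration, and the val3 values are made up; the real val_filt may treat edge cases differently.

```r
# Hypothetical stand-in: keep codes with the given letter whose numeric
# part lies in [lower, upper].
filter_icd <- function(input_vec, letter, lower, upper) {
  codes <- toupper(trimws(input_vec))
  has_letter <- grepl("^[A-Z]", codes)
  prefix <- ifelse(has_letter, substr(codes, 1, 1), "")
  number <- suppressWarnings(
    as.numeric(ifelse(has_letter, substring(codes, 2), codes))
  )
  keep <- !is.na(number) & prefix == letter &
    number >= lower & number <= upper
  input_vec[keep]
}

val3 <- c("F10", "F10.2", "F10.9", "F11", "G10.1", "303.9")
filter_icd(val3, "F", 10.0, 10.9)    # ICD-10 codes F10 to F10.9
filter_icd(val3, "", 290, 319.9999)  # purely numeric ICD-9 codes
```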
Function to return discrete or continuous colour palette based on WAACHS logo colours.
waachs_palette(type = "discrete", n, visualisation = F, bias = 2, interpolate = "spline", ...)

| Argument | Description |
|---|---|
| type | Type of colour palette to render (values "discrete", "continuous"). |
| n | Number of colours to generate in palette (if |
| visualisation | (Default |
| bias | A positive number representing the spacing between colours at the high end. Passed to |
| interpolate | Interpolation algorithm to pass to |
| ... | Miscellaneous arguments to pass to |

Vector with colour hex codes. If type == "continuous" returns vector of length n. If visualisation == TRUE returns a list containing colour palette vector and ggplot2 visualisation of colour spectrum.
    colours <- waachs_palette(type = "continuous", n = 100, visualisation = TRUE) # Render colour palette
    print(colours$palette) # Print colour palette
    print(colours$plot) # Print plot
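A continuous palette with bias and interpolate arguments can be generated in base R with grDevices::colorRampPalette. The sketch below is an assumption about the general approach, not the package's code, and the two anchor colours are simply the cream and blue hex codes that appear as waachs_table defaults; the actual logo palette may differ.

```r
# Assumed anchor colours (taken from the waachs_table() defaults).
anchors <- c("#FEF0D8", "#89A1AD")

# bias and interpolate are forwarded to grDevices::colorRamp().
ramp <- grDevices::colorRampPalette(anchors, bias = 2,
                                    interpolate = "spline")
palette_100 <- ramp(100)

length(palette_100)
head(palette_100, 3)
```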
Function to apply consistent formatting to summary tables rendered using R. Works with functions from the gtsummary package, or dataframes.
waachs_table(x, font.size = 10, font.size.header = 11, line.spacing = 1.5, padding = 2.5, body_bg_col = "#FEF0D8", header_bg_col = "#89A1AD", header_text_col = "black", highlight = NULL, highlight_darken = 0.3, font_family = "Barlow", ...)

| Argument | Description |
|---|---|
| x | A table, typically a data.frame, tibble, or output from gtsummary. |
| font.size | The font size for text in the body of the table, defaults to 10 (passed through to set_flextable_defaults). |
| font.size.header | The font size for text in the header of the table, defaults to 11. |
| line.spacing | Line spacing for the table, defaults to 1.5 (passed through to set_flextable_defaults). |
| padding | Padding around all four sides of the text within the cell, defaults to 2.5 (passed through to set_flextable_defaults). |
| body_bg_col | Body background colour (default WAACHS cream). |
| header_bg_col | Header background colour (default WAACHS blue). |
| header_text_col | Header text colour (default black). |
| highlight | A numeric vector specifying rows to highlight. |
| highlight_darken | A numeric value specifying the amount by which |
| font_family | Font family for table (default Barlow). |
| ... | Other arguments passed to |

Inspired and based on thekidsbiostats::thekids_table().
head(mtcars) %>% waachs_table()