NA values are necessary markers for missing data. However, working with them can be tricky because of their special properties, and care is also needed when reading in and presenting the data.
Properties of NA
Types
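There are different types of NA, denoted by the NA_* constants (NA_integer_, NA_real_, NA_character_); a plain NA is logical. This should be noted when working with NA data in a data.frame. Operations like case_when (at least in the dplyr version used here) require all output data to be of the same type.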
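As a minimal sketch of the point above, the snippet below checks the type behind each NA constant and then uses the matching constant in case_when. The vector x and the labels "big" and "small" are made up for illustration.

```r
library(dplyr)

# typeof() reveals the type carried by each NA constant
na_types <- c(
  plain = typeof(NA),
  int   = typeof(NA_integer_),
  real  = typeof(NA_real_),
  chr   = typeof(NA_character_)
)
print(na_types)

# Hypothetical data: the other outputs are character,
# so the fallback uses NA_character_ to keep every branch the same type
x <- c(1, 5, NA)
size <- case_when(
  x > 3  ~ "big",
  x <= 3 ~ "small",
  TRUE   ~ NA_character_
)
print(size)
```

Using the typed constant keeps every branch of case_when consistent, which avoids a type error in versions that check this strictly.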
Infection
NAs can be infectious in operations, i.e. including them will usually make the result of logical and math operations NA. The result in string processing is more complicated: because base R does not have many functions for string processing, it depends on the implementation of the libraries that you are using.
paste0 and glue::glue convert NA to the string "NA", whereas stringr::str_c retains the infectious property.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
paste0(NA, 1)
## [1] "NA1"
glue::glue("{NA}{1}")
## NA1
stringr::str_c(NA, 1)
## [1] NA
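The infectious behaviour in math and logical operations can be sketched with base R alone. The values are arbitrary, and the variable names are only for illustration. The last two lines show that a logical operation still returns a definite answer when the NA cannot change the outcome.

```r
# Arithmetic and comparisons with NA give NA
arith   <- NA + 1
compare <- NA > 1

# Aggregations are infected too, unless NA is dropped explicitly
total    <- sum(c(1, 2, NA))
total_rm <- sum(c(1, 2, NA), na.rm = TRUE)

# Logical operators return a value when the NA cannot affect the answer
known_false <- NA & FALSE
known_true  <- NA | TRUE

print(list(arith, compare, total, total_rm, known_false, known_true))
```

The na.rm = TRUE argument is the usual way to opt out of the infection in summary functions, at the cost of silently ignoring the missing values.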
Reading in
In human-readable files, NA can be represented by many symbols, e.g. “.” or “ ”. To clean up these values and convert them to NA, one can use the naniar::replace_with_na() family of functions. I do not think this has been integrated well into the mutate(across()) syntax yet, so this is what I use:
tibble(x = ".")
## # A tibble: 1 x 1
## x
## <chr>
## 1 .
tibble(x = ".") %>%
  naniar::replace_with_na_all(~ .x == ".")
## # A tibble: 1 x 1
## x
## <chr>
## 1 <NA>
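If staying within mutate(across()) is preferred, dplyr::na_if() may be a workable alternative. The sketch below assumes the placeholder is "." and appears only in character columns. The tibble raw and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical raw data where "." stands for a missing value
raw <- tibble::tibble(
  id    = c("a", "b", "c"),
  score = c("1", ".", "3")
)

# na_if() turns every value equal to "." into NA, column by column
cleaned <- raw %>%
  mutate(across(where(is.character), ~ na_if(.x, ".")))

print(cleaned)
```

Restricting the selection with where(is.character) keeps the comparison against "." away from numeric columns, where it would not be meaningful.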
Presenting
When presenting the data, the audience may not be R trained and may not understand what NA means. Changing it to text like “missing” may help to bridge the gap.
Use tidyr::replace_na() together with mutate() and across():
mutate(across(everything(), ~replace_na(., "missing")))
tibble::tibble(a = NA)
## # A tibble: 1 x 1
## a
## <lgl>
## 1 NA
tibble::tibble(a = NA) %>%
  mutate(across(everything(), ~ replace_na(., "missing")))
## # A tibble: 1 x 1
## a
## <chr>
## 1 missing
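As far as I know, newer tidyr releases are stricter about the replacement matching the column type, so a variant that converts each column to character first may be more portable. This is a sketch under that assumption. The tibble results and its columns are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical summary table with missing values in both columns
results <- tibble::tibble(
  group = c("a", NA),
  n     = c(10, NA)
)

# Convert to character first so that "missing" matches the column type
presented <- results %>%
  mutate(across(everything(), ~ replace_na(as.character(.x), "missing")))

print(presented)
```

Because every column becomes character, this is best done as the very last step before display rather than in the middle of an analysis.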