Don’t be an absolutist. Use the here package for reproducible workflows

Gentle reminders when using the here package for your RStudio projects
r
workflow
reproducibility
Author
Published

Sunday, March 17, 2024

Doi

TL;DR

Don’t be an absolutist– use relative paths. Use the here package instead of setwd() or getwd() to increase reproducibility and avoid wasting your and other people’s time.

What’s the problem with setwd()?

Since I created this website, I’ve been coding, writing, and reading a lot more which has unequivocally led to a mountain of new files and the forging of new paths– quite literally. At first, I kept things pretty organized, but now it’s nearly impossible to know where I saved such_and_such.txt file without wasting at least 5 minutes of my day.

This is what I used to do:

url1 <- "https://somefile_online_data_source_here.com"
download.file(url1, destfile = "./data_file_here.zip")
unzip("data_file_here.zip", exdir = getwd())
Dat <- readRDS("summaryDat.rds")
Dat2 <- readRDS("SummaryDat2.rds")

My directory will be anywhere on my device unless I have previously specified it using setwd(), but this strategy will soon be an obstacle to saving new information in an organized and reproducible way. If, later on, I change my R scripts to a different folder the original file path won’t work anymore.

here is the solution 📁

The here package allows you to set up a relative path mapped onto your R project directory on every device regardless of your absolute path.

The here function

Suppose my directory is located in the Project folder. The here package is going to look for the .Rproj file and establish the root directory there.

# Project/
#    |
#    |__ data/
#    |    |___  summaryDat.rds
#    |    |___  summaryDat2.rds
#    |
#    |__ blog/
#    |    |_____index.qmd
#    |    |
#    |    |__ post/
#    |    | |______ 2024/
#    |    |       |____ 02/
#    |    |          |____  index.qmd
#    |    |              |____  dat3.R
#    |    |__ img/
#    |      |_____  plots.png
#    |
#    |__ scripts/
#      |____ ind.R
#      |____ cond.R

Here you can see my root directory and how that changes with each iteration of the here command.

library(here)
here::here()
# [1] "C:/Users/jpmonteagudo/Desktop/R/Project"
  here::here("blog")
# [1] "C:/Users/jpmonteagudo/Desktop/R/Project/blog"
  here::here("scripts")
# [1] "C:/Users/jpmonteagudo/Desktop/R/Project/scripts"
# I'll point R to the actual document by providing the full relative path
here::here("blog","post","2024","02","dat3.R")
# [1] "C:/Users/jpmonteagudo/Desktop/R/Project/blog/post/2024/02/dat3.R"

I can also go up several folders at once by using the full relative path. However, when I call the here function again, it sends me back to my root directory.

here::here()
# [1] "C:/Users/jpmonteagudo/Desktop/R/Project"

I would use the here function to get or write files and not just be there. If I don’t need anything from my subdirectory, then R will go back to its root, the .Rproj. For example, saving a .png file with multiple plots involves specifying the relative path using here::here().


## Using ggplot2 to save my plots

ggsave("plots.png",arranged_plots, 
       path = here::here("blog","2024","02","post","img"),
                width = 800,
                height = 600,
                units = "px",
                dpi = 72)

## The same can be done using base R

dev.copy(png,here::here("blog","2024","02","clt","img","plots.png"), width = 800, height = 600)
dev.off()

The set_here function

If I want to “just be somewhere” anytime I open my project, I would use another function– the set_here function. Basically, this function creates a .here file anywhere in your project so you can use this directory as your root. Here’s the description in the function’s syntax

When here encounters such a file, it uses the directory that contains this file as root. This is useful if none of the default criteria apply. You need to restart the R session so that here() picks up the newly created file.

here::set_here("blog/2024")
# Created file .here in C:\Users\jpmonteagudo\Desktop\R\Project\blog\2024. 
# Please start a new R session in the new project directory.

Next, I start a new R session here, and RStudio will automatically set my directory to this folder. I don’t need to open the R project to reach this new directory. It will give me access to the folder’s files, and I can then set a relative path to other files.

# Checking directory in new R session
here::here()
# [1] "C:/Users/jpmonteagudo/Desktop/R/Project/blog/2024"

From this new directory, I can reach files anywhere by using the here::here() function.

The confusing i_am function

This function has given me a headache. The here package is supposed to be a tool that facilitates collaboration and connectivity, but I just couldn’t get it to work until now.

Call the here::i_am() function at the top of your script in the first chunk of your markdown file. It will accept a relative path and then establish the new project root there. So far, it only works when I point R to a specific file I’d like to work with. If I choose a file path that’s not in my project directory, it will just point to the original directory and throw an error. If the current directory is outside of the project where the current script is running, you’ll get an error message: Could not find associated project in working directory or any parent directory.

# You're in the scripts folder working on ind.R but need to access summaryDat2.rds.
#  Simply include the relative path to the data file at the top of your script:
library(readr)
data <- read_csv(here::i_am("data/summaryDat2.rds"))
# From my script I'm now pointing to a folder containing my Dat2.R

The dr_here function

The here::dr_here() shows a default message explaining why the current directory was chosen. You probably won’t use this function often– unless you’re curious and want to understand how the package selects the root directory. However, if you used here::here("file_path") and got an unexpected result, go ahead and call here::dr_here. It’ll most likely ask you to create a .here file or set your directory using the here::i_am() function.

In the end, the here package will make it easy to collaborate and work on your projects on any device by using the here::i_am(), here::here(), and here::set_here() functions.

Citation

For attribution, please cite this work as:
Monteagudo, JP. 2024. “Don’t Be an Absolutist. Use the Here Package for Reproducible Workflows.” March 17, 2024. https://doi.org/10.59350/4a9fr-acc34.