R Foundations for Spatial Biology

Process recommendations, best practices, and workflow guidance for each lesson in Course 0.

Course 0 · 8–10 Hours · Beginner
L1

RStudio Environment & Project Management

Configuring for reproducibility and high-memory spatial data

Recommended Process

Create a new R Project for every spatial analysis. Use a consistent directory structure: data/raw/ (immutable), data/processed/, scripts/, output/figures/, output/tables/. Never use setwd() — R Projects handle paths automatically.
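
The layout above could be scaffolded from the console. The following is a minimal sketch in base R. It builds the tree under a temporary directory (an assumption made so the example is safe to run anywhere; inside an R Project the root would simply be "."), and counts.csv is a hypothetical file name.

```r
# Sketch: scaffold the recommended layout under a project root.
# `project_root` is a placeholder; inside an R Project it would be ".".
project_root <- file.path(tempdir(), "spatial_project")

dirs <- c("data/raw", "data/processed", "scripts",
          "output/figures", "output/tables")

for (d in dirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Build paths relative to the project root instead of calling setwd()
raw_counts_path <- file.path("data", "raw", "counts.csv")  # hypothetical file name
```

Because every path is relative to the project root, the same script should run unchanged on a collaborator's machine.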

Memory Configuration

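Spatial datasets routinely exceed 2 GB in memory. Create a .Renviron file in your project root to raise R's vector memory limit (R_MAX_VSIZE). This setting raises a cap rather than reserving memory, so choose a value your machine's RAM can support: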

    # Place this file in your project root directory
    # Restart R after creating or editing .Renviron
    R_MAX_VSIZE=16Gb
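
After restarting R, one way to confirm that the setting was picked up is to read the environment variable back. This is a small sketch using only base R; an empty result is assumed to mean the variable is unset and R is using its platform default.

```r
# Read the limit R picked up from .Renviron; "" means the variable is unset
max_vsize <- Sys.getenv("R_MAX_VSIZE", unset = "")

if (nzchar(max_vsize)) {
  message("R_MAX_VSIZE is set to ", max_vsize)
} else {
  message("R_MAX_VSIZE is not set; R is using its platform default")
}
```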

Avoid This

Do not enable "Restore .RData into workspace at startup" in RStudio Global Options. Large spatial objects saved to .RData slow down startup and cause crashes. Disable this setting and always reload data from scripts.

Dependency Management

Initialize renv at the start of every project. Run renv::init() to create a lockfile. Commit renv.lock to version control. Collaborators and reviewers restore the exact environment with renv::restore().

L2

Data Manipulation with Tidyverse

Cleaning and structuring spatial metadata

Recommended Process

Load spatial metadata (barcodes, coordinates, tissue positions) as tibbles. Use dplyr pipelines to join gene expression matrices with coordinate data. Keep raw data untouched — always create new objects for filtered or transformed versions.

    # Example: join expression data with spatial coordinates
    library(tidyverse)

    coords <- read_csv("spatial/tissue_positions.csv")
    expr <- read_csv("filtered_feature_bc_matrix/expression.csv")

    spatial_data <- expr |>
      # Sum UMIs before the join so coordinate columns are not counted
      mutate(total_umi = rowSums(across(where(is.numeric)))) |>
      left_join(coords, by = "barcode") |>
      filter(in_tissue == 1)

Performance Tip

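For datasets exceeding 500,000 rows, consider dtplyr, which translates dplyr verbs into data.table code. Wrap the table in lazy_dt(), write the pipeline with the usual dplyr syntax, and finish with as_tibble() or collect() to run it. Using data.table directly is also an option, but it has its own syntax. Speedups on large spatial tables can be substantial, though they vary with the operation and data size.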
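
As an illustration of the dtplyr route, here is a minimal sketch on a toy table. The column names and values are made up for the example, and the dplyr and dtplyr packages are assumed to be installed.

```r
library(dplyr)
library(dtplyr)

# Toy per-spot table; column names and values are illustrative
spots <- tibble(
  barcode   = paste0("spot", 1:6),
  in_tissue = c(1, 1, 0, 1, 0, 1),
  total_umi = c(1200, 800, 50, 1500, 30, 950)
)

result <- spots |>
  lazy_dt() |>                              # wrap so dplyr verbs are translated to data.table
  filter(in_tissue == 1) |>                 # same dplyr syntax as on a tibble
  summarise(mean_umi = mean(total_umi)) |>
  as_tibble()                               # triggers the actual computation

result
```

Nothing is computed until as_tibble() is called, so intermediate steps of the pipeline stay cheap.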

L3

Visualization with ggplot2

Plotting data in a physical coordinate system

Recommended Process

Map gene expression to spatial coordinates using geom_point() with aes(x = x_coord, y = y_coord, color = expression). Flip the y-axis with scale_y_reverse() to match histology image orientation. Use coord_fixed() to preserve tissue proportions.

    # Spatial gene expression plot
    ggplot(spatial_data, aes(x = col, y = row, color = gene_ACTB)) +
      geom_point(size = 1.2) +
      scale_color_viridis_c(option = "magma") +
      scale_y_reverse() +
      coord_fixed() +
      theme_minimal() +
      labs(title = "ACTB Expression", color = "Log UMI")

Color Accessibility

Always use colorblind-safe palettes. The viridis family (viridis, magma, inferno, plasma) is perceptually uniform and safe for all common forms of color vision deficiency. Avoid red-green gradients.

Avoid This

Do not use rainbow() or heat.colors() for spatial plots. These palettes create perceptual bands that distort data interpretation and exclude colorblind viewers.

L4

Bioconductor Objects & Genomic Structures

Transitioning to spatial-specific containers

Recommended Process

Install Bioconductor packages through BiocManager, not install.packages(). Bioconductor enforces version compatibility across the 2,000+ packages in the ecosystem. Use SpatialExperiment as the primary data container for spatial data.

    # Install and verify Bioconductor packages
    if (!require("BiocManager")) install.packages("BiocManager")
    BiocManager::install("SpatialExperiment")

    # Verify installation
    library(SpatialExperiment)
    packageVersion("SpatialExperiment")
    BiocManager::version()

Object Structure

A SpatialExperiment stores: assays (gene expression counts), colData (per-spot metadata), rowData (per-gene metadata), spatialCoords (x/y positions), and imgData (histology images). Access spatial coordinates with spatialCoords(spe).
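
To make the structure concrete, the sketch below builds a small SpatialExperiment from simulated data and reads the coordinates back. The gene names, spot names, and grid layout are invented for illustration, and the SpatialExperiment package is assumed to be installed.

```r
library(SpatialExperiment)

set.seed(1)

# Toy counts: 5 genes (rows) x 10 spots (columns)
counts <- matrix(
  rpois(50, lambda = 5), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("spot", 1:10))
)

# Toy coordinates: one row per spot, laid out on a 5 x 2 grid
xy <- cbind(
  x = as.numeric(rep(1:5, times = 2)),
  y = as.numeric(rep(1:2, each = 5))
)
rownames(xy) <- colnames(counts)

spe <- SpatialExperiment(
  assays = list(counts = counts),
  spatialCoords = xy
)

dim(spe)                   # genes x spots
head(spatialCoords(spe))   # numeric matrix of x/y positions
colData(spe)               # per-spot metadata
```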

Avoid This

Do not mix Bioconductor release versions. If your R version maps to Bioconductor 3.18, do not install packages from 3.19. Run BiocManager::valid() to check for version mismatches.

L5

Statistical Foundations for Transcriptomics

Identifying spatial patterns and variable genes

Recommended Process

Begin with descriptive statistics: distribution of UMI counts per spot, genes detected per spot, and mitochondrial gene percentage. Then test for spatial autocorrelation using Moran's I to identify spatially variable genes — genes whose expression is spatially patterned rather than random.
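
The descriptive statistics named above can be computed directly from a count matrix. This is a minimal base R sketch on a toy matrix. The gene names are illustrative, and the "mt-" prefix for mitochondrial genes is an assumption that depends on the species and annotation.

```r
# Toy count matrix: genes in rows, spots in columns
counts <- matrix(
  c(10, 0, 5,
     2, 3, 0,
     4, 1, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Actb", "Gapdh", "mt-Co1"), c("spot1", "spot2", "spot3"))
)

umi_per_spot   <- colSums(counts)       # total UMIs per spot
genes_per_spot <- colSums(counts > 0)   # genes with at least one UMI

# Assumed naming convention: mitochondrial genes start with "mt-" (or "MT-")
is_mito  <- grepl("^mt-", rownames(counts), ignore.case = TRUE)
pct_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / umi_per_spot

qc <- data.frame(umi_per_spot, genes_per_spot, pct_mito)
qc
```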

Spatial Autocorrelation

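Moran's I typically ranges from about -1 (dispersed: neighboring spots have dissimilar values) to +1 (clustered: neighboring spots have similar values). A value near 0 (more precisely, near its null expectation of -1/(n - 1) for n spots) indicates a random spatial distribution. Test statistical significance with a permutation test. Genes with high Moran's I and low p-values are spatially variable genes (SVGs).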
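
To show what the statistic and the permutation test actually compute, here is a base R sketch on a toy 5 x 5 grid. The neighbor definition (spots sharing an edge), the binary weights, and the 999 permutations are assumptions chosen for illustration. On real data a dedicated package such as spdep would normally be used instead of a hand-written function.

```r
# Moran's I for values x and a spatial weights matrix w (zero diagonal)
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy layout: 5 x 5 grid of spots; neighbors share an edge (rook adjacency)
grid <- expand.grid(col = 1:5, row = 1:5)
d <- as.matrix(dist(grid, method = "manhattan"))
w <- (d == 1) * 1   # binary weights: 1 for adjacent spots, 0 otherwise

# Expression increases along columns: a spatially clustered pattern
x <- as.numeric(grid$col)

observed <- morans_i(x, w)

# Permutation test: shuffle values across spots to build the null distribution
set.seed(42)
null_i <- replicate(999, morans_i(sample(x), w))
p_value <- (sum(null_i >= observed) + 1) / (length(null_i) + 1)

observed
p_value
```

For this gradient the observed statistic is 0.75, well above what shuffled values produce, which is the signature of a spatially variable gene.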

Avoid This

Do not apply standard differential expression methods (DESeq2, edgeR) to spatial data without accounting for spatial autocorrelation. Spatial data violates the independence assumption. Use spatially-aware methods instead.

L6

Reproducible Research with R Markdown

Dynamic reporting for spatial insights

Recommended Process

Write every analysis as an R Markdown document from the start — not as a script that you later convert. Set echo = TRUE and message = FALSE in global chunk options. Include a YAML header with title, author, date, and output format.

    ---
    title: "Spatial Analysis: Mouse Brain Visium"
    author: "Your Name"
    date: "`r Sys.Date()`"
    output:
      html_document:
        toc: true
        code_folding: show
    ---

    ```{r setup, include=FALSE}
    knitr::opts_chunk$set(echo = TRUE, message = FALSE, fig.width = 8, fig.height = 6)
    ```

Figure Management

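For figures embedded in the knitted report, set dimensions in chunk options (fig.width and fig.height, in inches) rather than calling ggsave() inside chunks. For publication-quality output, also set dpi = 300. Export final figures as separate files with ggsave(), passing the width, height, and resolution the journal specifies.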
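
Exporting a final figure might look like the sketch below. The 85 mm width is a hypothetical journal specification, the toy data and file name are invented for the example, and the file is written to a temporary directory rather than output/figures/ so the code can run anywhere.

```r
library(ggplot2)

# Toy spot table; column names are illustrative
spots <- data.frame(col = rep(1:5, times = 5), row = rep(1:5, each = 5))
spots$expression <- spots$col + spots$row

p <- ggplot(spots, aes(x = col, y = row, color = expression)) +
  geom_point(size = 3) +
  scale_color_viridis_c(option = "magma") +
  scale_y_reverse() +
  coord_fixed() +
  theme_minimal()

# Hypothetical journal spec: 85 mm single-column width at 300 dpi
out_file <- file.path(tempdir(), "figure1.png")
ggsave(out_file, plot = p, width = 85, height = 85, units = "mm", dpi = 300)
```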

Finalize with renv

Before sharing your report, run renv::snapshot() to capture the exact package versions used. Include the renv.lock file alongside your .Rmd file so reviewers can reproduce your environment exactly.

01

Install R and RStudio

One-time setup before starting Course 0

Step-by-Step

1. Download R from https://cran.r-project.org (version 4.3+ recommended).
2. Download RStudio Desktop from https://posit.co/download/rstudio-desktop/.
3. Install R first, then install RStudio.
4. Open RStudio and verify: run R.version.string in the console.

02

Configure RStudio Settings

Optimize for spatial data workflows

Global Options

Tools → Global Options → General:
☐ Uncheck "Restore .RData into workspace at startup"
☐ Set "Save workspace to .RData on exit" to Never
☑ Check "Automatically notify me of updates to RStudio"

03

Install Essential Packages

Base packages needed for the entire course series

    # Core packages
    install.packages(c("tidyverse", "rmarkdown", "renv", "devtools"))

    # Bioconductor
    if (!require("BiocManager")) install.packages("BiocManager")
    BiocManager::install(c(
      "SummarizedExperiment",
      "SpatialExperiment",
      "scater",
      "scran"
    ))

    # Verify that the packages are installed and report their versions
    sapply(c("tidyverse", "SpatialExperiment", "renv"),
           \(pkg) as.character(packageVersion(pkg)))

Top 10 Mistakes in Course 0

Avoid these before moving to Course 1

1. Using setwd() Instead of R Projects

Scripts with setwd("C:/Users/me/...") break on every other computer. Use R Projects — paths are automatically relative.

2. Saving .RData on Exit

A 4 GB SpatialExperiment object in .RData means a 4 GB file loaded every time you open the project. Disable this immediately.

3. Installing Bioconductor Packages with install.packages()

CRAN and Bioconductor are separate repositories. Always use BiocManager::install() for Bioconductor packages to ensure version compatibility.

4. Forgetting coord_fixed() on Spatial Plots

Without coord_fixed(), ggplot2 stretches the axes to fill the plot area. The tissue appears distorted. Always lock the aspect ratio.

5. Not Flipping the Y-Axis

Histology images have the origin at the top-left. R plots have the origin at the bottom-left. Use scale_y_reverse() to match image orientation.

6. Using rainbow() Color Palettes

Rainbow palettes create perceptual artifacts and exclude colorblind viewers. Use viridis for continuous data and scale_color_brewer() for categorical data.

7. Running Out of Memory Silently

R does not always warn before running out of memory. Monitor usage with pryr::mem_used() and configure .Renviron for large datasets.
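
Base R also offers a quick way to check how much memory an object occupies. The sketch below uses a toy matrix as a stand-in for a large spatial object. The sizes are illustrative only.

```r
# Toy stand-in for a large object: 1,000 x 1,000 numeric matrix (about 8 MB)
m <- matrix(0, nrow = 1000, ncol = 1000)

size_bytes <- as.numeric(object.size(m))
print(object.size(m), units = "MB")

# gc() runs a garbage collection and reports the memory R currently has in use
print(gc())
```

Checking object sizes before and after each major step makes it easier to spot where memory use grows.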

8. Not Using renv

Package versions change frequently. An analysis that works today may fail in 6 months because of a package update. Initialize renv at the start of every project.

9. Modifying Raw Data Files

Never overwrite files in data/raw/. Write processed data to data/processed/. Raw data is your ground truth — treat the raw data directory as read-only.

10. Knitting R Markdown at the End

Write in R Markdown from the start, and knit frequently. Discovering errors during the final knit — after hours of work — wastes significant time.

Course 0 Completion Readiness Checklist

Verify these before starting Course 1
  • R (version 4.3+) and RStudio Desktop are installed and functional
  • RStudio is configured: .RData restore disabled, workspace save set to Never
  • Core packages installed: tidyverse, rmarkdown, renv, BiocManager
  • Bioconductor packages installed: SpatialExperiment, scater, scran
  • Can create an R Project with proper directory structure (data/raw, data/processed, scripts, output)
  • Can write a dplyr pipeline to filter, mutate, and join tibbles
  • Can create a ggplot2 scatter plot with color mapping and coord_fixed()
  • Can use scale_y_reverse() and viridis color palettes for spatial-style plots
  • Can create a SpatialExperiment object and access spatialCoords()
  • Understand Moran's I as a measure of spatial autocorrelation
  • Can initialize renv and create a lockfile
  • Can write and knit an R Markdown document with code chunks and figures
  • Capstone project completed: reproducible spatial data exploration report