Process recommendations, best practices, and workflow guidance for each lesson in Course 0.
Course 0 · 8–10 Hours · Beginner
Create a new R Project for every spatial analysis. Use a consistent directory structure: data/raw/ (immutable), data/processed/, scripts/, output/figures/, output/tables/. Never use setwd(): R Projects handle paths automatically.
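A minimal base R sketch of the directory layout described above. The project root here is a hypothetical folder under tempdir() so the example has no side effects; in a real R Project the root is the folder that contains the .Rproj file. The file name counts.csv is also only illustrative.

```r
# Hypothetical project root (assumption): a throwaway folder under tempdir().
project_root <- file.path(tempdir(), "spatial-project")

# Directory layout recommended in the text.
project_dirs <- c(
  "data/raw",        # immutable inputs
  "data/processed",  # derived data
  "scripts",
  "output/figures",
  "output/tables"
)

# recursive = TRUE creates parent folders (data/, output/) as needed.
for (d in project_dirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Build paths relative to the project root instead of calling setwd().
raw_counts_path <- file.path(project_root, "data", "raw", "counts.csv")
```

Building every path with file.path() from a single root keeps scripts portable across operating systems and machines.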
Spatial datasets routinely exceed 2 GB in memory. Create a .Renviron file in your project root to raise the memory ceiling available to R.
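The original configuration snippet is not reproduced here, so the following is only a sketch under assumptions: it uses the R_MAX_VSIZE environment variable (which caps R's vector heap) with an illustrative value of 16Gb, and it writes to a temporary file rather than a real project root. Whether this setting is needed, and what value is sensible, depends on your platform and installed RAM.

```r
# Sketch only: write to a temporary file, not a real project's .Renviron.
renviron_path <- file.path(tempdir(), ".Renviron")

# R_MAX_VSIZE and the 16Gb value are assumptions chosen for illustration.
writeLines("R_MAX_VSIZE=16Gb", renviron_path)

# R normally reads .Renviron at startup. readRenviron() loads the file into
# the current session's environment variables so the entry can be checked;
# the memory limit itself only changes after R is restarted.
readRenviron(renviron_path)
Sys.getenv("R_MAX_VSIZE")
```

In practice the line would live in the .Renviron file next to the .Rproj file, followed by a restart of the R session.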
Do not enable "Restore .RData into workspace at startup" in RStudio Global Options. Large spatial objects saved to .RData slow down startup and cause crashes. Disable this setting and always reload data from scripts.
Initialize renv at the start of every project. Run renv::init() to create a lockfile. Commit renv.lock to version control. Collaborators and reviewers restore the exact environment with renv::restore().
Load spatial metadata (barcodes, coordinates, tissue positions) as tibbles. Use dplyr pipelines to join gene expression matrices with coordinate data. Keep raw data untouched — always create new objects for filtered or transformed versions.
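A small sketch of the join described above, assuming the tibble and dplyr packages are installed. The barcodes, column names, and values are made-up toy data standing in for files read from data/raw/.

```r
library(tibble)
library(dplyr, warn.conflicts = FALSE)

# Toy spot metadata (hypothetical barcodes and coordinates).
spot_coords <- tibble(
  barcode = c("AAAC-1", "AAAG-1", "AACT-1"),
  x_coord = c(10, 20, 30),
  y_coord = c(5, 5, 15)
)

# Toy expression values in long format, one row per spot for one gene.
expression_long <- tibble(
  barcode = c("AAAC-1", "AAAG-1", "AACT-1"),
  gene = "GeneA",
  expression = c(0, 12, 3)
)

# Join into a NEW object; the input tibbles are left untouched.
spots_joined <- spot_coords %>%
  inner_join(expression_long, by = "barcode")

# Filtered versions also get their own name.
spots_expressed <- spots_joined %>%
  filter(expression > 0)
```

Keeping spot_coords, spots_joined, and spots_expressed as separate objects makes it easy to trace which filtering step produced which result.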
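For datasets exceeding 500,000 rows, use data.table directly, or use dtplyr to translate dplyr pipelines into data.table code. With dtplyr the dplyr verbs stay the same (wrap the input in lazy_dt() and finish with as_tibble()), and operations on large spatial tables often run 5–10× faster, depending on the operation and the data.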
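A sketch of the dtplyr approach, assuming data.table, dtplyr, and dplyr are installed. The data frame is simulated and the column names are hypothetical; no speed-up is measured here.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Simulated per-spot table (toy data).
set.seed(1)
spots <- data.frame(
  cluster = sample(c("A", "B", "C"), 1000, replace = TRUE),
  umi_count = rpois(1000, lambda = 500)
)

cluster_summary <- spots %>%
  lazy_dt() %>%                      # record dplyr verbs for translation
  filter(umi_count > 0) %>%
  group_by(cluster) %>%
  summarise(mean_umi = mean(umi_count), n_spots = n()) %>%
  as_tibble()                        # run the data.table code, return a tibble
```

Nothing is computed until as_tibble() is called, which lets dtplyr hand the whole pipeline to data.table in one step.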
Map gene expression to spatial coordinates using geom_point() with aes(x = x_coord, y = y_coord, color = expression). Flip the y-axis with scale_y_reverse() to match histology image orientation. Use coord_fixed() to preserve tissue proportions.
Always use colorblind-safe palettes. The viridis family (viridis, magma, inferno, plasma) is perceptually uniform and safe for all common forms of color vision deficiency. Avoid red-green gradients.
Do not use rainbow() or heat.colors() for spatial plots. These palettes create perceptual bands that distort data interpretation and exclude colorblind viewers.
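The plotting recommendations above can be combined into one sketch, assuming ggplot2 is installed. The spot grid and expression values are simulated; x_coord, y_coord, and expression are the column names used in the text.

```r
library(ggplot2)

# Simulated 10 x 10 grid of spots with toy expression values.
set.seed(42)
spots <- data.frame(
  x_coord = rep(1:10, times = 10),
  y_coord = rep(1:10, each = 10),
  expression = rpois(100, lambda = 5)
)

spatial_plot <- ggplot(spots, aes(x = x_coord, y = y_coord, color = expression)) +
  geom_point(size = 3) +
  scale_color_viridis_c(option = "viridis") +  # perceptually uniform palette
  scale_y_reverse() +                          # match image orientation
  coord_fixed() +                              # preserve tissue proportions
  theme_minimal()
```

Printing spatial_plot draws the figure. Swapping option = "viridis" for "magma", "inferno", or "plasma" selects the other palettes named above.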
Install Bioconductor packages through BiocManager, not install.packages(). Bioconductor enforces version compatibility across the 2,000+ packages in the ecosystem. Use SpatialExperiment as the primary data container for spatial data.
A SpatialExperiment stores: assays (gene expression counts), colData (per-spot metadata), rowData (per-gene metadata), spatialCoords (x/y positions), and imgData (histology images). Access spatial coordinates with spatialCoords(spe).
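A minimal sketch of building a SpatialExperiment by hand, assuming the Bioconductor package SpatialExperiment is installed. The counts, gene names, and coordinates are simulated, and no histology image is attached, so imgData stays empty.

```r
library(SpatialExperiment)

# Simulated counts: 5 genes x 4 spots (toy data).
set.seed(1)
counts <- matrix(
  rpois(20, lambda = 3), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("spot", 1:4))
)

# Spot positions as a numeric matrix with one row per spot.
xy <- matrix(
  c(1, 2, 1, 2, 1, 1, 2, 2), ncol = 2,
  dimnames = list(colnames(counts), c("x", "y"))
)

spe <- SpatialExperiment(
  assays = list(counts = counts),
  colData = DataFrame(in_tissue = rep(TRUE, 4), row.names = colnames(counts)),
  rowData = DataFrame(symbol = rownames(counts)),
  spatialCoords = xy
)

coords <- spatialCoords(spe)          # x/y positions
count_matrix <- assay(spe, "counts")  # gene expression counts
```

Real datasets are usually loaded with a reader function rather than assembled manually; the manual version is shown only to make the slots concrete.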
Do not mix Bioconductor release versions. If your R version maps to Bioconductor 3.18, do not install packages from 3.19. Run BiocManager::valid() to check for version mismatches.
Begin with descriptive statistics: distribution of UMI counts per spot, genes detected per spot, and mitochondrial gene percentage. Then test for spatial autocorrelation using Moran's I to identify spatially variable genes — genes whose expression is spatially patterned rather than random.
Moran's I typically ranges from about -1 (dispersed) to +1 (clustered). A value near 0 (more precisely, near its null expectation of -1/(n-1) for n spots) indicates random spatial distribution. Test statistical significance with a permutation test. Genes with high Moran's I and low p-values are spatially variable genes (SVGs).
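To make the statistic concrete, here is a base R sketch of Moran's I with a permutation test on a simulated grid. The binary neighbour weights (spots at distance 1 are neighbours), the toy gene, and the number of permutations are all assumptions for illustration; dedicated spatial packages offer more weighting schemes and faster implementations.

```r
# Moran's I for a numeric vector and a spatial weights matrix.
morans_i <- function(values, weights) {
  n <- length(values)
  centered <- values - mean(values)
  numerator <- sum(weights * outer(centered, centered))
  denominator <- sum(centered^2)
  (n / sum(weights)) * (numerator / denominator)
}

# Simulated 6 x 6 grid of spots.
grid <- expand.grid(x = 1:6, y = 1:6)
distances <- as.matrix(dist(grid))

# Binary weights: 1 for directly adjacent spots, 0 otherwise (and for self).
weights <- (distances > 0 & distances <= 1) * 1

# Toy gene whose expression increases along x, plus noise.
set.seed(123)
patterned <- grid$x + rnorm(nrow(grid), sd = 0.5)

observed_i <- morans_i(patterned, weights)

# Permutation test: shuffle values across spots to break spatial structure.
n_perm <- 999
permuted_i <- replicate(n_perm, morans_i(sample(patterned), weights))

# One-sided p-value for positive autocorrelation (clustering).
p_value <- (sum(permuted_i >= observed_i) + 1) / (n_perm + 1)
```

The +1 in the numerator and denominator counts the observed arrangement as one of the permutations, so the p-value can never be exactly zero.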
Do not apply standard differential expression methods (DESeq2, edgeR) to spatial data without accounting for spatial autocorrelation. Spatial data violates the independence assumption. Use spatially-aware methods instead.
Write every analysis as an R Markdown document from the start — not as a script that you later convert. Set echo = TRUE and message = FALSE in global chunk options. Include a YAML header with title, author, date, and output format.
For figures rendered in the knitted report, set dimensions in chunk options rather than in ggsave(). Use fig.width and fig.height in inches. For publication, also set dpi = 300. Save final figures separately with ggsave() using exact journal specifications.
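The chunk options mentioned above can be set once for the whole document. This sketch assumes knitr is installed and that the code sits in a setup chunk near the top of the .Rmd file; the 6 x 5 inch figure size is an illustrative value, not a journal requirement.

```r
library(knitr)

# Global defaults for every chunk in the document.
opts_chunk$set(
  echo = TRUE,      # show code in the report
  message = FALSE,  # hide package startup messages
  fig.width = 6,    # inches (illustrative)
  fig.height = 5,   # inches (illustrative)
  dpi = 300         # publication-quality resolution
)
```

Individual chunks can still override any of these defaults in their own chunk header.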
Before sharing your report, run renv::snapshot() to capture the exact package versions used. Include the renv.lock file alongside your .Rmd file so reviewers can reproduce your environment exactly.
1. Download R from https://cran.r-project.org (version 4.3+ recommended).
2. Download RStudio Desktop from https://posit.co/download/rstudio-desktop/.
3. Install R first, then install RStudio.
4. Open RStudio and verify: run R.version.string in the console.
Tools → Global Options → General:
☐ Uncheck "Restore .RData into workspace at startup"
☐ Set "Save workspace to .RData on exit" to Never
☑ Check "Automatically notify me of updates to RStudio"
Scripts with setwd("C:/Users/me/...") break on every other computer. Use R Projects — paths are automatically relative.
A 4 GB SpatialExperiment object in .RData means a 4 GB file loaded every time you open the project. Disable this immediately.
CRAN and Bioconductor are separate repositories. Always use BiocManager::install() for Bioconductor packages to ensure version compatibility.
Without coord_fixed(), ggplot2 stretches the axes to fill the plot area. The tissue appears distorted. Always lock the aspect ratio.
Histology images have the origin at the top-left. R plots have the origin at the bottom-left. Use scale_y_reverse() to match image orientation.
Rainbow palettes create perceptual artifacts and exclude colorblind viewers. Use viridis for continuous data and scale_color_brewer() for categorical data.
R does not always warn before running out of memory. Monitor usage with pryr::mem_used() (or lobstr::mem_used(), from the package that supersedes pryr) and configure .Renviron for large datasets.
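If pryr is not installed, base R gives a rough picture of memory use. This sketch substitutes object.size() and gc() for pryr::mem_used(); the matrix is a simulated stand-in for an expression matrix.

```r
# Stand-in dense matrix: 2,000 genes x 500 spots of doubles (toy data).
expr <- matrix(0, nrow = 2000, ncol = 500)

# Size of a single object, in bytes and as a human-readable string.
size_bytes <- as.numeric(object.size(expr))
size_label <- format(object.size(expr), units = "MB")

# gc() triggers garbage collection and returns a matrix summarising
# the memory currently used by the R session.
mem_summary <- gc()
```

Each double takes 8 bytes, so a dense matrix needs roughly 8 x rows x columns bytes; estimating this before loading a dataset helps avoid out-of-memory failures.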
Package versions change frequently. An analysis that works today may fail in 6 months because of a package update. Initialize renv at the start of every project.
Never overwrite files in data/raw/. Write processed data to data/processed/. Raw data is your ground truth — treat the raw data directory as read-only.
Write in R Markdown from the start, and knit frequently. Discovering errors during the final knit — after hours of work — wastes significant time.