📘 Experimental Design Summary

This analysis is based on the single-cell RNA-seq dataset published in:

Tu, J. et al. (2021). Single-Cell Transcriptomics of Human Nucleus Pulposus Cells: Understanding Cell Heterogeneity and Degeneration. Advanced Science, 8(23), 2103631. https://doi.org/10.1002/advs.202103631

Study Objectives:

  • Profile transcriptional heterogeneity in human nucleus pulposus cells (NPCs).
  • Compare non-degenerated (Grade II) and degenerated (Grade III–IV) intervertebral discs.
  • Identify distinct NPC subtypes and reconstruct cellular trajectories.

Data Source:

  • GEO accession: GSE165722
  • Technology: BD Rhapsody platform
  • Samples: 8 NPC tissue samples with varying degeneration states.

Summary of Analysis Flow:

  1. QC and Filtering: Remove low-quality cells with high mitochondrial content.
  2. Normalization and Clustering: Use Seurat to identify transcriptionally distinct clusters.
  3. Marker Gene Detection: Find genes distinguishing each cluster.
  4. Cluster Annotation: Label clusters into known biological subtypes.
  5. Pseudotime Analysis: Use Monocle3 to infer differentiation trajectories.

Interpretation:

From the plots generated: - UMAP/tSNE: Reveal 6 biologically interpretable NPC subtypes. - DotPlot: Confirms canonical marker expression in annotated subtypes. - Pseudotime: Suggests HT-CLNPs are early progenitors transitioning into mature states such as effector or fibroNPCs.

📥 Choose Dataset Format

This analysis use dataset “GSE165722”.

🧼 Step 1: Quality Control and Filtering

Filter low-quality cells based on UMI count, gene number, and mitochondrial content using Seurat. This step ensures robust downstream analysis by removing potential doublets and stressed or dying cells.

QC Violin Plot

QC Violin Plot

QC Violin Plot

⚙️ Step 2: Normalization, Dimensionality Reduction, and Clustering

Normalize gene expression data, identify variable genes, scale the matrix, and reduce dimensionality using PCA and UMAP. Perform graph-based clustering to define transcriptionally distinct cell groups.

🌐 Global Reference Annotation (SingleR)

Global cell type identities are assigned using SingleR, referencing transcriptional profiles from datasets such as the Human Primary Cell Atlas. This method enables the identification of broad cell categories (e.g., MSCs, T cells) based on cross-tissue gene expression similarity.

UMAP Clustering (Default)

UMAP Clusters

UMAP Clusters

📌 NPC Subtype Annotation via Signature Score Enrichment (AUCell)

This approach evaluates the activation level of each NPC subtype signature at the single-cell level using AUCell. While umap_clusters_celltype.png displays general cell type classifications inferred from global references, umap_npc_subtypes_auc.png reveals functionally distinct NPC subtypes based on enrichment of tissue-specific marker genes.

UMAP Clustering (Default)

UMAP Clusters

UMAP Clusters

🧬 Step 3: Marker Identification and Cell Type Annotation

Identify differentially expressed genes for each cluster using Seurat’s FindAllMarkers(). Combine marker gene expression with reference-based annotation tools (e.g., SingleR) to assign cell types.

Marker Genes (Preview)

##   p_val avg_log2FC pct.1 pct.2 p_val_adj cluster  gene
## 1     0  1.8229819 0.868 0.317         0       0 CYR61
## 2     0  1.2440728 0.977 0.431         0       0   DCN
## 3     0  1.8586392 0.877 0.345         0       0  CTGF
## 4     0  1.3793767 0.947 0.417         0       0   LUM
## 5     0  1.7458917 0.940 0.456         0       0   FN1
## 6     0  0.6984552 0.915 0.438         0       0   CLU

DotPlot of Top Markers

Top Marker DotPlot

Top Marker DotPlot

Barplot: Cell Type Distribution

Cell Type Distribution Barplot

Cell Type Distribution Barplot

🔬 Step 4: Pseudotime Inference with Monocle3

Use Monocle3 to infer dynamic developmental trajectories among selected clusters. Order cells along pseudotime and visualize progression paths to study lineage relationships during degeneration.

Pseudottime Trajectory

Pseudotime trajectory of NPC populations

Pseudotime trajectory of NPC populations

🔗 Step 5: Cell-Cell Communication with CellChat

Perform ligand-receptor analysis using CellChat to explore cell-cell interactions among NPC subpopulations or all clusters. Identify enriched signaling pathways and visualize the intercellular network.

Cell-cell communication of NPC populations

Cell-cell communication of NPC populations

🧬 Step 6: Stemness Potential Scoring with CytoTRACE

Use CytoTRACE to infer differentiation potential and cellular plasticity based on gene counts and transcriptional diversity. This step helps to highlight progenitor-like populations within NPC clusters.

🧠 Step 7: Transcription Factor Network Inference with SCENIC

Apply the SCENIC pipeline to construct gene regulatory networks and infer regulon activity across single cells. This identifies key transcription factors and their target modules associated with IVDD progression.

🔁 Step 8: Pathway Enrichment via GSEA (fgsea)

Perform gene set enrichment analysis using the fgsea method with Hallmark gene sets. This highlights stage-specific biological pathways such as TNF, MAPK, or unfolded protein response during disc degeneration.

📊 Step 9: Signature Activity Scoring with AUCell

Use AUCell to compute the activation level of gene signatures (e.g., SASP, ECM remodeling) in individual cells. This provides fine-grained insights into heterogeneous transcriptional programs across NPC states.

🧫 Step 10: Immune Subset Reclustering and Exploration

Isolate immune cell subsets (e.g., G-MDSCs, macrophages, T cells) for focused reclustering and downstream analysis. This enables functional dissection of the immune microenvironment in NP degeneration.

📌 Notes