Integrate adult with pediatric samples tic-3970

Description

The scVI integration with all pediatric samples worked really well (/lustre/scratch127/cellgen/cellgeni/aljes/integration/data/integrated_umap_annotated.h5ad). We will repeat the same strategy, adding three new samples from adult ovaries so we can compare pediatric vs adult. The samples are:

M23s20 = /nfs/t292_imaging/0XeniumExports/JA_POV/20240801__130116__SGP180_Hsa_RPT_Run2/output-XETG00335__0027061__M23-OVR-0-FO-1-S20-ii__20240801__130136/
M23s25 = /nfs/t292_imaging/0XeniumExports/JA_POV/20240815__114153__SGP180_RPT_run2r/output-XETG00155__0027062__M23-OVR-0-FO-S25-iii__20240815__114214
M23s5  = /nfs/t292_imaging/0XeniumExports/JA_POV/20240801__130116__SGP180_Hsa_RPT_Run2/output-XETG00335__0027061__M23-OVR-0-FO-1-S5-iii__20240801__130136/

NOTE: the dataset is ~ 6 million cells.

Run scVI

Copy the data

Copy the samples to Lustre

mkdir -p data/adult
cp -r /nfs/t292_imaging/0XeniumExports/JA_POV/20240801__130116__SGP180_Hsa_RPT_Run2/output-XETG00335__0027061__M23-OVR-0-FO-1-S20-ii__20240801__130136/ data/adult/
cp -r /nfs/t292_imaging/0XeniumExports/JA_POV/20240815__114153__SGP180_RPT_run2r/output-XETG00155__0027062__M23-OVR-0-FO-S25-iii__20240815__114214/ data/adult/
cp -r /nfs/t292_imaging/0XeniumExports/JA_POV/20240801__130116__SGP180_Hsa_RPT_Run2/output-XETG00335__0027061__M23-OVR-0-FO-1-S5-iii__20240801__130136/ data/adult/
cp /nfs/team292/ct27/ovarian/all_donors.h5ad data/all_donors.h5ad

Copy annotation

cp /nfs/team292/lg18/paediatric_gonads/annotation/xenium/all_donor_scVI_integration_annotation.csv data/annotation.csv

Process the data

Run interactive job

bsub -Is -G cellgeni -q "cpu-interactive" -n 4 -M "32GB" -R "span[hosts=1] select[mem>32GB] rusage[mem=32GB]"  /bin/bash

>>>
Job <377679> is submitted to queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on node-14-13>>
Start a Jupyter server inside the interactive job:

singularity exec \
  --bind /lustre,/nfs \
  /nfs/cellgeni/singularity/images/toh5ad.sif \
  jupyter notebook \
    --no-browser --port=7777 --ip=0.0.0.0 \
    --IdentityProvider.hashed_password='' \
    --IdentityProvider.token='lolkek'

Open ssh connection in separate terminal

ssh -L 7777:localhost:7777 node-14-13

Concatenate and process all the data: notebooks/process_xenium.ipynb
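
A minimal sketch of what notebooks/process_xenium.ipynb presumably does: read each Xenium output folder, attach sample metadata and spatial coordinates, and concatenate everything with the existing pediatric object. The sample IDs, file layout and column names below are assumptions, not the actual notebook code.

import anndata as ad
import pandas as pd
import scanpy as sc

# Hypothetical mapping of sample IDs to the copied Xenium output folders
samples = {
    "M23s20": "data/adult/output-XETG00335__0027061__M23-OVR-0-FO-1-S20-ii__20240801__130136",
    "M23s25": "data/adult/output-XETG00155__0027062__M23-OVR-0-FO-S25-iii__20240815__114214",
    "M23s5":  "data/adult/output-XETG00335__0027061__M23-OVR-0-FO-1-S5-iii__20240801__130136",
}

adatas = []
for sample_id, path in samples.items():
    # cell_feature_matrix.h5 is a standard 10x HDF5 count matrix
    a = sc.read_10x_h5(f"{path}/cell_feature_matrix.h5")
    a.var_names_make_unique()
    # cells.csv.gz holds per-cell centroids and QC metrics
    cells = pd.read_csv(f"{path}/cells.csv.gz", index_col="cell_id").loc[a.obs_names]
    a.obs["sample"] = sample_id
    a.obsm["spatial"] = cells[["x_centroid", "y_centroid"]].to_numpy()
    adatas.append(a)

# Add the pediatric object and keep the shared gene panel
pediatric = ad.read_h5ad("data/all_donors.h5ad")
adata = ad.concat([pediatric] + adatas, join="inner", merge="same")
adata.write_h5ad("data/adult_pediatric_processed.h5ad")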

Submit scVI integration jobs

Submit integration

scripts/submit_grid_integration.sh "$PWD/data/adult_pediatric_processed.h5ad" "$PWD/results"
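
Each grid job presumably fits one scVI model on the combined object. Below is a minimal sketch using the scvi-tools API; the hyperparameter values, batch key and output name are illustrative, the real grid lives in scripts/submit_grid_integration.sh.

import scanpy as sc
import scvi

adata = sc.read_h5ad("data/adult_pediatric_processed.h5ad")

# Register the batch covariate ("sample" is an assumed column name)
scvi.model.SCVI.setup_anndata(adata, batch_key="sample")

model = scvi.model.SCVI(
    adata,
    n_hidden=128,
    n_latent=30,
    n_layers=2,
    dropout_rate=0.1,
)
model.train(max_epochs=100, batch_size=512)

# Store the latent space for the UMAP/clustering steps downstream
adata.obsm["X_scVI"] = model.get_latent_representation()
adata.write_h5ad("results/scvi_n_latent30.h5ad")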

Count the number of successfully completed jobs

success=$(cat *Output*log | grep "Successfully completed." | wc -l)
total=$(ls -1 *Output*log | wc -l)
echo "Successfully completed ${success}/${total}"

Calculate UMAPs for each integration

Create the UMAP list

scripts/create_viz_list.sh "$PWD/data/adult_pediatric_processed.h5ad" "$PWD/results" umap_list_pediatric_adult.tsv 

Run UMAP creation

integration_list="$PWD/umap_list_pediatric_adult.tsv"
celltype_col="lineage"
scripts/submit_umap_creation.sh "$integration_list" "$celltype_col"
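
A minimal sketch of the per-integration UMAP step each submitted job presumably performs: build a neighbour graph on the scVI latent space and colour the embedding by the lineage annotation. File names are illustrative.

import scanpy as sc

adata = sc.read_h5ad("results/scvi_n_latent30.h5ad")

# Neighbour graph on the integration latent space, then UMAP
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.pl.umap(adata, color=["lineage", "sample"], save="_scvi_n_latent30.png")

adata.write_h5ad("results/scvi_n_latent30_umap.h5ad")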

Run leiden clustering

Get a list of all .h5ad files for clustering

ls -1 "$PWD"/results/removeM23S25*/*/*.h5ad > clustering.list

Run clustering

bsub -J "clustering-tic-3970[1-8]" < scripts/clustering.bsub

Run ResolVI

Hyperparameter search

scVI did not work well, so we will try ResolVI with the following set of hyperparameters:

n_hidden   = [64, 128, 256]
n_latent   = [15, 25, 35, 40]
n_layers   = [2, 3]
dropout    = [0.1]
lr         = [0.0001]
batch_size = [512]
dispersion = ['gene', 'gene-batch']
likelihood = ['poisson', 'nb']
n_epochs   = [100]
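
A minimal sketch of how the grid above can be expanded into one line per job (e.g. for an LSF job array); the output file name and format are assumptions, not the actual pipeline's interface.

from itertools import product

grid = {
    "n_hidden":   [64, 128, 256],
    "n_latent":   [15, 25, 35, 40],
    "n_layers":   [2, 3],
    "dropout":    [0.1],
    "lr":         [0.0001],
    "batch_size": [512],
    "dispersion": ["gene", "gene-batch"],
    "likelihood": ["poisson", "nb"],
    "n_epochs":   [100],
}

# 3 * 4 * 2 * 2 * 2 = 96 configurations in total
with open("resolvi_grid.tsv", "w") as fh:
    fh.write("\t".join(grid) + "\n")
    for combo in product(*grid.values()):
        fh.write("\t".join(str(v) for v in combo) + "\n")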

Run the pipeline

bsub < scripts/run_integration.bsub

Results can be found in the results/resolvi folder.

Run for separate populations

We will run ResolVI for separate populations, so we first need to create a separate object for each population:

cp /warehouse/team292_wh01/reproductive_atlas/ovary/annotations/xenium_directannotation_scvi-v1.csv data/xenium_directannotation_scvi-v1.csv
notebooks/split_populations.ipynb
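
A minimal sketch of what notebooks/split_populations.ipynb presumably does: attach the per-cell annotation and write one object per population. The annotation column name, the population groupings and the output naming are assumptions.

import anndata as ad
import pandas as pd

adata = ad.read_h5ad("data/adult_pediatric_processed.h5ad")
annotation = pd.read_csv("data/xenium_directannotation_scvi-v1.csv", index_col=0)

# Join the annotation onto obs by cell barcode ("annotation" is an assumed column)
adata.obs["population"] = annotation["annotation"].reindex(adata.obs_names)

for i, population in enumerate(adata.obs["population"].dropna().unique(), start=1):
    subset = adata[adata.obs["population"] == population].copy()
    subset.write_h5ad(f"data/adata_population{i}.h5ad")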

Then run ResolVI for each population:

bsub -env "all, ADATA=data/adata_population1.h5ad, SAMPLE_ID=population1, OUTPUT=results/resolvi_population1" < scripts/run_integration.bsub
bsub -env "all, ADATA=data/adata_population2.h5ad, SAMPLE_ID=population2, OUTPUT=results/resolvi_population2" < scripts/run_integration.bsub
bsub -env "all, ADATA=data/adata_population3.h5ad, SAMPLE_ID=population3, OUTPUT=results/resolvi_population3" < scripts/run_integration.bsub
bsub -env "all, ADATA=data/adata_population4.h5ad, SAMPLE_ID=population4, OUTPUT=results/resolvi_population4" < scripts/run_integration.bsub

The following set of hyperparameters was used:

batch_key  = ['donor_batch']
n_hidden   = [128]
n_latent   = [40]
n_layers   = [2]
dropout    = [0.1]
lr         = [0.0001]
batch_size = [512]
dispersion = ['gene']
likelihood = ['nb']
n_epochs   = [100]
