ArchR cell-by-peak matrix to Scanpy

This notebook shows how to read the files exported from ArchR and assemble them into a cell-by-peak matrix stored as an AnnData object that Scanpy can use.

Import

[1]:

import scanpy as sc
import os
import pandas as pd

Generate the cell-by-peak matrix in Scanpy AnnData format

[2]:

# - load the ArchR output
data_dir = '../../data/iPSC_example/ArchR_files'

mtx_file = os.path.join(data_dir, 'pm_matrix.mtx')
peak_file = os.path.join(data_dir, 'pm_peak.csv')
barcode_file = os.path.join(data_dir, 'pm_barcode.csv')

adata = sc.read_mtx(mtx_file)
peak = pd.read_csv(peak_file, sep=',', header=0, index_col=0)
barcode = pd.read_csv(barcode_file, sep=',', header=0, index_col=0)

# - peak names should follow the format 'chr_start_end'
peak.index = peak['seqnames'].astype(str) + '_' + peak['start'].astype(str) + '_' + peak['end'].astype(str)
# - the matrix is read as peaks x cells, so transpose it to cells x peaks
adata_atac = adata.T
# - attach the per-cell and per-peak metadata
adata_atac.obs = barcode
adata_atac.var = peak
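
Before attaching the metadata, it can help to confirm that the peak names are unique and that the table sizes agree with the matrix. The following is a minimal sketch on toy data, not part of the original workflow; the helpers `make_peak_names` and `check_alignment` and all toy values are hypothetical names introduced here for illustration, and the sketch assumes the matrix is oriented peaks x cells before transposing, as in the cell above.

```python
import pandas as pd


def make_peak_names(peak: pd.DataFrame) -> pd.Index:
    """Build 'chr_start_end' names from the seqnames/start/end columns (hypothetical helper)."""
    names = (
        peak["seqnames"].astype(str)
        + "_" + peak["start"].astype(str)
        + "_" + peak["end"].astype(str)
    )
    return pd.Index(names, name="peak")


def check_alignment(matrix_shape, barcode: pd.DataFrame, peak: pd.DataFrame) -> None:
    """Raise ValueError if a peaks x cells matrix does not match the metadata tables."""
    n_peaks, n_cells = matrix_shape
    if n_peaks != len(peak):
        raise ValueError(f"matrix has {n_peaks} peak rows but peak table has {len(peak)}")
    if n_cells != len(barcode):
        raise ValueError(f"matrix has {n_cells} cell columns but barcode table has {len(barcode)}")


# Toy stand-ins for pm_peak.csv and pm_barcode.csv (values are made up).
toy_peak = pd.DataFrame(
    {"seqnames": ["chr1", "chr1", "chr2"], "start": [100, 900, 50], "end": [600, 1400, 550]}
)
toy_barcode = pd.DataFrame(index=["cellA", "cellB"])

peak_names = make_peak_names(toy_peak)
check_alignment((3, 2), toy_barcode, toy_peak)  # 3 peaks x 2 cells: passes silently
print(list(peak_names), peak_names.is_unique)
```

With the real data, the same check could be called as `check_alignment(adata.shape, barcode, peak)` right after loading, so that a mismatch surfaces before `obs` and `var` are assigned.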

[3]:

adata_atac

[3]:

AnnData object with n_obs × n_vars = 13387 × 246132
    obs: 'BlacklistRatio', 'DoubletEnrichment', 'DoubletScore', 'nDiFrags', 'nFrags', 'nMonoFrags', 'nMultiFrags', 'NucleosomeRatio', 'PassQC', 'PromoterRatio', 'ReadsInBlacklist', 'ReadsInPromoter', 'ReadsInTSS', 'Sample', 'TSSEnrichment', 'Clusters', 'ReadsInPeaks', 'FRIP'
    var: 'seqnames', 'start', 'end', 'width', 'strand', 'score', 'replicateScoreQuantile', 'groupScoreQuantile', 'Reproducibility', 'GroupReplicate', 'distToGeneStart', 'nearestGene', 'peakType', 'distToTSS', 'nearestTSS', 'GC', 'idx', 'N'

[4]:

# - save the result
save_dir = '../../data/iPSC_example/ATAC_data'
os.makedirs(save_dir, exist_ok=True)
adata_atac.write(os.path.join(save_dir, 'adata_atac_raw.h5ad'))
