depmap

DepMap Public 23Q4

Overview

This DepMap release contains data from CRISPR knockout screens from project Achilles, as well as genomic characterization data from the CCLE project.

Pipelines

Achilles

This Achilles dataset contains the results of genome-scale CRISPR knockout screens for Achilles (using Avana Cas9 and Humagne-CD Cas12 libraries) and Achilles combined with Sanger’s Project SCORE (KY Cas9 library) screens. The dataset was processed using the following steps:

Expression

DepMap expression data is quantified from RNAseq files using the GTEx pipelines. A detailed description of the pipelines and tool versions can be found here: https://github.com/broadinstitute/ccle_processing#rnaseq. We provide a subset of the data files output by this pipeline, which are available on FireCloud. These are aligned to hg38.

Copy number

DepMap WES copy number data is generated by running the GATK copy number pipeline on WES samples that have been realigned to hg38. Tutorials and descriptions of this method can be found here: https://software.broadinstitute.org/gatk/documentation/article?id=11682 and https://software.broadinstitute.org/gatk/documentation/article?id=11683.

Mutations

DepMap mutation calls are generated using Mutect2 and annotated and filtered downstream. Variants are aligned to hg38. Detailed documentation can be found here https://storage.googleapis.com/shared-portal-files/Tools/%5BDMC%20Communication%5D%2022Q4%20Mutation%20Pipeline%20Update.pdf.

Fusions

DepMap generates RNAseq-based fusion calls using the STAR-Fusion pipeline. A comprehensive overview of how the STAR-Fusion pipeline works can be found here: https://github.com/STAR-Fusion/STAR-Fusion/wiki. We run STAR-Fusion version 1.6.0 using the plug-n-play resources available in the STAR-Fusion docs for gencode v29. We run the fusion calling with default parameters, except that we add the --no_annotation_filter and --min_FFPM 0 arguments to prevent filtering.
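As a rough illustration of the non-default settings described above, the following Python sketch assembles a STAR-Fusion argument list. The FASTQ file names, genome library directory, and output directory are placeholders chosen for this example, not values from the DepMap pipeline; only the two filtering-related arguments come from the text.

```python
import shlex


def build_star_fusion_command(left_fq, right_fq, genome_lib_dir, output_dir):
    """Assemble a STAR-Fusion argument list with the two non-default arguments."""
    return [
        "STAR-Fusion",
        "--left_fq", left_fq,                # placeholder input (assumption)
        "--right_fq", right_fq,              # placeholder input (assumption)
        "--genome_lib_dir", genome_lib_dir,  # plug-n-play resource directory (placeholder path)
        "--output_dir", output_dir,          # placeholder output location (assumption)
        "--no_annotation_filter",            # from the text: skip annotation-based filtering
        "--min_FFPM", "0",                   # from the text: no minimum FFPM threshold
    ]


cmd = build_star_fusion_command(
    "sample_1.fastq.gz", "sample_2.fastq.gz", "ctat_genome_lib_build_dir", "star_fusion_out"
)
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run without shell quoting concerns.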

###########

Files

###########

README.txt

Description of all files contained in this release

ScreenGeneEffect.csv

Pipeline: Achilles

Post-Chronos

Gene effect estimates for all screens, processed by Chronos per library, then copy number corrected, scaled, screen quality corrected, and concatenated.

ScreenGeneEffectUncorrected.csv

Pipeline: Achilles

Post-Chronos

Gene effect estimates for all screens, processed by Chronos per library and then concatenated. No copy number correction or scaling.

ScreenNaiveGeneScore.csv

Pipeline: Achilles

Post-Chronos

Log fold change (LFC) collapsed by mean of sequences and median of guides, computed per library-screen type and then concatenated.
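The collapsing described above can be sketched in Python as follows. This is a minimal illustration on toy data, assuming that sequences are first averaged within each screen and that the median is then taken over the guides targeting each gene; the order of the two steps, the toy identifiers, and the function name are assumptions, not details confirmed by this README.

```python
import pandas as pd

# Toy guide-level log fold changes: rows are sgRNAs, columns are SequenceIDs.
lfc = pd.DataFrame(
    {"SEQ1": [-1.0, -3.0, 0.0, 0.2], "SEQ2": [-2.0, -1.0, 0.4, 0.0]},
    index=["g1", "g2", "g3", "g4"],
)
# Hypothetical lookups: SequenceID -> ScreenID and sgRNA -> gene.
seq_to_screen = pd.Series({"SEQ1": "SCR1", "SEQ2": "SCR1"})
guide_to_gene = pd.Series(
    {"g1": "GENE_A", "g2": "GENE_A", "g3": "GENE_B", "g4": "GENE_B"}
)


def naive_gene_score(lfc, seq_to_screen, guide_to_gene):
    """Collapse guide-by-sequence LFC into a screen-by-gene matrix."""
    # Step 1: mean over the sequences that belong to the same screen.
    per_screen = lfc.T.groupby(seq_to_screen).mean().T
    # Step 2: median over the guides that target the same gene.
    per_gene = per_screen.groupby(guide_to_gene).median()
    # Return screens as rows and genes as columns.
    return per_gene.T


scores = naive_gene_score(lfc, seq_to_screen, guide_to_gene)
print(scores)
```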

ScreenGeneDependency.csv

Pipeline: Achilles

Post-Chronos

Gene dependency probability estimates for all screens, processed by Chronos per library-screen type and then concatenated.

CRISPRScreenMap.csv

Pipeline: Achilles

Post-Chronos

Map from ModelID to all ScreenIDs combined to make up a given model’s data in the CRISPRGeneEffect matrix.

Columns:

- ModelID
- ScreenID
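As a small usage sketch, the two-column map can be turned into a lookup from each model to its contributing screens. The identifiers below are made up for illustration; in practice the table would be read from CRISPRScreenMap.csv with pandas.read_csv.

```python
import pandas as pd

# Toy stand-in for CRISPRScreenMap.csv (all IDs are hypothetical).
screen_map = pd.DataFrame(
    {
        "ModelID": ["ACH-000001", "ACH-000001", "ACH-000002"],
        "ScreenID": ["SC-000001.AV01", "SC-000001.KY01", "SC-000002.AV01"],
    }
)

# Collect every ScreenID that contributes to each model.
screens_by_model = screen_map.groupby("ModelID")["ScreenID"].apply(list).to_dict()
print(screens_by_model["ACH-000001"])
```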

CRISPRGeneEffect.csv

Pipeline: Achilles

Post-Chronos

Gene effect estimates for all models, integrated using Chronos. Copy number corrected, scaled, and screen quality corrected.

CRISPRGeneEffectUncorrected.csv

Pipeline: Achilles

Post-Chronos

Gene effect estimates for all models, integrated using Chronos. No copy number correction or scaling.

CRISPRGeneDependency.csv

Pipeline: Achilles

Post-Chronos

Gene dependency probability estimates for all models in the integrated gene effect.

CRISPRInferredGuideEfficacy.csv

Pipeline: Achilles

Post-Chronos

Estimated efficacies of all reagents in the different library-screen types, computed from the Chronos runs.

Columns:

- sgRNA
- Efficacy

CRISPRInferredLibraryEffect.csv

Pipeline: Achilles

Post-Chronos

Estimated library batch effects identified by Chronos.

CRISPRInitialOffset.csv

Pipeline: Achilles

Post-Chronos

Estimated log fold pDNA error for each sgRNA in each library, as identified by Chronos.

CRISPRInferredModelGrowthRate.csv

Pipeline: Achilles

Post-Chronos

Estimated growth rates of all models in the different library-screen types, computed from the Chronos runs.

Columns:

- ScreenID
- Achilles-Avana-2D
- Achilles-Humagne-CD-2D
- Achilles-Humagne-CD-3D
- Project-Score-KY

CRISPRInferredModelEfficacy.csv

Pipeline: Achilles

Post-Chronos

Estimated efficacies of all models in the different library-screen types, computed from the Chronos runs.

Columns:

- ModelID
- Achilles-Avana-2D
- Achilles-Humagne-CD-2D
- Achilles-Humagne-CD-3D
- Project-Score-KY

CRISPRInferredSequenceOverdispersion.csv

Pipeline: Achilles

Post-Chronos List of genes identified as dependencies across all lines. Each entry is separated by a newline.

CRISPRInferredCommonEssentials.csv

Pipeline: Achilles

Post-Chronos

List of genes identified as dependencies across all lines. Each entry is separated by a newline.

AchillesCommonEssentialControls.csv

Pipeline: Achilles

Pre-Chronos file

List of genes used as positive controls: the intersection of the Blomen (2014) and Hart (2015) essentials. Each entry is separated by a newline. The scores of these genes are used as the dependent distribution for inferring dependency probability.

AchillesNonessentialControls.csv

Pipeline: Achilles

Pre-Chronos file

List of genes used as negative controls (Hart (2014) nonessentials). Each entry is separated by a newline.

AvanaRawReadcounts.csv

Pipeline: Achilles

Pre-Chronos file

Summed guide-level read counts for each sequence screened with the Avana Cas9 library.

- Columns: SequenceID
- Rows: sgRNA

HumagneRawReadcounts.csv

Pipeline: Achilles

Pre-Chronos file

Summed guide-level read counts for each sequence screened with the Humagne-CD Cas12 library.

- Columns: SequenceID
- Rows: sgRNA

KYRawReadcounts.csv

Pipeline: Achilles

Pre-Chronos file

Summed guide-level read counts for each sequence screened with Sanger’s KY Cas9 library.

- Columns: SequenceID
- Rows: sgRNA

AvanaLogfoldChange.csv

Pipeline: Achilles

Pre-Chronos file

Log2-fold-change from pDNA counts for each sequence screened with the Avana Cas9 library.

- Columns: SequenceID
- Rows: sgRNA

HumagneLogfoldChange.csv

Pipeline: Achilles

Pre-Chronos file

Log2-fold-change from pDNA counts for each sequence screened with the Humagne-CD library.

- Columns: SequenceID
- Rows: sgRNA

KYLogfoldChange.csv

Pipeline: Achilles

Pre-Chronos file

Log2-fold-change from pDNA counts for each sequence screened with Sanger’s KY Cas9 library.

- Columns: SequenceID
- Rows: sgRNA
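To make the relationship between the read count files and the log fold change files concrete, here is a minimal Python sketch of a log2 fold change relative to pDNA counts. The reads-per-million normalization, the pseudocount of 1, and the function name are assumptions made for this example; this README does not state the exact normalization used by the pipeline.

```python
import numpy as np
import pandas as pd


def log_fold_change(counts, pdna_counts, pseudocount=1.0):
    """Log2 fold change of each sequence relative to pDNA.

    counts: sgRNA x SequenceID read counts.
    pdna_counts: per-sgRNA pDNA read counts (same index as counts).
    """
    # Scale every sequence and the pDNA to reads per million (assumed normalization).
    counts_rpm = counts / counts.sum(axis=0) * 1e6
    pdna_rpm = pdna_counts / pdna_counts.sum() * 1e6
    # Subtract the pDNA log abundance from every sequence column, aligned by sgRNA.
    return np.log2(counts_rpm + pseudocount).sub(
        np.log2(pdna_rpm + pseudocount), axis=0
    )


counts = pd.DataFrame({"SEQ1": [100, 300], "SEQ2": [200, 200]}, index=["g1", "g2"])
pdna = pd.Series([100, 300], index=["g1", "g2"])
lfc = log_fold_change(counts, pdna)
print(lfc)
```

In this toy input, SEQ1 has the same guide proportions as the pDNA, so its fold changes are zero, while SEQ2 shows g1 enriched and g2 depleted.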

AvanaGuideMap.csv

Pipeline: Achilles

Pre-Chronos file

Mapping of sgRNAs to genes in the Avana Cas9 library.

Columns:

- sgRNA: guide in vector
- GenomeAlignment: alignment to hg38
- Gene: HUGO (entrez)
- nAlignments: total number of alignments for a given sgRNA
- UsedByChronos: boolean indicating if sgRNA was included in Chronos analysis
- DropReason: why a guide was removed prior to Chronos analysis
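The guide map columns lend themselves to a short filtering example. The sketch below keeps only guides flagged as used by Chronos and splits the "HUGO (entrez)" gene label into two fields. All row values, including the alignment string format and the drop reason, are invented for illustration, and the Symbol and EntrezID column names are introduced here rather than taken from the file.

```python
import pandas as pd

# Toy stand-in for a guide map file (all values are hypothetical).
guide_map = pd.DataFrame(
    {
        "sgRNA": ["ACGTACGTACGTACGTACGT", "TTTTCCCCGGGGAAAATTTT"],
        "GenomeAlignment": ["chr19_58350000_+", "chr10_50800000_-"],
        "Gene": ["A1BG (1)", "A1CF (29974)"],
        "nAlignments": [1, 3],
        "UsedByChronos": [True, False],
        "DropReason": [None, "multiple alignments"],
    }
)

# Keep only the guides that were included in the Chronos analysis.
used = guide_map[guide_map["UsedByChronos"]].copy()

# Split "SYMBOL (EntrezID)" into separate columns.
parsed = used["Gene"].str.extract(r"^(?P<Symbol>.+?) \((?P<EntrezID>\d+)\)$")
used = used.join(parsed)
print(used[["sgRNA", "Symbol", "EntrezID"]])
```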

HumagneGuideMap.csv

Pipeline: Achilles

Pre-Chronos file

Mapping of sgRNAs to Genes in the Humagne-CD Cas12 library.

Columns:

KYGuideMap.csv

Pipeline: Achilles

Pre-Chronos file

Mapping of sgRNAs to genes in Sanger’s KY Cas9 library.

Columns:

- sgRNA: guide in vector
- GenomeAlignment: alignment to hg38
- Gene: HUGO (entrez)
- nAlignments: total number of alignments for a given sgRNA
- UsedByChronos: boolean indicating if sgRNA was included in Chronos analysis
- DropReason: why a guide was removed prior to Chronos analysis

ScreenSequenceMap.csv

Pipeline: Achilles

Pre-Chronos file

Mapping of SequenceIDs to ScreenID and related info.

Columns:

- SequenceID
- ScreenID
- ModelConditionID
- ModelID
- ScreenType: 2DS = 2D standard
- Library
- Days
- pDNABatch
- PassesQC
- ExcludeFromCRISPRCombined
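A typical use of this map is selecting sequences by their QC flags. The sketch below assumes that PassesQC and ExcludeFromCRISPRCombined are boolean columns and that a True value in the latter means the sequence is left out of the combined dataset; both points are assumptions, and the rows are toy values.

```python
import pandas as pd

# Toy stand-in for ScreenSequenceMap.csv (subset of columns, hypothetical values).
seq_map = pd.DataFrame(
    {
        "SequenceID": ["SEQ1", "SEQ2", "SEQ3", "SEQ4"],
        "ScreenID": ["SCR1", "SCR1", "SCR2", "SCR3"],
        "Library": ["Avana", "Avana", "KY", "Humagne-CD"],
        "PassesQC": [True, False, True, True],
        "ExcludeFromCRISPRCombined": [False, False, False, True],
    }
)

# Keep sequences that pass QC and are not excluded from the combined dataset.
keep = seq_map["PassesQC"] & ~seq_map["ExcludeFromCRISPRCombined"]
usable = seq_map[keep]

# Count the remaining sequences per library.
usable_per_library = usable.groupby("Library")["SequenceID"].count().to_dict()
print(usable_per_library)
```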

AchillesScreenQCReport.csv

Pipeline: Achilles

Pre-Chronos file

Screen-level quality control metrics.

Columns:

AchillesSequenceQCReport.csv

Pipeline: Achilles

Pre-Chronos file

Sequence-level quality control metrics.

Columns:

AchillesHighVarianceGeneControls.csv

Pipeline: Achilles

Pre-Chronos file

List of genes with variable gene effects across models and libraries. Used for sequence correlation in QC.

OmicsExpressionGenesExpectedCountProfile.csv

Pipeline: Expression

RNAseq read count data from RSEM.

OmicsExpressionProteinCodingGenesTPMLogp1.csv

Pipeline: Expression

Gene expression TPM values of the protein coding genes for DepMap cell lines. Values are inferred from RNA-seq data using the RSEM tool and are reported after log2 transformation, using a pseudo-count of 1; log2(TPM+1). Additional RNA-seq-based expression measurements are available for download as part of the full DepMap Data Release. More information on the DepMap Omics processing pipeline is available at https://github.com/broadinstitute/depmap_omics.
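Because the values are stored as log2(TPM+1), recovering linear TPM is a one-line inversion. The sketch below uses a toy matrix; the "SYMBOL (EntrezID)" column labels and the model identifiers are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy log2(TPM+1) matrix: rows are models, columns are genes (hypothetical labels).
log_tpm = pd.DataFrame(
    {"GENE_A (1)": [0.0, 3.0], "GENE_B (2)": [1.0, 2.0]},
    index=["ACH-000001", "ACH-000002"],
)

# Invert the transform: TPM = 2**x - 1.
tpm = np.exp2(log_tpm) - 1

# Round trip back to the stored scale as a sanity check.
round_trip = np.log2(tpm + 1)
print(tpm)
```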

OmicsExpressionTranscriptsExpectedCountProfile.csv

Pipeline: Expression

RNAseq read count data from RSEM.

OmicsExpressionTranscriptsTPMLogp1Profile.csv

Pipeline: Expression

RNAseq transcript TPM data using RSEM. Log2 transformed, using a pseudo-count of 1.

OmicsExpressionAllGenesTPMLogp1Profile.csv

Pipeline: Expression

Gene expression TPM values of all genes for DepMap cell lines. Values are inferred from RNA-seq data using the RSEM tool and are reported after log2 transformation, using a pseudo-count of 1; log2(TPM+1). Additional RNA-seq-based expression measurements are available for download as part of the full DepMap Data Release. More information on the DepMap Omics processing pipeline is available at https://github.com/broadinstitute/depmap_omics.

OmicsExpressionAllGenesEffectiveLengthProfile.csv

Pipeline: Expression

Gene effective length for all genes output from RSEM

OmicsCNSegmentsProfile.csv

Pipeline: Copy number

Segment level copy number data

OmicsCNGene.csv

Pipeline: Copy number

Gene-level copy number data that is log2 transformed with a pseudo-count of 1; log2(CN ratio + 1). Inferred from WGS, WES or SNP array depending on the availability of the data type. Values are calculated by mapping genes onto the segment level calls and computing a weighted average along the genomic coordinate. Genes that overlap with segmental duplication regions and/or are flagged by repeatMasker are masked in this matrix. For details see https://github.com/broadinstitute/depmap_omics/blob/master/docs/source/dna.md#masking. Additional copy number datasets are available for download as part of the full DepMap Data Release. More information on the DepMap Omics processing pipeline is available at https://github.com/broadinstitute/depmap_omics.
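The gene-level calculation described above can be illustrated with a small Python sketch. It assumes that the weights are the lengths of the overlap between the gene and each segment, that coordinates are half-open, and that the log2 transform is applied after averaging; these details and the function name are assumptions, and the actual pipeline may differ.

```python
import math


def gene_copy_number(gene_start, gene_end, segments):
    """Return log2(weighted mean CN ratio + 1) for one gene.

    segments: iterable of (start, end, cn_ratio) on the gene's chromosome.
    """
    total_overlap = 0
    weighted_sum = 0.0
    for seg_start, seg_end, ratio in segments:
        # Number of bases shared by the gene and this segment.
        overlap = min(gene_end, seg_end) - max(gene_start, seg_start)
        if overlap > 0:
            total_overlap += overlap
            weighted_sum += overlap * ratio
    if total_overlap == 0:
        # The gene is not covered by any segment.
        return float("nan")
    return math.log2(weighted_sum / total_overlap + 1)


# A gene split evenly across two segments with ratios 1.0 and 2.0.
segments = [(0, 150, 1.0), (150, 300, 2.0)]
value = gene_copy_number(100, 200, segments)
print(value)
```

Under this transform, a gene lying entirely in a segment with a CN ratio of 1 gets a value of log2(2) = 1.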

OmicsFusionFiltered.csv

Pipeline: Fusions

Gene fusion data derived from RNAseq data. Data is filtered by performing the following:

OmicsFusionUnfilteredProfile.csv

Pipeline: Fusions

Gene fusion data derived from RNAseq data. Data is unfiltered. Column descriptions can be found in the STAR-Fusion wiki. Samples are identified by Profile IDs.

OmicsSomaticMutations.csv

Pipeline: Mutations

MAF-like file containing information on all the somatic point mutations and indels called in the DepMap cell lines. The calls are generated from Mutect2.

Additional processed mutation matrices containing genotyped mutation calls are available for download as part of the full DepMap Data Release.

Columns:

For details, see https://storage.googleapis.com/shared-portal-files/Tools/23Q4_Mutation_Pipeline_Documentation.pdf

OmicsSomaticMutationsProfile.csv

Pipeline: Mutations

MAF-like formatted file containing information on all the somatic point mutations and indels called in the DepMap cell lines. The calls are generated from Mutect2.

Additional processed mutation matrices containing genotyped mutation calls are available for download as part of the full DepMap Data Release.

Samples are identified by Profile IDs.

Columns:

For details, see https://storage.googleapis.com/shared-portal-files/Tools/23Q4_Mutation_Pipeline_Documentation.pdf

OmicsSomaticMutationsMAFProfile.maf

Pipeline: Mutations

MAF file containing information on all the somatic point mutations and indels called in the DepMap cell lines. The calls are generated from Mutect2. A description of the various columns is in the DepMap Release README file. Additional processed mutation matrices containing genotyped mutation calls are available for download as part of the full DepMap Data Release. This file contains the same variants as OmicsSomaticMutationsProfile.csv, but follows the standard MAF format suitable for downstream analysis tools such as maftools. Samples are identified by Profile IDs.

OmicsSomaticMutationsMatrixHotspot.csv

Pipeline: Mutations

Genotyped matrix indicating, for each cell line, whether each gene has at least one hot spot mutation. A variant is considered a hot spot if it is present in one of the following: Hess et al. 2019 paper, OncoKB hotspot, COSMIC mutation significance tier 1. A value of 0 means no mutation. If there are one or more hot spot mutations in the same gene for the same cell line, the allele frequencies are summed; a value of 2 is assigned if the sum is greater than 0.95, and a value of 1 otherwise.

OmicsSomaticMutationsMatrixDamaging.csv

Pipeline: Mutations

Genotyped matrix indicating, for each cell line, whether each gene has at least one damaging mutation. A variant is considered a damaging mutation if LikelyLoF == True. A value of 0 means no mutation. If there are one or more damaging mutations in the same gene for the same cell line, the allele frequencies are summed; a value of 2 is assigned if the sum is greater than 0.95, and a value of 1 otherwise.
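The 0/1/2 encoding used by the hot spot and damaging matrices can be sketched in Python as follows. The input is assumed to be already restricted to the relevant variants (hot spot or damaging), and the column names ModelID, Gene, and AF, as well as the function name, are hypothetical choices for this example.

```python
import pandas as pd


def genotype_matrix(mutations, threshold=0.95):
    """Encode per-model, per-gene mutation status as 0, 1, or 2."""
    # Sum allele frequencies of all qualifying variants in the same gene and model.
    summed = mutations.groupby(["ModelID", "Gene"])["AF"].sum()
    # 2 if the summed allele frequency exceeds the threshold, otherwise 1.
    coded = summed.gt(threshold).astype(int) + 1
    # Model-gene pairs without any qualifying variant become 0.
    return coded.unstack(fill_value=0)


# Toy variant table (hypothetical values).
mutations = pd.DataFrame(
    {
        "ModelID": ["ACH-000001", "ACH-000001", "ACH-000001", "ACH-000002"],
        "Gene": ["GENE_A", "GENE_A", "GENE_B", "GENE_A"],
        "AF": [0.5, 0.5, 0.4, 0.3],
    }
)
matrix = genotype_matrix(mutations)
print(matrix)
```

Note that models with no qualifying variants at all do not appear in this toy output; a full matrix would need to be reindexed against the complete model list.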

OmicsExpressionGeneSetEnrichment.csv

Pipeline: Expression

Single-sample gene set enrichment analysis (ssGSEA) scores, calculated from the z-scores of the log2(TPM + 1) gene-level expression data. Details about the R script used to run the analysis can be found here: https://github.com/broadinstitute/ssGSEA2.0
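The z-scoring step that precedes the enrichment analysis can be sketched as below. This example assumes that each gene is standardized across samples and uses the population standard deviation (ddof=0); both choices are assumptions, and the enrichment scoring itself, which is done by the ssGSEA2.0 R script, is not reproduced here.

```python
import pandas as pd

# Toy log2(TPM + 1) matrix: rows are samples, columns are genes (hypothetical values).
log_tpm = pd.DataFrame(
    {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [4.0, 4.0, 7.0]},
    index=["S1", "S2", "S3"],
)

# Standardize each gene across samples: subtract the mean, divide by the standard deviation.
z_scores = (log_tpm - log_tpm.mean(axis=0)) / log_tpm.std(axis=0, ddof=0)
print(z_scores)
```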

OmicsExpressionGeneSetEnrichmentProfile.csv

Pipeline: Expression

Single-sample gene set enrichment analysis (ssGSEA) scores, calculated from the z-scores of the log2(TPM + 1) gene-level expression data. Details about the R script used to run the analysis can be found here: https://github.com/broadinstitute/ssGSEA2.0

OmicsProfiles.csv

Profile ID mapping information.

- Columns: ProfileID, ModelID, ModelConditionID, Datatype, and WESKit
- Rows: ProfileID

OmicsDefaultModelProfiles.csv

Indicates which profiles are selected in the model-level datasets.

- Columns: ModelID, ProfileID, and ProfileType (dna/rna)
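One way to use this table is to relabel a profile-indexed matrix by model. The sketch below keeps only the default RNA profiles and renames the rows from ProfileID to ModelID. The identifiers and the toy expression matrix are hypothetical, and the lowercase "rna" value is taken from the column description above.

```python
import pandas as pd

# Toy stand-in for OmicsDefaultModelProfiles.csv (hypothetical IDs).
default_profiles = pd.DataFrame(
    {
        "ModelID": ["ACH-000001", "ACH-000002"],
        "ProfileID": ["PR-AAAAAA", "PR-CCCCCC"],
        "ProfileType": ["rna", "rna"],
    }
)

# Toy profile-level matrix, indexed by ProfileID.
profile_matrix = pd.DataFrame(
    {"GENE_A": [1.0, 2.0, 3.0]},
    index=["PR-AAAAAA", "PR-BBBBBB", "PR-CCCCCC"],
)

# Build a ProfileID -> ModelID lookup restricted to RNA profiles.
rna = default_profiles[default_profiles["ProfileType"] == "rna"]
profile_to_model = rna.set_index("ProfileID")["ModelID"].to_dict()

# Keep only the default profiles, then relabel the rows by ModelID.
selected = profile_matrix.loc[profile_matrix.index.isin(list(profile_to_model))]
model_matrix = selected.rename(index=profile_to_model)
print(model_matrix)
```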

OmicsDefaultModelConditionProfiles.csv

Indicates which profiles are selected for each model condition in Achilles postprocessing.

- Columns: ModelConditionID, ProfileID, and ProfileType (dna/rna)

OmicsGuideMutationsBinaryKY.csv

Binary matrix indicating whether there are mutations in guide locations from the KY library. The KY guide library (the same one used in Project Score) can be accessed from https://score.depmap.sanger.ac.uk/downloads.

- Columns: chrom, start, end, sgRNA, and ModelIDs

OmicsGuideMutationsBinaryHumagne.csv

Binary matrix indicating whether there are mutations in guide locations from the Humagne library. The Humagne guide library can be accessed from Addgene (https://www.addgene.org/pooled-library/broadgpp-human-knockout-humagne/).

- Columns: chrom, start, end, sgRNA, and ModelIDs

OmicsGuideMutationsBinaryAvana.csv

Binary matrix indicating whether there are mutations in guide locations from the Avana library. The Avana guide library can be accessed from AvanaGuideMap.csv.

- Columns: chrom, start, end, sgRNA, and ModelIDs
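The guide mutation matrices share one layout: four annotation columns followed by one column per model. As a rough sketch, the following Python snippet counts affected guides per model and lists the affected guides for a single model. It assumes that a value of 1 marks a mutation in the guide location; the coordinates, guide sequences, and model IDs are toy values.

```python
import pandas as pd

# Toy stand-in for one of the OmicsGuideMutationsBinary files (hypothetical values).
guide_mut = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr2"],
        "start": [1000, 5000, 9000],
        "end": [1020, 5020, 9020],
        "sgRNA": ["AAAACCCCGGGGTTTTAAAA", "CCCCGGGGTTTTAAAACCCC", "GGGGTTTTAAAACCCCGGGG"],
        "ACH-000001": [0, 1, 1],
        "ACH-000002": [0, 0, 1],
    }
)

annotation_cols = ["chrom", "start", "end", "sgRNA"]

# Number of guides with a mutation in their target location, per model.
mutated_per_model = guide_mut.drop(columns=annotation_cols).sum(axis=0)

# Guides affected in one particular model.
affected = guide_mut.loc[guide_mut["ACH-000001"] == 1, "sgRNA"].tolist()
print(mutated_per_model.to_dict(), affected)
```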

Model.csv

Metadata describing all cancer models/cell lines which are referenced by a dataset contained within the DepMap portal. Columns:

ModelCondition.csv

The conditions under which the model was assayed. Columns: