This DepMap release contains data from CRISPR knockout screens from project Achilles, as well as genomic characterization data from the CCLE project.
This Achilles dataset contains the results of genome-scale CRISPR knockout screens for Achilles (using Avana Cas9 and Humagne-CD Cas12 libraries) and Achilles combined with Sanger’s Project SCORE (KY Cas9 library) screens. The dataset was processed using the following steps:
DepMap expression data is quantified from RNAseq files using the GTEx pipelines. A detailed description of the pipelines and tool versions can be found here: https://github.com/broadinstitute/ccle_processing#rnaseq. We provide a subset of the data files output by this pipeline, available on FireCloud. These are aligned to hg38.
DepMap WES copy number data is generated by running the GATK copy number pipeline on samples aligned to hg38. Tutorials and descriptions of this method can be found here: https://software.broadinstitute.org/gatk/documentation/article?id=11682 and https://software.broadinstitute.org/gatk/documentation/article?id=11683. WES samples have been realigned to hg38 and run through this pipeline.
DepMap mutation calls are generated using Mutect2 and annotated and filtered downstream. Variants are aligned to hg38. Detailed documentation can be found here https://storage.googleapis.com/shared-portal-files/Tools/%5BDMC%20Communication%5D%2022Q4%20Mutation%20Pipeline%20Update.pdf.
DepMap generates RNAseq based fusion calls using the STAR-Fusion pipeline. A comprehensive overview of how the STAR-Fusion pipeline works can be found here: https://github.com/STAR-Fusion/STAR-Fusion/wiki. We run STAR-Fusion version 1.6.0 using the plug-n-play resources available in the STAR-Fusion docs for gencode v29. We run the fusion calling with default parameters except we add the --no_annotation_filter and --min_FFPM 0 arguments to prevent filtering.
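As a rough illustration of the invocation described above, the following Python sketch assembles a STAR-Fusion command line with the two non-default arguments. The FASTQ file names, the resource library directory, and the output directory are placeholders chosen for this example, not the values used by DepMap.

```python
import shlex


def build_star_fusion_command(left_fq, right_fq, genome_lib_dir, output_dir):
    """Assemble a STAR-Fusion argument list with fusion filtering disabled."""
    return [
        "STAR-Fusion",
        "--left_fq", left_fq,
        "--right_fq", right_fq,
        "--genome_lib_dir", genome_lib_dir,
        "--output_dir", output_dir,
        # The two arguments added on top of the defaults:
        "--no_annotation_filter",
        "--min_FFPM", "0",
    ]


# Placeholder inputs (hypothetical paths).
cmd = build_star_fusion_command(
    "sample_R1.fastq.gz",
    "sample_R2.fastq.gz",
    "ctat_genome_lib_build_dir",
    "star_fusion_out",
)
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run directly, which avoids shell quoting issues.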
###########
###########
Description of all files contained in this release
Pipeline: Achilles
Post-Chronos Gene effect estimates for all screens Chronos processed by library, copy number corrected, scaled, screen quality corrected then concatenated.
Pipeline: Achilles
Post-Chronos Gene effect estimates for all screens Chronos processed by library, then concatenated. No copy number correction or scaling.
Pipeline: Achilles
Post-Chronos LFC collapsed by mean of sequences and median of guides, computed per library-screen type then concatenated.
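The collapse described above can be sketched in plain Python. This is a minimal illustration under one reading of the description: guide-level LFC is first averaged across the sequences belonging to a screen, and the median across a gene's guides is then taken. The function name, input layout, and identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def collapse_lfc(sequence_lfc, sequence_to_screen, guide_to_gene):
    """Collapse {sequence_id: {sgrna: lfc}} to {screen_id: {gene: lfc}}."""
    # Step 1: gather guide-level LFC values from all sequences of a screen.
    per_screen = defaultdict(lambda: defaultdict(list))
    for sequence_id, guide_values in sequence_lfc.items():
        screen_id = sequence_to_screen[sequence_id]
        for sgrna, lfc in guide_values.items():
            per_screen[screen_id][sgrna].append(lfc)

    # Step 2: mean across sequences per guide, then median across guides per gene.
    collapsed = {}
    for screen_id, guides in per_screen.items():
        per_gene = defaultdict(list)
        for sgrna, values in guides.items():
            per_gene[guide_to_gene[sgrna]].append(mean(values))
        collapsed[screen_id] = {gene: median(v) for gene, v in per_gene.items()}
    return collapsed


# Toy data: two sequences of one screen, three guides targeting one gene.
sequence_lfc = {
    "SEQ1": {"g1": -1.0, "g2": -2.0, "g3": 0.0},
    "SEQ2": {"g1": -3.0, "g2": -2.0, "g3": -1.0},
}
sequence_to_screen = {"SEQ1": "SCREEN1", "SEQ2": "SCREEN1"}
guide_to_gene = {"g1": "GENE_A", "g2": "GENE_A", "g3": "GENE_A"}
result = collapse_lfc(sequence_lfc, sequence_to_screen, guide_to_gene)
print(result)
```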
Pipeline: Achilles
Post-Chronos Gene dependency probability estimates for all screens Chronos processed by library-screen type, then concatenated.
Pipeline: Achilles
Post-Chronos Map from ModelID to all ScreenIDs combined to make up a given model’s data in the CRISPRGeneEffect matrix.
Columns:
- ModelID
- ScreenID
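A two-column map like this one can be grouped into a per-model list of screens with the standard library. The sketch below reads from an in-memory string; the ModelID and ScreenID values are invented for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the two-column layout described above.
example_csv = """ModelID,ScreenID
MODEL-A,SCREEN-1
MODEL-A,SCREEN-2
MODEL-B,SCREEN-3
"""


def screens_by_model(handle):
    """Group ScreenIDs by ModelID from an open CSV file handle."""
    mapping = defaultdict(list)
    for row in csv.DictReader(handle):
        mapping[row["ModelID"]].append(row["ScreenID"])
    return dict(mapping)


mapping = screens_by_model(io.StringIO(example_csv))
print(mapping)
```

For a file on disk, pass the handle from open(path, newline="") instead of the StringIO object.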
Pipeline: Achilles
Post-Chronos Gene effect estimates for all models, integrated using Chronos. Copy number corrected, scaled, and screen quality corrected.
Pipeline: Achilles
Post-Chronos Gene effect estimates for all models, integrated using Chronos. No copy number correction or scaling.
Pipeline: Achilles
Post-Chronos Gene dependency probability estimates for all models in the integrated gene effect.
Pipeline: Achilles
Post-Chronos The estimates for the efficacies of all reagents in the different libraries-screen types, computed from the Chronos runs.
Columns:
- sgRNA
- Efficacy
Pipeline: Achilles
Post-Chronos The estimates for the library batch effects identified by Chronos.
Pipeline: Achilles
Post-Chronos Estimated log fold pDNA error for each sgRNA in each library, as identified by Chronos.
Pipeline: Achilles
Post-Chronos The estimates for the growth rate of all models in the different libraries-screen types, computed from the Chronos runs.
Columns:
- ScreenID
- Achilles-Avana-2D
- Achilles-Humagne-CD-2D
- Achilles-Humagne-CD-3D
- Project-Score-KY
Pipeline: Achilles
Post-Chronos The estimates for the efficacy of all models in the different libraries-screen types, computed from the Chronos runs.
Columns:
- ModelID
- Achilles-Avana-2D
- Achilles-Humagne-CD-2D
- Achilles-Humagne-CD-3D
- Project-Score-KY
Pipeline: Achilles
Post-Chronos List of genes identified as dependencies across all lines. Each entry is separated by a newline.
Pipeline: Achilles
Post-Chronos The log fold error in pDNA estimated by Chronos, per library
Pipeline: Achilles
Post-Chronos List of genes identified as dependencies across all lines. Each entry is separated by a newline.
Pipeline: Achilles
Pre-Chronos file
List of genes used as positive controls, the intersection of Blomen (2014) and Hart (2015) essentials. Each entry is separated by a newline. The scores of these genes are used as the dependent distribution for inferring dependency probability.
Pipeline: Achilles
Pre-Chronos file
List of genes used as negative controls (Hart (2014) nonessentials). Each entry is separated by a newline.
Pipeline: Achilles
Pre-Chronos file
Summed guide-level read counts for each sequence screened with the Avana Cas9 library. Columns: SequenceID. Rows: sgRNA.
Pipeline: Achilles
Pre-Chronos file
Summed guide-level read counts for each sequence screened with the Humagne-CD Cas12 library. Columns: SequenceID. Rows: sgRNA.
Pipeline: Achilles
Pre-Chronos file
Summed guide-level read counts for each sequence screened with Sanger’s KY Cas9 library. Columns: SequenceID. Rows: sgRNA.
Pipeline: Achilles
Pre-Chronos file
Log2-fold-change from pDNA counts for each sequence screened with the Avana Cas9 library. Columns: SequenceID. Rows: sgRNA.
Pipeline: Achilles
Pre-Chronos file
Log2-fold-change from pDNA counts for each sequence screened with the Humagne-CD library. Columns: SequenceID. Rows: sgRNA.
Pipeline: Achilles
Pre-Chronos file
Log2-fold-change from pDNA counts for each sequence screened with Sanger’s KY Cas9 library. Columns: SequenceID. Rows: sgRNA.
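To make the idea of a log2-fold-change from pDNA counts concrete, here is a minimal sketch. The normalization is an assumption of this example and is not stated in this README: counts are scaled to reads per million and a pseudo-count of 1 is added before taking the log ratio. Guide names and counts are invented.

```python
import math


def reads_per_million(counts):
    """Scale raw read counts so that they sum to one million."""
    total = sum(counts.values())
    return {sgrna: count * 1e6 / total for sgrna, count in counts.items()}


def log_fold_change(sample_counts, pdna_counts, pseudocount=1.0):
    """Per-guide log2 ratio of a late-time-point sample to its pDNA reference.

    ASSUMPTION: reads-per-million scaling and a pseudo-count of 1 are
    illustrative choices, not the documented DepMap procedure.
    """
    sample = reads_per_million(sample_counts)
    pdna = reads_per_million(pdna_counts)
    return {
        sgrna: math.log2(sample[sgrna] + pseudocount)
        - math.log2(pdna[sgrna] + pseudocount)
        for sgrna in pdna
    }


# Toy counts: g1 is depleted about two-fold, g2 is enriched.
pdna_counts = {"g1": 500, "g2": 500}
sample_counts = {"g1": 250, "g2": 750}
lfc = log_fold_change(sample_counts, pdna_counts)
print(lfc)
```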
Pipeline: Achilles
Pre-Chronos file
Mapping of sgRNAs to Genes in the Avana Cas9 library.
Columns:
- sgRNA: guide in vector
- GenomeAlignment: alignment to hg38
- Gene: HUGO (entrez)
- nAlignments: total number of alignments for a given sgRNA
- UsedByChronos: boolean indicating if sgRNA was included in Chronos analysis
- DropReason: why a guide was removed prior to Chronos analysis
Pipeline: Achilles
Pre-Chronos file
Mapping of sgRNAs to Genes in the Humagne-CD Cas12 library.
Columns:
Pipeline: Achilles
Pre-Chronos file
Mapping of sgRNAs to Genes in Sanger’s KY Cas9 library.
Columns:
- sgRNA: guide in vector
- GenomeAlignment: alignment to hg38
- Gene: HUGO (entrez)
- nAlignments: total number of alignments for a given sgRNA
- UsedByChronos: boolean indicating if sgRNA was included in Chronos analysis
- DropReason: why a guide was removed prior to Chronos analysis
Pipeline: Achilles
Pre-Chronos file
Mapping of SequenceIDs to ScreenID and related info.
Columns:
- SequenceID
- ScreenID
- ModelConditionID
- ModelID
- ScreenType: 2DS = 2D standard
- Library
- Days
- pDNABatch
- PassesQC
- ExcludeFromCRISPRCombined
Pipeline: Achilles
Pre-Chronos file
Screen-level quality control metrics.
Columns:
Pipeline: Achilles
Pre-Chronos file
Sequence-level quality control metrics.
Columns:
Pipeline: Achilles
Pre-Chronos file
List of genes with variable gene effects across models and libraries. Used for sequence correlation in QC.
Pipeline: Expression
RNAseq read count data from RSEM.
Pipeline: Expression
Gene expression TPM values of the protein coding genes for DepMap cell lines. Values are inferred from RNA-seq data using the RSEM tool and are reported after log2 transformation, using a pseudo-count of 1; log2(TPM+1). Additional RNA-seq-based expression measurements are available for download as part of the full DepMap Data Release. More information on the DepMap Omics processing pipeline is available at https://github.com/broadinstitute/depmap_omics.
Pipeline: Expression
RNAseq read count data from RSEM.
Pipeline: Expression
RNAseq transcript TPM data from RSEM. Log2 transformed, using a pseudo-count of 1.
Pipeline: Expression
Gene expression TPM values of all genes for DepMap cell lines. Values are inferred from RNA-seq data using the RSEM tool and are reported after log2 transformation, using a pseudo-count of 1; log2(TPM+1). Additional RNA-seq-based expression measurements are available for download as part of the full DepMap Data Release. More information on the DepMap Omics processing pipeline is available at https://github.com/broadinstitute/depmap_omics.
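The log2(TPM+1) transformation used for the expression matrices, and its inverse, can be written as a short sketch. The function names are hypothetical helpers for this example.

```python
import math


def tpm_to_log2(tpm):
    """Apply the log2 transform with a pseudo-count of 1: log2(TPM + 1)."""
    return math.log2(tpm + 1.0)


def log2_to_tpm(value):
    """Invert the transform to recover TPM from a reported value."""
    return 2.0 ** value - 1.0


print(tpm_to_log2(0.0))  # a gene with no expression maps to 0
print(tpm_to_log2(3.0))
print(log2_to_tpm(2.0))
```

Because of the pseudo-count, unexpressed genes map to 0 rather than negative infinity, and reported values are never negative.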
Pipeline: Expression
Gene effective length for all genes output from RSEM
Pipeline: Copy number
Segment level copy number data
Pipeline: Copy number
Gene-level copy number data that is log2 transformed with a pseudo-count of 1; log2(CN ratio + 1). Inferred from WGS, WES or SNP array depending on the availability of the data type. Values are calculated by mapping genes onto the segment level calls and computing a weighted average along the genomic coordinate. Genes that overlap with segmental duplication regions and/or flagged by repeatMasker are masked in this matrix. For details see https://github.com/broadinstitute/depmap_omics/blob/master/docs/source/dna.md#masking. Additional copy number datasets are available for download as part of the full DepMap Data Release. More information on the DepMap Omics processing pipeline is available at https://github.com/broadinstitute/depmap_omics.
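The mapping from segment-level calls to a gene-level value can be sketched as an overlap-weighted average. This is an illustration only: the interval convention (half-open coordinates), the choice to average the CN ratio before applying log2(CN ratio + 1), and the function name are assumptions of this example, and masking is not modelled.

```python
import math


def gene_copy_number(gene_start, gene_end, segments):
    """Overlap-weighted mean CN ratio over a gene, reported as log2(ratio + 1).

    segments: list of (start, end, cn_ratio) tuples on the gene's chromosome.
    Returns None when no segment overlaps the gene.
    """
    weighted_sum = 0.0
    covered = 0
    for seg_start, seg_end, cn_ratio in segments:
        # Length of the part of this segment that lies inside the gene.
        overlap = min(gene_end, seg_end) - max(gene_start, seg_start)
        if overlap > 0:
            weighted_sum += overlap * cn_ratio
            covered += overlap
    if covered == 0:
        return None
    return math.log2(weighted_sum / covered + 1.0)


# Toy gene spanning two segments, half of it on each.
segments = [(0, 150, 0.5), (150, 300, 1.5)]
value = gene_copy_number(100, 200, segments)
print(value)
```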
Pipeline: Fusions
Gene fusion data derived from RNAseq data. Data is filtered by performing the following:
Pipeline: Fusions
Gene fusion data derived from RNAseq data. Data is unfiltered. Column descriptions can be found in the STAR-Fusion wiki. Samples are identified by Profile IDs.
Pipeline: Mutations
MAF-like file containing information on all the somatic point mutations and indels called in the DepMap cell lines. The calls are generated from Mutect2.
Additional processed mutation matrices containing genotyped mutation calls are available for download as part of the full DepMap Data Release.
Columns:
For details, see https://storage.googleapis.com/shared-portal-files/Tools/23Q4_Mutation_Pipeline_Documentation.pdf
Pipeline: Mutations
MAF-like formatted file containing information on all the somatic point mutations and indels called in the DepMap cell lines. The calls are generated from Mutect2.
Additional processed mutation matrices containing genotyped mutation calls are available for download as part of the full DepMap Data Release.
Samples are identified by Profile IDs.
Columns:
For details, see https://storage.googleapis.com/shared-portal-files/Tools/23Q4_Mutation_Pipeline_Documentation.pdf
Pipeline: Mutations
MAF file containing information on all the somatic point mutations and indels called in the DepMap cell lines. The calls are generated from Mutect2. A description of the various columns is in the DepMap Release README file. Additional processed mutation matrices containing genotyped mutation calls are available for download as part of the full DepMap Data Release. This file contains the same variants as OmicsSomaticMutationsProfile.csv, but follows the standard MAF format suitable for downstream analysis tools such as maftools. Samples are identified by Profile IDs.
Pipeline: Mutations
Genotyped matrix determining for each cell line whether each gene has at least one hot spot mutation. A variant is considered a hot spot if it is present in one of the following: Hess et al. 2019 paper, OncoKB hotspot, COSMIC mutation significance tier 1. A value of 0 means no mutation. If there are one or more hot spot mutations in the same gene for the same cell line, their allele frequencies are summed; a value of 2 is assigned if the sum is greater than 0.95, and a value of 1 otherwise.
Pipeline: Mutations
Genotyped matrix determining for each cell line whether each gene has at least one damaging mutation. A variant is considered a damaging mutation if LikelyLoF == True. A value of 0 means no mutation. If there are one or more damaging mutations in the same gene for the same cell line, their allele frequencies are summed; a value of 2 is assigned if the sum is greater than 0.95, and a value of 1 otherwise.
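The 0/1/2 encoding shared by the hot spot and damaging matrices can be expressed as a small function. This is a sketch of the rule as stated above; the function name and the list-of-frequencies input are hypothetical.

```python
def genotype_value(allele_frequencies, threshold=0.95):
    """Encode one gene in one cell line from its qualifying variants.

    allele_frequencies: allele frequencies of the hot spot (or damaging)
    variants found in that gene for that cell line.
    """
    if not allele_frequencies:
        return 0  # no qualifying mutation
    # Frequencies are summed; the comparison is strictly greater than.
    return 2 if sum(allele_frequencies) > threshold else 1


print(genotype_value([]))
print(genotype_value([0.4]))
print(genotype_value([0.5, 0.5]))
```

Note that a single variant with an allele frequency of exactly 0.95 is encoded as 1, because the rule requires the sum to exceed 0.95.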
Pipeline: Expression
Single Sample Gene Set Enrichment Analysis (ssGSEA) scores, calculated from the z-scores of the log2(TPM + 1) gene-level expression data. Details about the R script used to run the analysis can be found here: https://github.com/broadinstitute/ssGSEA2.0
Pipeline: Expression
Single Sample Gene Set Enrichment Analysis (ssGSEA) scores, calculated from the z-scores of the log2(TPM + 1) gene-level expression data. Details about the R script used to run the analysis can be found here: https://github.com/broadinstitute/ssGSEA2.0
Profile ID mapping information. Columns: ProfileID, ModelID, ModelConditionID, Datatype, and WESKit. Rows: ProfileID.
Indicates which profiles are selected in the model-level datasets. Columns: ModelID, ProfileID, and ProfileType (dna/rna).
Indicates which profiles are selected for each model condition in Achilles postprocessing. Columns: ModelConditionID, ProfileID, and ProfileType (dna/rna).
Binary matrix indicating whether there are mutations in guide locations from the KY library. The KY guide library (the same one used in Project Score) can be accessed from https://score.depmap.sanger.ac.uk/downloads. Columns: chrom, start, end, sgRNA, and ModelIDs.
Binary matrix indicating whether there are mutations in guide locations from the Humagne library. The Humagne guide library can be accessed from Addgene (https://www.addgene.org/pooled-library/broadgpp-human-knockout-humagne/). Columns: chrom, start, end, sgRNA, and ModelIDs.
Binary matrix indicating whether there are mutations in guide locations from the Avana library. The Avana guide library can be accessed from AvanaGuideMap.csv. Columns: chrom, start, end, sgRNA, and ModelIDs.
Metadata describing all cancer models/cell lines which are referenced by a dataset contained within the DepMap portal. Columns:
The conditions under which the model was assayed. Columns: