MAPS: pathologist-level cell type annotation from tissue images through machine learning



Section 1: dataset acquisition

Ethical statement

Formalin-fixed paraffin-embedded (FFPE) excisional biopsies from 23 patients with newly diagnosed cHL and one reactive lymph node were retrieved from the archives of Brigham and Women’s Hospital (Boston, MA) with institutional review board approval (IRB# 2010P002736) and patient waiver of consent. Sex and gender were not considered in the study design due to the proof-of-concept nature of this methodological study. All tumor regions were annotated by V.S. and S.J.R.

Antibody conjugation and panel

Lanthanide-conjugated antibodies for MIBI were prepared as previously described19 using the Maxpar X8 Multimetal Labeling Kit (Fluidigm, 201300) and Ionpath Conjugation Kits (Ionpath, 600XXX) with slight modifications to the manufacturer protocols. In short, 100 μg of BSA-free antibody was first washed with the conjugation buffer, then reduced with 4 μmol L−1 (final concentration) of TCEP (Thermo Fisher Scientific, 77720) for 30 min in a 37 °C water bath to generate free thiol groups. The reduced antibody was mixed and incubated with lanthanide-loaded polymers for 90 min in a 37 °C water bath, then washed five times with an Amicon Ultra filter (Millipore Sigma, UFC505096). The resulting conjugated antibodies were then buffered with at least 30% v/v Candor Antibody Stabilizer (Thermo Fisher Scientific, NC0414486) containing 0.02% w/v sodium azide, and stored at 4 °C until use.

Oligo conjugation to antibodies for CODEX was performed as previously described13. In short, 100 μg of BSA-free antibody was reduced with 2.5 mmol L−1 of TCEP at RT for 30 min to generate free thiol groups. Maleimide-labeled oligos were resuspended in High-salt Buffer C (1 mol L−1 NaCl) and incubated with the reduced antibodies at RT for 2 h. The resulting conjugated antibodies were then washed three times in high-salt PBS (0.9 mol L−1 NaCl) in a 50 kDa centrifugal column (Sigma, UFC505096), buffered with at least 30% v/v Candor Antibody Stabilizer (Thermo Fisher Scientific, NC0414486) supplemented with 0.02% w/v sodium azide, and stored at 4 °C.

The antibody panels can be found in Supplementary Table 1.

Gold slide preparation

The protocol for preparing gold slides has been described previously5,6,20. In short, Superfrost Plus glass slides (Thermo Fisher Scientific, 12-550-15) were first soaked and briefly sonicated in dish detergent diluted in ddH2O, cleaned with Microfiber Cleaning Cloths (Care Touch, BD11945), then rinsed under flowing water to remove any remaining detergent. The slides were then air-dried under a constant stream of air in a fume hood. Coating with 30 nm of tantalum followed by 100 nm of gold was performed by the Microfab Shop of Stanford Nano Shared Facility (SNSF) and New Wave Thin Films (Newark, CA).

Coverslip and slides vectabonding

To introduce positive charges for better adhesion of tissue sections onto the surface, pre-cleaned 22×22 mm glass coverslips (VWR, 48366-067) or the e-beam coated gold slides were silanized with VECTABOND Reagent (Vector Labs, SP-1800-7) per the manufacturer's protocol. The slides were first soaked in neat acetone for 5 min, then transferred into 1:50 diluted VECTABOND Reagent in acetone and incubated for 10 min. After that, the slides were quickly dipped in ddH2O to quench and remove remaining reagents, then tapped on a Kimwipe to remove remaining water. The resulting slides were air-dried and then stored at room temperature.

MIBI retrieval and staining protocol

The general MIBI staining procedure is similar to that previously described5,8,21. The FFPE block was sectioned onto Vectabond-treated gold slides at 5 μm thickness. The sections then went through a standard deparaffinization and antigen retrieval process. In brief, slides with FFPE sections were first baked in an oven (VWR, 10055-006) for 1 h at 70 °C, then transferred into neat xylene and incubated for 2x 10 min. Standard deparaffinization was performed with a linear stainer (Leica Biosystems, ST4020) in the following sequence: 3x neat xylene, 3x 100% EtOH, 2x 95% EtOH, 1x 80% EtOH, 1x 70% EtOH, 3x ddH2O, 180 s for each step with constant dipping, followed by a rest in ddH2O. Antigen retrieval was then performed at 97 °C for 10 min with Target Retrieval Solution (Agilent, S236784-2) on a PT Module (Thermo Fisher Scientific, A80400012).

After PT Module processing, the cassette with slides and solution was left on the benchtop until it reached room temperature. After a 5-min rinse in 1x PBS, the sections were blocked with BBDG (5% NDS, 0.05% sodium azide in 1x TBS IHC wash buffer with Tween 20), then stained overnight at 4 °C in an antibody cocktail (Supplementary Table 1). Subsequently, the samples were quickly rinsed with 1x PBS, fixed in the post-fixation buffer (4% PFA + 2% GA in 1x PBS buffer) for 10 min, then quenched with 100 mM Tris HCl pH 7.5, before undergoing a series of dehydration steps on the linear stainer (3x 100 mM Tris pH 7.5, 3x ddH2O, 1x 70% EtOH, 1x 80% EtOH, 2x 95% EtOH, 3x 100% EtOH, 60 s for each step). The slides were then stored in a vacuum desiccator until acquisition.

CODEX retrieval and staining protocol

The procedure for CODEX staining is similar to that previously described22. A cHL FFPE section was mounted on a No.1 glass coverslip pre-treated with VECTABOND Reagent (Vector Labs, SP-1800-7) as described above, and deparaffinized by heating at 70 °C for 1 h, followed by two 15-min soaks in a xylene bath. The tissue was then manually rehydrated in 6-well plates by incubating in 2x 100% EtOH, 2x 95% EtOH, 1x 80% EtOH, 1x 70% EtOH, and 3x ddH2O, for 3 min each with gentle rocking. Heat-induced antigen retrieval (HIER) was performed in a coverslip jar containing 1x Dako pH 9 Antigen Retrieval Buffer (Agilent, S2375) while using a PT module filled with 1x PBS; the PT module was set to pre-warm to 75 °C, heat to 97 °C for 20 min, and then cool to 65 °C. After HIER, the tissue was washed in CODEX hydration buffer (Akoya Biosciences, 232105) 2x for 2 min and incubated in CODEX staining buffer (Akoya Biosciences, 232106) for 20 min. The tissue was then transferred to a humidity chamber to block with 200 μL of blocking buffer while being photobleached with a custom LED array for 2 h (see below), then stained at 4 °C in an antibody cocktail overnight.

The blocking buffer was prepared by combining 180 μL of BBDG block, 10 μL of oligo block, and 10 μL of sheared salmon sperm DNA. The BBDG block was prepared by mixing 5% donkey serum, 0.1% Triton X-100, and 0.05% sodium azide in 1x TBS IHC Wash buffer with Tween 20 (Cell Marque, 935B-09). The oligo block was prepared by mixing 57 different custom oligos (IDT) to create a master mix with a final concentration of 0.5 μmol L−1 per oligo. The sheared salmon sperm DNA was used directly from its original 10 mg/mL stock (ThermoFisher, AM9680). To create a humidity chamber, an empty pipette tip box was filled with ddH2O and wet paper towels and then placed on top of a cool box (Corning, 432021) containing an ice block. Two happy lights (Best Buy, 6460231) were leaned against either side of the humidity chamber, and an LED grow light (Amazon, B07C68N7PC) was positioned above. Staining antibodies (Supplementary Table 2) were prepared while blocking.

After overnight antibody staining, the tissue was washed 2x in CODEX staining buffer for 2 min each. Subsequently, it was fixed with 1.6% paraformaldehyde (PFA) with gentle rocking for 10 min; the PFA solution was made by diluting 16% PFA with CODEX storage buffer (Akoya Biosciences, 232107). The tissue was then washed 3x in 1x PBS, incubated in cold 100% methanol for 5 min on ice, and washed 3x with 1x PBS again. All steps except the methanol incubation were performed in 6-well plates with gentle rocking. The tissue was then fixed with CODEX final fixative for 20 min at RT in a humidity chamber; the final fixative was prepared by mixing 20 μL of CODEX final fixative (Akoya Biosciences, 232112) in 1000 μL of 1x PBS. Finally, the tissue was rinsed 3x in 1x PBS and stored in 1x PBS at 4 °C until CODEX image acquisition.

MIBI-TOF imaging

Datasets were acquired on a commercially available MIBIscope™ System from Ionpath (Production) equipped with a Xenon ion source (Hyperion, Oregon Physics). The typical running parameters on the instrument were as follows:

  • Pixel dwell time: 2 ms

  • Image area: 400 × 400 μm

  • Image size: 512 × 512 pixels

  • Probe size: 400 nm

  • Primary ion current: 4.9 nA on a built-in Faraday cup (or the “Fine” imaging mode)

  • Number of depths: 1 depth

After acquisition, images were extracted with the toffy package (toffy notebook 3b). Detailed pre-processing is mentioned in the sections below.
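As a quick sanity check on the running parameters listed above, the nominal pixel size and a lower bound on the beam-on time per FOV follow from simple arithmetic. The sketch below assumes no stage, scan, or readout overhead, which is an assumption made for illustration and not a reported figure; the function names are hypothetical.

```python
# Nominal values taken from the running parameters listed above.
FOV_SIDE_UM = 400.0    # image area: 400 x 400 um
IMAGE_SIDE_PX = 512    # image size: 512 x 512 pixels
DWELL_MS = 2.0         # pixel dwell time
N_DEPTHS = 1           # number of depths


def pixel_size_um(fov_side_um=FOV_SIDE_UM, side_px=IMAGE_SIDE_PX):
    """Edge length of one pixel in micrometres."""
    return fov_side_um / side_px


def min_acquisition_s(side_px=IMAGE_SIDE_PX, dwell_ms=DWELL_MS, depths=N_DEPTHS):
    """Lower bound on beam-on time per FOV in seconds (no overhead assumed)."""
    return side_px * side_px * dwell_ms * depths / 1000.0


print(pixel_size_um())      # 0.78125 (um per pixel)
print(min_acquisition_s())  # 524.288 (s, roughly 8.7 min per FOV)
```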

CODEX imaging

A black flat-bottom 96-well plate (Corning, 07-200-762) was used for the reporter plate, where each well represented an imaging cycle. Each well was filled with 240 μL of plate master mix, containing DAPI nuclear stain (Akoya Biosciences, 7000003) (1:600) and CODEX assay reagent (Akoya Biosciences, 7000002) (0.5 mg/mL), as well as two fluorescent oligonucleotides (5 μL each) on the Cy3 and Cy5 channels. Blank channels were also included in the first and last wells, with plate master mix substituted for fluorescent oligonucleotides. The plate was then sealed with aluminum film and stored at 4 °C until CODEX image acquisition.

Prior to CODEX image acquisition, the tissue coverslip and reporter plate were placed into the CODEX microfluidics instrument. The coverslip was stained with 750 μL of nuclear stain solution for 3 min before being washed by the fluidics device; the nuclear stain solution was prepared by mixing 1 μL of DAPI nuclear stain in 1500 μL of 1x CODEX buffer. CODEX imaging was performed with a 20x/0.75 objective (CFI Plan Apo λ, Nikon) mounted on an inverted fluorescence microscope (Keyence, BZ-X810) connected to the CODEX microfluidics instrument and CODEX driver software, and the DAPI stain was used to set up imaging areas and z planes. Each imaging cycle contained three channels (DAPI, Cy3, and Cy5), and images taken on the first and last cycles were used as blanks for background correction. Multiplexed images were stitched and background corrected using the Singer software (v1.0.7) from Akoya.

Section 2: dataset pre-processing

Channel crosstalk removal

Similar to fluorescence imaging, mass-spectrometry imaging such as MIBI also has channel crosstalk due to the formation of adducts6 or isotopic impurity of the elemental labels used. Thus, the Rosetta algorithm was applied to the extracted raw images to remove noise from channel crosstalk in a manner similar to that used for flow-cytometry data (toffy notebook 4a). In addition, because background signals from bare slides and organic fragments are partially reflected in the gold and “Noodle” background channels, those counts were also removed with a fine-tuned coefficient matrix along with the channel crosstalk. This step was performed with a local implementation of the toffy package with minor modifications.

Image denoising

Image noise in multiplex images is a well-known issue caused by various factors such as instrumentation, tissue quality, and non-specific binding of antibodies. To tackle this challenge, a deep learning-based method is proposed that poses image denoising as a background-foreground segmentation problem. In this approach, the real signal is considered as foreground, while the noise is considered as background. The proposed method uses a supervised deep learning-based segmentation model, UNET23, to segment the foreground from the given image. To train the model, ground truth is generated using a semi-supervised kNN-based clustering method24. The kNN-based clustering method helps to generate reliable ground truth for the model training. Once the model is trained, it is applied to all markers in all images to obtain predicted foreground segmentation maps. These segmentation maps are then multiplied with the original images to get rid of noise and obtain clean images.
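The final step described above, multiplying the predicted foreground map with the original image, can be sketched as follows. This is a minimal illustration only: the probability map stands in for the output of the trained segmentation model, and the function name and the 0.5 binarization threshold are assumptions, not values taken from the study.

```python
import numpy as np


def apply_foreground_mask(image, foreground_prob, threshold=0.5):
    """Zero out pixels that the segmentation model considers background.

    image           : 2D array of raw counts for one marker
    foreground_prob : 2D array of per-pixel foreground probabilities
                      (stand-in for the segmentation model output)
    threshold       : hypothetical cut-off used to binarize the prediction
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(foreground_prob) >= threshold
    return image * mask  # background pixels become 0, foreground is unchanged


raw = np.array([[5, 1], [0, 7]])
prob = np.array([[0.9, 0.2], [0.1, 0.8]])
clean = apply_foreground_mask(raw, prob)
print(clean)
```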

Cell segmentation

Cell segmentation of the MIBI cHL datasets was performed with a local implementation of deepcell-tf 0.6.0 as described11,25. Histone H3 channel was used for the nucleus, while the summation of HLA-DR, HLA1, Na-K-ATPase, CD45RA, CD11c, CD3, CD20, and CD68 was used as the membrane feature. Signals from these channels were first capped at the 99.7th percentile before input into the model.
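The preparation of the two model inputs described above can be sketched with NumPy. The sketch assumes that each channel is capped at its own 99.7th percentile before the membrane channels are summed; the text does not specify the order of these operations, and the function names are hypothetical. The call to the segmentation model itself is not shown.

```python
import numpy as np


def cap_at_percentile(channel, q=99.7):
    """Clip a channel at its q-th percentile to suppress extreme pixels."""
    channel = np.asarray(channel, dtype=float)
    return np.minimum(channel, np.percentile(channel, q))


def build_model_input(nuclear, membrane_channels, q=99.7):
    """Return an (H, W, 2) array: capped nuclear channel and summed capped membrane channels.

    Assumption: capping is applied per channel, before summation.
    """
    nuc = cap_at_percentile(nuclear, q)
    mem = sum(cap_at_percentile(c, q) for c in membrane_channels)
    return np.stack([nuc, mem], axis=-1)
```

For the MIBI data, `nuclear` would be the Histone H3 image and `membrane_channels` the list of the eight membrane marker images named above.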

Cell segmentation of the CODEX cHL dataset was performed using a local implementation of deepcell-tf 0.12.2. Segmentation was done using DAPI as the nuclear channel and a summation of CD4, CD7, CD15, CD30, CD11b, CD20, CD45RA, CD45RO, CD31, Podoplanin, and HLA-DR as the membrane features to ensure ideal segmentation of all cell types in the singular field of view.

The deepcell-tf version used to generate the final segmentation mask and the detailed parameters for cell segmentation are summarized in Supplementary Table 3.

Image intensity normalization

Due to instrumental limitations, MIBI routinely acquires FOVs of only 400 × 400 μm, so large tissue areas must be covered by stitching multiple FOVs, and differences between FOVs are unavoidable. To compensate for these inter-FOV differences, a set of scripts was developed and integrated into the data processing pipeline. Briefly, in a stitched run, the average Histone H3 counts under the cell segmentation masks of each FOV were calculated; the Histone H3 counts of all FOVs were then normalized towards the highest value, and all other channels were multiplied by the same per-FOV coefficient. Additional flattening based on the Histone H3 counts was also used to avoid boundary effects and image biases. The code and parameters used are available in the analysis pipeline section.
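A minimal sketch of the per-FOV scaling step is given below, assuming each FOV is available as a Histone H3 image plus a segmentation mask in which nonzero pixels belong to cells. The function names are hypothetical, and the additional flattening step is not covered.

```python
import numpy as np


def fov_scale_factors(h3_images, masks):
    """Per-FOV coefficients that raise the mean in-cell Histone H3 signal to the highest FOV."""
    means = np.array([img[mask > 0].mean() for img, mask in zip(h3_images, masks)])
    return means.max() / means


def normalize_fov(channels, factor):
    """Multiply every channel image of one FOV by that FOV's coefficient."""
    return {name: img * factor for name, img in channels.items()}
```

With this convention the brightest FOV receives a coefficient of 1 and every other FOV a coefficient greater than 1.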

Image to cell expression matrix and across-runs normalization

To create the cell expression matrix from the normalized stitched TIFs and their segmentation masks, the counts of each channel inside each segmented cell mask were summed and then divided by the cell size. To reduce across-run variation, the median per-cell Histone H3 value of each run was calculated; the Histone H3 medians of all runs, along with the counts of all other channels, were then normalized towards the highest Histone H3 median of that MIBI dataset. The code and parameters used are available in the analysis pipeline section.
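The two operations in this step can be sketched as below: summing counts per cell and dividing by cell area, then computing one scaling coefficient per run from the median per-cell Histone H3. The sketch assumes cell labels are the contiguous integers 1..n with 0 as background; the function names are hypothetical.

```python
import numpy as np


def cell_expression_matrix(image_stack, label_mask):
    """Sum counts per cell and divide by cell area.

    image_stack : (C, H, W) array, one plane per channel
    label_mask  : (H, W) integer array, 0 = background, 1..n = cell labels
    Returns an (n_cells, C) size-normalized matrix and (n_cells,) areas in pixels.
    """
    labels = label_mask.ravel()
    n = int(labels.max())
    areas = np.bincount(labels, minlength=n + 1)[1:].astype(float)
    sums = np.stack(
        [np.bincount(labels, weights=plane.ravel(), minlength=n + 1)[1:]
         for plane in image_stack],
        axis=1,
    )
    return sums / areas[:, None], areas


def across_run_factors(h3_per_run):
    """One coefficient per run, scaling its median per-cell Histone H3 to the highest run median."""
    medians = np.array([np.median(values) for values in h3_per_run])
    return medians.max() / medians
```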

Generation of cell phenotyping ground truth

Cell phenotyping on the cHL MIBI datasets was accomplished through an iterative clustering and annotating process. The clustering was performed with FlowSOM26 on the cHL 1 dataset and Leiden27 on the cHL 2 dataset. The cHL 1 dataset was initially clustered with CD11c, CD14, CD15, CD153, CD16, CD163, CD20, CD3, CD30, CD4, CD56, CD57, CD68, CD8, FoxP3, GATA3, Granzyme B, and Pax-5 to capture most of the cell types present in the data. The resulting clusters were then manually annotated by examining the predominantly enriched markers of each cluster, which was done by plotting Z-score and mean expression heatmaps across all clusters and the phenotypic markers used. Clusters with a clear enrichment pattern were annotated. Next, with Mantis Viewer28, the assigned annotation was confirmed by mapping the annotation to each cell and overlaying the raw images of the enriched markers for visual inspection. Due to noise in the data, certain clusters had unclear enrichment patterns. These clusters were assessed based on the phenotype marker enrichment patterns and subjected to further clustering and visual inspection. This iterative process was repeated until no further useful information could be extracted, and the remaining cells with no clear enrichment pattern were assigned as “Others”. For the cHL 1 dataset, 1,538,433 out of 1,669,853 cells (92.2%) were assigned a final annotation.

Cell phenotyping on the cHL CODEX dataset was performed through an iterative process using Rphenoannoy (R implementation of PhenoGraph) and FlowSOM26,27 to cluster on CD30, CD20, CD2, CD7, CD8, CD57, CD4, Granzyme B, CD56, FoxP3, CD11c, CD16, CD206, CD163, CD68, CD15, CD11b, Cytokeratin, Podoplanin, CD31, MCT, and a-SMA. The resulting stratified cell clusters and corresponding enriched phenotypic markers were then visualized with Z-score and mean expression heatmaps. Cells were then individually mapped back to the original tissue images in QuPath 0.2.0-m1 to validate marker enrichment. Clusters with clear enrichment patterns for a particular cell type were annotated accordingly. Clusters with unclear or partially correct enrichment patterns were further clustered using FlowSOM based on a curated subset of phenotypic markers present on these unclear populations. Multiple iterations of clustering and annotation were performed until the signal-to-noise ratio was too low to confidently distinguish the phenotype of the remaining cells, which were assigned as “Others”. 140,053 out of 145,161 cells (96.5%) were assigned a final annotation.

All final annotations were assessed by S.J. and S.J.R. (a board-certified hematopathologist).

Section 3: datasets overview

Our study utilized five different datasets, including three in-house datasets, for cell phenotyping in cHL, DLBCL, and CRC. The cHL 1, cHL 2, and DLBCL18 datasets were acquired using Multiplexed Ion Beam Imaging (MIBI) and contained cells from 13, 12, and 9 different phenotypes, respectively. The cHL CODEX and CRC CODEX7 datasets were acquired using Co-detection by Indexing (CODEX) and contained cells from 16 and 14 different phenotypes, respectively. The datasets had varying numbers of cells, protein/functional markers, and levels of class imbalance, and were split into five folds, where four folds were used as training/validation (80%/20%) sets and the remaining fold was used as the test set, iteratively.
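The rotation described above (each fold serving once as the test set, with the other four folds divided 80%/20% into training and validation) can be sketched with the standard library. This is an illustrative, unstratified split over generic item indices; the function name and the default seed are assumptions, and dataset-specific rules such as stratified sampling or keeping control FOVs in the training set are not implemented here.

```python
import random


def five_fold_splits(n_items, n_folds=5, val_fraction=0.2, seed=0):
    """Yield (train, val, test) index lists; each fold is the test set exactly once."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_items))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        n_val = int(round(val_fraction * len(rest)))
        yield rest[n_val:], rest[:n_val], test


splits = list(five_fold_splits(100))
for train, val, test in splits:
    print(len(train), len(val), len(test))  # 64 16 20 for each of the five folds
```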

cHL 1 and cHL 2 (MIBI) dataset

The cHL 1 and cHL 2 (MIBI) datasets are two in-house datasets used in our study for cell phenotyping in cHL. Both sets of samples were stained with the same batch of antibody cocktail (Supplementary Table 1) with 46 protein/functional markers, and acquired using Multiplexed Ion Beam Imaging (MIBI). The cHL 1 dataset contains 1,669,853 cells from 18 cHL patients and 1 control rLN, while the cHL 2 dataset has over 230,895 cells from six FOVs (five from cHL patients and one from a control rLN). When training the proposed method, 5 markers from the cHL 1 dataset were dropped due to poor staining quality, while all 46 markers remained in the training set of cHL 2. To evaluate the performance of our proposed method, both datasets were split into 5 folds for multi-fold training and testing, and in both cases the FOVs of the control cases were part of the training set in each fold.

cHL (CODEX) dataset

The cHL (CODEX) dataset is another in-house dataset that was acquired using Co-Detection by Indexing (CODEX), a multiplex imaging technique that allows for simultaneous detection of over 50 markers. The dataset consists of a single large FOV containing 145,161 cells. The cells in the cHL (CODEX) dataset are classified into 16 different cell phenotypes, and each class has an average of more than 8,000 cells. The multiplex FOV in this dataset consists of 49 markers, which include different markers than those used in the cHL 1 (MIBI) and cHL 2 (MIBI) datasets (see Supplementary Table 2 for more details). To evaluate the performance of MAPS, we randomly split the cells in the cHL (CODEX) dataset into five folds using stratified sampling to ensure a balanced number of cells in each fold for each class.

CRC CODEX dataset

The CRC CODEX dataset (DOI: 10.17632/mpjzbtfgfr.1) is a public dataset that we used in our study to evaluate our proposed method for cell phenotyping7. It consists of 258,385 cells from 14 different classes, with a large variation in the number of cells per class, ranging from as low as 323 cells to as high as >47,000 cells. For our study, we used the same markers and classes as described in the CellSighter paper to ensure a fair head-to-head comparison with MAPS. Since there was no information available about the training and validation split in the dataset, we adopted the five-fold cross-validation approach.

DLBCL MIBI dataset

The DLBCL MIBI dataset used in this paper is from a previous publication in which some of the coauthors participated18. It consists of 338,798 cells from 143 FOVs of DLBCL TMA cores of 51 patients, along with 8 FOVs from reactive lymph nodes. In the previous study, those cells were clustered into 9 types using the 10 lineage-associated markers out of the 22-plex image deck. To evaluate the performance of our proposed method, the dataset was split into 5 folds for multi-fold training and testing, where FOVs of the control cases were part of the training set in each fold.

Section 4: MAPS model, training and evaluation

Model architecture

The proposed cell phenotyping method used a feed-forward neural network to predict the cell class from a set of predefined classes (K). Let \(x\in {{\mathbb{R}}}^{N+1}\) be the input data, which consists of the expression of a cell for N markers and its area in pixels. The neural network processes this input data to generate a predicted cell class y. The neural network used in the proposed method consists of four fully connected hidden layers, denoted by h1, h2, h3, and h4. Each hidden layer is followed by a ReLU activation function and a dropout layer, denoted by g1, g2, g3, and g4. The output of the last hidden layer, h4, is fed into the classification layer, which generates the predicted cell class y. The classification layer uses a softmax function to convert the output of the neural network into a probability distribution over the predefined classes. Let Wi and bi denote the weights and biases of the ith layer of the neural network, respectively. Then the output hi of the ith hidden layer can be written as:

$${h}_{i}={g}_{i}({W}_{i}{h}_{i-1}+{b}_{i})$$

(1)

where \({h}_{i-1}\) is the output of the (i−1)th hidden layer (\({h}_{i-1}\in {{\mathbb{R}}}^{512}\) for i > 1) or the input x for i = 1, and gi is the activation function for the ith layer, which is the ReLU function in this case. The dropout layers are not included in this equation, as they only modify the output of the hidden layers during training, and do not affect the final output of the neural network. The classification layer computes the predicted cell class y as follows:

$$y=\mathop{\rm{argmax}}\limits_{k}\,{\rm{softmax}}{({W}_{c}{h}_{4}+{b}_{c})}_{k}$$

(2)

where Wc and bc are the weights and biases of the classification layer, and \({\rm{softmax}}\) is the softmax function, which converts the outputs of the classification layer into a probability distribution over the K predefined classes; the subscript k selects the probability assigned to the kth class. The predicted cell class y is the class with the highest probability.
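Equations (1) and (2) can be illustrated with a short NumPy forward pass at inference time, when dropout is inactive. This is a sketch, not the released implementation: the weights are random placeholders, the hidden width of 512 is taken from the dimension stated for the hidden-layer outputs, and a row-vector convention (h @ W instead of W h) is used so that a batch of cells can be processed at once. All function names are hypothetical.

```python
import numpy as np


def init_params(n_markers, n_classes, hidden=512, n_hidden_layers=4, seed=0):
    """Random placeholder weights; real weights would come from training."""
    rng = np.random.default_rng(seed)
    sizes = [n_markers + 1] + [hidden] * n_hidden_layers + [n_classes]
    return [(rng.normal(0.0, 0.01, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(x, params):
    """Inference-time forward pass. x has shape (batch, N + 1): N markers plus cell area."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)   # Eq. (1) with ReLU as g_i
    W_c, b_c = params[-1]
    probs = softmax(h @ W_c + b_c)       # class probabilities
    return probs.argmax(axis=-1), probs  # Eq. (2): most probable class per cell


params = init_params(n_markers=10, n_classes=5)
y, probs = forward(np.ones((3, 11)), params)
print(y.shape, probs.shape)  # (3,) (3, 5)
```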

Training details

For the training of the proposed method, a batch size of 128 and a dropout probability of 0.25 were used for all datasets. The number of training epochs varied for each dataset due to the varying sizes of the datasets. For larger datasets (cHL 1 MIBI), the number of epochs was set lower, as more training steps are performed within each epoch. Conversely, for smaller datasets (cHL 2 MIBI and cHL CODEX), we used a higher number of epochs to ensure an adequate number of training steps. Specifically, the model was trained for 100 epochs on the cHL 1 (MIBI) dataset, and for 500 epochs on all other datasets. Additionally, we used two hyperparameters, minimum epoch and patience, to address overfitting. The minimum epoch ensures that the model undergoes a minimum number of training epochs, and the patience parameter enables early stopping if the validation performance does not improve, thus mitigating the risk of overfitting. The model with the lowest validation loss was selected as the best model for evaluation on test sets and inference on new data.
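The interplay of the minimum epoch, the patience, and the selection of the lowest-validation-loss model can be sketched as below. The function operates on a list of per-epoch validation losses instead of a real training loop, and the default values for `min_epochs` and `patience` are placeholders, since the values used in the study are not stated here.

```python
def select_best_epoch(val_losses, min_epochs=10, patience=5):
    """Return (best_epoch, stop_epoch), both 0-based.

    Training stops once at least `min_epochs` epochs have run and the
    validation loss has not improved for `patience` consecutive epochs.
    Default values are placeholders, not the settings used in the paper.
    """
    best_loss = float("inf")
    best_epoch = -1
    stop_epoch = -1
    for epoch, loss in enumerate(val_losses):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # a checkpoint would be saved here
        if epoch + 1 >= min_epochs and epoch - best_epoch >= patience:
            break
    return best_epoch, stop_epoch


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.74, 0.9]
print(select_best_epoch(losses, min_epochs=3, patience=3))  # (2, 5)
```

In the example, the loss is lowest at epoch 2 and training halts at epoch 5, after three epochs without improvement.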

Section 5: evaluation across methods

To evaluate the performance of the proposed method, we employed several evaluation methods. Firstly, we used the confusion matrix to visualize the performance of the model. The confusion matrix displays the number of true positive, false positive, true negative, and false negative predictions made by the model. From the confusion matrix, we calculated the precision, recall, and F1-score metrics. Precision measures the proportion of true positive predictions made by the model out of all the positive predictions made, while recall measures the proportion of true positive predictions made out of all the actual positive instances in the dataset. The F1-score is the harmonic mean of precision and recall and is a balanced measure of both metrics.
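The per-class metrics defined above can be derived directly from a multi-class confusion matrix, as in the following sketch. The row/column convention (rows are true classes, columns are predicted classes) and the function name are assumptions made for this illustration.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of cells with true class i predicted as class j."""
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(confusion[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


cm = [[8, 2],
      [1, 9]]
for k, m in enumerate(per_class_metrics(cm)):
    print(k, m)
```

For class 0 in this toy matrix, precision is 8/9, recall is 8/10, and the F1-score is 16/19.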

Additionally, we used the average precision metric, which measures the area under the precision-recall curve. This metric is particularly useful for imbalanced datasets, where there are more negative instances than positive ones. The average precision metric takes into account the precision and recall values at various thresholds and provides a summary of the model’s overall performance.
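One common way to compute average precision for a single class (one-vs-rest) is the step-wise sum AP = Σ_n (R_n − R_{n−1}) P_n over score-ranked predictions. The sketch below uses that definition; whether the study used exactly this estimator of the area under the precision-recall curve is an assumption, and the function name is hypothetical.

```python
def average_precision(y_true, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n over predictions ranked by score.

    y_true : list of 0/1 labels for one class (one-vs-rest)
    scores : list of predicted probabilities for that class
    Ties in `scores` are broken by input order (a simplification).
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    tp = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            ap += (tp / rank) / n_pos  # precision at this rank times the recall step 1/n_pos
    return ap


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

In the example, the two positives are ranked first and third, giving (1/1 + 2/3) / 2 = 5/6.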

Finally, we also used the mean cell expression matrix to visualize the expression levels of different markers in the different cell types predicted by the model. This matrix provides a summary of the mean expression levels of each marker in each cell type and can help to identify differences in marker expression between different cell types when compared with the cell expression matrix generated using ground truth labels.

Comparisons with other methods

We compared our proposed method with two existing cell phenotyping methods, namely ASTIR and CellSighter. The code for both ASTIR and CellSighter methods is publicly available for reproducibility and comparison purposes.

ASTIR

ASTIR is a probabilistic model for cell phenotyping that uses deep recognition neural networks to predict cell types without requiring labels for each cell14. Instead, ASTIR only requires a list of protein markers for each expected cell type within a dataset. The method is based on the assumption that each cell type can be characterized by a unique combination of protein markers, and that the expression levels of these markers can be used to classify cells into their respective types. We reported results of the ASTIR method on three in-house datasets. For each dataset, our experts defined the list of protein markers for each cell type. We evaluated the results using five-fold cross-validation, using exactly the same folds as in the proposed method, for a fair head-to-head comparison.

CellSighter

CellSighter is a deep learning-based supervised cell classification method16. Unlike ASTIR and the proposed method, which work on cell expression matrices, CellSighter takes the image, the cell segmentation mask, and the cell-to-class mapping as input. To evaluate the performance of CellSighter, we re-trained it on the same three in-house datasets using the same 5-fold cross-validation splits as in the proposed method. This ensures a fair comparison between the methods. We obtained the CellSighter results on the publicly available CRC CODEX dataset from the paper to avoid any re-training bias while comparing it with the MAPS results on the same dataset.

Computation resource evaluation across methods

To evaluate the computation resource usage of each method, we ran the three methods on a Linux platform (2x Intel Xeon 6334 “Ice Lake-SP” 3.6 GHz 8-core 10 nm CPUs; 4x NVIDIA “Ampere” RTX A5000 PCI-E+NVLink 24GB ECC GPU Accelerator/Graphics Cards; 1TB DDR4 memory @ 3200 MHz) using the cHL (CODEX) dataset. During model training and cell type inference of each method, we tracked CPU, GPU, and memory (RAM) usage using the “top”, “ps -ef”, and “nvidia-smi” commands. For methods that run in parallel, we recorded the resource usage of all their processes and multiplied it by the number of cores used in parallel.

Statistics and reproducibility

We employed data from every accessible sample within each dataset and no statistical method was used to predetermine sample size. Furthermore, in our study design, we incorporated all available markers from in-house datasets, selectively choosing markers from public datasets to harmonize with prior research. Similarly, we restricted our analysis to cell types utilized in previous studies for public datasets. However, for in-house datasets, we only excluded cell types with insufficient sample count. For each dataset, we randomly partitioned the data into five folds for cross-validation. The code, featuring a fixed seed value, is provided to ensure the reproducibility of our results.

Data visualization

Single-channel and multi-color images were assembled and visually inspected with ImageJ29, QuPath30, or Mantis Viewer28. Visualizations of the analysis results were produced using either Excel or the R packages “ggplot2” and “pheatmap”.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


