The National Cancer Institute conducted the Biospecimen Pre-analytical Variables (BPV) study to determine the effects of formalin fixation and delay to fixation (DTF) on the analysis of nucleic acids. By performing whole transcriptome sequencing and small RNA profiling on matched snap-frozen and FFPE specimens exposed to different delays to fixation, this study aimed to determine acceptable delays to fixation and proper workflow for accurate and reliable Next-Generation Sequencing (NGS) analysis of FFPE specimens. In comparison to snap-freezing, formalin fixation changed the relative proportions of intronic/exonic/untranslated RNA captured by RNA-seq for most genes. The effects of DTF on NGS analysis were negligible. In 80% of specimens, a subset of RNAs was found to differ between snap-frozen and FFPE specimens in a consistent manner across tissue groups; this subset was unaffected in the remaining 20% of specimens. In contrast, miRNA expression was generally stable across various formalin fixation protocols, but displayed increased variability following a 12 h delay to fixation.
As Next-Generation Sequencing (NGS) technologies have become more affordable, reliable and powerful, they have been increasingly used to determine clinical diagnosis, guide treatment, and predict prognostic outcome (reviewed in 1). Although NGS is traditionally conducted using high-quality RNA, such as that obtained from fresh or frozen specimens, FFPE specimens can also be analyzed by NGS and may be particularly desirable when existing morphological and protein expression data are available along with useful clinical data. Use of FFPE specimens may also allow for the study of archival specimens, as NGS has been shown to be successful in FFPE specimens after as many as 32 years of storage 2. While RNA is affected by preanalytical handling and FFPE processing, which introduces differences between molecular data obtained from FFPE and frozen specimens, previous studies have shown that NGS expression data from paired FFPE and frozen tissues can be strongly to very strongly correlated 3, 4, 5, 6, 7, 8, 9, 10, 11. Notably, the expression of 1494 transcripts was found to differ between matched FFPE and frozen specimens 12. Further, the concordance between FFPE and frozen specimens was affected by RNA quality 3, 12, which is known to be influenced by a number of preanalytical factors including post-mortem interval, delay to fixation (i.e. cold ischemia time), fixation duration, temperature, and storage conditions (reviewed in 13). Variation in these preanalytical factors is typical in the clinical laboratories where FFPE biospecimens are generated, but how these factors affect NGS results is unclear.
Although a number of studies have investigated whether nucleic acids extracted from FFPE tissues can be analyzed using NGS platforms, few have used case-matched frozen tissues to address the effects of formalin fixation and delayed fixation on the reliability of whole transcriptome sequencing and small RNA profiling results. The National Cancer Institute’s (NCI) Biorepositories and Biospecimen Research Branch (BBRB) developed the Biospecimen Pre-Analytical Variables research program (BPV) to address this and other potential pre-analytical effects of variable FFPE practices. As part of this program, biospecimens collected using strict SOPs were divided shortly after collection, and pieces were either snap-frozen or formalin-fixed after one of four different delays to formalin fixation (DTF) 14. In this paper, we report on differences in whole transcriptome RNA sequencing (RNA-seq) and miRNA expression profiling that are attributable to formalin fixation and DTF. This study provides guidance on best-practice methodologies for FFPE tissue handling and processing, as well as analytical methods to enable the accurate and reliable detection of clinically relevant expression-related endpoints using NGS-based platforms.
Libraries obtained using Illumina’s TruSeq Total Gold RNA kit represented snap-frozen and FFPE specimens from all 30 tumors and included specimens subjected to a delay to fixation of 1, 2, 3, or 12 h (Supplemental Table 1). The vast majority of libraries were of sufficient quality to conduct 50-base paired-end (PE) sequencing and analysis on Illumina HiSeq 2500 instrumentation, with a read depth ranging from 50 M to 120 M clusters (Supplementary Table 2) and most specimens sequenced to a depth of ~70 M clusters. The complete dataset is available through dbGaP (phs001639). The RNA-Seq dataset was of very high quality, with few outliers and a small set of specimens with noticeable DNA contamination. Only one of the contaminated specimens was deemed an outlier and excluded (specimen #4-4). One lower quality RNA-Seq library (specimen #51-4, colon 1 h DTF) also yielded a poor quality miRNA-Seq library. The main RNA-Seq study thus utilized 148 of the intended 150 specimens for detailed analysis. An overall summary of the gene expression profile via PCA representing all 150 specimens is provided in Supplementary Fig. 1.
When comparing results from FFPE and matched frozen tissues, a consistent and dramatic shift in the proportion of reads corresponding to intronic/exonic/untranslated regions in FFPE specimens was detected. FFPE specimens had consistently higher amounts of intronic normalized fragments per thousand bases (NFPK) than matched snap-frozen samples. For a snap-frozen sample processed under the Total RNA protocol, approximately 50–60% of reads mapped to the transcriptome, and another 15–25% aligned to other intragenic regions including exon-intron chimeras (Fig. 1a). However, all formalin-fixed samples displayed an inverse pattern, with only 15–35% of reads aligned to the transcriptome and 50–65% aligned to other intragenic regions (Fig. 1b). Importantly, this finding was not isolated to a few genes, but instead occurred in the majority of genes. These findings may account for the systematic drops in transcriptome alignment 15, but not genome alignment, when using FFPE material.
Figure 1. Box plots showing the percentage of RNA-Seq reads that aligned to the transcriptome (a) vs. other intragenic regions including exon-intron chimeras (b) in snap-frozen specimens (n = 30) and FFPE specimens fixed after 1 (n = 29), 2 (n = 30), 3 (n = 30) or 12 h (n = 29). The ratio of the percentage of sequences that mapped to the transcriptome vs. other intragenic sequence in FFPE specimens was the inverse of what was observed with snap-frozen specimens. The upper and lower extremes of the box correspond to the third (Q3) and first (Q1) quartiles, respectively, and the whiskers show the range of the data up to 1.5 times the interquartile range (Q3 − Q1). Data more extreme than the range of the whiskers are graphed as individual points. Specimens 4-4 (kidney, 12 h DTF) and 51-4 (colon, 1 h DTF) were excluded.
RNA from a large subset of the DTF samples (82%) exhibited differential expression relative to the matched snap-frozen control, regardless of the duration of the DTF (Fig. 2).
Several hundred of these genes were consistently up-regulated in FFPE samples for each DTF time point in comparison to snap-frozen controls. Importantly, the up-regulated gene signature was observed in specimens from all tissue types examined, but it appeared to occur less frequently in colon specimens (24 of the 40 specimens) than in other tissues (72 of the 80 kidney and ovary specimens). The magnitude of this increase did not appear to be affected by the DTF time point, but instead appeared to be attributable to fixation alone. Interestingly, when these genes were subjected to enrichment analysis, the pathways typically associated with stress or hypoxia were absent (Supplemental Table 3). Instead, DNA/chromatin packaging and nucleosome organization pathways were enriched.
Figure 2. Heat map of median-centered log2(RSEM + 1) values for nearly 1800 genes from FFPE specimens subjected to a 1, 2, 3, or 12 h delay to fixation (n = 30 for each time point) and case-matched snap-frozen controls (n = 30). The majority of FFPE samples displayed higher gene expression than their case-matched snap-frozen controls, regardless of delay to fixation. The color bar indicates the tissue/tumor of origin (red: kidney/renal carcinoma; green: ovarian/fallopian tube and peritoneal carcinoma; purple: colon/colon adenocarcinoma). Interestingly, 24 FFPE samples show profiles similar to their snap-frozen control counterparts (indicated by the colored up arrows). Differences between the profiles of these 24 FFPE samples and the remaining FFPE samples were not associated with contributing medical institution, time to fixation, density of tumor cells, or even case. Specimens displaying DNA contamination (4-4, 59-4, 59-3, and 59-2) grouped together. Specimens 4-4 (highest level of DNA contamination) and 51-4 (lowest amount of usable material) are indicated by black arrows. The scale indicates log2(RSEM + 1) values.
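As an illustration of the transformation named in the Figure 2 caption, the following is a minimal sketch of median-centering log2(RSEM + 1) values. It assumes that centering is done per gene across specimens, which is common for expression heat maps but is not stated in the text; the function name `median_center_log2` and the toy values are hypothetical and are not taken from the study.

```python
import math
from statistics import median


def median_center_log2(rsem_rows, pseudocount=1.0):
    """Return log2(RSEM + pseudocount) per gene, minus that gene's median.

    rsem_rows: list of rows, one row per gene, one value per specimen.
    Assumption: centering is per gene (row); the source does not specify this.
    """
    centered = []
    for row in rsem_rows:
        # The pseudocount keeps zero counts finite after the log transform.
        logged = [math.log2(value + pseudocount) for value in row]
        gene_median = median(logged)
        centered.append([x - gene_median for x in logged])
    return centered


# Toy input: 2 hypothetical genes measured in 3 hypothetical specimens.
toy = [
    [0.0, 1.0, 3.0],
    [7.0, 7.0, 15.0],
]
centered = median_center_log2(toy)
```

After centering, a value of 0 means a specimen sits at the gene's median, so positive values in the FFPE columns would correspond to the higher expression described for most FFPE samples.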
We compared lists of genes identified as differentially expressed between tumor/tissue types in snap-frozen specimens with lists generated for 1 or 12 h DTF FFPE specimens. Differential expression was defined as having a greater than 1.5-fold difference in expression between tissue/tumor types and a t-test significance of P < 0.001 (based on the Microarray Quality Control (MAQC) study 16 finding that use of fold-change in addition to P-values increased the reproducibility of gene lists). As shown in Table 1, overlap of genes differentially expressed between tumor types among snap-frozen controls and the 1 h and 12 h DTF specimens exceeded 70% in all cases, regardless of the pipeline used for the differential analysis. As some variability is assumed due to variation in expression measurement, instability of variance […]
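The selection rule described above (greater than 1.5-fold difference and P < 0.001) and the comparison of the resulting gene lists can be sketched as follows. This is a minimal illustration, not the study's pipeline: it takes precomputed group means and t-test P-values as input, and it assumes that overlap is reported as the share of the snap-frozen list that is recovered in the FFPE list, a definition the text does not give. All gene names and numbers are hypothetical.

```python
FOLD_CHANGE_CUTOFF = 1.5  # from the text: greater than 1.5-fold difference
P_VALUE_CUTOFF = 0.001    # from the text: t-test significance of P < 0.001


def differentially_expressed(stats):
    """Select genes that pass both the fold-change and the P-value cutoffs.

    stats: dict mapping gene -> (mean_type_a, mean_type_b, p_value), with
    means on a linear scale and strictly positive (assumption of this sketch).
    """
    selected = set()
    for gene, (mean_a, mean_b, p_value) in stats.items():
        # Direction-agnostic fold change: larger mean over smaller mean.
        fold = max(mean_a, mean_b) / min(mean_a, mean_b)
        if fold > FOLD_CHANGE_CUTOFF and p_value < P_VALUE_CUTOFF:
            selected.add(gene)
    return selected


def percent_overlap(reference, other):
    """Percentage of the reference gene list that also appears in the other list."""
    if not reference:
        return 0.0
    return 100.0 * len(reference & other) / len(reference)


# Hypothetical per-gene summaries for one tissue-type comparison.
frozen_stats = {
    "GENE_A": (10.0, 30.0, 1e-5),  # passes both cutoffs
    "GENE_B": (10.0, 12.0, 1e-6),  # fold change too small
    "GENE_C": (5.0, 20.0, 0.01),   # P-value too large
    "GENE_D": (40.0, 10.0, 1e-4),  # passes both cutoffs
}
ffpe_stats = {
    "GENE_A": (8.0, 20.0, 2e-4),
    "GENE_B": (10.0, 11.0, 0.5),
    "GENE_C": (5.0, 25.0, 1e-4),
    "GENE_D": (30.0, 25.0, 1e-4),
}

frozen_genes = differentially_expressed(frozen_stats)
ffpe_genes = differentially_expressed(ffpe_stats)
overlap = percent_overlap(frozen_genes, ffpe_genes)
```

Requiring both cutoffs follows the MAQC-based rationale given in the text: a P-value alone can select genes with small, poorly reproducible differences, while the fold-change filter removes them.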