Preferential expression of potential markers for cancer stem cells in large cell neuroendocrine carcinoma of the lung. An FFPE proteomic study

Background Large cell neuroendocrine carcinoma (LCNEC) of the lung, a subtype of large cell carcinoma (LCC), is characterized by neuroendocrine differentiation, a feature it shares with small cell lung carcinoma (SCLC). Pre-therapeutic histological distinction between LCNEC and SCLC has so far been problematic, leading to adverse clinical outcomes. We started a project to establish protein targets characteristic of LCNEC by a proteomic method using formalin-fixed paraffin-embedded (FFPE) tissues, which should help make the diagnosis more reliable.
Methods Cancer cells were collected by laser microdissection from cancer foci in FFPE tissues of LCNEC (n = 4), SCLC (n = 5), and LCC (n = 5) with definite histological diagnosis. Proteins were extracted from the harvested sections, trypsin-digested, and subjected to HPLC/mass spectrometry. Proteins identified by database search were semi-quantified by spectral counting and statistically sorted by pair-wise G-statistics. The proteomic results were verified immunohistochemically using a total of 10 cases for each group.
Results A total of 1981 proteins identified from the three cancer groups were subjected to the pair-wise G-test at p < 0.05, and the specificity of each protein's expression to LCNEC was checked using a 3D plot whose coordinates are the G-statistic values for each of the three two-group comparisons. We identified four protein candidates preferentially expressed in LCNEC compared with SCLC with convincingly low p-values: aldehyde dehydrogenase 1 family member A1 (AL1A1) (p = 6.1 × 10⁻⁴), aldo-keto reductase family 1 members C1 (AK1C1) (p = 9.6 × 10⁻¹⁰) and C3 (AK1C3) (p = 3.9 × 10⁻¹⁰), and CD44 antigen (p = 0.021). These p-values were confirmed by non-parametric exact inference tests. Interestingly, all of these candidates have been proposed as cancer stem cell markers. Immunohistochemistry supported the proteomic results.
Conclusions These results suggest that the candidate biomarkers of LCNEC are related to cancer stem cells and that this proteomic approach using FFPE samples was effective in detecting them.


Introduction
Lung cancer is the leading cause of cancer-related death worldwide [1]. In Japan, annual deaths from lung cancer have been increasing and have reached about 70,000 [2]; in the USA they have reached 160,000 even with a recent decreasing trend [3]. Generally, lung cancer is divided into two histological groups, non-small cell lung carcinoma (NSCLC) and small cell lung carcinoma (SCLC). NSCLC mainly consists of adenocarcinoma (AC), squamous cell carcinoma (SC) and large cell carcinoma (LCC). AC and SC are differentiated and retain features of normal cells, whereas LCC is undifferentiated and lacks such features. The prognosis of lung cancer depends on pathological stage and histological type; within NSCLC, AC has the best prognosis and LCC the worst [4].
Travis et al. [5] proposed a new subtype of LCC, named large cell neuroendocrine carcinoma (LCNEC), in 1991, and the World Health Organization adopted it in the revised pathological classification of lung cancer in 1999. LCNEC exhibits morphology similar to LCC but shows neuroendocrine differentiation like SCLC, which is judged by expression of at least one of three representative neuroendocrine proteins: CD56, synaptophysin (Syn) and chromogranin A (CGA). Among the subtypes of LCC, LCNEC has the poorest prognosis, even at early stages [6,7], as does SCLC. However, the therapeutic strategies for LCNEC and SCLC differ: surgery is the first choice for the former and chemotherapy for the latter. It is therefore important to distinguish LCNEC from SCLC reliably, but the morphological growth patterns common to neuroendocrine tumors sometimes hinder a clear pathologic distinction between the two.
It follows that new biomarkers should be developed for the definite diagnosis of these cancers, even though histopathology has long been the gold standard for diagnosis and for determining disease progression. Genomic and immunohistochemical analyses for this purpose have been reported [8,9], but there are still no biomarkers specific to LCNEC. Recent advances in shotgun sequencing and quantitative mass spectrometry for protein analysis could make proteomics amenable to clinical biomarker discovery [10]. In addition, selective collection of target cells from formalin-fixed paraffin-embedded (FFPE) tissues by laser microdissection gives access to tissues of a variety of cancer types with definite diagnosis. We have used these methods to explore stage-related proteins in non-metastatic lung AC by both global and multiple reaction monitoring (MRM) mass spectrometry-based proteomics [11,12]. In this study, we applied them to detect potential protein markers characteristic of LCNEC by label-free semi-quantitative shotgun proteomics using spectral counting.

Materials and Methods

1. Sample Preparation for FFPE Tissue Specimens
Surgically removed lung tissues were fixed with a buffered formalin solution containing 10-15% methanol and embedded in paraffin by a conventional method. Archived paraffin blocks of formalin-fixed tissues from four LCNEC, five LCC and five SCLC cases were retrieved with the approval of the Ethical Committee of Tokyo Medical University Hospital and used with the patients' consent. Patients' characteristics are listed in Table 1. Paraffin blocks were cut into 4 μm sections for diagnosis and 10 μm sections for proteomics. The 10 μm sections were stained with haematoxylin only. Three pathologists (M.N., H.O., and T.N.) independently made a diagnosis according to the WHO classification using the 4 μm sections stained with haematoxylin and eosin. LCNEC cancer cells characteristically have relatively larger cytoplasm, less fine chromatin and more distinct nucleoli than SCLC cells. Only sections from patients diagnosed unequivocally were used in this study.

2. Immunohistochemical Staining
The neuroendocrine nature of the tumors was confirmed with three representative antibodies: monoclonal mouse anti-CD56 antibody (Novocastra, Newcastle upon Tyne, U.K.), polyclonal rabbit anti-CGA antibody (DAKO Japan, Kyoto, Japan) and monoclonal mouse anti-Syn antibody (DAKO Japan, Kyoto, Japan). Staining with these antibodies was performed automatically on a Ventana Benchmark® XT (Ventana Japan, Tokyo, Japan). Expression of the four LCNEC-specific proteins identified by proteomics was tested with the following commercially available antibodies according to the manufacturers' protocols: monoclonal rabbit anti-AL1A1 antibody (Abcam Japan, Tokyo, Japan), polyclonal anti-AK1C1 antibody (GeneTex, Irvine, CA, USA), monoclonal anti-AK1C3 antibody (Sigma Japan, Tokyo, Japan) and monoclonal mouse anti-CD44 antibody (Abcam Japan, Tokyo, Japan). Briefly, sections were deparaffinized with xylene, rehydrated with graded ethanol solutions and incubated with methyl alcohol containing 3% hydrogen peroxide to remove endogenous peroxidase activity. After thorough washing with PBS, sections were incubated with appropriately diluted primary antibodies and then with Histofine Simple Stain® (Nichirei Bioscience, Tokyo, Japan), and finally visualized with the products of the peroxidase-diaminobenzidine reaction.

3. Laser Capture and Protein Solubilization
Cancerous lesions were identified on serial sections of NSCLC tissues stained with hematoxylin-eosin (HE). For proteomic analysis, a 10 μm thick section prepared from the same tissue block was attached to a DIRECTOR™ slide (Expression Pathology, Rockville, MD, USA), deparaffinized twice with xylene for 5 min, rehydrated with graded ethanol solutions and distilled water, and stained with hematoxylin only. The slides were air-dried and subjected to laser microdissection with a Leica LMD6000 (Leica Microsystems GmbH, Ernst-Leitz-Strasse, Wetzlar, Germany). At least 30,000 cells (8.0 mm²) were collected directly into a 1.5 mL low-binding plastic tube. Proteins were extracted and digested with trypsin using Liquid Tissue™ MS Protein Prep kits (Expression Pathology, Rockville, MD, USA) according to the manufacturer's protocol.

4. Liquid Chromatography-Tandem Mass Spectrometry
We adopted label-free semi-quantitation using spectral counting by liquid chromatography (LC)-tandem mass spectrometry (MS/MS) for global proteomic analysis. The digested samples were analyzed in triplicate by LC-MS/MS using reversed-phase liquid chromatography (RP-LC) interfaced with an LTQ-Orbitrap hybrid mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) via a nanoelectrospray device, as described in detail previously [13]. Briefly, the RP-LC system consisted of a peptide Cap-Trap cartridge (0.5 × 2.0 mm) and a capillary separation column (an L-column Micro of 0.2 × 150 mm packed with reversed-phase L-C18 gels of 3 μm diameter and 12 nm pore size; CERI, Tokyo, Japan) with an emitter tip (FortisTip of 20 μm ID and 150 μm OD with a perfluoropolymer-coated blunt end; OmniSeparo-TJ, Hyogo, Japan) connected to the outlet. An autosampler (HTC-PAL, CTC Analytics, Switzerland) loaded an aliquot of sample onto the trap, which was then washed with solvent A (98% distilled water with 2% acetonitrile and 0.1% formic acid) to concentrate the peptides on the trap and desalt them. Subsequently, the trap was connected in series to the separation column, and the columns were developed for 70 min with a linear acetonitrile gradient from 5 to 40% solvent B (10% distilled water and 90% acetonitrile containing 0.1% formic acid) at a flow rate of 1 μL/min. The LTQ was operated in the data-dependent MS/MS mode to automatically acquire up to three successive MS/MS scans in the centroid mode. The three most intense precursor ions for these MS/MS scans were selected from a high-resolution MS spectrum (a survey scan) that the Orbitrap had acquired during a predefined short time window in the profile mode at a resolution of 30,000 in the m/z range of 400 to 1600.
The sets of acquired high-resolution MS and MS/MS spectra for peptides were converted to single data files and they were merged into Mascot generic format files for database searching.
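As an illustration of the data-dependent precursor selection described above, the following minimal Python sketch picks the three most intense peaks within m/z 400 to 1600 from a single survey scan. It is a simplification and not the instrument's acquisition logic: the function name `select_precursors`, the representation of peaks as (m/z, intensity) pairs and the example values are hypothetical, and real acquisition software applies further settings (for example dynamic exclusion) that are not modelled here.

```python
def select_precursors(survey_scan, top_n=3, mz_min=400.0, mz_max=1600.0):
    """Return the top_n most intense peaks of a survey scan within the m/z window.

    survey_scan is a list of (mz, intensity) tuples (hypothetical representation).
    """
    # Keep only peaks inside the survey-scan m/z range.
    in_range = [(mz, inten) for mz, inten in survey_scan if mz_min <= mz <= mz_max]
    # Sort by intensity, highest first, and keep the top_n peaks.
    return sorted(in_range, key=lambda peak: peak[1], reverse=True)[:top_n]


# Hypothetical survey scan: two intense peaks lie outside the 400-1600 window.
example_scan = [
    (350.2, 9.0e5),
    (455.7, 1.0e4),
    (612.3, 8.0e4),
    (803.4, 5.0e4),
    (1200.9, 2.0e3),
    (1700.0, 7.0e5),
]
selected = select_precursors(example_scan)
```

In this example the peaks at m/z 350.2 and 1700.0 are ignored despite their intensity because they fall outside the survey-scan range.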

5. Database Searching and Semi-quantification with Spectral Counting
Mascot software (version 2.1.1, Matrix Science, London, UK) was used for database searching against Homo sapiens entries in the UniProtKB/Swiss-Prot database (Release 56.6, 20413 entries). Peptide mass tolerance was 10 ppm, fragment mass tolerance was 0.8 Da, and up to two missed cleavages were allowed for errors in trypsin specificity. Carbamidomethylation of cysteine was set as a fixed modification, and methionine oxidation and formylation of lysine, arginine and N-terminal amino acids as variable modifications. A p-value < 0.05 was considered significant, and the score cutoff was 44. The lists of identified proteins were merged into a master file in which the primary accession numbers and entry names from UniProtKB were used. The false positive rates for protein identification were estimated using a decoy database created by reversing the protein sequences in the original database; the estimated false positive rate of peptide matches was 0.45% under protein score threshold conditions (p < 0.005). Mascot search results were processed with Scaffold software (version 2.02.03, Proteome Software, Portland, OR) to analyze semi-quantitatively the differential expression levels of proteins in LCNEC, LCC and SCLC by spectral counting as described [11]. The number of peptide MS/MS spectra identified with high confidence (Mascot ion score, p < 0.005) was used for calculating spectral counts. Fold changes of expressed proteins on the base 2 logarithmic scale (R_SC) were calculated from the spectral counts as described [11]. Candidate proteins between two groups were chosen so that R_SC > 1 or R_SC < −1, which correspond to fold changes > 2 or < 0.5. The G-test was used for evaluating differential protein expression between pairs of cancer groups [14]. In this study we mainly focus on the LCNEC vs. SCLC comparison, but the other pairs were also considered. The results are illustrated in a three-dimensional plot to judge whether a protein is specifically expressed in a given cancer group.
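The exact expression for the log2 fold change is given in [11] and is not reproduced in this section. The sketch below therefore assumes the form commonly used in the spectral counting literature, log2[(n_test + f)/(n_ref + f)] + log2[(t_ref − n_ref + f)/(t_test − n_test + f)], where n is the spectral count of the protein, t is the total spectral count of the group and f is a pseudo-count. Both the formula and the value f = 1.25 are assumptions made for illustration, as are the function names.

```python
import math


def r_sc(n_ref, t_ref, n_test, t_test, f=1.25):
    """Log2 fold change of a protein (test vs. reference) from spectral counts.

    n_*: spectral count of the protein in each group.
    t_*: total spectral count of each group.
    f:   pseudo-count that keeps the ratio finite for zero counts (assumed value).
    """
    return (math.log2((n_test + f) / (n_ref + f))
            + math.log2((t_ref - n_ref + f) / (t_test - n_test + f)))


def is_candidate(value, cutoff=1.0):
    """Candidate rule from the text: value > 1 or value < -1 (fold change > 2 or < 0.5)."""
    return value > cutoff or value < -cutoff


# Hypothetical counts: 30 spectra in the test group, none in the reference group.
example = r_sc(n_ref=0, t_ref=2000, n_test=30, t_test=2000)
```

With this sign convention a positive value indicates higher expression in the test group (for example LCNEC) than in the reference group (for example SCLC), and swapping the two groups changes only the sign.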
Although the G-test does not require replicates, the spectral counts for each protein from the triplicates were pooled and used to calculate the G-statistic with an Excel macro from a two-way contingency table arranged in two rows, for the target protein and for all other proteins, and two columns for the cancer groups. Statistical significance was set at p < 0.05. The Yates correction for continuity was applied to the 2 × 2 tables; this correction enabled us to handle data containing small spectral counts, including zero. Statisticians, however, have shown that the results of a G-test on a contingency table containing small counts are not fully convincing, because the test assumes that the G-statistic asymptotically obeys a χ² distribution with one degree of freedom. To validate the G-test results, we calculated exact p-values for some significant proteins, without making any assumptions about the statistical distribution, based on the permutational distribution of the test statistic, i.e., Fisher's exact test and the Mann-Whitney U test for the contingency tables, using an R package.
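The pair-wise test described above can be sketched in a few lines of Python using only the standard library. This is an illustrative reimplementation, not the Excel macro or the R code used in the study: the function names are hypothetical, the Yates correction is taken to mean moving each observed count by up to 0.5 toward its expected value, and the p-value uses the identity that the upper tail of a χ² distribution with one degree of freedom equals erfc(√(G/2)). A two-sided Fisher's exact test on the same table is included for comparison.

```python
import math


def g_test_yates(n1, t1, n2, t2):
    """G-statistic and p-value for a 2 x 2 table with Yates' continuity correction.

    Rows: target protein / all other proteins; columns: group 1 / group 2.
    n1, n2 are the protein's pooled spectral counts; t1, t2 the group totals.
    """
    observed = [[n1, n2], [t1 - n1, t2 - n2]]
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row)
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            diff = observed[i][j] - expected
            # Move the observed count toward the expected one by at most 0.5.
            adjusted = observed[i][j] - math.copysign(min(0.5, abs(diff)), diff)
            if adjusted > 0:
                g += 2.0 * adjusted * math.log(adjusted / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    p = math.erfc(math.sqrt(g / 2.0))  # chi-square upper tail, 1 degree of freedom
    return g, p


def fisher_exact_two_sided(n1, t1, n2, t2):
    """Two-sided Fisher's exact p-value for the same 2 x 2 table."""
    r1 = n1 + n2            # row total of the target protein
    c1 = t1                 # column total of group 1
    n = t1 + t2             # grand total
    denom = math.comb(n, c1)

    def prob(x):
        # Hypergeometric probability of x target spectra falling in group 1.
        return math.comb(r1, x) * math.comb(n - r1, c1 - x) / denom

    p_obs = prob(n1)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # Sum all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical pooled counts: 30 spectra in group 1, none in group 2.
g_value, p_value = g_test_yates(30, 2000, 0, 2000)
p_exact = fisher_exact_two_sided(30, 2000, 0, 2000)
```

When both counts of the target protein are zero the function returns G = 0, and the continuity correction turns a single zero count into 0.5 so that the logarithm stays defined.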

Results

1. Patient groups and pathological classification
To explore protein markers that distinguish LCNEC from SCLC, we investigated cancer cells prepared by laser microdissection from FFPE sections of LCNEC, SCLC, and LCC with a shotgun proteomic method. The LCNEC group consisted of four independent patients and the other two groups of five independent patients each. For immunohistochemistry, we added more patients to bring each group to 10 patients. Patients were assigned to the cancer groups according to the WHO classification and by immunohistochemistry with antibodies raised against the established neuroendocrine markers CD56, CGA and Syn (Table 1 and Figure 1). All LCNEC and SCLC tissues used in this study stained positively with at least one of these antibodies, consistent with the neuroendocrine nature of those cancers. LCC tissues were not stained, except for two cases that were faintly positive for Syn; definite diagnosis of LCC required histopathological differentiation from SC, AC and SCLC. The patient profiles, including the TNM pathological classification and staging, are summarized in Table 1. There was no significant difference in age among the groups (p = 0.076 by ANOVA; mean age ± SD: 68.4 ± 6.3 for LCNEC, 69.8 ± 6.8 for SCLC, and 62.8 ± 7.7 for LCC), and males accounted for over 80% of every group. Most patients were at stages IA to IIB, with correspondingly limited extent of the primary tumor (T1 or T2) and of regional lymph node involvement (N0 or N1); the exceptions, at the more advanced stage IIIA or IIIB, were one LCC patient (patient 4) and four of the LCNEC patients added for immunohistochemistry (patients 5, 6, 8, and 10). No patient had distant metastasis (M0). None of the patients had undergone pre-operative chemotherapy except patient 5 in the LCNEC group (carboplatin + irinotecan) and patient 4 in the LCC group (carboplatin + paclitaxel).

2. LC-MS/MS protein identifications and semiquantification by spectral counting
Tryptic digests from laser-microdissected samples, typically containing ~30,000 cells, were analyzed in triplicate by LC-MS/MS as described in "Materials and Methods". Under the database search settings used, we identified significant proteins as follows: LCNEC contained a total of 1,124 proteins, including 410 unique, 168 in the overlap only between LCNEC and SCLC, 93 in the overlap only between LCNEC and LCC, and 453 in the overlap among the three groups; SCLC contained a total of 1,096, including 362 unique, 100 in the overlap only between SCLC and LCC, and the overlapping proteins described above; LCC contained a total of 1,083, including 450 unique and the overlapping proteins described above. The spectral counts were calculated for these proteins, and those from the triplicate experiments were pooled, thereby improving the performance of the G-test and decreasing false positive rates significantly [14]. There was no significant difference among the total spectral counts of the groups (p = 0.248 by ANOVA; mean counts ± SD: 1916 ± 571 for LCNEC, 1879 ± 457 for SCLC, 2491 ± 645 for LCC). Next, the values of R_SC, a measure of the fold change in protein expression level, were calculated from the spectral counts of these proteins as described in "Materials and Methods". The pooled counts for each protein were also subjected to the pair-wise G-test between cancer groups. Table 2 shows the identified proteins that were significantly up- or down-regulated in LCNEC compared with SCLC as judged by the G-test at p < 0.05. The proteins are listed in descending order of R_SC; the larger the R_SC value of a given protein, the greater its expression level in LCNEC compared with SCLC, and vice versa. Representative proteins up-regulated in LCNEC were AL1A1, AK1C1, AK1C3, brain-type fatty acid-binding protein (FABP) and β-enolase. Those up-regulated in SCLC, on the other hand, were brain acid soluble protein 1 (BASP), secretagogin (SEGN), fascin and neural cell adhesion molecule (CD56).

3. Biomarker Candidates for LCNEC
To illustrate the specificity of protein expression toward LCNEC more clearly, we made a 3D scatter plot with an x axis indicating the G-statistic values (G values) for the LCNEC vs. LCC analysis, a y axis for LCC vs. SCLC, and a z axis for LCNEC vs. SCLC (Figure 2). When the spectral counts of a target protein are zero for both groups in question, G is hereafter defined as 0. The proteins expressed specifically to LCNEC will therefore be present in the region and 237 proteins that showed significant changes in expression levels. These proteins were subjected to gene ontology (GO) analysis, highlighting their biological and molecular functions and cellular localization. As Figure 3 shows, the molecular functions and cellular localization of proteins preferentially expressed in the LCNEC vs. SCLC pair were quite different from those of the other pairs.
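The coordinates used in such a plot can be sketched as follows: each protein is mapped to a triple of G values, one per pair of groups, with G = 0 when the protein has no spectra in either group of a pair. The Python code below is a hedged illustration only. The function and variable names and the example counts are hypothetical, the G-statistic is computed without the continuity correction for brevity, and 3.841 is the χ² critical value for p = 0.05 with one degree of freedom. The sketch does not attempt to define the LCNEC-specific region of the plot.

```python
import math

CRITICAL_G = 3.841  # chi-square critical value for p = 0.05, 1 degree of freedom


def g_stat(n1, t1, n2, t2):
    """Uncorrected G-statistic for a 2 x 2 table (protein vs. rest, two groups)."""
    if n1 == 0 and n2 == 0:
        return 0.0  # convention from the text: G = 0 when both counts are zero
    observed = [n1, n2, t1 - n1, t2 - n2]
    rows = [n1 + n2, (t1 - n1) + (t2 - n2)]
    cols = [t1, t2]
    total = t1 + t2
    expected = [rows[i] * cols[j] / total for i in range(2) for j in range(2)]
    # Cells with zero observed count contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def g_coordinates(counts, totals):
    """Map one protein to (x, y, z) G values for the three pair-wise comparisons."""
    pairs = {"x": ("LCNEC", "LCC"), "y": ("LCC", "SCLC"), "z": ("LCNEC", "SCLC")}
    return {axis: g_stat(counts[a], totals[a], counts[b], totals[b])
            for axis, (a, b) in pairs.items()}


# Hypothetical pooled spectral counts for one protein and the group totals.
counts = {"LCNEC": 30, "LCC": 2, "SCLC": 0}
totals = {"LCNEC": 2000, "LCC": 2000, "SCLC": 2000}
coords = g_coordinates(counts, totals)
significant_axes = sorted(axis for axis, g in coords.items() if g > CRITICAL_G)
```

For these hypothetical counts the comparisons involving LCNEC (x and z) exceed the critical value, whereas the LCC vs. SCLC comparison (y) does not.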

4. Extended immunohistochemical validation of the proteomics results
From this proteomic study we identified AL1A1, AK1C1, AK1C3 and CD44 as biomarker candidates for LCNEC. The results were verified immunohistochemically using a total of 10 cases for each group. We assessed immunoreactivity by the percentage of immunopositive area and by the staining intensity relative to positive-control samples at the maximal cut surface of the tumors (Figure 4). No SCLC case showed immunoreactivity with AK1C1, AK1C3 or CD44, and the reactivity of all antibodies with LCNEC sections differed markedly from that with SCLC sections, supporting the proteomic results. Notably, nine LCNEC cases, including the four used for the proteomic experiments, were AL1A1 positive over 30 to 90% of the area. The most intense staining (90% positive area) was observed in patient 2 of the LCNEC group (Table 1 and Figure 4A). On the other hand, LCC and SCLC sections with typical histology were AL1A1 negative (Figure 4A). Four cases showed weak immunoreactivity (30-80% area); these presumably contained small areas mimicking LCNEC morphology. In the LCNEC group, four cases were immunopositive (30-100% positive area) for both AK1C1 and AK1C3, and one more case was positive for AK1C3 only. In the LCC group, one case was AK1C1 positive and four cases were AK1C3 positive; these cases showed small areas with a neuroendocrine tendency in the tissue structure. The immunoreactivity of LCNEC cells to CD44 was the same as that of LCC cells.

Discussion
This study aimed to develop a proteomic means of distinguishing LCNEC from SCLC, to assist a pathologic distinction that is not always straightforward and whose difficulty leads to therapeutic inefficiency. We have been focusing on laser microdissection sampling from FFPE sections for proteomics to explore disease-related protein markers. We have already applied this method to both global semi-quantitative shotgun proteomics using spectral counting and MRM-based quantitative proteomics, and successfully identified stage-related proteins in lung AC [11,12]. In this study, we used the same global shotgun method to compare three cancer groups (LCNEC, SCLC, and LCC) by spectral counting and explicitly interpreted the three sets of pair-wise G-test results in the 3D G-statistic space (Figure 2). This resulted in the identification of four proteins, AL1A1, AK1C1, AK1C3 and CD44, that were expressed more in LCNEC than in SCLC and LCC with high probability. These proteomic findings, obtained from a limited number of patients, were confirmed by routine immunohistochemistry with additional patients. Moreover, we identified other proteins related to these cancer groups in the present study, further demonstrating the technical feasibility of this FFPE proteomic method. The four identified proteins take part physiologically in known metabolic processes. AK1C1, AK1C3 and AL1A1 are cytosolic oxidoreductases involved, respectively, in the reduction of progesterone to its inactive form 20-alpha-hydroxy-progesterone, the metabolism of steroids and prostaglandins with multi-specificity, and the oxidation of retinal to retinoic acid and the precursor of the storage form of vitamin A. CD44 is a cell-surface glycoprotein involved in cell-cell interactions, including adhesion and migration, and thus in tumor growth and progression [15].
When we considered the properties common to these proteins, which apparently have no functional relationship with one another, we noticed that AL1A1 [16,17], AK1C1 [18], AK1C3 [19] and CD44 [20] have all been proposed as markers of cancer stem cells. Their expression in tumor cells could correlate with aggressive biological behavior, drug resistance and poor prognosis, which are characteristics common to LCNEC and SCLC. The preferential expression of the cancer stem cell markers in LCNEC over SCLC suggests that the mechanism by which malignancy increases in LCNEC differs from that in SCLC. Previous studies suggested that these redox enzymes are present in a variety of malignant tumor cells. In particular, AK1C1 and AK1C3 have been reported in human non-small cell lung carcinoma (A549) cells [21], and high expression of AL1A1 has been reported in lung cancer cell lines, especially in AC cell lines compared with LCC and SCLC cell lines [22][23][24]. To our knowledge, however, this is the first report of the statistically significant proteomic detection of AL1A1, AK1C1 and AK1C3 in clinical samples of lung cancers, especially in LCNEC. Among the top five LCNEC-specific proteins, brain-type FABP7 is present in highly infiltrative malignant glioma and is associated with enhanced cell migratory activity and thus with poor prognosis [25], suggesting its involvement in the aggressive nature of LCNEC. Among the top five proteins down-regulated in LCNEC compared with SCLC, BASP is a potential tumor suppressor [26], consistent with its down-regulation in LCNEC, and its specific expression in SCLC suggests that different mechanisms of tumor growth could operate in LCNEC and SCLC. Another SCLC-specific protein, SEGN, is a novel neuroendocrine marker with an expression pattern distinct from those of the conventional markers used in this study, consistent with its being negative in LCNEC and with the reported rate of positive staining in SCLC (26 out of 31) [27].
The role of AL1A1 in lung cancers is still unknown, but it has recently been reported that AL1A1 plays an important role in the Notch pathway [28]. Although there has been no effective chemotherapy for LCNEC, sorafenib, a tyrosine kinase inhibitor in the MAP kinase pathway, is effective against malignant tumor cells with AL1A1 [29]. AL1A1 would thus be not only a cancer stem cell marker but also an attractive target for the treatment of LCNEC. In addition to the statistical sorting of protein expression levels by spectral counting, GO mapping of the proteins significant in each pair-wise comparison (p < 0.05) provides insight into the overall differences between pairs in biological and molecular functions and in cellular components. The gene ontology distributions of molecular function and cellular component in the neuroendocrine vs. non-neuroendocrine comparisons, i.e., LCNEC vs. LCC and SCLC vs. LCC, did not differ significantly from each other. On the other hand, the distributions in the comparison within the neuroendocrine groups, LCNEC vs. SCLC, differed greatly from those of the other pairs. This encourages further studies along this line, which may eventually yield target proteins for LCNEC. We checked the rate of positive immunoreaction of the relevant antibodies with the proteins identified by proteomics for ten patients of each group (Figure 4B). The differences between the rates in LCNEC and SCLC for all target proteins are fully consistent with the proteomic results, confirming the specificity to LCNEC. The preferential expression of AL1A1 and AK1C1 in LCNEC over LCC was also confirmed immunohistochemically, and the rate of AL1A1-positive cases in LCC (20%) agreed with previous results (25%, 1 of 4) [16]. In contrast, the positive staining rates of AK1C3 and CD44 in LCNEC and LCC were similar to each other. Close inspection of HE sections showed that the positive LCC cases had small areas with a neuroendocrine tendency in the tissue structure, as pointed out above.
Almost all LCC sections exhibited no immunoreactivity with the neuroendocrine markers used, except for weak reactivity (20 or 30%) in only two cases. This suggests that the LCNEC-like structure observed in small portions of LCC sections does not necessarily contain enough secretory granules but presumably contains the LCNEC-specific AK1C3 and CD44. A definitive conclusion on this issue awaits proof by electron microscopic immunohistochemistry. A previous study indicated that CD44 was expressed more in SC (97%) and AC (71%) than in LCC (29%) and SCLC (0%) [30], in agreement with the present positive rates for LCC (30%) and SCLC (0%).

Conclusions
Through proteomics of FFPE samples, we conclude that AL1A1, AK1C1, AK1C3, and CD44 are specific for the LCNEC phenotype in relation to SCLC and LCC. They were useful targets for distinguishing LCNEC from SCLC and LCC immunohistochemically. Although a variety of studies with more extensive experimental and clinical data are needed to assess the precise function of these marker candidates and to confirm them as real biomarkers, this proteomic analysis was effective in detecting them and could be applied to other malignancy phenotypes.