About New York State Public Access Cancer Epidemiology Data (NYSPACED)
- Download cancer incidence data by New York State county (self-extracting archive, 83MB compressed, 861MB uncompressed)
- Download cancer incidence data by New York City neighborhood (self-extracting archive, 23MB compressed, 168MB uncompressed)
- Request a password
Frequently Asked Questions
What is NYSPACED?
The New York State Public Access Cancer Epidemiology Data (NYSPACED) are computerized data files containing carefully selected data items. Since the New York State Cancer Registry routinely produces a large volume of tabulated cancer incidence and mortality statistics by race/ethnicity, gender, county, cancer site, and year of diagnosis, most users should be able to find the cancer information they are interested in directly from these tables without needing to access individual-level data. The NYSPACED are prepared for users who want to do additional analysis using the cancer incidence data reported to the New York State Cancer Registry. Currently, we release two NYSPACED datasets, one containing cancer incidence by county, and the other containing cancer incidence by New York City neighborhood.
Who are the intended users of NYSPACED?
NYSPACED is a resource for researchers who are interested in the patterns of cancer incidence by county in New York State or by neighborhood in New York City. Appropriate use of NYSPACED assumes knowledge of the epidemiologic concepts of age-adjusted incidence rates and of confidence intervals.
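For readers less familiar with direct age adjustment, the arithmetic can be sketched as follows. Each age-specific rate (cases divided by population in that age group) is weighted by that age group's share of a standard population, and the weighted sum is scaled to a rate per 100,000. SEER*Stat rate sessions perform this calculation, along with confidence intervals, for you. The function below is a hypothetical illustration only: its name and toy numbers are not part of NYSPACED, and the standard-population counts must be supplied by the user.

```python
def age_adjusted_rate(cases, population, std_pop, per=100_000):
    """Directly age-adjusted rate per `per` persons (illustrative sketch).

    cases[i]: number of cases in age group i
    population[i]: population at risk in age group i
    std_pop[i]: standard population count for age group i; only relative
        sizes matter, because the weights are normalized to sum to 1
    """
    if not (len(cases) == len(population) == len(std_pop)):
        raise ValueError("inputs must have one entry per age group")
    if any(n <= 0 for n in population):
        raise ValueError("population must be positive in every age group")
    total_std = sum(std_pop)
    # Weighted sum of age-specific rates, with weight std_pop[i] / total_std
    adjusted = sum((s / total_std) * (d / n)
                   for d, n, s in zip(cases, population, std_pop))
    return adjusted * per


# Toy example with two age groups (hypothetical numbers)
print(age_adjusted_rate([10, 30], [100_000, 50_000], [3, 1]))
```

In the toy call, the age-specific rates are 10 and 60 per 100,000, weighted 3/4 and 1/4, giving approximately 22.5 per 100,000.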
How do I use NYSPACED?
The NYSPACED files can be used with SEER*Stat, a specialized software package for analyzing cancer data. SEER*Stat is free and can be obtained from the Web page of the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat must be installed on your PC in order to use the NYSPACED files, and tutorials on the SEER Web page can help users learn how to use it. See the section of this document titled "Setting up SEER*Stat and NYSPACED" to learn how to configure SEER*Stat to use the NYSPACED files.
Do I have to use SEER*Stat? Are the data available in another format?
The NYSPACED data are available only in SEER*Stat format, but users can run a SEER*Stat case-listing session on a subset of the data and import the resulting listing into other programs, such as SAS.
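As an illustration of working with a case listing outside SEER*Stat, the sketch below assumes the listing has been exported from SEER*Stat as a comma-separated text file whose first row holds the column labels. The function name, column labels, and data in the example are hypothetical. It uses only Python's standard `csv` module; the same exported file could equally be read into SAS or another package.

```python
import csv
import io
from collections import Counter


def count_by_column(listing_file, column):
    """Tally rows of a comma-separated case listing by the value in `column`.

    `listing_file` is any text file object (for example, one returned by
    open(path, newline="")) whose first row holds the column labels.
    """
    reader = csv.DictReader(listing_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in the header row")
    return Counter(row[column] for row in reader)


# Hypothetical two-column extract, held in memory for demonstration
sample = io.StringIO("Year of diagnosis,Sex\n2017,Male\n2017,Female\n2018,Female\n")
print(count_by_column(sample, "Sex"))
```

To read an actual export, open the file with `newline=""` as the `csv` module recommends and pass the file object in place of `sample`.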
What variables are available on the NYSPACED files?
Not all variables in the New York State Cancer Registry database are available in NYSPACED. The variables that are available are listed below. You can also see which variables are available, along with their specific values, by choosing the dataset on the Data tab in SEER*Stat and then clicking the data dictionary button.
Variables available in the NYSPACED data files:
- Age at diagnosis
- Sex
- Race
- Hispanic origin
- New York State county or New York City neighborhood
- Year of diagnosis
- Site of cancer
- Morphology (histology, behavior, and behavior recode for analysis)
- Laterality
- Grade
- Diagnostic confirmation
- Summary stage at diagnosis
- Sequence number - central
- Surgery of primary cancer site
- Radiation
What if I want other data?
The NYSPACED datasets must balance the need to protect patient confidentiality with the needs of researchers and planners. Researchers should contact the New York State Cancer Registry if they have needs that are not filled by the NYSPACED datasets or the tabulated data on the New York State Cancer Registry web site.
Setting up SEER*Stat and NYSPACED
- Install SEER*Stat on your PC. SEER*Stat can be downloaded from the SEER Web page.
- Because of the way SEER*Stat is configured, you must request and obtain access to SEER data in order to use SEER*Stat. This requires signing a Public Use Data Agreement. Please allow two business days to receive access to SEER*Stat data.
- Download and install the NYSPACED data on your PC or on a LAN. The data are contained in a password-protected self-extracting archive. Send requests for the password to firstname.lastname@example.org. Download the archive and extract the files to a subdirectory where SEER*Stat can access them. It is recommended that the files be extracted to their own subdirectory rather than to one used for other purposes.
- Open SEER*Stat, click the Profile menu, and then select Preferences to set up the Data Locations. If you choose to run SEER*Stat in client-server mode, check the Client-Server box (ssp://seerstat.imsweb.com:2038), and then add a local subdirectory pointing to the location where the NYSPACED data were saved. If you choose to run SEER*Stat in local mode, you must add the local subdirectories for both the SEER data and the NYSPACED data. These preferences can be changed later. Please note that access to client-server mode may be blocked by your firewall. Refer to the SEER*Stat frequently asked questions if you receive a "connection refused" error when attempting to access SEER data in client-server mode.
The session types that are available with NYSPACED include frequency, rate, and case-listing sessions.
The following variables are available in the NYSPACED files.
- Age at diagnosis - Age at diagnosis is grouped into 19 age groups: less than 1 year old, 1 to 4 years old, five-year age groups from 5-9 up to 80-84 years, and 85 years and older.
- Sex - Only male and female sex are included in the file. Cases coded as other sex (transsexual or hermaphrodite) or unknown sex have been excluded; there are fewer than ten such cases per year statewide in the New York State Cancer Registry database.
- Race (referred to as "Race recode" in NYSPACED) - Race information is recoded into four categories: Black, White, Other, and Unknown.
- Hispanic Origin (referred to as "Origin recode NHIA (Hispanic, Non-Hisp)" in NYSPACED) - Hispanic origin is coded as Hispanic or Non-Hispanic using the NAACCR Hispanic Identification Algorithm (NHIA), based on place of birth, ethnicity recorded in the medical record, and a Spanish surname list.
- New York State County or New York City Neighborhood (referred to as "County at DX" in the dataset of cancer incidence by county, and as PUMA in the dataset of cancer incidence by New York City neighborhood) - This data item indicates the patient's county or neighborhood of residence at cancer diagnosis. In the dataset of cancer incidence by county, two larger regions have been defined by aggregating certain counties together: New York City and New York State excluding New York City. The New York City region includes the five counties of New York City: Bronx, Kings (Brooklyn), New York (Manhattan), Queens, and Richmond (Staten Island). The New York State excluding New York City region includes all remaining counties. Persons who are not residents of New York State are not included in this dataset. The dataset of cancer incidence by New York City neighborhood only contains cancer data for New York City residents. The dataset also provides data at the county level for the five New York City boroughs. Neighborhoods are defined in terms of Public Use Microdata Areas (PUMAs). PUMAs approximate New York City Community Districts. For more information about PUMAs, please refer to http://www1.nyc.gov/site/planning/data-maps/nyc-population/geographic-reference.page.
- Year of diagnosis - In the current release, years of diagnosis range from 1995 to 2018 for the dataset of cancer incidence by county and from 2001 to 2018 for the dataset of cancer incidence by New York City neighborhood; these ranges may change when new data are added. The neighborhood dataset starts in 2001 because of the completeness of geocoding from that year onward: geocoded census tract information is needed to assign each case to a neighborhood.
- Site recode (referred to as "Site recode ICD-O-3/WHO 2008" in NYSPACED) - The site recode variable lets users analyze site-specific cancer data without referring to the ICD-O-3 manuals to group cancers. The SEER site recode definitions based on ICD-O-3 are used; these definitions group tumors by primary (anatomic) site and ICD-O-3 histology.
- Histology - Histology is coded in ICD-O-3. (Fritz A et al. International Classification of Diseases for Oncology, Third Edition. World Health Organization, Geneva, 2000). Cases diagnosed prior to 2001 (i.e., before ICD-O-3 became the standard) that were originally reported in ICD-O-2 have been forward-converted to ICD-O-3 for NYSPACED.
- Behavior - Behavior is coded using ICD-O-3. Cases diagnosed prior to 2001 that were originally reported in ICD-O-2 have been forward-converted to ICD-O-3 for NYSPACED.
In SEER*Stat, the "Select Only Malignant Behavior" check box is linked to the behavior variable and indicates malignancy using ICD-O-3. Please note that in situ bladder tumors have been recoded to malignant for NYSPACED, consistent with standard published bladder cancer data.
- Behavior recode for analysis - ICD-O-2 and ICD-O-3 have some differences as to which tumors are considered malignant. These differences affect the cases that are reported to the New York State Cancer Registry, and therefore should be taken into account in the analysis. There are two main categories of change: tumors that were previously considered malignant in ICD-O-2 and are now considered borderline behavior in ICD-O-3, and tumors that were previously considered borderline behavior in ICD-O-2 and are now considered malignant in ICD-O-3. Tumors that are considered malignant based on ICD-O-2 but not based on ICD-O-3 are included in the files only for years prior to 2001. Conditions that are considered borderline behavior in ICD-O-2 but are malignant in ICD-O-3 are included in the files only for 2001 and later. All other borderline or benign tumors reportable to New York State Cancer Registry are excluded from the NYSPACED dataset. For convenience of analysis, we have included a variable called behavior recode for analysis, which the SEER Program created to take into account the reportability change due to the switch from ICD-O-2 to ICD-O-3. This variable may be useful to researchers for analyzing time trends that include diagnosis years prior to and after this classification change was made. The behavior recode for analysis is defined as follows:
- 2 - In situ in both ICD-O-2 and ICD-O-3
- 3 - Malignant in both ICD-O-2 and ICD-O-3 (incl. bladder in situ)
- 4 - Malignant only in ICD-O-3
- 5 - No longer reportable in ICD-O-3
- 6 - Malignant only for 2010+
- Laterality - Laterality is used to code for the side of a paired organ or the side of the body on which the cancer originated.
- Grade - Grade is used to indicate the degree of differentiation or abnormality of the reported neoplasm. For lymphomas and leukemias, grade is also used to indicate T-, B-, Null-, or NK-cell origin.
- Diagnostic confirmation - Diagnostic confirmation refers to the best method of diagnostic confirmation of the reported cancer.
- Summary stage (referred to as "Stage(local/regional/distant/unk)" in NYSPACED) - Summary stage is grouped into in situ, local, regional, distant, or unknown. "Early stage" is sometimes used to refer to cases diagnosed at the in situ or local stage, as opposed to those with regional or distant spread. Summary stage in NYSPACED is derived from SEER Summary Stage 1977 for cases diagnosed before 2001; SEER Summary Stage 2000 for cases diagnosed from 2001 to 2003 and in 2016 and 2017; SEER Summary Stage 2000 derived from Collaborative Stage for cases diagnosed from 2004 to 2015; and Summary Stage 2018 for cases diagnosed in 2018 and later.
- Sequence number - central - This variable is used to indicate the sequence of all reportable neoplasms over the lifetime of a person. Sequence number 00 indicates that the person has had only one in situ or one malignant neoplasm. Sequence number 01 indicates the first of two or more reportable neoplasms; 02 indicates the second of two or more neoplasms, and so on.
- Surgery of primary cancer site (referred to as "Rx summ - surg prim site" in NYSPACED) - Site-specific codes for the type of surgery performed on the primary site as part of the first course of treatment, including treatment given at all facilities. Surgery information is available only for cancer cases diagnosed in 2004 or later. Refer to the Facility Oncology Registry Data Standards (FORDS) manual and the SEER Program Coding and Staging Manual for additional instructions.
- Radiation (referred to as "Rx summ - radiation" in NYSPACED) - Indicates the type of radiation therapy given as part of the first course of treatment. Radiation information is available only for cancer cases diagnosed in 2004 or later.
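The 19 age groups described for age at diagnosis can be reproduced with a small mapping function, for example when grouping ages from another source the same way. This is a sketch: the function name is hypothetical, and the label strings are illustrative and may not match the labels SEER*Stat displays.

```python
def age_group(age):
    """Map age at diagnosis (whole years) to one of the 19 NYSPACED age groups.

    The label strings are illustrative and may differ from SEER*Stat's labels.
    """
    if not isinstance(age, int) or isinstance(age, bool):
        raise TypeError("age must be an integer number of years")
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 1:
        return "<1"
    if age < 5:
        return "1-4"
    if age >= 85:
        return "85+"
    lower = (age // 5) * 5  # five-year groups 5-9, 10-14, ..., 80-84
    return f"{lower}-{lower + 4}"


# Ages 0 through 100 fall into exactly 19 distinct groups (prints 19)
print(len({age_group(a) for a in range(101)}))
```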
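The two aggregate regions defined in the dataset of cancer incidence by county amount to a simple membership test on the five New York City counties. The sketch below assumes plain county names as input and does no validation that the name is actually a New York State county; the dictionary and function names are hypothetical.

```python
# The five New York City counties named in the text, with their borough names
NYC_COUNTIES = {
    "Bronx": "Bronx",
    "Kings": "Brooklyn",
    "New York": "Manhattan",
    "Queens": "Queens",
    "Richmond": "Staten Island",
}


def region(county):
    """Return the aggregate region for a New York State county name.

    Any name that is not one of the five New York City counties is assigned
    to the remainder of the state; no validation is performed.
    """
    if county in NYC_COUNTIES:
        return "New York City"
    return "New York State excluding New York City"


print(region("Kings"), "|", region("Albany"))
```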
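The behavior recode for analysis can be handled as a lookup table. Because tumors coded 4 or 5 are included only on one side of the 2001 change from ICD-O-2 to ICD-O-3, one possible approach for a time-trend analysis spanning 2001 is to keep only code 3 (malignant under both classifications); whether that restriction suits a given question is up to the analyst. The record layout, the `behavior_recode` key, and the function name are assumptions made for this sketch.

```python
# Codes and descriptions of "behavior recode for analysis", as listed in the text
BEHAVIOR_RECODE = {
    2: "In situ in both ICD-O-2 and ICD-O-3",
    3: "Malignant in both ICD-O-2 and ICD-O-3 (incl. bladder in situ)",
    4: "Malignant only in ICD-O-3",
    5: "No longer reportable in ICD-O-3",
    6: "Malignant only for 2010+",
}


def malignant_in_both(records, key="behavior_recode"):
    """Keep only records coded 3 (malignant under both ICD-O-2 and ICD-O-3)."""
    return [r for r in records if r[key] == 3]


cases = [  # hypothetical records
    {"year": 1999, "behavior_recode": 3},
    {"year": 1999, "behavior_recode": 5},
    {"year": 2005, "behavior_recode": 4},
    {"year": 2005, "behavior_recode": 3},
]
for r in malignant_in_both(cases):
    print(r["year"], BEHAVIOR_RECODE[r["behavior_recode"]])
```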
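The year-dependent source of the summary stage variable, and the informal "early stage" grouping, can be written down as two small functions. The function names are hypothetical, and the stage label strings ("in situ", "local") are illustrative; the labels in the SEER*Stat data dictionary may be worded differently.

```python
def summary_stage_scheme(year):
    """Staging scheme from which NYSPACED summary stage is derived, by diagnosis year."""
    if year < 2001:
        return "SEER Summary Stage 1977"
    if year <= 2003:
        return "SEER Summary Stage 2000"
    if year <= 2015:
        return "SEER Summary Stage 2000 (derived from Collaborative Stage)"
    if year <= 2017:
        return "SEER Summary Stage 2000"
    return "Summary Stage 2018"


def is_early_stage(stage):
    """True for in situ or local stage; the label strings are illustrative."""
    return stage.strip().lower() in {"in situ", "local"}


for y in (1998, 2002, 2010, 2016, 2018):
    print(y, summary_stage_scheme(y))
```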
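The central sequence number convention described above (00 for a person's only neoplasm, 01 for the first of two or more, 02 for the second, and so on) can be interpreted with a short helper, for example to select first primaries. The function names are hypothetical, and the upper bound of 59 is an assumption of this sketch, based on the code range conventionally used for in situ and malignant tumors; check it against the NYSPACED data dictionary.

```python
def describe_sequence(code):
    """Interpret a two-digit 'Sequence number - central' code such as "00" or "02".

    Only codes 00-59 are handled; that bound is an assumption of this sketch.
    """
    if len(code) != 2 or not code.isdigit():
        raise ValueError("expected a two-digit code")
    n = int(code)
    if n == 0:
        return "only one in situ or malignant neoplasm"
    if n <= 59:
        return f"neoplasm {n} of two or more"
    raise ValueError(f"code {code} is outside the range handled here")


def is_first_primary(code):
    """True if the tumor is the person's only (00) or first (01) reportable neoplasm."""
    return code in {"00", "01"}


print(describe_sequence("02"), is_first_primary("02"))
```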
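Because coverage differs by dataset and by variable (the neighborhood dataset starts in 2001, and surgery and radiation are populated only from 2004), it can help to compute the usable diagnosis-year range before setting up an analysis. In this sketch the dataset keys are hypothetical, and the year ranges are those of the current release, which will change when new data are added.

```python
# Diagnosis-year coverage of the current release, as described in the text
DATASET_YEARS = {
    "county": (1995, 2018),
    "nyc_neighborhood": (2001, 2018),
}
# Treatment variables are available only for cases diagnosed in 2004 or later
TREATMENT_VARIABLES = {"Rx summ - surg prim site", "Rx summ - radiation"}
TREATMENT_FIRST_YEAR = 2004


def years_available(dataset, variable=None):
    """Inclusive (first, last) diagnosis years with data for a dataset and variable."""
    first, last = DATASET_YEARS[dataset]
    if variable in TREATMENT_VARIABLES:
        first = max(first, TREATMENT_FIRST_YEAR)
    return first, last


print(years_available("county"))
print(years_available("nyc_neighborhood", "Rx summ - radiation"))
```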
Cancer Incidence - Public Use Data from 1995 (or 2001) to 2018, New York State Cancer Registry, New York State Department of Health, data as of November 2020.