About New York State Public Access Cancer Epidemiology Data (NYSPACED)
Frequently Asked Questions
What is NYSPACED?
The New York State Public Access Cancer Epidemiology Data (NYSPACED) are computerized data files containing carefully selected data items. The New York State Cancer Registry routinely produces a large volume of tabulated cancer incidence and mortality statistics by race/ethnicity, gender, county, cancer site, and year of diagnosis, so most users should be able to find the cancer information they need directly in these tables without accessing individual-level data. NYSPACED is intended for users who want to perform additional analyses of the cancer incidence data reported to the New York State Cancer Registry.
Who are the intended users of NYSPACED?
NYSPACED would be most useful to researchers who are interested in the patterns of cancer incidence in New York State. NYSPACED does not contain any data elements related to residence other than region (New York City and the remainder of the state), and therefore would not be helpful for researchers interested in local cancer patterns. Appropriate use of NYSPACED assumes knowledge of the epidemiologic concepts of age-adjusted incidence rates and of confidence intervals.
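As a reminder of what these two concepts involve, the following is a minimal sketch of a directly age-standardized rate with a normal-approximation 95% confidence interval. It assumes Poisson-distributed counts within each age group and is not necessarily the method SEER*Stat applies in a rate session. The function name `age_adjusted_rate` and all numbers are hypothetical and are not drawn from NYSPACED.

```python
from math import sqrt


def age_adjusted_rate(counts, populations, std_weights, per=100_000):
    """Directly age-standardized rate with a normal-approximation 95% CI.

    counts, populations, and std_weights are parallel sequences with one
    entry per age group. std_weights are standard-population proportions
    and must sum to 1.
    """
    if not (len(counts) == len(populations) == len(std_weights)):
        raise ValueError("inputs must have one entry per age group")
    if abs(sum(std_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")

    groups = list(zip(counts, populations, std_weights))
    # Weighted sum of the age-specific rates.
    rate = sum(w * d / n for d, n, w in groups)
    # Variance under the assumption of Poisson counts in each age group.
    variance = sum(w * w * d / (n * n) for d, n, w in groups)
    half_width = 1.96 * sqrt(variance)
    return rate * per, (rate - half_width) * per, (rate + half_width) * per


# Hypothetical three-group example (made-up numbers, not NYSPACED data).
counts = [10, 50, 200]
populations = [100_000, 80_000, 40_000]
weights = [0.5, 0.3, 0.2]

rate, low, high = age_adjusted_rate(counts, populations, weights)
print(f"{rate:.2f} per 100,000 (95% CI {low:.2f} to {high:.2f})")
```

A real analysis would use all 19 age groups and the weights of a published standard population rather than the three illustrative groups shown here.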
How do I use NYSPACED?
The NYSPACED files can be used with SEER*Stat, specialized software designed for analyzing cancer data. SEER*Stat is free and can be obtained from the Web page of the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat must be installed on your PC in order to use the NYSPACED files. Tutorials on the SEER Web page will help users learn how to use SEER*Stat. See the section of this document titled "Setting up SEER*Stat and NYSPACED" to learn how to configure SEER*Stat to use the NYSPACED files.
Do I have to use SEER*Stat? Are the data available in another format?
The NYSPACED data are only available in SEER*Stat format, but users can use SEER*Stat to obtain a case listing of subsets of the data to import into other programs such as SAS.
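For users who prefer a scripting language to SAS, the sketch below shows one way an exported case listing might be read and tabulated in Python. It assumes the listing was saved as tab-delimited text with variable names in the first row. That layout is an assumption about the export settings, not something this document specifies. The function names and the sample rows are hypothetical. The column names follow the variable names described in this document.

```python
import csv
from collections import Counter


def read_case_listing(lines, delimiter="\t"):
    """Parse an exported case listing into a list of dicts keyed by column name.

    `lines` can be an open text file or any iterable of text lines.
    """
    return list(csv.DictReader(lines, delimiter=delimiter))


def tabulate(rows, column):
    """Count cases by the values of one column."""
    return Counter(row[column] for row in rows)


# Hypothetical export content (made-up rows, not NYSPACED data).
# With a real file, pass an open file handle instead, for example
# open("case_listing.txt", newline="") where the file name is your own.
sample = [
    "Year of diagnosis\tloc/reg/distant stage\n",
    "2005\tLocal\n",
    "2005\tDistant\n",
    "2006\tLocal\n",
]

rows = read_case_listing(sample)
print(tabulate(rows, "loc/reg/distant stage"))
```

If the listing was exported with a different delimiter, pass it through the `delimiter` argument.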
What variables are available on the NYSPACED file?
Not all variables in the New York State Cancer Registry database are available in NYSPACED. The variables that are available are listed below. You can also see which variables are available, along with their specific values, by choosing the dataset from the Data tab in SEER*Stat and then clicking the data dictionary button.
Variables available in the regional level data file of NYSPACED:
- Age at diagnosis
- Gender
- Race
- Hispanic origin
- New York State region
- Place of birth
- Year of diagnosis
- Site of cancer
- Morphology (histology, behavior, and behavior recode for analysis)
- Laterality
- Diagnostic confirmation
- Summary stage at diagnosis
- Multiple tumor indicator
What if I want other data, such as county?
The NYSPACED dataset must balance the need to protect patient confidentiality with the needs of researchers and planners. The current release of NYSPACED includes only a regional-level data file. In this file, the county variable has been aggregated into two regions: New York City and New York State excluding New York City. Researchers should contact the New York State Cancer Registry if they have needs that are not met by the NYSPACED datasets or by the tabulated data on the New York State Cancer Registry web site.
Setting up SEER*Stat and NYSPACED
- Install SEER*Stat on your PC. SEER*Stat can be downloaded from the SEER Web page.
- Because of the way SEER*Stat is configured, you must request and obtain access to SEER data in order to use SEER*Stat. This requires signing a Public Use Data Agreement. Please allow two business days to receive access to SEER*Stat data.
- Download and install the NYSPACED data on your PC or on a LAN. The data are contained in a password-protected self-extracting archive. Send requests for the password to firstname.lastname@example.org. Download the archive and extract the files to a subdirectory where SEER*Stat can access them. We recommend extracting the files to their own subdirectory rather than to one that is used for other purposes.
- Within SEER*Stat, open the preferences dialog box, found in the File menu, and add the NYSPACED directory as a local directory. If you run SEER*Stat in client-server mode, check the box for Client-Server: ssp://seerstat.cancer.gov:2038, then choose Add Local subdirectory and select the location where the NYSPACED data were saved. If you run SEER*Stat in local mode, you must add both local subdirectories: the one where the SEER data are located and the one where NYSPACED is located. These preferences can be changed later. Please note that access to client-server mode may be restricted by your firewall. Refer to the SEER*Stat frequently asked questions for help if you receive a "connection refused" error when attempting to access SEER data in client-server mode.
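If client-server mode fails, it can help to check whether the server port is reachable from your network before changing any SEER*Stat settings. The following is a minimal sketch of such a check, using the host and port from the client-server address given above. The function name `can_connect` is hypothetical. A successful TCP connection only shows that the firewall allows the traffic. It does not confirm that SEER*Stat itself will authenticate or work.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("seerstat.cancer.gov", 2038)` returning `False` would suggest asking your network administrator whether outbound connections on that port are blocked.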
The session types that are available with NYSPACED include frequency, rate, and case-listing sessions.
The following variables are available on the regional-level NYSPACED file.
- Age at diagnosis - Age at diagnosis is grouped into 19 age groups: less than 1 year, 1 to 4 years, then five-year age groups from 5-9 years up to 80-84 years. All persons aged 85 or older are grouped into the 85 and older age group.
- Gender - Only male and female sex are included in the file, and persons of other gender (i.e., transsexual or hermaphrodite) and unknown gender have been excluded. There are fewer than ten cases per year statewide with other or unknown gender in the New York State Cancer Registry database.
- Race (named as "Race recode" in NYSPACED) - Race information is recoded into four categories: Black, White, Asian/Pacific Islander, and other/unknown.
- Hispanic Origin (named as "Origin recode SEER" in NYSPACED) - Hispanic origin is coded as Hispanic or Non-Hispanic based on place of birth, ethnicity recorded in the medical record, and a Spanish surname list.
- New York State Region (named as "County at DX" in NYSPACED) - New York State region is derived from the county of residence at diagnosis. Two regions are defined: New York City and New York State excluding New York City. The New York City region includes the five counties of New York City: Bronx, Kings (Brooklyn), New York (Manhattan), Queens, and Richmond (Staten Island). The New York State excluding New York City region includes all remaining counties. Persons who are not residents of New York State are not included in NYSPACED.
- Place of birth - Place of birth is grouped into US born, foreign born, and unknown. Caution should be exercised in using this variable, since a large proportion of cases have unknown place of birth. This variable cannot be used in a rate session, because appropriate denominators are not available.
- Year of diagnosis - Year of diagnosis included in the current release of NYSPACED ranges from 1996 to 2009, and this may change when new data are added to the dataset.
- Site recode - The site recode variables provide users with a way to analyze site-specific cancer data without needing to refer to the ICD-O-3 manuals for grouping cancers. NYSPACED uses the SEER site recode definitions based on ICD-O-3, which group cancers by the primary (anatomic) site of the tumor and ICD-O-3 histology.
- Histology - Histology is coded in ICD-O-3. (Fritz A et al. International Classification of Diseases for Oncology, Third Edition. World Health Organization, Geneva, 2000). Cases diagnosed prior to 2001 (i.e., before ICD-O-3 became the standard) that were originally reported in ICD-O-2 have been forward-converted to ICD-O-3 for NYSPACED.
- Behavior - Behavior is coded using ICD-O-3. Cases diagnosed prior to 2001 that were originally reported in ICD-O-2 have been forward-converted to ICD-O-3 for NYSPACED.
In SEER*Stat, the "Select Only Malignant Behavior" check box is linked to the behavior variable and indicates malignancy using ICD-O-3. Please note that in situ bladder tumors have been recoded to malignant for NYSPACED, consistent with standard published bladder cancer tables.
- Behavior recode for analysis - ICD-O-2 and ICD-O-3 differ as to which tumors are considered malignant. These differences affect the cases that are reported to the New York State Cancer Registry, and therefore should be taken into account in the analysis. There are two main categories of change: tumors that were considered malignant in ICD-O-2 and are now considered borderline behavior in ICD-O-3, and tumors that were considered borderline behavior in ICD-O-2 and are now considered malignant in ICD-O-3. Tumors that are considered malignant based on ICD-O-2 but not based on ICD-O-3 are included in the file only for diagnosis years prior to 2001. Conditions that are considered borderline behavior in ICD-O-2 but malignant in ICD-O-3 are included in the file only for 2001 and later. All other borderline or benign tumors reportable to the New York State Cancer Registry are excluded from the NYSPACED datasets. For convenience of analysis, we have included a variable called behavior recode for analysis, which the SEER Program created to take into account the change in reportability caused by the switch from ICD-O-2 to ICD-O-3. This variable may be useful to researchers analyzing time trends that span diagnosis years before and after this classification change. The behavior recode for analysis is defined as follows:
- 2 - in situ in both ICD-O-2 and ICD-O-3
- 3 - Malignant in both ICD-O-2 and ICD-O-3 (incl. bladder in situ)
- 4 - Malignant only in ICD-O-3
- 5 - No longer reportable in ICD-O-3
- Laterality - Laterality is used to code for the side of a paired organ or the side of the body on which the cancer originated.
- Diagnostic confirmation - Diagnostic confirmation is grouped into microscopically confirmed, not microscopically confirmed, or unknown.
- Summary stage (named as "loc/reg/distant stage" in NYSPACED) - Summary stage is grouped into in situ, local, regional, distant, or unknown. "Early stage" is sometimes used to refer to cases diagnosed at in situ or local stage, as opposed to those with regional or distant spread. Summary stage in NYSPACED is derived using the SEER Summary Stage 1977 for cases diagnosed prior to 2001, the SEER Summary Stage 2000 for cases diagnosed from 2001 through 2003, and the derived SEER Summary Stage 2000 (from collaborative stage) for cases diagnosed in 2004 and later.
- Multiple tumor indicator - This variable is derived based on the number of reportable tumors diagnosed for an individual. It has two values: only one reported tumor and multiple reported tumors.
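Several of the groupings described above are simple rules, and writing them out can help when matching NYSPACED categories to data from other sources. The sketch below restates three of them: the 19 age groups, the two regions, and the staging scheme by diagnosis year. The function names and the label strings are hypothetical illustrations. They are not the labels SEER*Stat displays, and this is not the code used to build the file.

```python
# The five New York City counties named in the region description above.
NYC_COUNTIES = {"Bronx", "Kings", "New York", "Queens", "Richmond"}


def age_group(age):
    """Map age in completed years to one of the 19 age groups."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 1:
        return "<1"
    if age <= 4:
        return "1-4"
    if age >= 85:
        return "85+"
    lower = (age // 5) * 5  # start of the five-year band, e.g. 37 -> 35
    return f"{lower}-{lower + 4}"


def region(county):
    """Collapse a New York State county name into one of the two regions.

    Assumes the input is already a New York State county, since
    non-residents are not included in the file.
    """
    if county in NYC_COUNTIES:
        return "New York City"
    return "New York State excluding New York City"


def staging_scheme(year_of_diagnosis):
    """Name the summary stage source used for a given diagnosis year."""
    if year_of_diagnosis < 2001:
        return "SEER Summary Stage 1977"
    if year_of_diagnosis <= 2003:
        return "SEER Summary Stage 2000"
    return "Derived SEER Summary Stage 2000 (from collaborative stage)"


# Confirm that the age rule yields exactly 19 distinct groups.
print(len({age_group(a) for a in range(120)}))
```

Because the staging source changes in 2001 and again in 2004, a helper like `staging_scheme` can be used to flag which diagnosis years share a common source before comparing stage distributions over time.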
Cancer Incidence - Public Use Data from 1996 to 2009, New York State Cancer Registry, New York State Department of Health, data as of November 2011.