Preparing Data for Importing into a Geographic Information System Program

Step 1 – Place Data into a Spreadsheet

When entering data into a spreadsheet, make sure that at least one column of data (or “field”) contains location information such as an address, city, state, zip code, county, or a standard geographic ID (such as the U.S. Census FIPS code (see http://quickfacts.census.gov/qfd/meta/long_fips.htm). GIS programs search for a field that contains location information and use this information to map the data.

Example - If one field contains the name of a county and another field contains the average income of residents in each county, the mapping software will first locate the column of county names (the location information) and then map the average income of residents in each county according to the county listed in the associated row.
 

County Avg_Income
Alto 56000
Bolo 43000
Colo 50000

Step 2 – Create Discrete Fields for Each Category of Information

Organize your data so that each column is assigned a discrete category of information and can be associated with the same location information that will be used for mapping. This information can either be recorded as text or numbers. Different data types can be included on the spreadsheet, but DO NOT mix data types within a column. For example, do not put ND or “no data” in a column of numbers. Use a unique numeric code instead, such as 99999, to indicate missing data (do not use “0” as that indicates a potentially “real” value). Use a unique name for each field and try to keep the field headings at 13 characters or less with no spaces (use an underscore). This will prevent the GIS program from truncating your field headings into something unrecognizable.

Example - We have four data sets by county: total population, total numbers of renters, median income and primary political affiliation. We want all four categories of data imported into the GIS program so that each could be mapped separately by county. To prepare the data for importing into a GIS, we would want to make sure that the values for each of these data categories were placed into their own separate column. Also, we would want to make sure that there was a column for the associated location information, in this case counties.

County Total_Pop Total_Renters Median_Income Prime_Politic
Alto 680000 150000 56000 Democratic
Bolo 303000 78000 43000 Democratic
Colo 800000 123000 50000 Republican

IMPORTANT!

  • Do not place more than one category of information or data type into the same column.
     
  • Fill in all the cells in the spreadsheet with some value or text. When importing a spreadsheet into a GIS, many programs will automatically fill in empty cells in a column of numbers with a “0”. This can be problematic when trying to map the data, especially if the actual data set includes “0” as a real value.

Step 3 – Organizing Addresses for Mapping

The process of geocoding uses spreadsheets of addresses to locate and display points on a map. To prepare a spreadsheet for geocoding, the full address must be partitioned into separate columns. Street Address, Zip/Postal Code, and City are necessary components for accurate geocoding; County, State or Country are optional. The order of the columns is not important, but make sure to provide as complete an address as possible (including designations such as Way, Street or Avenue). The more complete the address, the more accurate the geocoding and mapping process will be. NOTE: Addresses using PO Boxes CANNOT be geocoded.

Example:

Street Zip City County State
22 Clearlake Way 98651 Jamestown Alto OR
4560 Bird Street 97143 Freeport Belo CA
302 Common Avenue 99621 Anchorville Celo WA

Example:

Correct Full Address: 333 Maple Street, Fruitvale, Oregon, 97000

Missing Address Number: Maple Street, Fruitvale, Oregon, 97000 --- The GIS will not be able to locate the proper point on Maple Street and will return an error message.

Missing City: 333 Maple Street, Oregon, 97000 --- There may be multiple “333 Maple Street” locations in Oregon and this may cause the GIS program to map the wrong location (it will geocode the first 333 Maple Street it finds in the spreadsheet, which may not be in the correct city).

Other Potential Geocoding Errors: An error will occur if the street does not exist or has been expanded to include a range of addresses not in the software’s internal address database. This is most common for areas experiencing rapid growth.

Step 4 – Entered Data Must be Accurate and Standardized

Spelling, capitalization, and standardized abbreviations are critical to ensuring that the GIS software recognizes the location information. In order to map the data, the mapping software will search through the spreadsheet for a column with location data and will attempt to match the location data with its own location data. Familiarization with the conventions used to record the data contained in the software’s internal databases is helpful in preparing your own spreadsheets for importing.

For example, let’s imagine that your spreadsheet has two columns: county and average income of residents. Ultimately, you want to map the average income of residents by each county listed. The mapping software will first locate the column of county names (the location information) and then try to match the names in the spreadsheet with the county names in its database. Once a match occurs, the software will attach the associated data fields from the spreadsheet and will be able to display the information on a map. However, if the software fails to match the spreadsheet’s county name with an EXACT DUPLICATE county name in its database, then the software will register a failed match and will be unable to map that county data.

Matching issues most likely occur if the location name is misspelled. They may also occur if the capitalizations or abbreviations are different than those used in the software’s database. Therefore, to ensure that all locational data are properly matched and mapped, make sure that the data are spelled correctly and consistently entered using the same conventions as the software’s database.

Example:

Wrong-123 Jmaes Rd.            Correct-123 James Road

Wrong-333 hopper st.            Correct-333 Hopper Street

Wrong-345 THIS WAY ave      Correct-345 This Way Avenue

Step 5 – Joining Tables and Geocoding

Once the data have been properly prepared in a spreadsheet, the next step is to use the tools within the GIS software to join tables or geocode addresses in preparation for mapping. This is generally called georeferencing and serves to import the spreadsheet data into the mapping software. If you are interested in learning how to use georeferencing tools, here are some course options and information depending on the mapping software that you are using:

ArcGIS (Joining Tables): http://resources.arcgis.com/en/help/main/10.1/index.html#//005s0000002n000000

ArcGIS (Geocoding): http://training.esri.com/gateway/index.cfm?fa=catalog.webCourseDetail&courseid=2514

Google Earth (Importing Addresses): https://support.google.com/earth/answer/176685?hl=en

For the more advanced user: Joining Census Data Tables to Shapefiles in ArcMap
http://spatial.scholarslab.org/stepbystep/joining-census-data-tables-to-shapefiles-in-arcmap/