Integrative Use of Census and GIS

Vemo Andries Ferreira
B.Sc. IT (GIS), B.Sc.Hons (Geography)
(2010073746)

Research report presented in partial fulfilment of the requirements for a Master's degree (in Town and Regional Planning) at the University of the Free State

Supervisor: M. Campbell
Nov 2015

DEPARTMENT OF TOWN AND REGIONAL PLANNING
UNIVERSITY OF THE FREE STATE

DECLARATION

By submitting this thesis electronically, I declare that the entirety of the work contained therein is my own, original work, that I am the owner of the copyright thereof (unless to the extent explicitly otherwise stated) and that I have not previously in its entirety or in part submitted it for obtaining any qualification.

November 2015

Copyright © University of the Free State
All rights reserved

ABSTRACT

Census is an ancient phenomenon, and Geographic Information Systems (GIS) is a modern-day marvel. What both have in common is their direct relationship to geography. Despite the wealth of information available in the census, GIS is seldom used to unearth it. This research essay opens with a review of census and GIS as two components for integration. To assess integrative use between census and GIS for decision making, a custom framework called CENGIS (derived from census and GIS) was developed. It assesses integrative use through key aspects such as tabulation, representation, aggregation and disaggregation. Each integrative aspect is then evaluated according to frequency of use and overall usability, from which the degree of integrative use is determined. The study concludes with a synthesis of its key findings as well as proposals for future research.
Keywords: census, geographic information systems, integrative use, tabulation, census cartography, decision making

ACKNOWLEDGEMENTS

Supervisor: To my supervisor for her innovative contribution toward the completion of this research project.

Family: To my family for their patience and support in this venture.

Employer: To my current boss (Stephanus Minnie) for allowing me sufficient time toward completing this project.

All: To all who participated indirectly in the completion of this project.

TABLE OF CONTENTS

DECLARATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLES
FIGURES
ACRONYMS AND ABBREVIATIONS
CHAPTER 1: INTRODUCTION AND RESEARCH QUESTION
  1.1. OVERVIEW AND BACKGROUND
  1.2. RESEARCH FOCUS
    1.2.1. Problem, Question and Aim
    1.2.2. Research Context
    1.2.3. Research Methodology
  1.3. CHAPTER OUTLINE
CHAPTER 2: UNPACKING CENSUS AND GIS INTEGRATION
  2.1. INTRODUCTION
  2.2. CENSUS AND GIS OVERVIEW
  2.3. CENSUS AND GIS INTEGRATION
    2.3.1. General aspects on integration
    2.4.1. Elementary aspects on integration
    2.4.2. Intermediate aspects on integration
    2.4.3. Advance aspects on integration
  2.4. CONCLUSION
CHAPTER 3: FORMULATING THE CENGIS FRAMEWORK
  3.1. INTRODUCTION
  3.2. CENGIS FRAMEWORK
    3.2.1. Extraction Standard
    3.2.2. Extraction Customised
    3.2.3. Map Production
    3.2.4. Representation
    3.2.5. Spatial Aggregation
    3.2.6. Modifiable Area Unit Problem
    3.2.7. Time Series
    3.2.8. Zonal Statistics
    3.2.9. Service Areas
    3.2.10. Disaggregation
  3.3. CONCLUSION
CHAPTER 4: FINDINGS AND DISCUSSIONS ON CENGIS FRAMEWORK
  4.1. INTRODUCTION
  4.2. CENGIS FRAMEWORK
    4.2.1. Extraction Standard
    4.2.2. Extraction Customized
    4.2.3. Map Production
    4.2.4. Representation
    4.2.5. Spatial Aggregation
    4.2.6. Modifiable Area Unit Problem
    4.2.7. Time Series
    4.2.8. Zonal Statistics
    4.2.9. Service Areas
    4.2.10. Disaggregation
  4.3. CONCLUSION
CHAPTER 5: CONCLUSION AND FUTURE RESEARCH
  5.1. REVISITING THE AIMS AND OBJECTIVES
  5.2. SUMMARY OF FINDINGS
    5.2.1. In Relation to the census and GIS integration
  5.3. CONCLUSION
    5.3.1. Future Research
BIBLIOGRAPHY
APPENDIX A

TABLES

Table 1 Frequency of using census data in GIS
Table 2 Frequency of using standard census data in GIS
Table 3 Usefulness of standard census data in GIS for decision making
Table 4 Frequency of using custom queries from census
Table 5 Usefulness of custom queries from census for decision making
Table 6 Frequency of using data driven pages with census data
Table 7 Usefulness of data driven pages using census data for decision making
Table 8 Frequency of using different representations of the same census data
Table 9 Usefulness of different representations of the same census data
Table 10 Frequency of using multiple spatial levels from census
Table 11 Usefulness of multiple spatial levels in census for decision making
Table 12 Frequency of using census boundary data
Table 13 Usefulness of maps containing the MAUP for decision making
Table 14 Frequency of using older census data for comparison
Table 15 Usefulness of older census data for comparison in decision making
Table 16 Frequency of using zonal statistics with census
Table 17 Usefulness of zonal statistics on census for decision making
Table 18 Frequency of using network analysis with census data
Table 19 Usefulness of network analysis with census data for decision making
Table 20 Frequency of using disaggregated census data
Table 21 Usefulness of disaggregated census data for decision making
Table 22 CENGIS matrix on integrative use between census and GIS

FIGURES

Figure 1 Census data in map form
Figure 2 Graph depicting population size of Bloemfontein city (Census, 2011)
Figure 3 Process flow for the research project
Figure 4 Population size per suburb (left) with the same data in graph format (right)
Figure 5 Custom tabulation for affluent households in Bloemfontein (left) and Gr 12 earning more than R25 000 per month (right)
Figure 6 Dynamic ward map generation using census boundaries and stats
Figure 7 Difference between three classes and ten classes (top), same data with symbols, or in 3D
Figure 8 Municipal level (top left), town level (top right), suburb level (bottom left) and small areas (bottom right)
Figure 9 Original boundaries (left), modifying the boundaries cause values to change (right)
Figure 10 Population count in 2001 (left), population count 2011 (right)
Figure 11 Custom population count of 40 139 within the red boundary (left), custom population count of 93 389 within the red boundary (right) using zonal statistics
Figure 12 Population estimates from census data using network analysis
Figure 13 Population count using disaggregated census data using surveyed parcels
Figure 14 Use of census data by sector
Figure 15 Participation by sector
Figure 16 Standard use of census data
Figure 17 Use of census data in GIS
Figure 18 Custom tabulation process from census data
Figure 19 Use of custom queries with census data in GIS
Figure 20 Dynamic map production using data driven pages with attributes
Figure 21 Use of data driven pages with census
Figure 22 The influence of classification, interval count and symbol size on representation
Figure 23 Using different representations of the same census data
Figure 24 Spatial aggregation of census data using dwelling frame points
Figure 25 Using multiple spatial levels of census in GIS
Figure 26 Modifiable area unit problem illustrated through boundary change
Figure 27 Awareness of the MAUP
Figure 28 Time series of different census datasets with usage constraints
Figure 29 Use of older census data for comparison
Figure 30 Zonal statistics conversion from vector to raster format
Figure 31 Use of zonal statistics with census data
Figure 32 Service area generation using network analysis
Figure 33 Use of network analysis with census
Figure 34 Disaggregation of census data for micro analysis
Figure 35 Use of disaggregated census data

ACRONYMS AND ABBREVIATIONS

CENGIS - Census and GIS
CGIS - Canada Geographic Information System
DUF - Dwelling Unit Frame
EAs - Enumerated Areas
GIS - Geographic Information System
GIT - Geographic Information Technology
IDP - Integrated Development Plan
MAUP - Modifiable Area Unit Problem
NDP - National Development Plan
SAL - Small Area Layer
SDF - Spatial Development Framework
SPOT 5 - Satellite Pour l'Observation de la Terre No. 5
StatsSA - Statistics South Africa

CHAPTER 1: INTRODUCTION AND RESEARCH QUESTION

"The first census in 1790 asked just six questions: the name of the head of the household, the number of free white males older than 16, the number of free white males younger than 16, the number of free white females, the number of other free persons, and the number of slaves." Tom G. Palmer.

Census has come a long way since the first official count, conducted under the supervision of King Servius Tullius of Rome, which yielded a mere 87 000 people in total (Tenney, 1930). The word itself is derived from Latin and refers to keeping track of adult males fit for military service. A famous historical account of a census, ordered by Caesar Augustus, is documented in Luke chapter two. Census formed part of the foundation stone of the ancient Roman civilisation. It dynamically transformed the military and political outlook of the empire, which esteemed itself not merely a barbarian horde but a populace capable of collective action.
Over the years, censuses have progressed from mere population counts to a highly sophisticated enumeration of household profiles, service provision, income, expenditure, education etc. Nowadays, census information is even questioned for putting people's personal safety at risk when disclosing private information for 'governmental planning'. In spite of the questionable breach in personal safety, census nevertheless remains an intrinsic part of administrative action. The fact that census has a strong relationship with geography greatly enhances its usefulness in solving spatial disparities. According to the current statistician-general of South Africa, Pali Lehohla: "there is nothing as powerful as small area information, a statistical representation of the area in a map, for that talk in much better understanding ..."

1.1. OVERVIEW AND BACKGROUND

Integrative use between census and Geographic Information Systems (GIS) has evolved from 1996 to the present. This evolution can be credited to the technological revolution that gathered pace in the 1990s. Census, on the one hand, is an ancient phenomenon, with the first official count recorded early 500 B.C. at the dawn of the Roman Empire. GIS, on the other hand, is considered a modern-day marvel, created by Roger Tomlinson in the 1960s. The need to map, manage and analyse large areas of terrain gave rise to the first functional GIS (Burrough, 2001, p. 361). The first official GIS was called the Canada Geographic Information System (CGIS). It was capable of digitally representing old cartographic maps and allowed users to seamlessly connect multiple maps into one great mosaic from which they could query information. Over the next 50 years this idea evolved into a fully-fledged GIS used in almost every conceivable discipline that uses spatial data in its analysis (McGrath & Sebert, 1999). In addition to GIS, census data has become the norm in strategic planning.
Graphs, charts and tables derived from censuses feature frequently in municipal, provincial and national planning instruments, such as the Integrated Development Plan (IDP), Spatial Development Frameworks (SDFs), National Development Plan (NDP), sector plans, precinct plans etc. Apart from the government, the private sector also utilises census data in planning-related activities. Academic institutions in particular are notable users of census data, especially for research papers, where statistical profiles are derived from census data. Census data is very useful for any large-scale planning initiative, regardless of sector, discipline or institution.

Despite the statistical use of census information, the geographic component is seldom explored or utilised (Kakembo & van Niekerk, 2014, p. 451). Despite the apparent value residing within census and GIS integration, active utilisation thereof has been very limited in reports produced by public or private planning institutions. GIS, also known as digital cartography, offers census data users great value for planning facilitation. Census seamlessly integrates into the core functionalities of GIS for storage, retrieval, analysis and display of spatial data (Burrough, 2001, p. 363). GIS converts census data into digital cartography capable of displaying vast amounts of data in a condensed way.

[Figure 1 map: Population Size, Bloemfontein City. Legend classes: 0 - 2 000; 2 001 - 5 000; 5 001 - 10 000; 10 001 - 20 000; 20 001 - 48 676. Scale bar in km.]

Figure 1 Census data in map form

Interpreting census data in map format 'paints' a more informative picture, showing spatial relationships and areas of interest that are otherwise difficult to see in conventional graphs.
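To make the idea behind Figure 1 concrete, the following is a minimal sketch of how a population count could be assigned to one of the five legend classes shown on the map. The class boundaries are taken from the Figure 1 legend; the function name, the area names and the sample counts are hypothetical and serve only as illustration. A GIS package would normally perform this classification internally when a graduated-colour map is drawn.

```python
from bisect import bisect_left

# Upper bounds of the five legend classes shown in Figure 1.
CLASS_UPPER_BOUNDS = [2000, 5000, 10000, 20000, 48676]
CLASS_LABELS = [
    "0 - 2 000",
    "2 001 - 5 000",
    "5 001 - 10 000",
    "10 001 - 20 000",
    "20 001 - 48 676",
]


def classify_population(count):
    """Return the Figure 1 legend class that a population count falls into."""
    if count < 0 or count > CLASS_UPPER_BOUNDS[-1]:
        raise ValueError("count lies outside the legend range")
    # bisect_left finds the first class whose upper bound is >= count,
    # so a value equal to an upper bound stays in the lower class.
    return CLASS_LABELS[bisect_left(CLASS_UPPER_BOUNDS, count)]


# Hypothetical areas and counts, for illustration only.
sample_counts = {"Area A": 1500, "Area B": 7600, "Area C": 34000}
for area, count in sample_counts.items():
    print(area, "->", classify_population(count))
```

The number of classes and where their boundaries are placed strongly influence how the resulting map is read, which is why representation is treated as a separate integrative aspect in the CENGIS framework.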
[Figure 2 graph: bar chart of the series "Population", with one bar per suburb of Bloemfontein.]

Figure 2 Graph depicting population size of Bloemfontein city (Census, 2011)

Although the same information is encapsulated in two different formats, the map version speaks more vividly to the user's understanding than the graph itself.

1.2. RESEARCH FOCUS

This study is intended to evaluate integrative use between census and GIS. The geographic correlation between the two has been largely omitted in planning facilitation. However, despite the wealth of information hidden within census data, extracting the "gold" requires GIS. Having identified this apparent underutilisation of census data in GIS, this project intends to clarify some of the misconceptions and to emphasise the notable benefits that decision makers derive from census in GIS. By taking these two standalone components, census and GIS, this project serves as a critical evaluation of the two by addressing several key aspects of integration.

1.2.1. Problem, Question and Aim

The underutilisation of census data in planning support is evident from the sporadic use of census in GIS. Despite the wealth of information made available to the public free of charge, optimal utilisation of these resources has been widely neglected. Having identified the apparent gap between census and GIS, this project will serve as a critical evaluation of the integrative use between census and GIS for planning support. To answer this question comprehensively, several aspects of integration need to be evaluated. A custom framework called CENGIS (derived from census and GIS) needs to be developed for assessment purposes. The overarching aim of this research project is to evaluate integrative use between census and GIS for planning support in light of the CENGIS framework. This would clarify the misconceptions between the two and highlight the powerful relationship.

1.2.2. Research Context

In terms of study area, the majority of examples used in the CENGIS framework are purposely confined to the author's home town, Bloemfontein (see Figures 1 and 2). Deploying the CENGIS framework in a place people can relate to will improve the overall correctness of answers. However, this study is not intended to be geographically bound, but rather a general assessment of integrative use between census and GIS. The CENGIS framework serves as a generic evaluator that can be used in conjunction with different examples, depending on the audience. Another constraint introduced through the CENGIS framework is that it only addresses some of the more predominant integrative principles without looking extensively at minor ones.

To evaluate integrative use, one needs to define systematically what is measured and how. All aspects included in the CENGIS framework are derived from the review in chapter 2. Myriad other aspects could serve as integrative indicators between census and GIS; only the most predominant ones are included.
What is important to underline is that the CENGIS framework is designed as a general guideline to evaluate integrative use between census and GIS and can be extended for future research.

1.2.3. Research Methodology

Step 1: Literature Review
• National and International Literature
• Census and GIS integration
• Narrowing the sphere of interest, with applicable categories

Step 2: Research Problem
• Articulate aims and objectives for the study
• Feasibility study of methodology and possible constraints

Step 3: Develop CENGIS framework
• Quantify the number of aspects addressed
• Identify suitable examples to illustrate concepts
• Feedback from users to improve the framework

Step 4: Deploy the CENGIS framework
• Deploy the CENGIS framework
• Collect and verify results
• Interpret results and discuss important findings

Step 5: Interpretation and Synthesis
• Summary of findings
• Future research

Figure 3 Process flow for the research project

1.3. CHAPTER OUTLINE

Chapter one gave a brief introduction to census and GIS by outlining the apparent gap with regard to integrative use. Furthermore, it introduced the scope of the study, which is to evaluate integrative use between census and GIS through a custom framework. The following chapters will unfold the research aim systematically. Chapter 2 is composed of a literature review on census and GIS, focusing particularly on integrative aspects between the two for decision support. Chapter 3 introduces the CENGIS framework, which is derived from the aspects reviewed in chapter two. Each aspect of integrative use is explained to the reader. Chapter 4 takes the results collected from the CENGIS framework and discusses each aspect of integrative use in light of those results. Chapter 5 concludes the research with a brief review of the project's aims as mentioned in chapter one and gives a broad summary of the key findings on integration as derived from the CENGIS framework.
The project ends with a general summary and recommendations for future research.

CHAPTER 2: UNPACKING CENSUS AND GIS INTEGRATION

2.1. INTRODUCTION

This chapter serves as a twofold review. Firstly, it gives an overview of census from a historical perspective within the South African context, as it happened in 1996, 2001 and 2011. In addition to being decennially conducted, census remains one of the most ideal sources of information for planning support. Secondly, it considers integration between census and GIS, which is made possible through their strong relation to geography. The GIS revolution has greatly enhanced the use of census when hand-drawn areal units in 1996 were converted for the first time into their digital counterparts for GIS analysis. A review of key areas of integration between census and GIS is covered in this chapter, focusing on aspects of tabulation, dynamic map making, representation, time series, network analysis and spatial aggregation, to mention just a few.

2.2. CENSUS AND GIS OVERVIEW

Census activities in South Africa are conducted under regulation of the Statistics Act No. 6 of 1999. This act ensures that census activities are independent of political interference. It gives the Statistician-General the right to collect information deemed necessary for the production and dissemination of official statistics. Section 17 of the act stipulates that Statistics South Africa (StatsSA) will not disclose any information related to an individual, household, business, or any other organisation, to protect the confidentiality of all participants. Information is aggregated to minimise the risk of disclosing anyone's identity (Government, 1999, pp. 20-21). The purpose of official statistics articulated in the Statistics Act is to assist organs of state, businesses and other public or private organisations in planning, decision-making and monitoring of governmental policy (Government, 1999, p. 6).
South Africa conducted fragmented population counts dating as far back as the 18th century. After apartheid, South Africa conducted censuses in 1996, 2001 and 2011 respectively. Census information is intended to evaluate the performance of governmental programmes and policies (StatsSAa, 2011, p. 5). Taking 2011 as an example, planning started as early as 2003, with pilot studies conducted in 2008 and 2009. The country was subdivided into enumerated areas (EAs), which are roughly composed of 150 households each. Nationally there were 103 567 EAs and 160 000 staff members for census 2011. An estimated 15 million questionnaires were distributed and processed through scanning to extract information. Post-enumeration studies were conducted to minimise extravagant inconsistencies (StatsSAd, 2011, pp. 1-2).

A census is intended to sample 100% of the population, whereas surveys only sample a portion thereof. One could readily infer that a 100% count is far better than only a proportional sample. This sets census apart in terms of accuracy, reliability and usability (Peters & MacDonald, 2004, p. 3) for planning, decision-making, monitoring and assessing of policies (Government, 1999, p. 4). However, despite the aspired 100% claim, a 10% undercount is allowed (StatsSAc, 2011, p. 12), which can be adjusted by means of a nationwide post-enumeration survey (StatsSA, 1996, p. 99). Undercount figures can differ significantly depending on gender, age and geographic location (StatsSAb, 2011, p. 13). Despite the undercount, census information remains the most comprehensive baseline for planning in the country (Peters & MacDonald, 2004, p. 4).

Active introduction of GIS in South Africa came in the mid-1980s (Jobson et al., 1986, p. 59), mostly spearheaded by the Stellenbosch Department of Geography, which has remained the forerunner since 1975 with its expertise in geographical information technology (GIT), especially in the areas of cartography, GIS and satellite remote sensing.
During the 1990s, owing to the technological revolution, Stellenbosch introduced its own independent GIS laboratory, nowadays known as the Centre of Geographical Analysis. H.L. Zietsman is in his own right the founding father of GIS in South Africa, having pioneered the work in the early 1970s (Liederman, 2015). Since the 1980s trained staff, geographical datasets, private companies and software developers have steadily emerged to make GIS a means for development in South Africa (MacDevette, 1993, pp. 18-19). Datasets in South Africa cover demographics, education, soil, climate, electrification and infrastructure, amongst others. Since the 1990s GIS has been part of mainstream government planning in relation to census, water management, agriculture, environmental management, health care and forestry. In the private sector GIS is used extensively in franchise siting, logistics operations and mining (MacDevette et al., 1999, p. 914).

2.3. CENSUS AND GIS INTEGRATION

2.3.1. General aspects on integration

Over the past few decades GIS has become the most powerful contributor to spatial planning, and it is revolutionising all planning-related activities (Chapin, 2003, p. 1). Since the 1990s GIS has become more widely adopted (Felke, 2014, p. 1), featuring in numerous journals (WeiWei & WeiDong, 2015, p. 1). Utilisation of this technology has nevertheless been limited, partly because GIS was only introduced into educational curricula from 2003 (Felke, 2014, p. 1). Another reason is GIS's quantitative orientation, which is not suitable for qualitative research. This trend is slowly changing as GIS progressively moulds into a more versatile technology (WeiWei & WeiDong, 2015, p. 1). The rapid expansion of GIS into the socio-economic sciences is proof of this. Furthermore, GIS enables evidence-based decisions in critical areas for intervention such as poverty reduction (HSRC, 2011, p. 1).
Since the 1990s governmental adoption of GIS has grown steadily, with more and more municipalities including data analysis in their core planning workflows (WeiWei & WeiDong, 2015, p. 1). As part of its development plan, the United Nations helped 40 of the poorest countries in the world to gain access to GIS technology for strategic planning. Globally, census has been one of the main areas to benefit from the adoption of GIS technology (HSRC, 2011, p. 9). Census mapping systems were already utilised by Japan in 1991 and Israel in 1995, which enabled census data to be georeferenced even to the level of a dwelling. Another program launched in 1997 for Africa, the GeoSpace program, established National Statistical Offices (NSOs) in 15 countries to provide census mapping solutions (HSRC, 2011, p. 10).

GIS mapping in South Africa is a relatively new introduction, especially in relation to census. For example, prior to 1996 EAs were hand-drawn; it was only in 2001 that the EAs were captured digitally. This transition formed a strong basis for data capturing that could be referenced and queried geographically. The 2011 census excelled at using GIS in the census workflows throughout the planning and pre-enumeration phases. French satellite imagery from 2008, Satellite Pour l'Observation de la Terre (SPOT 5), was used as a reference to draw the EA boundaries for the 2011 census. In addition to the digital EAs, SPOT 5 imagery facilitated the capturing of the Dwelling Frame Units (DFUs) dataset for the entire country (HSRC, 2011, p. 12). The importance of the geographic frame for census has been elevated extensively since 1996 (Lehohla, 2005, p. 4). The application power of representing census attributes statistically across space cannot be overemphasised. Decision makers need to know where to focus in terms of investment and development (Lehohla, 2005, p. 3).
The term statistical geography has become popular since 2001, with different geographic layers made available to the public. Introduction of the Small Areas Layer (SAL) in 2011 improved the accuracy of spatial analysis substantially. The next revolution would be to move from EAs to Dwelling Frame Units (DFUs), which capture statistics on micro level and so produce more reliable statistics (Lehohla, 2005, p. 4).

2.4.1. Elementary aspects on integration

Census data is collected on the basis of individual households. StatsSA ensures the confidentiality of participants by taking appropriate steps so that tabulated data will not reveal the identity of individual participants. To ensure confidentiality, census data is aggregated over a particular geographical area and averaged (Peters & MacDonald, 2004, p. 22). It starts at dwelling frame points and aggregates into enumerated areas, small areas, suburbs, towns, municipalities, districts and ultimately provinces (StatsSAd, 2011, p. 5). Software such as SuperCROSS allows diverse tabulation methodologies in which users can recode selected values into subsets with new labels. Several calculations can be performed on tabulated data, such as column and row totals, percentiles, pareto, variance, asymmetry and skewness (SuperCROSS, 2012, p. 19).

Other than tabulation, census data needs to be displayed through a GIS. From the 1980s GIS focused primarily on two key issues, one of which was automated map making (Burrough, 2001, p. 361). Census lends itself to extensive cartographic output. England was one of the first countries to use automated map production with census data in 1981, effectively transforming nearly 3.2 million numbers into choropleth shaded maps of 580 statistical areas (Browne & Fielding, 1987, p. 82). One of the main concerns for statistical representation in map form is that regions shaded as homogeneous cannot adequately reflect the extreme heterogeneity of variables observed on the ground.
To keep the sample population size relatively even, delineated areas grow bigger in less populated areas and smaller in urban areas (Browne & Fielding, 1987, p. 83). Scale is another factor that influences representation. The graphic conflict that scale induces in the cartographic representation of census data can be addressed through generalisation (Ware et al., 2003, p. 296). Manual map generalisation is intrinsically still a cartographer's work. The reliability of automated census mapping is still questioned, with ongoing research (Steiniger, 2007, p. i) aimed at identifying rules that can be translated into generalisation processes and algorithms to deal with each map representation scenario (Steiniger, 2007, p. 6).

During map making the most time-consuming task is annotation. Labelling geographic entities takes time, and automation seldom does justice to the representation of the data (Freeman, 2005, p. 287). The usability of maps depends on clearly annotated features; although the task seems simple, it is in fact complicated. Labelling should clearly articulate the spatial relationships (Freeman, 2005, p. 289). Labelling area features in census requires consideration of the shape and extent of each feature. Labels should ideally fit inside the area for recognition. To automate map production, text-placement strategies that adhere to map standards need to be implemented (Freeman, 2005).

Census data needs to be displayed visually in a manner that is cartographically acceptable (Burrough, 2001, p. 363). When large amounts of data are displayed graphically, spatial patterns and relationships should be clearly articulated (Koua & Kraak, 2004, p. 1).
Statistical representation of data, such as census, is a powerful analytical tool for decision makers. Besides textual and numerical analyses, governmental policy and planning rely extensively on the visualisation of census data. The usefulness of different cartographic depictions of the data needs to be evaluated and adjusted based on the intended use (Manan & Hashim, 2010, p. 367). Change detection is often clearly visible through spatial representation. Visualisation can be done in numerous ways, one of which is the use of different colour tones (Manan & Hashim, 2010, p. 373). GIS is the most reliable medium with which to visualise census data because of its ability to link aspatial data (census data) directly to spatial data (census boundaries) (Manan & Hashim, 2010, p. 376).

According to Monmonier (1991), all geographic representation contains some form of "lie". For example, entities are represented by symbols that are always larger than their real-world footprint. The mere fact that a spherical globe needs to be portrayed on a two-dimensional surface (i.e. a map) introduces distortion, so that a map will always present a selective and incomplete view of reality. However, the degree of misrepresentation varies from negligible to seriously wrong. A cartographer's skill is essentially to know where to "draw the line" in terms of the information they want to convey (Monmonier, 1991, p. 1). The mere fact that infinite variations of the same map can be produced from the same data should make users aware that cartographic representations are biased, not to mention the political influence exerted on public opinion through maps by suppressing contradictory information and using dramatic symbolism (Monmonier, 1991, p. 86). Knowing the three factors of representation (classification, generalisation and symbolisation) is of critical importance. Classification can produce an infinite number of varieties; it is inherently a creative process and nothing else.
There is no clear and absolute method for the classification of data (Dodge et al., 2011). Classification introduces order and coherence into the data. Both the purpose and the method used need to be evaluated for their constraints, and choosing a method appropriate to the intended purpose is of utmost importance. Apart from the classification scheme (equal, defined, exponential, manual, quantile, natural or standard deviation), the number of intervals is equally important. Too many intervals may limit the distinguishability of the data. Symbols placed over a choropleth surface, varying in size with the chosen attribute, give a good illustration of data variance; however, if extreme values exist, smaller symbols may be "swallowed" by bigger ones. Symbols can also be difficult to interpret if the audience is not skilled in cartographic representation (Chainey & Ratcliffe, 2013). Another useful means of representation is dot density, in which the number of dots inside a census unit (polygon) represents the value within the boundary. Colour variation is seldom necessary for dot density maps of population estimates (Elangovan, 2006, p. 108).

2.4.2. Intermediate aspects on integration

Census data is collected at individual household level but made available in aggregated format (HSRC, 2011, p. 16), which summarises the samples and averages each across the enumerated area (Peters & MacDonald, 2004, p. 21). To ensure confidentiality, individual entities need to be aggregated before dissemination (Reidl et al., 2006, p. 900). Aggregation does not portray social activity accurately, and aggregated data should only be used as a very indirect indicator of behaviour. Detail is further obscured when normalisation is applied. Census representations say more about the shape and size of the enumerated area than about the people actually living and working in it (Reidl et al., 2006, p. 906).
Depending on the level of spatial aggregation, disparities can be hidden. For example, population growth may be seen at a higher level of aggregation while the underlying lower level shows numerous areas of population decrease (Paez & Scott, 2004, p. 58). Aggregation bias can be adjusted by means of a matrix transform, such as correlation and regression analysis (Paez & Scott, 2004, p. 59). What spatial aggregation inevitably causes is a disregard for the heterogeneity of the underlying samples. Because census geographies are made of spatial units, different areal units will produce different results during analysis (Dumedah et al., 2008, p. 48). According to Openshaw (1984), there are no sound alternatives for managing aggregated data in a statistically sound framework. The scale and shape of the areal units influence any spatial analysis. It is recommended to compare results from different spatial resolutions to clarify the data (Jacobs-Crisioni et al., 2014, pp. 52-53). It is difficult to predict aggregated elements of coarser resolution, since they follow a stochastic pattern. The shape effect exists because irregularly delineated spatial geographies cannot fully account for the heterogeneity of the underlying population (Jacobs-Crisioni et al., 2014, p. 53).

Using aggregated spatial data with pre-defined areal units, such as census, creates a well-known issue called the Modifiable Areal Unit Problem (MAUP). Studies on the MAUP have been conducted since the 1930s, but it only became a real concern in the 1960s and 1970s. Despite the research conducted, results remain vague on how the MAUP influences univariate, bivariate and multivariate statistics (Dark & Bram, 2007, p. 472). The boundaries are the source of the MAUP (Reidl et al., 2006, p. 900). Several analytical techniques are affected by the MAUP, such as regression and relation analysis, spatial interaction and location-allocation modelling (Paez & Scott, 2004, p. 58).
Boundaries can be modified infinitely, making the MAUP unavoidable. This arbitrary subdivision of areal units for the purpose of aggregating data is known as the MAUP (Jacobs-Crisioni et al., 2014, p. 48; Manley et al., 2006, p. 144). The direct result is variation in the derived answers when different areal units are used. Both scale and zone are inherently related to the MAUP. The irregular size of areal units in census geographies makes the MAUP unavoidable (Dumedah et al., 2008, p. 48). Outcomes always depend on the scale and shape of aggregation. Despite the extensive literature on the MAUP, no clearly defined solution has emerged to date (Jahanshiri et al., 2015, p. 47). The aggregation problem occurs wherever data is aggregated into units of different sizes or shapes. The zonation effect is caused by the grouping of smaller areal units into larger ones (Dark & Bram, 2007, p. 472). To address the scale problem, the use of an optimal zoning system is recommended to create homogeneous units. Despite the effort to minimise scale variability in analysis, the results remain biased (Dumedah et al., 2008, pp. 48-49).

The main concern with census is that data is collected on non-modifiable entities (households) and aggregated into modifiable units (census boundaries) for reporting. It is not possible to create ideal census geographies that take all spatial scales and processes into account (Manley et al., 2006, p. 159). Misrepresentation is inevitable. One way of minimising this effect and producing more homogeneous zones of data would be to down-size the areal units. The effect is illustrated by a case in which, with an 800-unit dataset, a 10% increase in the elderly population corresponded to a $308 decrease in family income, whereas with a 25-unit dataset the same 10% increase corresponded to a $2,654 decrease in family earnings (Prouse et al., 2014, p. 66). This is quite a significant margin of error.
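The zoning effect described above can be illustrated with a minimal sketch. The eight base-unit values and the two zoning schemes below are hypothetical and are not taken from any census dataset; the point is only that the same base data, grouped into the same number of zones in two different ways, yields different correlation coefficients.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def aggregate(values, zones):
    """Average the base-unit values inside each zone (a tuple of unit indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Hypothetical values for eight base units (assumed, for illustration only).
elderly_pct = [10, 12, 30, 28, 15, 35, 20, 22]
income = [50, 20, 40, 10, 45, 30, 25, 35]

# Two equally plausible ways of grouping the same eight units into four zones.
zoning_a = [(0, 1), (2, 3), (4, 5), (6, 7)]
zoning_b = [(0, 2), (1, 3), (4, 6), (5, 7)]

r_base = pearson(elderly_pct, income)
r_a = pearson(aggregate(elderly_pct, zoning_a), aggregate(income, zoning_a))
r_b = pearson(aggregate(elderly_pct, zoning_b), aggregate(income, zoning_b))

print(f"base units: {r_base:.2f}, zoning A: {r_a:.2f}, zoning B: {r_b:.2f}")
```

Comparing the three coefficients side by side is a small-scale version of the recommendation to compare results across different spatial resolutions and zonings before drawing conclusions.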
The MAUP is especially problematic in demographic studies such as census when choropleth maps are used to visualise data. Thematic mapping is known to grossly misrepresent the "ground truth" of social and economic variables. The abrupt change when moving from one boundary to the next illustrates the shortcoming of zone-based statistical representations (Reidl et al., 2006, p. 900). According to Openshaw (1984), the effect of the MAUP in census could be limited by identifying the appropriate scale for spatial analysis and display. It is, however, possible to work around the MAUP if the individual counted entities are analysed apart from aggregation (Dark & Bram, 2007, p. 477).

The use of census data depicted in choropleth cartography and thematic mapping has significant implications for policy. Census geographies are often politically labelled, based on the assumption that the representation is accurate, which results in intensity being either over- or underestimated (Reidl et al., 2006, p. 901). Census data has long been used to formulate public policy for the distribution of public funds; however, the fundamental flaw in such use is that policy makers assume that census areal units are fit for the intended purpose. For example, identification of poverty hotspots is not readily possible with census, because the geography of poverty has little or no correlation with census areal units. Poor people can be found randomly in areas seen as rich; thus census gives only a distorted view of reality (Reidl et al., 2006, p. 902).

Apart from the MAUP, another concern with decennial census data is time. It is recommended that census intervals be changed from every 10 years to a more continuous measurement. Using decennial data for trend analysis is not effective because most of the important variability is simply ignored (Salvo & Lobo, 2006, p. 226).
In South Africa the gap between the censuses of 2001 and 2011 was bridged with a Community Survey in 2007, with the next one planned for 2016. These sub-census programs provide data at municipal level but not for the small census geographies recorded in the full decennial census (Radebe, 2015).

There is, however, a positive use of historical census data. Firstly, it allows for meaningful comparison because the data is georeferenced. Secondly, data can be visualised and animated. Lastly, GIS assists in the spatial analysis of the coordinate locations of census features (Gregory & Healey, 2007, p. 639). Because census data is collected spatially, historical census data is well suited to spatial comparison and trend analysis. Data can be joined back to the former boundaries and captured digitally in GIS for temporal analysis (Gregory & Healey, 2007, p. 640). Having historical data enables users to layer different time periods and study relationships across different categories. Real insight into local patterns of distribution, such as race, can be gained using historic census data (Gordon, 2011, p. 10). Some hurdles encountered in historical GIS concern the reliability of the names and numbers used between census dates. Spatial and attribute precision are two factors that influence the comparability of different census datasets (Southall, 2011, pp. 150-151). If census boundaries vary greatly at a lower level, such as enumerated areas, spatio-temporal analysis can still be done using higher-order data, such as municipal boundaries (Masser et al., 1996, p. 91). Census units are subject to boundary shifts, which require additional techniques to ensure the continuity and quality of time series within census data (Nyerges et al., 2011, p. 38).

2.4.3. Advanced aspects on integration

Decomposition of population distribution estimates is a common problem.
Several methods of decomposition have been developed for census. As mentioned by Wu et al. (2008), for various reasons people might need to estimate population for areas that are not based on census boundaries. Such areas might be smaller or irregular in shape, such as the population living within a flood risk area, or the number of people within a certain distance of a transport network, e.g. a road (Wu et al., 2008, p. 122). By means of raster representation, census vector boundaries can be converted into pixels, each representing the original value within the zone (Spiekermann & Wegener, 1999, p. 1). Methodologies used to decompose census vector data are areal weighting, pycnophylactic interpolation and dasymetric mapping. Areal weighted interpolation is the most common form of interpolation: it takes a regular grid, intersects it with the underlying census boundary, and assigns a value based on the proportion of the census boundary contained within each cell. However, this method assumes a uniform distribution of population within the demarcated census zone. Gridded population sets are quite common, such as the Gridded Population of the World (Sheckhar & Xiong, 2008, p. 882).

Zones need not necessarily be connected to be summarised (Frank, 2005, p. 202). Zonal statistics essentially summarise the data from an underlying raster based on an overlying zone. Various statistics can be calculated for each zone, where the user specifies which operation to use, such as mean, median, max, min, standard deviation, variance, count or sum (Bahgat, 2015, p. 136).

Apart from zonal statistics, threshold and capacity estimates are another crucial planning tool. For the provision of social amenities, the Council for Scientific and Industrial Research (CSIR) provides accepted norms and standards for travel distance to social amenities.
These services and amenities are classified according to population density, which in turn determines the acceptable distance and coverage area (CSIR, 2012, pp. 11, 24). Census data is unfortunately the only available means to ascertain these requirements, and GIS offers the means to do so (Gibson et al., 2011, p. 247). The use of georeferenced data enables the calculation of population estimates within a prescribed distance. Geographic access to services can only be measured reliably with GIS. To ensure that the distance calculated is not "as the crow flies" but measured along the road network, the network analysis function is employed. Since the travelled distance between two points is generally longer than a straight line, network analysis is required to give reliable estimates of the population within a specified distance of each facility.

The network model is the most popular conceptual model for representing a network, e.g. roads, within a GIS environment. Networks are composed of nodal points and connector polylines. Nodes are zero-dimensional entities and polylines are one-dimensional entities; recording how they connect ensures the topological integrity of the modelled network. The relations between nodes and polylines are stored in a database to ensure that the right attributes, such as speed, elevation and road type, are associated with each entity (Fischer, 2006, p. 45). The service area function in network analysis calculates the distance along the road network from predefined locations. Service areas can be constructed from individual points or areas. The only requirements for generating a service area are a predefined location, a threshold distance and an underlying network topology. The accuracy of a service area in network analysis depends on the quality of the modelled roadways, directions, connectivity and barriers (Oh & Jeong, 2007, pp. 28-30).

Lastly, zonal representations of statistical data take all attributes within the zone and distribute them uniformly throughout the zone.
However, topological relationships and complex socio-economic activities are often ignored, which leads to serious methodological problems during analysis (Openshaw, 1984, p. 1). The so-called "strait-jacket" assigned to zones confines them to the inherent weaknesses of zone-based analysis. Spiekermann and Wegener (1999) refer to this phenomenon as the 'tyranny of zones'. A combination of vector and raster representations can be used in a disaggregating model to overcome the disadvantages of zones. Interpolation can disaggregate zonal data for micro-scale analysis (Spiekermann & Wegener, 1999, pp. 2-3). To facilitate the process, disaggregated data is required. If no micro-scale spatial data is available, GIS can be used to generate probabilistic disaggregated spatial data based on zone data. To disaggregate zone-based data, such as census areal units, the land use within the zone needs to be taken into consideration. A combination of raster and vector representation using disaggregated spatial data, such as land parcels or a transport network, allows for a powerful reorganisation of data on micro scale. Generating artificial sub-block areas and using them to estimate the population within an overlying zone is relatively accurate (Wu et al., 2008, p. 121). As the number of sub-blocks increases, so does the margin of error.

The areas for which population must be estimated often do not coincide with census zones. Governments might need to estimate the number of people living in a flood-risk area, which will obviously not correlate with census boundaries. Estimating the population within a custom distance from a single location, or along a corridor, renders census zones inadequate for the purpose (Wu et al., 2008, p. 122). Population estimations are generally done in three ways: based on census zones, inferred from physical or socio-economic variables, or by disaggregating census unit populations into sub-zones.
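The disaggregation of a census zone into sub-zones can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited author: zones and cells are simplified to axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, the function names are hypothetical, and the land-use weights are invented for the example. With no weights the function reduces to plain areal weighting (uniform distribution); with weights it shifts population towards land uses assumed to be inhabited.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle; zero if it is empty or inverted."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def intersection(a, b):
    """Overlap of two rectangles (may be empty, giving zero area)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def disaggregate(zone_rect, zone_population, cells, weights=None):
    """Split a zone total over cells given as (rect, land_use) pairs.

    weights=None assumes a uniform distribution (areal weighting);
    otherwise each cell's share is its overlap area times its land-use weight.
    """
    scores = []
    for rect, land_use in cells:
        overlap = rect_area(intersection(zone_rect, rect))
        weight = 1.0 if weights is None else weights.get(land_use, 0.0)
        scores.append(overlap * weight)
    total = sum(scores)
    if total == 0:
        raise ValueError("no cell with a positive weight overlaps the zone")
    return [zone_population * score / total for score in scores]


# Hypothetical census zone of 1 100 people covered by four equal cells.
zone = (0, 0, 4, 2)
cells = [
    ((0, 0, 1, 2), "residential"),
    ((1, 0, 2, 2), "residential"),
    ((2, 0, 3, 2), "industrial"),
    ((3, 0, 4, 2), "open_space"),
]
# Assumed relative residential densities per land use (illustrative only).
land_use_weights = {"residential": 1.0, "industrial": 0.2, "open_space": 0.0}

uniform = disaggregate(zone, 1100, cells)
weighted = disaggregate(zone, 1100, cells, land_use_weights)
print("uniform :", [round(v) for v in uniform])
print("weighted:", [round(v) for v in weighted])
```

Both variants preserve the zone total; they differ only in how it is distributed, which is why the quality of the land-use data largely determines the reliability of the result.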
In the end, detailed land use data will improve the reliability of disaggregated data when choosing to subdivide data for population estimations.

2.4. CONCLUSION

As discussed, the strong relationship between census and GIS is due to their shared geographic component. As shown in the brief overview of census, the extensive coverage of census is unparalleled in comparison with other surveys. It remains one of the most reliable baselines for evidence-based decision-making. Integrative use of census and GIS has been quietly causing a revolution in planning since the government's adoption of GIS in the early 1990s. Conversion of the hand-drawn census boundaries into their digital counterparts laid the foundation for spatial analysis. Tabulation of data in third-party software greatly improves the use of census in different planning scenarios. In addition, census takes full advantage of dynamic map making, which reduces the overall time needed to generate informative cartographic answers from census with different spatial representations. Although census data is aggregated to hide participants' identities, the availability of the SAL greatly reduces the long-standing problem of the MAUP, which is especially prevalent when comparing previous census data with newer data. Although census data is disseminated in census areal units, GIS can reliably free census data from the tyranny of zones through zonal statistics and disaggregation, greatly improving its usefulness for micro-scale analysis.

CHAPTER 3: FORMULATING THE CENGIS FRAMEWORK

3.1. INTRODUCTION

The census and GIS framework, known as CENGIS, takes ten aspects of integration into account. To evaluate integrative use, one needs to define systematically what is measured and how. All aspects included in the CENGIS framework are derived from the review in chapter 2.
In addition to the ten aspects chosen for evaluation, there exist myriad other aspects that fall outside the scope of this research project. The CENGIS framework essentially focuses on the more important integrative uses between census and GIS. It is important to underline that the CENGIS framework is designed as a general guideline for evaluating integrative use between census and GIS and is not the holy grail of assessment. Besides the given examples, the concepts discussed in this section can apply to datasets beyond census. Each aspect of integration is evaluated through a brief definition or description of the concept assessed. Some of the aspects are familiar to the average user, whereas others may require additional techniques used by more experienced users, such as the conversion of census data into raster datasets for map algebra.

Besides evaluating integrative use, the CENGIS framework is intended to create some awareness amongst census users of the vast possibilities that effective integration offers for decision support. Although the possibilities are endless, constraints nevertheless need to be properly addressed in a manner that does not undermine decisional accuracy, as seen in the spatial analysis "crimes" mentioned in chapter 2. The CENGIS framework places emphasis on some common pitfalls experienced by census users in the GIS environment, such as spatial aggregation, the modifiable areal unit problem, disaggregation, comparability and representation. Hopefully the CENGIS framework can assist users in future to minimise misrepresentation by adhering to the accepted norms and standards prescribed in cartography.

3.2. CENGIS FRAMEWORK

3.2.1. Extraction Standard

The first aspect of integrative use between census and GIS starts with the basic relationship between census and the corresponding spatial entities. Most users have access to tabulated data through online access or private standalone software such as SuperCROSS.
The frequent use of census data in governmental reports and academic research mostly originates from users simply using pre-tabulated data from an existing source, be it a website or a spreadsheet. When assessing the use of census variables, one seldom finds sophisticated tabulations done by the users. Census variables are geographically referenced through geographic codes that depend on the spatial level queried. Defining the geographic extent and level of detail is the initial step before tabulating variables. The following spatial layers are made available for tabulation:

• Provincial (9 features)
• District (52 features)
• Local (234 features)
• Ward (4227 features)
• Main Place (14039 features)
• Suburb (22108 features)
• Small Area (84907 features)

After defining the geographic extent and level of detail, it is best to use codes, instead of names, as unique identifiers to join the census tabulation back to the spatial layer. An example of a unique identifier in census works as follows: Free State (value: 2), Municipal (value: 210), Main Place (value: 211), Sub-Place (value: 211001), Small Area (211001001). As seen, the unique identifier becomes longer as the geographic area becomes smaller and lies lower in the spatial hierarchy. This unique identifier serves as the fundamental link between the tabulated data and the census spatial layer. The census data is then simply "joined" on this unique identifier and can be used in any GIS software package for display (see Figure 4). Maps produced from census data often portray a picture in a much more comprehensive way than the conventional display of census data in graphs, tables and charts.

Figure 4 Population size per suburb (left) with the same data in graph format (right)

3.2.2. Extraction Customised

Tabulation through SuperCROSS can be seen as the innovative way to summarise data with queries. Census data is classified according to category, such as descriptive stats, dwelling stats, family stats, household services stats, etc. Each category contains several tables that address a vast combination of questions from that specific category. Depending on the level of spatial detail, tabulation can be done from provincial level down to small areas, which are smaller than suburbs. What tabulation essentially allows the user to do is to build complex queries with multiple criteria, such as: how many households within Bloemfontein city own a refrigerator, vacuum cleaner, washing machine, computer, motor-car, television and cell phone, and have access to the internet? Querying the census data in innovative ways can answer this question, as depicted in Figure 5. Once a census user realises the ease of tabulating custom queries by recoding field values either individually or collectively, it opens up a whole new potential for GIS integration. Integrating custom queries from census data into GIS allows for rapid visualisation for decision support. In general the use of custom census tabulations in GIS is seldom seen or utilised, yet this aspect offers a rich supportive function, especially to those in governmental authorities.

Figure 5 Custom tabulation for affluent households in Bloemfontein (left) and Gr 12 earning more than R25 000 per month (right)

3.2.3. Map Production

After tabulation is completed and joined to the corresponding spatial entities, map production in GIS is relatively straightforward.
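The join between a tabulation and a spatial layer on a unique code can be sketched as follows. This is a minimal illustration: the sub-place codes, names and household counts are hypothetical, the field names are assumptions, and a real workflow would use the join tool of the GIS package in question. Codes are kept as text so that they are matched exactly rather than compared as numbers.

```python
def join_by_code(features, table, field):
    """Attach table values to features on their 'code' attribute.

    Returns the joined features and the codes with no match in the table,
    so that mismatches are reported rather than silently dropped.
    """
    joined, unmatched = [], []
    for feature in features:
        record = dict(feature)          # copy, leaving the layer untouched
        value = table.get(feature["code"])
        if value is None:
            unmatched.append(feature["code"])
        record[field] = value
        joined.append(record)
    return joined, unmatched


# Hypothetical tabulation: sub-place code -> number of households.
tabulation = {"211001": 1520, "211002": 860, "211003": 2310}

# Hypothetical attribute table of the spatial layer.
layer = [
    {"code": "211001", "name": "Sub-place A"},
    {"code": "211002", "name": "Sub-place B"},
    {"code": "211004", "name": "Sub-place D"},
]

joined, unmatched = join_by_code(layer, tabulation, "households")
for record in joined:
    print(record["code"], record["name"], record["households"])
print("codes without tabulated data:", unmatched)
```

Joining on codes rather than names avoids failures caused by spelling variants or duplicate place names, and listing the unmatched codes makes gaps in the tabulation visible before mapping.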
Over the years GIS developers have worked on ways to automate cartographic map production so that it requires little customisation from the user's side. In South Africa, ArcGIS, the proprietary software suite from Esri, the world's leading GIS vendor, continues to dominate the GIS market. ArcGIS includes by default an automated mapping function called data-driven pages, which essentially loops through a prescribed list of features using a unique identifier. All census spatial boundaries have a unique identifier, making it easy to do automated mapping for all features in census data, regardless of the spatial hierarchy, be it provincial, municipal, ward, suburb, etc.

Figure 6 Dynamic ward map generation using census boundaries and stats

If set up correctly, automated mapping drastically reduces production time. Map functions include custom displays of an area's extent. Dynamic attributes such as the area name, and any other variable associated with that feature, can be updated for each map when generating a series, as seen in Figure 6 above. Dynamic attributes enable census users to include vast amounts of information supplementary to the map. For example, creating a ward profile map for every ward in the Free State province would total 317 individual maps.
With data-driven pages, users can have custom attributes assigned to each map, such as population count, language percentage, population group, age, etc., generating quick and informative maps within a short period of time. Despite the functionality provided by dynamic map making tools such as data-driven pages, census users seldom utilise this function.

3.2.4. Representation

Cartographic depictions of census data are seldom questioned and are regarded as authoritative. As mentioned in chapter two, people are remarkably unaware of the number of variations a cartographer can generate from the same underlying data. Classification of census data is by default univariate and done on the fly. After tabulation of census variables, data needs to be formatted for representation. Classification introduces some form of order into the data, which needs to suit the intended purpose or use of the data. Using different methods of classification will change the representation accordingly. Among the classification methods, "Jenks" classification, which is also the default, is the classifier most frequently used in census representation. However, classifiers such as quantile, manual or equal interval (Figure 7 top left), geometric or standard deviation might be more suitable, depending on the application. The problem induced by classification is that one can render a virtually infinite number of variations from the same underlying data. For example, when depicting poverty on a colour ramp from green (not poor) to red (poor), one can manipulate the class breaks to either increase or decrease the visual prominence of poor people.

Intervals, on the other hand, mark an abrupt change from one class to another. In general, more intervals produce more subtle variations in colour, making discernment more difficult for the user. In cartography, going beyond five intervals for a colour ramp is too much (Figure 7 top right), and fewer than three is not usable.
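To make the effect of the classification method concrete, the sketch below classifies the same values with two simple schemes, equal interval and quantile. It is an illustration under stated assumptions: the population values are invented, the function names are hypothetical, and the break rules are simplified textbook versions rather than the exact algorithms of any GIS package (Jenks is deliberately left out).

```python
def equal_interval_breaks(values, n_classes):
    """Upper class limits that split the value range into equal widths."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_classes
    return [lowest + width * i for i in range(1, n_classes + 1)]


def quantile_breaks(values, n_classes):
    """Upper class limits that place (roughly) equal counts in each class."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, n_classes + 1):
        last_index = (i * n + n_classes - 1) // n_classes - 1  # integer ceiling
        breaks.append(ordered[last_index])
    return breaks


def classify(value, breaks):
    """Index of the first class whose upper limit is not exceeded."""
    for class_index, upper in enumerate(breaks):
        if value <= upper:
            return class_index
    return len(breaks) - 1  # guard against floating point rounding at the top


# Hypothetical population counts for nine census units (skewed on purpose).
POPULATION = [120, 150, 180, 200, 240, 300, 900, 2000, 10000]

equal_classes = [classify(v, equal_interval_breaks(POPULATION, 3)) for v in POPULATION]
quantile_classes = [classify(v, quantile_breaks(POPULATION, 3)) for v in POPULATION]

print("equal interval:", equal_classes)
print("quantile:      ", quantile_classes)
```

With skewed data such as this, equal interval places eight of the nine units in the lowest class, while quantile spreads them three per class, so the two maps would look very different although the data is identical.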
The problem associated with interval count is that there is no concrete guide for choosing the number of intervals. This gives a cartographer room to change the representation of census data merely by adding or removing intervals.

Figure 7 Difference between three classes and ten classes (top), same data with symbols, or in 3D

Apart from colour, census data can also be illustrated with symbols. For example, demographic size can be depicted using circles varying in size on a map to illustrate population density. The advantage of symbols is ease of interpretation. However, just as colour classification is subject to the user's bias, so is symbol size, as seen in Figure 7 bottom left. This inevitably leads to misinterpretation by the user during decision making. Besides symbol size, the number of symbols, as in dot density mapping, is another way to illustrate density or census values within a census unit.

Census data can be represented in numerous ways using different classification methodologies, intervals and symbols; however, the problem with all three is their perceived authoritativeness, which can lead to gross misrepresentation of the actual ground truth. Depending on the application of the census data, one needs to carefully consider the pros and cons of each classification scheme, interval count and symbol type in each scenario. This consideration is often overlooked, yet the influence of the chosen representation is enormous.

3.2.5. Spatial Aggregation

Data aggregation is a fundamental principle that influences the use of census data. Census data is collected per household at geo-referenced locations called dwelling frame points, which are then aggregated into enumerated areas to protect participants' safety and security. Individual dwelling frame points are aggregated hierarchically and sequentially into enumerated areas, small areas, sub-places, main places, municipalities, districts and provinces.
Census makes six of these aggregated layers available for dissemination, of which the smallest is called the Small Area Layer (SAL). The lower layers on the aggregation pyramid are useful for strategic intervention. Moving up the pyramid, higher order entities serve as strategic indicators for decision makers. For example, when identifying the poorest areas in the Free State, the principle of aggregation can be used to narrow down the problem. Starting from the strategic level: identify the poorest district, then the poorest municipality, then the poorest place, then the poorest suburb, and finally the poorest small area, as illustrated in Figure 8. Using different levels of spatial aggregation greatly facilitates the decision making process.

Figure 8 Municipal level (top left), town level (top right), suburb level (bottom left) and small areas (bottom right)

3.2.6. Modifiable Areal Unit Problem

Census data unfortunately suffers from a serious ailment known as the Modifiable Areal Unit Problem (MAUP). The fact that census data needs to be aggregated into predefined areal units leads to a serious problem in terms of representation. Boundaries are not absolute and can be modified infinitely, either including or excluding certain areas. For example, suppose the number of samples within area A is two and within area B is three. Modifying the shared boundary between the two can change the samples within area A to three and within area B to two. All census variables are boundary dependent, which means all values are relative to the demarcation chosen. This problem is not limited to census data and remains unsolved in many GIS applications.
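The two-area example above can be reproduced with a small sketch. The dwelling positions and the function name `count_by_zone` are hypothetical, and the setting is reduced to one dimension (positions along a line with a single shared boundary) purely for illustration.

```python
# Hypothetical dwelling positions along a line (for example, km along a road).
DWELLINGS_X = [1.0, 2.0, 4.0, 6.0, 8.0]


def count_by_zone(positions, boundary):
    """Count dwellings in area A (left of the boundary) and area B (the rest)."""
    in_a = sum(1 for x in positions if x < boundary)
    return {"A": in_a, "B": len(positions) - in_a}


original = count_by_zone(DWELLINGS_X, boundary=3.0)
modified = count_by_zone(DWELLINGS_X, boundary=5.0)

print("boundary at 3.0:", original)
print("boundary at 5.0:", modified)
```

The dwellings never move; only the shared boundary does, yet the counts per area swap from two and three to three and two.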
Census boundaries are not fixed and often change with political transitions. This variability in census boundaries is a real concern for validity in terms of accuracy. For example, when comparing ward statistics, it is often found that boundaries have shifted 10 years down the line, defeating any meaningful comparison (Figure 9). The only way to really get rid of the MAUP is to utilise the dwelling frame points themselves; this would, however, be a breach of privacy, and the points are not made available to the public due to privacy constraints (Government, 1999, p. 20).

Figure 9 Original boundaries (left), modifying the boundaries cause values to change (right)

Users of census data seldom consider the implications of the MAUP, which is a known weakness of any aggregated dataset. Disregard of this problem has led to many unjust applications and wasteful expenditure of resources, especially in governmental decision making. The MAUP can only be minimised by decreasing the size of the areal units; however, no matter how small the units are, as long as aggregation is applied the MAUP will always be present.

3.2.7. Time Series

With census, the ability to compare datasets from different times has always been much sought after. A census occurs every ten years and a sub-census is performed every 5 years to narrow the gap for trend analysis. However, users seldom use sub-census variables and prefer to compare decennial census data because of its reliability. The sub-census in 2007 distributed only 330 000 questionnaires, whereas the census of 2011 distributed more than 14.5 million. The before and after snapshot is crucial for decision makers to monitor and evaluate progress using actual numbers derived from census.

Figure 10 Population count in 2001 (left), population count 2011 (right)

Although census data from different time periods exists, such as 1996, 2001 and 2011, there are a few major concerns that severely hamper its use.
Firstly, boundary changes: between census periods of 10 years, the demarcated boundaries used in census change quite frequently, as seen in Figure 10. Ward boundaries, which are politically influenced, are particularly vulnerable, as are administrative boundaries. Comparison of dissimilar boundaries can be misleading, as seen in Figure 10. Secondly, the spatial resolution of data from census 2001 and census 2011 is not the same. As mentioned, the SAL only became available in 2011, whereas census 2001 made ward boundaries its lowest level of comparison. Ward boundaries are already quite large, which increases the loss of information through aggregation. For the most part, census comparison is performed at municipal level, which is unfortunate, since finer changes over time cannot be noted due to the spatial level constraint. Lastly, the time lapse between censuses is ten years, which is intrinsically too long for a meaningful comparison. Trend analysis is especially difficult when observations are that far apart. The fact that official census data is only available for 1996, 2001 and 2011 gives a very limited view of spatio-temporal change. Despite these limitations, decision makers still find it useful to see cartographic representations from different time periods.

3.2.8. Zonal Statistics

Having census variables tied to vector-based spatial boundaries limits the type of analysis that can be performed in GIS. Oftentimes users want to estimate population size for an area that does not coincide with the boundaries provided in census. To "escape" the boundary enclosure, GIS enables the conversion of vector boundaries into a raster surface. Raster representation offers significant benefits, with a myriad of powerful analysis functions in GIS to enhance the use of census information. To convert census data into raster format, a normal tabulation is done and joined to the corresponding spatial layer. The cell size of the raster is then determined, for example 30 m x 30 m.
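The arithmetic behind this vector-to-raster conversion can be sketched as follows, assuming the census value is spread equally over all cells of a unit. The unit area, population and function names are hypothetical; only the 30 m cell size comes from the text, and a real GIS would derive the cell count from the polygon geometry rather than from a rounded area ratio.

```python
def cells_in_unit(unit_area_m2, cell_size_m):
    """Number of raster cells covering a census unit (rounded to whole cells)."""
    return round(unit_area_m2 / (cell_size_m ** 2))


def value_per_cell(unit_total, unit_area_m2, cell_size_m):
    """Census value assigned to each cell when spread equally over the unit."""
    return unit_total / cells_in_unit(unit_area_m2, cell_size_m)


def estimate_for_zone(cell_value, cells_in_zone):
    """Zonal estimate: sum of the cell values falling inside a custom zone."""
    return cell_value * cells_in_zone


UNIT_AREA_M2 = 90_000   # hypothetical small area of 300 m x 300 m
UNIT_POPULATION = 450   # hypothetical census count for that small area
CELL_SIZE_M = 30        # cell size used as the example in the text

n_cells = cells_in_unit(UNIT_AREA_M2, CELL_SIZE_M)
per_cell = value_per_cell(UNIT_POPULATION, UNIT_AREA_M2, CELL_SIZE_M)
zone_estimate = estimate_for_zone(per_cell, cells_in_zone=20)

print(n_cells, per_cell, zone_estimate)
```

Summing `per_cell` over all `n_cells` returns the original unit total, which is the consistency check one would expect after conversion; a custom zone covering 20 of those cells then receives a proportional share.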
The area within the boundary of the census areal unit is then divided by the area of the raster cell, which gives the number of cells in the unit; the census value of the unit is spread over these cells to obtain the distribution of values within the census areal unit. If done correctly, all the raster cells (pixels) within the census areal unit should add up to the original total of the census variable within the vector boundary, as seen in Figure 11. Raster conversion should preferably be done on the lowest level of spatial aggregation to prevent loss of information. The smaller the cell size, the more precise the calculations performed on the raster surface will be.

Figure 11 Custom population count of 40 139 within the red boundary (left), custom population count of 93 389 within the red boundary (right) using zonal statistics

The advantage of having census data that is not bound to delineated census boundaries is that it allows for reliable estimation of population size within custom areas, such as the number of people staying within 500 m of a river, or people within a custom defined area. Zonal statistics is not entirely accurate because cell values within the census areal unit are still averaged, which may be far removed from the ground truth; however, for estimation purposes zonal statistics offers a great enhancement for decision makers.

3.2.9. Service Areas

Census data has proven to be particularly useful for estimating population quantities within a prescribed distance. GIS offers census users the ability to estimate a host of threshold and coverage statistics using underlying census data. One particularly useful function provided in GIS is network analysis, which buffers a predefined distance from a chosen location or locations, using the underlying road network as a guide. Travel distance is a common planning principle, especially in governmental service provision. For example, planning new social facilities for a community would require standards in terms of acceptable travel/response distance and the threshold/capacity of the facility (see Figure 12).
(Figure 12 panel: travel distance (walking) to primary schools for residents of Bloemfontein; population aged 6 to 13 within each interval: 0 - 0.5 km 15%, 0.5 - 1 km 28%, 1 - 2 km 35%, 2 - 5 km 19%, > 5 km 3%; total 58 140.)

Figure 12 Population estimates from census data using network analysis

After calculating the service area (interval distance from the facility) using the underlying network, the census data can be overlaid on the service area, joined to the service interval and summarised to compute population estimates. To increase accuracy, it is best to convert the census data to points before joining it to the service interval, which gives a more appropriate estimate. This GIS functionality is greatly enhanced by the use of census data. Although network analysis is seldom combined with census data, planning greatly benefits from having reasonable estimates of population size within a prescribed distance of a social facility. Answers can then be represented in cartographic form, where totals can be summarised as percentages to make interpretation easier. For example, 35% of children aged 6-13 live within 1-2 km of a primary school.

3.2.10. Disaggregation

As mentioned earlier, aggregation is the main reason for data loss and misrepresentation in census. Because census data is aggregated from the smallest unit, called a dwelling frame, into enumerated areas, small areas, suburbs, places, municipalities, districts and provinces, significant loss of data occurs as area size increases. Since 2011 the SAL has been part of the census dissemination product, which greatly reduced the area size of the sampled units. The SAL is nearly the same size as the original enumerated areas.
Having the SAL at our disposal, disaggregation can be attempted with reasonable success.

Figure 13 Population count using disaggregated census data using surveyed parcels

If data at a finer scale than the SAL exists outside census, it can be used as a guideline to inform the disaggregation process. Depending on the quality of the underlying features, disaggregation enhances the usability of census data. Another dataset that is publicly available to users is the cadastral boundaries produced by land surveyors. Since the SAL is smaller in urban areas, the underlying parcel topology can be used to inform disaggregation. Converting parcels to points and joining the points to the census data enables us to distribute the census value of the census areal unit equally to all the points within the census boundary, as seen in Figure 13. This is an improvement on zonal statistics, which simply ignores the underlying land-use patterns. After disaggregation the results can be joined to the parcel layer, where every parcel can be counted individually. Census data is the most common dataset used in disaggregation for population estimates. With a disaggregated dataset one can, with reasonable care, estimate the total population based on the underlying selected parcels. However, disaggregation is a tedious task and is seldom performed except by more experienced GIS users. In spite of the technical difficulty associated with it, disaggregated census data is immensely powerful for decision makers who need estimates at micro level or sub-census level.

3.3. CONCLUSION

To evaluate integrative use between census and GIS, this project draws up a custom framework based on ten different aspects. Each aspect is briefly discussed with accompanying examples.
The CENGIS framework essentially evaluates integration on three levels: firstly, active use, which serves as an indicator of whether participants actively utilise the prescribed concepts; secondly, frequency of use; and lastly, usability for decision makers. With these three evaluators, each aspect of integration was explained in detail as to why it is included in the CENGIS framework on integrative use. Tabulation is the most elementary form of integration, and can be either standard or more extensive with multiple variables. Dynamic cartographic map generation is another important aspect that greatly enhances integrative use by reducing the time needed to produce informative cartographic products for decision makers. Understanding the nature of cartographic display through classification, interval size and symbol type enhances its usefulness by minimising misrepresentations. Spatial aggregation is indispensable for effective use of census data. Knowing the inherent weakness of spatial aggregation, as articulated in the MAUP, is crucial when using census data for decision making, especially in time series where older census boundaries do not necessarily align with newer ones. Knowing how to escape from the tyranny of census boundaries is essential for population estimates, threshold calculations and micro analysis through disaggregation.

CHAPTER 4: FINDINGS AND DISCUSSIONS ON CENGIS FRAMEWORK

4.1. INTRODUCTION

Chapter 3 is taken as the baseline for chapter four, where the findings are discussed in terms of the CENGIS framework. This chapter focuses on the results collected after users' participation in the CENGIS survey. All ten aspects of integration are discussed individually. Each aspect is preceded by an illustration and a brief description to clarify the responses received. Discussions cover the three questions asked in the survey for each aspect of integrative use.
Firstly, active use, which is derived from a closed-ended question, is depicted as a graph. Secondly, frequency of use is rated on a Likert scale from 1 to 5, which is averaged to derive the frequency score. Lastly, usability for decision makers is also rated on a Likert scale from 1 to 5, from which the averaged usability score is derived. By using the CENGIS framework with the three evaluators, the overarching aim of evaluating integrative use between census and GIS can be effectively answered.

4.2. CENGIS FRAMEWORK

The CENGIS framework is intended to evaluate 10 aspects of census and GIS integration. The survey can be roughly divided into 10 questions, which cover most of the main concerns, techniques and usage between census and GIS. What is, however, important to note is that the CENGIS framework serves as a preliminary guide toward assessing user awareness of basic and some more advanced uses, as well as the immediate constraints that all census and GIS users need to know to improve the decision making process. The CENGIS survey is also intended to evaluate integrative use specifically among planners. Planning is the primary sector that uses census data in the process of making evidence-based decisions through statistical analysis. It is important to keep in mind that, when results are evaluated, they reflect the targeted audience. The total number of participants in the CENGIS survey is 23, of which the majority are town planners.

Figure 15 Participation by sector (legend: Academic, Public Government, Private Business, Other)

The first question asked participants to identify their respective sectors for using census data. According to the results, users of census data tend to be institutionally oriented, such as academic or governmental (87% of the participants), which is understandable given the nature of census variables. Private institutions do make use of census data, yet on a much smaller scale than their public counterparts.
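The averaged Likert score used for frequency and usability throughout this chapter can be sketched as a weighted mean of the response counts. The function names below are hypothetical; the counts are those reported for the usefulness of custom queries in Table 5 (3, 0, 8, 4 and 8 responses for ratings 1 to 5), chosen only because they sum to the 23 participants.

```python
def likert_average(counts):
    """Weighted mean of ratings 1..n, where counts[i] is the number of
    respondents choosing rating i + 1."""
    total = sum(counts)
    weighted = sum(rating * count for rating, count in enumerate(counts, start=1))
    return weighted / total


def likert_percentages(counts):
    """Share of respondents per rating, as percentages rounded to one decimal."""
    total = sum(counts)
    return [round(100 * count / total, 1) for count in counts]


# Response counts for ratings 1 (Not Useful) to 5 (Very Useful), from Table 5.
TABLE_5_COUNTS = [3, 0, 8, 4, 8]

average = likert_average(TABLE_5_COUNTS)
percentages = likert_percentages(TABLE_5_COUNTS)

print(round(average, 2), percentages)
```

The same two functions apply to every frequency and usefulness table in this chapter, since each reports counts per rating, percentages of the total and one averaged score.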
Table 1 Frequency of using census data in GIS

Frequency       Never (1)   Almost Never (2)   Occasionally (3)   Often (4)   Very Often (5)
Score           7           0                  8                  4           4
Total %         30.4        0                  34.8               17.4        17.4
Average Score   3.26

Secondly, concerning frequency of use, it is important to understand just how often people are confronted with the various aspects of integration discussed in the survey. As seen, 30.4% of the participants have never utilised census data in GIS, about 34.8% use it occasionally and 34.8% of the participants are identified as frequent users of census data in GIS. The average score on frequency of use is 3.26, which means the average participant utilises census data only occasionally.

4.2.1. Extraction Standard

Standard Use
Census data are widely used in reports and usually formatted as graphs, tables or numeric statements, for example: "In line with the population growth, there has been an increase in the number of households in Mangaung. In 2001 there were 185 013 households in Mangaung; in 2011 they have increased to 231 921. The average household size in 2001 was 3,4% and in 2011 the size has decreased to 3.2%. Although the majority of households are headed by men, female headed households are also increasing rapidly from 40.6% in 2001 to 40,8% in 2011."

Use Census in GIS
Using census data in GIS for map making is a simple two-step process, explained below:
Step 1 (Tabulate): Define geography and other fields. Geography uses code values to link to spatial layers.
Step 2 (Join): Join census data to the corresponding spatial layer.

Spatial Layers
1. Provincial (9)
2. District (52)
3. Local (234)
4. Ward (4277)
5. Main Place (14039)
6. Suburb (22108)
7. Small Area (84907)
Figure 16 Standard use of census data

The first aspect of the CENGIS survey is intended to make users aware of the numerous places where standard census statistics are utilised, such as municipal reports or academic research. Despite the extensive use of census data from StatsSA, GIS integration has been largely underutilised for various reasons, such as lack of access to GIS, ignorance or difficulty in extracting census variables from tabulation software. The basic relationship between census and GIS is illustrated as a two-step process. The first step is defining the geography and adding the variables provided, such as age, gender, income, education, etc. To facilitate seamless integration into GIS, it is recommended that users only use numeric coded values for census areal units, as illustrated in Figure 16. The second step, joining the tabulated data to the corresponding spatial layer, is a simple GIS function in which the user defines which fields will participate in the join. Tabulation can be done for any of the spatial layers provided with census, from Provincial to SAL. The main focus is to evaluate general usage of census data in GIS by introducing the basic aspects of tabulation and joining.

Figure 17 Use of census data in GIS (legend: Yes, No)

Standard use of census data in GIS through the two-step process, as illustrated in Figure 17, is by far the most common form of integrative use (96%). The main use in GIS involves tabulations done in third-party software such as SuperCROSS, using coded values, and joined to the corresponding spatial representations.
Only a very small percentage (4%) has not used census data in GIS.

Table 2 Frequency of using standard census data in GIS

Frequency       Never (1)   Almost Never (2)   Occasionally (3)   Often (4)   Very Often (5)
Score           2           1                  11                 7           2
Total %         8.7         4.3                47.8               30.4        8.7
Average Score   3.26

Standard use of census data in GIS has a relatively high frequency, with a combined score of 3.26 as seen in Table 2, which implies that most users do so occasionally. About 40% of users use standard census data frequently and 13% are non-frequent users of standard census data in GIS. This implies that 87% use standard census data in general.

Table 3 Usefulness of standard census data in GIS for decision making

Frequency       Not Useful (1)   Barely Useful (2)   Fairly Useful (3)   Useful (4)   Very Useful (5)
Score           2                2                   6                   7            6
Total %         8.7              8.7                 26.1                30.4         26.1
Average Score   3.56

Regarding the usefulness of standard census data for decision making, the average is 3.56 (Table 3), which falls within the "useful" category of the spectrum. More than 80% of users deem standard census data to be useful for decision making. Only 17% rank it as less useful for decision makers. This suggests that users deem standard census data useful for decision support.

4.2.2. Extraction Customized

Custom Use
Tabulation in SuperCROSS can be done for any of the main categories. For example, choosing Household Goods and Small Areas as the highest level of detail.

Custom Tabulation
Depending on the number of tables available within a category, any combination can be defined (attributes). For example: How many households own a refrigerator,
vacuum cleaner, washing machine, computer, motor-car, television, cell phone and have access to the internet within Bloemfontein city (geography). Attributes within a table can be selected to participate in the query in compliance with the question. Tabulated data can then be joined to the corresponding spatial layer, in this case the small area layer, using their respective code values.

Figure 18 Custom tabulation process from census data

This question addresses a more advanced use of census data using custom tabulations. Firstly, the user is made aware of the different categories with their respective data tables in the SuperCROSS software package. After selecting an appropriate category, a query can be formulated as a sentence by identifying the participating fields. Fields can be redefined to contain only those values that apply to the original query. Grouping of values is a common practice for simplifying output. Advanced tabulation greatly enhances the strategic use of census variables in decision making. After tabulation, the output can be joined to the corresponding spatial layer in GIS. Deducing from the statistics used in reports by governmental or private parties, very little custom tabulation is present, despite the immense opportunities that advanced tabulation on census offers. It is purposely defined as custom tabulation because standard tabulation mostly includes no field refinement or use of multiple participating fields.

Figure 19 Use of custom queries with census data in GIS (legend: Yes, No)

When users do utilise census data, the majority (64%) do so with custom queries. Only 36% of users prefer standard census tabulation, as seen in Figure 19.
This shows that advanced queries, in which multiple fields are added and recoded, have a significantly higher use than standard queries alone. The use of custom queries has a much broader field of application than mere standard tabulations.

Table 4 Frequency of using custom queries from census

Frequency       Never (1)   Almost Never (2)   Occasionally (3)   Often (4)   Very Often (5)
Score           5           7                  7                  2           2
Total %         21.7        30.4               30.4               8.7         8.7
Average Score   2.52

Frequency of use in GIS of custom tabulated queries, as described in the illustration, is ranked between almost never and occasionally (Table 4). Custom queries are significantly more complex to tabulate and require more user discretion on what to query, which is reflected in the fact that 53% never or almost never use custom queries. Only a small margin (17%) uses extensive tabulation with GIS. This is unfortunate because tabulation forms a crucial part of answering the complex scenarios so often faced by decision makers.

Table 5 Usefulness of custom queries from census for decision making

Frequency       Not Useful (1)   Barely Useful (2)   Fairly Useful (3)   Useful (4)   Very Useful (5)
Score           3                0                   8                   4            8
Total %         13               0                   34.8                17.4         34.8
Average Score   3.6

When asked about usefulness, about 87% regard custom queries as useful for decision makers (Table 5). Only 13% deem them not useful. With an average score of 3.6, most users see custom tabulation as useful for decision makers in general. A significant margin of 34.8% regards it as very useful indeed.

4.2.3. Map Production

Map Production
Map production with census data can be done with relative ease. After tabulation in SuperCROSS and joining data to the corresponding spatial layers, dynamic map making is possible.

Data-Driven Pages
Producing ward maps for the Free State with general census information for each ward in the province would be done as follows: Tabulate the required data in SuperCROSS using either standard or custom queries.
Join the data to the corresponding spatial layer (wards). Using the Data Driven Pages function in ArcGIS, a series of maps depicting census data can be produced. Dynamic attributes derived from the census data can be added for each map. Attributes are updated automatically for each map.

Figure 20 Dynamic map production using data driven pages with attributes

The importance of automated map making cannot be overstated for census users. What this question aims to address is user awareness of automated mapping, using census data and dynamic map making functions such as data-driven pages. Dynamic map making allows users to attach multiple dynamic attributes to a map, which are automatically updated for each map. Census variables can be attached to a dynamic map and displayed on the map as an information panel, either as values or as percentages. Automated map making in census allows for rapid map production using a master template. Attributes can be custom tabulated to answer predefined questions for the area on any of the provided spatial layers. Dynamic map making with census in GIS is one of the primary integrative uses between the two.
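The idea of a master template whose attributes are refreshed for every feature can be sketched in plain Python. This is not the ArcGIS Data Driven Pages API; it only mimics the loop over an index layer. The ward codes, attribute values and the names `PAGE_TEMPLATE` and `build_pages` are hypothetical.

```python
from string import Template

# Master template for the information panel (hypothetical layout).
PAGE_TEMPLATE = Template("Ward $ward_id | Population: $population | Youth: $youth_pct%")

# Hypothetical ward features with tabulated census attributes already joined.
WARDS = [
    {"ward_id": "41601002", "population": 7135, "youth_pct": 36},
    {"ward_id": "41601001", "population": 6735, "youth_pct": 31},
    {"ward_id": "41601003", "population": 7722, "youth_pct": 29},
]


def build_pages(features, template, key):
    """Return one filled-in panel per feature, ordered by the unique identifier."""
    pages = {}
    for feature in sorted(features, key=lambda f: f[key]):
        pages[feature[key]] = template.substitute(feature)
    return pages


pages = build_pages(WARDS, PAGE_TEMPLATE, key="ward_id")
for ward_id, panel_text in pages.items():
    print(panel_text)
```

One template and one loop produce a panel for every ward, which is why a series of 317 ward maps needs no more set-up effort than a single map.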
Despite the significant improvement to GIS-related output that data driven pages offer decision makers, it remains one of the least utilised functions on integration.

[Pie chart; legend: Yes, No]

Figure 21 Use of data driven pages with census

Dynamic map making is intrinsic to effective GIS and census integration. With 73% of participants using dynamic map making functions such as data driven pages, the usefulness of these functions is evident. Only 27% have not used dynamic map making functions before, as seen in Figure 21.

Table 6 Frequency of using data driven pages with census data

Frequency    Never (1)    Almost Never (2)    Occasionally (3)    Often (4)    Very Often (5)
Score        4            4                   10                  4            1
Total %      17.4         17.4                43.5                17.4         4.3
Average Score: 2.73

When asked how often they make use of data driven pages (the dynamic map making function in ArcGIS), 43.5% of users do so occasionally, as shown in Table 6. About 21.7% of users identify themselves as frequent users of this function and 34.8% use it infrequently. The technicality of setting up a good template for dynamic map generation is reflected in the fact that only 4.3% use this function very often.

Table 7 Usefulness of data driven pages using census data for decision making

Frequency    Not Useful (1)    Barely Useful (2)    Fairly Useful (3)    Useful (4)    Very Useful (5)
Score        1                 3                    7                    6             6
Total %      4.3               13                   30.4                 26.1          26.1
Average Score: 3.56

The usefulness of dynamic map making in ArcGIS for decision makers is rated 3.56 on the usability index in Table 7. This implies that most users deem this function of census and GIS integration useful for decision makers, with 26.1% deeming it very useful. The majority of users (82.6%) assent to the usefulness of dynamic map making functions in supporting decision makers. Only 17.3% do not approve of its usability.

4.2.4. Representation

Classification

Classification of values into different classes is a common practice with census data.
Classification defines the intervals within the classes, as illustrated below. Each classification produces different representations.

[Figure panels: the same data classified as Quintile, Manual and Equal]

Interval Count

Changing the number of intervals will change the number of represented classes. Representation is influenced by the number of interval classes.

[Figure panels: the same data shown with different interval counts]

Symbol Type

Different means of representation can be used with different scenarios. For the illustration of population density, dots are a good choice. For the illustration of population size, proportional symbols work well. Representation is always influenced by the symbols used in representing a value.

Figure 22 The influence of classification, interval count and symbol size on representation

Users of census data seldom question the authoritativeness of cartographic representations. Multiple parameters sit behind these representations, and changing them can drastically alter the representation without changing the actual data itself. This question is intended to raise users' awareness of the deceptiveness of representation, which can be arbitrarily altered by the user. This aspect of census and GIS integration speaks to a limitation that cartographic depictions unfortunately inherit. By simply changing the classification scheme, interval count or symbol type, the same dataset can be styled in a practically unlimited number of ways, each giving a different impression to the user. It is important that census users understand representational constraints and learn how best to apply cartographic principles to avoid giving map readers the wrong impression.
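The effect of the classification scheme on representation can be shown with a small Python sketch. The ward populations below are hypothetical, and the two helper functions are simplified versions of equal-interval and quantile classification, not the exact algorithms used by any particular GIS package.

```python
import math

# Hypothetical, deliberately skewed ward populations (illustration only).
populations = [100, 120, 150, 180, 200, 250, 300, 400, 900, 2000]


def equal_interval_breaks(values, k):
    """Upper class bounds when the data range is split into k equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper class bounds when each class receives (roughly) the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[math.ceil(n * i / k) - 1] for i in range(1, k + 1)]


def class_counts(values, breaks):
    """Count how many values fall into each class defined by its upper bound."""
    counts = [0] * len(breaks)
    for value in values:
        for idx, upper in enumerate(breaks):
            if value <= upper:
                counts[idx] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point rounding at the maximum
    return counts


equal_counts = class_counts(populations, equal_interval_breaks(populations, 5))
quantile_counts = class_counts(populations, quantile_breaks(populations, 5))
print("Equal interval:", equal_counts)
print("Quantile:      ", quantile_counts)
```

With the same ten values and the same five classes, equal-interval classification places eight wards in the lowest class and leaves two classes empty, while quantile classification places two wards in every class. A choropleth map drawn from each would give a very different impression of the same data.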
Representation often requires that different depictions of the same data be given to the user, so that anything obscured in one cartographic display is revealed in another.

[Pie chart; legend: Yes, No]

Figure 23 Using different representations of the same census data

Using different representations of the same underlying census data is a relatively common practice, as seen in Figure 23, with 70% of users actively making use of it. The remaining 30% do not use different representations of the same census data. Importantly, the majority of census users are aware of the inherent cartographic weakness when displaying data in GIS.

Table 8 Frequency of using different representations of the same census data

Frequency    Never (1)    Almost Never (2)    Occasionally (3)    Often (4)    Very Often (5)
Score        3            9                   5                   3            3
Total %      13           39.1                21.7                13           13
Average Score: 2.74

Despite the majority of users consenting to the use of different representations to clarify the underlying variables, Table 8 shows that 39.1% almost never use different representations and 13% never do. Only 21.7% occasionally use different representations and 26% make more frequent use of them. The average score on frequency is 2.74, just below the "occasionally" category, which indicates only moderate use of different representations.

Table 9 Usefulness of different representations of the same census data

Frequency    Not Useful (1)    Barely Useful (2)    Fairly Useful (3)    Useful (4)    Very Useful (5)
Score        2                 3                    5                    8             5
Total %      8.7               13                   21.7                 34.8          21.7
Average Score: 3.48

Although the frequency of use is relatively low, the usefulness is ranked at 3.48 in Table 9, which is above average. A majority of participants (56%) rate different representations as above average in usefulness for decision makers. Only a small percentage of users (21.7%) see it as not useful or barely useful.
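The percentages and average scores reported in the frequency and usefulness tables can be reproduced from the response counts as a weighted mean on the 1 to 5 scale. The sketch below uses the counts from Tables 8 and 9; the function name `likert_summary` is only illustrative.

```python
def likert_summary(counts):
    """Return (percentages, average score) for counts ordered from category 1 to 5."""
    total = sum(counts)
    percentages = [round(100 * count / total, 1) for count in counts]
    weighted_sum = sum(score * count for score, count in enumerate(counts, start=1))
    return percentages, round(weighted_sum / total, 2)


table8_counts = [3, 9, 5, 3, 3]  # frequency of using different representations
table9_counts = [2, 3, 5, 8, 5]  # usefulness of different representations

table8_pct, table8_avg = likert_summary(table8_counts)
table9_pct, table9_avg = likert_summary(table9_counts)
print(table8_pct, table8_avg)
print(table9_pct, table9_avg)
```

Both tables sum to 23 responses, so each respondent contributes roughly 4.3 percentage points to a category.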
The usefulness rating confirms users' awareness of the common misconceptions in using census data: multiple representations of the same underlying data are provided to clarify any disparities.

4.2.5. Spatial Aggregation

Data Collection

Data collection in Census is done using geo-referenced dwelling frame points within an enumeration area (see below). All spatial layers are aggregated from the dwelling point layer.

Main Place Sub-Place Small Areas Enumeration Area Geo-