A Pathogenomic Approach towards Characterising the South African Population of Puccinia striiformis f. sp. tritici, the Causal Agent of Wheat Stripe Rust

Hester Josina van Schalkwyk

Thesis submitted in fulfilment of the requirements for the degree Doctor of Philosophy

University of the Free State
Bloemfontein
South Africa

Department of Plant Sciences (Plant Pathology and Plant Breeding)
Faculty of Natural and Agricultural Sciences

January, 2018

Promoter:
Dr R Prins, Department of Plant Sciences, University of the Free State and CenGen (Pty) Ltd

Co-promoters:
Dr DGO Saunders, John Innes Centre, Norwich, United Kingdom
Dr LA Boyd, National Institute of Agricultural Botany, Cambridge, United Kingdom
Prof. ZA Pretorius, Department of Plant Sciences, University of the Free State

Declaration

I, Hester Josina van Schalkwyk, declare this thesis hereby submitted by me for the degree Doctor of Philosophy at the University of the Free State is my own independent work and has not previously been submitted by me to another university for any degree. I cede copyright of this thesis in favour of the University of the Free State.

Hester Josina van Schalkwyk
Date

Dedicated to Mrs Marlize Huisamen (née Vivier), my high school biology teacher who first taught me about DNA and nurtured my curiosity about living things.

Acknowledgements

I would like to express my sincere gratitude to my mentors and the funding bodies that supported me during my PhD. This work was funded by the Biotechnology and Biological Sciences Research Council (BBSRC), the Department for International Development and (through a grant to BBSRC) the Bill & Melinda Gates Foundation, under the Sustainable Crop Production Research for International Development (SCPRID) programme, a joint initiative with the Department of Biotechnology of the Government of India’s Ministry of Science and Technology. Two SCPRID grants supported this study: (BB/J011525/1) to Dr L Boyd, Dr R Prins and Prof.
ZA Pretorius, and (BB/J012017/1) to Dr Cristobal Uauy. Additional support was received from The Monsanto Beachell-Borlaug International Scholars Program (MBBISP) and the Winter Cereal Trust (WCT), South Africa, through PhD scholarships.

The contributions of my supervisors go far beyond what I can summarise in a paragraph; nonetheless, a special thank you for the unique role they each played during my PhD. I thank Dr Diane Saunders and Dr Renée Prins for creating environments with nearly unlimited resources where I could work. I thank Dr Lesley Boyd for intense supervision while I was preparing my thesis, and Prof. Zakkie Pretorius for mentoring me in the art and science of rust pathology. I thank Dr Prins for her vision for the project and for allowing me to change to this project that I so enjoyed working on. I would also like to thank Dr Cristobal Uauy for being instrumental in arranging my placement in the Saunders lab.

I thank the following people for their involvement in obtaining the sequencing datasets: Historical South African isolates were obtained from Zakkie Pretorius. Historical East African isolates were obtained from Mogens Hovmøller. The Pakistan isolates were obtained from Sajid Ali. Samples of the recent South African Pst population were obtained from Driecus Lesch, Tarekegn Terefe, Zakkie Pretorius and Willem Boshoff (lost in transit). Renée Prins was instrumental in the preparatory work and shipment of the South African isolates for sequencing. Recent East African isolates were obtained from David Hodson (Ethiopia, 2014) and Ruth Wanyera (Kenya, 2014). Existing datasets of Pst isolates were obtained from Diane Saunders.

What fantastic opportunities to work at CenGen (Pty) Ltd, Earlham Institute, John Innes Centre, and the University of the Free State, during my PhD! A special mention to the following people for support in and out of the lab. Debbie Snyman performed qPCR assays and gel electrophoresis towards this project.
Zakkie Pretorius multiplied the historical South African Pst urediniospores for sequencing and mentored me in inoculation and scoring of the infection assays on the differential wheat set seedlings. I thank Sarah Holdgate for providing the United Kingdom (UK) differential wheat lines and for informative discussions regarding Pst in the UK and UK wheat cultivars. Elsabet Wessels, Debbie Snyman, Jens Mains and Clare Lewis mentored me in specific molecular genetic procedures. I thank Philippa Borril and Oluwaseyi Shorinola for advice on RT-qPCR data analysis, and Albor Dobón for help with the planning of the time course experiment. I also thank Antoine Persoons for valuable discussions in population genetics and advice on sections of this thesis, and my fellow PhD students in the Saunders lab, especially Pilar Corredor-Moreno and Vanessa Bueno-Sancho, for always being ready to advise me on the newest updates in data handling or Norwich BioScience Institutes (NBI) cluster computing. Also, thank you to the Computing NBI Helpdesk staff, especially Tom Betteridge and Mohamed Imram, for computer support.

I thank Sadie Geldenhuys for administrative support at UFS, Lizaan Rademeyr for great practical advice on best practices in laboratory record keeping, Carel van Heerden for input in the early days of the project and Anelda van der Walt for initial bioinformatics training. I thank Cari van Schalkwyk for advice on statistical analysis. I thank Prof. Ed Runge and the MBBISP panel for the very special ongoing experience of being an MBBISP scholar.

Thank you to every friend who ran, walked, climbed mountains, or performed some strange hobby with me. That helped to keep me going through the hard times. George, for your rock-solid support and your immense contribution to tailoring my skill set, thank you. I thank my family for all their love and support along the way.
Thank you, dad, for reminding me that I am a finisher, and mum, for your consistent positivity, enthusiasm, and encouragement that runs through my life like a golden thread.

Abstract

Stripe (yellow) rust caused by the fungus Puccinia striiformis Westend. f. sp. tritici (Pst) is a major disease of wheat prevalent in most areas where wheat is cultivated across the globe. It can completely destroy a crop if left untreated. The Pst fungus develops feeding structures that form a close relationship with the host tissue, facilitating extraction of water and nutrients from the plant while manipulating the host for its own benefit using effector proteins. This parasitic behaviour reduces yield and grain quality and leads to the production of numerous Pst spores that spread the infection. In South Africa, stripe rust was first detected in 1996, with the initial pathotype being designated 6E16A-. Three more Pst pathotypes were detected in subsequent years (6E22A- in 1998, 7E22A- in 2001 and 6E22A+ in 2005), gaining virulence in a stepwise manner by overcoming additional resistance genes one by one. However, the source of the original pathotype and the current genetic diversity of the Pst population within South Africa remain open questions. To get a better understanding of the South African Pst pathotypes and how they relate to Pst pathotypes globally, the historical population was described using a recently developed “field pathogenomics” approach. High-resolution, next-generation sequencing data utilised in this method aided in determining the genomic relationships between the four historical pathotypes and investigating their potential origin.
Historical South African isolates representing the four identified pathotypes were re-sequenced, and their comparison with isolates from the United Kingdom, France, Pakistan, Ethiopia, Eritrea and Kenya revealed that the closest relatives of the historical South African isolates were a group of isolates from East Africa. We further described polymorphisms in the South African Pst population that supported the existing hypothesis of stepwise evolution. By applying pairwise comparisons between polymorphic sites across isolates, 27 potential effector proteins that could be instrumental in the stepwise virulence gain were identified. To study the role these candidates may play during the infection processes in different pathotypes, gene expression profiling was conducted using RT-qPCR. Preliminary patterns of up- or down-regulation of these effectors between time points, over a time course of compatible interactions, were described. Furthermore, infected wheat tissues collected from locations across South Africa during the 2013, 2014 and 2015 cropping seasons were sequenced. The “field pathogenomics” method, using RNA-Seq, was applied to compare the historical Pst isolates with the recent population. This analysis indicated the possibility of a novel introduction of Pst into South Africa in recent years, possibly between 2011 and 2013. Pathotyping of selected Pst isolates on supplementary wheat tester genotypes revealed novel variation in infection types that has not been described previously. This study provides a high-resolution genomic view of the historical and prevailing Pst populations and adds valuable information on the potential origin and adaptation of stripe rust in South Africa. The research outcomes provide a genomic base for further investigation of candidate effector genes and of the possible recent novel incursion into South Africa of a pathotype group also seen in Europe, East Africa and New Zealand.
Keywords: effector, origin, plant pathology, population genomics, virulence

Contents

Declaration
Acknowledgements
Abstract
List of Figures
List of Tables
List of Abbreviations

1 General Introduction
1.1 Socio-economic importance of wheat
1.2 Wheat cultivation in South Africa
1.3 Wheat rusts reduce yields
1.4 Motivation for this study
1.5 Objectives
1.6 Thesis outline and approaches

2 The Wheat Rusts: Life Histories, Host Response Mechanisms and Genomic Resources
2.1 The rusts
2.1.1 Filamentous plant pathogens
2.1.2 Rusts and their primary host
2.1.3 The alternative host
2.1.4 Global distribution of stripe rust
2.1.5 Favourable conditions for wheat rusts
2.1.6 Infection cycle of Puccinia rusts
2.1.7 The stripe rust infection process on wheat
2.2 Combating wheat stripe rust
2.3 Plant defence mechanisms
2.3.1 Host-pathogen interaction
2.3.2 Other sources of resistance
2.4 The Pst genome
2.4.1 Genomic variation
2.4.2 Rust genomics
2.4.3 Challenges in bioinformatics
2.4.4 Effector identification

3 General Materials and Methods
3.1 Preparation and collection of materials
3.1.1 Inoculation
3.1.2 Protocol for sampling infected wheat tissue
3.2 Nucleic acid extraction and quantification
3.2.1 Genomic DNA extraction
3.2.2 RNA extraction
3.2.3 DNA and RNA quantification
3.3 Next-generation sequencing and data analysis
3.3.1 Library preparation
3.3.2 Genomic DNA sequencing
3.3.3 RNA sequencing
3.3.4 Bioinformatics pipeline
3.3.5 Clustering analysis

4 Origin of the South African Pst Pathotypes
4.1 Introduction
4.1.1 Wheat stripe rust in South Africa
4.1.2 Pst population diversity
4.1.3 Molecular markers and Pst
4.1.4 Next-generation sequence analyses of South African Pst
4.2 Materials and methods
4.2.1 Data description
4.2.2 Sample preparation for DNA extraction
4.2.3 Genomic DNA extraction and quantification
4.2.4 Sequencing and mapping
4.2.5 Phylogenetic analysis
4.2.6 Population structure analysis
4.2.7 Genetic diversity assessment
4.3 Results
4.3.1 Re-sequencing of South African Pst pathotypes
4.3.2 Purity assessment of samples
4.3.3 Clustering analyses
4.3.4 Phylogenetic analysis
4.3.5 Population structure analysis
4.3.6 Population differentiation
4.3.7 Genetic diversity within and between population clusters
4.4 Discussion
4.5 Conclusion

5 Analyses of Polymorphisms in Historical South African Pst Isolates in Search of Candidate Effector Genes
5.1 Introduction
5.1.1 The importance of Pst variability
5.1.2 Mutations: causes, types and effects
5.1.3 Genomic approaches used to identify effectors
5.2 Materials and methods
5.2.1 SNP analysis
5.2.2 Positive selection
5.2.3 Presence-absence analysis
5.2.4 Comparisons of nonsynonymous SNP sites between isolates
5.2.5 Multiple sequence alignments to visualise biallelic SNPs
5.3 Results
5.3.1 SNP identification in the genomes of the historical South African isolates
5.3.2 Assessment of polymorphisms to detect positive selection
5.3.3 Presence or absence of genes
5.3.4 Investigation of candidate genes that are likely to experience evolutionary changes
5.3.5 Candidate effectors with sequence polymorphisms between the South African isolates
5.4 Discussion
5.4.1 Polymorphic sites
5.4.2 STOP codons
5.4.3 Transitions and transversions at specific codon positions
5.4.4 Stepwise mutations
5.4.5 Positive selection
5.4.6 Presence-absence analysis
5.4.7 Nonsynonymous polymorphisms
5.5 Conclusion

6 Gene Expression Analysis of Candidate Effectors Identified in South African Pst Isolates
6.1 Introduction
6.1.1 Regulation of gene expression in eukaryotes
6.1.2 Quantification of gene expression
6.1.3 Candidate effector features
6.1.4 Gene transcription analysis
6.2 Methods
6.2.1 Inoculation and sampling
6.2.2 Tissue disruption and RNA extraction
6.2.3 RNA quality control and quantification
6.2.4 Complementary DNA synthesis
6.2.5 Primer design
6.2.6 PCR plate setup
6.2.7 Quantitative real-time polymerase chain reaction
6.2.8 Reference gene selection
6.2.9 Efficiency determination of primers
6.2.10 Statistical evaluation of the data
6.2.11 Linear mixed effect analysis
6.2.12 Relative expression of Pst candidate effector genes
6.2.13 Assessment of genes
6.3 Results
6.3.1 RNA yield, RNA quality scores and cDNA yield
6.3.2 Primer design
6.3.3 Efficiency determination of primers
6.3.4 Statistical analysis of the relative expression of nine Pst candidate effector genes
6.3.5 Expression profiles of candidate genes
6.3.6 Gene validation using revised gene models and transcript data
6.4 Discussion
6.5 Conclusion

7 Analysis of the Current Stripe Rust Threat in South Africa
7.1 Introduction
7.1.1 Pst virulence since 2005
7.1.2 Global reports on Pst population shifts
7.1.3 Objectives
7.2 Materials and methods
7.2.1 Stripe rust samples used in RNA sequencing analyses
7.2.2 Transcriptome sequencing of stripe rust infected wheat leaves
7.2.3 Pst pathotype determination
7.3 Results
7.3.1 Clustering analysis using RNA-Seq and whole genome sequencing data
7.3.2 Seedling Pst pathotype testing
7.4 Discussion
7.5 Conclusion

8 General Discussion
8.1 The historical South African Pst population
8.2 Candidate effector identification and evaluation
8.3 The recent South African Pst population
8.4 Future work
8.5 Conclusion

Appendices
A The Origin of the South African Pst Pathotypes
B Analyses of Polymorphisms in Historical South African Pst Isolates in Search of Candidate Effector Genes
B.1 Genes present in the PST130 reference genome but absent in the four historical South African Pst isolates
B.2 Annotations of genes homologous to identified PST130 genes
B.3 Nonsynonymous polymorphisms in candidate genes
C Gene Expression Analysis of Candidate Effectors Identified in South African Pst Isolates
C.1 Candidate gene inspection
C.2 Additional figures of statistical analyses
C.3 Variability in RT-qPCR
C.3.1 Variation in the application of treatments to biological replicates
C.3.2 Variation introduced by the RNA extraction process
C.3.3 Variation introduced by the reverse transcription process
C.3.4 Variation introduced by RT-qPCR
C.3.5 Variation introduced by primers
C.3.6 Choice of reference genes
C.3.7 Results of efficiency corrected relative gene expression
D Analysis of the Current Stripe Rust Threat in South Africa

Bibliography

List of Figures

1.1 Area harvested, production and yield statistics for South African wheat cultivation between 1990 and 2017.
2.1 The phylogenetic relationship of plant pathogenic ascomycetes, basidiomycetes and oomycetes following Neighbour Joining analysis.
2.2 Taxonomic classification of the wheat rusts.
2.3 Global distribution of Puccinia striiformis f. sp. tritici, before and after 2000.
2.4 Spore stages and the infection cycle of Pst.
2.5 A stripe rust uredinium pustule.
2.6 Illustration of the infection process of Pst.
2.7 Illustration of a filamentous plant pathogen haustorium.
2.8 The five main classes of plant disease resistant proteins.
4.1 Locations of the original detections of South African Pst pathotypes.
4.2 Temperature and rainfall measured in 1996 during the wheat-growing season in the Western Cape, compared to the 11 year mean.
4.3 Schematic illustration of the increase of Pst virulence in South Africa.
4.4 Pathotype identification tests of South African Pst pathotypes.
4.5 Read frequency graphs from heterokaryotic SNP sites for SA1–SA4.
4.6 The phylogenetic relationship between the South African Pst isolates and European, Asian and East African isolates.
4.7 Evaluation of the number of population clusters following STRUCTURE analyses.
4.8 Bar charts representing STRUCTURE population clusters.
4.9 Discriminant analysis of principal components analysis of 48 Pst isolates.
4.10 Bar charts representing DAPC population structure analysis.
4.11 Genetic diversity assessed between 10 population clusters.
5.1 Nucleotide changes that introduced stop codons.
5.2 Distribution of stop codons across all genes per isolate.
5.3 Percentage frequency matrices of transitions and transversions at monoallelic SNP sites.
5.4 Percentage occurrence matrices of transitions and transversions at biallelic SNP sites.
5.5 Codon positions of nucleotide changes at homokaryotic SNP sites.
5.6 Codon positions of nucleotide changes at heterokaryotic SNP sites.
5.7 Presence-absence analysis.
5.8 Nonsynonymous SNPs in the gene space of the four South African isolates increase over time and with increasing virulence.
5.9 Translated sequence alignment of gene PST130_00285.
5.10 Over- and underestimates of SNP sites.
6.1 Experimental setup for the infection time course experiment.
6.2 Plate layouts for RT-qPCR assays.
6.3 Linear regression showing estimated efficiency of primers.
6.4 Relative gene expression of nine candidate effector genes.
7.1 Prevalence of Pst in South Africa between 2008 and 2016.
7.2 Locations of Pst collections between 2013 and 2015.
7.3 Phylogenetic tree displaying the relationship between Pst isolates.
7.4 Relative distance maximum likelihood phylogenetic tree.
7.5 Evaluation of number of population clusters following STRUCTURE analyses.
7.6 STRUCTURE histogram plots of population clusters.
7.7 Discriminant analysis of principal components analysis of Pst isolates.
7.8 Histogram plots indicating population structure as inferred by DAPC analysis.
7.9 Measurements of genetic diversity by FST calculation of pairs of population groups.
7.10 Infection type comparisons between one historical and one recent Pst isolate.
7.11 Number of international tourist arrivals in South Africa between 1995 and 2014.
A.1 Read frequency graphs for East African isolates analysed in Chapter 4.
B.1 Translated sequence alignment of gene PST130_02001.
B.2 Translated sequence alignment of gene PST130_02118.
B.3 Translated sequence alignment of gene PST130_02403.
B.4 Translated sequence alignment of gene PST130_05023.
B.5 Translated sequence alignment of gene PST130_05454.
B.6 Translated sequence alignment of gene PST130_05944.
B.7 Translated sequence alignment of gene PST130_06503.
B.8 Translated sequence alignment of gene PST130_06558.
B.9 Translated sequence alignment of gene PST130_07448.
B.10 Translated sequence alignment of gene PST130_07513.
B.11 Translated sequence alignment of gene PST130_07564.
B.12 Translated sequence alignment of gene PST130_08031.
B.13 Translated sequence alignment of gene PST130_08984.
B.14 Translated sequence alignment of gene PST130_09018.
B.15 Translated sequence alignment of gene PST130_09275.
B.16 Translated sequence alignment of gene PST130_10286.
B.17 Translated sequence alignment of gene PST130_12487.
B.18 Translated sequence alignment of gene PST130_12491.
B.19 Translated sequence alignment of gene PST130_12956.
B.20 Translated sequence alignment of gene PST130_13969.
B.21 Translated sequence alignment of gene PST130_14091.
B.22 Translated sequence alignment of gene PST130_14831.
B.23 Translated sequence alignment of gene PST130_16778.
B.24 Translated sequence alignment of gene PST130_17605.
B.25 Translated sequence alignment of gene PST130_17605.
B.26 Translated sequence alignment of gene PST130_07579.
B.27 PST130_07579 continued from previous page.
B.28 Translated sequence alignment of gene PST130_15131.
B.29 PST130_15131 continued from previous page.
C.1 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_02001 in SA1 and SA4.
C.2 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_02403 in SA1 and SA4.
C.3 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_05023 in SA1 and SA4.
C.4 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_06503 in SA1 and SA4.
C.5 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_07513 in SA1 and SA4.
C.6 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_09725 in SA1 and SA4.
C.7 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_12487 in SA1 and SA4.
C.8 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_12491 in SA1 and SA4.
C.9 Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_12956 in SA1 and SA4.
C.10 Graphical tests for normality and equal variances of the residuals and random intercepts.
C.11 Gene and isolate specific tests for equal variances after the model was fitted to the relative gene expression values.
C.12 Gene and isolate specific tests for equal variances after the model was fitted to the relative gene expression values.
C.13 Graphical tests for normality and equal variances of the residuals and random intercepts following a log10 transformation.
C.14 Gene and isolate specific normal probability plots of the residuals after the model was fitted to the log10 transformed relative gene expression values.
C.15 Gene and isolate specific tests for equal variances after the model was fitted to the log10 transformed relative gene expression values.
C.16 High inter-run variability in relative expression patterns.
C.17 The Pfaffl method of relative gene expression shows the relative gene expression of SA1 to SA4.
D.1 Read frequency graphs from heterokaryotic SNP sites for the recent South African field isolates.
D.2 Read frequency graphs from heterokaryotic SNP sites for the recent East African field isolates.
D.3 Circular relative distance maximum likelihood phylogenetic tree.

List of Tables

1.1 Domestic grain consumption of the three highest consumed grains worldwide
2.1 Whole genome sequencing projects using next- and third-generation sequencing
4.1 Global isolates included in the clustering and genetic diversity analyses
4.2 Historical isolates used in re-sequencing and an infection time course experiment
4.3 Statistics of read alignment of the historical South African isolates to the PST130 reference genome
5.1 Homokaryotic and heterokaryotic SNPs in the South African isolates
5.2 The number of SNPs identified in coding regions of the four South African Pst isolates
5.3 Polymorphic genes with positive dN values indicating nonsynonymous changes in isolate pairwise comparisons
5.4 Polymorphic genes with positive dS values indicating synonymous changes in isolate pairwise comparisons
5.5 Number of absent genes in the four South African Pst pathotypes
5.6 Potential orthologs of genes absent in all four of the South African isolates
5.7 The number of potential paralogs identified in genes absent in all four South African isolates
5.8 Potential paralogs of genes absent in the four South African isolates
5.9 Potential orthologs of genes absent in three or less of the South African isolates
5.10 Number of potential paralogs in PST130
5.11 Paralogs of genes that only occurred in one of the South African isolates
6.1 Effector features of the identified candidate effectors
6.2 Summary statistics describing RNA yield, integrity and cDNA yield as required in the MIQE guidelines
6.3 Primer and amplicon specifications for Pst candidate effector gene identification
6.4 Significance of the factor “Time Point” in the linear mixed model for those genes where it was significant
6.5 Multiple comparisons between time points for each gene that showed significant difference in expression over the time series
7.1 Wheat differential lines used at Agricultural Research Council, Small Grain, South Africa
7.2 African isolates collected between 2013 and 2015
7.3 Infection type scores used to assess Pst infection on wheat seedlings
B.1 PST130 genes (211) that were absent in all four historical South African isolates
D.1 Differential testing of South African Pst isolates previously defined as pathotype 6E16A- on an extended set of wheat seedling testers
D.2 Differential testing of South African Pst isolates previously defined as pathotype 6E22A+ on an extended set of wheat seedling testers

List of Abbreviations

3′  three prime
5′  five prime
A  adenine
ABI  Applied Biosystems Integrated
ADP  Adenosine diphosphate
ACTB  β-Actin
AFLP  Amplified Fragment Length Polymorphism
ANOVA  analysis of variance
ARC-SG  Agricultural Research Council, Small Grain
ARF  ADP ribosylation factors
ATP  Adenosine triphosphate
Avr  avirulence
BAC  bacterial artificial chromosome
BAM  binary alignment map
BBSRC  Biotechnology and Biological Sciences Research Council
BGRI  Borlaug Global Rust Initiative
BIC  Bayesian information criterion
bp  base pairs
C  cytosine
CAF  Central Analytical Facilities
cDNA  complementary DNA
CEC  Crop Estimate Committee
CIMMYT  International Maize and Wheat Improvement Center
CTAB  cetyltrimethylammonium bromide
CVEGE  clonal variation in effector gene expression
DA  discriminant analysis
DAPC  discriminant analysis of principal components
DNA  deoxyribonucleic acid
dpi  days post inoculation
ds  double stranded
dsDNA  double stranded DNA
EMS  ethyl methanesulfonate
EST  expressed sequence tag
ETI  effector-triggered immunity
FIR  flanking intergenic regions
G  guanine
GAPs  GTPase activating proteins
GAPDH  glyceraldehyde 3-phosphate dehydrogenase
gDNA  genomic DNA
GTR  general time reversible
HCD  hypersensitive cell death
HIGS  host-induced gene silencing
HMC  haustorial mother cell
IH  infection hyphae
IP  infection peg
kbp  kilo base pairs
Lr  wheat leaf rust resistance gene designation
MAMPs  microbe-associated molecular patterns
MAS  marker assisted selection
MBBISP  Monsanto Beachell-Borlaug International Scholars Program
Mbp  mega base pairs
MCMC  Markov Chain Monte Carlo
miRNAs  microRNAs
mRNA  messenger RNA
MSL  Molecular marker Service Laboratory
NB-LRR  nucleotide-binding site (NBS)-leucine-rich repeat (LRR) proteins
NBI  Norwich BioScience Institutes
NGS  next-generation sequencing
NLS  nuclear-localisation signal
NMD  nonsense-mediated mRNA decay
NTC  non template control
oligo-dTs  thymine oligonucleotides
PAMPs  pathogen-associated molecular patterns
PCA  principal component analysis
Pgt  Puccinia graminis f. sp. tritici
PI  phosphoinositide
PR  pathogen-related
PRRs  pathogen receptor proteins
Pst  Puccinia striiformis f. sp. tritici
Pt  Puccinia triticina
PTI  PAMP triggered immunity
qPCR  quantitative or real time PCR
R  resistance or resistant
RAxML  randomized axelerated maximum likelihood
RIN  RNA integrity number
RNA  ribonucleic acid
RNA-Seq  RNA sequencing
ROS  reactive oxygen species
RT  reverse transcriptase
S  susceptible (as in Avocet S)
SAGL  South African Grain Laboratory
SAM  sequence alignment map
SCAR  sequence-characterised amplified region
SCPRID  Sustainable Crop Production Research for International Development
SCR  small and cysteine rich
siRNA  small interfering RNA
SNP  single nucleotide polymorphism
SNPs  single nucleotide polymorphisms
Sr  wheat stem rust resistance gene designation
ss  single strand
SSV  substomatal vesicle
T  thymine
tRNA  transfer RNA
TUBB  β-Tubulin
UK  United Kingdom
UKCPVS  UK Cereal Pathogen Virulence Survey
USA  United States of America
UTRs  untranslated regions
UV  ultraviolet
VIGS  virus induced gene silencing
WC  wheat control
WCT  Winter Cereal Trust
Yr  wheat stripe rust resistance gene designation
Z12  Zadoks growth stage 12

Mathematical notation

CT  threshold cycle
FT  fluorescence threshold
R²  Pearson correlation coefficient

Chapter 1
General Introduction

Wheat is a staple crop in many countries around the globe, including South Africa.
In most areas of wheat cultivation, one or more of the three rust diseases have the potential to severely compromise yields (Kolmer, 2005; Huerta-Espino et al., 2011; Shaw and Osborne, 2011; Dean et al., 2012; Beddow et al., 2015). Rusts are specialised in infecting wheat and maintain an obligate parasitic relationship with susceptible hosts throughout their life cycles, using resources destined for plant growth, maintenance, and grain development to ultimately produce multitudes of spores (Chen, 2005). The continuously growing demand for wheat requires careful consideration of how host resistance can be used to manage these crippling diseases. Management strategies aim to increase crop yields and reduce quantities of inoculum. Smaller rust population sizes reduce the potential of the fungus to gain new pathogenicity through evolutionary mechanisms such as mutation and somatic and sexual recombination (Hovmøller and Justesen, 2007a; Jin et al., 2010; Zhao et al., 2013; Jiao et al., 2017).

1.1 Socio-economic importance of wheat

Bread wheat, Triticum aestivum L., is an important food source, supplying 20 % of global calorie and protein intake (Shiferaw et al., 2014). Recent estimates placed global domestic consumption at 736.86 million tons for the 2016/2017 market year (FAS USDA, 2017). The three most prominent staple grains (wheat, maize and rice) are under heavy pressure for increased yields to secure food for the growing world population (Table 1.1; FAS USDA, 2017). By mid-2017, the estimated global population was 7.6 billion people, and it is projected to reach 9.8 billion by 2050 and 11.2 billion by 2100 (United Nations, 2017). The growing population places increased pressure on crop production as a primary source of human nutrition, animal feed and bio-fuel (Edgerton, 2009).
Yield improvements of roughly 2.4 % per year are needed to meet the target of doubling global crop production, but current global average rates fall short of this target (Ray et al., 2013). Other sectors also rapidly out-compete the agricultural sector for land, so acreage continues to diminish while the pressure to produce enough food for the growing population increases.

Table 1.1: Domestic grain consumption of the three highest consumed grains worldwide, in million tons, as recorded for 2016/17 (FAS USDA, 2017)

Grain    World       South Africa
Rice     478.46      0.83
Wheat    736.86      3.40
Maize    1 053.85    11.70

1.2 Wheat cultivation in South Africa

Wheat was brought to South Africa by the Dutch settlers in 1652. Drought, wind, and disease challenged early wheat production (Du Plessis, 1933), as is still the case today (FAS USDA, 2016). Currently, wheat is the most planted winter cereal crop in South Africa, ranking second to maize for overall crop size (SAGL, 2012) and consumption (FAS USDA, 2017) in the country. South Africa is the largest consumer of wheat in Sub-Saharan Africa, and population growth and urbanisation will likely continue to increase the demand (ITA USDC, 2017). Most of the crop is cultivated on dry land and grown in the winter-rainfall areas of the Western Cape, where there is a Mediterranean climate. Here wheat is planted from mid-April until mid-June, and harvested from October to December. In the Eastern Free State, a summer-rainfall area, wheat is sown from June until August and harvested between November and January. Irrigated wheat cultivation is practised in the Northern Cape, using water from the Orange River (Van Niekerk, 2001; SAGL, 2012).

The trend over the last 20 years indicates a reduction in wheat cultivation (Figure 1.1). This is driven by considerable annual production fluctuations, caused by unpredictable weather patterns, and by declining profit margins for wheat (AgriOrbit, 2017).
The decrease in wheat cultivation, in favour of other, more climate-tolerant and often higher-value crops such as maize, canola and soybeans, increases the dependence on imports to meet the growing wheat demand in South Africa (ITA USDC, 2017).

The prominent grain industry in South Africa contributes more than 30 % of the total gross value of agricultural production in the country (DAFF, 2015). On average, 63 % of the total demand over the past 10 years was produced domestically, while the remainder was imported (DAFF, 2016). In 2017, 1.8 million tons of wheat were imported and 0.2 million tons were exported (IndexMundi, 2017). Most exports are destined for neighbouring countries, Zambia, and Mauritius (FAS USDA, 2016).

[Chart: "Wheat cultivation in South Africa since 1990". Series: Area (ha), Production (t) and Yield (t/ha), with linear trend lines for area and yield; x-axis: production years.]
Figure 1.1: Area harvested, production and yield statistics for South African wheat cultivation between 1990 and 2017 (adapted from Production Reports - Crop Estimate Committee (CEC), GRAIN SA, 2017).

The increase in yield, despite the reduction in planted area (Figure 1.1), can be attributed to improved agronomic practices and the development of better varieties. Together and in parallel with global efforts, local research has assisted South African wheat breeders by improving yields, bread making quality, and pest and disease resistance of South African wheat varieties (Smit et al., 2010).
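The relationship between the three series in Figure 1.1 can be made explicit with a small calculation. The sketch below is illustrative only and is not part of the original analysis: the function name is hypothetical, and the two area and production pairs are assumed to be the first and the last complete pairs of data values belonging to Figure 1.1, expressed in thousand ha and thousand tons.

```python
def yield_t_per_ha(area_thousand_ha: float, production_thousand_t: float) -> float:
    """Return yield in t/ha; the 'thousand' factors cancel out."""
    if area_thousand_ha <= 0:
        raise ValueError("area must be positive")
    return production_thousand_t / area_thousand_ha


# Assumed (area, production) pairs taken from the Figure 1.1 data values.
early = (1550.6, 1702.4)
late = (508.4, 1909.5)

early_yield = yield_t_per_ha(*early)
late_yield = yield_t_per_ha(*late)

print(f"early: {early_yield:.2f} t/ha, late: {late_yield:.2f} t/ha")
```

Under these assumptions the planted area falls to roughly a third of its earlier value while production stays at a similar level, so the computed yield rises. This is the pattern that the text attributes to improved agronomic practices and better varieties.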
1.3 Wheat rusts reduce yields

Some of the extra demand for wheat has been met by continuing genetic improvement, which leads to the development of high-yielding varieties, but the protection of crops from diseases remains critical to support the higher production requirements (Edgerton, 2009). The wheat rusts, namely leaf (brown) rust, stem (black) rust and stripe (yellow) rust, occur in most wheat-growing areas around the world and cause widespread disease that is detrimental to yields (Kolmer, 2005; Dean et al., 2012). Rust infection cripples all components of the host, whilst robbing the plant of water and nutrients (Panstruga and Dodds, 2009; Chen et al., 2015). Rusts further reduce water content in the host by compromising the epidermis. This allows increased water evaporation and renders the plant an easy target for secondary attack by other pests and diseases (Bockus and Wiese, 2010; Malinovsky et al., 2014). Rust infection ultimately results in the death of photosynthetic tissues (Chen et al., 2015). Together, water and green tissue loss decrease the ability of the plant to trap solar energy through photosynthesis for growth and production of grain (Bockus and Wiese, 2010; Chen et al., 2015).

The occurrence of rust on wheat in South Africa was documented in reports dating back to 1726 (Du Plessis, 1933) and today all three rusts occur in South Africa (Pretorius et al., 2007). Pretorius et al.
(2007) describe the early record-keeping of rust occurrence in South Africa. Improved records became available when structured pathotyping of stem rust started in 1920, and regular surveys were introduced in 1960. Leaf rust pathotypes were first described in 1937 but were not closely monitored until the 1980s, after new pathotypes caused significant yield losses. In contrast, no early official disease reports could be found for stripe rust. In 1996, however, it was seen on spring wheat in the Western Cape, and surveys throughout the growing season revealed stripe rust infections in most of the winter-rainfall wheat cultivating areas. Irrigated wheat in the Northern Cape was also under attack (Pretorius et al., 1997). Mean yield losses attributed to wheat rusts in South Africa were estimated to be between 35 and 65 % (Pretorius et al., 2007).

Given the global and local importance of wheat, the detrimental effects of rust pathogens, and the constant emergence of new pathotypes, researchers need to continually monitor the changing rust populations while searching for new ways and sources of resistance to protect wheat (McIntosh et al., 1995).

1.4 Motivation for this study

The foliar disease stripe rust, caused by the biotrophic fungus Puccinia striiformis Westend. f. sp. tritici (Pst), results in major yield losses annually around the globe (Hovmøller et al., 2010). Growing resistant host varieties has reduced the impact of stripe rust (Hovmøller et al., 2016). However, reports of increased aggressiveness and shifts in Pst populations (Milus et al., 2009; Rodriguez-Algaba et al., 2014; Hubbard et al., 2015; Hovmøller et al., 2016; Bueno-Sancho et al., 2017) encourage investigation of this pathogen and of how it is actively evolving in different geographical areas. At the start of this project, little was known about the genetic diversity of stripe rust in South Africa.
Previous work had genotyped South African Pst pathotypes using amplified fragment length polymorphism (AFLP) markers (Hovmøller et al., 2008) and, more recently, microsatellite markers (Ali et al., 2014; Visser et al., 2016). However, these limited marker systems do not provide a comprehensive genetic picture of the changes that can occur in a Pst population. The four pathotypes of Pst found in South Africa suggest a clonal lineage, which has evolved within South Africa since its original introduction in 1996 (Visser et al., 2016). In this study, next-generation sequencing and advanced bioinformatic tools were used to answer a number of questions regarding the origins of Pst and the evolution of the Pst population within South Africa. Through an examination of candidate effector genes, the study also aimed to facilitate a better understanding of the biological interaction between wheat and Pst.

1.5 Objectives

Stripe rust first appeared as a significant field disease in South Africa in 1996 (Pretorius et al., 1997). Since its introduction, four distinct pathotypes of Pst have been detected and pathologically confirmed, the last being identified in 2005 (ZA Pretorius, unpublished data). This well-defined and presumed clonal population of Pst formed an ideal population in which to study the genetic evolution of Pst within a defined geographical region, addressing the hypothesis of a stepwise gain of virulence within the four South African pathotypes. The availability of next-generation sequencing datasets of Pst isolates from locations in East Africa, South Asia and Europe also allowed a comparative approach to determine where the Pst introduction in 1996 may have originated. In addition, the genome sequences obtained for each of the four historical South African Pst isolates were used to identify candidate effector proteins that may be associated with avirulence.
Lastly, a survey of Pst isolates within South Africa was undertaken in 2013, 2014, and 2015 to compare current field isolates to the four historical isolates and to assess the stability of Pst populations across cropping seasons.

1.6 Thesis outline and approaches

Background information concerning Pst can be found in Chapter 2, while detailed methodology is described in Chapter 3 or the relevant research chapters. Five approaches were undertaken to characterise the Pst population in South Africa, and these are presented across four research chapters. Firstly, the genomes of the four historical South African pathotypes were sequenced using Illumina next-generation sequencing. In Chapter 4, these data were analysed using phylogenetic and statistical clustering analyses to assess the relationship and genetic diversity between isolates and to hypothesise a potential origin of the Pst incursion in South Africa in 1996. To further describe the differences between the four South African pathotypes, comparative genomics analyses were performed, as presented in Chapter 5, by investigating signatures of positive selection, as well as the presence or absence of genes and polymorphisms in genic regions. Chapter 6 reports an RT-qPCR approach that was used to assess candidate effectors showing differential gene expression between different Pst pathotypes. To compare the more recent field population of Pst with the historical South African isolates and to describe the evolutionary dynamics in the Pst population within South Africa, Pst-infected wheat leaf tissues from the 2013–2015 seasons were collected and sequenced using an RNA sequencing (RNA-Seq) approach. Pathotyping of a selection of the 2013–2015 field isolates was conducted to link their genotypes to their pathotypes and identify any isolates with profiles distinct from those previously identified in South Africa. This research on the recent population is discussed in Chapter 7.
A final discussion of the findings and closing remarks on future research conclude the thesis in Chapter 8.

Chapter 2
The Wheat Rusts: Life Histories, Host Response Mechanisms and Genomic Resources

2.1 The rusts

2.1.1 Filamentous plant pathogens

Filamentous plant pathogens are highly specialised and include a wide variety of fungi and oomycetes (Wang et al., 2017). Ascomycota and Basidiomycota are both phyla in the kingdom Fungi. The oomycetes include an array of plant pathogens that share many morphological characteristics with fungi, although they are only distantly related. The phylogenetic relationships between a number of plant pathogens are illustrated in Figure 2.1 (Fernández-Ortuño et al., 2007). Many of these pathogens have a similar infection process, using haustoria to maintain close interaction with the host (Dodds et al., 2009). Thousands of rust species exist in the order Pucciniales, of which about 4000 belong to the genus Puccinia (Hawksworth et al., 1995; Kirk et al., 2008).
Figure 2.1: The phylogenetic relationship of plant pathogenic ascomycetes, basidiomycetes and oomycetes following Neighbour Joining analysis (adapted from Fernández-Ortuño et al., 2007). Bootstrap values were obtained from 1000 replications. The length of the bar represents 0.1 substitutions per nucleotide. The tree was constructed using nucleotide sequences of nuclear ribosomal DNA internal transcribed spacer regions.

2.1.2 Rusts and their primary host

Rusts are a group of fungi that are harmful to a wide variety of plants with high socio-economic importance, such as cereals, legumes, fruit trees, sugarcane, coffee and trees (see taxonomic classification in Figure 2.2; Kirk et al., 2008).

Kingdom: Fungi
Subkingdom: Dikarya
Phylum: Basidiomycota
Class: Pucciniomycetes
Order: Pucciniales
Family: Pucciniaceae
Genus: Puccinia
Species: P. striiformis, P. graminis, P. triticina

Figure 2.2: Taxonomic classification of the wheat rusts (Chen, 2005; Kirk et al., 2008).

Within Puccinia (P.) species, different formae speciales (f. sp.) describe specialisation towards specific grass hosts (Anikster, 1984; Wellings, 2007). To date, nine f. sp. have been defined (Chen et al., 2017). The three rusts that infect wheat are obligate biotrophs, requiring living plant tissues from which they extract water and nutrients (Dean et al., 2012). Stem rust, also known as black rust, is caused by the fungus P. graminis Pers. f. sp. tritici, or Pgt, and occurs on the leaf and stem surface as oval-shaped, brick-red pustules that burst through the host tissue (Schumann and Leonard, 2000). Leaf rust, also known as brown rust, is caused by P. triticina Erikss. (Pt) and is the most common of the three rusts; its orange to brown spores occur on the leaf surface in round lesions (Bolton et al., 2008).
Stripe rust mainly forms yellow to orange lines, as pustules occur along the leaf veins of adult plants, but it can also infect other parts of the plant such as leaf sheaths, glumes and awns. Stripe rust of wheat, also known as yellow rust, is caused by P. striiformis Westend. f. sp. tritici (Pst; Roelfs and Hettel, 1992).

Each f. sp. is further divided into races, strains, or pathotypes (Wellings, 2007), where the ability to infect the host plant depends on the avirulence genes carried by the Pst isolate and the resistance genes present in the host plant genotype (Chen, 2005). In the present study, the term “pathotype” is used throughout. To describe the differences between rust genotypes, a set of wheat lines with known resistances is used in infection assays to determine the virulence profile of the isolate. These host plant genotypes form a differential set, and the range of Pst infection phenotypes seen on each host plant genotype defines the pathotype of the Pst isolate (Allison and Isenbeck, 1930; Roelfs et al., 1992).

2.1.3 The alternative host

Besides the grass hosts, the rust fungi can also infect a second group of hosts. Pgt has been known to infect alternative hosts Berberis L. (Jin et al., 2010; Zhao et al., 2011) and Mahonia Nutt. (Wang and Chen, 2013), while Pt infects Thalictrum spp. as alternative hosts (Bolton et al., 2008). Only recently has Berberis been confirmed as an alternative host for Pst. Berberis spp. are not native to South Africa, but are popular ornamentals that are commonly stocked by nurseries and are becoming invasive in the wild (Keet, 2015). In South Africa, cultivation of 24 species of Berberidaceae, including 18 Berberis and 5 Mahonia species, has been reported (Glen, 2002). Among these are the rust-susceptible Berberis holstii, Berberis vulgaris, and Berberis aristata (Keet, 2015), but Jin (2011) advised that many more susceptible species could still be discovered.
The sexual life cycle of rust fungi is completed in the alternative host (Chen, 2005). Infection of the alternative host has not been reported in South Africa. Its rare occurrence globally is fortunate, as it limits the potential for sexual recombination that can lead to faster evolving populations.

2.1.4 Global distribution of stripe rust

Stripe rust exists in most parts of the world where wheat is cultivated and continues to spread (Figure 2.3). In recent years, epidemics of stripe rust have been seen in regions of the world where it did not previously occur (Chen, 2005; Milus et al., 2006). In contrast with the other rusts, distant dispersal of Pst has only recently been reported (Zadoks, 1961; Hovmøller et al., 2002; Justesen et al., 2002; Hovmøller and Justesen, 2007b; Wellings, 2011). There is evidence that new pathotypes of Pst are more aggressive and able to thrive at higher temperatures, showing the ability of this fungus to adapt to new environments (Milus et al., 2006; Markell and Milus, 2008). To date, aggressive pathotypes have not been described in South Africa.

2.1.5 Favourable conditions for wheat rusts

The occurrence of stripe rust on wheat is dependent on climatic and environmental conditions. Compared to leaf and stem rust, stripe rust has lower temperature optima, is prominent in cooler, high altitude and maritime regions, and tends to occur earlier in the growing season (Chen, 2005). Stripe rust urediniospore germination is most successful between 9 °C and 13 °C, while the germination optimum of stem rust is higher, at 15 °C to 24 °C (Roelfs et al., 1992), and leaf rust, the most versatile and common, can infect the host at temperatures ranging from 10 °C to 25 °C (Bolton et al., 2008). Reports of adaptation to higher temperatures in newly emerging Pst populations in North America (Milus et al., 2006) show that higher temperatures, while suboptimal, are not insurmountable to Pst.
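The temperature ranges quoted above can be collected into a small lookup. This is a sketch for illustration and not part of the original analysis: the dictionary and function names are hypothetical, the ranges are those cited in the text (Roelfs et al., 1992; Bolton et al., 2008), and treating them as hard, inclusive limits is a simplifying assumption, since the text itself notes that Pst can adapt to higher temperatures.

```python
# Temperature ranges in degrees Celsius as quoted in the text.
# Stripe and stem rust: urediniospore germination optima.
# Leaf rust: range over which infection of the host is reported.
TEMPERATURE_RANGES_C = {
    "stripe rust": (9.0, 13.0),
    "stem rust": (15.0, 24.0),
    "leaf rust": (10.0, 25.0),
}


def rusts_within_range(temp_c: float) -> list[str]:
    """List the rusts whose quoted range includes temp_c (inclusive bounds)."""
    return [
        rust
        for rust, (low, high) in TEMPERATURE_RANGES_C.items()
        if low <= temp_c <= high
    ]


for temp in (5.0, 12.0, 20.0):
    print(temp, rusts_within_range(temp))
```

A lookup of this kind only restates the published ranges. It says nothing about moisture or light, which the text treats as separate requirements.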
Another study suggests that, with sufficient light intensity, high temperatures do not necessarily inhibit Pst infection (de Vallavieille-Pope et al., 2002). However, Chen (2005) reports that temperatures below −10 °C can kill the pathogen in infected leaves.

Free moisture in the form of rain or dew for 3 to 6 hours is essential for germination of Pst urediniospores (Roelfs et al., 1992; Chen, 2005). In contrast, dry weather and wind towards the end of the growing season are favourable for pathogen survival, as dry spores stay viable for longer and are wind dispersed (Zillinsky, 1983; Chen, 2005).

Compared to moisture and temperature optima, little work has been done on optimal light requirements during the rust life cycle. There is some evidence that exposure of wheat seedlings to elevated light intensities before inoculation with urediniospores increases infection success (de Vallavieille-Pope et al., 2002). Conversely, compared to stem and leaf rust, Pst urediniospores are sensitive to ultraviolet light, and excess exposure reduces long-term viability (Roelfs et al., 1992).

[Panels: Global 1960–1999; Global 2000–2012. Legend: Not recorded; Rare; Localised in some seasons; Localised in most seasons; Widespread in some seasons; Widespread in most seasons; N/A.]
Figure 2.3: Global distribution of Puccinia striiformis f. sp. tritici, before and after 2000 (from Beddow et al., 2015).

2.1.6 Infection cycle of Puccinia rusts

The life cycles of the three wheat rusts are similar. In this section, the Pst life cycle is described. There are five spore stages in the life cycle of Pst. Three of these (urediniospores, teliospores and basidiospores) occur on wheat and the remaining two (pycniospores and aeciospores) on the alternative host. This is illustrated in Figure 2.4.
Figure 2.4: Spore stages and the infection cycle of Pst. The mini cycle of (re)infection, indicated with red arrows, is the primary source of inoculum for most stripe rust outbreaks in wheat-growing areas worldwide. Only recently has the sexual cycle, indicated with blue arrows, been observed under natural conditions in China (from Zheng et al., 2013).

Figure 2.5: A stripe rust uredinium pustule. Thousands of yellow, spherical, echinulate spores, typically 28–34 µm in diameter (Zillinsky, 1983), erupt through the wheat leaf surface (Photo: Kim Findley, John Innes Centre, UK).

Very few cases of sexual reproduction have been reported, leaving the fungus to rely almost completely on asexual reproduction (Jin et al., 2010; Zhao et al., 2013; Chen et al., 2017). In areas where the sexual cycle takes place, aeciospores are formed after infection of the alternative host (Chen, 2005). These spores can infect wheat and result in pustules releasing urediniospores for reinfection. In the majority of regions, where the grass host is the main or only host, only urediniospores are available for host infection. Characteristic of this spore stage, each spore carries two haploid nuclei. About two weeks after urediniospores land on a leaf and enter it through a stoma, the newly produced, yellow urediniospores erupt through the surface of the leaf (Figure 2.5). The urediniospores are dispersed by wind, or by the mechanical action resulting from raindrops falling onto leaves (Chen, 2005). This phase of Pst development constitutes the asexual cycle.
This cycle typically takes 12 to 14 days, depending on the isolate and environmental conditions (Chen et al., 2014), but Australian studies confirmed a shorter life cycle in aggressive Pst pathotypes (Sharma, 2012). The number of infection cycles the pathogen completes in a season determines the severity of the epidemic (de Vallavieille-Pope et al., 2012).

Urediniospores can oversummer on volunteer wheat plants and other susceptible grasses. Examples include the wild rye species, Secale L. strictum subsp. africanum, seen in South Africa (Pretorius et al., 2007, 2015). Alternatively, towards the end of the wheat-growing season, as the wheat plant undergoes senescence, infection sites from some Pst isolates can form telia (Chen et al., 2014). The subepidermal telia are present on both sides of the leaf blade and produce dark brown, two-celled, oblong-clavate teliospores (Zillinsky, 1983; Chen, 2005; Chen et al., 2014). Through karyogamy, the nuclei in each of the two cells of the teliospore fuse, resulting in two diploid cells. The diploid nucleus in each cell undergoes meiosis, and the two cells grow into a promycelium of four cells. This develops into a basidium consisting of four cells, each of which releases a haploid basidiospore. These basidiospores can infect an alternative host, initiating the sexual cycle (Chen et al., 2014).

The haploid basidiospores infect the alternative host and form either pycnia (female) or spermagonia (male) on the adaxial side of the leaf. These spore-producing structures contain haploid reproductive structures. Rusts are heterothallic, and spermagonia produce pycniospores (the male gametes), which are transferred to pycnia to fertilise receptive hyphae, the female gametes (Rapilly, 1979). Dispersal of pycniospores can be facilitated by precipitation running down the leaf, while the pycnia also produce nectar.
It has been described in stem and leaf rust that visiting insects that come into contact with the nectar can act as vectors to spread the spermatia to other pycnia (Leonard and Szabo, 2005; Bolton et al., 2008). After fertilisation, plasmogamy of compatible mating types gives rise to a dikaryotic primordium, which matures into an aecium on the abaxial side of the alternative host leaf. The aecium produces dikaryotic aeciospores that can only infect the primary host (wheat), forming a urediniospore-producing uredinium, the starting material for the roughly 14-day asexual cycle that continues on wheat throughout the growing season (Chen et al., 2014).

Currently, two factors are considered responsible for the rare occurrence of sexual recombination. Firstly, in contrast to other rusts of wheat, teliospores do not enter a dormant phase and readily germinate under prolonged dew conditions (Chen et al., 2014). The time frame in which viable teliospores exist is thus short. Secondly, germination of teliospores requires very specific environmental conditions. The rare occurrence of alternative host infection by Pst testifies to the fact that spore availability and lengthy periods of dew formation do not often coincide. Such a natural occurrence has only been recorded twice, both times in China (Zhao et al., 2011, 2013). Although infection of the alternative host remains rare, these observations explain the increased Pst population variation found in the Himalayan region, compared to other regions (Ali et al., 2014). Barberry is also common in these areas, further supporting the hypothesis of genetic recombination through sexual reproduction in the Himalayan region (Ali et al., 2014). Additional evidence based on AFLP and microsatellite markers illustrates the need for further investigation to determine the importance of the sexual stage in Pst for the generation of genetic variability (Mboup et al., 2009; Duan et al., 2010; Zheng et al., 2013).
Fortunately, in South Africa and most other wheat-growing areas where stripe rust occurs, mutation and somatic hybridisation are believed to be the major sources of variation, theoretically supporting slower evolution. However, in the absence of the sexual cycle, somatic recombination can still contribute to variation leading to the formation of new pathotypes, as described by Lei et al. (2017).

2.1.7 The stripe rust infection process on wheat

Wheat, as the primary host of Pst, provides water and photosynthates for urediniospore production, maintaining the dominant asexual stage (Chen et al., 2014). Throughout the wheat-growing season, the pathogen repeatedly infects the crop while cycling through clonal reproduction (Figure 2.6). Pst, as an obligate biotroph, needs to maintain the integrity of the plant cells during this infection process. Resources destined for plant growth and grain development are diverted by the fungus for hyphal growth and spore production. In resistant wheat varieties, a cellular hypersensitive response causes necrosis and chlorosis, stopping pathogen development but further compromising the plant’s ability to photosynthesise (Chen, 2005).

Figure 2.6: Illustration of the infection process of Pst (from Cantu et al., 2013). dpi, days post inoculation; S, urediniospore; SV, substomatal vesicle; IH, invasive hyphae; HM, haustorial mother cell; H, haustorium; P, pustule; G, guard cell.

With sufficient moisture on the leaf surface for the urediniospore to germinate, the germ-tube grows across the leaf surface in search of a stoma through which it enters the plant. Unlike Pgt and Pt, Pst does not produce a visible appressorium (Niks, 1989). A substomatal vesicle (SSV) forms within the substomatal cavity, from which up to four infection hyphae (IH) develop (Figure 2.6). When an IH reaches a mesophyll cell, the tip of the IH differentiates a haustorial mother cell (HMC).
An infection peg (IP) forms at the tip of the HMC that breaches the cell wall of the plant mesophyll cell (Figure 2.7).

Figure 2.7: Illustration of a filamentous plant pathogen haustorium. Three membranes and the extrahaustorial matrix separate the host cytoplasm and the pathogen’s haustorium content. The pathogen cell wall and plasmalemma are situated on the haustorium side. The modified host plasma membrane and neck band seal off the haustorial matrix from the host cytoplasm. Effector delivery is illustrated by the inset (from Panstruga and Dodds, 2009).

Some fungi use mechanical force aided by the turgor of the cell to breach the cell wall, for example in Magnaporthe oryzae (Hebert) Barr, or enzymes as in the case of Pgt (Duplessis et al., 2011), or a combination, as used by powdery mildew (Pryce-Jones et al., 1999). A different set of enzymes has been found in Pgt and other fungi that likely plays a role in disguising the penetrating hyphae by remodelling the fungal cell wall (El Gueddari et al., 2002). However, it is currently unknown how the Pst IP achieves cell wall penetration (Panstruga and Dodds, 2009).

Having breached the plant cell wall, Pst needs to establish a compatible association with the cell, keeping it alive while feeding. From the end of the IP a haustorium develops that invaginates the plant cell membrane, causing the plant cell membrane to envelop the haustorium (Figure 2.7; Panstruga and Dodds, 2009). Three layers separate the content of the haustorium from the cytosol of the plant cell: the haustorial plasmalemma, the haustorial wall and the extrahaustorial membrane.
The haustorial membrane and wall are surrounded by a gel-like layer, called the extrahaustorial matrix (Panstruga and Dodds, 2009). The extrahaustorial membrane is likely derived from the plant cell plasma membrane and is in contact with the cytoplasm of the plant cell (Szabo and Bushnell, 2001).

Due to the biotrophic nature of cereal rust pathogens, it is largely impossible to culture the fungus artificially. As the multi-layered haustorium cannot be grown in vitro (Panstruga and Dodds, 2009), the exact mechanisms by which transport across the membranes is facilitated are currently not confirmed. The haustorium has a dual function, allowing two-way traffic across the membranes (Mendgen et al., 2000). It acts as a feeding structure to take up amino acids and sugars from the host (Panstruga and Dodds, 2009), while at the same time delivering fungal molecules to the plant that enable pathogenicity (Mendgen et al., 2000). Among these are effector proteins that are delivered into the host cytosol and the apoplast, altering plant processes to the advantage of the pathogen and protecting the pathogen against the host defence systems (Kamoun, 2007; Rovenich et al., 2014; Petre et al., 2016a). Once infection is established, long hyphae branch lengthwise within the leaf, colonising a large area and causing the typical striped pattern of uredinia seen on older plant leaves (Moldenhauer et al., 2006).

2.2 Combating wheat stripe rust

Agronomic management of stripe rust involves both the deployment of host resistance and the application of fungicides. Multiple fungicide applications are often required during the wheat-growing season. These are costly, potentially harmful to the environment, and not always 100 % effective, whereas the right combination of resistance genes can provide complete stripe rust resistance (Boshoff et al., 2003). Despite treatment, significant losses have been recorded (Oerke and Dehne, 2004).
Even with resistance breeding and chemical crop protection, yield losses of 14 % to 40 % have been reported (Flood, 2010). Increased success in protection of foliage and ears can, however, be achieved when fungicide application is timed correctly (Boshoff et al., 2003).

Genetic resistance in the host causes selection pressure on the pathogen to overcome that resistance. Strategies to relieve selection pressure include the rotational deployment of resistance genes, regional gene deployment and pyramiding of resistance genes (Chen et al., 2017). Quantitative, polygenic resistance is considered a better choice due to its potential durability and will be discussed later in this chapter.

2.3 Plant defence mechanisms

Plants have passive and active defence mechanisms to protect them from biotic stresses. A compatible interaction between a pathogen and its host is one where the pathogen successfully infects and colonises the host. However, incompatible interactions exist between some combinations of Pst pathotypes and wheat genotypes. Different mechanisms contribute to the host being able to withstand a pathogen attack.

Passively, preformed defence mechanisms include the composition of the waxy layers, the cuticle being the first structural barrier to pathogen invasion. Further passive defence is put in place by preformed antimicrobial proteins and secondary metabolites, including phytoanticipins, inhibitors of essential pathogen enzymatic activities, hydrolytic enzymes, lectins, and defensins (Selitrennikoff, 2001; Egorov et al., 2005; Coram et al., 2008). Passive defence is not pathogen-specific, in contrast with many active defence mechanisms, which are induced by the presence of the pathogen and can be either specific or non-specific. Both physical and chemical changes are seen.
These include the deposition of callose, cell wall cross-linking and the formation of papillae, changes in membrane permeability, production of reactive oxygen species (ROS), and the synthesis of a whole range of pathogenesis-related (PR) proteins and secondary metabolites, such as phytoalexins (Malinovsky et al., 2014).

2.3.1 Host-pathogen interaction

The active plant defence mechanisms require very specific pathogen recognition. For biotrophic pathogens, the current model of host-pathogen interactions involves an initial general recognition of a potential pathogen, triggered by plant recognition of conserved pathogen molecular motifs. These conserved motifs are referred to as pathogen-associated molecular patterns (PAMPs) or microbe-associated molecular patterns (MAMPs), as described by van der Hoorn and Kamoun (2008). They are recognised by transmembrane pattern recognition receptors (PRRs). Recognition of pathogen-associated patterns triggers defence responses, which are collectively known as PAMP-triggered immunity (PTI; Jones and Dangl, 2006).

Many pathogens are able to suppress the defence responses mounted by PTI, leading to successful infection. Proteins secreted by the pathogen into the plant facilitate the down-regulation of PTI defence. These proteins, generally considered to be small peptides able to cross membranes, are referred to as effectors (Franceschetti et al., 2017). In addition to down-regulating host defence responses, effectors also play an active role in pathogenicity, modifying the plant cellular and molecular environment in such a way that it eventually supports pathogen growth and reproduction (Rovenich et al., 2014).

The second stage of the host-pathogen interaction involves recognition by the plant of specific pathogen effector molecules (Jones and Dangl, 2006). This second layer of defence is referred to as effector-triggered immunity (ETI).
This involves recognition of a specific pathogen effector, now termed an avirulence (Avr) factor, by a receptor protein in the plant encoded by a resistance (R) gene (Dangl and Jones, 2001; van der Hoorn and Kamoun, 2008). Alternative models of indirect associations, referred to as the guard and decoy models, have been described (van der Hoorn and Kamoun, 2008). The direct relationship between R genes and their corresponding Avr genes is known as the gene-for-gene concept described by Flor (1956). This specific plant-isolate recognition enables the plant to trigger a stronger defence response that restricts pathogen growth and reproduction, with the strength of resistance differing with each R gene/Avr combination.

ETI is a specific host-pathogen interaction, depending on the presence of the R gene in the plant genotype and the presence of the corresponding avirulence factor in the pathogen isolate. The R gene/Avr interaction usually results in death of the infected plant cell (and possibly also surrounding plant cells) in a reaction known as hypersensitive cell death (HCD; Jones and Dangl, 2006). The pathogen can evade R gene recognition by selection of mutations within the avirulence effector factor that break the R gene/Avr interaction (Dodds and Rathjen, 2010). When this occurs, the avirulence factor is subsequently referred to as a virulence factor. The continuing cycle of the pathotype-specific R gene/Avr interaction breakdown is known in wheat disease breeding as the Boom-and-Bust cycle (Knott, 1989; McDonald, 2004) and is one of the reasons why wheat breeders are interested in characterising and using rust resistance genes that do not fit the R gene/Avr model (see Section 2.3.2).

Five classes of proteins encoded by plant R genes have been modelled, and are illustrated in Figure 2.8.
The biggest class characteristically encodes nucleotide-binding site (NBS)-leucine-rich repeat (LRR) proteins (NB-LRR; Kolmer, 2005). NB-LRRs are thought to be cytoplasmic, in contrast with the other four classes of R proteins: Xa21 and Cf-X proteins contain transmembrane and extracellular LRR domains, while the Pto gene product is membrane-associated with a cytoplasmic kinase. The RPW8 protein has a putative signal anchor at the N-terminus (Dangl and Jones, 2001).

Figure 2.8: The five main classes of plant disease resistance proteins (from Dangl and Jones, 2001). Cytoplasmic nucleotide-binding site leucine-rich repeat proteins are typically not membrane-associated and represent the largest class of resistance proteins. Cf-X and Xa21 typically carry a large transmembrane leucine-rich repeat region. The serine/threonine protein kinase is encoded by the Pto gene, with possible membrane association through the N-terminal myristoylation site. A putative N-terminal signal anchor is carried by the RPW8 gene product. CC, coiled-coil domains; NB, nucleotide-binding site; LRR, leucine-rich repeat; TIR, Toll and Interleukin-1 receptor type region; Kin, kinase.

The expression of R gene resistance is usually qualitative and expressed at all wheat growth stages (Dangl and Jones, 2001). The profile of R gene/Avr interactions, tested on a set of wheat lines with known resistance, defines the pathotype of any given Pst isolate. “Yr”, followed by a number, designates genes that confer resistance to stripe rust (McIntosh, 1983).

2.3.2 Other sources of resistance

Other forms of stripe rust resistance have been characterised that are not pathotype-specific. These forms of resistance have remained effective to all Pst isolates tested and therefore are termed pathotype-non-specific resistance (Van der Plank, 1968).
These forms of resistance are usually quantitative, being partial in effect. They are expressed more strongly in mature wheat tissues and are therefore also termed adult plant resistance (Simmonds, 1991; Parlevliet, 2002; Mallard et al., 2005). Such resistance can also reduce the rate of disease progress, and is then called slow-rusting, partial, or horizontal resistance (Van der Plank, 1968).

2.4 The Pst genome

2.4.1 Genomic variation

When point mutations occur in genes, they can change an amino acid, which in turn can change the functionality and stability of the protein. If this has a notable impact on the phenotype, it will change the way in which the organism interacts with its environment. Such a change will be under selection to either eliminate it from the population or increase its frequency, depending on the impact of the change on the reproductive ability of individuals with the particular polymorphism.

The variation in Pst Avr genes has been evaluated for many years. Pathotype (race) profiling is widely deployed and extremely informative. It has been practised for about 100 years (Thach et al., 2015), and changes in pathotype profiles mostly support a clonal lineage for Pst. The ability to genotype isolates to support pathotypes was a major addition to the development of rust population studies. In the last 30 years the development of molecular markers, which have been used to track global movement, has supported the hypotheses of these clonal population structures. Molecular marker technologies deployed for Pst genotyping have included AFLP markers (Steele et al., 2001; Brown and Hovmøller, 2002; Hovmøller et al., 2008; Mboup et al., 2009) and, more recently, microsatellite markers (Mboup et al., 2009; Ali et al., 2014; Visser et al., 2016; Walter et al., 2016) and sequence-characterised amplified region (SCAR) markers (Walter et al., 2016).
The so-called “genomics era” provides an even higher resolution view of the diversity within and between Pst populations. The development of high-throughput sequencing techniques provides the opportunity to answer many more questions about the ongoing evolutionary processes in Pst and just how far this airborne pathogen can travel.

2.4.2 Rust genomics

Stem rust was the first of the wheat rusts to be sequenced, followed by leaf and stripe rust. The Fungal Genome Initiative at the Broad Institute of Massachusetts Institute of Technology and Harvard University was instrumental in sequencing all three wheat rusts. Genomic research in Pst has advanced rapidly, as the international community has published a large number of Pst next-generation sequencing datasets.

Some of these resources have been applied specifically to develop representative draft reference sequences of Pst isolates from distinct pathotypes and geographical areas. These are summarised in Table 2.1 and include the North American isolates, PST130 (Cantu et al., 2011) and PST-78 (Cuomo et al., 2017), the Chinese isolate, CYR32 (Zheng et al., 2013) and the Indian isolates 46S 119 (Kiran et al., 2017) and 38S102 (Aggarwal et al., unpublished). The Australian founder pathotype, Pst 104E137A-, has recently been assembled using a combination of next-generation Illumina sequencing and third-generation sequencing, alternatively termed long-read sequencing, on the PacBio platform (Schwessinger et al., 2018). Deployment of such advances in sequencing technology enables comparison of the dikaryotic nuclei in Pst to investigate the evolutionary machinery used to drive the development of new Pst variation.

Puccinia graminis f. sp. tritici

The first rust reference genome and, to date, the only Pgt reference, was sequenced from the pathotype CRL 75-36-700-3 (Duplessis et al., 2011). The project was led by the Szabo group at the USDA-ARS Cereal Disease Laboratory, University of Minnesota, USA.
In 2007 the first 7.88× draft of the genome sequence assembly was released. It was updated in 2010 with a mitochondrial assembly and its accompanying annotation data, and finally in 2011 with an RNA-Seq-based annotation. In addition to sequencing the genome, the shotgun fosmid library was used to prepare a physical fingerprint map. To investigate gene expression at various stages of Pgt development, complementary DNA (cDNA) libraries were constructed from the corresponding tissues. The estimated genome size of Pgt is 80 mega base pairs (Mbp). The outbreak of the highly virulent Pgt pathotype, Ug99, prompted this research, which has since resulted in the development of many useful markers for pathotype-diagnostic tests (Godfrey et al., 2010).

Puccinia triticina

Genome sequencing of the Pt isolate 1-1 was done using fosmid-end and bacterial artificial chromosome end (BAC-end) libraries and a hybrid of 454 and Applied Biosystems Integrated (ABI) sequencing technologies, the latter also known as Sanger sequencing (Cuomo et al., 2017). Considerable advances in characterising Pt genes and genomic variation were enabled through the assemblies of two more genomes: the virulent pathotype, Race77, and an older avirulent pathotype, Race106 (Kiran et al., 2016).

Table 2.1: Whole genome sequencing projects using next- and third-generation sequencing. Genomes that were proposed as reference sequences are listed exclusively. Various methodologies have been used for library construction, sequencing and assembly, with varying results. These assemblies are invaluable tools that can be used to reveal genome characteristics of the three wheat rusts (adapted from Kang, 2017 including Cantu et al., 2011, 2013; Cuomo et al., 2017; Schwessinger et al., 2018)

| Wheat rust pathogen | Isolate | Genome size (Mbp) | Protein coding genes | Secreted proteins | No. of contigs* or scaffolds | % TE | Sequencing technology |
|---|---|---|---|---|---|---|---|
| P. striiformis | PST130 | Φ64.8 | 18 149 | 1 088 | *22 815 | ∆17.8 | Illumina Genome Analyzer II sequencing |
| P. striiformis | CYR32 | 110.0 | 25 288 | 2 092 | 12 833 | 48.9 | Fosmid-to-fosmid strategy by Illumina GA paired-end sequencing |
| P. striiformis | PST-78 | 117.3 | 19 542 | 2 146 | 9 716 | 31.5 | Roche 454 FLX and Illumina fosmid-end sequencing |
| P. striiformis | 38S102 | 75.6 | – | – | 996 | – | Illumina NextSeq 500 |
| P. striiformis | Pst-104E | 79.8 | 15 303 | – | 996 | 53.7 | PacBio RSII |
| P. triticina | 1-1 | 135.3 | 14 880 | 1 358 | 14 820 | 50.9 | Roche 454 FLX and Sanger fosmid-end and BAC-end sequencing |
| P. graminis | CRL 75-36-700-3 | 88.6 | 15 800 | 1 106 | 392 | 36.5 | Sanger sequencing, whole-genome shotgun strategy |

Φ, 60 % of genome; TE, transposable and repetitive elements; ∆, only transposable elements; *, indicates number of contigs if present, otherwise number of scaffolds; –, not available; BAC, bacterial artificial chromosome

Puccinia striiformis f. sp. tritici

A number of draft sequences are now available for Pst. The PST130 isolate was first identified in Oregon and Washington, USA, in 2007 (Chen et al., 2010). The isolate was chosen to be sequenced for technical reasons and not because it was of specific biological interest. Since its assembly, the PST130 genome has been continually investigated in the research group of Dr Diane Saunders (JIC, UK). PST130 was used as the reference genome in the present study, as the candidate’s association with this research group allowed building on, and making direct comparisons with, previous work in the group.

CYR32 was sequenced as it was a highly prominent pathotype in China. This work confirmed and further emphasised previous reports of high heterozygosity between the two nuclei, as a fosmid-to-fosmid sequencing strategy was applied (Zheng et al., 2013). PST-78 was chosen to represent the Pst pathotypes virulent to Yr8 and Yr9 that were first identified in 2000 (Cuomo et al., 2017). The isolate was collected from the US Great Plains. Incorporating many sequencing platforms, this multi-platform approach resulted in a high-quality genome.
Gene annotation was done using transcriptome sequence data and de novo gene prediction (Cuomo et al., 2017). The initial assembly of PST-78, at approximately 81× coverage, was released in 2012, with the RNA-Seq-based annotation containing 19 542 genes.

The first genome from an Indian Pst isolate was published in 2017 (Kiran et al., 2017). The pathotype 46S 119 has virulence to Yr9; it emerged recently and spread into the north-western plains of India. The 38S102 pathotype was first isolated from the Neelgiri Hills in India in 1973 and is avirulent to Yr9 (Aggarwal et al., unpublished). These isolates are interesting as many wheat varieties in the north-west of India are protected by the Yr9 resistance gene (Kiran et al., 2017). The long-read assembly of the Australian pathotype, Pst 104E137A- (Schwessinger et al., 2018), refined earlier conclusions on genetic diversity that were drawn from short-read assessments.

2.4.3 Challenges in bioinformatics

All rust genome sequencing projects have used urediniospores, the major spore stage on wheat. The two nuclei of the dikaryotic urediniospore have been shown to be highly heterozygous (Zheng et al., 2013). A large portion of each genome consisted of repetitive content and transposable elements. The PST130 genome reference, with 18 % transposable elements, was estimated to include only about 60 % of the genome, although assembly of 95 % of the reads was possible. Highly similar repetitive sequences would be assembled in common contigs, and it was estimated that repetitive content that was misassembled could add an additional 10.6 Mbp to the genome size (Cantu et al., 2011). These repetitive sequences and the high density of transposable elements impede the principles that assemblers use to reconstruct a genome (Duplessis et al., 2011; Castanera et al., 2016). Haplotype-phased genomes address this problem to some extent.
The first phased Pst sequencing effort (Schwessinger et al., 2018), using long-read DNA sequencing technology, demonstrated the nucleotide and structural differences between the two haploid nuclei. It is expected that single consensus sequences, as generated for all former Pst genome sequencing experiments, would be suboptimal in their description of genome diversity and structure.

2.4.4 Effector identification

After assembly and gene annotation, the focus of plant pathogen research shifts to the identification of effector-coding genes. Investigation of effector proteins is crucial as these proteins are utilised by pathogens to alter biological and metabolic processes in the host (Kamoun, 2007). Resources developed by earlier studies, such as cDNA and expressed sequence tag (EST) libraries (Ling et al., 2007; Zhang et al., 2008), and existing knowledge of effector characteristics in other pathogens, provide the basis for the development of bioinformatic pipelines. Using computational methods and gene discovery algorithms, these pipelines facilitate rapid effector gene identification. High-throughput sequencing technologies and bioinformatics further relieve the challenges of studying effectors of obligate biotrophs by providing a platform to investigate complete transcripts (Joly et al., 2010; Hacquard et al., 2011; Saunders et al., 2012).

Highly conserved motifs have been useful in identifying effector families, such as the RXLR and LXLFLAK motifs in oomycetes (Bozkurt et al., 2012). For Pgt the [YFW]xC motif has been identified by Godfrey et al. (2010). However, many of the rusts rarely display conserved motifs known from other plant pathogens, which makes effector prediction challenging (Hacquard et al., 2011; Saunders et al., 2012; Lorrain et al., 2015).
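As an illustration of motif-based screening, the following minimal sketch scans a protein sequence for the RXLR and [YFW]xC motifs named above. The regular expressions, the function name `scan_motifs`, the optional `window` argument and the toy sequence are assumptions made for illustration only; they do not represent the pipeline used in any of the cited studies.

```python
import re

# Motif names follow the text; the regular expressions are illustrative assumptions.
MOTIF_PATTERNS = {
    "RXLR": r"R.LR",        # oomycete RXLR motif, X = any residue
    "[YFW]xC": r"[YFW].C",  # motif reported for Pgt, x = any residue
}


def scan_motifs(protein, window=None):
    """Return {motif name: [0-based start positions]} for one protein sequence.

    If window is given, only the first `window` residues are searched
    (a hypothetical option; restricting the search region is an assumption).
    """
    sequence = protein.upper()
    region = sequence if window is None else sequence[:window]
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # The zero-width lookahead also reports overlapping occurrences.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", region)]
    return hits


# Toy sequence invented for illustration; it is not a real Pst protein.
toy_protein = "MKLSRALRAAYACDE"
print(scan_motifs(toy_protein))
```

Short motifs such as these match many proteins by chance, so hits of this kind can only nominate candidates and do not replace the functional validation discussed in this section.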
This constraint stresses the need for functional validation, which remains a limiting factor due to the relatively low throughput of validation systems that can confirm the pathogen effector targets in the host (Petre et al., 2016a). Only a few such targets have been identified in hosts of filamentous plant pathogens, among them the dothideomycete (Figure 2.1) Cladosporium fulvum Cooke causing tomato leaf mold, the rice blast fungus Magnaporthe oryzae, the potato blight pathogen Phytophthora infestans (Mont.) de Bary and Ustilago maydis from the class Ustilaginomycetes, causing corn smut (Rovenich et al., 2014). For Blumeria graminis (DC.) Speer f. sp. hordei, the causal agent of powdery mildew in barley, an ARF-GAP target protein was identified in the host (Rovenich et al., 2014). Adenosine diphosphate (ADP) ribosylation factors (ARF) are important for vesicle trafficking, while their activity is regulated by guanosine triphosphatase (GTPase) activating proteins (GAPs). The pathogen targets this protein complex to interfere with the host’s trafficking of vesicles containing biochemical molecules (Mandiyan et al., 1999). Association of pathogen genes with vesicle trafficking in the host has also been proposed in the Pst-wheat interaction using RNA-Seq (Dobon et al., 2016).

Genomic resources enabled the use of yeast two-hybrid screens to identify associations between Pst and wheat proteins (Lowe et al., 2011). Non-host model plants, specifically Nicotiana benthamiana Domin, were further proposed for characterising effector candidates, as the hosts of rust fungi are difficult to manipulate with molecular genetic techniques (Petre et al., 2015). This approach has been instrumental in the functional characterisation of a number of Pst effectors (Petre et al., 2016a). The authors warn that although the leaf cell environment of N.
benthamiana is advantageous for protein interaction screens, compared to expression in yeast, false negatives are common due to differences between N. benthamiana and the host species (Petre et al., 2016b). A combination of the two approaches can be followed (Liu et al., 2016). Other examples of functional validation include transient expression assays and host-induced gene silencing (HIGS) using RNA interference (Yin and Hulbert, 2015; Liu et al., 2016).

Recent successes in rust effector identification were achieved with the cloning of the two stem rust effectors, AvrSr35 (Salcedo et al., 2017) and AvrSr50 (Chen et al., 2017). Variation in AvrSr35 and loss of heterozygosity in AvrSr50 resulted in the respective inability of Sr35 and Sr50 to recognise specific isolates of the stem rust fungus, resulting in disease. The methodology that was implemented could be transferable to other rust effector searches and is therefore noteworthy. Candidates were obtained from comparative transcriptomic analysis between wild type and mutant Pgt isolates. Validation of candidates involved a range of techniques including microscopy, transient expression in N. benthamiana and N. tabacum, and yeast two-hybrid analyses. Transient expression in wheat made use of constructs transformed into Escherichia coli (Migula) Castellani and Chalmers and Agrobacterium tumefaciens (Smith and Townsend) Conn. strains. Virus-mediated effector expression assays were also performed in wheat using the barley stripe mosaic virus (Lee et al., 2012).

The present study is based on advances in Pst bioinformatics regarding Pst next-generation sequencing and gene and effector annotations. Annotation procedures considered knowledge of the life history, molecular mechanisms, and complementing computational biology resources in Pst and related filamentous plant pathogens.
Together, these techniques enabled the identification of genes likely involved in distinct virulence profiles of South African Pst pathotypes. Additional functional validation methods discussed in this review would add value in future studies to further investigate the identified candidate effector genes. Furthermore, genomic and transcriptomic Pst resources allowed predictions to be made regarding the relatedness of different Pst isolates to one another, based on genetic proximity when single nucleotide polymorphisms (SNPs) were evaluated in population analyses. This provided valuable insights into the global prevalence of specific genetic groups to better understand their potential movement and the risks this may involve.

Chapter 3

General Materials and Methods

3.1 Preparation and collection of materials

3.1.1 Inoculation

The following standard Pst inoculation protocol, developed and performed at the University of the Free State (UFS), South Africa, was used to obtain urediniospores for genomic DNA (gDNA) extraction used for next-generation sequencing (NGS) and total RNA extraction of infected tissue used for analyses of gene expression through RT-qPCR.

For multiplication of urediniospores for sequencing purposes (Chapter 4), as well as the time course (Chapter 6), and infection assays (Chapter 7), the wheat variety Morocco was used as a susceptible host. The time course itself was performed on Avocet S (susceptible), and the infection assay varieties are listed in Chapter 7. Seedlings were grown for seven days until two unfolded leaves developed (Zadoks growth stage 12 (Z12); Zadoks et al., 1974). For initial multiplication, urediniospores, previously dried on silica gel and stored at −80 °C, were suspended in Soltrol® 130 Isoparaffinic Solvent oil (Chevron Phillips Chemical Company, USA), at 5 mg/ml, upon retrieval from the freezer. Several rounds of multiplication were performed for the sequencing experiment (see Chapter 4).
Inoculations of the time course and the infection assays were done with fresh spores harvested from initial multiplication. Seedlings of seven-day-old wheat (Z12), grown in Mikskaar Professional Potting Soil 70 (Mikskaar, Estonia) in 10 cm diameter plastic pots, were lightly sprayed with the spore-oil suspension. Inoculated plants were dried in a growth cabinet at 25 °C for about 45 minutes. Custom-made incubation chambers (755 × 500 × 300 mm) made from galvanised metal sheeting, with a 30 mm raised grid at the bottom, were filled with hot tap water to just below the grid level. Seedlings were then placed on the grid, and the chambers were immediately sealed to capture maximum water vapour and maintain saturated conditions. The chambers were housed in a cold room, where plants were incubated for 24 hours at 11 °C in total darkness. These conditions simulate the high atmospheric moisture levels and low temperatures that result in dew formation, usually during night time, in natural conditions. Next, inoculated plants were transferred to a growth chamber at 17 °C for 1.5 days, with a 14 hour day and 10 hour night cycle. Daylight was simulated with a light intensity of 200 µmol/(m² s). Plants were then moved to a glasshouse with natural light and a day-night temperature cycle set to 20 °C (06:00–18:00) and 15 °C (18:00–06:00), respectively.

3.1.2 Protocol for sampling infected wheat tissue

Infected wheat leaf samples used for the RNA-Seq discussed in Chapter 7 were collected in wheat fields in South Africa. For every sample, a leaf section of approximately 20 mm covered in Pst pustules was cut into small segments of roughly 7 mm and placed in a 5 ml tube with RNAlater® solution (Thermo Fisher Scientific, USA) immediately after sampling from the wheat plant. RNAlater® was used to preserve RNA integrity as advised by Taylor et al. (2010).
The same procedure was used to collect material from the time course for gene expression analysis (Chapter 7).

3.2 Nucleic acid extraction and quantification

3.2.1 Genomic DNA extraction

Genomic DNA was extracted from urediniospores using the cetyltrimethylammonium bromide (CTAB) extraction method of Chen et al. (1993). Beforehand, CTAB was heated to 65 °C, and 70 % ethanol was prepared and chilled at −20 °C. Spores were frozen using liquid nitrogen and ground using a pestle and mortar. Silicon dioxide (SiO2; Sigma-Aldrich, USA) was used to aid in tissue disruption, using 100 mg of spores with 600 mg of sand. The disrupted material was transferred to a 15 ml Falcon tube. In a separate tube, 2 ml of pre-warmed CTAB buffer was added to 5 µl Proteinase K (10 mg/ml), mixed, and incubated at 65 °C for 2 hours. After incubation, 1 volume of chloroform:isoamyl alcohol (24:1, v/v) was added to the previous mixture and vigorously mixed, followed by centrifugation at 12 000 g for 10 minutes. The aqueous, upper phase was transferred to a fresh tube, and 20 µl of RNaseB (10 mg/ml) was added, after which samples were incubated at room temperature (approximately 20 °C) for 1 hour. The chloroform step was repeated and the supernatant transferred to a fresh tube again. Pre-chilled isopropanol was added (1 volume), followed by gentle inversion to precipitate the gDNA. Samples were incubated at −20 °C overnight. The next day, samples were centrifuged at 12 000 g for 10 minutes. The pellet was washed in 1 ml to 2 ml of the pre-chilled 70 % ethanol. The ethanol was decanted without disturbing the pellet, which was subsequently allowed to dry at room temperature (approximately 20 °C) and dissolved in 50 µl 1 % TE buffer [10 mM Tris-Cl (pH 8.0); 1 mM ethylenediaminetetraacetic acid (EDTA) (pH 8.0)].
3.2.2 RNA extraction

Total RNA was extracted from Pst inoculated leaf tissue, non-inoculated wheat and germinated fungal spores using the RNeasy Plant Mini Kit (Qiagen, Germany) according to the manufacturer’s instructions. Tissue was disrupted with a pestle and mortar. To promote tissue disruption, SiO2 was added to the mortar. All instruments used, including the mortar and pestle and the spatula used to scrape the homogenised tissue from the mortar, were washed with detergent, ethanol, and RNase AWAY Decontamination Reagent (Thermo Fisher Scientific, USA) between extractions. All instruments were cooled in liquid nitrogen or on dry ice to prevent degradation of RNA due to ubiquitous RNase activity (Holland et al., 2003). The dry mortar and pestle were placed on dry ice in a polystyrene box, and further cooled with liquid nitrogen. Approximately 100 mg SiO2 was added to the mortar with the liquid nitrogen before the leaf sample was added. Forceps were used to move the preserved sample material from the tubes to a clean paper towel, where samples were tapped dry to prevent the RNAlater solution from forming ice crystals when the sample came into contact with the liquid nitrogen. Samples were then placed in the mortar with liquid nitrogen and SiO2, followed by homogenisation of the sample into a fine powder. The ground sample was scraped with a cooled spatula into a 2.2 ml safe-lock microcentrifuge tube without allowing it to thaw. The tube with the ground sample was kept on dry ice until extraction buffer was added. The procedure was concluded by following the optional step in the protocol. To prevent degradation, RNase inhibitor (0.5 µl) was added to each sample. Aliquots of 3 µl were prepared for RNA quantification and quality control. Extracted RNA samples were stored at −80 °C.

3.2.3 DNA and RNA quantification

Extracted gDNA was quantified using the Qubit 2.0 Fluorometer (Invitrogen/Thermo Fisher Scientific, USA).
The method relies on dyes that fluoresce only when bound to a specific substrate, in this case double-stranded (ds) DNA. The intensity of the fluorescence is indicative of the amount of dsDNA in the sample (Simbolo et al., 2013). Assays were performed at room temperature (approximately 20 °C) as recommended. The instrument was calibrated with the Quant-iT dsDNA BR Assay according to the manufacturer’s instructions, and DNA concentrations were quantified for all samples.

The Agilent 2100 Bioanalyzer (Agilent Technologies, USA) was used to assess the quality and quantity of the extracted RNA. The reaction kit was stored at 4 °C. A gel-dye mix was first prepared according to the manufacturer’s instructions. The quality of RNA samples was assessed within one to three days after preparation, and RNA was converted into cDNA within one to three days after an aliquot passed the quality assessment. Aliquoting prevented multiple freezing and thawing cycles, which pose a risk of RNA degradation (Taylor et al., 2010). RNA stocks were stored at −80 °C between extraction and use for cDNA synthesis.

3.3 Next-generation sequencing and data analysis

3.3.1 Library preparation

A sequencing library was prepared from raw extracted nucleic acids. DNA fragmentation was followed by size selection and the addition of oligonucleotide adapters to fragments, so that the sequencer could process the library.

3.3.2 Genomic DNA sequencing

Libraries for gDNA sequencing were prepared by the Earlham Institute, UK, using the Illumina TruSeq DNA Sample Preparation Kit (Illumina, UK), according to the manufacturer’s instructions. To assess library quality before sequencing, a High Sensitivity DNA analysis assay was performed on the Agilent 2100 Bioanalyzer. Quantification of libraries was conducted with the Qubit 2.0 Fluorometer.
One lane of the Illumina flow cell was used for a pool of 10 libraries diluted to a concentration of 12.71 nM. Sequencing was performed on the Illumina HiSeq 2500 platform at the Earlham Institute, UK, after which adapter and multiplexing barcode oligonucleotide sequences were removed. Upon receipt of the data, read quality was assessed using FastQC software (version 0.10.1; Andrews, 2010).

3.3.3 RNA sequencing

Sequencing of messenger RNA (mRNA) extracted from Pst infected wheat samples was performed at the Earlham Institute, UK. The mRNA was reverse transcribed to cDNA. Sequencing libraries were prepared using the Illumina TruSeq RNA Sample Preparation Kit (Illumina, UK). The RNA 6000 Nano kit was used to assess the library quality on the Agilent 2100 Bioanalyzer. Libraries were sequenced using the Illumina HiSeq 2500 platform, and adapters and barcodes were removed from the resulting sequences.

3.3.4 Bioinformatics pipeline

Mapping of gDNA samples

The 100 bp Illumina paired end reads were filtered using a Perl script to discard reads containing N calls, where nucleotides could not be determined by the sequencer (Cantu et al., 2013; Hubbard et al., 2015). After filtering, each gDNA sample was independently aligned to the PST130 reference genome (Cantu et al., 2011) using the Burrows-Wheeler Alignment tool (BWA version 0.7.7; Li and Durbin, 2009) with default parameters.

Mapping of cDNA (RNA-Seq) samples

Similar to the gDNA samples, the 100 bp Illumina paired end reads were filtered to discard reads containing nucleotides that could not be determined by the sequencer (Cantu et al., 2013; Hubbard et al., 2015).
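The read filter described here was implemented by the cited authors as a Perl script that is not reproduced in this thesis. The following is only a minimal Python sketch of the same idea, under two assumptions that the text does not state: that a read pair is dropped when either mate contains an N, and that reads arrive as standard four-line FASTQ records. The function names and the toy reads are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, separator, quality) of one FASTQ record
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records from an iterable of text lines."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield (block[0], block[1], block[2], block[3])
            block = []


def filter_pairs(
    mates_1: Iterable[FastqRecord], mates_2: Iterable[FastqRecord]
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Keep a read pair only when neither mate contains an undetermined base.

    Assumption: both mates are discarded together so the files stay in sync.
    """
    for rec_1, rec_2 in zip(mates_1, mates_2):
        if "N" in rec_1[1].upper() or "N" in rec_2[1].upper():
            continue
        yield rec_1, rec_2


# Toy data (hypothetical reads): only the first pair is free of N calls.
reads_1 = ["@r1/1", "ACGT", "+", "IIII",
           "@r2/1", "ACNT", "+", "IIII",
           "@r3/1", "GGCC", "+", "IIII"]
reads_2 = ["@r1/2", "TTGA", "+", "IIII",
           "@r2/2", "TTGA", "+", "IIII",
           "@r3/2", "NNCC", "+", "IIII"]

kept = list(filter_pairs(parse_fastq(reads_1), parse_fastq(reads_2)))
for rec_1, rec_2 in kept:
    print(rec_1[0], rec_2[0])
```

Dropping both mates together keeps the two files of a paired-end run in the same order, which paired-end aligners generally expect.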
The alignment of cDNA samples was carried out using the Bowtie alignment program (version 0.12.7; Langmead et al., 2009) from the TopHat package (version 1.3.2; Trapnell et al., 2012), again aligning to the PST130 reference genome (Cantu et al., 2011), using the parameter -r set to 200 to accommodate the mate pair sequences with 50 bp ends.

Identifying single nucleotide polymorphisms

The resulting sequence alignment map (SAM) format files from the gDNA and RNA-Seq mapping were converted to binary alignment map (BAM) format with the software package SAMtools (version 0.1.19; Li et al., 2009). SAMtools sort, SAMtools index and SAMtools mpileup were used to identify SNPs. Custom Perl scripts were used to extract allele counts at each position of the genome. A depth of coverage threshold was set for polymorphic sites: gDNA SNPs with a minimum depth of coverage of 10× were extracted, while for RNA-Seq data a minimum depth of coverage of 20× was required. Sites with allele frequencies between 0.2 and 0.8 were classified as heterokaryotic, whereas sites with allele frequencies above 0.8 were classified as homokaryotic (Cantu et al., 2013). SnpEff (version 3.6; Cingolani et al., 2012) was used to annotate polymorphisms, indicating whether they resulted in synonymous or nonsynonymous substitutions, or whether a stop codon was gained or lost in coding regions. SnpEff further displayed the codon position of polymorphisms. Polymorphisms in intergenic regions were also indicated.

Quality assessment of samples through sequence data

Each of the two haploid nuclei in the dikaryotic urediniospore is assumed to contribute a maximum of one allele to each nucleotide site. A variant site is described as a homokaryotic SNP when both alleles are identical, but different from the reference PST130 nucleotide. A heterokaryotic SNP describes the situation where two different alleles occur at the nucleotide site.
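The depth and allele-frequency thresholds used for SNP calling can be illustrated with a short Python sketch. It is not the custom Perl code used in the study. The function name, the input layout (a dictionary of base counts per site) and the "unclassified" label for non-reference alleles below a frequency of 0.2 are assumptions made for illustration only.

```python
from typing import Dict, Optional

# Minimum depth of coverage per data type, as stated in the text.
MIN_DEPTH = {"gDNA": 10, "cDNA": 20}


def classify_site(
    allele_counts: Dict[str, int], ref_base: str, sample_type: str
) -> Optional[str]:
    """Classify one genomic position from its per-base read counts.

    Returns None when the site does not reach the depth threshold.
    """
    depth = sum(allele_counts.values())
    if depth < MIN_DEPTH[sample_type]:
        return None
    alt_counts = [
        count for base, count in allele_counts.items()
        if base != ref_base and count > 0
    ]
    if not alt_counts:
        return "reference"
    # Frequency of the most abundant non-reference allele.
    frequency = max(alt_counts) / depth
    if frequency > 0.8:
        return "homokaryotic"
    if frequency >= 0.2:
        return "heterokaryotic"
    # Assumption: the text does not say how lower frequencies are handled.
    return "unclassified"


print(classify_site({"A": 12, "G": 0}, "G", "gDNA"))
print(classify_site({"A": 6, "G": 6}, "G", "gDNA"))
print(classify_site({"A": 6, "G": 6}, "G", "cDNA"))
```

The same counts can pass for a gDNA sample and fail for a cDNA sample, because the RNA-Seq threshold is twice as strict.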
The two alleles at a heterokaryotic site may both differ from the reference, or only one may differ while the other is identical to the reference (Hubbard et al., 2015).

To ensure that the genomic data was in each case derived from a single genotype, the allelic distribution at heterokaryotic sites was assessed across the genome. It is important to note that the reference genome is not phased. Implications are discussed in Chapter 5. When a single genotype is present, the frequency plot showing both alleles at the heterokaryotic SNP sites is expected to form a distribution with a mode of 0.5, due to the equal contribution of both nuclei (Yoshida et al., 2013). In this analysis, the number of heterokaryotic SNP sites was plotted on the y-axis, and the proportion of alleles across reads at each site, ranging between 0 and 1, on the x-axis, as explained in the supplementary documents of Cantu et al. (2013). Read frequency graphs of isolates unique to the current study are summarised in Appendices A and D.

3.3.5 Clustering analysis

Clustering analyses are grouping algorithms that place individuals in groups such that members of the same group are more similar to one another than to individuals in other groups. Genomic and transcriptomic data were used for phylogenetic clustering and population cluster analyses. As the transcriptomic data does not include intergenic regions, only the coding regions of the gDNA samples were considered in this analysis. Brief descriptions of the different underlying statistical and genetic models deployed in the analyses follow in the next sections.

Phylogenetic analysis

A “Randomized Axelerated Maximum Likelihood” (RAxML) phylogenetic approach was used to determine the genetic relationships between South African Pst and to compare them to Pst isolates from other countries. First, a subset of sites in each gene in the PST130 gene models was used to construct synthetic genes.
Sites identical to the PST130 reference genome were only included when a minimum of 2× depth of coverage was reached. Variant sites were included when coverage depths of 10× for gDNA samples or 20× for cDNA samples were reached. Introducing placeholders at sites where the required depth of coverage was not achieved preserved codon positions. Then a phylip file was prepared as input to RAxML software (version 8.0.20; Stamatakis, 2014) to construct the phylogenetic tree.

Accurate nucleotide substitution models are required in most phylogenetic analyses, as the rate of nucleotide substitution varies in molecular evolution (Jia et al., 2014). To account for the fact that not all sites evolve at an identical rate, codon positions and the model used to determine phylogenetic clades were considered.

Due to the degeneracy of codons, there is redundancy in the genetic code that can cause the occurrence of synonymous substitutions. Substitutions at the third position are more often synonymous, and therefore less likely to influence the phenotype and be a target for positive or negative selection, than those at the first and second codon positions. Nucleotide changes at the third codon position can, for this reason, be considered to evolve at a higher rate (Rambaut and Grass, 1997). The third codon position further shows less nucleotide bias and a more homogeneous rate of evolution when compared to the first and second codon positions (Bofkin and Goldman, 2006).

Nonsynonymous sites are not evolutionarily neutral and, depending on the effect of the resulting phenotype, can experience high levels of selection pressure resulting in gene-specific evolution. Phylogenetic trees derived from such data can be misleading when convergent evolution of such genes in different populations is present.
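The site-inclusion rules for the synthetic genes can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used in the study: the placeholder symbol "?", the function name and the input layout (reference sequence, per-site depth, and a dictionary of called variant bases keyed by zero-based position) are all hypothetical.

```python
from typing import Dict, Sequence

# Depth thresholds for variant sites, as stated in the text.
MIN_VARIANT_DEPTH = {"gDNA": 10, "cDNA": 20}
MIN_REFERENCE_DEPTH = 2


def build_synthetic_gene(
    ref_seq: str,
    depths: Sequence[int],
    variants: Dict[int, str],
    sample_type: str,
    placeholder: str = "?",  # assumed symbol for insufficiently covered sites
) -> str:
    """Rebuild one gene for one isolate, masking sites with too little depth.

    Every position is emitted (base or placeholder), so codon positions
    stay aligned with the reference gene model.
    """
    min_variant_depth = MIN_VARIANT_DEPTH[sample_type]
    bases = []
    for position, ref_base in enumerate(ref_seq):
        depth = depths[position]
        if position in variants:
            called = variants[position]
            bases.append(called if depth >= min_variant_depth else placeholder)
        else:
            bases.append(ref_base if depth >= MIN_REFERENCE_DEPTH else placeholder)
    return "".join(bases)


gene = build_synthetic_gene(
    ref_seq="ATGGCC",
    depths=[5, 1, 12, 9, 2, 0],
    variants={2: "A", 3: "T"},
    sample_type="gDNA",
)
print(gene)
```

Because the output always has the same length as the reference gene, the codon phase of every site is preserved even when many sites are masked.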
The phylip input file was prepared from the third codon positions of the synthetic genes, so that the tree illustrates the evolutionary history of the populations without being influenced by gene-specific evolution. Only the third codon positions of those synthetic genes that had a minimum of 80 % breadth of coverage of the original reference gene length in at least 80 % of isolates were included in the phylogenetic analysis, to ensure that only genes with high coverage were used.

In addition, the General Time Reversible (GTR) model of nucleotide substitution under the Gamma (Γ) model of rate heterogeneity was selected for the RAxML model parameter (-m GTRGAMMA). The GTR model parameters account for unequal frequencies of the four nucleotides and the unique rate of each of the six possible nucleotide substitutions. Furthermore, the Γ model uses a discrete Γ distribution to assign different rates of heterogeneity to different sites (Stamatakis, 2014).

Reproducibility was ensured by specifying an initialising value for the pseudo-random number generator (-p 100), and the process was parallelised on 10 threads (-T 10). To demonstrate the reliability of the inferred tree, bootstrapping was applied by generating 100 (-N 100) alternative runs on distinct starting trees (-b 12345). Bootstrap values were added to the maximum likelihood tree with the -f b parameter to generate the bipartition tree, after which MEGA (version 6.06; Tamura et al., 2013) was used to visualise the phylogenetic tree (Cantu et al., 2013; Hubbard et al., 2015).

Population structure analyses

Two methods were used to predict population structure: STRUCTURE (version 2.3.4; Pritchard et al., 2000) and discriminant analysis of principal components (DAPC; Jombart et al., 2010). STRUCTURE is a model-based approach, whereas DAPC does not make any assumptions about the biological processes that influenced and shaped the dataset.
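As an illustration of the phylogenetic input preparation and the RAxML settings given for the phylogenetic analysis, the sketch below selects genes by breadth of coverage, extracts third codon positions, and assembles three RAxML command lines from the stated flags. The split into three separate calls, the executable name `raxmlHPC-PTHREADS`, the file and run names, and the "?" placeholder are assumptions; the thesis reports only the parameter values, not the exact invocations.

```python
from typing import Dict, List

PLACEHOLDER = "?"  # assumed symbol for sites below the depth threshold


def breadth_of_coverage(seq: str) -> float:
    """Fraction of positions in a synthetic gene that are not placeholders."""
    return sum(base != PLACEHOLDER for base in seq) / len(seq)


def gene_is_included(
    seqs_by_isolate: Dict[str, str],
    min_breadth: float = 0.8,
    min_isolate_fraction: float = 0.8,
) -> bool:
    """True when enough isolates cover enough of the gene (80 % / 80 % rule)."""
    covered = sum(
        breadth_of_coverage(seq) >= min_breadth
        for seq in seqs_by_isolate.values()
    )
    return covered / len(seqs_by_isolate) >= min_isolate_fraction


def third_codon_positions(seq: str) -> str:
    """Return every third base, starting at the third position of codon one."""
    return seq[2::3]


def raxml_commands(phylip: str, run: str) -> List[List[str]]:
    """Assemble hypothetical RAxML calls from the flags stated in the text."""
    common = ["raxmlHPC-PTHREADS", "-T", "10", "-m", "GTRGAMMA", "-p", "100"]
    return [
        # Maximum likelihood tree search.
        common + ["-s", phylip, "-n", f"{run}_ml"],
        # 100 bootstrap replicates with bootstrap seed 12345.
        common + ["-s", phylip, "-b", "12345", "-N", "100", "-n", f"{run}_boot"],
        # Draw bootstrap support values onto the maximum likelihood tree.
        common + ["-f", "b", "-t", f"RAxML_bestTree.{run}_ml",
                  "-z", f"RAxML_bootstrap.{run}_boot", "-n", f"{run}_bipart"],
    ]


print(third_codon_positions("ATGGCCTAA"))
for command in raxml_commands("pst_third_codon.phy", "pst"):
    print(" ".join(command))
```

Building the commands as argument lists rather than shell strings means they could be handed to `subprocess.run` without quoting problems, although nothing is executed here.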
STRUCTURE and DAPC both have limitations and benefits, and these are discussed in the relevant research chapters. The same depth of coverage minima as for the phylogenetic tree were required: 10× coverage for gDNA samples and 20× coverage for cDNA samples. The SNP data was prepared using BEDTools (version 2.17.0; Quinlan and Hall, 2010) for variant site annotation in SnpEff. Sites where a synonymous substitution was introduced in at least one isolate were extracted. These, together with sites identical to the reference with at least 2× coverage, were repositioned according to their position in the reference genome. From these files, a data matrix was generated using a custom Python script.

The software STRUCTURE was used to assign isolates to specific population groups and to determine the number of these groups, or clusters (K), due to genetic differentiation. For this analysis, nonsynonymous SNPs were excluded because these sites are more likely to be involved in fitness traits and under selection, whereas STRUCTURE relies on neutral substitution models. Furthermore, different populations could have evolved convergently. Such similarity could falsely suggest that individuals are related.

Analyses consisting of five independent runs for each value of K were carried out. The “admixture” model was used, and each run was set to a burn-in period of 110 000 iterations. Thereafter, 200 000 Markov Chain Monte Carlo (MCMC) generations for each value of K, ranging from 1 to 15, were carried out. K values were evaluated in two ways: the Evanno method (Evanno et al., 2005) and by calculating the log probability, referred to as LnP(D), of each K value (Pritchard et al., 2000).

STRUCTURE assumes a population that is under Hardy-Weinberg equilibrium, and the Pst data does not fit this assumption.
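The Evanno evaluation of K mentioned for the STRUCTURE runs can be sketched as follows. The sketch assumes the common formulation in which ΔK is the absolute second-order difference of the mean LnP(D) across runs, divided by the standard deviation of LnP(D) at that K. The LnP(D) values below are invented for illustration and are not results from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to the LnP(D) values of its independent runs.
    """
    averages = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in averages or k + 1 not in averages:
            continue  # Delta K is undefined at the ends of the K range
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # identical runs give no usable denominator
        second_difference = abs(
            averages[k + 1] - 2 * averages[k] + averages[k - 1]
        )
        delta_k[k] = second_difference / spread
    return delta_k


# Hypothetical LnP(D) values for three runs at each K (not thesis data).
example_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -795.0, -785.0],
    4: [-788.0, -780.0, -796.0],
}

delta = evanno_delta_k(example_runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

Because ΔK needs both neighbouring K values, it cannot be computed for the smallest or largest K tested, which is one reason LnP(D) is inspected alongside it.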
To complement STRUCTURE with a method that does not assume Hardy-Weinberg equilibrium, the multivariate DAPC analysis within the adegenet R software package (Jombart et al., 2010) was carried out on the same dataset used with STRUCTURE. Principal component analysis (PCA) summarised genetic variation in the dataset by reducing the dataset to include only the most impactful loci. The lowest Bayesian information criterion (BIC) suggested the optimum number of population clusters (K), after which discriminant analysis (DA) was used to divide samples into subgroups of population clusters.

Differentiation between and within population clusters

In a segregated population, individuals that aggregate into a subpopulation tend to interbreed more than expected under random mating of the whole population under Hardy-Weinberg equilibrium. When assessing a dataset, groups with low levels of heterozygosity among individuals within groups allow the identification of genetic structure in a global population from which biological interpretations can be made. To quantify the variation between subpopulations, the general reduction in heterozygosity H_X is assessed by evaluating the observed heterozygosity H_obs against the expected heterozygosity H_exp using the equation

\[ H_X = \frac{H_{exp} - H_{obs}}{H_{exp}}. \tag{3.1} \]

Three specific inbreeding coefficients need consideration to take into account heterozygosity observed in individuals, subpopulations and the whole population, substituting H_X in Eq. (3.1) with H_I, H_S and H_T, respectively. Reduction in heterozygosity that is due to the population structure can then be evaluated using the so-called “F-statistics”

\[ F_{IS} = \frac{H_S - H_I}{H_S}, \qquad F_{IT} = \frac{H_T - H_I}{H_T}, \qquad F_{ST} = \frac{H_T - H_S}{H_T}, \]

with the relationship

\[ F_{ST} = 1 - \frac{1 - F_{IT}}{1 - F_{IS}}. \]

The proportion of the genetic variance assigned to the differences between subpopulations, evaluated in Section 4.2.7 and Section 7.3.1, was calculated using GenePop (version 4.2; Rousset, 2008) by estimating Wright’s F_ST statistic (Hubbard et al., 2015).
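As a small worked illustration of the F-statistics in this section, the sketch below computes F_IS, F_IT and F_ST from three heterozygosity values and checks the stated relationship between them numerically. The input values are invented for the example. GenePop estimates F_ST from genotype data with its own estimator, so this is not a reimplementation of that software.

```python
from typing import NamedTuple


class FStatistics(NamedTuple):
    f_is: float  # individuals relative to their subpopulation
    f_it: float  # individuals relative to the total population
    f_st: float  # subpopulations relative to the total population


def f_statistics(h_i: float, h_s: float, h_t: float) -> FStatistics:
    """Compute the three F-statistics from H_I, H_S and H_T."""
    if h_s <= 0 or h_t <= 0:
        raise ValueError("H_S and H_T must be positive")
    return FStatistics(
        f_is=(h_s - h_i) / h_s,
        f_it=(h_t - h_i) / h_t,
        f_st=(h_t - h_s) / h_t,
    )


# Hypothetical heterozygosities (not thesis data).
stats = f_statistics(h_i=0.2, h_s=0.3, h_t=0.4)

# Check the relationship F_ST = 1 - (1 - F_IT) / (1 - F_IS).
identity = 1 - (1 - stats.f_it) / (1 - stats.f_is)
print(stats, identity)
```

With these inputs a quarter of the expected heterozygosity in the total population is lost to subdivision, which the identity reproduces from F_IS and F_IT alone.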
The F_ST values varied from zero to one, where zero indicated the absence of differentiation and one indicated complete differentiation (Hartl and Clark, 1998).

To assess the genetic diversity within each of the Pst population clusters identified herein, the population diversity parameter theta (θ) was estimated in Section 4.2.7 and Section 7.3.1. Theoretically, θ estimates genetic differentiation amongst subpopulations depending on the number of reproducing individuals in the population and the mutation rate. Different empirical approximations of θ exist. In this study, Watterson’s theta, θ̂_W, was reported, as it takes into account the number of segregating sites (SNPs in the current case) to estimate the mutation rate of the population.

The degree of polymorphism between genes in individuals of a subpopulation was calculated using DnaSP (version 5.10.1; Librado and Rozas, 2009) as suggested by Hubbard et al. (2015).

Chapter 4
Origin of the South African Pst Pathotypes

4.1 Introduction

4.1.1 Wheat stripe rust in South Africa

In most wheat cultivation regions globally, Puccinia striiformis f. sp. tritici prevails and is a threat to wheat production (Brown, 2003; Hovmøller et al., 2010; Sharma-Poudyal et al., 2013). Wind dispersal of the asexual urediniospores enables Pst to travel thousands of kilometres (Kolmer, 2005; Hovmøller et al., 2008; Ali et al., 2014). Foreign incursions can become established in new geographical regions, completely shifting the pathotype profile of the Pst population in a single season. In addition to wind dispersal, Pst can be transmitted via anthropogenic activities such as human travel. For instance, Wellings et al. (1987) considered that the introduction of Pst into Australia in 1979 could easily have been facilitated by human-assisted movement.
With increases in global travel and freight movement in recent years, multiple destinations are now within easy reach of many pathogens in a single day (Parker and Gilbert, 2004), regardless of wind dispersal patterns.

In South Africa, the first verified identification and characterisation of stripe rust was in the Western Cape in 1996 (Figure 4.1; Pretorius et al., 1997), making it a relatively new disease compared to leaf rust and stem rust, which were already recorded in the 1700s (Du Plessis, 1933).

Figure 4.1: Locations of the original detections of South African Pst pathotypes. Stripe rust was first detected near Moorreesburg in the Western Cape in 1996. It occurred throughout the wheat breeding regions of the southwestern part of South Africa during the season. The pathotype 6E16A- was designated. New pathotypes (6E22A-, 7E22A- and 6E22A+) observed in following years were first detected in the Eastern Free State and Lesotho. [Map labels: 6E22A-, 7E22A-, 6E22A+, Free State, 6E16A-, Western Cape.]

Subsequent surveys in 1996 confirmed that the disease was well established throughout the winter rainfall regions of the Western, Northern and Eastern Cape (Pretorius et al., 1997). Traces were also found on irrigated wheat in summer rainfall regions. As stripe rust has a lower temperature optimum (Roelfs and Hettel, 1992), the lengthy cool and wet conditions in the Western Cape in 1996 (Figure 4.2) likely contributed to the rapid spread and development of Pst epidemics (Boshoff et al., 2002). The first Pst pathotype was confirmed as pathotype 6E16A- through testing of 32 Pst isolates on 17 standard stripe rust wheat differential lines and seven supplementary tester lines with known resistance genes (Pretorius et al., 1997).
This pathotype was similar to the stripe rust pathotype 6E16 found in the Mediterranean region in the 1970s (Wahl et al., 1984). A similar pathotype, 6E16, was also detected in East and North Africa, the Middle East and Western Asia (Stubbs, 1988; Badebo et al., 1990; Pretorius et al., 1997). The “A-” added to the pathotype name of the South African isolate expanded on the notation protocol developed by Johnson et al. (1972) by adding testing for virulence to YrA, as described by Wellings et al. (1988).

Figure 4.2: Temperature and rainfall measured in 1996 during April to November in the Western Cape compared to the 11 year mean (from Boshoff et al., 2002). Max. temp., maximum temperatures; Min. temp., minimum temperatures. [Plot panels: rainfall (mm), maximum temperature (°C) and minimum temperature (°C) by month, April to November; series: 11 year mean and 1996.]

In 1998 another stripe rust epidemic occurred in South Africa, this time in the Eastern Free State. The wheat varieties Hugenoot and Carina, which were resistant to 6E16A-, were widely and severely affected (Boshoff and Pretorius, 1999). Frequent cases of severe Pst infection were observed, often colonising 100 % of wheat leaves. Virulence tests on an expanded wheat differential set confirmed a virulence gain for Yr25, defining a new pathotype, 6E22A- (Figure 4.3; Boshoff and Pretorius, 1999). Pathotype 6E22 has since been reported in Iran in 2009 and 2010 (Elyasi-Gomari and Petrenkova, 2011). In 2001 yet another new pathotype, 7E22A- (Figure 4.3), was detected on the wheat variety Chinese 166 in trap nurseries in Makobateng, Lesotho (Pretorius et al., 2007).
This pathotype contained additional virulence to Yr1, but although Lesotho neighbours the Eastern Free State, an important wheat cultivation area in South Africa, the pathotype was not considered a threat to the South African wheat industry, as Yr1 did not occur in local wheat varieties (Pretorius et al., 2007). In 2005 a fourth new pathotype, 6E22A+ (Figure 4.3), was detected near Clocolan in the Eastern Free State. This pathotype was virulent to YrA, but avirulent to Yr1 (Visser et al., 2016). The phenotypic characterisation of the four Pst pathotypes is indicated in Figure 4.4.

Figure 4.3: Schematic illustration of the increase of Pst virulence in South Africa. Gain of virulence in South African Pst populations, based on traditional pathotype analysis, between 1996 and 2016 (Pretorius et al., 1997; Boshoff et al., 2002; Pretorius et al., 2007; ZA Pretorius, unpublished data). Pathotypes analysed in this study that represent the identified pathotypes were named SA1 to SA4. [Schematic content, by first detection: 6E16A- (1996; SA1), virulent to Yr2, Yr6, Yr7, Yr8, Yr11, Yr14, Yr17 and Yr19; 6E22A- (1998; SA2), +Yr25; 7E22A- (2001; SA3), +Yr1; 6E22A+ (2005; SA4), +YrA.]

Figure 4.4: Pathotype (race) identification tests of South African Pst pathotypes. Pathotypes were defined by compatibility with wheat hosts possessing indicated sources of resistance (data from Visser et al., 2016). [Plot axes: resistance source (x) against Pst pathotype, SA1 to SA4 (y); each cell scored as resistant or virulent.]

4.1.2 Pst population diversity

Sufficient genetic diversity in a population increases the likelihood that some individuals will have superior fitness in changing environmental conditions (Hartl and Clark, 1998). Given the stepwise gain in virulence together with molecular evidence (Visser et al., 2016), Pst likely reproduces clonally in South Africa. Factors that can increase genetic diversity in asexual Pst populations are mutations and gene flow and, although not considered to occur frequently, somatic recombination. Newly introduced alleles, which can be slightly deleterious, neutral, or slightly advantageous, can persist in the population by chance, a process called genetic drift. When new alleles provide a fitness advantage, positive selection can fix such alleles in the population, while negative selection will remove deleterious mutations. Such selective evolutionary forces can result in an erosion of genetic diversity, dominating the direction of change in allele frequencies, which can, in turn, be counteracted by balancing and diversifying selection to increase diversity again (Hartl and Clark, 1998).

Allele frequencies in a population can be influenced by multiple biotic and abiotic factors. Clustering analyses can be implemented to illustrate the genetic relationship between individuals, define the number of populations, and assign isolates within these populations. This population structure indicates the evolutionary history through alleles present in samples (McDonald and Linde, 2002). To quantify the genetic diversity between individuals and populations, a wide range of molecular markers have been developed and deployed over the past 37 years (Schlötterer, 2004).

4.1.3 Molecular markers and Pst

Molecular markers improved the traceability of Pst considerably, enabling refinement of dispersal distance approximations and population dynamics. Population studies based on AFLP molecular markers (Vos et al., 1995) were applied first (Hovmøller et al., 2002).
A widely inclusive population study, analysing isolates from North America, Australia, Europe, Western and Central Asia, the Red Sea Area, East Africa and South Africa, provided the first genotyping information for the South African pathotypes (Hovmøller et al., 2008). These 876 Pst isolates, collected over a period of 30 years between 1975 and 2005, were pathotyped on a set of 30 wheat differential lines including at least 17 stripe rust resistance genes. A subset containing 151 of the collected isolates, which represented the diversity with respect to virulence phenotypes, region, and sampling year, was then genotyped using AFLP molecular markers (Hovmøller et al., 2008), identifying presence-absence polymorphisms. This subset contained South African Pst isolates representative of the pathotypes 6E16A-, 6E22A-, and 7E22A- that were sampled between 1996 and 2001. The subset was screened with 117 informative AFLP markers; however, these markers did not show any differentiation between the South African isolates. This analysis indicated that the South African Pst isolates were closely related to isolates detected in Central Asia (sampled in 2003), Western Asia (sampled in 2005), and Southern Europe (sampled in 1997 and 1998).

Differential testing showed that 6E16A- is similar to the pathotype 6E16, also called PstS3 (Hovmøller et al., 2016), which has been identified in Southern Europe since 1985 (Enjalbert et al., 2005). In the south of France, a stable divergent subpopulation was described using AFLP markers, also comparable to an Italian isolate sampled in 1998 (Enjalbert et al., 2005). Pathotypes similar to the South African pathotypes (Figure 4.3) have also repeatedly been detected in Northern Europe since 2004 (Hovmøller et al., 2008). Ali et al.
(2014) reached similar conclusions using 20 microsatellite markers (Vieira et al., 2016), identifying the Mediterranean region and Central Asia as the probable origin of the South African Pst pathotypes. In these two studies, seven and six South African isolates were used, respectively. Pathotype 6E22A+, detected in South Africa in 2005, was not included in these analyses. Similar to AFLP markers, microsatellite markers reported low levels of genetic diversity in the South African population and could not differentiate between pathotypes.

Since 1996 characterisation of the Pst population in South Africa has largely been carried out through traditional pathotype analysis methods (see Figure 7.1). More recently 17 microsatellite markers were used to genetically characterise the South African Pst pathotypes (Visser et al., 2016), confirming previous findings of low genetic variability between pathotypes (Hovmøller et al., 2008; Ali et al., 2014). These markers were, however, able to distinguish between the South African pathotypes. Through network analysis Visser et al. (2016) proposed seven hypothetical intermediates between the four South African pathotypes, indicating a model for the establishment of Pst in South Africa.

4.1.4 Next-generation sequence analyses of South African Pst

Along with the cost and time limitations in the development of traditional marker systems such as microsatellites and AFLPs, genotyping samples with traditional marker panels, even with a large marker selection, will only provide a low resolution view of the genetic diversity between samples (Davey et al., 2011). This can be especially problematic when aiming to distinguish between samples with low genetic variability. Next-generation sequencing relieved this limitation of traditional molecular markers by facilitating the limitless identification of markers in a multitude of samples (Davey et al., 2011).
As is the case with AFLP markers, another advantage is that no prior knowledge of the target is needed (Naccache et al., 2014). The extensive datasets generated across species' genomes by this technology enable searches for diversity at the nucleotide level that traditional marker systems cannot provide. This allows questions of population structure to be addressed with a level of detail and accuracy that conventional markers have not achieved.

To add to the traditional pathology and marker work carried out on the South African Pst pathotypes, whole genome sequencing of four Pst isolates was undertaken. These isolates represent the major pathotypes following the first confirmed incursion of stripe rust into South Africa in 1996. Data from the four representative isolates of the identified South African pathotypes, together with available data from global isolates, were used to (i) re-evaluate the potential origin of the South African pathotypes using a comparative genomics approach and (ii) assess the genetic diversity within the South African population. The South African pathotypes identified between 1996 and 2005 will be referred to as the historical South African population. The specific isolates analysed in this study that represent the identified pathotypes were named SA1–SA4.

4.2 Materials and methods

Work done by co-workers is indicated in the relevant sections. The methodology followed the field pathogenomics approach described by Hubbard et al. (2015). See Chapter 3 for detailed descriptions.

4.2.1 Data description

Four isolates representing the four pathotypes observed in South Africa to date were sequenced in this study. Hubbard et al. (2015) reported an in-depth analysis of the UK population, comparing several UK Pst isolates collected between 1974 and 2013. A subset of the data used by Hubbard et al.
(2015) was included in the present study to draw comparisons between the South African isolates and other available Pst datasets. The UK Pst population in 2013 showed high diversity and differed from the pre-2011 population (Hubbard et al., 2015). Population genetic analysis divided this 2013 population into four distinct genetic groups. Notable features of these four groups were that UK Group II was detected on triticale and that UK Groups I and II were genetically less diverse compared to Groups III and IV.

Sequence data of the South African historical isolates were analysed together with sequence data of 44 other isolates: 32 isolates from Europe (Table 4.1) that were sequenced and described before (Hubbard et al., 2015), five isolates from Pakistan (Bueno-Sancho et al., 2017) and seven isolates from East Africa (three from Ethiopia, two from Kenya and two from Eritrea). These data were used in this chapter to determine the relationship of the South African isolates with the available data from other wheat-growing areas where stripe rust occurs. The East African isolates were obtained from Mogens Hovmøller. The isolate ET03b/10, which was assigned to the pathotype group PstS2, and ET08/10 were included in a previous analysis by Ali et al. (2017). Isolates KE74217, KE89069 (V23) and ET87094 are part of the Stubbs collection and were described by Thach et al. (2015, 2016).

4.2.2 Sample preparation for DNA extraction

The urediniospores used for extraction of gDNA were purified and multiplied at UFS, South Africa. The isolates that were sequenced were representative of the identified pathotypes. Table 4.2 lists the UFS stock collection identities and the collection dates of the Pst isolates that were used for multiplication of the urediniospore samples that were sequenced.
To obtain single pustule isolates for genome sequencing, seeds of the susceptible wheat variety Morocco were planted and grown for seven days to the two-leaf stage (Z12; Zadoks et al., 1974). Urediniospores of the four pathotypes had previously been dried on silica gel and stored at −80 °C. After inoculation, plants were moved to a glasshouse with natural light and a day-night temperature cycle set to 20 °C (06:00-18:00) and 15 °C (18:00-06:00), respectively. When flecks appeared, all plants were cut away to leave only half a leaf with a single infection site, the result of infection by a single spore. Due to the systemic nature of the infection, the entire leaf segment eventually sporulated from the single infection site. For each isolate, urediniospores were collected from one actively sporulating lesion and increased twice on Morocco seedlings to produce several grams of spores. The final spore harvest was desiccated for five days on silica gel and used to extract the DNA for sequencing. To maintain isolate purity, multiplication of the different isolates was spatially or temporally separated in the glasshouse.

Table 4.1: Global isolates included in the clustering and genetic diversity analyses

No.  Isolate     Country   Year      Data     Reference
1    88.55S1     UK        Pre 2011  gDNA     Hubbard et al. (2015)
2    03/7        UK        Pre 2011  gDNA     Hubbard et al. (2015)
3    08/21       UK        Pre 2011  gDNA     Hubbard et al. (2015)
4    88.45SS     UK        Pre 2011  gDNA     Hubbard et al. (2015)
5    78.66SS1    UK        Pre 2011  gDNA     Hubbard et al. (2015)
6    88.44SS3    UK        Pre 2011  gDNA     Hubbard et al. (2015)
7    J0085F      France    Pre 2011  gDNA     Hubbard et al. (2015)
8    J01144Bm1   France    Pre 2011  gDNA     Hubbard et al. (2015)
9    J02-022     France    Pre 2011  gDNA     Hubbard et al. (2015)
10   J02055C     France    Pre 2011  gDNA     Hubbard et al. (2015)
11   11/13       UK        2011      gDNA     Hubbard et al. (2015)
12   11/75       UK        2011      gDNA     DGO Saunders & S Holdgate
13   11/128      UK        2011      gDNA     Hubbard et al. (2015)
14   11/140      UK        2011      gDNA     Hubbard et al. (2015)
15   11/08       UK        2011      gDNA     Hubbard et al. (2015)
16   11/08       UK        2011      RNA-Seq  Hubbard et al. (2015)
17   13/19       UK        2013      RNA-Seq  Hubbard et al. (2015)
18   13/15       UK        2013      RNA-Seq  Hubbard et al. (2015)
19   13/123      UK        2013      RNA-Seq  Hubbard et al. (2015)
20   13/27       UK        2013      RNA-Seq  Hubbard et al. (2015)
21   CL1         UK        2013      RNA-Seq  Hubbard et al. (2015)
22   T13/2       UK        2013      RNA-Seq  Hubbard et al. (2015)
23   T13/3       UK        2013      RNA-Seq  Hubbard et al. (2015)
24   T13/1       UK        2013      RNA-Seq  Hubbard et al. (2015)
25   13/38       UK        2013      RNA-Seq  Hubbard et al. (2015)
26   13/21       UK        2013      RNA-Seq  Hubbard et al. (2015)
27   13/33       UK        2013      RNA-Seq  Hubbard et al. (2015)
28   13/182      UK        2013      RNA-Seq  Hubbard et al. (2015)
29   13/25       UK        2013      RNA-Seq  Hubbard et al. (2015)
30   13/29       UK        2013      RNA-Seq  Hubbard et al. (2015)
31   13/71       UK        2013      RNA-Seq  Hubbard et al. (2015)
32   13/40       UK        2013      RNA-Seq  Hubbard et al. (2015)
33   SA1         SA        1996      gDNA     Pretorius et al. (1997)
34   SA2         SA        1998      gDNA     Boshoff and Pretorius (1999)
35   SA3         SA        2001      gDNA     Pretorius et al. (2007)
36   SA4         SA        2005      gDNA     Pretorius (unpublished)
37   KE74217     Kenya     1974      gDNA     Thach et al. (2015; 2016)*
38   KE89069     Kenya     1989      gDNA     Thach et al. (2015; 2016)*
39   ET87094     Ethiopia  1987      gDNA     Thach et al. (2015; 2016)*
40   ET08/10     Ethiopia  2010      gDNA     Ali et al. (2017)**
41   ET03b/10    Ethiopia  2010      gDNA     Ali et al. (2017)**
42   ER179b/11   Eritrea   2011      gDNA     Ali et al. (2017)**
43   ER181a/11   Eritrea   2011      gDNA     Ali et al. (2017)**
44   Qld-1       Pakistan  2014      gDNA     Bueno-Sancho et al. (2017)
45   Qld-2       Pakistan  2014      gDNA     Bueno-Sancho et al. (2017)
46   ATR-1       Pakistan  2014      gDNA     Bueno-Sancho et al. (2017)
47   ATR-2       Pakistan  2014      gDNA     Bueno-Sancho et al. (2017)
48   ATR-3       Pakistan  2014      gDNA     Bueno-Sancho et al. (2017)

* Isolates KE74217, KE89069 and ET87094 were provided by Aarhus University, Denmark, and Plant Research International, Wageningen, The Netherlands, maintaining the Global Yellow Rust Gene Bank of the late ir. RW Stubbs up to 25-01-2010.
** Provided by MS Hovmøller. Personal communication with MS Hovmøller confirmed inclusion in the listed studies.

Table 4.2: Historical isolates used in re-sequencing and an infection time course experiment (Chapter 6)

Pathotype  First occurrence  Alias  Isolate ID  Collection date
6E16A-     1996              SA1    Isolate 49  2003
6E22A-     1998              SA2    Isolate 3   2001
7E22A-     2001              SA3    Isolate 27  2004
6E22A+     2005              SA4    Isolate 35  2011

4.2.3 Genomic DNA extraction and quantification

Genomic DNA was extracted from urediniospores using the CTAB extraction method described by Chen et al. (1993) and quantified using the Qubit 2.0 Fluorometer (Invitrogen/Thermo Fisher Scientific, USA).

4.2.4 Sequencing and mapping

Sequencing libraries were prepared, quality assessed, quantified and sequenced by the Earlham Institute. Sequences containing missing data, indicated with “N”, were discarded (Cantu et al., 2013; Hubbard et al., 2015). The 100 bp paired-end reads were aligned to the PST130 draft reference genome (Cantu et al., 2011) using BWA (version 0.7.7; Li and Durbin, 2009) with default parameters, producing sequence alignment map (SAM) format files. SAMtools (version 0.1.19; Li et al., 2009) was used to identify variant sites. SnpEff (version 3.6; Cingolani et al., 2012) was used to identify whether homokaryotic SNPs resulted in synonymous or nonsynonymous substitutions, following the procedures in Cantu et al. (2013). Based on the rationale explained in Yoshida et al. (2013), the read frequency graph of each isolate was assessed to determine whether the starting material could be considered uncontaminated, containing predominantly a single genotype (Cantu et al., 2013; Hubbard et al., 2015).
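The read frequency assessment described above can be illustrated with a short Python sketch. It is not the script used in this study: the function names `read_frequencies` and `modal_bin`, the minimum depth of 10 reads and the use of 20 frequency bins are all illustrative assumptions. The sketch only shows the principle, namely that the proportion of reads supporting the alternative allele is computed at each heterokaryotic SNP site and the position of the peak of the resulting distribution is inspected.

```python
from collections import Counter


def read_frequencies(sites, min_depth=10):
    """Return the alternative-allele read frequency at each SNP site.

    sites: iterable of (ref_reads, alt_reads) counts at heterokaryotic sites.
    min_depth: hypothetical threshold; sites with fewer reads are skipped.
    """
    freqs = []
    for ref_reads, alt_reads in sites:
        depth = ref_reads + alt_reads
        if depth >= min_depth:
            freqs.append(alt_reads / depth)
    return freqs


def modal_bin(freqs, n_bins=20):
    """Return the centre of the most populated frequency bin."""
    if not freqs:
        raise ValueError("no sites passed the depth filter")
    # A frequency of exactly 1.0 is placed in the last bin.
    counts = Counter(min(int(f * n_bins), n_bins - 1) for f in freqs)
    best_bin, _ = counts.most_common(1)[0]
    return (best_bin + 0.5) / n_bins


# Toy counts (not data from this study).
toy_sites = [(10, 10), (20, 20), (15, 15), (30, 10), (10, 30), (1, 1)]
toy_freqs = read_frequencies(toy_sites)
print(len(toy_freqs), modal_bin(toy_freqs))
```

A modal bin close to 0.5 is the pattern that would be read as support for a sample containing predominantly a single genotype; a markedly shifted or multi-peaked distribution would prompt closer inspection.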
Read frequency graphs of other isolates used in this chapter that have not been published before are displayed in Appendix A, Figure A.1.

4.2.5 Phylogenetic analysis

A maximum likelihood phylogenetic approach was used to determine the genetic relationships amongst the South African Pst isolates and to compare them with isolates from elsewhere. Synthetic genes were prepared, and the third codon positions of these genes were used to determine the phylogeny. Due to the degeneracy of the genetic code, third codon positions mostly carry nucleotide changes that do not alter the encoded amino acid, and therefore represent more evolutionarily neutral positions. The RAxML software (version 8.0.20; Stamatakis, 2014) was used. One hundred iterations of bootstrapping were performed to assess the reliability of the maximum likelihood dendrograms (Cantu et al., 2013; Hubbard et al., 2015).

4.2.6 Population structure analysis

The genetic differentiation of the 48 isolates (Table 4.1) was assessed by two population-clustering methods: (i) STRUCTURE (version 2.3.4; Pritchard et al., 2000) was used to assign isolates to subpopulation clusters (K) based on genetic differentiation at neutral or nearly neutral SNP sites, and (ii) multivariate DAPC within the adegenet package (Jombart et al., 2010) was carried out in the R environment on the same dataset as STRUCTURE.

4.2.7 Genetic diversity assessment

Inter-cluster variance

The SNP dataset used in the STRUCTURE and DAPC analyses, containing only biallelic synonymous SNPs, was converted to the applicable format for the program Genepop (version 4.2.2; Rousset, 2008) using a Perl script. The dataset was split into population clusters as differentiated by DAPC. Between-population differentiation was then determined by calculating the special case of Wright's F-statistic (FST) to describe the partitioning of allele frequencies between subpopulations.

Intra-cluster variance

Synthetic genes, containing SNP sites and sites identical to the reference that passed the respective coverage thresholds, were used to quantify the genetic diversity in the subpopulations that were determined by clustering analysis. The program DnaSP (version 5.10.1; Librado and Rozas, 2009) was used to compare loci between individuals within each cluster. The average and standard deviation of the Watterson theta estimate (θ̂W) across all sites were calculated to obtain the genetic diversity estimate within each cluster. A characteristic of DnaSP is that it cannot differentiate between intra-individual diversity (between haplotypes) and inter-individual diversity (between isolates). This means that when the diversity of a population is computed, it is the haplotype diversity that is considered, with every haplotype treated as one “isolate”. Because a Pst isolate generally contains two haplotypes, diversity can be computed from a single isolate. This was not the main focus of this analysis, but it was done for the isolate that was on its own in a genetic group. Haplotype diversity in Pst is generally considered to be high, which was confirmed by the phased haplotype sequencing effort of Schwessinger et al. (2018).

4.3 Results

4.3.1 Re-sequencing of South African Pst pathotypes

To investigate variation in the South African population, whole genome next-generation sequencing of four historical South African isolates (SA1–SA4) was performed. More than 20 million reads were generated for each isolate using the Illumina HiSeq2500 platform (Table 4.3). Reads were filtered and subsequently mapped to the PST130 reference genome (Cantu et al., 2011). The average genome depth of coverage across the PST130 genome for SA1–SA4 was between 25 and 39× (Table 4.3). All four alignments spanned 97 % of the breadth of the reference genome with at least 2× coverage depth.
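The two coverage summaries reported above, average depth and breadth at a minimum depth, can be derived from per-base read depths along the reference. The following Python sketch shows the arithmetic only. The function name `coverage_summary` and the toy depth vector are hypothetical, and the per-base depths are assumed to have been exported from the alignments beforehand.

```python
def coverage_summary(depths, min_depth=2):
    """Return (mean depth, breadth of coverage) for a list of per-base depths.

    depths: read depth at every reference position, including zeros.
    min_depth: positions with at least this depth count towards breadth.
    """
    n_positions = len(depths)
    if n_positions == 0:
        raise ValueError("empty depth vector")
    mean_depth = sum(depths) / n_positions
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / n_positions


# Toy example (not data from this study): four reference positions.
mean_depth, breadth = coverage_summary([0, 2, 4, 6])
print(mean_depth, breadth)
```

Positions with zero depth must be present in the input; if they are omitted, both the mean depth and the breadth are overestimated.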
4.3.2 Purity assessment of samples

To assess whether the urediniospores used as starting material consisted of a single genotype, allele frequencies for each of the historical South African isolates were analysed. The resulting plots displayed clear peaks at 0.5 (Figure 4.5) and a fairly bell-shaped distribution. Although a pattern such as that seen in SA4 is more desirable, SA1–SA3 still followed the expected trend, supporting the conclusion that the samples consisted predominantly of a single genotype.

4.3.3 Clustering analyses

Three methods of data clustering were implemented to infer population structure. First, a maximum likelihood RAxML phylogenetic tree was generated using the third codon position of the synthetic genes. Next, STRUCTURE and DAPC were used to assign isolates to population clusters.

4.3.4 Phylogenetic analysis

To determine the relationship of the historical South African Pst isolates to available isolates from the UK, France, Pakistan, Eritrea, Ethiopia and Kenya, phylogenetic analyses using available genomic and transcriptomic data from 48 Pst isolates (Table 4.1) were carried out. To characterise the genetic relationship between these isolates, a maximum likelihood approach was used. The third codon position across 5844 predicted genes, comprising 2 437 462 sites, was used to generate the phylogenetic tree (Figure 4.6); only sites in genes that had 80 % breadth of coverage in 80 % of the isolates were included.

Table 4.3: Statistics of read alignment of the historical South African isolates to the PST130 reference genome. An average of 85.2 ± 4.0 % of filtered reads mapped to the reference genome

Lab code  Platform         Pathotype  Total number of reads  Filtered reads  Percent discarded  Number of reads aligned  Unmapped reads  Average depth of coverage
SA1       Illumina Hi-Seq  6E16A-     23 031 402             22 827 102      0.89 %             20 131 984               2 695 118       30
SA2       Illumina Hi-Seq  6E22A-     22 628 648             22 433 194      0.86 %             16 490 301               5 942 893       25
SA3       Illumina Hi-Seq  7E22A-     26 876 262             26 637 762      0.89 %             23 960 896               2 676 866       36
SA4       Illumina Hi-Seq  6E22A+     30 300 476             30 056 556      0.81 %             26 751 160               3 305 396       41

Figure 4.5: Read frequency graphs from heterokaryotic SNP sites for SA1–SA4. (Four panels, SA1 to SA4; x-axis: Frequency, 0.00 to 1.00; y-axis: Count.)

From the phylogenetic tree (Figure 4.6), it can be concluded that the South African isolates (a) are closely related to one another and (b) are most closely related to isolates from Kenya and Ethiopia. This is indicative of either (i) southward movement of inocula, with the South African pathotypes being derived from East African isolates, or (ii) a common origin shared by the South African and the identified East African isolates.

4.3.5 Population structure analysis

STRUCTURE

To assign individual Pst isolates to population groups, the Bayesian model-based clustering method STRUCTURE (Pritchard et al., 2000) was applied to the 146 400 biallelic synonymous SNP sites that were identified across the 48 isolates.
The log probability plot in Figure 4.7(i) indicated that the optimum number of population clusters was four, with the curve reaching a plateau parallel to the x-axis for four or more population clusters (Pritchard et al., 2000). The number of population clusters was also evaluated using the Evanno method of population cluster analysis (Evanno et al., 2005). This method, based on the second-order rate of change of the log probability of the data with respect to K, suggested K = 2 (Figure 4.7(ii)). From these two estimates, STRUCTURE suggests that the number of population clusters is either K = 2 or K = 4. Figure 4.8 displays bar charts representing the STRUCTURE population clusters. To further assess population structure, STRUCTURE results were compared with DAPC clustering, which does not assume Hardy-Weinberg equilibrium.

Figure 4.6: The phylogenetic relationship between the South African Pst isolates and European, Asian and East African isolates. South African Pst isolates are closely related to isolates from East Africa.
RAxML unrooted phylogenetic analysis was performed on four South African and 44 global Pst isolates using the third codon position of 5844 PST130 gene models. Only those genes that had 80 % coverage in 80 % of the isolates were included, resulting in the inclusion of 2 437 462 sites to construct the tree. Clades are supported by evaluation of 100 bootstrap iterations. Bootstrap values greater than 80 are indicated with green dots on the applicable nodes.

Figure 4.7: Evaluation of the number of population clusters following STRUCTURE analyses. (i) Log probability of data L(K) as a function of K, used to identify the optimal number of clusters (x-axis: K, 1 to 15; y-axis: LnP(D)). The population structure of Pst inferred by model-based Bayesian cluster analysis of genome-wide SNP data indicates the optimum number of clusters to be K = 4. (ii) The Evanno method of inferring the number of STRUCTURE populations (K) from the modal value of ∆K (x-axis: K, 1 to 15; y-axis: Delta K). A strong signal was detected for K = 2, where ∆K was at a maximum. ∆, Delta.

Figure 4.8: Bar charts representing STRUCTURE population clusters (K2 to K15), with each colour representing a group and each bar indicating the estimated membership fractions of an individual isolate, that is, the fraction of sites assigned to each group. The UK 2013 population is divided into subgroups: green (UK Cluster II), red (UK Cluster IV), blue (UK Cluster III) and pink (UK Cluster I), as previously described by Cantu et al. (2013). Asterisk (*) indicates genomic data of isolate 11/08, while no asterisk indicates RNA-Seq data for 11/08.
K = 4 was proposed as the optimal population number (see Figure 4.7(i)). K2 to K15 indicate the number of clusters to which individuals in the population were assigned in each evaluation.

Discriminant analysis of principal components

The same 146 400 synonymous biallelic SNP sites were used as input for the DAPC analysis. Genetic variation within and between population clusters was then summarised using PCA. The elbow of the Bayesian Information Criterion (BIC) curve formed at 6 and a minimum was observed at 10 (Figure 4.9(i)), indicating that the optimum number of clusters ranged between 6 and 10. Discriminant analysis (DA) of eigenvalues was performed to assign individuals to population clusters. The bar plot in Figure 4.9(ii) represents the DA eigenvalues for the main principal components. The scatterplot (Figure 4.9(iii)) uses the first two principal components (the y-axis and x-axis, respectively) of the DAPC of the synonymous SNP sites. Each circle represents a single Pst isolate. The non-parametric DAPC of the Pst isolates identified at most ten clusters (K = 10), as supported by the BIC curve (Figure 4.9(i)). Some similarities between the STRUCTURE groups and the DAPC groups can be seen (Figure 4.8 and Figure 4.10).
The elbow of the BIC curve suggests six populations (Figure 4.9(i)) (Jombart et al., 2010). The bar chart corresponding to K = 6 is similar to the STRUCTURE bar chart for K = 4. Differences between STRUCTURE and DAPC included that UK Cluster I was the fifth cluster to differentiate in the DAPC analysis, while the post-2011 UK clusters did not show clear differentiation in the STRUCTURE analysis. The Pakistan isolates differentiated at K = 4 in the STRUCTURE analysis but only at K = 7 in the DAPC analysis. Because Pst reproduces predominantly asexually, particularly in the regions from which the isolates were obtained, DAPC is more suitable for this dataset. Subsequent analyses were based on the DAPC results.

Figure 4.9: Discriminant analysis of principal components (DAPC) of 48 Pst isolates. (i) Bayesian Information Criterion (BIC) curve (panel title: Value of BIC versus number of clusters; x-axis: Number of clusters; y-axis: BIC), suggesting that the minimum number of clusters (K) required to explain variation between pathotype clusters is between 6 and 10. (ii) Discriminant analysis (DA) eigenvalues (x-axis: Linear Discriminants; y-axis: F-statistic). The first nine eigenvalues from the DAPC analysis supported the retention of three discriminant functions, indicated with red bars. (iii) Relative proximity of Pst population clusters (Clusters 1 to 10) in the DAPC of the 48 Pst isolates.

Figure 4.10: Bar charts representing the DAPC population structure analysis (K2 to K15), with each bar showing the estimated proportion of each isolate assigned to a population cluster. UK clusters are indicated as in Figure 4.8. Asterisk (*) indicates genomic data of isolate 11/08, while no asterisk indicates RNA-Seq data for 11/08. K2 to K15 indicate the number of clusters to which individuals in the population were assigned in each evaluation.

4.3.6 Population differentiation

FST values were calculated using the software Genepop (version 4.2.2; Rousset, 2008) to show genetic differentiation between clusters. Pairwise comparisons of biallelic SNP data were assessed for each group comparison. This analysis quantifies the correlation of alleles within a subpopulation compared with all subpopulations combined. Some clusters were very similar and others more divergent, with FST values ranging between 0.08 and 0.86 across the 10 Pst clusters (Figure 4.11).
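To make the interpretation of these pairwise values concrete, the sketch below computes Wright's FST for two subpopulations from allele frequencies at biallelic loci, using the textbook form FST = (HT − HS) / HT. This is a simplified illustration and not a reproduction of the Genepop calculation used in this study; the function names, the equal weighting of the two subpopulations and the toy allele frequencies are assumptions made for the example.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_single_locus(p1, p2):
    """FST = (HT - HS) / HT for one locus and two equally weighted groups."""
    h_t = heterozygosity((p1 + p2) / 2.0)  # pooled (total) heterozygosity
    if h_t == 0.0:
        return 0.0  # locus is monomorphic in the pooled sample
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-group
    return (h_t - h_s) / h_t


def fst_multilocus(freqs_1, freqs_2):
    """Combine loci as a ratio of sums, so uninformative loci add nothing."""
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        h_t = heterozygosity((p1 + p2) / 2.0)
        h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator if denominator > 0.0 else 0.0


# Toy allele frequencies (not data from this study).
print(fst_single_locus(0.0, 1.0))   # groups fixed for different alleles
print(fst_single_locus(0.5, 0.5))   # identical allele frequencies
print(fst_multilocus([0.0, 0.5], [1.0, 0.5]))
```

Values near 0 correspond to groups with nearly identical allele frequencies, and values near 1 to groups that are close to fixed for different alleles.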
The biggest genetic differentiation (0.37 to 0.86) was seen when Group 10, East Africa (B), was compared to the other groups. The comparison between Group 9 and Group 7 showed the highest genetic differentiation involving the South African isolates. Group 7 comprised the UK Cluster I isolates.

In addition to calculating the FST values for the population groups as defined by DAPC, this statistic was calculated between the historical South African isolates and the isolates from East Africa that were grouped with them in the phylogenetic tree (Figure 4.6) and the clustering analysis (Figure 4.10). The high similarity between these two groups (Group A: SA1–SA4 and Group B: KE89069, KE74217, ET87094 and ET03b/10) was quantified by a very low FST of 0.08. In contrast, a second group of East African isolates, containing two isolates from Eritrea and one Ethiopian isolate, was generally the most genetically divergent from all other Pst isolates, maintaining high FST values throughout all comparisons. This genetic difference was also reflected by their position in a distantly related clade in the phylogenetic analysis (East Africa (B); Figure 4.6).

4.3.7 Genetic diversity within and between population clusters

To estimate the genetic variation within the subpopulations, the Watterson estimator was used as described in Chapter 3. The Watterson estimator incorporates the number of SNPs and the population size of each population cluster. The degree of polymorphism in the gene set of each subpopulation was calculated by evaluating SNPs across isolates in a population cluster, gene-by-gene.

Figure 4.11: Genetic diversity assessed between 10 population clusters derived from DAPC analysis of biallelic SNP data. FST values are indicated in the lower diagonal matrix, with the diversity in the groups indicated on the diagonal. Group 8 contains one isolate, so the diagonal value indicates the haplotype diversity in this isolate. Isolate information is displayed in the key. The East African isolates in Group 9 (purple) are referred to as East Africa I, while Group 10 (red) is referred to as East Africa II in the text. Asterisk (*) indicates genomic data of isolate 11/08, while no asterisk indicates RNA-Seq data for 11/08.

Group  Isolates  Diagonal (mean ± SD)  FST with Group:  1     2     3     4     5     6     7     8     9
1      9         0.0031 ± 0.0041
2      2         0.0003 ± 0.0013                        0.08
3      3         0.0022 ± 0.0035                        0.18  0.20
4      5         0.0012 ± 0.0021                        0.32  0.39  0.16
5      4         0.0006 ± 0.001                         0.41  0.61  0.36  0.33
6      9         0.0005 ± 0.0008                        0.39  0.52  0.23  0.31  0.21
7      4         0.0002 ± 0.0009                        0.47  0.74  0.48  0.46  0.53  0.38
8      1         0.0042 ± 0.0092                        0.38  0.59  0.40  0.45  0.60  0.49  0.32
9      8         0.002 ± 0.003                          0.21  0.27  0.23  0.26  0.29  0.39  0.43  0.35
10     3         0.0031 ± 0.0055                        0.39  0.49  0.57  0.59  0.78  0.78  0.86  0.71  0.37
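As a reminder of how the Watterson estimator combines the number of segregating sites with the sample size, a minimal Python sketch is given here. It uses the standard definition θ̂W = S / a_n, with a_n = 1 + 1/2 + ... + 1/(n − 1), and is not the DnaSP implementation used in this study. The function name `watterson_theta`, the optional division by the number of sites and the toy numbers are assumptions made for illustration; n is the number of sequences compared, which in the DnaSP analysis described in Section 4.2.7 corresponds to haplotypes rather than isolates.

```python
def watterson_theta(segregating_sites, n_sequences, n_sites=1):
    """Watterson's theta: S / a_n, optionally expressed per site.

    segregating_sites: number of polymorphic (SNP) sites, S.
    n_sequences: number of sequences compared, n (must be at least 2).
    n_sites: alignment length; leave at 1 for the per-locus estimate.
    """
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    if n_sites < 1:
        raise ValueError("n_sites must be positive")
    # Harmonic number a_n = sum of 1/i for i = 1 .. n-1.
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / a_n / n_sites


# Toy example (not data from this study): 11 SNPs among 4 sequences.
print(watterson_theta(11, 4))
print(watterson_theta(11, 4, n_sites=1000))
```

Because a_n grows with n, the same number of SNPs yields a smaller estimate in a larger sample, which is what allows clusters of different sizes to be compared.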
Thetas of different clusters, as shown on the diagonal of the matrix in Figure 4.11, can subsequently be compared to assess the relative nucleotide diversity in the different clusters. For Group 8, this metric was calculated on a single isolate and indicates the haplotype diversity of that isolate. The highest intra-cluster variability was computed for Groups 1 and 10 and the lowest for Group 7.

4.4 Discussion

To monitor prevalence and identify new pathotypes of Pst, surveys are routinely carried out in South Africa by seasonal phenotyping of rust isolates on a differential set of wheat lines that possess an array of rust resistance genes. Pathotype names, such as 6E16A-, are based on such traditional pathology screens on differential sets. In addition to the pathotype description, pathologists often report the virulence profile of specific isolates that show virulence to additional resistance genes not represented in the differential set. These descriptions are complementary, but not necessarily identical across all isolates of a specific pathotype. For example, Ethiopian wheat varieties resistant to Pst isolates of pathotypes 6E16A- and 6E22A- from South Africa were susceptible to a 6E22 isolate from Germany (Hussein and Pretorius, 2005; Denbel, 2014). Also, different isolates of the 0E0 Pst pathotype, showing avirulence to all wheat genotypes with known Yr genes, were suggested to be genetically different using microsatellite marker screens (Hovmøller et al., 2016).

In addition to these phenotypic markers, genotyping using molecular markers has allowed a more detailed description of Pst isolates. For instance, South African pathotypes have been genotyped using AFLP markers (Hovmøller et al., 2008), and phylogenetic analysis using these markers indicated that the South African isolates were related to isolates from Western and Central Asia and Southern Europe.
However, the seven isolates, belonging to the pathotype groups 6E16A-, 6E22A-, and 7E22A-, collected between 1996 and 2001 could not be differ- entiated using AFLP markers. A subsequent study using microsatellite markers that genotyped South African isolates collected between 1996 and 2004 also in- dicated a close relationship with Central Asian and Mediterranean Pst isolates (Ali et al., 2014). Only a single genotype was recorded for the six South African samples tested. More recently the pathology characterisation of the virulence profiles of the South African isolates has been complemented with genotype in- formation from microsatellite markers. The diversity in these molecular markers successfully distinguished the South African isolates (Visser et al., 2016). The close relationship of the South African pathotypes and the stepwise development of new pathotypes were confirmed through this analysis. Further to this work, the current study implemented a next-generation sequen- cing approach to determine the possible origin and characterise the genetic relatedness of the four historical South African Pst pathotypes identified in 1996, 1998, 2001 and 2005, through investigation of isolates SA1–SA4. First, population substructure was assessed based on allele frequencies at multiple loci of neutral or nearly neutral alleles. After that, the FST was calculated to quantify genetic variation between the predefined population clusters (Pritchard et al., 2000) and the diversity amongst isolates in a group was assessed. Knowledge of population structure is valuable in the study of emerging and re-emerging pathogens as it reports the dynamics of subpopulations with distinct pathogenicity (Hubbard et al., 2015). In this study, the Bayesian clustering method STRUCTURE (Pritchard et al., 2000) and multivariate DAPC (Jombart et al., 2010) were used to identify genetic clusters. It is often hard to meet the assumptions analysis methods rely on. 
STRUCTURE is one of the most popular methods to infer population structure. It was developed to be applied to various markers that are not closely linked, and assumes Hardy-Weinberg equilibrium (Pritchard et al., 2000). The high marker density obtained from re-sequencing data, together with the asexual reproduction of Pst, resulted in violation of these prerequisites, making STRUCTURE less appropriate for analysing clonal populations. An additional shortcoming of STRUCTURE is that the complex models include many parameters to estimate, causing lengthy runtimes when assessing large data sets (Jombart et al., 2010), as is the case with sequence data. In contrast, DAPC first transforms the data using PCA, so that the input variables to the DA are uncorrelated principal components. The DA then predicts a grouping variable using one or more of the principal components. This approach is time efficient, and can easily be applied to large re-sequencing datasets. In DAPC, as in STRUCTURE, the analysis is run with different numbers of clusters (K); in DAPC the clusters are identified using K-means clustering. The clustering models resulting from each chosen K can be assessed by their likelihood. DAPC uses BIC to determine the model that fits the data best and, by implication, the number of clusters (Jombart et al., 2010).

After assessment of population structure, the genetic differentiation between and within proposed clusters can be calculated to quantify the diversity between and within groups. In the pairwise comparisons of clusters, lower FST values indicate groups that are closely related, while groups distant from each other have high FST values. Phylogenetic and clustering analysis illustrated that, of the isolates evaluated in this study, the historical South African isolates were most closely related to isolates from East Africa (A), which was also confirmed by the low FST of 0.08.
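To illustrate how a pairwise FST value relates to allele frequencies, the following minimal Python sketch computes Wright's FST, (HT − HS)/HT, between two clusters from biallelic SNP allele frequencies. It is not the estimator used in this study. The function name `pairwise_fst`, the equal weighting of the two clusters and the example frequencies are assumptions made purely for illustration.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Wright's FST between two clusters from per-locus allele frequencies.

    freqs_a, freqs_b: alternative-allele frequencies (0..1) at the same
    biallelic loci in cluster A and cluster B. The two clusters are weighted
    equally (an illustrative assumption) and FST is a ratio of sums over loci.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both clusters need frequencies for the same loci")
    h_within = 0.0  # HS: mean expected heterozygosity within clusters, summed over loci
    h_total = 0.0   # HT: expected heterozygosity of the pooled clusters, summed over loci
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_mean = (p_a + p_b) / 2
        h_within += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        h_total += 2 * p_mean * (1 - p_mean)
    if h_total == 0:
        return 0.0  # no variation at any locus, so no differentiation is measurable
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies at three loci (not data from this study)
similar = pairwise_fst([0.50, 0.20, 0.90], [0.50, 0.25, 0.85])
distinct = pairwise_fst([0.00, 0.20, 1.00], [1.00, 0.80, 0.00])
print(f"similar clusters:  FST = {similar:.3f}")
print(f"distinct clusters: FST = {distinct:.3f}")
```

Clusters with nearly identical allele frequencies give a value close to zero, while clusters fixed for different alleles at most loci approach one, which is the sense in which low and high FST values are interpreted in the text.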
Higher genetic differentiation between East African and South African isolates (FST = 0.23) was previously reported using microsatellite markers (Ali et al., 2014). In the present study, high differentiation was observed between East African isolates, with an FST of 0.37 observed between Group 9 (containing East Africa (A)) and Group 10 (East Africa (B)). Group 9 and Group 10 both included isolates from Ethiopia sampled in 2010. This indicates high diversity in the Pst population in Ethiopia, and that the South African isolates are closely related to some of the East African isolates, but different from others.

Diversity calculations amongst isolates assigned to groups using DAPC indicated that groups 2, 5, 6 and 7 were less diverse by one order of magnitude compared to groups 1, 3, 4, 9 and 10. Group 8 consists of a single isolate and the diversity calculation represents the haplotype diversity for this isolate. This high haplotype diversity is a characteristic of Pst. Schwessinger et al. (2018) describe the haplotype diversity measured in Pst-104E as higher than that of a number of plant pathogens, including Puccinia coronata Corda f. sp. avenae, Zymoseptoria tritici (Desm.) Quaedvl. & Crous and Verticillium dahliae Kleb., and associate this diversity with long-term asexual reproduction.

One isolate in Group 10, ET08/10, has previously been assigned to the pathotype PstS2 (Ali et al., 2017). This aggressive pathotype possibly originated in East Africa and quickly spread to the Middle East, Australia, and Europe. In aggressive pathotypes like PstS2, generation time is shortened and the pathogen is able to infect in spite of relatively warm and dry climates (Hovmøller et al., 2008; Walter et al., 2016).

From this analysis, it was concluded that the closest relatives of the South African isolates were a group of isolates from East Africa.
As the East African isolates included historical isolates that date back to the 1970s and 1980s, this result supports the hypothesis that inoculum could have moved southwards from East Africa with subsequent introduction to South Africa. The East African isolates showing high similarity to the South African isolates also included a more recent isolate from 2010, indicating that the historical pathotypes are likely still occurring in Ethiopia. Alongside these pathotypes, new pathotypes have clearly developed, as reported for the aggressive pathotypes PstS1 and PstS2, for example. Group 10 included two isolates from Eritrea that were sampled in 2011 and the PstS2 2010 isolate from Ethiopia. The historical South African isolates showed significant differentiation from this group. Previous studies that speculated about the origin of South African Pst excluded East Africa as a possible origin based upon the diversity observed between South African and Eritrean isolates (Hovmøller et al., 2008; Ali et al., 2014). These studies did not include isolates from Ethiopia.

Apart from considering an incursion from East Africa, the South African and some East African isolates could also share a similar origin. To assess their relationship with isolates from Central and Western Asia and the Mediterranean, suggested to be the origin of the South African isolates (Hovmøller et al., 2008; Ali et al., 2014), the same resolution of variation assessment would be needed for historical isolates from these regions. Currently, molecular marker work suggests that East African isolates originated from the Middle East (Ali et al., 2014), and isolates sampled from this region at different time points in the past should also be considered to unravel possible origins.
From this study, it cannot be confirmed that the South African isolate SA1 is closely related to the 6E16 pathotype found in Southern and Northern Europe (Enjalbert et al., 2005; Hovmøller et al., 2008). Although samples from the same regions and possibly the same time frame were considered, the samples did not overlap between the current study and the work of Enjalbert et al. (2005) and Hovmøller et al. (2008).

4.5 Conclusion

Based on genomic analysis, this study confirms the association between the South African and East African Pst populations previously proposed through pathotype analysis (Pretorius et al., 1997; Boshoff et al., 2002; Pretorius et al., 2007). In future, similar next-generation sequencing analysis of Central and Western Asian, Mediterranean and Middle Eastern isolates would fill in the missing information needed to draw parallels between the traditional marker work and the next-generation sequencing data analysis included in this work. From the samples analysed in this work, it was demonstrated that the South African isolates are closely related to one another, which supports the findings of the microsatellite marker work of Visser et al. (2016) that stepwise evolution is likely responsible for the consecutive pathotypes. This hypothesis is further assessed in Chapter 5, where polymorphisms in the South African isolates are analysed in search of the evolutionary changes that gave rise to subsequent pathotypes of Pst in South Africa.

Chapter 5

Analyses of Polymorphisms in Historical South African Pst Isolates in Search of Candidate Effector Genes

Many filamentous plant pathogens, such as Pst, use effector proteins to manipulate their hosts (Kamoun, 2007). These proteins also put the pathogen at risk of being recognised by the host via the resistance (R) proteins, leading to an incompatible interaction (Rovenich et al., 2014).
A change in amino acid sequence could lead to the host defence mechanisms not being able to recognise the pathogen. This inability results in a compatible interaction where the pathogen is virulent on host genotypes that were previously able to detect the attack and restrict or stop infection. In this study, Pst isolates collected from a wide geographical area were assessed using different clustering analysis methods to assign isolates to population clusters (discussed in Chapter 4). It was concluded that the historical South African isolates that were collected between 2001 and 2011 (Table 4.2) are closely related, while their closest relatives outside South Africa are isolates from East Africa. In this chapter, differences and similarities among these South African isolates were further explored, in particular to gain an understanding of how the different pathotypes became established in the population. Accordingly, a search for candidate genes that could be involved in the specific virulence of individual isolates was conducted using three approaches: i) polymorphisms in the genomes were evaluated to determine whether selection pressure could be detected, ii) the presence or absence of selected genes and the impact that such inclusion or exclusion could bring about was investigated, and iii) genes of interest with regard to virulence were identified through isolate-specific nonsynonymous polymorphisms in putative effector coding genes.

5.1 Introduction

To obtain nutrients from the host for its own development, Pst must grow infection structures able to bridge host structural barriers, while simultaneously trying to avoid recognition by the host’s molecular defence mechanisms (Garnica et al., 2014).
To achieve this, Pst, like other filamentous plant pathogens, makes use of a diverse set of proteins called effector proteins, which the pathogen uses to manipulate host metabolism to its own advantage in cases where it can escape the host’s ETI (see Section 2.3.1). These proteins have critical roles during the infection process and fulfil specific tasks with accurate timing at particular locations inside the host (Hogenhout et al., 2009; Stergiopoulos and de Wit, 2009).

Two major groups of effectors exist, namely apoplastic and cytoplasmic effectors. Among the apoplastic effectors are toxins and cell wall degrading proteins, which are important for necrotrophs, freeing up nutrients by degrading plant tissues. For hemibiotrophic and biotrophic pathogens, a more subtle approach is needed, in which the integrity of the host cell is preserved, allowing the pathogen to obtain nutrients from living tissues. These groups of pathogens rely more on intracellular effector proteins to modify the host cellular environment (Dou and Zhou, 2012; Stotz et al., 2014). Biotrophic fungi, like rusts, make use of haustoria to deliver fungal effectors into the plant’s living cells (Garnica et al., 2014).

Some genotypes of the host have the ability to recognise these cytoplasmic effector proteins. Recognition activates ETI, triggering a cascade of defence processes that reduce or completely halt ingress of the pathogen. Genetic changes within the plant, or elimination or modification of effector genes by the pathogen, can prevent recognition of pathogen invasion by the host’s defence system (Dodds et al., 2006). This underpins what is commonly known as resistance gene mediated, pathotype-specific resistance. This type of resistance leads to the classic “Boom-and-Bust” cycle described for R-Avr interactions in phytopathology (McDonald, 2004).
5.1.1 The importance of Pst variability

Isolates with a pathotype that enables the pathogen to remain undetected, or which is able to overcome plant defence systems, will become established in the population. In addition to gene flow, genetic recombination and mutation can introduce genetic variability within the population that enables Pst pathotypes to continue to evolve and overcome host resistance.

Genetic recombination can occur during sexual reproduction in the form of sexual recombination, or in asexual populations through somatic recombination. Somatic recombination is believed to be rare in Pst (Little and Manners, 1969); however, recent evidence from the stem rust gene AvrSr50 indicates somatic recombination as the mechanism by which Sr50 was overcome (Chen et al., 2017). This illustrates the potential significance of somatic recombination as a source of new variation. Sexual recombination in Pst requires an alternate host to wheat. Although Pst-susceptible Berberis and Mahonia species have been found in South Africa, providing the opportunity for sexual reproduction, infection by Pst has not been observed in nature (Visser et al., 2016). The apparent stepwise changes in virulence seen in South African Pst isolates further support the absence of sexual recombination in South African Pst populations, suggesting that variation in the Pst population in South Africa might be mostly due to mutations (Visser et al., 2016).

5.1.2 Mutations: causes, types and effects

Mutations occur naturally due to errors in DNA replication (Griffiths et al., 2015), spontaneous DNA lesions (Bienko et al., 2005) and by the action of mobile elements within the genome, called transposons (Klug, 2012). The mutation rate is the number of mutations that occur in a gene or organism in a given time period. Natural mutation rates vary between genes within an organism and differ across species (Drake et al., 1998; Scally, 2016).
In general, mutation rates are low in most organisms, but this depends on evolutionary forces, the life history of the organism and chance events (Drake et al., 1998). Agents called mutagens can accelerate the rate of mutation. A wide variety of mutagens exist, and they induce different types of mutations. Physical mutagens, such as radiation from the invisible light spectrum, can cause chromosomal aberrations, including chromosomal inversions, chromosomal arm deletions, duplications and repeat expansions. For example, ultraviolet light can cause various types of mutations, with distinct properties for each wavelength component: UVA (320–400 nm), UVB (280–320 nm), and UVC (200–280 nm) (Pfeifer et al., 2005). Some chemicals react directly with DNA. For example, ethyl methanesulfonate (EMS) and sodium azide induce SNPs in the form of random point mutations (Rao and Sears, 1964; Olsen et al., 1993). Mutagens can cause diseases such as cancer in mammals (Ames, 1979), but are also used in functional genomic studies to develop populations used in reverse genetic techniques such as targeting induced local lesions in genomes, or TILLING (Henikoff et al., 2004). A common challenge in these approaches is that many of the mutated individuals are in a compromised condition, highlighting that beneficial mutations are rare. Most mutations are either neutral or deleterious and are not conserved in the population (Kimura and Ohta, 1969).

In the absence of gene flow and genomic recombination, mutations are the main source of genetic variation. Natural selection removes harmful mutations from the population through a reduced ability of affected individuals to grow and reproduce. A carrier of a beneficial mutation will have enhanced fitness traits and will therefore be able to pass the mutation on to the next generation. Such a mutation will likely become fixed in the population (Hartl and Clark, 1998).
Mutations that are passed on to the next generation increase gene polymorphism, that is, the presence of multiple alleles of the same gene in the species (Salemi et al., 2009). In a deterministic model of evolution, changes in allele frequency depend on fitness and selection, assuming an infinitely large population size. The stochastic model, on the other hand, acknowledges the influence of genetic drift, which increases as the effective population size decreases. Depending on the phenotype of the mutation (whether it is advantageous, neutral or deleterious) and the effective population size, population evolution is more influenced by either drift or natural selection (Salemi et al., 2009).

Polymorphisms outside coding regions are not usually under strong selection pressure. However, SNPs that occur in intron splice sites of pre-mRNA may interfere with splicing during or shortly after transcription. This can lead to altered levels of mRNA, modified mRNA, or a complete shift in the reading frame. Additionally, in coding regions, synonymous SNPs can occasionally have functional consequences due to alterations in the structure and stability of the translated protein, but are generally considered to maintain the integrity and function of the protein. Nonsynonymous SNPs, however, result in amino acid changes, which can significantly change the protein. These SNPs can have an effect on the function of the resulting protein and the phenotype.

Mutations within genes

When a mutation results in a purine (nucleotides G and A) being substituted with another purine, or a pyrimidine (nucleotides C and T) with another pyrimidine, it is called a transition, while substitution of a purine with a pyrimidine, or vice versa, is called a transversion (Salemi et al., 2009).
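The definition above can be expressed as a minimal Python sketch. The function name `classify_substitution` is hypothetical and is used here only to illustrate the purine and pyrimidine rule; it is not part of the analysis scripts of this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-nucleotide change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES:
        raise ValueError("expected one of A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternative nucleotide are identical")
    # A transition stays within one chemical class (purine or pyrimidine);
    # a transversion swaps a purine for a pyrimidine or vice versa.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Count the possible directed substitutions of each kind
counts = {"transition": 0, "transversion": 0}
for ref in "ACGT":
    for alt in "ACGT":
        if ref != alt:
            counts[classify_substitution(ref, alt)] += 1
print(counts)
```

Enumerating all twelve possible directed substitutions in this way gives four transitions and eight transversions.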
Although there are twice as many possible transversion mutations as transitions, transitions are 10 times more common than transversions because of chemical and steric properties (Klug, 2012; Griffiths et al., 2015).

Mutations do not always result in a functional change in the protein encoded by the gene. A silent or synonymous mutation describes a codon change that does not alter the amino acid in the encoded protein due to degeneracy in the genetic code. Most synonymous mutations are considered to be selectively neutral, but may alter RNA secondary structure and stability (Salemi et al., 2009). In addition, tRNA molecules can vary in abundance, which is important for the success of translation. Mutations in genic regions can be missense or nonsense. Missense mutations are single point mutations that result in amino acid changes, while nonsense mutations introduce early stop codons that truncate proteins. Some consider conservative missense mutations, in which the new amino acid has similar chemical properties or structure (for instance, leucine and isoleucine, which are both aliphatic), to be synonymous. In that usage, a nonsynonymous mutation is one where the new codon specifies an amino acid with different chemical properties from the amino acid it replaces. In this study, silent mutations are regarded as synonymous mutations and missense and nonsense mutations as nonsynonymous mutations (Miyata and Yasunaga, 1980; Li et al., 1985; Nei and Gojobori, 1986).

Mutations can also interfere with gene expression if they occur in a promoter region of a gene or at the splice site of an intron. Mutations in these regions of the gene are not considered in this study.

5.1.3 Genomic approaches used to identify effectors

Effector annotation used in this chapter relied on the bioinformatics pipeline developed by Saunders et al. (2012).
The pipeline provides a basis for candidate effector gene identification. It first clusters secreted proteins into protein families and then classifies and ranks these protein families by their likelihood of being effectors. Using a modified version of this pipeline, Cantu et al. (2013) annotated the PST130 transcriptome, identifying genes encoding candidate effectors and ranking these to generate a top 100 tribe list that contained high priority candidate effector genes.

Due to the biotrophic nature of Pst and the infection structures it produces, effector proteins are likely to be secreted. Therefore, the pipeline first screened the predicted proteome for candidates with secretion signals. Markov clustering was then used to group secreted and non-secreted proteins into protein families using sequence similarity with secreted proteins. Thirdly, tribe annotation was carried out based on sequence homology, after which a search for conserved motifs was performed. Individual members of secreted protein families were annotated based on features they share with known effectors. Through hierarchical clustering of tribes, a priority list was compiled for functional validation of candidates that were most likely effectors.

In this chapter, the focus was on the investigation of SNPs found between the genomes of the four historical South African Pst pathotypes, with specific focus on the protein coding regions of predicted effector genes, to link specific Pst virulence profiles with nucleotide polymorphisms within these effector genes. The effector feature annotations, which rank protein tribes according to their probability of containing effectors, were used (Saunders et al., 2012; Cantu et al., 2013).
5.2 Materials and methods

The genomes of four South African historical isolates, representing the four pathotypes found in South Africa, were sequenced and mapped to the PST130 reference genome, and polymorphisms were identified, as described in Chapter 3.

5.2.1 SNP analysis

From the SAMtools mpileup files, which contain coverage information for each position, Perl and Python scripts were used to find SNPs with at least 10× depth of coverage and to identify homokaryotic and heterokaryotic SNPs (see Chapter 3).

SNP effect prediction

SnpEff software (version 3.6; Cingolani et al., 2012) was used to predict the effects of the polymorphisms and to investigate the frequency of transitions and transversions in the gene space. SnpEff distinguishes SNP location and type, including characterisation of nonsynonymous and synonymous SNPs in coding regions, and indicates introduced or lost stop codons, lost start codons and changes in splice sites and introns. For this analysis, a BED format file of each isolate’s SNP set was prepared using BEDTools (version 2.17.0; Quinlan and Hall, 2010) and the annotation information of the PST130 genome. The BED file was converted into a SnpEff input file using a Perl script. The predicted effects of SNPs in the gene space were evaluated with specific focus on the introduced stop codons and synonymous and nonsynonymous polymorphisms. Codon positions of SNP sites that introduced stop codons were evaluated, and the gene positions where stop codons occurred were considered to evaluate any biases that could indicate the effect on the resulting protein. The frequencies of specific nucleotide changes resulting in transitions and transversions were determined and further evaluated to determine biases in codon positions for specific nucleotide changes.
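To make the effect categories concrete, the following minimal Python sketch classifies a single-nucleotide change within one codon under the standard genetic code. It only illustrates the categories and is not SnpEff's implementation, which works from genome annotation. The function name `classify_codon_change` and its parameters are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, position, alt_base, is_first_codon=False):
    """Classify a SNP at codon position 1, 2 or 3 of ref_codon."""
    ref_codon, alt_base = ref_codon.upper(), alt_base.upper()
    if ref_codon not in CODON_TABLE or alt_base not in BASES:
        raise ValueError("codon and alternative base must use A, C, G and T")
    if position not in (1, 2, 3):
        raise ValueError("codon position must be 1, 2 or 3")
    if ref_codon[position - 1] == alt_base:
        raise ValueError("alternative base equals the reference base")
    alt_codon = ref_codon[: position - 1] + alt_base + ref_codon[position:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if is_first_codon and ref_codon == "ATG":
        return "start lost"      # any change to the initiating ATG
    if ref_aa != "*" and alt_aa == "*":
        return "stop gained"     # nonsense; a nonsynonymous change in this study
    if ref_aa == "*" and alt_aa != "*":
        return "stop lost"
    if ref_aa == alt_aa:
        return "synonymous"      # silent change
    return "nonsynonymous"       # missense change


print(classify_codon_change("CAA", 1, "T"))  # CAA (Gln) to TAA
print(classify_codon_change("TTA", 3, "G"))  # TTA (Leu) to TTG (Leu)
print(classify_codon_change("CTT", 2, "C"))  # CTT (Leu) to CCT (Pro)
```

The three example calls cover a stop-gaining change, a silent change and a missense change, in that order.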
5.2.2 Positive selection

The program Yn00 (Yang and Nielsen, 2000), which is part of the PAML package (Yang, 2007), was used to assess genetic diversity through polymorphism and positive selection analysis using the synthetic genes described in Chapter 3. A pairwise comparison that yielded a nonsynonymous substitution rate (dN) of more than zero indicated a polymorphic gene, while positive selection was considered when the dN/dS value, the ratio of the nonsynonymous to the synonymous substitution rate (also called the omega value), was more than one. Perl scripts were used to enable the automated use of Yn00 on the PST130 gene set (Cantu et al., 2013).

5.2.3 Presence-absence analysis

Unique presence and absence of genes were investigated to identify possible associations between specific genes and a gain in virulence in the four South African isolates. The read coverage of each gene was calculated using BEDTools (version 2.17.0; Quinlan and Hall, 2010). Genes with zero coverage were considered absent from the specific isolate (Cantu et al., 2013). The nucleotide and amino acid sequences of these genes were used to query publicly available databases using the basic local alignment search tool (BLAST version 2.6.0; Altschul et al., 1997) to find homologous genes in related species and orthologs in the PST130 reference genome.

5.2.4 Comparisons of nonsynonymous SNP sites between isolates

An additional method was used to investigate polymorphisms across isolates. Polymorphic sites predicted to cause nonsynonymous changes were identified and nucleotides at these positions, across isolates, were compared in a pairwise manner.
The number of nucleotide sites at which two isolates differed was used as a distance statistic in an unweighted pair group method with arithmetic mean (UPGMA) tree, indicating the relationship of isolates to one another in terms of the number of nonsynonymous changes. The list of genes showing differences in each pairwise comparison was compared to the list of candidate effector genes and the list of secreted proteins generated by Cantu et al. (2013). These lists were generated as described in Section 5.1.

5.2.5 Multiple sequence alignments to visualise biallelic SNPs

A custom Python script was developed to visualise translated proteins of candidate genes, indicating the presence of alternative amino acids due to nonsynonymous polymorphisms. Where coverage was lower than 2× at nonpolymorphic sites or 10× at polymorphic sites, the genome was inspected manually using Integrative Genomics Viewer (IGV version 2.3.91; Thorvaldsdóttir et al., 2013). In cases where the low coverage sequence was the same as in the other South African isolates, these nucleotide sequences were included in the figure, but indicated with lighter shading. Blank spaces indicate isolates with no sequence information. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), indicating specific categories.

5.3 Results

5.3.1 SNP identification in the genomes of the historical South African isolates

Polymorphism data provide information on how a population is evolving. After filtering the Illumina paired-end reads and independently mapping each of the four South African isolates to the PST130 draft reference genome (as described in Chapter 3), SNPs were identified across the whole genome using SAMtools mpileup. Variant sites were only taken into account where a coverage depth of 10 reads or more was seen.
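As a worked illustration of how SNP frequency (% of reference sites) and SNP density (SNPs/kb) are derived from SNP counts, the following minimal Python sketch uses the per-isolate counts listed in Table 5.1. The variable and function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Counts as listed in Table 5.1
REFERENCE_SITES = 64_782_816  # PST130 reference sites
TOTAL_SNPS = {"SA1": 378_259, "SA2": 324_200, "SA3": 414_489, "SA4": 501_728}
HOMOKARYOTIC_SNPS = {"SA1": 25_975, "SA2": 22_788, "SA3": 28_853, "SA4": 36_588}


def percent_of_reference(n_snps):
    """SNP sites as a percentage of all reference sites."""
    return 100 * n_snps / REFERENCE_SITES


def snps_per_kb(n_snps):
    """SNP density per 1000 reference sites."""
    return 1000 * n_snps / REFERENCE_SITES


density = {iso: snps_per_kb(n) for iso, n in TOTAL_SNPS.items()}
homokaryotic_share = {
    iso: 100 * HOMOKARYOTIC_SNPS[iso] / TOTAL_SNPS[iso] for iso in TOTAL_SNPS
}

for iso in TOTAL_SNPS:
    print(
        f"{iso}: {percent_of_reference(TOTAL_SNPS[iso]):.2f} % of reference, "
        f"{density[iso]:.2f} SNPs/kb, "
        f"{homokaryotic_share[iso]:.2f} % homokaryotic"
    )

values = list(density.values())
# Sample standard deviation (n - 1 denominator), assumed here
print(f"mean density {mean(values):.2f} SNPs/kb, SD {stdev(values):.2f}")
```

Rounded to two decimals, these calculations reproduce the per-isolate percentages and densities and the average and standard deviation of the total SNP density given in Table 5.1.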
The four isolates displayed similar SNP frequencies, with 0.62 ± 0.12 % of the genomes containing polymorphisms when compared to the PST130 reference, resulting in an average rate of heterozygosity of 6.25 ± 1.15 SNPs/kbp. Heterokaryotic SNPs were polymorphic to the reference, being biallelic or multiallelic, while homokaryotic SNPs were monoallelic. Heterokaryotic SNPs were in the majority and averaged 92.96 ± 0.18 % of all variant sites across the four isolates, with a SNP density of 5.81 ± 1.06 SNPs/kbp, a high number compared with the 1.51 SNPs/kbp found in Melampsora larici-populina Kleb., the sexually reproducing poplar rust fungus (Persoons et al., 2014). The remaining 7.04 ± 0.18 % of variant sites comprised homokaryotic sites occurring at a frequency of 0.44 ± 0.09 SNPs/kbp (Table 5.1).

Determining the genetic impact of polymorphisms

Information regarding polymorphisms in genes can be used to determine the impact of the variant on the resulting protein. Identifying the nature and location of SNPs shows how the pathogen changes on the genetic level, including changes related to its pathogenicity phenotype. To determine the nature and genome position of polymorphisms, the SNPs identified in SA1 to SA4 (Table 5.1) were
annotated using SnpEff. Across the four isolates, 29.93 ± 0.20 % of SNPs were within genes, of which 52.74 % resulted in synonymous substitutions, while 47.26 % represented nonsynonymous substitutions. Loss or gain of start and stop codons can also have major effects on translation, resulting in complete loss of translation or truncated peptides. Table 5.2 describes the major predicted effects of polymorphisms in genic regions in the four South African isolates.

Table 5.1: Homokaryotic and heterokaryotic SNPs in the South African isolates

All SNPs
Isolate              PST130 reference sites   Total number of SNPs   % of reference   SNPs/kb
SA1                  64 782 816               378 259                0.58             5.84
SA2                  64 782 816               324 200                0.50             5.00
SA3                  64 782 816               414 489                0.64             6.40
SA4                  64 782 816               501 728                0.77             7.74
Average                                                              0.62             6.25
Standard deviation                                                   0.12             1.15

Homokaryotic SNPs (monoallelic; one alternative allele)
Isolate              Number    %       SNPs/kbp
SA1                  25 975    6.87    0.40
SA2                  22 788    7.03    0.35
SA3                  28 853    6.96    0.45
SA4                  36 588    7.29    0.56
Average                        7.04    0.44
Standard deviation             0.18    0.09

Heterokaryotic SNPs
Isolate              Biallelic (one alternative allele)   Biallelic (two alternative alleles)   Multiallelic (three or four alternative alleles)   Total number   %       SNPs/kbp
SA1                  351 719                              228                                   337                                                352 284        93.13   5.44
SA2                  300 839                              211                                   362                                                301 412        92.97   4.65
SA3                  384 958                              275                                   403                                                385 636        93.04   5.95
SA4                  464 344                              334                                   462                                                465 140        92.71   7.18
Average                                                                                                                                                           92.96   5.81
Standard deviation                                                                                                                                                0.18    1.06

Table 5.2: The number of SNPs identified in coding regions of the four South African Pst isolates

Location of polymorphism
Isolate   Synonymous coding   Nonsynonymous coding   Stop gained
SA1       58 868              52 499                 3 347
SA2       50 008              44 829                 2 933
SA3       59 595              53 481                 3 380
SA4       71 140              64 278                 3 992

Between about 3000 and 4000 SNPs per isolate resulted in stop codons (Table 5.2). The three stop codons are TAA, TAG, and TGA. C to T mutations in the first codon position often introduce stop codons in the gene space (Hane and Oliver, 2010).
In the second and third codon positions, SNP sites where changes to an A or G occur are responsible for the introduction of stop codons. The majority (99.4 %) of SNPs that introduced stop codons were biallelic/heterokaryotic. G to Y (C or T) mutations occurred most frequently (29.2 %), followed by C to R (A or G) at 17.5 % at the second codon position and 14.7 % at the third codon position. Biases in SNP type at codon positions are shown in Figure 5.1. Patterns of nucleotide changes were conserved between isolates.

To identify the impact of introduced stop codons, the gene positions where a stop codon was introduced were evaluated for possible patterns in occurrence (Figure 5.2). No distinct trend was observed, and it appears that stop codons are introduced with no particular preference, randomly appearing in the gene.

[Figure 5.1, plot not recoverable from extraction. Panels: a) Monoallelic SNP sites introducing stop codons; b) Biallelic SNP sites introducing stop codons. x-axis: nucleotide change at codon position; y-axis: SNP count; one series per isolate (SA1 to SA4). IUPAC key: K, G or T; M, A or C; R, A or G; S, G or C; W, A or T; Y, C or T.]

Figure 5.1: Nucleotide changes that introduced stop codons were highly conserved between isolates. A small number of monoallelic SNPs (0.6 %) were responsible (a), but 99.4 % of stop codons were introduced at biallelic SNP positions (b). Numbers indicate codon positions 1, 2 or 3. Nucleotide changes are indicated underneath the codon position, the first nucleotide indicating the reference nucleotide and the second, the polymorphism nucleotide(s).

[Figure 5.2, plot not recoverable from extraction. One panel per isolate (SA1 to SA4). x-axis: proportion of gene retained (0.00 to 1.00); y-axis: number of genes.]

Figure 5.2: Distribution of introduced stop codons across all genes per isolate. The bar charts show the number of genes with a specific gene proportion retained after a stop codon was introduced.
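The codon-position pattern of stop-introducing changes follows from the three stop codons themselves. As a minimal sketch, assuming the standard genetic code, the following Python code enumerates every single-nucleotide change that converts a sense codon into TAA, TAG or TGA and tallies the changes by codon position. The function name `stop_gaining_changes` is hypothetical, and the enumeration covers only single-allele changes, not the biallelic IUPAC codes plotted in Figure 5.1.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_gaining_changes():
    """List (codon position, reference base, alternative base) for every
    single-nucleotide change that turns a sense codon into a stop codon."""
    changes = []
    for codon in ("".join(c) for c in product(BASES, repeat=3)):
        if codon in STOP_CODONS:
            continue  # only changes starting from a sense codon are of interest
        for index in range(3):
            for alt in BASES:
                if alt == codon[index]:
                    continue
                mutant = codon[:index] + alt + codon[index + 1:]
                if mutant in STOP_CODONS:
                    changes.append((index + 1, codon[index], alt))
    return changes


changes = stop_gaining_changes()
by_position = Counter(position for position, _, _ in changes)
alt_bases = {
    position: sorted({alt for p, _, alt in changes if p == position})
    for position in (1, 2, 3)
}
for position in (1, 2, 3):
    print(f"codon position {position}: {by_position[position]} changes, "
          f"alternative bases {alt_bases[position]}")
```

Under these assumptions, every stop-introducing change at the first codon position is a change to T, whereas at the second and third codon positions the new base is always A or G, in agreement with the pattern described above.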
Frequency of transitions and transversions at polymorphic sites

The SnpEff information was used to determine whether mutations represented transversions or transitions. As expected, more transitions than transversions occurred at SNP sites. At synonymous SNP sites, C to T transitions were most common, while A to G transitions occurred most frequently in nonsynonymous SNPs at homokaryotic sites. At homokaryotic SNP sites, synonymous substitutions displayed a transition to transversion ratio of 2 : 1, while a 3.5 : 1 ratio was observed for nonsynonymous substitutions (Figures 5.3 and 5.4). Similar to the finding in Figure 5.1, Figures 5.5 and 5.6 indicated conserved patterns in the specific nucleotide changes at codon positions 1, 2 and 3.

5.3.2 Assessment of polymorphisms to detect positive selection

These SNP data reveal information about how the population is evolving. Highly polymorphic genes are more likely to be linked with improved fitness and to be under positive selection. The dN/dS statistic, which assesses the ratio of nonsynonymous polymorphisms to synonymous polymorphisms, was evaluated to identify genes that are under selection. The term “dN” describes nonsynonymous polymorphisms that replace an amino acid and “dS” describes synonymous polymorphisms where the amino acid remains unchanged. SNPs within all genes annotated within the PST130 reference genome (18 023 genes) were compared in a pairwise isolate analysis. It is commonly expected that synonymous sites will evolve more neutrally and that changes in allele frequencies would be due to random chance (genetic drift).
In contrast, a polymorphism that affects fitness will evolve more rapidly due to its selective advantage. Synthetic consensus genes were created for each isolate that incorporated SNPs that had a 10× or higher coverage and where nonpolymorphic sites had at least 2× coverage of the PST130 reference gene (see Section 3.3.5). Pairwise isolate comparisons of each consensus gene were carried out using the YN00 program in the PAML package. Pairwise comparisons yielding positive dN values indicated that the specific gene under investigation was polymorphic between the two isolates. Alternatively, where positive dS values were obtained, genes were considered to have evolved more neutrally.

[Figure 5.3: percentage frequency matrices, reference nucleotide by alternative nucleotide, one matrix each for synonymous and nonsynonymous substitutions; cell values are percentage ± standard deviation.]

Figure 5.3: Percentage frequency matrices of transitions and transversions at monoallelic SNP sites. In both synonymous and nonsynonymous substitutions, transitions were more frequent compared to transversions. Darker red indicates a higher percentage and darker blue a higher standard deviation.

[Figure 5.4: percentage occurrence matrices, reference nucleotide by IUPAC code of the biallelic site, one matrix each for synonymous and nonsynonymous substitutions; cell values are percentage ± standard deviation.]

Figure 5.4: Percentage occurrence matrices of transitions and transversions at biallelic SNP sites. Biallelic SNP sites showed a high transition frequency of 14 % to 15 % for C and T to Y (C or T), and 8.5 % for A and G to R (A or G) at synonymous sites. For nonsynonymous sites transition occurrences were still fairly high with an average of 6.84 % across all possible transitions. However, transversion occurrences were more frequent at 11.98 %. Darker red indicates a higher percentage and darker blue a higher standard deviation.

[Figure 5.5 panels: Homokaryotic nonsynonymous SNPs; Homokaryotic synonymous SNPs. x-axis: nucleotide change (A−C, A−G, A−T, C−A, C−G, C−T, G−A, G−C, G−T, T−A, T−C, T−G) at codon positions 1, 2 and 3; y-axis: SNP count; series: isolates SA1 to SA4.]

Figure 5.5: Codon positions of nucleotide changes at homokaryotic SNP sites explained broadly in terms of transitions and transversion in Figures 5.3 and 5.4.

[Figure 5.6 panels: Heterokaryotic nonsynonymous SNPs; Heterokaryotic synonymous SNPs. x-axis: nucleotide change from reference base (A, C, G, T) to IUPAC code (K, M, R, S, W, Y) at codon positions 1, 2 and 3; y-axis: SNP count; series: isolates SA1 to SA4. IUPAC key: K, G or T; M, A or C; R, A or G; S, G or C; W, A or T; Y, C or T.]

Figure 5.6: Codon positions of nucleotide changes at heterokaryotic SNP sites explained broadly in terms of transitions and transversion in Figures 5.3 and 5.4.

No signals for positive selection were detected, as no dN/dS values, also known as omega values, of greater than 1.0 were observed. Only seven genes were given a positive dN value in the pairwise comparisons of the South African isolates, while positive dS values were computed for two genes. There were no genes in common and therefore all dN/dS values were either zero or undefined. These nine genes (Tables 5.3 and 5.4) were not investigated further as they did not display characteristics of genes coding for secreted proteins or putative effectors, as identified in the lists reported by Cantu et al.
(2013), and were therefore not considered likely candidates for pathogenicity factors.

5.3.3 Presence or absence of genes

Elimination of an effector gene and its resulting protein could help the pathogen to escape host recognition. Similarly, specific genes may enhance the pathogenicity and reproducibility of the pathogen. Therefore, in addition to point mutations, inclusion or exclusion of entire genes was also assessed to look for associations of genes with virulence phenotypes. Screening for genes in the PST130 reference genome that were not covered by read sequences from the South African isolates identified 211 genes that were absent in all four South African isolates. In addition, there were 36 genes that were absent in three or fewer of the South African isolates, in different combinations, that were present in the reference genome of PST130 (Table 5.5).

Table 5.3: Polymorphic genes with positive dN values indicating nonsynonymous changes in isolate pairwise comparisons

Gene           SA1 vs SA2        SA1 vs SA3        SA2 vs SA3        SA1 vs SA4        SA2 vs SA4        SA3 vs SA4
PST130_03694   0                 0                 0                 0.0009 ± 0.0009   0.0009 ± 0.0009   0.0009 ± 0.0009
PST130_07979   0.0002 ± 0.0002   0                 0.0002 ± 0.0002   0                 0.0002 ± 0.0002   0
PST130_09146   0                 0                 0                 0.0013 ± 0.0013   0.0013 ± 0.0013   0.0013 ± 0.0013
PST130_10326   0                 0.0013 ± 0.0013   0.0013 ± 0.0013   0.0013 ± 0.0013   0.0013 ± 0.0013   0
PST130_10374   0.0036 ± 0.0036   0                 0.0036 ± 0.0036   0                 0.0036 ± 0.0036   0
PST130_11223   0                 0                 0                 0.0017 ± 0.0017   0.0017 ± 0.0017   0.0017 ± 0.0017
PST130_17618   0                 0                 0                 0.0006 ± 0.0006   0.0006 ± 0.0006   0.0006 ± 0.0006

Table 5.4: Polymorphic genes with positive dS values indicating synonymous changes in isolate pairwise comparisons

Gene           SA1 vs SA2        SA1 vs SA3        SA2 vs SA3        SA1 vs SA4        SA2 vs SA4        SA3 vs SA4
PST130_00923   0.0037 ± 0.0037   0                 0.0037 ± 0.0037   0                 0.0037 ± 0.0037   0
PST130_04022   0                 0                 0                 0.0015 ± 0.0015   0.0015 ± 0.0015   0.0015 ± 0.0015

Table 5.5: Number of absent genes in the four South African Pst pathotypes. This includes a total of 247 genes, where 211 genes were absent in all four isolates and 36 genes were absent in one to three isolates

Isolate   Pathotype   Number of absent genes
SA1       6E16A-      211 + 11
SA2       6E22A-      211 + 13
SA3       7E22A-      211 + 19
SA4       6E22A+      211 + 18

Figure 5.7 displays genes that are absent in the South African isolates. Presence-absence genes may be involved in virulence of the pathogen; however, none of these genes was on the list of putative effector genes (Cantu et al., 2013). The number of genes absent in a single isolate increased with the increase in virulence. (See Appendix B, Table B.1, for gene names of the 211 genes that were absent in all four South African isolates.)

A BLAST search against the National Center for Biotechnology Information (NCBI) non-redundant nucleotide databases using default parameters revealed homology in other plant pathogens for eight of the 211 genes absent in all South African isolates (Table 5.6). Investigations of the functionality of the Pgt homologs were undertaken, and characteristics are listed in Appendix B, Section B.2.

As redundancy often exists in genomes of filamentous plant pathogens (Dangl and Jones, 2001), a BLAST search of the 211 genes against the PST130 transcriptome was performed. Of the 211 genes, 152 had one or more potential paralogs within the PST130 genome (Table 5.7).

Table 5.6: Potential orthologs of genes absent in all four South African isolates.
All orthologs identified were from fungi, besides one ortholog from the oomycete, Albugo laibachii

PST130 gene    Homolog                                                              PST130 gene length   Match length
PST130_00159   Pgt isoleucyl-tRNA synthetase (PGTG_09131), mRNA                     252                  252
PST130_07080   Pgt hypothetical protein (PGTG_01952), mRNA                          252                  133
PST130_16763   Pgt hypothetical protein (PGTG_02128), mRNA                          798                  431
PST130_17182   Pgt hypothetical protein (PGTG_02971), mRNA                          270                  247
PST130_17354   Pgt hypothetical protein (PGTG_20899), mRNA                          1 188                345
               Pgt glycogen [starch] synthase (PGTG_07651), mRNA                    1 188                562
PST130_17620   Pgt hypothetical protein (PGTG_15464), mRNA                          141                  136
PST130_17815   Pgt 1,3-beta-glucan synthase component FKS1 (PGTG_00125), mRNA       666                  535
PST130_06262   Albugo laibachii Nc14, genomic contig CONTIG_2252_NC14_v4_941_117    210                  175
               Rhynchosporium orthosporum mitochondrion, complete genome            210                  196
               Rhynchosporium secalis mitochondrion, complete genome                210                  196
               Rhynchosporium commune mitochondrion, complete genome                210                  196
               Rhynchosporium agropyri mitochondrion, complete genome               210                  196

Table 5.7: The number of potential paralogs identified in genes absent in all four South African isolates

Number of genes   Number of potential paralogs in PST130
107               1
22                2
9                 3
9                 4
2                 7
2                 5
1                 10

In the group of 211 genes that were absent in all four South African isolates, only five genes encoded secreted proteins according to the lists in Cantu et al. (2013). They were PST130_01946, PST130_03059, PST130_03060, PST130_06608 and PST130_08220. These five genes returned no hits in a BLAST search against NCBI non-redundant nucleotide databases. Two of the genes, PST130_01946 and PST130_03059, had potential paralogs within the PST130 transcriptome with higher than 80 % identity and E-values lower than 0.01 (Table 5.8).
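The paralog criterion used here (more than 80 % identity and an E-value below 0.01) can be expressed as a simple filter over tabular BLAST hits. The sketch below is illustrative only: the rows are taken from Table 5.8 in the column order shown there, the final row is a hypothetical hit added to show a rejection, and the function name and defaults are assumptions rather than the commands used in this study.

```python
# Rows in the column order of Table 5.8:
# qseqid sseqid %identity length mismatch gaps evalue bitscore
TABLE_5_8 = """
PST130_01946 PST130_00235 95.745 329 11 1 1.04E-15 527
PST130_01946 PST130_10569 89.362 235 25 0 3.21E-80 296
PST130_01946 PST130_08196 87.179 234 30 0 2.51E-71 267
PST130_01946 PST130_11479 87.342 158 20 0 9.36E-46 182
PST130_03059 PST130_02767 92.958 142 9 1 7.54E-53 206
PST130_03059 HYPOTHETICAL_GENE 71.000 90 26 0 5.00E-01 30
"""  # last row is hypothetical, not from the thesis


def potential_paralogs(text, min_identity=80.0, max_evalue=0.01):
    """Group subject IDs by query ID for hits passing both thresholds."""
    paralogs = {}
    for line in text.strip().splitlines():
        qseqid, sseqid, pident, _length, _mismatch, _gaps, evalue, _bits = line.split()
        if qseqid == sseqid:
            continue  # a gene matching itself is not a paralog
        if float(pident) > min_identity and float(evalue) < max_evalue:
            paralogs.setdefault(qseqid, []).append(sseqid)
    return paralogs


paralogs = potential_paralogs(TABLE_5_8)
for gene, hits in paralogs.items():
    print(gene, len(hits), hits)
```

Applied to these rows, the filter keeps four subject sequences for PST130_01946 and one for PST130_03059, and drops the hypothetical low-identity hit.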
The PST130 paralogs identified in these BLAST hits did not appear in the original list of 247 genes absent across the four South African isolates and therefore were present in the South African isolates. PST130_01946 had four paralogs, while PST130_03059 had one paralog, highlighting the occurrence of redundancy in the Pst genome that could be the result of duplication events.

Table 5.8: Potential paralogs of genes absent in the four South African isolates

qseqid         sseqid         % Identity   Length   Mismatch   Gaps   E-value    Bit score
PST130_01946   PST130_00235   95.745       329      11         1      1.04E-15   527
               PST130_10569   89.362       235      25         0      3.21E-80   296
               PST130_08196   87.179       234      30         0      2.51E-71   267
               PST130_11479   87.342       158      20         0      9.36E-46   182
PST130_03059   PST130_02767   92.958       142      9          1      7.54E-53   206

qseqid, query sequence ID; sseqid, subject sequence ID

[Figure 5.7: Venn diagram of genes absent in one, two or three of the isolates SA1 to SA4, with the 211 genes absent in all four isolates at the centre. Genes labelled in the diagram: PST130_00111, PST130_00758, PST130_00826, PST130_01120, PST130_01245, PST130_01450, PST130_01754, PST130_01827, PST130_03002, PST130_03318, PST130_03509, PST130_03983, PST130_04061, PST130_04241, PST130_04442, PST130_04996, PST130_05182, PST130_07666, PST130_08345, PST130_09396, PST130_10076, PST130_10298, PST130_12228, PST130_12299, PST130_12309, PST130_13177, PST130_13389, PST130_14325, PST130_14450, PST130_14553, PST130_14554, PST130_15299, PST130_16847, PST130_16907, PST130_17504, PST130_17608.]

Figure 5.7: Presence-absence analysis revealed 211 genes absent in all four South African isolates and an additional 36 genes absent in some isolates.

Of the 36 genes that were absent in three or fewer of the South African isolates (Figure 5.7), three had highly similar nucleotide sequences in NCBI non-redundant nucleotide databases with more than 80 % identity and E-values smaller than 0.01 in BLAST searches (Table 5.9).
PST130_00758, PST130_08345 (only present in SA4) and PST130_12299 (only present in SA3) had hits with PGTG_02401, PGTG_03886 and PGTG_14583, respectively. However, these three Pgt proteins are uncharacterised to date. Conserved domains of the Pgt proteins are listed in Appendix B, Section B.2.

Table 5.9: Potential orthologs of genes absent in three or fewer of the South African isolates

PST130 gene    Homolog                                    Species
PST130_00758   Pgt hypothetical protein (PGTG_02401)      Fungi
PST130_08345   Pgt hypothetical protein (PGTG_03886)      Fungi
PST130_12299   Pgt hypothetical protein (PGTG_14583)      Fungi

Of the 36 genes absent in three or fewer of the isolates, nine genes were present in only one of the South African isolates. These nine genes included three genes in SA1: PST130_03002, PST130_13389 and PST130_14325; one in SA2: PST130_14553; two in SA3: PST130_00111 and PST130_12299; and three genes in SA4: PST130_03509, PST130_08345 and PST130_14450. Notable BLAST hits for two of these genes, PST130_12299 and PST130_08345, were obtained showing high similarity with Pgt genes as shown in Table 5.9, where they were identified according to their absence in one or more of the isolates. Conserved domains are listed in Appendix B, Section B.2. Of these 36 genes absent in three or fewer isolates, 24 displayed potential paralogs in the PST130 genome (Table 5.10).

Table 5.10: Number of potential paralogs in PST130. Of the 36 genes that were absent in three or fewer of the South African isolates, 24 had potential paralogs in the PST130 genome.
All potential paralog genes were present in all isolates.

Number of genes   Number of potential paralogs in PST130
14                1
3                 3
3                 2
2                 4
1                 7
1                 10

Two potential paralogs were identified in the PST130 genome for PST130_00111 (SA1) and one for PST130_03002 (SA1), PST130_14325 (SA1), PST130_12299 (SA3) and PST130_08345 (SA4), as summarised in Table 5.11. To investigate possible functions of the present and absent genes, functional annotation of possible orthologs was assessed (see Appendix B, Section B.2).

Table 5.11: Paralogs of genes that only occurred in one of the South African isolates

qseqid         sseqid         % Identity   Length   Mismatch   Gaps   E-value     Bit score
PST130_00111   PST130_12514   95.78        332      14         0      2.00E-156   536
               PST130_15801   92.77        332      24         0      3.00E-140   481
PST130_03002   PST130_08845   89.36        235      23         2      2.00E-84    294
PST130_08345   PST130_11503   96.73        275      9          0      2.00E-133   459
PST130_12299   PST130_05481   96.21        396      15         0      0           649
PST130_14325   PST130_00979   95.93        246      10         0      1.00E-115   399

qseqid, query sequence ID; sseqid, subject sequence ID

5.3.4 Investigation of candidate genes that are likely to experience evolutionary changes

By comparing heterokaryotic SNPs in the four South African isolates in a pairwise manner, all genes with unique nonsynonymous changes in the four South African isolates were identified. It was found that the number of genes with nonsynonymous mutations increased with an increase in virulences, as indicated by the UPGMA dendrogram in Figure 5.8(a). This supports the previous hypothesis of stepwise evolution, with each pathotype derived from the preceding pathotype through single-step mutation events (Visser et al., 2016).

Nonsynonymous heterokaryotic biallelic SNPs that differed between isolates (11 185 SNPs) were observed in 2689 genes. According to the gene annotation of Cantu et al.
(2013), 138 of these were predicted to encode secreted proteins (613 SNPs), of which 27 were putative effector proteins (106 SNPs) that could be involved in the specific virulence phenotypes of the four South African Pst pathotypes. Figures 5.8 (b), (c) and (d) display the pairwise comparison of isolates, with the number of genes that show nonsynonymous SNPs in each gene set comparison, for the proteomes, secretomes and effectomes.

[Figure 5.8 panels: (a) Distance tree of SA1 to SA4, distance axis 0 to 1200; (b) Proteome; (c) Secretome; (d) Effectome. Heat map rows SA3, SA2, SA1 and columns SA2, SA3, SA4; row values as printed: Proteome, SA3: 1045; SA2: 1095, 1333; SA1: 912, 924, 1084. Secretome, SA3: 53; SA2: 53, 75; SA1: 44, 40, 49. Effectome, SA3: 12; SA2: 7, 11; SA1: 10, 9, 9.]

Figure 5.8: Nonsynonymous SNPs in the gene space of the four South African isolates increase over time and with increasing virulence. The branch lengths of the UPGMA distance tree (a) are derived from the distance matrix in (b) and illustrate the progressive accumulation of genes with nonsynonymous mutations over time as new pathotypes developed, given that the population evolved stepwise through mutations. Heat maps indicate frequencies of unique nonsynonymous substitutions in the Pst proteomes (b), secretomes (c) and effectomes (d). UPGMA, unweighted pair group method with arithmetic mean.

5.3.5 Candidate effectors with sequence polymorphisms between the South African isolates

After applying the three assessment methods (positive selection analysis, presence-absence analysis and nonsynonymous polymorphism analysis) to the polymorphic datasets, only genes that were members of the top 100 ranking protein families for effectors, as described in Cantu et al. (2013), were considered for further investigation to identify candidate genes that could explain gain-of-virulence. The justification for the selection of these 27 candidate genes (Figure 5.8) is shown in Section 6.2, Table 6.1. As an example, Figure 5.9 illustrates five nonsynonymous changes due to heterokaryotic SNPs in one of the 27 candidate genes, PST130_00285.
Please consult Appendix B, Section B.3, for changes in the remaining 26 genes. A molecular analysis, focussed on a selection of the 27 polymorphic candidate effector genes that were identified, was the next step of investigation and is reported in Chapter 6.

[Figure 5.9: translated sequence alignment of PST130_00285 for isolates SA1 to SA4, shown in five blocks covering residues 1 to 45, 46 to 90, 91 to 135, 136 to 180 and 181 to 208.]

Figure 5.9: Translated sequence alignment of gene PST130_00285. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the lower diagonal triangles. Please consult the appendix for the sequence alignments of the remaining 26 candidates. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

5.4 Discussion

The present study used the gene models developed for the PST130 draft genome sequence (Cantu et al., 2011). These gene models have been further assessed for various effector features to create a subset of genes that could likely be involved in pathogenicity (Cantu et al., 2013).

In a clonal population, mutations are the main source of genetic variation. In this study, the focus was on point mutations causing SNPs; other DNA aberrations were not investigated. Characterisation of SNPs was undertaken to understand how the pathogen changes at the genetic level to achieve changes in its pathogenicity phenotype. SNPs that result in nonsynonymous amino acid changes present an allelic pool of protein variation upon which selection pressures can act, leading to changes in allelic frequencies within the pathogen population.

5.4.1 Polymorphic sites

SNP analysis showed a higher frequency of SNPs in isolate SA4 compared to isolates SA1, SA2 and SA3.
This is expected as the biggest time span between the collection of these isolates was between SA3 and SA4 (seven years), while only one to two years passed between collection of SA1 to SA3, and progressive accumulation of mutations is expected over time (Salemi et al., 2009). The density at which homokaryotic (0.44 ± 0.09 SNPs/kbp) and heterokaryotic (5.81 ± 1.06 SNPs/kbp) SNPs occurred in the South African isolates mapped against the PST130 reference was comparable to SNP densities described by Cantu et al. (2013). The authors investigated five isolates with distinct virulence profiles, two from the UK and three from the USA. These displayed a homokaryotic SNP density of 0.41 ± 0.28 SNPs/kbp and a heterokaryotic SNP density of 5.29 ± 2.23 SNPs/kbp (Cantu et al., 2013). Using methods similar to those of Cantu et al. (2013), Kiran et al. (2017) reported SNP densities of 1.90 ± 1.27 SNPs/kbp at homokaryotic sites and 4.67 ± 1.17 SNPs/kbp at heterokaryotic sites for three Indian isolates from different epidemiological regions, sequenced and mapped against each other.

[Figure 5.10: schematic of mapped reads aligned to the reference genome. Labels: identical sites; variant sites (SNPs); single isolates; reference genome; mapped reads. Key: false positive SNP; false negative site; marker indicating the allele that was retained in the consensus reference sequence.]

Figure 5.10: Over- and underestimates of SNP sites. Overestimation of heterokaryotic SNP sites is indicated with a green star and underestimation of homokaryotic SNP sites with a pink star. These misinterpretations occur due to unphased reference genomes (adapted from Cantu et al., 2013).

An average rate of heterozygosity of 6.25 ± 1.15 SNPs/kbp was computed in the South African isolates. This is slightly higher than the average between PST-21 (USA), PST-43 (USA), PST130 (USA), PST-87/7 (UK) and PST-08/21 (UK) (5.70 ± 2.47 SNPs/kbp) (Cantu et al., 2013).
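The SNP densities quoted in this section can be recomputed from the counts in Table 5.1. The following is a minimal sketch, assuming that density is the SNP count divided by the number of PST130 reference sites, scaled to 1000 bases, and that the reported spread is the sample standard deviation across the four isolates. The variable and function names are illustrative.

```python
from statistics import mean, stdev

REFERENCE_SITES = 64_782_816  # PST130 reference sites (Table 5.1)

# isolate: (homokaryotic SNPs, heterokaryotic SNPs), from Table 5.1
SNP_COUNTS = {
    "SA1": (25_975, 352_284),
    "SA2": (22_788, 301_412),
    "SA3": (28_853, 385_636),
    "SA4": (36_588, 465_140),
}


def per_kbp(count, sites=REFERENCE_SITES):
    """SNP density in SNPs per 1000 reference bases."""
    return count / sites * 1000


homo = [per_kbp(h) for h, _ in SNP_COUNTS.values()]
het = [per_kbp(t) for _, t in SNP_COUNTS.values()]
total = [per_kbp(h + t) for h, t in SNP_COUNTS.values()]

for label, values in (("homokaryotic", homo), ("heterokaryotic", het), ("total", total)):
    print(f"{label}: {mean(values):.2f} +/- {stdev(values):.2f} SNPs/kbp")
```

Under these assumptions the means and standard deviations agree with the averages reported in Table 5.1 to two decimal places.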
Increased heterozygosity was seen in intergenic regions compared to genic regions in the South African isolates, as also reported by Cantu et al. (2013) and Cuomo et al. (2017). This is expected as selection acts more strongly on coding regions.

Next-generation sequencing approaches for sequencing Pst have only recently implemented long read information to produce phased genomes where the genomes of the two haploid nuclei are separated (Schwessinger et al., 2018). Due to this constraint, it is expected that homokaryotic SNPs will be underestimated and heterokaryotic SNPs will be overestimated using short read assembly reference genomes such as PST130 (Cantu et al., 2011), CY32 (Zheng et al., 2013), PST-78 (Cuomo et al., 2017) and 46S 119 (Kiran et al., 2017).

Every position in the reference genome represents only one allele at that position, although for genetic material present in both nuclei, two alleles (identical or not) would be present in the genome (Figure 5.10). At nucleotide bases where the reference would have two different alleles, such as heterozygous sites, only one allele would be in the consensus reference sequence used to align reads in re-sequencing. The mapped isolate identical to the biallelic reference site will appear to be a heterokaryotic SNP site. For example, when the reference is a heterokaryotic site (AT) and the mapped isolate is identical (AT) and the chosen reference site is either A or T, it would indicate a polymorphism causing an overestimation of heterokaryotic SNP sites. It is however expected that heterokaryotic SNPs will be in the majority as mutations are expected to be random and independent between nuclei. True variant sites for single isolates that contain only one genotype would have an allele frequency of one over all aligned reads at monoallelic sites.
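The effect of collapsing a dikaryotic reference into a single consensus base can be illustrated with a toy classifier. This is a sketch under simplifying assumptions: each genotype is reduced to the set of bases carried by the two nuclei, sequencing error and coverage are ignored, and the function names are hypothetical rather than part of the SNP-calling pipeline used here.

```python
def apparent_call(reference_base, isolate_alleles):
    """Classify a site as it appears against a collapsed (unphased) reference.

    reference_base: the single base retained in the consensus reference.
    isolate_alleles: set of bases carried by the two nuclei of the mapped isolate.
    """
    if isolate_alleles == {reference_base}:
        return "no SNP"
    if len(isolate_alleles) == 1:
        return "homokaryotic SNP"
    return "heterokaryotic SNP"


def true_relationship(reference_alleles, isolate_alleles):
    """Compare the full genotypes of reference and isolate."""
    return "identical" if reference_alleles == isolate_alleles else "different"


cases = [
    # (true reference genotype, base kept in consensus, isolate genotype)
    ({"A", "T"}, "A", {"A", "T"}),  # isolate identical to a heterokaryotic reference
    ({"A", "T"}, "A", {"A"}),       # isolate homokaryotic for the retained allele
]
for ref_alleles, ref_base, isolate in cases:
    print(sorted(ref_alleles), ref_base, sorted(isolate),
          "->", apparent_call(ref_base, isolate),
          "| truth:", true_relationship(ref_alleles, isolate))
```

In the first case an isolate identical to the reference is reported as a heterokaryotic SNP, and in the second a genuine difference is reported as no SNP, which corresponds to the false positive and false negative sites of Figure 5.10.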
When the consensus reference sequence contains the allele at a biallelic site that is the same base in the mapped isolate in all alleles in the mapped reads, it would not be known that the mapped isolate was not identical to the reference genome. For example, when the reference is a heterokaryotic site (AT) and the mapped isolate is homokaryotic (AA) and the chosen reference site is A, it would underestimate homokaryotic sites. The availability of a high quality phased reference genome (Schwessinger et al., 2018) should improve the accuracy of current polymorphism classification.

5.4.2 STOP codons

This study focused on polymorphisms in genic regions. SNP analysis revealed the introduction of multiple stop codons. These stop codons appeared at similar frequencies across genic sites in all four isolates. This is of interest as premature stop codons can cause gain in virulence when they cause loss of an avirulence effector function (Dong et al., 2015). The majority (99.4 %) of the SNP sites that introduced stop codons were biallelic. This result will be interesting to re-evaluate using a phased Pst genome to account for the overestimation in heterokaryotic SNPs identified when using an unphased genome.

5.4.3 Transitions and transversions at specific codon positions

At the third codon position, a transition usually does not alter the amino acid encoded by that codon, whereas a transversion is more likely to incorporate a different amino acid into the peptide. Due to the degeneracy of the genetic code, the third codon position can be changed for 12 of the 20 amino acids, without altering the amino acid. This is displayed in the biallelic SNP data, where nonsynonymous biallelic SNP sites displayed more transversions, while synonymous biallelic SNPs mainly displayed transitions.
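How strongly the degeneracy of the genetic code favours synonymous transitions at the third codon position can be checked by enumeration. The sketch below assumes the standard genetic code, written as the usual 64-character string in TCAG order, and counts third-position substitutions from sense codons only. The names are illustrative and the calculation is not part of the analysis reported in this chapter.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base); '*' marks stops
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (ref in PURINES) == (alt in PURINES)


def third_position_summary():
    """Count [synonymous, total] third-position changes from sense codons."""
    counts = {"transition": [0, 0], "transversion": [0, 0]}
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # start from sense codons only
        for alt in BASES:
            if alt == codon[2]:
                continue
            kind = "transition" if is_transition(codon[2], alt) else "transversion"
            counts[kind][1] += 1
            if CODON_TABLE[codon[:2] + alt] == amino_acid:
                counts[kind][0] += 1
    return counts


summary = third_position_summary()
for kind, (synonymous, total) in summary.items():
    print(f"third-position {kind}s: {synonymous} of {total} synonymous")
```

Under the standard code, 58 of the 61 possible third-position transitions are synonymous, compared with 68 of the 122 possible third-position transversions.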
At synonymous biallelic SNP sites, transitions were most frequent with the highest SNP frequencies in C ↔ T and G ↔ A mutations, while nonsynonymous biallelic sites (excluding sites that induced stop codons) displayed a higher number of transversions, with C and T to R (A or G) and A and G to Y (C or T) occurring most frequently.

Transition:transversion biases occurred at different levels at the three codon positions due to variability in physical constraints that in turn caused variability in selection for or against a specific nucleotide change (Bofkin and Goldman, 2006). G to A and C to T changes have been described as the most frequent mutations induced by long wave ultraviolet A (UVA) and short wave ultraviolet B (UVB) irradiation in mouse embryo fibroblasts (Pfeifer et al., 2005). The exposure of urediniospores to solar radiation and short wave ultraviolet (UV) light is suggested to reduce viability (Sharp, 1967; Maddison and Manners, 1972). It has also been hypothesised that the distance of dispersal of Pst is shorter in comparison with Pgt and Pt, likely due to its sensitivity to UV light (Rapilly, 1979). Further investigation is needed to draw more parallels between the effect UV irradiation has on mammalian cells, as explained by Pfeifer et al. (2005), and urediniospores, or to establish whether the phenomenon is mostly due to the stronger selective pressure in favour of transitions compared to transversions (Bofkin and Goldman, 2006). Nonetheless, multiple studies have shown the mutagenic effect of UV light on urediniospores of Pst (Johnson, 1978; Cheng et al., 2014), while in this study biases were observed in the frequency of nucleotide changes at specific codon positions.

5.4.4 Stepwise mutations

It is hypothesised that the stepwise changes in virulence seen in South African Pst pathotypes have resulted from mutations within a fairly static Pst population (Visser et al., 2016).
Establishment of new alleles in the population is due to the unique combination of selection and genetic drift in the population (Salemi et al., 2009). In the gene space, selection pressure acts on mutations that cause changes in the function and stability of the gene or the resulting protein, ultimately changing the manner in which the organism interacts with its environment. Genotype frequencies depend on selection that is driven by fitness traits. Genes that are highly polymorphic are thus likely to be involved in fitness traits that enable the genotype to contribute to the next generation.

5.4.5 Positive selection

The YN00 program was used to investigate the presence of signatures of selection by comparing synonymous and nonsynonymous substitution rates. The dN/dS statistics were computed. New alleles introduced by random mutations that evolve neutrally will change in frequency in the population only due to genetic drift and not because they have an effect on fitness. This is generally expected for synonymous SNPs. In contrast, a nonsynonymous polymorphism that affects fitness will evolve more rapidly. Comparing synonymous and nonsynonymous substitution rates can reveal whether a specific allele at a locus is under positive or negative selection.

No omega values greater than 1 were obtained in this analysis. The inability to identify genes under selection could indicate that genes under strong selection pressure in the South African isolates do not exist in the PST130 reference genome. However, trade-offs exist between statistical robustness and power. It is known that dN/dS methods often fail to detect signals of selection (Salemi et al., 2009). The stringency of dN/dS methods could therefore fail to detect selection pressure between the four clonally derived, and therefore relatively similar, pathotypes.
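Why no omega value above 1 could be obtained from the genes in Tables 5.3 and 5.4 follows from the ratio itself. The sketch below is a simplified illustration and not the YN00 calculation: it takes the reported dN and dS values for the SA1 versus SA4 comparison, assumes a value of zero where a gene is missing from one of the two tables, and treats a ratio with dS equal to zero as undefined.

```python
def omega(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


# (dN, dS) for SA1 vs SA4, taken from Tables 5.3 and 5.4.
# A value of 0.0 is assumed where the gene is absent from the other table.
PAIRWISE = {
    "PST130_03694": (0.0009, 0.0),
    "PST130_09146": (0.0013, 0.0),
    "PST130_04022": (0.0, 0.0015),
}

for gene, (dn, ds) in PAIRWISE.items():
    value = omega(dn, ds)
    label = "undefined" if value is None else f"{value:.2f}"
    print(f"{gene}: dN={dn}, dS={ds}, omega={label}")
```

Because no gene combined a positive dN with a positive dS, each ratio is either undefined or zero, so none can exceed 1.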
The McDonald-Kreitman test (McDonald and Kreitman, 1991) is often considered more powerful for detecting positive selection. It compares the ratio of nonsynonymous to synonymous polymorphisms within a species with the same ratio for fixed differences relative to a sister species, which removes the demographic background.

5.4.6 Presence-absence analysis

In addition, the South African pathotypes were compared on the basis of genes that were uniquely present in, or absent from, the South African pathotypes. This approach has been used to explain changes in virulence in other pathogens (Bubić et al., 2004; Yoshida et al., 2009; Gilroy et al., 2011). Homology with genes of known function was investigated to determine whether the genes could play a specific role in pathogenicity. BLAST searches in public databases provided homology information for 11 genes. Gene ontology of the characterised homologs suggested that they were involved in protein translation, sugar transport, metabolism and components of the fungal cell wall. The postulated gene functions did not indicate a role in host manipulation or the escape of host recognition, as would be expected for virulence factors. Biological validation of the suggested functionality is needed to draw clearer conclusions. BLAST searches against the PST130 transcriptome revealed putative paralogs that could indicate functional redundancy for many of the genes shown as absent from the South African isolates, where these paralogs could functionally replace the absent gene. Such redundancy has been described as genetic buffering (Dangl and Jones, 2001). However, genes that were absent or uniquely present between pathotypes did not fit effector protein characterisation and were not in the putative effector subset of Cantu et al. (2013). Therefore, these genes were not considered as candidate genes involved in pathogenicity dynamics.
5.4.7 Nonsynonymous polymorphisms

Lastly, pairwise comparisons of the South African pathotypes, evaluating nonsynonymous differences gene-by-gene, were performed similarly to Cantu et al. (2013). A total of 2689 genes showed nonsynonymous differences in pairwise comparisons between the four pathotypes. Of these genes, 138 carried a secretion signal, of which 27 were in the subset of putative effector genes.

5.5 Conclusion

After characterisation of the polymorphisms across the genomes of the four South African isolates, three methods were used to identify differences in the gene space of the four South African pathotypes. Where applicable, results were compared to lists containing genes that encode secreted proteins and putative effectors (Cantu et al., 2013), to further narrow down the list of candidate genes. The three methods were to search for effector candidates with signatures of positive selection, to evaluate the complete exclusion or the unique inclusion of effector candidates, and to evaluate nonsynonymous polymorphisms in effector candidates between isolates. Only the last of these identified genes that had previously been described as effector candidates. Different methods exist for validation of candidates, although limitations exist due to the biotrophic nature of Pst. One example of validation is to test expression of genes at infection stages using time course experiments. Candidate genes will be further investigated in Chapter 6.

Chapter 6 Gene Expression Analysis of Candidate Effectors Identified in South African Pst Isolates

6.1 Introduction

The obligate biotrophic nature of rust prevents in vitro functional validation. In addition, rust cereal hosts are difficult to transform, which makes in vivo functional characterisation challenging (Petre et al., 2016b). While in planta studies have been undertaken using techniques such as virus induced gene silencing (VIGS), they are difficult and time consuming (Panwar and Bakkeren, 2017).
Recent successes in stem rust effector identification are reviewed in Chapter 2 Section 2.4.4. As an early step in functional validation, effector gene function in the infection process can be predicted by evaluating gene expression at specific developmental stages of the fungus (Wang et al., 2007, 2009; Sørensen et al., 2012; Cantu et al., 2013). These gene expression levels can be evaluated using methods including microarrays, transcriptome sequencing, and RT-qPCR, which was used in this chapter.

6.1.1 Regulation of gene expression in eukaryotes

Gene expression differs throughout development, between cell types and in response to different environmental stimuli. Regulation of gene expression is an intricate, multi-stage process. Transcriptional regulation occurs in the nucleus, while pre- and post-translational regulation occurs in the cytosol. This ability to selectively express genes is essential for the development and survival of a complex organism (Bustin and Nolan, 2004). For transcription factor proteins to access genes and initiate transcription, the chromatin must be remodelled through a process of acetylation. Acetylation opens up nucleosomes and allows transcription factor proteins access to gene promoter sites. The transcription process is further regulated by the assembly and arrangement of the transcriptional machinery enzymes that initiate transcription of RNA from the DNA template. Processing of the pre-mRNA molecule prepares it as a template for protein synthesis. A methylated cap is added to the 5′ end soon after transcription starts, while at the 3′ end a poly-adenylated tail is added upon completion of transcription. Introns, if present, are then spliced from the pre-mRNA molecule. Binding sites for microRNAs (miRNAs) and regulatory proteins are often found in the 3′ untranslated regions (UTRs), where they can down-regulate gene expression or degrade the mRNA molecule.
Double stranded, small interfering RNA (siRNA) can also modulate gene expression at the post-transcription stage. After maturation, the mRNA molecule leaves the nucleus through a nuclear pore and enters the cytosol. Stable mRNA molecules can now be translated into peptides. Post-translational modifications may also be required to transform gene products into functionally active proteins (Klug, 2012). These processes happen in various organelles depending on the protein, and determine whether a functional gene product is produced.

6.1.2 Quantification of gene expression

Several approaches can be taken to assess the different stages of gene expression. These include validating protein levels, transcription of genes and the effectiveness of small interfering RNAs (siRNAs; Schmittgen and Livak, 2008). Different methods have been developed for these multiple approaches (Speed, 2004; Mehta et al., 2010). One such tool used to measure gene transcript levels is quantitative or real time PCR (qPCR). The first form of qPCR was developed by Higuchi et al. (1993). It measures the level of gene transcription by quantifying the amount of a specific RNA (Schmittgen and Livak, 2008). Quantitative PCR is a powerful tool, with its strength lying in its ability to detect DNA sequences with high specificity over a wide range of concentrations. In addition, qPCR eliminates the downstream processing needed by some other assays by using a camera that detects fluorescence (Higuchi et al., 1993). The fluorescent dye intercalates with the double stranded DNA (dsDNA) as it is synthesised, so that fluorescence increases as dsDNA accumulates. The rate at which the fluorescence increases (kinetics) is directly proportional to the original amount of target cDNA. The fluorescent signal is observed by the camera in the qPCR instrument at each annealing/extension phase during thermocycling (Higuchi et al., 1993).
Different methods have been developed to study relative gene expression, e.g. the comparative CT method, the simulated kinetic model (Livak and Schmittgen, 2001; Schmittgen and Livak, 2008) and the efficiency correction method (Pfaffl, 2001). The efficiency correction method of relative gene expression was used for the analyses in this chapter. This method accounts for differences in the efficiencies of the PCR reaction (see Section 6.2.9) when amplifying the target regions of the test and reference genes, in contrast with the comparative CT method (Livak and Schmittgen, 2001), which assumes equal amplification efficiencies between the two compared gene products. The efficiency correction method is, however, only practical for small experiments with a limited number of genes. Both the efficiency corrected and simulated kinetic model approaches aim to improve the accuracy of the comparative CT method. The simulated kinetic model is best suited to studying large numbers of genes (Schmittgen and Livak, 2008), as the efficiency correction method is a relatively costly and time consuming process (VanGuilder et al., 2008).

6.1.3 Candidate effector features

In Chapter 5, 27 candidate effector genes that displayed nonsynonymous SNPs between the historical South African isolates were identified. These genes, based on the PST130 gene models, were previously identified as putative effectors (Cantu et al., 2013) using a modified version of the effector identification pipeline developed by Saunders et al. (2012). For the 27 candidate effector proteins, annotation and tribe rankings, as taken from Cantu et al. (2013), are listed in Table 6.1. None of the 27 candidate genes had flanking intergenic regions (FIR) of 10 kbp (kilo base pairs) or more. Only PST130_05944 had a nuclear-localisation signal (NLS) at amino acid position 238, and only PST130_07564 was classified as a small and cysteine rich (SCR) protein.
6.1.4 Gene transcription analysis

In this chapter gene transcription is measured as an indication of gene expression, although it is clear from the preceding explanations that many regulatory steps need to be successfully completed to yield a functional protein. When gene expression is studied under different conditions, or at different time points in a developmental time series, spatial and temporal patterns of gene expression show differential accumulation of gene products that are associated with treatment or the specific stages of development (Tomancak et al., 2007). Ideally, time points would be chosen that capture gene expression during early infection processes, at various stages of haustorial and hyphal network development, and during sporulation. Comparisons were drawn from histological evaluation of Pst haustorial development (Sørensen et al., 2012) and gene expression studies in seedlings (Wang et al., 2007). Spore germination, formation of the substomatal vesicle, development of infection hyphae, the formation of the haustorial mother cells, and haustoria formation are all apparent within the first 24 hours after inoculation. Hyphae and haustoria continue to develop in the host tissue until roughly 5 days post inoculation (dpi). Sporogenous cells become visible at about 7 dpi. By 12 dpi to 14 dpi, depending on the experimental setup, visibly sporulating pustules are usually apparent.

Table 6.1: Effector features of the identified candidate effectors. Identified candidate effectors were secreted proteins in tribes ranking within the top 100 potential effector tribes as described by Cantu et al. (2013)

| Gene ID | Isolate pairs with nonsynonymous substitutions | Tribe no. | Tribe ranking | Length (amino acids) | Similarity to HESPs or fungal AVRs | No. of repeat units | Effector motifs (amino acid position) | PFAM mapping | Expressed in infected material | Expressed in Haust. |
|---|---|---|---|---|---|---|---|---|---|---|
| PST130_06558 | SA2 & SA3; SA3 & SA4 | 9 | 6 | 341 | No | 9 | | No | No | No |
| PST130_12487 | SA1 & SA2; SA1 & SA3; SA1 & SA4; SA2 & SA3; SA2 & SA4; SA3 & SA4 | 31 | 7 | 197 | No | 0 | | No | Yes | Yes |
| PST130_14091 | SA1 & SA2; SA2 & SA4 | 11 | 14 | 167 | No | 0 | Y/F/WxC(85); LIAR(32) | Yes | Yes | Yes |
| PST130_17605 | SA2 & SA4; SA3 & SA4 | 11 | 14 | 239 | Yes | 7 | Y/F/WxC(103) | No | Yes | Yes |
| PST130_05454 | SA1 & SA2; SA2 & SA3; SA2 & SA4 | 68 | 15 | 266 | No | 0 | | Yes | Yes | No |
| PST130_09275 | SA1 & SA2; SA1 & SA3; SA1 & SA4 | 134 | 16 | 210 | Yes | 0 | | Yes | Yes | No |
| PST130_12491 | SA1 & SA4 | 8 | 17 | 182 | No | 13 | | No | No | No |
| PST130_05023 | SA1 & SA4; SA3 & SA4 | 351 | 22 | 281 | No | 6 | | Yes | Yes | Yes |
| PST130_13969 | SA3 & SA4 | 437 | 23 | 394 | No | 0 | | No | Yes | Yes |
| PST130_00285 | SA1 & SA3; SA3 & SA4 | 317 | 25 | 207 | No | 0 | | Yes | Yes | Yes |
| PST130_14831 | SA2 & SA4 | 596 | 31 | 139 | No | 0 | | No | Yes | Yes |
| PST130_10286 | SA3 & SA4 | 54 | 33 | 254 | No | 0 | LIAR(96) | Yes | Yes | Yes |
| PST130_16778 | SA3 & SA4 | 409 | 40 | 172 | No | 0 | | No | Yes | Yes |
| PST130_06503 | SA1 & SA4; SA2 & SA3; SA3 & SA4 | 120 | 41 | 292 | No | 9 | | No | Yes | Yes |
| PST130_05944 | SA2 & SA4; SA3 & SA4 | 320 | 49 | 318 | No | 0 | LIAR(10) | No | Yes | Yes |
| PST130_07579 | SA2 & SA4 | 170 | 68 | 926 | No | 0 | | Yes | Yes | Yes |
| PST130_09018 | SA1 & SA3 | 289 | 69 | 430 | No | 0 | | No | Yes | Yes |
| PST130_08031 | SA1 & SA2; SA2 & SA4 | 162 | 77 | 206 | No | 0 | LIAR(18) | No | Yes | Yes |
| PST130_02403 | SA1 & SA4; SA2 & SA3; SA3 & SA4 | 21 | 83 | 215 | No | 8 | | No | Yes | Yes |
| PST130_02001 | SA1 & SA2; SA1 & SA3; SA1 & SA4; SA2 & SA4 | 65 | 84 | 148 | No | 0 | | Yes | Yes | Yes |
| PST130_08984 | SA2 & SA4 | 65 | 84 | 116 | No | 0 | | Yes | Yes | Yes |
| PST130_07564 | SA1 & SA2; SA1 & SA3 | 482 | 86 | 145 | No | 10 | | No | Yes | Yes |
| PST130_15131 | SA1 & SA2; SA1 & SA3; SA2 & SA3 | 186 | 87 | 546 | No | 2 | | No | Yes | Yes |
| PST130_02118 | SA1 & SA2; SA2 & SA3; SA2 & SA4 | 92 | 88 | 187 | No | 0 | Y/F/WxC(21) | No | Yes | Yes |
| PST130_07513 | SA1 & SA2; SA1 & SA3; SA1 & SA4 | 128 | 95 | 154 | No | 0 | | Yes | No | Yes |
| PST130_12956 | SA1 & SA4 | 128 | 95 | 156 | No | 0 | | Yes | Yes | Yes |
| PST130_07448 | SA1 & SA3; SA3 & SA4 | 192 | 100 | 191 | No | 0 | Y/F/WxC(73) | No | Yes | Yes |

HESPs, haustorial expressed secreted proteins; AVRs, proteins encoded by avirulence genes; Haust., haustorial library; PFAM, protein family database. Genes in boldface had nonsynonymous substitutions between SA1 and SA4 and their expression was evaluated over a time series. Genes marked in grey were also nonsynonymous between PST-87/7 and PST-08/21 (refer to Cantu et al., 2013). PST130_14091 also known as PST21_19014 and PST130_13696 also known as PST21_18360.

Two of the historical South African Pst isolates were further investigated for gene expression using a selection of the 27 candidate effectors. The isolates that were used are representatives of the first Pst pathotype detected in South Africa in 1996, 6E16A- (SA1), and the most recent pathotype, 6E22A+ (SA4), which was identified in 2005. These two isolates are the furthest apart in terms of time of collection and pathogenicity, as they differ in virulence for three Yr resistance genes and were collected seven years apart (Table 4.2). They were chosen to improve the chances of identifying a virulence-related effector candidate. This chapter focuses on further investigation of these candidates using RT-qPCR gene expression analysis.
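The selection of candidates from Table 6.1 by isolate pair can be sketched in a few lines of Python. This is an illustration only: it uses five rows copied from Table 6.1 rather than the full table, and the dictionary and function names are hypothetical.

```python
# Isolate pairs with nonsynonymous substitutions, for five rows of Table 6.1.
isolate_pairs = {
    "PST130_06558": "SA2 & SA3; SA3 & SA4",
    "PST130_12487": "SA1 & SA2; SA1 & SA3; SA1 & SA4; SA2 & SA3; SA2 & SA4; SA3 & SA4",
    "PST130_12491": "SA1 & SA4",
    "PST130_13969": "SA3 & SA4",
    "PST130_05023": "SA1 & SA4; SA3 & SA4",
}


def differs_between(pairs: str, isolate_a: str, isolate_b: str) -> bool:
    """Return True if the pair (isolate_a, isolate_b) is listed for a gene."""
    wanted = {isolate_a, isolate_b}
    for pair in pairs.split(";"):
        members = {name.strip() for name in pair.split("&")}
        if members == wanted:
            return True
    return False


# Genes that are nonsynonymous between SA1 and SA4 in this subset.
selected = sorted(
    gene for gene, pairs in isolate_pairs.items()
    if differs_between(pairs, "SA1", "SA4")
)
```

For this subset, `selected` contains PST130_05023, PST130_12487 and PST130_12491; the same filter applied to all 27 rows would return the genes described as boldface in the table footnote.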
The nine genes selected were those polymorphic between SA1 and SA4 (Table 6.1 boldface).

6.2 Methods

6.2.1 Inoculation and sampling

Seedlings of the stripe rust susceptible wheat variety, Avocet S, were inoculated with urediniospores of the Pst South African pathotypes 6E16A- (SA1) and 6E22A+ (SA4) (see Section 3.1.1), using an inoculation concentration of 5 mg/ml. As each seedling can only be sampled once, nine plants (subsequently referred to as biological replicates) were sampled for each treatment (isolate SA1 or SA4) at each time point (Figure 6.1). The 63 plants per treatment (9 seedlings × 7 time points) were equally divided between three trays (21 plants per tray). The three trays for each of the treatments were inoculated independently, which introduced a blocking variable to test reproducibility. Inoculated leaf samples were taken at 0, 1, 2, 3, 5, 9 and 12 dpi, taking three seedlings, per time point, from each of the three trays. Samples were taken about 8 cm from the tip of each leaf, cut into shorter pieces and immediately stored in the RNA stabilising agent, RNAlater (Thermo Fisher Scientific, USA; Taylor et al., 2010). Scissors used to cut inoculated leaf samples were wiped clean with ethanol between sample collections.

| | Tray 1 | Tray 2 | Tray 3 |
|---|---|---|---|
| Isolate SA1 | 21 plants | 21 plants | 21 plants |
| Isolate SA4 | 21 plants | 21 plants | 21 plants |

126 samples evaluated for gene expression of nine genes, in triplicate.

Figure 6.1: Experimental setup for the infection time course experiment.

Fresh spores of both isolates were germinated and used as positive, fungal controls. The germinated spore samples were prepared in a laminar flow cabinet. Spores were sprinkled on a thin layer of autoclaved double distilled water in a sterilised Petri dish, comparable to the method of Zhang et al. (2008), and kept overnight in a dark room at 11 °C.
After 8–12 hours a thick mat of intertwined germination tubes was collected from the surface of the water with a spatula and stored in RNAlater. The preserved samples in RNAlater were kept at room temperature for 20 days before RNA was extracted.

Care was taken throughout the experiment to control and define conditions to minimise external stimuli that could interfere with the sensitive process of mRNA transcription (Taylor et al., 2010). A detailed explanation of the inoculation protocol can be found in Chapter 3.

6.2.2 Tissue disruption and RNA extraction

Total RNA was extracted from the inoculated leaf tissue, non-inoculated wheat and germinated fungal spore controls using the Qiagen RNeasy Plant Mini Kit according to the manufacturer’s instructions. To minimise the time between subsequent sampling events, the sample processing steps that follow were performed on small batches of 12–24 samples as recommended by Taylor et al. (2010). Tissue was disrupted using a mortar and pestle with the addition of extraction sand (SiO2). All instruments used were washed with detergent, ethanol and RNase AWAY decontamination reagent between samples, and cooled down in liquid nitrogen, or on dry ice, to prevent degradation of RNA due to ubiquitous RNase activity (Holland et al., 2003). The dry mortar and pestle were placed on dry ice in a polystyrene box and were further cooled with liquid nitrogen. About 100 mg of extraction sand was added to each sample. Forceps were used to remove the preserved sample material from the tubes, and the material was tapped dry on a clean paper towel to prevent the stabilising solution from forming ice crystals when the sample came into contact with liquid nitrogen. Samples were then placed in the mortar, along with liquid nitrogen and extraction sand, and homogenised into a fine powder. Without letting it thaw, the powder was scraped with a cooled spatula into a 2.2 ml safe lock microcentrifuge tube.
The ground sample was kept on dry ice until extraction buffer was added.

6.2.3 RNA quality control and quantification

Automated capillary-electrophoresis systems are popular for generating accurate profiles for RNA quality assessment (Fleige and Pfaffl, 2006). The Agilent 2100 Bioanalyzer (Agilent Technologies, USA) was used to assess the quality and quantity of the extracted RNA. The reaction kit was stored at 4 °C. A gel-dye mix was first prepared according to the manufacturer’s instructions. The quality of samples was assessed within 1 to 3 days after RNA extraction. RNA samples were appropriately aliquoted to prevent multiple freezing and thawing steps that impose the risk of RNA degradation (Taylor et al., 2010). RNA stocks were stored at −80 °C.

6.2.4 Complementary DNA synthesis

The SuperScript IV First-Strand Synthesis System (Invitrogen/Thermo Fisher Scientific, USA) was used for the conversion by reverse transcription of mRNA to cDNA according to the manufacturer’s instructions. Excess RNA was removed by adding 1 µl of E. coli RNase H to the synthesised cDNA. An aliquot of 3 µl of cDNA was prepared and quantified on the Qubit 2.0 Fluorometer (Thermo Fisher Scientific, USA) at the Central Analytical Facilities (CAF) at Stellenbosch University, South Africa. cDNA was diluted to approximately 12.5 ng/µl for use in PCR reactions, and cDNA was stored at −20 °C (Taylor et al., 2010).

6.2.5 Primer design

Primers for RT-qPCR were designed using the compiled Illumina sequences obtained for the two Pst isolates, SA1 and SA4 (Chapter 4). The PrimerQuest Tool by Integrated DNA Technologies (http://eu.idtdna.com/scitools/Applications/RealTimePCR/) was used to design primers for the nine Pst genes of interest. Primers were designed that would amplify the respective gene from both SA1 and SA4, and produce amplicons of between 84 bp and 129 bp in length (see Section 6.3.2), as the kinetics of the PCR reaction are influenced by the length of the resulting amplicon. The primer sequences were evaluated in NCBI BLAST (version 2.6.1; Altschul et al., 1997) homology searches to ensure that they would not amplify sequences within the wheat genome. The likelihood of the primers forming secondary structures, such as primer dimers and hairpins, was also assessed, and the absence of SNPs in primer sequences was confirmed (Derveaux et al., 2010). Primers were manufactured by Integrated DNA Technologies, USA. Primers were empirically tested for a negative result in a reaction with wheat template DNA and for specificity to amplify the desired amplicon with Pst cDNA by evaluating the melt curve of the RT-qPCR, followed by gel electrophoresis to confirm amplicon length. Primer efficiencies were determined using the CT (threshold cycle) values of serial dilutions (Derveaux et al., 2010).

6.2.6 PCR plate setup

Complementary DNA templates were used to evaluate transcription levels of the nine Pst genes of interest. Only one target gene and the reference gene, which is expected to be expressed constantly over the infection time course, were evaluated on each PCR plate. The same isolates as used for sequencing in Chapter 4 were used for inoculation. Three controls were included: two positive controls (SA1 and SA4) in duplicate, a negative wheat control (WC) from the same wheat variety, Avocet S, and a no template control (NTC; Figure 6.2). Quantitative PCRs of each cDNA sample were performed in triplicate for each gene assay and time point measured in days post inoculation.
[Figure 6.2 consists of two 96-well plate maps (rows A–H, columns 1–12, with technical replicates rep 1 to rep 3 in adjacent columns). Template cDNA layout: rows A and C hold SA1 samples and rows B and D hold SA4 samples for 0–3 dpi; rows E and G hold SA1 samples and rows F and H hold SA4 samples for 5–12 dpi, with the SA1 and SA4 positive controls and the WC or NTC in columns 10–12. Primer layout: rows A–B and E–F receive the reference gene (REF) primers; rows C–D and G–H receive the gene of interest (GOI) primers.]

Figure 6.2: Plate layouts for RT-qPCR assays. Template cDNA layout: Plate layout for cDNA of each biological replicate and gene assay. Nine biological replicates were assessed for each gene assay. Primer layout: Plate layout for PCR reaction mix. Nine genes were assessed in total. Each plate assessed transcript levels of one target, candidate Pst effector gene and one reference gene. REF, reference gene; GOI, gene of interest.

6.2.7 Quantitative real-time polymerase chain reaction

Reactions were set up manually. Plates and accompanying seals were manufactured by Thermo Fisher Scientific, USA. Transcript levels of nine candidate effector genes (Chapter 5; Cantu et al., 2013) were assessed using RT-qPCR. A fully skirted, 96 well PCR plate was prepared with 2 µl (approximately 25 ng) of template cDNA. The 8 µl reaction mix consisted of 2.4 µl of double distilled water, 5 µl of BioRad Precision Melt Supermix and 3 pmol of each forward and reverse primer. The plate with template cDNA was kept on an Eppendorf PCR Cooler block (Sigma-Aldrich, USA) while the reaction mix was added. The plate was sealed, briefly centrifuged and run on the BioRad CFX96 Touch Real-Time PCR System. The first part of the PCR program included the following steps: an initiation step of 5 minutes at 95 °C, followed by 40 cycles of a 15 second denaturation step at 95 °C, a 20 second primer annealing step at 60 °C and a 20 second primer extension step at 72 °C. The second part of the PCR program was included to generate a dissociation curve as an indication of the amplification specificity of the primers and to evaluate the formation of primer dimers.
High specificity was expected as the fluorescent dye, EvaGreen, is known for its high sequence specificity and allows a robust PCR with less PCR inhibition than SYBR Green I, due to its thermal and hydrolytic stability (Mao et al., 2007). The following steps were included in the program: 1 minute at 95 °C to denature the double stranded DNA (dsDNA) with fluorescent intercalating dye to single strand DNA (ssDNA). No fluorescence is expected after this step. To induce the formation of dsDNA, the temperature was lowered for 10 seconds to 40 °C. A ramped step with a 0.2 °C/s incremental increase of temperature starting at 60 °C and stopping at 90 °C was used to denature the dsDNA incrementally. Fluorescence decreases as the dye dissociates. A final cooling step of 10 seconds at 40 °C was added, after which the reaction was kept at 15 °C.

6.2.8 Reference gene selection

Examples of genes that are often used as internal references in qPCR are 18S rRNA, 7S rRNA, U6 RNA, β-actin and glyceraldehyde 3-phosphate dehydrogenase (GAPDH; Schmittgen and Livak, 2008). Three genes were assessed for use as standards of gene expression: P. striiformis elongation factor 1 (PST-EF1; Ling et al., 2007), β-actin (PST-ACTB) and β-tubulin (PST-TUBB; Huang et al., 2012). Amplification signals in the negative wheat control occurred in multiple qPCR reactions with the primer pair for PST-EF1. However, no amplification with the wheat DNA control was observed with the PST-ACTB and PST-TUBB primers. Both genes would therefore be suitable to use as references. Due to limited wells on the PCR plate, only one reference gene was used, and PST-TUBB was arbitrarily chosen as the reference gene in this study.

6.2.9 Efficiency determination of primers

The BioRad Precision Melt Supermix (http://www.bio-rad.com/webroot/web/pdf/lsr/literature/10022094.pdf) contains hot-start iTaq DNA polymerase, dNTPs, MgCl2, EvaGreen dye, enhancers and stabilisers. The polymerase enzyme is responsible for producing amplicons using primers, dNTPs, and template cDNA, with the help of magnesium as a cofactor and optimal temperature cycles. PCR efficiency describes the rate of action of the polymerase and indicates the fold increase of the target DNA per thermocycle (Ruijter et al., 2013). Full efficiency would mean that there is a 2-fold increase of amplicon with every thermocycle during the exponential phase (Yuan et al., 2006). Efficiencies between 90 % and 110 % are acceptable. Poorly calibrated pipettes are often the reason for efficiencies falling outside this range. Additionally, low efficiency can be caused by suboptimal temperatures, the presence of inhibitors or inactive polymerase, poor primer design or amplicons with secondary structures, while overly high efficiencies result from primer dimers or nonspecific amplification (Taylor et al., 2010). Efficiency is also not constant throughout the PCR reaction, and low levels of DNA template can result in inaccurate determination of efficiency (Karlen et al., 2007).

The efficiencies of primers were estimated by calculating the slope of the standard curve of a serial dilution of template DNA. Two 2-fold serial dilutions were made by adding RNase free water to the DNA sample, with PCR reactions being done in duplicate. For each DNA concentration, in each dilution series, the mean of the CT values of the two replicate PCRs was plotted against the base-10 logarithmic transformation of the dilution factor. The data were fitted to a linear regression model and the coefficient of determination (R²) was assessed. The amplification efficiency E is theoretically expected to be between 0 and 1, and was calculated with

$$E = 10^{-1/s} - 1, \tag{6.1}$$

where s is the gradient of the linear regression line (Kubista et al., 2006).
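The standard-curve calculation can be sketched in Python as follows. This is a minimal illustration rather than the analysis code used in the study: the dilution series and CT values are hypothetical, the function names are invented for the example, and the x-axis is assumed to be the base-10 logarithm of the relative template amount (1, 0.5, 0.25, ...), so that the slope s is negative.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def amplification_efficiency(slope):
    """Eq. (6.1): E = 10^(-1/s) - 1, with s the standard-curve slope."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 2-fold dilution series with one extra cycle per dilution step,
# which corresponds to perfect doubling in every cycle.
relative_amounts = [1.0, 0.5, 0.25, 0.125, 0.0625]
mean_ct = [20.0, 21.0, 22.0, 23.0, 24.0]

log_amounts = [math.log10(a) for a in relative_amounts]
slope, intercept, r_squared = fit_line(log_amounts, mean_ct)
efficiency = amplification_efficiency(slope)
```

With these idealised values the slope is about −3.32 and E evaluates to 1 (100 %); measured dilution series would give values on either side of this, which is why the 90 % to 110 % window is used as an acceptance range.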
The obtained efficiencies were used in the efficiency corrected method to obtain the expression pattern of each gene of interest. The relative expression, R, of the candidate genes to the reference gene was first determined with

$$R = \frac{E^{C_T}}{E'^{\,C'_T}},$$

where $E$ and $E'$ are the efficiencies as calculated in Eq. (6.1) for the gene of interest and reference gene, and $C_T$ and $C'_T$ are the cycle threshold values for the gene of interest and reference, respectively. The cycle threshold indicates the number of cycles it took to reach the fluorescence threshold, $F_T$. It is important that this threshold value is set to fall in the exponential phase of the amplification process (Karlen et al., 2007). This is the earliest phase, with ample reagents; it is followed by the linear phase, as reagents decrease, and finally by the plateau phase, where reagents become depleted (Yuan et al., 2006). The default $F_T$ was used for all PCR runs. Transcript levels of the candidate genes were expressed as relative expression to the reference gene P. striiformis β-tubulin (TUBB; Huang et al., 2012).

6.2.10 Statistical evaluation of the data

The treatments applied were SA1 and SA4 inoculations. These were applied three times in three independent tray inoculations. For each tray inoculation, three seedlings were prepared for each of the seven sampling time points (7 time points × 3 = 21 seedlings). The three treatment applications were used as a grouping variable in a linear mixed model. Each of the nine biological replicates per time point (3 plants × 3 trays) was assessed on a different plate. Inter-plate variability was not corrected for. Intra-plate variability was addressed by performing three technical PCR replications of each biological replicate (plant) per plate (Schmittgen and Livak, 2008). Grubbs’ test (Grubbs, 1969) was applied to identify outliers as suggested by Burns et al. (2005).
The relative expression values obtained by using the efficiency corrected method were statistically analysed. One-way analyses of variance (ANOVAs) were performed to assess the variation within and between the groups of biological replicates at different time points in each gene expression assay for both isolates, SA1 and SA4.

6.2.11 Linear mixed effect analysis

The R package, lme4, was used for statistical evaluation of the data (Bates et al., 2014). To determine the relationship between the time that elapsed after inoculation and the relative expression of the candidate genes in each isolate, a linear mixed model with random intercepts was fitted for the data generated for each gene:

$$y_{ij} = \beta_0 + \beta_1 x_{T,ij} + \beta_2 x_{I,ij} + \beta_3 x_{T,ij}\, x_{I,ij} + b_{0j} + e_{ij}, \tag{6.2}$$

where $\beta_0$ is the fixed intercept; $\beta_1$, $\beta_2$ and $\beta_3$ are fixed effects for time, isolate and their interaction, respectively; $b_{0j}$ is a random intercept for each tray $j$; the $x_T$ and $x_I$ terms are independent variables for time point and isolate, respectively; and $e_{ij}$ is the error. The model was fitted, and the assumptions on which linear mixed models are based were assessed. These assumptions include equal variances, and normality of the residuals and random intercepts. The tests were repeated and re-evaluated after a log10 transformation (Burns et al., 2005) of the relative expression values. Likelihood ratio tests of the full model against the model without the effect in question ($x_I$ (Isolate), $x_T$ (Time Point), or $x_T x_I$ (Isolate × Time Point)) were performed (Winter, 2013) to assess which model fits the data best. P-values lower than 0.001 were considered statistically significant, providing evidence for inclusion of the effect in the model. Such a stringent significance threshold was used to account for the expected high variability in RT-qPCR data. Tukey multiple comparison post-hoc tests were used to indicate where the significant differences in effects were (Section 6.3.4).
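The comparison of nested models by likelihood ratio test can be sketched as follows. This is a standard-library Python illustration, not the lme4 workflow used in the study: the log-likelihood values are hypothetical, the function names are invented, and the test statistic is assumed to follow a chi-square distribution with degrees of freedom equal to the number of parameters dropped from the full model.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 case and add one term per extra 2 df.
    total = math.erfc(math.sqrt(half))
    for k in range(df // 2):
        total += math.exp(-half) * half ** (k + 0.5) / math.gamma(k + 1.5)
    return total


def likelihood_ratio_test(loglik_full, loglik_reduced, df_diff):
    """Return (test statistic, p-value) for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_diff)


# Hypothetical log-likelihoods for a model with and without one fixed effect.
statistic, p_value = likelihood_ratio_test(-120.0, -128.0, df_diff=1)
keep_effect = p_value < 0.001  # threshold used in this section
```

In this hypothetical case the statistic is 16 on one degree of freedom, so the effect would be retained at the 0.001 threshold.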
6.2.12 Relative expression of Pst candidate effector genes

The Pst gene expression fold difference between the standardised expression levels of SA1 and SA4 was estimated using the method proposed in Pfaffl (2001), taking primer amplification efficiency into account:

$$ R = \frac{E^{\Delta C_T(\mathrm{SA1}-\mathrm{SA4})}}{E'^{\,\Delta C'_T(\mathrm{SA1}-\mathrm{SA4})}}, $$

where E and E′ are the efficiencies as calculated in Eq. (6.1) for the gene of interest and reference gene, respectively, and $\Delta C_T$ and $\Delta C'_T$ are the differences between the two isolates (SA1 and SA4) in cycle threshold values for the gene of interest and reference, respectively. This method is similar to the $2^{-\Delta\Delta C_T}$ method (Schmittgen and Livak, 2008) for determining linearised values, with the difference that the $2^{-\Delta\Delta C_T}$ method assumes that primers have 100 % efficiency, causing a two-fold increase of the replicated amplicon in every thermocycle.

6.2.13 Assessment of genes

BLAST searches were performed to assess whether genes were present in both the PST130 gene models and the revised gene models (Dobon et al., 2016). The original PST130 gene discovery was done using the machine learning algorithm geneid³ with Pgt gene annotations as training set, followed by filtering for transposable elements (Cantu et al., 2011), while the revised annotation made use of the 2013 UK Pst RNA-Seq data and the annotation tools cufflinks, trinity, stringtie and portcullis (D Bunting, personal communication). BLAST searches of the nine candidates against Pst transcript data sequenced from the 2013 UK Pst population were used to evaluate the occurrence of alternative splicing.

6.3 Results

6.3.1 RNA yield, RNA quality scores and cDNA yield

The integrity of each RNA sample was evaluated on the Agilent 2100 Bioanalyzer, producing gel-like visuals, RNA integrity number (RIN) scores, RNA concentrations, and ratios between ribosomal units.
Summary statistics were calculated for the RNA yields, RIN scores and the reverse transcribed cDNA yields, as required for reporting qPCR experiments (Table 6.2; Bustin et al., 2009). RIN scores had a satisfactory mean of 6.06, while the median total RNA and cDNA yields were 786 and 151.5 ng/µl (means of 793.81 and 178.26 ng/µl), respectively.

³ http://genome.crg.es/software/geneid/

Table 6.2: Summary statistics describing RNA yield, integrity and cDNA yield as required in the MIQE guidelines (Bustin et al., 2009). Yield was measured in ng/µl

              n     Median   IQR      Mean     SD
RNA_Yield     128   786.00   382.50   793.81   297.84
RIN           128   6.10     0.50     6.06     0.77
cDNA_Yield    128   151.50   137.00   178.26   90.98

RIN: RNA integrity number, n: number of samples, IQR: inter-quartile range, SD: standard deviation.

6.3.2 Primer design

Unique primers to each of the nine Pst candidate effector genes were designed using PrimerQuest (Table 6.3). The NCBI databases were used in a BLAST (version 2.6.1) search to test the uniqueness of primers (Altschul et al., 1997). In no case was a sequence similarity found that spanned 100 % of the primer length. Primer lengths ranged from 19 to 23 nucleotides, and GC content was between 41 and 58 %. Amplicon size affects the number of amplicon copies at the threshold fluorescence (Rutledge and Cote, 2003), so each primer pair was designed to amplify an amplicon of identical size in the two treatments (SA1 and SA4) to ensure equal specificities (Karlen et al., 2007). Amplicons were between 84 and 122 bp in length (Table 6.3). Melting temperatures were optimised at 60 °C. Primers were tested and dissociation curves were evaluated for specificity in the positive control, Pst cDNA. The negative control (gDNA of wheat variety Avocet S) and the no-template controls (NTCs) did not show any amplification. Further details on primer design, the location of amplicons and the depth of coverage of the sequence data used to design primers can be found in Appendix C, Figures C.1 to C.9.
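The primer lengths and GC contents quoted above can be recomputed directly from the primer sequences listed in Table 6.3. The short Python sketch below is illustrative only and is not part of the PrimerQuest procedure used in this study; the function name `gc_content` is hypothetical, and counting the IUPAC ambiguity code S (G or C) as GC is an assumption of this sketch.

```python
def gc_content(seq):
    """Return the percentage of G/C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Assumption: the ambiguity code S (G or C) is counted as GC.
    gc = sum(1 for base in seq if base in "GCS")
    return 100.0 * gc / len(seq)


# Two primer pairs copied from Table 6.3.
primers = {
    "PST130_02403": ("CGAGGAACCCAAATATGCTAGT", "GACGGTAGCCGTCTTTCTTT"),
    "PST130_12491": ("CAGAGCACTTCCGCCTTAC", "CGAGAGGGCAATGTTGAGAA"),
}

for gene, pair in primers.items():
    for seq in pair:
        print(gene, seq, len(seq), f"{gc_content(seq):.0f} %")
```

The same function can be applied to the amplicon sequences if amplicon GC content is of interest.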
Table 6.3: Primer and amplicon specifications for Pst candidate effector gene identification

Gene           Primer sequence            Primer length   GC content (%)   Amplicon length (bp)   Efficiency (%)
PST130_02001   GTGGCCCTAGTGTACCAATTAT     22              50               84                     88
               CTCTCCTGGATTGAGAGTTTGG     22              50
PST130_02403   CGAGGAACCCAAATATGCTAGT     22              45               122                    107
               GACGGTAGCCGTCTTTCTTT       20              50
PST130_05023   ACTTGGTACGGTGGACATTC       20              50               97                     97
               CCTTGGCTTCCTTGCTCTTA       20              50
PST130_06503   CAGCGGTGTCATTGCTTTAC       20              50               98                     107
               TGTATTCGGAAGAGGCGTATTT     22              41
PST130_07513   GTACCGAGCAGGACGAATTATG     22              50               89                     94
               GTATACGGCCATCCTTCCATTT     22              45
PST130_09275   GAGCGAACTCAACCGCTAATA      21              48               92                     101
               CAGCCGTACCCGAGTTATATTT     22              45
PST130_12487   CTACCATCATTAGACGGCACAT     22              45               107                    90
               GCACTTGCTTCCACCATAAAC      21              48
PST130_12491   CAGAGCACTTCCGCCTTAC        19              58               90                     90
               CGAGAGGGCAATGTTGAGAA       20              50
PST130_12956   TGTTTGCCCTAGCTTCTTCTATC    23              43               98                     92
               GGTGTCGAAGTTCTCTGATGTC     22              50

Amplicon sequences:

PST130_02001   GTGGCCCTAGTGTACCAATTATCTGGCATCAATGCCAACTCGATCGTCTCGCCTAAGCCCAACCAAACTCTCAATCCAGGAGAG
PST130_02403   CGAGGAACCCAAATATGCTAGTCCAAAATATGATSCGCCCTACGAGAAGACCCCTGATGAAGAGCCAAAATACTCGGCCCCAAGCTACGATTACAATCCACCAAAGAAAGACGGCTACCGTC
PST130_05023   ACTTGGTACGGTGGACATTCGGCTGTGGCCAGGTTTTTGCGCCGCTTGGTTAATTACTTTCACCCAAGAAAGATGAGTAAGAGCAAGGAAGCCAAGG
PST130_06503   CAGCGGTGTCATTGCTTTACCTACTTCCAACCAAGCACAAATCGAAACTCGGGCCGAGAAGACCCGTTCCAGCGACAAATACGCCTCTTCCGAATACA
PST130_07513   GTACCGAGCAGGACGAATTATGTGCCGAGCATTTACTTCCAAGTTACCCAACTCTCAAGGTGTTTTCAAATGGAAGGATGGCCGTATAC
PST130_09275   GAGCGAACTCAACCGCTAATACCCCTGCTGCAAGTACTCCTGTCGCTAACACGACCTCCCCGACCCAATCCACATCCTCCACTGGTGCACCA
PST130_12487   CTACCATCATTAGACGGCACATTGTCGAATGCCCCATCACCTTCGTGGCAACTGACTATTGACAATGGTCAAATCAGGAACCGTAGGTTTATGGTGGAAGCAAGTGC
PST130_12491   CAATTTTCGAGAAGCGTGCCGAGACTGAAGGCACCGGAAAAGGTGAATCAAGCTCCCGCTCCTTAGGTGGCTGCAGCAACCAAGTTGGCC
PST130_12956   TGTTTGCCCTAGCTTCTTCTATCCATGCCGACGCAGGACTCAACCCCAATGACGCTCCAGATGACGTCATCGAATTGACATCAGAGAACTTCGACACC

6.3.3 Efficiency determination of primers

Primer efficiency was evaluated using the standard curve method.
The $C_T$ values of a cDNA dilution series were plotted, with log10 dilution fold on the x-axis and $C_T$ on the y-axis. A linear regression was fitted to the data and the coefficient of determination (R²) was calculated (Figure 6.3). This indicated how well the data fitted a linear model, with R² = 1 being a 100 % fit. A high R² is needed to accurately determine the efficiency of primers. It is recommended that the efficiencies of primers should be within 10 % of each other when a relative gene amplification comparison is to be made. Less optimisation is required when the efficiencies are taken into account, as in the efficiency correction methodology used in this work (Schmittgen and Livak, 2008). R² values greater than 0.95 were achieved for all Pst gene primers except for PST130_12491, which had an R² value of 0.81.

6.3.4 Statistical analysis of the relative expression of nine Pst candidate effector genes

Relative expression values were calculated using the method proposed in Pfaffl (2001; see Section 6.2.12). To determine the relationship between the time that had elapsed since Pst inoculation and the relative expression of the candidate genes in each South African isolate, a linear model with mixed effects was fitted to the data with “Gene” and “Time Point” and their interaction as fixed effects, and “Tray” as a blocking variable or random intercept. This approach was taken because sampling was not random, as is assumed in a simple linear model (Fitzmaurice et al., 2008). The model explains the relationship between the independent and dependent variables. An error term is used where the model does not fully represent the data. It is expected that the three plants that were inoculated together, placed in the same tray, will be more similar to each other.
The mixed model therefore reduces the error term by introducing the variable “Tray”. At every time point, three samples (seedlings) were taken from each of the three trays. The same sampling procedure was applied for both SA1 and SA4.

Figure 6.3: Linear regressions used to estimate the efficiency of primers for the nine Pst candidate gene assays and the reference gene, β-tubulin (TUBB). The threshold cycle number is indicated on the y-axis and plotted against the log10 dilution fold (x-axis). The coefficient of determination, R², indicates how well the data fitted a linear model. Values over 0.95 are desired.

Regression annotations per panel (y: threshold cycle, x: log10 dilution fold):

TUBB           y = 27 − 3.1x   R² = 1      E = 111 %
PST130_02001   y = 30 − 3.6x   R² = 0.99   E = 88 %
PST130_02403   y = 28 − 3.2x   R² = 0.99   E = 107 %
PST130_05023   y = 28 − 3.4x   R² = 0.99   E = 97 %
PST130_06503   y = 24 − 3.2x   R² = 0.96   E = 107 %
PST130_07513   y = 32 − 3.5x   R² = 0.96   E = 94 %
PST130_09275   y = 27 − 3.3x   R² = 1      E = 101 %
PST130_12487   y = 32 − 3.6x   R² = 0.98   E = 90 %
PST130_12491   y = 32 − 3.6x   R² = 0.81   E = 90 %
PST130_12956   y = 26 − 3.5x   R² = 1      E = 92 %

The mixed model shown in Equation 6.2 was at first fitted to the data. The assumptions of linear mixed models were evaluated for the relative expression dataset. The residuals of relative gene expression and the random intercept (the grouping variable “Tray”) did not fit a normal distribution. The residuals also did not scatter equally around the y = 0 horizontal line, as expected when variances are equal, and showed clear fan-like patterns in some cases. Appendix C, Figures C.10(i), (ii), and (iii) illustrate the graphical tests for the whole dataset, while Figures C.11 and C.12 show assessments for each isolate and gene.
Due to the use of the grouping variable, the normal probability plot of the random intercepts was constructed from a limited number of points, as the intercepts per gene consisted of only six data points at each time point, three per isolate. These data were therefore only plotted for the whole dataset and not by gene. The relative expression data did not follow a normal distribution and a log10 transformation was applied. Graphical tests for normality and equal variances of the residuals were repeated. The log10 transformed data fitted the assumptions of a linear mixed model considerably better, and it was decided to proceed using the transformed data in the linear mixed model (Appendix C, Figures C.13, C.14, and C.15). As equal variances and normality are assumed for the residuals of the log10 transformed data, parametric tests can be applied.

Variability of the data across different trays was assessed by using a one-way ANOVA with “Trays” as fixed effect on subsets of the data that included expression data of one isolate and one gene at a specific time point (nine data points for each of the time points). Time points 0 and 1 were excluded from this evaluation due to too many missing values. This resulted in analysing the effect of “Trays” on nine genes at five time points for two isolates (90 ANOVAs). The effect of “Trays” over these 90 cases was quantified. The between-group variance (between the three trays) exceeded the within-group variance (plants per tray) in only 15 % of the cases. This showed that there was a high level of variability in the data. Such variation often accumulates over the multiple steps in RT-qPCR, described by some as a “fragile assay” (Bustin and Nolan, 2004) because technical noise inevitably accumulates. This result should be considered in further interpretation of the data.
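The tray-level check described above compares the variance between trays with the variance among plants within a tray. A minimal Python sketch of that one-way ANOVA decomposition is given below. It is illustrative only (the analysis in this chapter was performed in R), the function name `one_way_anova` is hypothetical, and the example values are invented log10 relative expression values, not data from this study.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (MS_between, MS_within, F) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between, ms_within, ms_between / ms_within


# Three trays x three plants for one isolate, gene and time point (invented values).
trays = [[-0.9, -1.1, -0.7], [-1.0, -0.6, -1.2], [-0.8, -1.3, -0.9]]
ms_b, ms_w, f_ratio = one_way_anova(trays)
print(f"MS between = {ms_b:.3f}, MS within = {ms_w:.3f}, F = {f_ratio:.2f}")
print("between-tray variance exceeds within-tray variance:", ms_b > ms_w)

# Number of subsets analysed: 9 genes x 5 time points x 2 isolates.
print("number of ANOVAs:", 9 * 5 * 2)
```

An F ratio below 1 means that plants within a tray differed more from each other than the tray means did, which was the outcome in the large majority of the 90 subsets.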
To assess the significance of the fixed effects (“Time Point” and “Isolate”) in the model, likelihood ratio tests were performed on two linear mixed models, one including the effect in question (“Time Point” or “Isolate”) and one without. Because of the high variability in RT-qPCR data, a p-value was only considered significant if it was smaller than 0.001. A significant p-value indicated that the fixed effect term should be included in the model. The factor “Time Point” was significant for seven Pst genes (Table 6.4). For PST130_12956 and PST130_02403 the term “Time Point” was not significant. Figure 6.4 further revealed relatively stable expression for PST130_12956, while PST130_02403 showed large error bars, especially at early time points. Variability in the data makes it difficult to conclude a change in expression for PST130_02403. The fixed effect “Isolate” was not statistically significant for any of the nine Pst genes, both isolates displaying a similar expression profile across all time points (Figure 6.4). Multiple comparisons were done using the Tukey test to determine between which time points significant differences in gene transcription occurred (Table 6.5). As the term “Isolate” and the interaction term “Isolate × Time Point” were not significant for any of the nine Pst genes, SA1 and SA4 have a similar expression profile across all time points, for all genes (Figure 6.4).

[Figure 6.4 panels: PST130_02001, PST130_02403, PST130_05023, PST130_06503, PST130_07513, PST130_09275, PST130_12487, PST130_12491, PST130_12956; x-axis: days post inoculation (0, 1, 2, 3, 5, 9, 12); legend: isolate (SA1, SA4).]

Figure 6.4: Relative gene expression (log10 transformed) of nine candidate effector genes expressed in the Pst isolates SA1 and SA4 measured at different time points after inoculation.
Significant changes in expression across the time series were seen in all genes, except PST130_02403 and PST130_12956. PST130_06503 and PST130_09275 showed the most dynamic expression patterns, while other genes showed smaller differences in gene expression across time points. The gene β-tubulin was used as the reference gene. (y-axis: relative expression of target gene to reference gene.)

Table 6.4: Significance of the factor “Time Point” in the linear mixed model for those genes where it was significant

Gene           Chi-squared   Df   p-value
PST130_02001   22.542        6    0.0009654
PST130_05023   22.919        5    0.0003498
PST130_06503   113.71        6    < 2.2 × 10^−16
PST130_07513   31.358        5    7.96 × 10^−6
PST130_09275   173.93        6    < 2.2 × 10^−16
PST130_12487   23.837        5    0.0002334
PST130_12491   27.644        5    4.27 × 10^−5

6.3.5 Expression profiles of candidate genes

Significant changes in expression across the time series were seen in all genes, except PST130_02403 and PST130_12956. PST130_06503 and PST130_09275 showed similar expression patterns, which were also the most dynamic. The remaining five genes showed smaller differences in gene expression across time points. Expression profiles of PST130_02001 and PST130_05023 were comparable, while PST130_07513, PST130_12491 and PST130_12487 followed a similar trend. (Compare Figure 2.6, which broadly illustrates the infection process and describes the physical processes during the time course of infection in Pst.)

6.3.6 Gene validation using revised gene models and transcript data

The nine genes were assessed for alternative splicing using transcript data. The quality of the PST130 gene models, specifically for the nine genes evaluated, was also assessed using improved PST130 gene models (Dobon et al., 2016). PST130_07513 and PST130_12491 lacked high sequence similarity with predicted genes in the revised gene models. The remaining seven gene sequences had high (roughly 95 %) similarity and reasonable coverage with the revised predicted genes.
In four of the seven genes, PST130_02001, PST130_05023, PST130_06503 and PST130_09275, no evidence for alternative splicing was found. PST130_02001, PST130_05023, PST130_06503 and PST130_09275 are therefore most likely correctly annotated and low risk sequences for alternative splicing. Significant alternative splicing was revealed for PST130_02403. PST130_12487 displayed two retained introns, while two overlapping genes in the new gene models mapped to PST130_12956.

Table 6.5: Multiple comparisons between time points for each gene that showed significant difference in expression over the time series. Differences with a p-value of <0.001 were considered significant. From this data and Figure 6.4 it was clear that PST130_06503 and PST130_09275 displayed a much more dynamic expression pattern across time points compared to the other genes tested

Gene           Time Point comparison   z value   Pr(>|z|)
PST130_02001   3 - 1                   3.002     0.03673
               12 - 1                  3.77      0.00266
               12 - 2                  3.343     0.01265
               12 - 9                  3.076     0.02929
PST130_05023   5 - 1                   3.613     0.00396
               12 - 1                  3.933     0.00113
               12 - 2                  3.242     0.01432
PST130_06503   3 - 0                   6.876     < 0.001
               5 - 0                   9.337     < 0.001
               9 - 0                   9.008     < 0.001
               12 - 0                  3.532     0.0074
               3 - 1                   7.08      < 0.001
               5 - 1                   9.671     < 0.001
               9 - 1                   9.293     < 0.001
               12 - 1                  3.578     0.00616
               3 - 2                   5.257     < 0.001
               5 - 2                   8.409     < 0.001
               9 - 2                   7.963     < 0.001
               5 - 3                   3.153     0.02647
               12 - 3                  -4.403    < 0.001
               12 - 5                  -7.603    < 0.001
               12 - 9                  -7.137    < 0.001
PST130_09275   3 - 0                   4.295     < 0.001
               5 - 0                   6.541     < 0.001
               9 - 0                   8.595     < 0.001
               12 - 0                  3.305     0.0157
               3 - 1                   5.607     < 0.001
               5 - 1                   8.297     < 0.001
               9 - 1                   10.763    < 0.001
               12 - 1                  4.466     < 0.001
               3 - 2                   4.685     < 0.001
               5 - 2                   8.064     < 0.001
               9 - 2                   11.296    < 0.001
               12 - 2                  3.335     0.0142
               9 - 3                   5.498     < 0.001
               12 - 5                  -5.077    < 0.001
               12 - 9                  -8.217    < 0.001
PST130_12487   12 - 1                  2.99      0.03029
               12 - 2                  4.44      < 0.001
               12 - 3                  3.45      0.00676
               12 - 5                  3.38      0.00852
               12 - 9                  3.53      0.00495
PST130_12491   12 - 1                  4.158     < 0.001
               12 - 2                  3.864     0.00147
               12 - 3                  4.486     < 0.001
6.4 Discussion

Early time points yielded little fungal RNA due to the low Pst biomass in infected wheat tissues. This was also the case in the RNA-Seq study of Dobon et al. (2016). This is unfortunate, as multiple effector proteins are known to be deployed during the first 24 hours after inoculation. Consequently, amplification failed in samples that were collected early after inoculation, mostly at 0 and 1 dpi, and occasionally at 2 dpi, as the copy number of target sequences was not sufficiently high.

Statistical evaluation using a linear mixed model revealed that expression patterns between the two isolates did not vary significantly. Differences in gene expression across different time points were significant for most genes, with some genes showing a dynamic expression pattern over the course of the time series. However, considerable inter-plate variation was detected, and the relative gene expression determination with efficiency correction did not correct for inter-plate variability. One option for standardisation is to include a calibration sample in multiple wells across all plates to correct for technical variation between plates. Such a sample can be prepared for each gene in the experiment to allow sufficient quantities for all inter-plate comparisons.

The possibility of high biological variance in expression patterns of effectors cannot be excluded. In the rice blast fungus Magnaporthe oryzae, clonal variation in effector gene expression (CVEGE) has been suggested as a mechanism to escape host recognition, a different suite of effector genes being expressed in individual blast lesions (Mark Farman, University of Kentucky, personal communication). If this was the case in Pst, different seedlings, or even infection sites on a single seedling, inoculated with the same isolate might exhibit differences in effector gene expression profiles. The discovery in M.
oryzae establishes a new paradigm for plant-microbe recognition wherein resistance involves detection of deterministic Avr effectors which are layered over suites of effectors that are variably expressed among individuals. Consequently, tracing the expression of such effector genes in host-microbe interaction studies becomes a more difficult proposition and would require a different approach to RT-qPCR analysis in whole seedling leaves.

Pst gene expression early in the infection process, between 0 dpi and 1 to 2 dpi, needs further investigation to draw sound conclusions. PST130_05023 and PST130_02001 displayed a similar expression pattern: the increase in expression early in the infection process differed between SA1 and SA4, although not significantly, whereas the expression patterns at the later time points were nearly identical. This could indicate that both these genes are functional in the same or co-occurring infection processes. PST130_05023 was the only gene that was assessed in the current study as well as in the RT-qPCR evaluation of Avocet S inoculated with PST-08/21 (Cantu et al., 2013). In Cantu et al. (2013) it was found that PST130_05023 expression peaked at sporulation (14 dpi), similar to the result in the current study, where the expression peak was observed at sporulation (12 dpi).

The main differences in the evaluated gene expression profiles were between 5 and 9 dpi, and 9 and 12 dpi. Genes can be placed in three groups according to their expression profiles.

Group 1: PST130_02001 and PST130_05023 shared an increase in expression up to 3 dpi to 5 dpi, followed by a decrease in expression from 5 dpi to 9 dpi and another increase from 9 dpi to 12 dpi. This could indicate that these genes are involved in the early establishment of the Pst colony, and are then functional again during the sporulation processes, such as the formation of vertical hyphae and spores.
These genes all contained a PFAM domain (PFAM, Protein family database), and were expressed in both infected material and haustoria (Cantu et al., 2013).

Group 2: PST130_07513, PST130_12491 and PST130_12487 exhibited an expression pattern of initial increase up to 3 dpi, followed by relatively stable expression, showing a slight increase all the way up to 12 dpi. This could indicate some functionality during the early stages of colony establishment, plus a constant requirement for the protein throughout the asexual lifecycle.

Group 3: PST130_06503 and PST130_09275 showed a similar expression profile. A steep increase in gene expression was observed from 2 dpi to 5 dpi, with maximum expression at 9 dpi, falling off at 12 dpi. From the expression profile, one can speculate that these genes have their main function in establishment and maintenance of the Pst colony, and do not have a role in sporulation. In Cantu et al. (2013), PST130_06503 was expressed in the haustoria, while PST130_09275 was expressed in the infected material, but not in the haustoria.

No statistically significant change was identified in PST130_02403 and PST130_12956 expression over the time course of this study. For PST130_02403 this could be due to high variability in the data, as illustrated by the error bars at early time points. Further investigation of the expression profile of this gene is needed to draw conclusions. Variation in the data for PST130_12956 was smaller, and a fairly stable expression across the infection process is concluded for this gene. Some similarities can be drawn between the expression profiles of PST130_12956 and the Group 1 genes described above.

The nine candidate effector genes were assessed for alternative splicing using Pst transcript data. The genes were further verified by evaluating whether the candidate effector genes were included in both PST130 gene annotations.
This analysis revealed no evidence of alternative splicing for PST130_02001, PST130_05023, PST130_06503 and PST130_09275. High sequence similarity was also found in the new gene models for these four genes. PST130_07513 and PST130_12491 did not have good hits in the new gene models and could have been misidentified in either attempt to predict genes. Although primers were not designed to amplify fragments across splice sites, gene expression could have been underestimated in alternatively spliced genes if the exon containing the amplicon was excluded during splicing.

In a functional study using heterologous expression screens in Nicotiana benthamiana, accumulation patterns of PST130_05023 were observed in endomembranes that are suspended in the cytoplasm of leaf cells (Petre et al., 2016b).

6.5 Conclusion

Clear conclusions regarding gene expression could not be drawn from the RT-qPCR experimental procedure applied in this chapter. Interesting questions arise from the variability in the relative expression data. Future work addressing these questions should involve the inclusion of different biological replicates in one PCR run to investigate reproducibility. Other methods, such as RNA-Seq, could be explored but, as shown, do not address the problem of low fungal transcript abundance at early time points. In retrospect, it could be argued that the method would only work if genes had no homologs and if they were absent from one of the isolates. If primers had been designed across SNP sites, they could have been more successful in displaying the differences between the isolates for the nine candidate genes. Further discussion on the qPCR experimental procedure, outlining pitfalls and precautions taken, is included in Appendix C.3.

Chapter 7

Analysis of the Current Stripe Rust Threat in South Africa

7.1 Introduction

7.1.1 Pst virulence since 2005

The first discovery of Puccinia striiformis f. sp.
tritici in South Africa was in 1996 (Pretorius et al., 1997), with three subsequent pathotypes that appeared to have evolved in a clonal, stepwise manner (Visser et al., 2016). Previous analysis that compared the virulence profiles of the historical and current Pst populations suggested that the population has stayed fairly consistent, with routine, traditional pathology testing on wheat differential sets (Table 7.1) reporting no additional virulences since 2005 (Agricultural Research Council, Small Grain (ARC-SG), personal communication).

Table 7.1: Wheat differential lines used at Agricultural Research Council, Small Grain, Bethlehem, South Africa to identify Pst pathotypes. Standard world (1 to 7) and European (10 to 17) differential sets are listed. Lines 9, 8 and 18, containing resistance genes Yr5, Yr9 and YrA respectively, are used as supplemental lines

No   Line/variety         Yr gene
1    Chinese 166          1
2    Lee                  7, 22, 23
3    Heines Kolben        2, 6
4    Vilmorin 23          3a, 4a
5    Moro                 10, Mor
6    Strubes Dickkopf     25, Sd
7    Suwon 92/Omar        Su, 4
8    Clement              2, 9, 25, Cle
9    Triticum spelta      5
10   Hybrid 46            4b
11   Reichersberg 42      7, 25
12   Heines Peko          2, 6, 25
13   Nord Desprez         3a, 4a
14   Compair              8, 19
15   Carstens V           25, 32, Cv
16   Spaldings Prolific   Sp, 25
17   Heines VII           2, 25, HVII
18   Avocet R             A

The prevalence of Pst pathotypes in South Africa during the growing seasons of 2008 to 2016 is shown in Figure 7.1(i). Data were obtained from the South African Pst virulence survey undertaken by ARC-SG, South Africa. The SA2 pathotype, 6E22A- (detected in 1998), and the SA4 pathotype, 6E22A+ (detected in 2005), were present in all eight seasons. Pathotype 6E16A- (SA1), which was first detected in 1996, only occurred in samples collected in 2009 and 2011.
Figure 7.1(ii) displays the percentage of Pst samples, classified by pathotype, collected between 2008 and 2012, and in 2016, and Figure 7.1(iii) shows the corresponding sampling sites of each isolate by pathotype from 2008 to 2012. Information about the number of samples collected per year per location could not be obtained. The available survey data indicate that pathotype 6E22A+ was the most prevalent, followed by regular occurrence of 6E22A- at a lower frequency. It seems that 6E16A- has mostly been replaced by the 6E22 pathotypes, with the pathotype 6E22A+, virulent to YrA, predominating.

[Figure 7.1 panels: (i) South African Pst pathotypes observed between 2008 and 2016 (x-axis: race, 6E16A-, 6E22A-, 6E22A+; y-axis: year; legend: absent, present). (ii) Percentage of Pst isolates, by specific pathotypes, found between 2008 and 2012, and 2016 (x-axis: year; legend: pathotype). (iii) Collection sites and pathotypes of Pst isolates between 2008 and 2012 in South Africa (map labels: Limpopo, Mpumalanga, North West, Gauteng, KwaZulu-Natal, Northern Cape, Lesotho, Free State, Eastern Cape, Western Cape).]

Figure 7.1: Prevalence of Pst pathotypes in South Africa between 2008 and 2016. Data were made available by the Agricultural Research Council, Small Grain (ARC-SG) of South Africa (map adapted from SENSAKO’s oral presentation during the Borlaug Global Rust Initiative (BGRI), New Delhi, 2013).

7.1.2 Global reports on Pst population shifts

The dynamics and demographics of several Pst populations have been described. Wellings (2007) described three reasons for a change in population demography in clonal populations, as seen in Australia. Firstly, increased pathogen virulence
through mutation and selection following resistance gene deployment is a common mechanism (Brown and Hovmøller, 2002; McDonald and Linde, 2002; Milus et al., 2009; de Vallavieille-Pope et al., 2012). Secondly, exotic incursions have been shown to occur over long distances, causing sudden unsuspected epidemics and shifts in the pathogen population dynamics. This has included Pst pathotypes with increased aggressiveness (Milus et al., 2009). The establishment of such incursions seems to depend on the host population and possible abiotic stressors (Wellings, 2007). Lastly, the survival of Pst mutations by genetic drift during unfavourable conditions can totally change the following season’s re-emerging population. Such population bottlenecks can lead to a severe shift in allele frequencies.

Exotic incursions in the USA in 2000 and Australia in 2002 were relatively homogeneous, suggesting that a single genotype of Pst was introduced (Wellings, 2007; Milus et al., 2009; Hovmøller et al., 2016). In Europe, a major population shift was seen in 2011 that included several Pst pathotypes, some of which could infect the wheat variety Warrior. Through pathotyping in subsequent years, these newly introduced Pst pathotypes were shown to be diverse. A method to rapidly genotype and compare field samples was developed by Hubbard et al. (2015). Using next-generation sequencing data, it was confirmed that the older UK Pst population was replaced by a new, much more diverse population. UK and French Pst isolates pre-2011 were closely related, with low genetic diversity, while isolates from 2011 and 2013 formed a distinct, more diverse population. Isolates collected post-2011 included the Pst pathotype virulent on the wheat variety Warrior and three more genetic groups. Hubbard et al.
(2015) also found historical and new Pst isolates with different genetic profiles, but the same virulence profile. This radical population shift in 2011 was also confirmed by Hovmøller et al. (2016), who suggested that the two new pathotypes, “Warrior” and “Kranich”, carried characteristics indicating that they might have originated from a sexual population, possibly from the near-Himalayan region in Asia.

7.1.3 Objectives

Stripe rust is a global problem of increasing proportion (Hovmøller et al., 2010) and migration of spores over long distances has been repeatedly reported (Ali et al., 2017). Furthermore, the existence of recombinant Pst populations increases the risk of new variants appearing in each new season (Rodriguez-Algaba et al., 2014). In Chapter 4, four historical South African Pst isolates were analysed in context with other global isolates. In this chapter, changes seen in the current field population of Pst in South Africa were characterised in context with the global isolates examined in Chapter 4.

7.2 Materials and methods

7.2.1 Stripe rust samples used in RNA sequencing analyses

Field samples of stripe rust were collected in South Africa during the 2014 and 2015 wheat growing seasons. Twenty-five single lesion leaf samples of Pst-infected wheat leaves were collected from various locations (Figure 7.2; Table 7.2). In 2013, a Puccinia sample was collected on wild rye and found to be virulent on wheat with the pathotype classification 6E16A- (Pretorius et al., 2015), similar to SA1. This isolate was included in this analysis and named 13/SAZP1. Microsatellite markers have also been used to describe this isolate, also known as Sutherland (Visser et al., 2016). In addition, four Pst isolates were collected from Ethiopia and 14 isolates from Kenya during the 2014 growing season. All stripe rust infected leaf samples were stored in RNA stabilising solution (RNAlater, Life Technologies, UK).
Selected samples (44) that passed quality assessments as explained in Chapter 3 were included in the analysis.

Table 7.2: African isolates collected between 2013 and 2015. Read frequency graphs of these isolates are displayed in Appendix D, Figures D.1 and D.2

| Isolate number | Isolate (year/code) | Country of isolation | Year collected | Type of data |
|---|---|---|---|---|
| 49 | 14/SADL1 | South Africa | 2014 | RNA-Seq |
| 50 | 14/SADL2 | South Africa | 2014 | RNA-Seq |
| 51 | 14/SADL3 | South Africa | 2014 | RNA-Seq |
| 52 | 14/SADL4 | South Africa | 2014 | RNA-Seq |
| 53 | 14/SADL5 | South Africa | 2014 | RNA-Seq |
| 54 | 14/SADL6 | South Africa | 2014 | RNA-Seq |
| 55 | 14/SATT1 | South Africa | 2014 | RNA-Seq |
| 56 | 14/SATT2 | South Africa | 2014 | RNA-Seq |
| 57 | 14/SATT3 | South Africa | 2014 | RNA-Seq |
| 58 | 14/SATT4 | South Africa | 2014 | RNA-Seq |
| 59 | 14/SATT5 | South Africa | 2014 | RNA-Seq |
| 60 | 13/SAZP1 | South Africa | 2013 | RNA-Seq |
| 61 | 14/SAZP2 | South Africa | 2014 | RNA-Seq |
| 62 | 14/SAZP3 | South Africa | 2014 | RNA-Seq |
| 63 | 15/SAZP1* | South Africa | 2015 | RNA-Seq |
| 64 | 15/SAZP2 | South Africa | 2015 | RNA-Seq |
| 65 | 15/SAZP3 | South Africa | 2015 | RNA-Seq |
| 66 | 15/SAZP4 | South Africa | 2015 | RNA-Seq |
| 67 | 15/SAZP5 | South Africa | 2015 | RNA-Seq |
| 68 | 15/SAZP6 | South Africa | 2015 | RNA-Seq |
| 69 | 15/SAZP7 | South Africa | 2015 | RNA-Seq |
| 70 | 15/SAZP8 | South Africa | 2015 | RNA-Seq |
| 71 | 15/SAZP9 | South Africa | 2015 | RNA-Seq |
| 72 | 15/SAZP10 | South Africa | 2015 | RNA-Seq |
| 73 | 15/SAZP11 | South Africa | 2015 | RNA-Seq |
| 74 | 15/SAZP12 | South Africa | 2015 | RNA-Seq |
| 75 | 14/ET2 | Ethiopia | 2014 | RNA-Seq |
| 76 | 14/ET3 | Ethiopia | 2014 | RNA-Seq |
| 77 | 14/ET4 | Ethiopia | 2014 | RNA-Seq |
| 78 | 14/ET5 | Ethiopia | 2014 | RNA-Seq |
| 79 | 14/K2 | Kenya | 2014 | RNA-Seq |
| 80 | 14/K4 | Kenya | 2014 | RNA-Seq |
| 81 | 14/K5 | Kenya | 2014 | RNA-Seq |
| 82 | 14/K6 | Kenya | 2014 | RNA-Seq |
| 83 | 14/K7 | Kenya | 2014 | RNA-Seq |
| 84 | 14/K8 | Kenya | 2014 | RNA-Seq |
| 85 | 14/K9 | Kenya | 2014 | RNA-Seq |
| 86 | 14/K10 | Kenya | 2014 | RNA-Seq |
| 87 | 14/K11 | Kenya | 2014 | RNA-Seq |
| 88 | 14/K12 | Kenya | 2014 | RNA-Seq |
| 89 | 14/K13 | Kenya | 2014 | RNA-Seq |
| 90 | 14/K14 | Kenya | 2014 | RNA-Seq |
| 91 | 14/K15 | Kenya | 2014 | RNA-Seq |
| 92 | 14/K16 | Kenya | 2014 | RNA-Seq |

*also known as Sutherland (Visser et al., 2016); 14/ET2-5 (Bueno-Sancho et al., 2017) obtained from D Hodson; Kenyan field samples provided by DGO Saunders (14/K2-16) obtained from R Wanyera; South African field samples collected by D Lesch (SADL), T Terefe (SATT), and ZA Pretorius (SAZP). (15/SAZP2 was not used in the analyses due to a poor read frequency graph.)

Figure 7.2: Locations of Pst collections between 2013 and 2015 for RNA sequencing and historical isolate collection sites.

7.2.2 Transcriptome sequencing of stripe rust infected wheat leaves

Total RNA was extracted using the Qiagen RNeasy Mini kit (Qiagen, Germany). RNA integrity and quantity were assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies, USA) as explained in Chapter 3. RNA was reverse transcribed to cDNA using the Illumina TruSeq RNA sample preparation kit (Illumina, UK). Transcriptome sequencing was performed on the Illumina HiSeq instrument at the Earlham Institute, UK.
Bowtie software (version 0.12.7; Langmead et al., 2009) from the TopHat package (version 1.3.2; Trapnell et al., 2012) was used to align the paired-end reads of each transcriptome independently to the PST130 reference genome (Cantu et al., 2011). Purity of isolates was confirmed using the method described in Chapter 3. Phylogenetic and population structure analyses, followed by FST calculations and the Watterson estimator of population diversity (θ̂W), were used to describe genetic variation in population clusters, following the methodology described in Chapter 3 and applied in Chapter 4. These analyses were performed on the field isolates listed in Table 7.2 and the 48 isolates listed in Table 4.1 (Chapter 4), resulting in the assessment of 92 isolates in total.

7.2.3 Pst pathotype determination

Roelfs et al. (1992) explained that the infection types given in Table 7.3 “are often refined by modifying characters as follows: = means uredinia at lower size limit for the infection type; - means uredinia somewhat smaller than normal for the infection type; + means uredinia somewhat larger than normal for the infection type; ++ means uredinia at the upper size limit for the infection type; C means more chlorosis than normal for the infection type; and N means more necrosis than normal for the infection type. Discrete infection types on a single leaf when infected with a single biotype are separated by a comma (e.g., 4, ; or 2=, 2+ or 1,3C). A range of variation between infection types is recorded by indicating the range, with the most prevalent infection type listed first (e.g., 23 or ;1C or 31N) (Roelfs and Hettel, 1992).”

Fresh inoculum was prepared by inoculating seedlings of the susceptible wheat variety Morocco. Four cultures were prepared: two cultures of the historical South African isolates, SA1 and SA4, and two more recently collected isolates, 13/SAZP1 and 15/SAZP4.
The isolate 13/SAZP1 was previously tested and identified as pathotype 6E16A- (Pretorius et al., 2015), while 15/SAZP4 was identified as 6E22A+ on the standard differential sets, using the scoring system in Table 7.3 (ZA Pretorius, unpublished data). An extended set of wheat differential lines was inoculated with each Pst isolate after growing seedlings for 7 to 8 days as explained in Chapter 3. Infection types were evaluated 21 days after inoculation and are reported in Appendix D, Tables D.1 and D.2 (UK differential lines were obtained from S Holdgate, National Institute of Agricultural Botany (NIAB), UK and DGO Saunders, John Innes Centre (JIC), UK).

Table 7.3: Infection type scores used to assess Pst infection on wheat seedlings (adapted from Roelfs et al., 1992 and McIntosh et al., 1995)

| Host response (class) | Infection type | Disease symptoms |
|---|---|---|
| Immune | 0 | No visible uredia |
| Very resistant | ; | Necrotic flecks |
| Resistant | ;N | Necrotic areas without sporulation |
| Resistant | 1 | Necrotic and chlorotic areas with restricted sporulation |
| Moderately resistant | 2 | Moderate sporulation with necrosis and chlorosis |
| Moderately susceptible | 3 | Sporulation with chlorosis |
| Susceptible | 4 | Abundant sporulation without chlorosis |

7.3 Results

7.3.1 Clustering analysis using RNA-Seq and whole genome sequencing data

To investigate the pathotype and genetic profile of the current Pst population in South Africa, stripe rust infected wheat samples were collected from wheat fields between 2013 and 2015 (Figure 7.2). The interaction transcriptomes of these Pst infected wheat samples were sequenced along with similar field isolates from Kenya and Ethiopia. Cluster analysis was carried out using SNP datasets to assess the existence of population structure in the Pst population.
Phylogeny

A phylogenetic tree (Figure 7.3) was constructed using the randomized axelerated maximum likelihood (RAxML) method as described in Section 3.3.5, to determine the genetic relationship among samples (Table 7.2). Isolates examined in Chapter 4 (Table 4.1) were included in the analysis of the field samples. The tree illustrates a well-defined shift in the genetic structure of the South African Pst population, with the recent samples collected between 2013 and 2015 clustering distantly from earlier collected isolates. Field isolates from Ethiopia and Kenya were more closely related to the historical East African and South African populations, while the South African field isolates clustered together with a group of isolates found in the UK in 2013 on triticale, called UK Group II (Hubbard et al., 2015). A tree showing relative distances was also constructed (Figure 7.4), excluding isolates from the East Africa (B) group in the interest of legibility of the figure. The UK 2013 Group II isolates cluster distantly from the other 2013 UK isolates. The 2013 to 2015 South African isolates cluster with these UK isolates, away from the historical South African isolates.

Population structure analysis

To assess population structure, STRUCTURE software (version 2.3.4; Pritchard et al., 2000) was applied to analyse a dataset of 112 180 synonymous biallelic SNPs. Both the log probability plot (Figure 7.5(i)) from Pritchard et al. (2000) and the plot of ∆K (Figure 7.5(ii)), based on the method described by Evanno et al. (2005), suggested that the population could be grouped into five subclusters. The histogram plots of the data, with K estimated between 2 and 15 (Figure 7.6), describe each isolate’s cluster allocation given a certain number of clusters (K). No additional information regarding population differentiation was gained when K was increased above five.
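The ∆K criterion used to choose K can be illustrated with a minimal Python sketch. It assumes the commonly used formulation in which ∆K is the absolute second-order difference of the mean L(K) across replicate runs, divided by the standard deviation of L(K) at that K. The function name evanno_delta_k and the replicate values are hypothetical and serve only as an illustration; they are not the STRUCTURE output of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Return {K: delta K} from replicate log-probabilities.

    lnp maps each K to a list of ln P(D) values, one per replicate run.
    Assumed formulation: |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    Delta K is undefined for the smallest and largest K tested.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # both neighbouring K values are needed
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_difference / sd
    return delta


# Hypothetical replicate values, for illustration only.
toy_runs = {
    1: [-4000.0, -4002.0, -3998.0],
    2: [-3500.0, -3503.0, -3497.0],
    3: [-2500.0, -2501.0, -2499.0],
    4: [-2450.0, -2455.0, -2445.0],
    5: [-2440.0, -2450.0, -2430.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(best_k, delta_k)
```

In this toy example the likelihood rises steeply up to K = 3 and then plateaus, so ∆K peaks at K = 3, in the same way that the peak at K = 5 was read from Figure 7.5(ii).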
Figure 7.3: Phylogenetic tree displaying the relationship between Pst isolates. Samples representative of older Pst populations and more recent populations were compared. The maximum likelihood phylogenetic tree was obtained using the RAxML method. The relationship between samples was determined using those Pst genes that had 80 % breadth of coverage in 80 % of the samples. This included 2597 genes and a total of 792 535 third codon sites. Only the topology is indicated here, while Figure 7.4 displays relative distances. Both dendrograms were visualised using MEGA software (version 6.06). Asterisk (*) indicates genomic data of isolate 11/08, while 11/08 without an asterisk indicates RNA-Seq data. RAxML, Randomized Axelerated Maximum Likelihood; EFS, Eastern Free State; KZN, KwaZulu-Natal; WC, Western Cape

Figure 7.4: Maximum likelihood phylogenetic tree showing relative distances between the isolates described in Figure 7.3, where branch lengths were ignored and only topology was considered. In this dendrogram, East Africa (B) is not shown. Compare Appendix D, Figure D.3, which includes the East Africa (B) group.

STRUCTURE assumes that the population is in Hardy-Weinberg equilibrium: equilibrium of allelic and genotypic frequencies in a population of infinite size, a diploid and sexually reproducing species, no migration and panmixia (random crossing among isolates). As some of these criteria are violated by our data (asexual reproduction, small populations and no panmixia), the STRUCTURE result was compared with the non-parametric method DAPC (Jombart et al., 2010).
The biallelic synonymous SNP dataset used in the STRUCTURE analysis was used to summarise genetic variance within and between populations by PCA. The BIC graph (Figure 7.7(i)) shows an elbow at K = 7 to K = 8, while an absolute minimum was observed at K = 11. This indicated that the optimum number of population clusters falls between 7 and 11. Individual isolates were assigned to population clusters by DA of eigenvalues (Figure 7.7(ii)). According to the DA, the first two principal components explained most of the genetic variability seen in the data. The histogram plots (Figure 7.8) at different values of K showed an increase in differentiation from K = 7 to K = 11. The gain of differentiation from K = 10 to K = 11, shown in the South African isolates from 2014, is lost at K = 12. Taking this and the BIC graph into account, K = 10 was concluded to be the optimal estimate of the number of population clusters. The first two principal components of the DAPC analysis of the synonymous SNP sites are shown in the scatter plot (Figure 7.7(iii)). The distances between groups are representative of the relative differentiation between population groups, taking the first two principal components into account.

Figure 7.5: Evaluation of number of population clusters following STRUCTURE analyses. (i) Log probability of the data L(K) as a function of K (K = 1 to 15) to estimate the optimal number of population clusters as identified by STRUCTURE. The optimum number of clusters (K) inferred by the model-based Bayesian cluster analysis of genome-wide SNP data is 5. (ii) The Evanno method of inferring the number of STRUCTURE populations (K) from the modal value of ∆K. A strong signal was detected for K = 5 where ∆K was at a maximum. ∆, Delta
Figure 7.6: Histogram plots of population clustering with K between 2 and 15 as obtained from STRUCTURE analyses. Each bar represents estimated membership fractions for each Pst isolate. No further differentiation was observed after K = 5. Asterisk (*) indicates genomic data of isolate 11/08, while no asterisk indicates RNA-Seq data for 11/08. ATR-2 and 11/75 (Table 4.2) were not used in this analysis.
Figure 7.7: Discriminant analysis of principal components (DAPC) analysis of Pst isolates. Panel (i) shows the Bayesian information criterion (BIC) curve (value of BIC versus number of clusters), suggesting the minimum number of clusters (K) required to explain the variation between pathotype clusters. An elbow is observed at K = 7 and a minimum at K = 11. From this result it can be derived that the optimal predicted number of population clusters (K) for the dataset falls between 7 and 11. Panel (ii) shows a bar plot representing discriminant analysis (DA) of eigenvalues for the main principal component functions. This indicates that most of the variation in the dataset can be explained by the first two principal components. Panel (iii) shows a scatter plot indicating the relative proximity of Pst population clusters following DAPC analysis.
Figure 7.8: Histogram plots indicating population structure as inferred by DAPC analysis. Each bar indicates the group an isolate is assigned to. Field samples from Africa, collected between 2013 and 2015, were assigned to three groups, coloured orange, light red and red. The light red group contains South African isolates from 2014 and 2015, two Ethiopian isolates and one Kenyan isolate from 2014, and groups with the UK 2013 Cluster II, containing triticale field isolates. The red cluster contains field samples from 2014 collected in Kenya and South Africa, and one sample from 2013 that was collected from wild rye in South Africa. The orange group differentiated earlier (K6) than the red (K8) and light red groups (K9). This small group contains field samples collected in 2014 from Ethiopia and South Africa. From these three groups it is evident that the recent Pst population in South Africa is fairly diverse and that South African isolates share similarities with the Kenyan and Ethiopian populations. The Ethiopian population shows higher diversity compared to the Kenyan population. Asterisk (*) indicates genomic data of isolate 11/08, while no asterisk indicates RNA-Seq data for 11/08.
ATR-2 and 11/75 (Table 4.2) were not used in this analysis.

Differentiation within and between population clusters

Differentiation between groups was calculated through pairwise comparisons of the 10 population clusters identified by the DAPC analysis. FST statistics for all pairwise comparisons are shown in the lower triangular matrix of Figure 7.9. The highest FST values were observed in comparisons involving Group 1 (≥ 0.37) and Group 4 (≥ 0.58). These groups were also positioned most distantly from the other eight groups in the DAPC scatter plot (Figure 7.7(iii)). Comparing these two groups with each other resulted in a high FST of 0.80, further indicating that the two groups were distinctly differentiated from all other groups and from each other.
These high values of FST also confirmed the importance of asexual reproduction, which is known to increase the differentiation among populations through the absence of genetic mixing. Group 1 contained an Ethiopian isolate, ET08/10, previously identified as PstS2, as stipulated in Chapter 4 (Hovmøller et al., 2008; Walter et al., 2016; Ali et al., 2017; M Hovmøller, personal communication). This isolate grouped with two isolates from Eritrea collected in 2011. Group 4 included two isolates from Ethiopia and one isolate from South Africa, all collected in 2014. Group 4 was distinctly different to Groups 9 and 10, which contained the remaining South African and East African isolates collected from 2013 to 2015. Groups 9 and 10 had a low FST of 0.12, indicating that these two groups are closely related. Besides the recent African field samples, Group 9 also contained samples collected on triticale in the UK in 2013. Low variability was observed within the three groups (Groups 4, 9 and 10) that contained the post-2012 African samples, as indicated on the matrix diagonal (Figure 7.9).
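The two statistics reported in Figure 7.9 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the pipeline described in Chapter 3: FST is computed here with Nei's heterozygosity-based definition for a single biallelic site and two equally weighted populations, and the Watterson estimator is computed from the number of segregating sites. The function names and input values are hypothetical.

```python
def watterson_theta(segregating_sites, n_sequences, n_sites=1):
    """Watterson estimator: S / a_n, with a_n = sum of 1/i for i = 1 .. n-1.

    Dividing by n_sites gives a per-site value. n_sequences is the number
    of sampled sequences (for a dikaryotic sample, two per isolate would be
    one possible assumption).
    """
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / a_n / n_sites


def nei_fst(p1, p2):
    """FST = (H_T - H_S) / H_T for one biallelic site.

    p1 and p2 are the frequencies of the same allele in two populations,
    which are weighted equally (an assumption of this sketch).
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical inputs, for illustration only.
print(watterson_theta(segregating_sites=10, n_sequences=5))
print(nei_fst(1.0, 0.0))  # populations fixed for different alleles
print(nei_fst(0.5, 0.5))  # identical allele frequencies
```

Under this definition, populations fixed for alternative alleles give an FST of 1 and populations with identical allele frequencies give 0, which is the scale on which the pairwise values between groups are interpreted.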
Figure 7.9: Measurements of genetic diversity by FST calculation of pairs of population groups, indicated by the lower triangular matrix. The Watterson estimator of population diversity is given on the diagonal of the matrix. Colours of subpopulations are as shown in the DAPC population structure analysis bar plots (Figure 7.8). Comparisons with Group 4 (orange), and often Group 1, showed high FST values, indicating that these groups were genetically very different from the other samples. Asterisk (*) indicates genomic data of isolate 11/08, while no asterisk indicates RNA-Seq data for 11/08.

| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0031 ± 0.0055 | | | | | | | | | |
| 2 | 0.39 | 0.0005 ± 0.0008 | | | | | | | | |
| 3 | 0.37 | 0.21 | 0.0020 ± 0.0030 | | | | | | | |
| 4 | 0.80 | 0.58 | 0.62 | 0.0001 ± 0.0004 | | | | | | |
| 5 | 0.53 | 0.14 | 0.23 | 0.76 | 0.0003 ± 0.0009 | | | | | |
| 6 | 0.59 | 0.32 | 0.26 | 0.79 | 0.20 | 0.0012 ± 0.0021 | | | | |
| 7 | 0.78 | 0.39 | 0.39 | 0.87 | 0.27 | 0.31 | 0.0005 ± 0.0008 | | | |
| 8 | 0.82 | 0.45 | 0.42 | 0.91 | 0.48 | 0.47 | 0.41 | 0.0004 ± 0.0013 | | |
| 9 | 0.84 | 0.52 | 0.40 | 0.90 | 0.48 | 0.41 | 0.25 | 0.49 | 0.0001 ± 0.0004 | |
| 10 | 0.85 | 0.52 | 0.41 | 0.92 | 0.50 | 0.45 | 0.32 | 0.51 | 0.12 | 0.0000 ± 0.0001 |

7.3.2 Seedling Pst pathotype testing

To compare the virulence profiles of the historical Pst isolates to isolates collected from the field between 2013 and 2015, seedling inoculation tests were performed on an extended set of wheat differential lines. The wheat differential set contained varieties with known Yr resistance genes, as well as unidentified sources of stripe rust resistance. Seedlings of 56 wheat varieties were tested under controlled environmental conditions with the historical South African isolates SA1 and SA4 and two field isolates, 13/SAZP1 and 15/SAZP4, collected in 2013 and 2015, respectively. To determine whether the genetic variance displayed in the phylogenetic analyses was linked to changes in the virulence profiles of these isolates, the differential set was expanded by including additional wheat lines from the UK and Australia.

In the comparison between the isolates SA1 and 13/SAZP1, no significant variability was observed. SA4 and the 2015 isolate, 15/SAZP4, displayed slight differences in infection types, with the most prominent differences after infection of the wheat varieties Monterey (;cn versus 2cn) and Heines VII (;1+cn versus 3c), and smaller but observable differences on Kranich (;cn versus 1cn), Solstice (; versus ;c) and Selkirk (2cn versus 3=cn). These differences are displayed in Figure 7.10. Detailed results of all infection assays are listed in Appendix D, Tables D.1 and D.2.

Figure 7.10: Infection type comparisons between one historical and one recent Pst isolate. Infection types of SA4, from the historical population, and isolate 15/SAZP4 collected in 2015 are shown. Highly similar phenotypes were observed on the wheat varieties Warrior, Vilmorin 23, Heines Peko and Reichersberg 42. Differences on UK testers, including Kranich, Monterey and Solstice, were observed. The outcomes of the remaining differential tests are summarised in Appendix D, Tables D.1 and D.2.

7.4 Discussion

The field Pst population in South Africa was assessed at transcriptome level using 25 samples collected between 2013 and 2015. Along with these, four Pst isolates collected in 2014 in Ethiopia and 14 isolates collected in Kenya in 2014 were also evaluated. Phylogenetic analysis placed the 2014 East African isolates in close proximity to one another. Additionally, it revealed patterns of high similarity between the field Pst population in South Africa, collected between 2013 and 2015, and the UK Cluster II triticale field isolates (T13/1, T13/2, T13/3 and CL1) described by Hubbard et al. (2015). Isolates collected in the same region of South Africa commonly clustered together. South African isolates from corresponding geographical and, by implication, climatic regions were often grouped together in the phylogenetic tree. Although this result has to be investigated further before conclusions can be drawn, pathotyped isolates with different virulence profiles were positioned on different branches.

Genotyping of the Pst isolates indicated that a shift occurred in the South African Pst population, with the current Pst population being clearly differentiated from the earlier isolates sampled before 2012 and assessed in Chapter 4. The unexpected genetic relationship with the 2013 UK isolates, found on triticale, suggests a potential recent incursion of Pst into South Africa. Results from the DAPC analysis mostly correlated with the phylogenetic findings and revealed signs of population structure, with three distinct groups containing field samples from South Africa. The historical South African isolates were placed in a separate, fourth group. All three South African field sample groups also included 2014 isolates from Kenya and/or Ethiopia. This supports the hypothesis raised in Chapter 4 of a potential exchange of inoculum between South Africa and East Africa, although the East African isolates did not resemble the UK Cluster II isolates as closely as the co-clustering South African field isolates did (Figure 7.3).
This indicates that inoculum was somehow spread between these locations or could be derived from the same progenitor. The South African and UK populations remained more similar to each other, but both share similarity with the East African population. Regarding the DAPC analysis outcome of the South African field population, isolate 15/SAZP4, which exhibited a partially successful infection on Monterey that was not seen in the earlier isolate it was compared with (SA4), was placed in Group 10. In addition to its high similarity to Group 9 (FST = 0.12), Group 10 was also very homogeneous according to the within-group diversity estimate (0.0000 ± 0.0001). Group 4 and Group 9, which also contain South African field isolates, likewise showed low diversity amongst isolates (0.0001 ± 0.0004). The DAPC clusters differed from the clades in the phylogenetic analysis by only one isolate (14/SAZP3) when K = 10 is considered. The placement of 14/SAZP3 together with two isolates from Ethiopia, namely 14/ET2 and 14/ET3, forming Group 4 (orange) in the DAPC analysis is noteworthy. In the phylogenetic tree, 14/ET2 is the only Ethiopian field isolate that shows similarity to the East Africa (B) group, which contains isolate ET08/10, identified to be of the aggressive PstS2 type (Hovmøller et al., 2008; Walter et al., 2016; Ali et al., 2017; M Hovmøller, personal communication). Grouping of these three isolates was, however, not displayed in the inferred phylogeny, where 14/SAZP3 grouped with the other South African isolates. In the DAPC analysis, Group 4 differentiated early (K = 6), as displayed by the orange bars, compared to the red group (K = 9), containing the rest of the East African and South African field samples.
The high differentiation shown when Group 4 (orange) is compared to Group 9 (light red, FST value 0.90) and Group 10 (red, FST value 0.92) indicates that Group 4, containing 14/SAZP3, is considerably differentiated from Groups 9 and 10. This isolate, carrying the 6E16A- pathotype, was previously evaluated using microsatellite markers, and differentiated from other South African 6E16A- isolates (B Visser, personal communication). In previous infection assays, this isolate had a typical 6E16A- pathotype. This isolate was not evaluated on the extended differential set used in this study. This could be similar to the case discussed by Hubbard et al. (2015), where phenotypically similar isolates were genetically distinct and belonged to different populations. This further highlights the importance of genotyping along with differential testing in seasonal surveys. The genetic differentiation between Groups 9 and 10 (FST value 0.12) was low in comparison to their differentiation from Group 4. Further investigation is needed to conclude that isolates related to the aggressive pathotype PstS2 are present in South Africa.

Genetic change revealed by the phylogenetic and DAPC analyses in the South African post-2012 Pst population does not support stepwise mutational adaptation, but leads to speculation that an introduction of new Pst isolates occurred after 2012. This introduction could have occurred either through natural means, with urediniospores transferred by wind, or through human movement. According to the virulence profiles of the South African field isolates, pathological support for a new incursion is limited since new virulence has not been described in routine surveys. Ali et al. (2014) describe how such surveys can be biased as sampling is often done from wheat varieties that carry resistance genes that have been overcome, and usually not from field isolates.
After evaluating some historical isolates with more recent phenotypic counterparts, SA1 and 13/SAZP1 had nearly identical seedling phenotypes on all the wheat lines. Genotypic differences were, however, observed between SA1 and 13/SAZP1 using molecular marker analysis (Visser et al., 2016), as well as in this study. Newly introduced Pst populations may also carry avirulences not inspected in local differential sets, as seen in the differentiation in infection types between SA4 and 15/SAZP4. The most notable difference between infection by these two isolates was on Monterey, a winter wheat cultivar bred in the UK by the company Senova. It is not known which stripe rust resistance genes are present in this variety, but it shows moderate levels of resistance in the UK. For instance, it was listed by the UK Cereal Pathogen Virulence Survey (UKCPVS) as being “susceptible as an adult plant to one or more of the current stripe rust pathotypes” and scored 7.3 on the stripe rust resistance rating in 2014, where possible scores ranged from 1 to 9; 1 = highly susceptible, 9 = resistant (Hubbard et al., 2014). Monterey has the pedigree Istabraq x Robigus. Robigus is fully susceptible to all UK Pst pathotypes, whereas Istabraq has the pedigree Consort x Claire, which both provide some resistance to UK Pst pathotypes, although resistance in Claire has been eroded over the past few years. The aggressive Pst Kranich pathotype was first detected in the UK on Monterey (S Holdgate, personal communication). The wheat variety Kranich has the pedigree Heines-2167-50/Heines-VII//Merlin/Deu. Small pustules were observed on Kranich inoculated with 15/SAZP4, while the SA4 inoculation resulted in flecks only, with signs of chlorotic and necrotic tissue. Taken together, this may indicate that a source of stripe rust resistance from Heines VII, present in Kranich and Monterey, has become less effective towards Pst isolate 15/SAZP4.
The role that the host plays in shaping the characteristics of the pathogen has not been addressed in this study. Wheat breeding in South Africa has generally relied on selection for resistance in the field, and information about stripe rust resistance genes deployed in commercial wheat over the past 20 years is not obtainable, as reviewed by Pretorius et al. (2007). Only as recently as 2012 was marker-assisted selection (MAS) incorporated in breeding programmes, with the establishment of the Molecular marker Service Laboratory (MSL) for wheat breeding in South Africa (Prins and Agenbag, 2013). In the past, germplasm from the International Maize and Wheat Improvement Center (CIMMYT) has been the origin of valuable resistance complexes. The slow rusting complex Lr34/Yr18/Sr57, present in the South African spring wheat cultivar Kariega (Ramburan et al., 2004; Prins et al., 2011), was likely introduced from a CIMMYT source. The lack of structured molecular breeding efforts incorporating rust resistance in the past makes it difficult to track specific selection pressures on Pst imposed by host resistance. No connection could be made between Monterey and the South African germplasm.

The widely homogeneous nature of the Pst population could be due to the introduction of a relatively small amount of inoculum displaying the founder effect, where genetic variation is lost when a small number of individuals establish the population. Additionally or alternatively, a population bottleneck could have occurred, where a limited number of genotypes, able to withstand, for example, environmental conditions, survived from one wheat-growing season to the next. Environmental factors, either directly, or indirectly through effects on the host, can increase stress on the pathogen, acting as a force for adaptation. Severe droughts have been experienced in both main wheat growing regions in South Africa.
This could have contributed to the low occurrence of stripe rust in recent years. It is possible that these non-optimal conditions encountered by the Pst population, during and between wheat seasons, may have also contributed to a population bottleneck. The majority of the 2014 South African field samples differentiated from the 2015 field samples, indicating that the population evolved from one growing season to the next, or it could again indicate the influx of new alleles into the population. The low occurrence of stripe rust in South Africa could have led to a change in allele frequencies in the population, similar to the “chance events” described by Wellings (2007). Relatively low numbers of spores may survive during the non-crop seasons, possibly on alternative grass species (Boshoff et al., 2002; Pretorius et al., 2015), resulting in such allele frequency shifts.

Anthropogenic movement in and out of South Africa has drastically increased since the change in the country’s political system in 1994. Tourism and trade act as passages for pathogens to travel long distances much more quickly and frequently than through migration via animal vectors or storms (Anderson et al., 2004). The increase in the number of international arrivals indicated by the World Tourism Organisation (Figure 7.11; see https://data.worldbank.org/indicator/ST.INT.ARVL?contextual=default&end=2014&locations=ZA&start=1995&view=chart) demonstrates the increase in the potential for exotic incursions by pathogens via human movement. Anderson et al. (2004) considered this the major driver of emerging infectious diseases.

Figure 7.11: Number of international tourist arrivals in South Africa between 1995 and 2014 (x-axis: year; y-axis: millions of arrivals).

7.5 Conclusion

No new virulence profiles for stripe rust have been reported in routine surveys in South Africa since 2005. In addition, no exclusive correlation could be seen between the genotypic change observed in South African field isolates and their virulence profiles, as shown in the SA1 vs 13/SAZP1 comparison in this study. The 2015 field isolate 15/SAZP4 was found to be partially virulent on the UK winter wheat cultivar Monterey, but no change in virulence was seen with isolate 13/SAZP1 on the extended differential wheat set. The differentiation from SA1 that was observed in this study for 14/SAZP3, also carrying the 6E16A- pathotype, was also characterised using microsatellite markers (Visser et al., 2016). The microsatellite marker research supports the conclusion that a genetically diverse population carries the pathotype 6E16A-. No evidence could be found of common parentage between Monterey and South African wheat varieties that could account for selection for virulence in South Africa to the stripe rust resistance currently present in Monterey. Further investigation would be needed to identify which source of resistance in Monterey was challenged by 15/SAZP4.

The data discussed in this chapter provide evidence of a clear change in the South African Pst population between 2013 and 2015. It is likely that this is due to an exotic incursion of Pst from outside South Africa. The Pst population also showed an allele frequency change between 2014 and 2015. It is possible that a population bottleneck, due to unfavourable environmental conditions, was responsible for this shift. Further research is required to determine which scenario has contributed to the changes in the Pst population in South Africa, including a systematic collection of stripe rust infected wheat leaf samples throughout the growing season, and of wild grass between seasons.
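The founder-effect and bottleneck scenarios raised in this chapter can be illustrated with a toy simulation. The sketch below is only an illustration under strong assumptions that are not taken from this study: purely clonal reproduction, no mutation or selection, and equal regrowth of every surviving lineage. The function names, genotype labels and population sizes are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def bottleneck(population: List[str], survivors: int, rng: random.Random) -> List[str]:
    """Pass a clonal population through a bottleneck and regrow it.

    A random subset of `survivors` individuals is kept (sampling without
    replacement), after which the survivors are copied in turn until the
    population is back at its original size.
    """
    if not 0 < survivors <= len(population):
        raise ValueError("survivors must be between 1 and the population size")
    founders = rng.sample(population, survivors)
    return [founders[i % survivors] for i in range(len(population))]


def genotype_frequencies(population: List[str]) -> Dict[str, float]:
    """Relative frequency of each genotype label in the population."""
    counts = Counter(population)
    total = len(population)
    return {genotype: count / total for genotype, count in counts.items()}


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed so the run is repeatable
    # Hypothetical starting population: one common and two rarer genotypes.
    before = ["A"] * 70 + ["B"] * 20 + ["C"] * 10
    after = bottleneck(before, survivors=5, rng=rng)
    print("before:", genotype_frequencies(before))
    print("after: ", genotype_frequencies(after))
```

With only a handful of survivors, rare genotypes are easily lost and the frequencies of the remaining genotypes can shift markedly between seasons without any selection acting, which is the kind of chance-driven change discussed above.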
Chapter 8 General Discussion

This study set out to examine the genetic structure of the Pst population in South Africa, with specific focus on the genetic variation related to pathotype variation. Previous descriptions made use of traditional pathotyping and molecular marker technologies (Pretorius et al., 1997; Boshoff et al., 2002; Pretorius et al., 2007; Ali et al., 2014; Hovmøller et al., 2008; Visser et al., 2016). In this study, characterisation was undertaken using next-generation Illumina sequencing of Pst genomes and transcriptomes, and bioinformatics analyses, to extend our knowledge of the South African Pst population and its evolutionary dynamics. Specific interests included the origin of Pst introduced into South Africa, the relationship between the four pathotypes identified so far, identification of effector coding genes possibly responsible for distinct virulences, and genomic and pathological investigation of recent field Pst populations.

8.1 The historical South African Pst population

Phylogenetic and clustering analyses, supported by evaluation of the genetic diversity, reinforced previous findings indicating that the historical Pst population in South Africa, represented by the Pst isolates collected between 2001 and 2011, consisted of isolates closely related to each other despite their distinct differences in virulence. Data from this study support previous reports that the four pathotypes were derived from one another through stepwise evolution (Visser et al., 2016). Analysis of the relationship of the historical South African isolates with available foreign isolates indicated a possible origin from Kenya and Ethiopia, or a common progenitor from elsewhere. Significant diversity was observed in the East African isolates, which formed two distinct groups, one closely related to the South African isolates and one distant from all other isolates assessed in this study.
The East African isolates (Group A) that clustered with the historical South African isolates contained three isolates collected in the 1970s and 1980s and one isolate collected in 2010 (Figures 4.6 and 4.10). We therefore confirm associations based on pathotype analysis that the South African Pst incursion of 1996 had a high probability of originating from East Africa (Pretorius et al., 1997; Boshoff et al., 2002; Pretorius et al., 2007). These conclusions were supported by previous pathotype analyses that showed the presence of 6E16A- in East Africa (Badebo et al., 1990). However, similar pathotype designations may be shared between distinct isolates; for example, the Ethiopian wheat variety Et-13 A2 was resistant to 6E16A and 6E22A isolates from South Africa, but susceptible to 6E22 isolates from Germany (Hussein and Pretorius, 2005; Badebo et al., 2008; Denbel, 2014). Genetic evidence from microsatellite marker analysis indicated 48 % similarity between South African isolates and the Kenyan isolates KE 10/09 and KE 12/09 (Visser et al., 2016). Differences could be due to virulence for Yr9 and Yr27 that is frequently observed in East Africa but absent in South Africa (http://rusttracker.cimmyt.org). It was unfortunate that the present study did not include isolate data from additional locations south of Kenya, which would have enabled the tracking of the putative southward spread of Pst into South Africa. Earlier reports state that stripe rust is not a major problem in these regions (Stubbs, 1985); however, analysis of samples from Rwanda and Tanzania suggests that collections from more southern African countries could be included in on-going work to monitor gene flow (Ali et al., 2017).

Previous studies that included Pst isolates from Eritrea indicated Central and Western Asia, and the Mediterranean, as possible origins of South African isolates (Enjalbert et al., 2005; Hovmøller et al., 2008). Ali et al.
(2014) further reported that the South African isolates (collected between 1996 and 2004) grouped with the older, aggressive group known as PstS3, often seen in Southern Europe. There is agreement between these studies and the present study with regard to the South African isolates not showing close relationships with isolates from Eritrea; however, isolates from Ethiopia and Kenya were not included in these studies. It would be interesting to assess more South African isolates collected between 1996 and 2011, and also to compare the South African isolates to Pst isolates from other Eastern and Southern African countries, as well as Asian and Mediterranean isolates, using the field pathogenomics approach as the method of investigation. Such analyses would be subject to the availability of historical samples, but would enable inspection of the different hypotheses regarding the origin of South African Pst.

8.2 Candidate effector identification and evaluation

Nonsynonymous polymorphism analysis aided in identifying candidate genes possibly involved in virulence. The analysis relied on available effector gene annotations and made use of the initial gene models developed for the PST130 reference genome. It is widely argued that high throughput effector gene annotation protocols are difficult to develop for the rusts, as they do not exhibit many of the common features that are known to be characteristic of other, more thoroughly described pathogens (Dodds et al., 2009; Saunders et al., 2012). It is therefore accepted that any computational protocol, despite its best efforts, would likely misidentify some effector genes. New research findings and tools allow constant refinement of gene predictions, as was the case for the PST130 reference, where gene annotations have been improved since the start of this study. To evaluate candidate effector gene expression during Pst infection, RT-qPCR was used.
This methodology has been used in a number of published studies, but many of these lack detail on experimental procedures. It is often seen that best practices, as advised by developers and supporters of the technology, are not followed or not reported, misleading newcomers to the field. Greater efforts are needed to ensure that published work using RT-qPCR follows the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines (Huggett et al., 2013). In this study, the consistent expression patterns shown by the two South African isolates across all genes indicated a low level of technical variation between individual assays within a PCR plate. However, variation between plates hindered the formulation of confident conclusions from these experiments. In addition, evaluations of early time points were not informative using this method due to low concentrations of fungal transcripts. Continued efforts are needed to enable evaluation of gene expression from the moment of inoculation up to around two dpi to capture expression profiles of genes involved in the early processes of infection. Four candidate effector genes overlapped between this study and time course evaluations of two UK Pst isolates (Cantu et al., 2013). Future research should prioritise investigation of these four candidate genes. As a start, heterologous expression screens in Nicotiana benthamiana could be performed to add to the available information gained from this system about one of the four candidates, PST130_05023 (Petre et al., 2016b).

8.3 The recent South African Pst population

Surprisingly, analysis of RNA-Seq data of recent field isolates indicated an allele frequency shift in the South African Pst population. Previously this population was thought of as fairly stable because of the lack of detection of additional virulences between 2005 and 2015, when the last field isolates were sampled.
These field isolates showed a close relationship to UK Pst isolates collected on triticale (UK Group II; Hubbard et al., 2015; Bueno-Sancho et al., 2017). Whether or not these UK isolates were able to infect wheat is not known, as they were not successfully cultured and have been lost (S Holdgate, personal communication). Compared to the 2013–2015 South African isolates, field isolates collected in Kenya and Ethiopia in 2014 were more similar to the pre-2011 East African and South African isolates, as indicated by the phylogenetic analysis. This analysis used the third codon position of genes with 80 % breadth of coverage in 80 % of isolates. DAPC clustering analysis used sites where a polymorphism resulting in a synonymous substitution in at least one isolate was recorded. In this analysis, the 2014 East African isolates did not group with the pre-2011 East African and South African isolates, but with the recent South African and UK Group II isolates. Two groups showed high diversity, clustering away from the rest of the isolates considered in the DAPC analyses: Group 1, also described as East Africa (B) and indicated in blue in Figure 7.7(iii), and Group 4, indicated in orange and containing three 2014 isolates (two from Ethiopia and one from South Africa) included in the dataset in Chapter 7. This diversity could make it difficult for the software to separate more similar isolates into population clusters. The two results differ primarily in their indication of the closest relatives of the 2014 East African isolates. There is, however, consensus between the two analyses with regard to the recent South African isolates, which show closer similarity to the UK Group II isolates than to the historical South African isolates. Comparative re-evaluation of selected recent South African isolates against pre-2011 isolates on an extended wheat differential set confirmed previous findings in two isolates.
The 6E16A- pathotype was confirmed in isolates SA1 and 13/SAZP1 with nearly identical infection types. However, evaluation of SA4 and 15/SAZP4 revealed diverging infection types.

Disagreement exists between studies regarding similarities in the European and Ethiopian populations. Using virulence phenotyping together with AFLP, microsatellite and SCAR marker information, Ali et al. (2017) described a diverse Pst population with more than four pathotype groups in East Africa, collected between 2009 and 2015, that were distinct from the assessed European isolates. Among these East African isolates were samples from Ethiopia. In contrast, support for a close relationship between the UK Group II isolates and Ethiopian isolates from 2014 was reported where a number of Ethiopian isolates were assigned to this group, along with isolates collected in Europe in 2014 (Bueno-Sancho et al., 2017). The authors further revealed the assignment to this group of historical Pst isolates from New Zealand that were collected between 2006 and 2012.

Taken together, these data provide evidence that a new incursion may have occurred in South Africa, possibly between 2011 and 2013, and the commonalities with UK Group II Pst indicate the possible spread of this Pst group over vast distances. These findings should alert the research and agricultural community that the Pst population in South Africa could be more dynamic than is currently thought to be the case. However, similar infection types in historical and recent isolates tested on existing differentials gave rise to scepticism. Further investigation of East African and UK Group II Pst isolates is needed to support the current findings and track the global movement of this group. Sequencing of field isolates to monitor new incursions, complementary to virulence profiling of Pst across cropping seasons, would be beneficial to facilitate comprehensive surveys.
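The site selection described in this section for the phylogenetic analysis (third codon positions of genes with 80 % breadth of coverage in 80 % of isolates) can be sketched as follows. This is an illustration only and not the pipeline used in this study: both thresholds are read as "at least", the per-gene breadth values are assumed to have been computed beforehand, and the function names, gene labels and isolate labels are hypothetical.

```python
from typing import Dict, List


def genes_passing_breadth(
    breadth: Dict[str, Dict[str, float]],
    min_breadth: float = 0.8,
    min_isolate_fraction: float = 0.8,
) -> List[str]:
    """Select genes that are well covered in most isolates.

    `breadth` maps gene -> isolate -> fraction of the gene length covered
    by reads. A gene is kept when at least `min_isolate_fraction` of the
    isolates cover at least `min_breadth` of its length.
    """
    passing = []
    for gene, per_isolate in breadth.items():
        if not per_isolate:
            continue  # no coverage information for this gene
        well_covered = sum(1 for b in per_isolate.values() if b >= min_breadth)
        if well_covered / len(per_isolate) >= min_isolate_fraction:
            passing.append(gene)
    return sorted(passing)


def third_codon_positions(cds: str) -> str:
    """Return every third base of an in-frame coding sequence."""
    return cds[2::3]


if __name__ == "__main__":
    # Hypothetical breadth-of-coverage values for two genes in five isolates.
    breadth = {
        "gene_1": {"iso_1": 0.95, "iso_2": 0.90, "iso_3": 0.85, "iso_4": 0.80, "iso_5": 0.40},
        "gene_2": {"iso_1": 0.95, "iso_2": 0.90, "iso_3": 0.60, "iso_4": 0.50, "iso_5": 0.40},
    }
    print(genes_passing_breadth(breadth))
    print(third_codon_positions("ATGGCCTAA"))
```

Restricting the alignment to third codon positions favours sites where substitutions are more often synonymous, while the coverage filter avoids genes that are missing or poorly expressed in a large share of the isolates.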
The cost of implementing the field pathogenomics approach (Hubbard et al., 2015) is unfortunately a major limiting factor to deployment of this technology in routine pathotype surveys in South Africa.

8.4 Future work

Effective, long term rust resistance in wheat can be implemented by pyramiding resistance genes. Ideally, breeders should combine major (R gene type) and APR genes. This relaxes selection pressure on the pathogen population, which can normally rapidly overcome singly deployed R genes. Understanding the mechanisms of R genes and their corresponding Avr genes, as in the case of the recently published AvrSr35 (Salcedo et al., 2017) and AvrSr50 (Chen et al., 2017) studies, can help breeders to track high-risk pathotypes and tailor the deployment of resistance genes. Another approach would be to identify the target “susceptibility” genes of Pst effectors, such as the barley powdery mildew susceptibility gene Mlo. Mutations in Mlo created the recessive mlo allele that has provided broad-spectrum resistance against the fungus Blumeria graminis f. sp. hordei for many years (Büschges et al., 1997). Targeted mutation breeding of Pst effector target genes in wheat, using DNA-editing technologies such as CRISPR/Cas9 (Kim et al., 2018), could generate suites of mutant genes conferring resistance to Pst. Identifying the mechanisms, both in the host and the pathogen, that provide durable resistance is the aim of many future studies (Harris et al., 2015). Advances in research that enable understanding of how effectors function include protein interaction assays such as yeast-two-hybrid screens, gene expression knock-downs, for example using virus-mediated host-induced gene silencing, and heterologous expression of effector genes in easily transformed host plants such as N. benthamiana (Liu et al., 2016; Petre et al.,
2016b). Other delivery systems, such as the type III secretion system in bacteria, have also been proposed to deliver specific proteins into host cells (Ma et al., 2009; Upadhyaya et al., 2013). These technologies, the refinement of Pst gene annotations and the first available haplotype-phased Pst genome (Schwessinger et al., 2018) all provide promising resources to further assess wheat-Pst interactions in the search for long lasting resistance, to improve wheat yields and to reduce the evolutionary potential of rust pathogens by reducing inoculum.

8.5 Conclusion

In conclusion, although there remains a significant gap in our understanding of the genes that are responsible for the virulence gain in the historical South African population, this study showed that, contrary to conclusions from previous studies, novel genetic variation is indeed present in the recent South African population. For the first time, to our knowledge, the Pst populations of Ethiopia, Kenya and South Africa were linked using high-resolution genomic and transcriptomic data. This confirms earlier associations between pathotypes from eastern Africa and South Africa and verifies the risk of the introduction of more aggressive pathotypes into South Africa. Further characterisation of isolates that are associated with the UK Group II isolates, with specific focus on their pathogenicity, will aid in understanding the risks involved in long distance movement of Pst and ultimately help producers to decrease the incidence of disease and increase crop yields, which will in turn relieve the pressure on global food production to meet rising demands.
Appendix A The Origin of the South African Pst Pathotypes

[Figure A.1 panels: ER179b/11, ER181a/11, ET03b/10, ET08/10, ET87094, KE74217 and KE89069; axes: frequency (0.00 to 1.00) and count.]

Figure A.1: Read frequency graphs for East African isolates analysed in Chapter 4, that have not been similarly assessed in published studies (Cantu et al., 2013; Hubbard et al., 2015; Bueno-Sancho et al., 2017). See Table 4.1 for further identification purposes.

Appendix B Analyses of Polymorphisms in Historical South African Pst Isolates in Search of Candidate Effector Genes

B.1 Genes present in the PST130 reference genome but absent in the four historical South African Pst isolates

Table B.1: PST130 genes (211) that were absent in all four historical South African isolates

PST130_00014 PST130_03142 PST130_08220 PST130_14020
PST130_00053 PST130_03351 PST130_08341 PST130_14034
PST130_00147 PST130_03414 PST130_08456 PST130_14069
PST130_00148 PST130_03415 PST130_08466 PST130_14429
PST130_00159 PST130_03429 PST130_08469 PST130_14430
PST130_00173 PST130_03543 PST130_08470 PST130_14605
PST130_00227 PST130_03607 PST130_08628 PST130_14606
PST130_00246 PST130_03762 PST130_08645 PST130_14653
PST130_00348 PST130_03775 PST130_08669 PST130_14781
PST130_00404 PST130_03798 PST130_08880 PST130_14925
PST130_00445 PST130_03847 PST130_08891 PST130_14963
PST130_00483 PST130_04103 PST130_09448 PST130_14964
PST130_00611 PST130_04396 PST130_10110 PST130_15027
PST130_00612 PST130_04591 PST130_10111 PST130_15648
PST130_00656 PST130_04612 PST130_10209 PST130_15841
PST130_00812 PST130_04613 PST130_10271 PST130_16094
PST130_00848 PST130_05005 PST130_11019 PST130_16216
PST130_00945 PST130_05050 PST130_11064 PST130_16356
PST130_00950 PST130_05150 PST130_11200 PST130_16357
PST130_00989 PST130_05183 PST130_11219 PST130_16435
PST130_01030 PST130_05199 PST130_11289 PST130_16508
PST130_01031 PST130_05303 PST130_11403 PST130_16509
PST130_01079 PST130_05357 PST130_11404 PST130_16568
PST130_01080 PST130_05569 PST130_11537 PST130_16737
PST130_01081 PST130_05640 PST130_11550 PST130_16763
PST130_01082 PST130_05683 PST130_11607 PST130_16764
PST130_01107 PST130_05804 PST130_11862 PST130_16830
PST130_01143 PST130_06069 PST130_11902 PST130_16914
PST130_01368 PST130_06079 PST130_11946 PST130_16963
PST130_01388 PST130_06120 PST130_11947 PST130_17078
PST130_01690 PST130_06121 PST130_11948 PST130_17111
PST130_01696 PST130_06122 PST130_12027 PST130_17182
PST130_01697 PST130_06123 PST130_12084 PST130_17218
PST130_01825 PST130_06147 PST130_12310 PST130_17238
PST130_01826 PST130_06262 PST130_12311 PST130_17253
PST130_01847 PST130_06356 PST130_12346 PST130_17316
PST130_01859 PST130_06479 PST130_12435 PST130_17354
PST130_01946 PST130_06533 PST130_12436 PST130_17435
PST130_02005 PST130_06608 PST130_12446 PST130_17515
PST130_02139 PST130_06609 PST130_12481 PST130_17560
PST130_02140 PST130_06687 PST130_12509 PST130_17599
PST130_02142 PST130_06741 PST130_12825 PST130_17620
PST130_02153 PST130_06775 PST130_12971 PST130_17812
PST130_02289 PST130_07080 PST130_12992 PST130_17815
PST130_02406 PST130_07081 PST130_13083 PST130_17898
PST130_02413 PST130_07180 PST130_13431 PST130_17956
PST130_02482 PST130_07220 PST130_13432 PST130_17990
PST130_02770 PST130_07285 PST130_13436 PST130_17991
PST130_02826 PST130_07330 PST130_13455 PST130_17992
PST130_03059 PST130_07486 PST130_13530 PST130_18018
PST130_03060 PST130_07943 PST130_13926 PST130_18083
PST130_03094 PST130_07959 PST130_13932 PST130_18108
PST130_03099 PST130_08034 PST130_13936

B.2 Annotations of genes homologous to identified PST130 genes

PST130_00159
Accession:
gi|403167846|ref|XM_003327549.2|
Homolog: Pgt isoleucyl-tRNA synthetase (PGTG_09131)
UniProtKB/TrEMBL ID: E3KG82
Protein name: Isoleucyl-tRNA synthetase
Associated function or cellular location: Enzyme involved in protein biosynthesis during translation. Present in cytoplasm.
GO terms: GO:0002161 (aminoacyl-tRNA editing activity), GO:0005524 (ATP binding), GO:0004822 (isoleucine-tRNA ligase activity), GO:0000049 (tRNA binding), GO:0006428 (isoleucyl-tRNA aminoacylation)
Conserved domains: PLN02882: aminoacyl-tRNA ligase; cd07961: Anticodon-binding domain of archaeal, bacterial, and eukaryotic cytoplasmic isoleucyl tRNA synthetases; cd00818: catalytic core domain of isoleucyl-tRNA synthetases

PST130_07080
Accession: gi|403159121|ref|XM_003319730.2|
Homolog: Pgt hypothetical protein (PGTG_01952)
UniProtKB/TrEMBL ID: E3JT80
Protein name: Uncharacterised protein
Associated function or cellular location: Helicases are ATPase enzymes that catalyse the unwinding of double-stranded nucleic acids. Involved in processes such as DNA replication, recombination, and nucleotide excision repair, as well as RNA transcription and splicing.
GO terms: GO:0009055 (electron transfer activity), GO:0016491 (oxidoreductase activity), GO:0035091 (phosphatidylinositol binding), GO:0009061 (anaerobic respiration), GO:0022900 (electron transport chain)
Conserved domains: cd06869: The PX domain is a phosphoinositide (PI) binding module involved in targeting proteins to PI-enriched membranes.
Diverse functions such as cell signalling, vesicular trafficking, protein sorting, lipid modification, cell polarity and division, activation of T and B cells, and cell survival.; pfam12825: Domain of unknown function in PX-proteins.; pfam12828: PX-associated

PST130_16763
Accession: gi|403160602|ref|XM_003321038.2|
Homolog: Pgt hypothetical protein (PGTG_02128)
UniProtKB/TrEMBL ID: E3JX92
Protein name: Uncharacterised protein
Associated function or cellular location: Location associated with P-body and nucleolus. Cytoplasmic stress granule.
GO terms: GO:0005524 (ATP binding), GO:0004004 (ATP-dependent RNA helicase activity), GO:0003676 (nucleic acid binding), GO:0033962 (cytoplasmic mRNA processing body assembly), GO:0006417 (regulation of translation), GO:0010501 (RNA secondary structure unwinding)
Conserved domains: COG0513: Superfamily II DNA and RNA helicase [Replication, recombination and repair]; cd00079: Helicase superfamily c-terminal domain; cl21455: P-loop containing Nucleoside Triphosphate Hydrolases. Involved in diverse cellular functions

PST130_17182
Accession: gi|403160450|ref|XM_003320901.2|
Homolog: Pgt hypothetical protein (PGTG_02971)
UniProtKB/TrEMBL ID: E3JWV5
Protein name: Uncharacterised protein
Associated function or cellular location: Location associated with cytoplasm, endoplasmic reticulum, membrane component.
GO terms: GO:0016491 (oxidoreductase activity); GO:0016627 (oxidoreductase activity, acting on the CH-CH group of donors); GO:0042761 (very long-chain fatty acid biosynthetic process)
Conserved domains: PLN02560: enoyl-CoA reductase; cl00155: Ubiquitin homologs. Ubiquitin-mediated proteolysis is part of the regulated turnover of proteins required for controlling cell cycle progression.; cl21511: The Saccharomyces cerevisiae Meyen ex EC Hansen phospholipid methyltransferase (EC:2.1.1.16) has a broad substrate specificity of unsaturated phospholipids.
PST130_17354 (A)
Accession: gi|403161086|ref|XM_003890392.1|
Homolog: Pgt hypothetical protein (PGTG_20899)
UniProtKB/TrEMBL ID: H6QPU7
Protein name: Glycogen [starch] synthase
Associated function or cellular location: Enzyme that catalyses the transfer of glycosyl (sugar) residues to an acceptor, both during degradation (cosubstrates = water or inorganic phosphate) and during biosynthesis of polysaccharides, glycoproteins and glycolipids.
GO terms: GO:0004373 (glycogen (starch) synthase activity); GO:0005978 (glycogen biosynthetic process)
Conserved domains: cl10013: Glycosyltransferases catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds.

PST130_17354 (B)
Accession: gi|403166809|ref|XM_003326625.2|
Homolog: Pgt glycogen [starch] synthase (PGTG_07651)
UniProtKB/TrEMBL ID: E3KCW8
Protein name: Glycogen [starch] synthase
Associated function or cellular location: not given
GO terms: GO:0004373 (glycogen (starch) synthase activity); GO:0005978 (glycogen biosynthetic process)
Conserved domains: cd03793: Glycogen synthase, catalyses the transfer of a glucose molecule from UDP-glucose to a terminal branch of a glycogen molecule, a rate-limiting step of glycogen biosynthesis.; pfam05693: Glycogen synthase. It is the rate limiting enzyme in the synthesis of the polysaccharide, and its activity is highly regulated through phosphorylation at multiple sites and also by allosteric effectors, mainly glucose 6-phosphate (G6P).
PST130_17620
Accession: gi|403174779|ref|XM_003333656.2|
Homolog Pgt: hypothetical protein (PGTG_15464)
UniProtKB/TrEMBL ID: E3KYK8
Protein name: Uncharacterised protein
Associated Function or Cellular location: —
GO terms: GO:0003824 (Catalysis of a biochemical reaction at physiological temperatures.); GO:0009058 (The chemical reactions and pathways resulting in the formation of substances; typically the energy-requiring part of metabolism in which simpler substances are transformed into more complex ones.)
Conserved domains: cd00609: Aspartate aminotransferase family. This family belongs to the pyridoxal phosphate (PLP)-dependent aspartate aminotransferase superfamily (fold I). Pyridoxal phosphate combines with an alpha-amino acid to form a compound called a Schiff base or aldimine intermediate, which, depending on the reaction, is the substrate in four kinds of reactions: (1) transamination (movement of amino groups), (2) racemisation (redistribution of enantiomers), (3) decarboxylation (removing COOH groups), and (4) various side-chain reactions depending on the enzyme involved.; COG0436: Amino acid transport and metabolism. Linked to 3D-structure.

PST130_17815
Accession: gi|403157775|ref|XM_003307127.2|
Homolog Pgt: 1,3-beta-glucan synthase component FKS1 (PGTG_00125)
UniProtKB/TrEMBL ID: E3JR07
Protein name: 1,3-beta-glucan synthase component FKS1
Associated Function or Cellular location: Component of the plasma membrane.
GO terms: GO:0003843 (1,3-beta-D-glucan synthase activity); GO:0006075 ((1->3)-beta-D-glucan biosynthetic process).
Conserved domains: pfam02364: 1,3-beta-glucan synthase component. 1,3-beta-glucan synthase (EC:2.4.1.34), also known as callose synthase, catalyses the formation of a beta-1,3-glucan polymer that is a major component of the fungal cell wall.

PST130_00758
Accession: gi|403160953|ref|XM_003321311.2|
Homolog Pgt: hypothetical protein (PGTG_02401)
UniProtKB/TrEMBL ID: E3JY15
Protein name: Uncharacterised protein
Associated Function or Cellular location: P-body: A focus in the cytoplasm where mRNAs may become inactivated by decapping or some other mechanism. Protein and RNA localized to these foci are involved in mRNA degradation, nonsense-mediated mRNA decay (NMD), translational repression, and RNA-mediated gene silencing.
GO terms: GO:0003729 (mRNA binding); GO:0030371 (translation repressor activity - Antagonises ribosome-mediated translation of mRNA into a polypeptide); GO:0017148 (negative regulation of translation); GO:0000289 (nuclear-transcribed mRNA poly(A) tail shortening).
Conserved domains: smart00454. Sterile alpha motif. Widespread domain in signalling and nuclear proteins.; cl15755. SAM (Sterile alpha motif) is a module consisting of approximately 70 amino acids. This domain is found in the Fungi/Metazoa group and in a restricted number of bacteria. SAM domains have diverse functions and locations. They can interact with proteins, RNAs and membrane lipids, contain sites of phosphorylation and/or kinase docking sites, and play a role in protein homo- and hetero-dimerisation/oligomerisation in processes ranging from signal transduction to regulation of transcription. Mutations in SAM domains have been linked to several diseases.

PST130_08345
Accession: gi|403162070|ref|XM_003322301.2|
Homolog Pgt: hypothetical protein (PGTG_03886)
UniProtKB/TrEMBL ID: E3K0V5
Protein name: Aconitate hydratase, mitochondrial
Associated Function or Cellular location: Associated with the mitochondrion. Protein which binds at least one iron atom, or protein whose function is iron-dependent. Involved in metabolic processes that result in cell growth.
GO terms: GO:0051539 (4 iron, 4 sulfur cluster binding); GO:0003994 (aconitate hydratase activity); GO:0046872 (metal ion binding); GO:0032543 (mitochondrial translation); GO:0006099 (tricarboxylic acid cycle).
Conserved domains: TIGR01340: aconitate hydratase, mitochondrial. [Energy metabolism, TCA cycle]; cl00215. Aconitase swivel domain. Aconitase (aconitate hydratase) catalyses the reversible isomerisation of citrate and isocitrate as part of the TCA cycle. cl00285. Aconitase catalytic domain. Both cl00215 and cl00285 are present in enzymes involved in biosynthesis of leucine.

PST130_12299
Accession: gi|403173188|ref|XM_003332239.2|
Homolog Pgt: hypothetical protein (PGTG_14583)
UniProtKB/TrEMBL ID: E3KU93
Protein name: Uncharacterised protein
Associated Function or Cellular location: Associated with the cytosol, nucleus and membrane.
GO terms: GO:0003723 (RNA binding); GO:0043130 (binding ubiquitin, involved in proteolytic degradation); GO:0031081 (nuclear pore distribution); GO:0016973 & GO:0006606 (poly(A)+ mRNA export / protein import from nucleus into the cytoplasm / vice versa); GO:0000972 & GO:0000973 (transcriptional & posttranscriptional tethering of RNA polymerase II gene DNA at nuclear periphery); GO:2000728 (regulates mRNA export from nucleus in response to heat stress); GO:0006405 (RNA export from nucleus to the cytoplasm).
Conserved domains: COG2319: WD40 repeat [General function prediction only]; sd00039: WD40 repeats in seven bladed beta propellers. The WD40 repeat is found in a number of eukaryotic proteins that cover a wide variety of functions including adaptor/regulatory modules in signal transduction, pre-mRNA processing, and cytoskeleton assembly; cl02567: WD40 Superfamily.
B.3 Nonsynonymous polymorphisms in candidate genes

[Figure B.1 image: alignment of the translated sequences from isolates SA1, SA2, SA3 and SA4; alignment positions 1 to 149.]

Figure B.1: Translated sequence alignment of gene PST130_02001. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the lower diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.
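As an illustration of how a nonsynonymous SNP at a biallelic site gives rise to two alternative amino acids at a single alignment position, the following is a minimal sketch and not the pipeline used in this study. The function names `alternative_amino_acids` and `is_nonsynonymous`, and the example codon and SNP, are hypothetical and chosen only for illustration; the sketch assumes the standard genetic code and a SNP given as a zero-based offset within a codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def alternative_amino_acids(codon, offset, ref_base, alt_base):
    """Translate both alleles of a codon carrying a biallelic SNP.

    codon    -- reference codon (three bases, DNA alphabet)
    offset   -- zero-based position of the SNP within the codon (0, 1 or 2)
    ref_base -- reference allele, must match codon[offset]
    alt_base -- alternative allele
    Returns a tuple (reference amino acid, alternative amino acid).
    """
    codon = codon.upper()
    if codon[offset] != ref_base.upper():
        raise ValueError("reference base does not match the codon")
    alt_codon = codon[:offset] + alt_base.upper() + codon[offset + 1:]
    return CODON_TABLE[codon], CODON_TABLE[alt_codon]


def is_nonsynonymous(codon, offset, ref_base, alt_base):
    """Return True if the two alleles encode different amino acids."""
    ref_aa, alt_aa = alternative_amino_acids(codon, offset, ref_base, alt_base)
    return ref_aa != alt_aa


if __name__ == "__main__":
    # Hypothetical example: T>C at the first codon position of TTT.
    print(alternative_amino_acids("TTT", 0, "T", "C"))
    # Hypothetical example: T>C at the third codon position of TTT.
    print(is_nonsynonymous("TTT", 2, "T", "C"))
```

In this sketch a site heterozygous for both alleles would be reported with both amino acids, which corresponds to the two residues drawn at a single position in the alignment figures; a synonymous change returns the same amino acid twice and would not be marked.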
CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 194 SA1 M L F S V L A V FL M M V Q G R S V I G A G F Q C L P D P A R A Q A L C S R P P T A P Q D H T SA2 M L F S V L A V L M M V Q G R S V I G A G F Q C LP D P A R A Q A L C S R P P T A P Q D H T 45 SA3 M L F S V L A V FL M M V Q G R S V I G A G F Q C L P D P A R A Q A L C S R P P T A P Q D H T SA4 M L F S V L A V FL M M V Q G R S V I G A G F Q C L P D P A R A Q A L C S R P P T A P Q D H T V T I V K P Y R I G D D Y F C P P R L D A E I P V C C K T D M Y M R Y M A S G W K T I L P V T I V K P Y R I G D D Y F C P P R L D A E IT P V C C K T D M Y M R Y M A S G W K T I L P46 90 V T I V K P Y R I G D D Y F C P P R L D A E IT P V C C K T D M Y M R Y M A S G W K T I L P V T I V K P Y R I G D D Y F C P P R L D A E IT P V C C K T D M Y M R Y M A S G W K T I L P N D T Y S A A C F P P V H L P D P P K V D L T D A L R Y Y P A G D G I N L H V D T K T G G N D T Y S A A C F P P V H L P D P P K V D L T D A L R Y Y P A G D G I N L H V D T K T G G 91 135 N D T Y S A A C F P P V H L P D P P K V D L T D A L R Y Y P A G D G I N L H V D T K T G G N D T Y S A A C F P P V H L P D P P K V D L T D A L R Y Y P A G D G I N L H V D T K T G G S F N C P V K T C K S S Y G G I G C T H D D I P G L G K A N Q T C S H L F G A K G A T Q I S F N C P V K T C K S S Y G G I G C T H D D I P G L G K A N Q T C S H L F G A K G A T Q I 136 180 S F N C P V K T C K S S Y G G I G C T H D D I P G L G K A N Q T C S H L F G A K G A T Q I S F N C P V K T C K S S Y G G I G C T H D D I P G L G K A N Q T C S H L F G A K G A T Q I C C T F T D A * C C T F T D A * 181 188 C C T F T D A * C C T F T D A * Figure B.2: Translated sequence alignment of gene PST130_02118. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. 
Alternative amino acids resulting from nonsyn- onymous SNPs at biallelic sites are indicated in the below diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 195 SA1 M L K L T H V I L A C V L V L E A Y A L H I DG S G H S K R D I Y S E P K D H Y G G S H D Y T SA2 M L K L T H V I L A C V L V L E A Y A L H I G S G H S K R D I Y S E P K D H Y G S H D Y T 45 SA3 M L K L T H V I L A C V L V L E A Y A L H I G S G H S K R D I Y S E P K D H Y G S H D Y T SA4 M L K L T H V I L A C V L V L E A Y A L H I DG S G H S K R D I Y S E P K D H Y G G S H D Y T P S Y K P E P Q K K P E P S K Y Y P E P P K K P E P F K Y Y P V P P K E P E P F K H Y P E P S Y K P E P Q K K P E P S K Y Y P E P P K K P E P F K Y Y P V P P K E P E P F K H Y P E P 46 90 S Y K P E P Q K K P E P S K Y Y P E P P K K P E P F K Y Y P V P P K E P E P F K H Y P E P P S Y K P E P Q K K P E P S K Y Y P E P P K K P E P F K Y Y P V P P K E P E P F K H Y P E P P K K P E P F K Y Y P EV P P K K P E P F K H Y Y P E P P K K P E P F K Y Y P T P P K K P D P P K K P E P F K Y Y P V P P K K P E P F K H Y P E P P K K P E P F K Y Y P T P P K K P D P 91 135 P K K P E P F K Y Y P EV P P K K P E P F K H Y Y P E P P K K P E P F S K Y Y P T P P K K P D P P K K P E P F K Y Y P EV P P K K P E P F K H Y P E P P K K P E P F K Y Y P T P P K K P D P S K Y Y P E P P P K P D P S K Y F P T P P Q E K P E T P K Y Y P E P P K Y K P E E P K Y A S K Y Y P E P P P K P D P S K Y F P T P P Q E K P E T P K Y Y P E P P K Y K P E E P K Y A 136 180 S K Y Y P E P P P K P D P S K Y F P T P P Q E K P E T P K Y Y P E P P K Y K P E E P K Y A S K Y Y P E P P P K P D P S K Y F P T P P Q E K P E T P K Y Y P E P P K Y K P E E P K Y A S P K Y D AP P Y E K T P D E E P K Y S A P S Y D Y N P P K K D G Y R H * S P K Y D AP P Y E K T P D E E P K Y S A P S Y D Y N P P K K D G Y R H *181 216 S P K Y D AP P Y E K T P D E E P 
K Y S A P S Y D Y N P P K K D G Y R H * S P K Y D AP P Y E K T P D E E P K Y S A P S Y D Y N P P K K D G Y R H * Figure B.3: Translated sequence alignment of gene PST130_02403. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. Alternative amino acids resulting from nonsyn- onymous SNPs at biallelic sites are indicated in the below diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 196 SA1 M N I Q L F P I M I F L L G H P S L I F G R P T E G K A V T Q E F G K L H V D C P G T E H SA2 M N V Q L F P I M I V L L G H P S L I F G R P T E G K A V T Q E F G K L H V D C P G T E H 45 SA3 M N I Q L F P I M I F L L G H P S L I F G R P T E G K A V T Q E F G K L H V D C P G T E H SA4 M N I Q L F P I M I F L L G H P S L I F G R P T E G K A V T Q E F G K L H V D C P G T E H V E H V K N P F A E E D K H A S V I S D N S K N I S G S R H S S S P E S I P E E E K P F L V E H V K N P F A E E D K H A S V I S D N S K N I S G S R H S S S P E S I P E E E K P F L 46 90 V E H V K N P F A E E D K H A S V I S D N S K N I S G S R H S S S P E S I P E E E K P L L V E H V K N P F A E E D K H A S V I S D N S K N I S G S R H S S S P E S I P E E E K P L L D R S Q S D R G S S K P S G P A P D Q P K Q G E D G K G R K M A E L Y A R F K K S L S T W D R S Q S D R G S S K P S G P A P D Q P K Q G E D G K G R K M A E L Y A R F K K S L S T W 91 135 D R S Q S D R G S S K P S G P A P D Q P K Q G E D G K G R K M A E L Y A R F K K S L S T W D R S Q S D R G S S K P S G P A P D Q P K Q G E D G K G R K M A E L Y A R F K K S L S T W Y G G H S A V A R F L R R M V N Y F H P R K M S K S K E A K E A K E A E D A K K V E D A K Y G G H S A V A R F L R R L V N Y F H P R K M S K S K E A K E A K E A E D A K K V EK D A K136 180 Y G G 
H S A V A R F L R R L V N Y F H P R K M S K S K E A K E A K E A E D A K K V E D A K Y G G H S A V A R F L R R L V N Y F H P R K M S K S K E A K E A K E A E DK E A K E A E A K V K D V K K V K D V K K V G D V K K A E E A T K A E D A E K A Q E A K K A Q E T T G A V R V E A S M K V K D V K K V G D V K K A E E A T K A E D A E K A Q E A K K A Q E T T G A V R V E A S M 181 225 K V K D V K K V G D V K K A E E A T K A E D A E K A Q E A K K A Q E T T G A V R V E A S M K A EV K D V K K V E D V K K A E E A T K A E D A E K A Q E A K K A Q E T T G A V R V E A S M P E L S V T E E K A A T A V K P E S P S A T S P S T G T V P A S S N F V K P G L F A T D E P E L S V T E E K A A T A V K P E S P S A T S P S T G T V P A S S N F V K P G L F A T D E 226 270 P E L S V T E E K A A T A A K P E S P S A T S P S T G T V P A S S N F V K P G L F A T D E P E L S V T E E K A A T A A K P E S P S A T S P S A G T V P A S S N F V K P G L F A T D E S Q P R P Q T I W I A * S Q P R P Q T I W I A * 271 282 S Q P R P Q T I W I A * S Q P R P Q T I W I A * Figure B.4: Translated sequence alignment of gene PST130_05023. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. Alternative amino acids resulting from nonsyn- onymous SNPs at biallelic sites are indicated in the below diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. 
CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 197 SA1 M R G L Q I C K I V F G I L V S F H H S I A A D A P P S V G I P S S V S P C G A V P L E I SA2 M R G L Q I C K I V F G I L V S F H H S I A A D A P P S V G I P S S V S P C G A V P L E I 45 SA3 M R G L Q I C K I V F G I L V S F H H S I A A D A P P S V G I P S S V S P C G A V P L E I SA4 M R G L Q I C K I V F G I L V S F H H S I A A D A P P S V G I P S S V S P C G A V P L E I T G G T P P Y S I A I N AT A D N P S G P P L H T F A D V K Q P S S L A W P S G M S T G M V T G G T P P Y S I A I N A A D N P S G P P L H T F A D V K Q P S S L A W P S G M S T G M V 46 90 T G G T P P Y S I A I N AT A D N P S G P P L H T F A D V K Q P S S L A W P S G M S T G M V T G G T P P Y S I A I N AT A D N P S G P P L H T F A D V K Q P S S L A W P S G M S T G M V L T M E V K D S K G L T T T S G Q S T V I P S A D C P Q S P G A G A T K N T T D I A T T G L T M E V K D S K G L T T T S G Q S T V I P S A D C P Q S P G A G A T K N T T D I A T T G 91 135 L T M E V K D S K G L T T T S G Q S T V I P S A D C P Q S P G A G A T K N T T D I A T T G L T M E V K D S K G L T T T S G Q S T V I P S A D C P Q S P G A G A T K N T T D I A T T G P PS G G D G A S A K N W T Q G M P A L S S D N K T A G G P T P P A S A N S T D P A H P A N A F V P PS G G D G S A K N W T Q G M P A L S S N K T A G G P T P P A S A N S T D P A H P A N A V136 180 P P G G D G AS A K N W T Q G M P A L S S D N K T A G G P T P P A S A N S T D P A H P A N A F V P P G G D G AS A K N W T Q G M P A L S S D N K T A G G P T P P A S A N S T D P A H P A N A F V S T T A N A T G A V R L D S A D S N N A S M P D S A N AV T A T A D Q H G V M N M T D S T P S T T A N A T G A V R L D S A D S NS N A S M P D S A N A T A T A D Q H G V M N M T D S T P181 225 S T T A N A T G A V R L D S A D S NS N A S M P D S A N A V T A T A D Q H G V M N M T D S T P S T T A N A T G A V R L D S A D S NS N A S M P D S A N A V T A T A D Q H G V M N M T D S T P M S P S T A R AT T N M P P S N K T V N H S N N D N S K S G N N T S S S E K P G 
K I G G V * M S P S T A R A T N M P P S N K T V N H N D N S K S G N N T S S S EK P G K I G G V *226 267 M S P S T A R AT T N M P P S N K T V N H E S N N D N S K S G N N T S S S K P G K I G G V * M S P S T A R AT T N M P P S N K T V N H E S N N D N S K S G N N T S S S K P G K I G G V * Figure B.5: Translated sequence alignment of gene PST130_05454. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. Alternative amino acids resulting from nonsyn- onymous SNPs at biallelic sites are indicated in the below diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 198 SA1 M T R L I I I L G L V A R L L A P K V F G A G L P D E N L A K L P A D L H I I K A D E S G SA2 M T R L I I I L G L V A R L L A P K V F G A G L P D E N L A K L P A D L H I I K A D E S G 45 SA3 M T R L I I I L G L V A R L L A P K V F G A G L P D E N L A K L P A D L H I I K A D E S G SA4 M T R L I I I L G L V A R L L A P K V F G A G L P D E N L A K L P A D FL H I I K A D E S G S P Y V D P V T N V K F R D I P N K L D K E I T I H D G K E P W I I E P R Q N V R L D Y D S P Y V D P V T N V K F R D I P N K L D K E I T I H D G K E P W I I E P R Q N V R L D Y D 46 90 S P Y V D P V T N V K F R D I P N K L D K E I T I H D G K E P W I I E P R Q N V R L D Y D S P Y V D P V T N V K F R D I P N K L D K E I T I H DQ N G K Q E P W I I E P R Q N V R L D Y D P N Y P Y L L I T D N E R V L L N K D F Y N R H V T T T A I E R L K E E A A E R P P A S D P N Y P Y L L I T D N E R V L L N K D F Y N R H V T T T A I E R L K E E A A E R P P A S D 91 H F N F D 135P N Y P Y L L I T D N E R V L L T K D S Y N R H V T T T A I E R L K E E A A E R P P A S D P N Y P Y L L I T D N E R V L L N K D F Y N R H V T T T A I E R L K E E A A E R P P A S D P E G 
P T G T S N S Q H E E W Y E N L A P N P V L G T G R T A D K Q L P T D K G E S Q K E P E G P T G T S N S Q H E E W Y E N L A P N P V L G T G R T A D K Q L P T D K G E S Q K E 136 180 P E G P T G T S N S Q H E E W Y E N L A P N P V L G T G R T A D K Q L P T D K G E S Q K E P E G P T G T S N S Q H E E W Y E N L A P N P V L G T G R T A D K Q L P T D K G E S Q K E Q F I E S S R D Q A E L P D S T T G S S G E K R P T D A P M E E I Q D G S N S R P V E P R Q F I E S S R D Q A E L P D S T T G S S G E K R P T D A P M E E I Q D G S N S R P V E P R 181 225 Q F I E S S R D Q A E L P D S T T G S S G E K R P T D A P M E E I Q D G S N S R P V E P R Q F I E S S R D Q A E L P D S T T G S S G E K R P T D A P M E E I Q D G S N S R P V E P R V P D L P I R R D F L T G R L A G Q K K P K Q K K L R I R L P T E V P L L R E P D F S Q H V P D L P I R R D F L T G R L A G Q K K P K Q K K L R I R L P T E V P L L R E P D F S Q H 226 270 V P D L P I R R D F L T G R L A G Q K K P K Q K K L R I R L P T E V P L L R E P D F S Q H V P D L P I R R D F L T G R L A G Q K K P K Q K K L R I R L P T E V P L L R E P D F S Q H F L Q L V N G Q K C T E A V K L L D P S T Q K D Y F K L V T Y I Y D A Q T G R W V H Q P N F L Q L V N G Q K C T E A V K L L D P S T Q K D Y F K L V T Y I Y D A Q T G R W V H Q P N 271 315 F L Q L V N G Q K C T E A V K L L D P S T Q K D Y F K L V T Y I Y D A Q T G R W V H Q P N F L Q L V N G Q K C T E A V K L L D P S T Q K D Y F K L V T Y I Y D A Q T G R W V H Q P N V P A * V P A * 316 319 V P A * V P A * Figure B.6: Translated sequence alignment of gene PST130_05944. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. Alternative amino acids resulting from nonsyn- onymous SNPs at biallelic sites are indicated in the below diagonal triangles. 
Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 199 SA1 M Q S S L I V S I L I V C S G V I A L P T S N Q A Q I E T R A E K T R S S D K Y A S S E Y SA2 M Q S S L I V S I L I V C S G V I A L P T S N Q A Q I E T R A E K T R S S D K Y A S S E Y 45 SA3 M Q S S L I V S I L I V C S G V I A L P T S N Q A Q I E T R A E K T R S S D K Y A S S E Y SA4 M Q S S L I V S I L I V C S G V I A L P T S N Q A Q I E T R A E K T R S S D K Y A S S E Y N E S D T Y A S A P N S A P S V I P V G F P S I P L P Q V S G S S P Q S G S Y F G G K G G N E S D T Y A S A P N S A P S V I P V G F P S I P L P Q V S G S S P Q S G S Y F G G K G G 46 90 N E S D T Y A S A P N S A P S V I P V G F P S I P L P Q V S G S S P Q S G S Y F G G K G G N E S D T Y A S A P N S A P S V I P V G F P S I P L P Q V S G S S P Q S G S Y F G G K G G R I S S A F P G F V G G F G G K I S G K A G G K M D A G M G G K I A A G G S G G L N A A G R I S S A F P G F V G G F G G K I S G K A G G K M D A G M G G K I A A G G S G G L N A A G 91 135 R I S S A F P G F V G G F G G K I S G K A G G K M D A G M G G K I A A G G S G G L N A A G R I S S A F P G F V G G F G G K I S G K A G G K M D A G M G G K I A A G G S G G L N A A G S V G G Q V A G G V Q A G I G A A G S I A G Q AV A G G A Q S V G G Q V A G G V Q A G I G A A G S I A G Q A A G G A Q 136 P 164 S V G G Q V A G G A Q A G I AV G A A G S I A G Q A A G G A Q S V G G Q V A G G V Q A G I G A A G S I A G Q A A G G A Q Figure B.7: Translated sequence alignment of gene PST130_06503. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. Alternative amino acids resulting from nonsyn- onymous SNPs at biallelic sites are indicated in the below diagonal triangles. 
Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 200 SA1 M T K N A I S L S V F L L S C V P K S Q Q T F G F F S T V L S S N G G D P N A S Y Y A G G SA2 M T K N A I S L S V F L L S C V P K S Q Q T F G F F S T V L S S N G G D P N A S Y Y A G G 45 SA3 M T K N A I S L S V F L L S C V P K S Q Q T F G F F S T V L S S N G G D P N A S Y Y A G G SA4 M T K N A I S L S V F L L S C V P K S Q Q T F G F F S T V L S S N G G D P N A S Y Y A G G K V R Q V L A A S Q P G A K G G G Q A D A G A V V P P V K C A C E N G G P P G P S G S S D K V R Q V L A A S Q P G A K G G G Q A D A G A V V P P V K C A C E N G G P P G P S G S S D 46 90 K V R Q V L A A S Q P G A K G G G Q A D A G A V V P P V K C A C E N G G P P G P S G S S D K V R Q V L A A S Q P G A K G G G Q A D A G A V V P P V K C A C E N G G P P G P S G S S D K G T A P P N S A G G T T P P S I S S G G P T P P V T S G G P P P N G P P P I T S G A P P K G T A P P N S A G G T T P P S I S S G G P T P P V T S G G P P P N G P P P I T S G A P P 91 135 K G T A P P N S A G G T T P P S I S S G G P T P P V T S G G P P P N G P P P I T S G A P P K G T A P P N S A G G T T P P S I S S G G P T P P V T S G G P P P N G P P P I T S G A P P P G S T P S G G P P S T P L G G T P P S G P S G D S S A K P S D S P T K G D G S G D K N S P G S T P S G G P P S T P L G G T P P S G P S G D S S A K P S D S P T K G D G S G D K N S 136 180 P G S T P S G G P P S T P L G G T P P S G P S G D S S A K P S D S P T K G D G S G D K N S P G S T P S G G P P S T P L G G T P P S G P S G D S S A K P S D S P T K G D G S G D K N S P P P V T S G G P P P V T S G G A A T P S S P G N G S S G G K Q K P K D T P S K T T D K D P P P V T S G G P P P V T S G G A A T P S S P G N G S S G G K Q K P K D T P S K T T D K D 181 225 P P P V T S G G P P P V T S G G A A T P S S P G N G S S G G K Q K P K D T P S K T T D K D P P P V T S G G P P P V T S G G A A T P S S P G N G S 
S G G K Q K P K D T P S K T T D K D L P P P V T S G G T S S P G S P G D G S S Q G K P K P K S G D S G D T P S V S S G G G T S L P P P V T S G G T S S P G S P G D G S S Q G K P K P K S G D S G D T P S V S S G G G T S 226 270 L P P P V T S G G T S S P G S P G D G S S Q G K P K P K S G D S G D T P S V S S G G G T S L P P P V T S G G T S S P G S P G D G S S Q G K P K P K S G D S G D T P S V S S G G G T S D K P K D T P S K P G G S A D T P S V S S G G S T S D K P K D T P S K P G G S E D T P S V D K P K D T P S K P G G S A D T P S V S S G G S T S D K P K D T P S K P G G S E D T P S V 271 315 D K P K D T P S K P G G S A D T P S V S S G G S T S D K P K D T P S K P G G S E D T P S V D K P K D T P S K P G G S A D T P S V S S G G S T S D K P K D T P S K P G G S E D T P S V S S G G S T A D G K P K P K D T T S K P G G S E D T S S G G S PT A D G K P K P K D T T S K P G G S E D T316 341 S S G G S T A D G K P K P K D T T S K P G G S E D T S S G G S P AT S D G K P S K P K D T T S K P G G S E D T Figure B.8: Translated sequence alignment of gene PST130_06558. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. Alternative amino acids resulting from nonsyn- onymous SNPs at biallelic sites are indicated in the below diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. 
CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 201 SA1 M I F H T R T F Q L F S L T A M L C S R V Q A K C E G V M I V S A D A P E I P D M S A K D SA2 M I F H T R T F Q L F S L T A M L C S R V Q A K C E G V M I V S A D A P E I P D M S A K D 45 SA3 M I F H T R T F Q L F S L T A M L C S R V Q A K C E G V M I V S A D A P E I P D M S A K D SA4 M I F H T R T F Q L F S L T A M L C S R V Q A K C E G V M I V S A D A P E I P D M S A K D Q T Y H P E V G R I S Y S L D S A G T L E L T S T T P G F N C G P I T N F V S S N A T S K Q T Y H P E V G R I S Y S L D S A G T L E L T S T T P G F N C G P I T N F V S S N A T S K 46 90 Q T Y H P E V G R I S Y S L D S A G T L E L T S T T P G F N C G P I T N F V S S N A T S K Q T Y H P E V G R I S Y S L D S A G T L E L T S T T P G F N C G P I T N F V S S N A T S K T P V K D P S A H K S S R D K K E S Q D P V Q S V G A Q L H C A R D P D T V G V D L M T P T P V K D P S A H K S S R D K K E S Q D P V Q S V G A Q L H C A R D P D T V G V D L M T P 91 135 T P V K D P S A H K S S R D K K E S Q D P V Q S V G A Q L H C A R D P D T V G V D L M T P T P V K D P S A H K S S R D K K E S Q D P V Q S V G A Q L H C A R D P D T V G V D L M T P W Q T I T F Y G S L F F Q I E M K N N T C A K P A E L V L D Y S R C S Y N A T T N T G R Q W Q T I T F Y G S L F F Q I E M K N N T C A K P A E L V L D Y S R C S Y N A T T N T G R Q 136 180 W Q T I T F Y G S L F F Q I E M K N N T C A K P A E L V L D Y S R C S Y N A T T N T G R Q W Q T I T F Y G S L F F Q I E M K N N T C A K P A E L V L D Y S R C S Y N A T T N T G R Q G S A I P C N W S T C * G S A I P C N W S T C * 181 192 G S A I P C N W S T C * G S A I P C N W S T C * Figure B.9: Translated sequence alignment of gene PST130_07448. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. 
Alternative amino acids resulting from nonsyn- onymous SNPs at biallelic sites are indicated in the below diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 202 SA1 M K S F G I I A T L L A L A S S I H A D A A V R P K T A A P A S D I I E L T L E N F D T V SA2 M K S F G I I A T L L A L A S S I H A D A A V R P K T A A P A S D I I E L T L E N F D T V 45 SA3 M K S F G I I A T L L A L A S S I H A D A A V R P K T A A P A S D I I E L T L E N F D T V SA4 M K S F G I I A T L L A L A S S I H A D A A V R P K T A A P A S D I I E L T L E N F D T V V A T T P L I L V E F M V P W C H F C Q D L G P E Y K R S A K I L K E Q G I P S A K V D C V A T T P L I L V E F M V P W C H F C Q D L G P E Y K R S A K I L K E Q G I P S A K V D C 46 90 V A T T P L I L V E F M V P W C H F C Q D L G P E Y K R S A K I L K E Q G I P S A K V D C V A T T P L I L V E F M V P W C H F C Q D L G P E Y K R S A K I L K E Q G I P S A K V D C T E Q D E L C A E H L L P S Y P T L K V F S N G R M A V Y K G P EK K A D S I V S Y I E N K T E Q D E L C A E H L L P S Y P T L K V F S N G R M A V Y K G P EK K A D S I V S Y I E N K91 135 T E Q D E L C A E H L L P S Y P T L K V F S N G R M A V Y K G P EK K A D S I V S Y I E N K T E Q D E L C A E H L L P S Y P T L K V F S N G R M A V Y K G P EK K A D S I V S Y I E N K E Y L G S N K A R I S S R R D S N T V * E Y L G HS N K A V R I S S R R D S N T V *136 H 155E Y L G S N K A V R I S S R R D S N T V * E Y L G HS N K A V R I S S R R D S N T V * Figure B.10: Translated sequence alignment of gene PST130_07513. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below diagonal triangles. 
Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. CHAPTER B: EVOLUTION OF SOUTH AFRICAN PST 203 SA1 M L P S R T I W L L F L A S S I P I L Q V L A G T D Q G L S P V R R Q T L E K R W G V C M SA2 M L P S R T I W L L F L A S S I P I L Q V L A G T D Q G L S P V R R Q T L E K R W G V C M 45 SA3 M L P S R T I W L L F L A S S I P I L Q V L A G T D Q G L S P V R R Q T L E K R W G V C M SA4 M L P S R T I W L L F L A S S I P I L Q V L A G T D Q G L S P V R R Q T L E K R W G V C M V P N R R K G C V V W G S Q S C C R D C C S E Y L Q G I R P E S W R I Q C G C P P LR H A P P V P N R R K G C V V W G S Q S C C R D C C S E Y L Q G I R P E S W R I Q C G C P P R H A P 46 90 V P N R R K G C V V W G S Q S C C R D C C S E Y L Q G I R P E S W R I Q C G C P P L H AR P P V P N R R K G C V V W G S Q S C C R D C C S E Y L Q G I R P E S W R I Q C G C P P L H AR P P H T V V V V Q Q A A P P P P P A P A P A P A P A Q G P T I V I N H P G A Q P A V A Y P Q P H T V V V V Q Q A A P P P P P A P A P A P A P A Q G P T I V I N H P G A Q PT A V A Y P Q P91 135 H T V V V V Q Q A A P P P P P A P A P A P A P A Q G P T I V I N A PV T H P G G Q T A V A Y P Q P H T V V V V Q Q A A P P P P P A P A P A P A P A Q G P T I V I N H P G A Q P A V A Y P Q P V V A Y P A Q P G V V V A Y P A Q P G V 136 145 V V A Y P A Q P G V V V A Y P A Q P G V Figure B.11: Translated sequence alignment of gene PST130_07564. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007) is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles. 
[Alignment image: rows SA1 to SA4, positions 1 to 206.]

Figure B.12: Translated sequence alignment of gene PST130_08031. This gene has been identified to encode a putative effector protein (Cantu et al., 2013).
The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 116.]

Figure B.13: Translated sequence alignment of gene PST130_08984. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.
[Alignment image: rows SA1 to SA4, positions 1 to 431.]

Figure B.14: Translated sequence alignment of gene PST130_09018. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles.
Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 211.]

Figure B.15: Translated sequence alignment of gene PST130_09275. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 255.]

Figure B.16: Translated sequence alignment of gene PST130_10286. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.
[Alignment image: rows SA1 to SA4, positions 1 to 198.]

Figure B.17: Translated sequence alignment of gene PST130_12487. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box.
Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 182.]

Figure B.18: Translated sequence alignment of gene PST130_12491. This gene has been identified to encode a putative effector protein (Cantu et al., 2013).
The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 156.]

Figure B.19: Translated sequence alignment of gene PST130_12956. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box.
Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 395.]

Figure B.20: Translated sequence alignment of gene PST130_13969. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box.
Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 168.]

Figure B.21: Translated sequence alignment of gene PST130_14091. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box.
Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 140.]

Figure B.22: Translated sequence alignment of gene PST130_14831. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles.
Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 173.]

Figure B.23: Translated sequence alignment of gene PST130_16778. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box.
Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 240.]

Figure B.24: Translated sequence alignment of gene PST130_17605. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment image: rows SA1 to SA4, positions 1 to 240.]

Figure B.25: Translated sequence alignment of gene PST130_17605. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the below-diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.
[Alignment panel: isolates SA1 to SA4, positions 1 to 927, continued across two pages as Figures B.26 and B.27.]

Figure B.26: See continuation on next page.

Figure B.27: Translated sequence alignment of gene PST130_07579. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the lower diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

[Alignment panel: isolates SA1 to SA4, positions 1 to 547, continued across two pages as Figures B.28 and B.29.]

Figure B.28: See continuation on next page.

Figure B.29: Translated sequence alignment of gene PST130_15131. This gene has been identified to encode a putative effector protein (Cantu et al., 2013). The signal peptide, predicted using SignalP (version 2; Emanuelsson et al., 2007), is indicated by the black box. Alternative amino acids resulting from nonsynonymous SNPs at biallelic sites are indicated in the lower diagonal triangles. Colours were assigned according to the “Clustal X Colour Scheme” used in Jalview (Waterhouse et al., 2009), categorising amino acid profiles.

Appendix C

Gene Expression Analysis of Candidate Effectors Identified in South African Pst Isolates

CHAPTER C: GENE EXPRESSION ANALYSIS

C.1 Candidate gene inspection

[Figure panel: PST130_02001 mRNA and translated peptide for SA1 and SA4. Legend: depth; maximum depth (SA1 24x, SA4 47x); exon boundaries; nonsynonymous SNP; amino acid change; forward primer; reverse primer.]

Figure C.1: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_02001 in SA1 and SA4.
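As a reading aid for Figure C.1, the following minimal Python sketch shows one way that IUPAC ambiguity codes in an mRNA (such as R, Y and K) could be resolved into the residues of a translated peptide. It assumes the standard genetic code and writes X wherever the alternative codons encode different amino acids. This convention is consistent with the figure but is not taken from the thesis methods. The names `IUPAC`, `CODON_TABLE`, `codon_amino_acids` and `translate` are hypothetical and introduced only for illustration.

```python
from itertools import product

# IUPAC nucleotide codes (RNA alphabet): the four bases plus the
# two-base ambiguity codes. Assumption: no three- or four-base codes occur.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
}

# Standard genetic code, with codons enumerated in UCAG order
# (third base varying fastest).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_amino_acids(codon):
    """Return the set of amino acids encoded by a possibly ambiguous codon."""
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    return {CODON_TABLE["".join(bases)] for bases in expansions}


def translate(mrna):
    """Translate an mRNA; codons encoding more than one residue become 'X'."""
    peptide = []
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete trailing codon
    for i in range(0, usable, 3):
        residues = codon_amino_acids(mrna[i:i + 3])
        peptide.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(peptide)
```

For the first eight codons of the SA1 sequence in Figure C.1, `translate("AUGUCUUUCUCAAACACRAUCCUC")` gives `MSFSNTIL`: the R in ACR is synonymous, because ACA and ACG both encode T. In SA4 the third codon is CUC rather than UUC, which changes F to L. The codon RAC near the end of the transcript expands to AAC (N) and GAC (D), so it is written as X.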
[Figure panel: PST130_02403 mRNA and translated peptide for SA1 and SA4. Legend: depth; maximum depth (SA1 23x, SA4 36x); exon boundaries; nonsynonymous SNP; amino acid change; forward primer; reverse primer.]

Figure C.2: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_02403 in SA1 and SA4.
[Figure panel: PST130_05023 mRNA and translated peptide for SA1 and SA4. Legend: depth; maximum depth (SA1 23x, SA4 24x); exon boundaries; nonsynonymous SNP; amino acid change; forward primer; reverse primer.]

Figure C.3: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_05023 in SA1 and SA4.
[Figure panel: PST130_06503 mRNA and translated peptide for SA1 and SA4. Legend: depth; maximum depth (SA1 21x, SA4 40x); exon boundaries; nonsynonymous SNP; amino acid change; forward primer; reverse primer.]

Figure C.4: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_06503 in SA1 and SA4.
[Figure panel: PST130_07513 mRNA and translated peptide for SA1 and SA4. Legend: depth; maximum depth (SA1 22x, SA4 41x); exon boundaries; nonsynonymous SNP; amino acid change; forward primer; reverse primer.]

Figure C.5: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_07513 in SA1 and SA4.

[Figure panel: PST130_09275 mRNA and translated peptide for SA1 and SA4.]
N X A H N A G I L A Q S A P F N V T Q T S G P S M S E S L P L A G A N S T A N T P A A S T P V A N T T SSA4 160G K F D I S G V S S M K A C S G Y Q I N L V A S S T P D N X A H N A G I L A Q S A P F N V T Q T S G P S M S E S L P L A G A N S T A N T P A A S T P V A N T T S SA1 161 P T Q S T S S T G A P K Y N S G T A A P G A K Y S F A P R I S G S F Q K V T A C A L L X X T F M L A *SA4 211P T Q S T S S T G A P K Y N S G T A A P G A K Y S F A P R I S G S L Q K V T A C A L L X X T F M L A * Depth Maximum Depth Exon boundaries Forward Primer SA1 23x Nonsynonymous SNP SA4 24x Amino acid change Reverse Primer Figure C.6: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_09725 in SA1 and SA4. CHAPTER C: GENE EXPRESSION ANALYSIS 229 PST130_12487 mRNA SA1 1 A U G U U C G G G U C C U C A A C A A U A U U A C U A G C A U G C U C U U U A C U G A G C U A C G U U U U G G C U G C C C C C G C G A G A U U A U C A A A C C USA4 80A U G U U C G G G U C C U C A A C A A U A U U A C U A G C A U G C U C U U U A C U G A G C U A C G U U U U G G C U G C C C C C G C G A G A U U A U C A A A C C U SA1 81 A C C A U C A U U A G A C G G C A C A U U G U C G A A U G C C C C A U C A C C U U C G U G G C A A C U G A C U A U U G A C A A U G G U C A A A U C A G G A A C CSA4 160A C C A U C A U U A G A C G G C A C A U U G U C G A A U G C C C C A U C A C C U U C G U G G C A A C U G A C U A U U G A C A A U G G U C A A A U C A G G A A C C SA1 161 G U A G G U U U A U G G U G G A A G C A A G U G C A C C A A A G G U G G A A C C A C C C A U G U C C A A A C A G A U G G C C U G U U U U G A C A G U A A G G U USA4 240G U A G G U U U A U G G U G G A A G C A A G U G C A C C A A A G G U G G A A C C A C C C A U G U C C A A A C A G A U G G C C U G U U U U G A C A G U A A G G U U SA1 241 G G G A A A C C U A G C A U U G A A C A A A C C G A G C G G A U C G A G A A C U A C C U A A A G C A U U G U A A A A C U G G A A A G G C U U A U A A G G U U C CSA4 320G G G A A A C C U A G C A U U G A A C A A A S 
C G A GM R G A U C G A G A A C U A C C U A A A G C A U U G U A AMA C U G G A A A G G C U U A U A A G G U U C C SA1 321 U G C A A A C G G A G A C A U C U A C C C U A U G C C C A A A U C C G A U U C G A C U U A C G G G U A C A U C U U C G G A A A G G U U C A G U U C U A C G A C GSA4 400U G C A A A C G G A G A C A U C U A C C C U A U G C C C A A A U C C G A U U C G A C U U A C G G G U A C A U C U U C G G A A A G G U U C A G U U C U A C G A C G SA1 401 A C U G C G A U A G A U U G A U A C A C G A A A C C G G C U G C U G C U A U G G A A A A C C A A G U G A C A G A G A G G G U U A C A A U G C C A U G G A A U C CSA4 480A C U G C G A U A G A U U G A U A C A C G A A A C C G G C U G C U G C U A U G G A A A A C C A A G U G A C A G A G A G G G U U A C A A U G C C A U G G A A U C C SA1 481 U G U U G U A U C G U U G C A G G C G C U U G C U A U G G U U G C A U C U G U U G C A C U G C C U U U U C C G C C A U U C U C A A U U U C A A G U U A A C A G USA4 560U G U U G U A U C G U U G C A G G C G C U U G C U A U G G U U G C A U C U G U U G C A C U G C C U U U U C C G C C A U U C U C A A U U U C A A G U U A A C A G U SA1 561 U G A C A U C A A A C U U G U C U G G U C A U C A A A C C C U U G ASA4 594U G A C A U C A A A C U U G U C U G G U C A U C A A A Y C C U U G A Translated peptide SA1 1 M F G S S T I L L A C S L L S Y V L A A P A R L S N L P S L D G T L S N A P S P S WQ L T I D N GQ I R N R R F M V E A S A P K V E P P M S K QMA C F D S K VSA4 80M F G S S T I L L A C S L L S Y V L A A P A R L S N L P S L D G T L S N A P S P S WQ L T I D N GQ I R N R R F M V E A S A P K V E P P M S K QMA C F D S K V SA1 81 G K P S I E Q T E R I E N Y L K H C K T G K A Y K V P A N G D I Y P M P K S D S T Y G Y I F G K V Q F Y D D C D R L I H E T G C C Y G K P S D R E G Y N AM E SSA4 160G K P S I E Q X E X I E N Y L K H C X T G K A Y K V P A N G D I Y P M P K S D S T Y G Y I F G K V Q F Y D D C D R L I H E T G C C Y G K P S D R E G Y N AM E S SA1 SA4161 C C I V A G A C Y G C I C C T A F S A I L N F K L T 
V D I K L V W S S N P * 198C C I V A G A C Y G C I C C T A F S A I L N F K L T V D I K L V W S S X P * Depth Maximum Depth Exon boundaries Forward Primer SA1 22x Nonsynonymous SNP SA4 28x Amino acid change Reverse Primer Figure C.7: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_12487 in SA1 and SA4. CHAPTER C: GENE EXPRESSION ANALYSIS 230 PST130_12491 mRNA SA1 1 A U G C G U U C C U U C G U A G C C G U C G C C G U C A C C C U U G C U C U C C U C C A G A G C A C U U C C G C C U U A C C A A U U U U C G A G A A G C G U G CSA4 80A U G C G U U C C U U C G U A G C C G U C G C C G U C A C C C U U G C U C U C C U C C A G A G C A C U U C C G C C U U A C C A A U U U U C G A G A A G C G U G C SA1 81 C G A G A C U G A A G G C A C C G G A A A A G G U G A A U C A A G C U C C C G C U C C U U A G G U G G C U G C A G C A A C C A A G U U G G C C U U C U C A A C ASA4 160C G A G A C U G A A G G C A C C G G A A A A G G U G A A U C A A G C U C C C G C U C C U U A G G U G G C U G C A G C A A C C A A G U U G G C C U U C U C A A C A SA1 161 U U G C C C U C U C G A C C A A C A C U C A C U G U G G A C A A A A U G G U C C A G C C A G U G G C A G C G G U G G U G C C G G U G G C C U C K U A C C U G G CSA4 240U U G C C C U C U C G A C C A A C A C U C A C U G U G G A C A A A A U G G U C C A G C C A G U G G C A G C G G U G G U G C C G G U G G C C U C U U A C C U G G C SA1 241 G G G G G U G G U C Y C U U A C C U G G C G G U G G U A U C G A U G G U C U S U U A C C U G C C G G U G G C C U C U U A C C U G A C G G U G G U A U C G A U G GSA4 320G G G G G U G G U C C C U U A C C U G G C G G U G G U A U C G A U G G U C U G U U A C C U G C C G G U G G C C U C U U A C C U G A C G G U G G U A U C G A U G G SA1 321 U C U C U U A C C U G C C G G U G G U C U C U U A C C U G G C G G G G G U G U G G A U G G U C U C U U A C C U G G C G G U G G U A U C G A U G G U C U C U U G CSA4 400U C U C U U A C C U G C C G G U G G U C U C U U A C C U G G C G G G G G U G U G G A U G G U C U 
C U U A C C U G G C G G U G G U A U C G A U G G U C U C U U G C SA1 401 C U G G C G G U G G C G C C G G C G G C C U C U U A C C U G C C G G U G G U A C C G G U G G C U U C U U A C C U G G C G G G G G U G G U C U C Y U A C C U G G CSA4 480C U G G C G G U G G C R C C G G C G G C C U C U U A C C U G C C G G U G G U A C C G G U G G C U U C U U A C C U G G C G G G G G U G G U C U C C U A C C U G G C SA1 481 G G U G G U A U C G A U G G U C U C U U G C C U G G C G G U G G U A U C G A U G G U C U C U U V C C U G S C G G U G G U A U C G A USA4 546G G U G G U A U C G A U G G U C U C U U G C C U G G C G G U G G U A U C G A U G G U C U C U U G C C U G G C G G U G G U A U C G A U Translated peptide SA1 1 M R S F V A V A V T L A L L Q S T S A L P I F E K R A E T E G T G K G E S S S R S L G G C S NQ V G L L N I A L S T N T H C GQ N G P A S G S G G A G G L V P GSA4 80M R S F V A V A V T L A L L Q S T S A L P I F E K R A E T E G T G K G E S S S R S L G G C S NQ V G L L N I A L S T N T H C GQ N G P A S G S G G A G G L L P G SA1 81 G G G P L P G G G I D G L L P A G G L L P D G G I D G L L P A G G L L P G G G V D G L L P G G G I D G L L P G G G A G G L L P A G G T G G F L P G G G G L L P G 160 SA4 G G G P L P G G G I D G L L P A G G L L P D G G I D G L L P A G G L L P G G G V D G L L P G G G I D G L L P G G G A G G L L P A G G T G G F L P G G G G L L P G SA1 161 G G I D G L L P G G G I D G L L P G G G I DSA4 182G G I D G L L P G G G I D G L L P G G G I D Depth Maximum Depth Exon boundaries Forward Primer SA1 21x Nonsynonymous SNP SA4 32x Amino acid change Reverse Primer Figure C.8: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_12491 in SA1 and SA4. 
CHAPTER C: GENE EXPRESSION ANALYSIS 231 PST130_12956 mRNA SA1 1 A U G A G G U C G U U U G G U U U U U U G G C A A C G C U G U U U G C C C U A G C U U C U U C U A U C C A U G C C G A C G C A G G A C U C A A C C C C A A U G ASA4 80A U G A G G U C G U U U G G U U U U U U G G C A A C G C U G U U U G C C C U A G C U U C U U C U A U C C A U G C C G A C G C A G G A C U C A A C C C C A A U G A SA1 81 C G C U C C A G A U G A C G U C A U C G A A U U G A C A U C A G A G A A C U U C G A C A C C G U C G U C A C C C C U G C G C C U U U G A U C U U G G U C G A A USA4 160C G C U C C A G A U G A C G U C A U C G A A U U G A C A U C A G A G A A C U U C G A C A C C G U C G U C A C C C C U G C G C C U U U G A U C U U G G U C G A A U SA1 161 U C A U G G C A C C A U G G U G U G G U C A U U G U A A A G C C C U C A U G C C C G A G U A U A A A C G U G C G G C G A C A C U U U U G A A A A A G G G A G G USA4 240U C A U G G C A C C A U G G U G U G G U C A U U G U A A A G C C C U C A U G C C C G A G U A U A A A C G U G C G G C G A C A C U U U U G A A A A A G G G A G G U SA1 241 A U C C C A G U G G C C A A A G C U G A C U G U A C C G A G C A G A G U G A A U U A U G C G C U A A G U A U G A A A U Y C A A G G U U A C C C A A C U C U C A ASA4 320A U C C C A G U G G C C A A A G C U G A C U G U A C C G A G C A G A G U G A A U U A U G C G C U A A G U A U G A A A U Y C A A G G U U A C C C A A C U C U C A A SA1 321 G A U C U U C A C G A A U G G U G U G U C A U C C G A A U A C A A A G G U C C U C G A A A G G C U G A U G G C A U C G U C U C C U A C A U G G A G A A A C G G GSA4 400G A U C U U C A C G A A U G G U G U G U C A U C C G A A U A C A A A G G U C C U C G A A A G G C U G A U G G C A U C G U C U G C U A C A U G G A G A A A C G G G SA1 SA4 401 C A C A C C C U G U C G U C A C U A U C G U C A C A U C G G A C A A C C A C A C C G A C U U C A C C A A A U C U G G U A A C G U G G U G 468 C A C A C C C U G U C G U C A C U A U C G U C A C A U C G G A C A A C C A C A C C G A C U U C A C C A A A U C U G G U A A C 
G U G G U G Translated peptide SA1 1 M R S F G F L A T L F A L A S S I H A D A G L N P N D A P D D V I E L T S E N F D T V V T P A P L I L V E F M A P WC G H C K A L M P E Y K R A A T L L K K G GSA4 80M R S F G F L A T L F A L A S S I H A D A G L N P N D A P D D V I E L T S E N F D T V V T P A P L I L V E F M A P WC G H C K A L M P E Y K R A A T L L K K G G SA1 81 I P V A K A D C T E Q S E L C A K Y E I Q G Y P T L K I F T N G V S S E Y K G P R K A D G I V S Y M E K R A H P V V T I V T S D N H T D F T K S G N V VSA4 156I P V A K A D C T E Q S E L C A K Y E I Q G Y P T L K I F T N G V S S E Y K G P R K A D G I V C Y M E K R A H P V V T I V T S D N H T D F T K S G N V V Depth Maximum Depth Exon boundaries Forward Primer SA1 23x Nonsynonymous SNP SA4 24x Amino acid change Reverse Primer Figure C.9: Nonsynonymous polymorphisms and primer design of the candidate effector gene PST130_12956 in SA1 and SA4. CHAPTER C: GENE EXPRESSION ANALYSIS 232 C.2 Additional figures of statistical analyses 233 60 3000 40 2000 2000 20 1000 1000 0 0 -20 0 -2 0 2 -2 -1 0 1 2 0 200 400 600 Theoretical Theoretical Fitted Values (i) Normal probability plot of residuals af- (ii) Normal probability plot of the random (iii) Assessment of equal variances after ter the model was fitted to the relative intercepts after the model was fitted to the model was fitted to the relative gene expression values. the relative gene expression values. gene expression values. Figure C.10: Graphical tests for normality and equal variances of the residuals and random intercepts. The relative gene expression dataset was evaluated applying the assumptions that linear mixed models are based on. Normal probability plots of the random intercept dataset (i) and the residuals (ii) showed deviation from normality. The fan like pattern observed in the plot to assess equal variances (iii) revealed that variances were not equal, as is required for using a linear model. 
This indicated that the relative gene expression dataset was not a good fit for a linear mixed model, as it violated the assumptions of the model type.
[Figure C.11: one panel per isolate (SA1, SA4) and gene (PST130_02001, PST130_02403, PST130_05023, PST130_06503, PST130_07513, PST130_09275, PST130_12487, PST130_12491, PST130_12956); x-axis: Theoretical; y-axis: Sample.]
Figure C.11: Gene and isolate specific tests for equal variances after the model was fitted to the relative gene expression values.
[Figure C.12: one panel per isolate (SA1, SA4) and gene (same nine genes as in Figure C.11); x-axis: Fitted Values; y-axis: Residuals.]
Figure C.12: Gene and isolate specific tests for equal variances after the model was fitted to the relative gene expression values.
[Figure C.13: three panels. (i) Normal probability plot of residuals after the model was fitted to the log10 transformed relative gene expression values. (ii) Normal probability plot of the random intercepts after the model was fitted to the log10 transformed relative gene expression values. (iii) Assessment of equal variances after the model was fitted to the log10 transformed relative gene expression values. Axis labels as extracted: Theoretical (x) and Sample (y).]
Figure C.13: Graphical tests for normality and equal variances of the residuals and random intercepts following a log10 transformation. The relative gene expression dataset was log10 transformed and re-evaluated against the assumptions on which linear mixed models are based. (i) Normal probability plot of the residuals and (ii) of the random intercepts of the log10 transformed relative expression values. A much closer relation was observed between the data and the curve indicating normality (in red) than for the untransformed data (Figure C.12). (iii) Residuals were randomly scattered around the horizontal axis, as expected in a normally distributed dataset.
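The effect of a log10 transformation on a normal probability plot can be illustrated with a minimal sketch. It is not the analysis used in this thesis: the data are hypothetical, right-skewed ratios generated for illustration, and the straightness of the plot is summarised by the correlation between the ordered values and the theoretical normal quantiles, which is an assumption made here for simplicity.

```python
import math
from statistics import NormalDist


def normal_quantiles(n):
    """Theoretical standard-normal quantiles for n ordered observations."""
    dist = NormalDist()
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def qq_correlation(values):
    """Correlation between the ordered values and the normal quantiles.

    A value close to 1.0 corresponds to points lying on a straight line
    in the normal probability plot.
    """
    ordered = sorted(values)
    return pearson(normal_quantiles(len(ordered)), ordered)


# Hypothetical right-skewed expression ratios (not data from this study).
ratios = [10 ** q for q in normal_quantiles(30)]
log_ratios = [math.log10(r) for r in ratios]

r_raw = qq_correlation(ratios)      # clearly below 1: curved plot
r_log = qq_correlation(log_ratios)  # essentially 1: straight plot
```

In this constructed example the transformed values fall on a straight line, whereas the untransformed ratios do not, which mirrors the qualitative difference between Figures C.10 and C.13.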
[Figure C.14: one panel per isolate (SA1, SA4) and gene (PST130_02001, PST130_02403, PST130_05023, PST130_06503, PST130_07513, PST130_09275, PST130_12487, PST130_12491, PST130_12956); x-axis: Theoretical; y-axis: Sample.]
Figure C.14: Gene and isolate specific normal probability plots of the residuals after the model was fitted to the log10 transformed relative gene expression values.
[Figure C.15: one panel per isolate (SA1, SA4) and gene (same nine genes as in Figure C.14); x-axis: Fitted Values; y-axis: Residuals.]
Figure C.15: Gene and isolate specific tests for equal variances after the model was fitted to the log10 transformed relative gene expression values.
C.3 Variability in RT-qPCR
RT-qPCR is a sensitive, multi-step method in which technical variation is easily introduced, yielding variable data. Ling et al. (2012) recommended the use of three biological replicates with two to three RT-PCR repeats. It is a lengthy process in which every step needs to be performed skilfully and with high precision, using calibrated instruments and keeping consumables constant.
High-quality template DNA is needed for biologically useful results, as limited template material can reduce the sensitivity of qPCR (Derveaux et al., 2010). Relative quantification of target gene expression complicates the process because only one reference gene can be used in the Pfaffl gene expression quantification model (Pfaffl, 2001). The use of a single reference gene is not advised, as more accurate results with higher statistical significance are obtained when multiple reference genes are used. This requires more complicated quantification models with built-in calibration schemes, efficiency calculators and methods to determine confidence measures, all of which add to the total cost of qPCR experiments (Derveaux et al., 2010).
The technical sensitivity of the RT-qPCR assay would ideally require every step in this five-step process to be replicated, for example through replicated RNA extractions of the same tissue sample and replicated cDNA synthesis. This is often impractical in terms of both time and cost. In this study, the amount of sample tissue that could be harvested per time point was an additional limitation. Including two more reference genes in the study would have increased the number of PCR reactions threefold, significantly increasing the time and resources required. Even if this had not been a limiting factor, the raw sample material would not have yielded enough RNA for all reactions, and an entirely different experimental approach would have been needed. The experimental setup used was the most ambitious approach that could be accommodated.
The rest of this section discusses best practices for keeping technical noise to a minimum and suggests how this aspect could be further improved in future studies.
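For reference, the efficiency-corrected ratio of the Pfaffl model (Pfaffl, 2001) mentioned above can be sketched as follows. The function and variable names are illustrative, the CT values are hypothetical, and treating one isolate as the control and the other as the sample is an assumption made only for this example.

```python
import math


def pfaffl_ratio(e_target, e_ref,
                 ct_target_control, ct_target_sample,
                 ct_ref_control, ct_ref_sample):
    """Efficiency-corrected expression ratio of a target gene.

    e_target and e_ref are amplification factors per cycle
    (2.0 corresponds to 100 % efficiency). The ratio is
    e_target ** dCT_target / e_ref ** dCT_ref, with dCT = control - sample.
    """
    delta_target = ct_target_control - ct_target_sample
    delta_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)


# Hypothetical CT values (not measurements from this study): the target
# amplifies two cycles earlier in the sample, the reference gene is unchanged.
ratio = pfaffl_ratio(
    e_target=2.0, e_ref=2.0,
    ct_target_control=26.0, ct_target_sample=24.0,
    ct_ref_control=20.0, ct_ref_sample=20.0,
)
log10_ratio = math.log10(ratio)  # positive: higher expression in the sample
```

Because the denominator contains a single reference gene, any instability in that gene propagates directly into the ratio, which is the limitation discussed in the text.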
C.3.1 Variation in the application of treatments to biological replicates
To keep technical variation to a minimum, special care must be taken to apply inoculum evenly to all plants. In this study, variation could have been introduced by the specific location of trays in the glasshouse, as some plants could have received more sunlight than others. In future, a mock inoculation could be considered as an alternative negative control.
Run-to-run variation is not considered true biological variation, but should not be regarded as negligible. The “sample maximisation” assay setup, described in Section 6.2.6, was applied with the aim of avoiding this variation. Because variation can be introduced at so many points, solid conclusions rely on the inclusion of multiple, independent biological replicates (Derveaux et al., 2010). However, due to physical capacity limitations, different biological replicates were assessed in different PCR runs. Assays would be spread over even more runs if mock inoculation samples were added. Derveaux et al. (2010) caution against inter-run variation and propose including an identical sample on all plates as an inter-run calibrator. The limitation in the present study was that the positive control sample was not identical across all plates due to quantity constraints.
C.3.2 Variation introduced by the RNA extraction process
The utmost care must be taken during the RNA extraction process, as this is the most vulnerable part of the RT-qPCR experiment owing to the unstable nature of RNA. Fleige and Pfaffl (2006) emphasise the importance of using intact RNA for RT-qPCR and state that the Bioanalyzer 2100 measurement is a stable and reliable method for the quantification and quality assessment of RNA. Bustin and Nolan (2004) argue that RNA purification is the critical determinant of reproducibility and biological relevance of the subsequent result.
Approximately half of the inoculated leaf sample for each time point was used in the first RNA extraction attempt. The quantity of sample material is therefore a limiting factor for RNA extraction replicates. With the current method, a maximum of two RNA extraction replicates would be possible, but that would leave no material to repeat an unsuccessful extraction. Sample processing, storage and transportation were controlled and kept consistent to minimise variability.
C.3.3 Variation introduced by the reverse transcription process
In this study, random hexamers were used as primers for the RT process. Using these nonspecific random oligonucleotides allowed the assessment of multiple targets in each sample, and they are known to yield the highest quantity of cDNA with the least bias. The alternative is to prime RT reactions with thymine oligonucleotides (oligo-dTs). The advantage of this type of primer is that it is more specific to mRNA. However, RNA needs to be intact and of very high quality for this method, as it will not prime RNAs that lack a tail of multiple adenine nucleotides (polyA tail). Judging by the RNA integrity number (RIN) range, random hexamers were the more suitable method for the samples used in this study. An alternative priming method, in which a mixture of random hexamers and oligo-dTs is used, as suggested by Taylor and Mrkusich (2014), could be tested in future experiments.
Some may argue that the one-step method of RT-qPCR reduces technical noise to some extent. The two-step method of RT-qPCR was chosen in this work because the expression of multiple genes was assessed. Interest in more than one part of the transcriptome also meant that qPCR experiments were done over a considerable time frame of about three months, and storing a more stable form of nucleic acid was therefore beneficial.
C.3.4 Variation introduced by RT-qPCR
Two objectives drive experimental layout (Derveaux et al., 2010).
Firstly, in the “gene maximisation method”, the expression profiles of different genes in a sample are compared. Multiple genes are assessed together on the same sample DNA per plate and samples are spread across different plates. In the present study, the alternative layout was used. Nine biological replicates were assessed for each gene and time point, allowing one biological replicate per 96-well PCR plate. This type of layout is also known as the “sample maximisation method”, where samples that will be compared are preferably run together. The standardised values of the two treatments were compared with one another on each plate at each time point. In this way, gene expression changes can be investigated over the course of the infection process to assess differences in the expression pattern between the two treatments. Because the number of wells per run is a limitation, the nine genes were assessed in different PCR runs. To reduce technical variability, three repeats of each sample were evaluated (Ling et al., 2012). Although isolates could be compared across time points for each gene assay within a run, inter-run variability could not be accounted for, as the nine biological replicates were assessed on nine different plates and the internal control had to be taken from different samples because the available quantities were too small. Standardisation as explained by Willems et al. (2008) could have been considered if a third treatment or mock treatment had been present.
C.3.5 Variation introduced by primers
Primer design
Primers that were specific to target sequences were designed with the help of online tools. Using gene sequences, primers were designed to have suitable GC content, a low probability of forming secondary structures and the desired annealing temperature. Empirical proof of primer dimer absence can be found by assessment of the melt curves.
Primer dimers are usually shorter than target amplicons and would therefore form a peak at a low melting temperature that is clearly visible on the melt curve (Kubista et al., 2006). In such a case the melt curve will have two peaks, one for the secondary structures and one for the amplicons.
Designing primers for long gene sequences that include introns can be more complex. The possibility of alternative splicing in short genes of fungal pathogens has been indicated before (Grützmann et al., 2014), which sets the stage for future investigations on alternative splicing in Pst effector coding genes. Effectors are often short peptides (Saunders et al., 2012), and their genes are consequently not rich in intronic regions. However, when genes have alternatively spliced isoforms, target identification can be tricky, which in turn complicates primer design (Derveaux et al., 2010). RNA-Seq datasets have shown that some of the candidates assessed in this work are alternatively spliced. By annealing primers at various points along the mRNA, information about the expression of specific exons or the full transcript length can be gathered. To avoid amplification of gDNA, primers can further be designed to span exon-exon boundaries (Thellin et al., 1999). In future, more attention could be given to incorporating this step in the primer design protocol to improve primer specificity. This would not be possible in all cases, however, for example where short exons with high sequence variability between the compared entities exist, as other criteria, such as the absence of SNPs in primer sequences and identical amplicons, also need to be met. It would furthermore greatly increase the cost and time needed per gene assay.
End point analysis by gel electrophoresis remains a good indication that a PCR product of the intended size is obtained (Wittwer et al., 1997). Gel purification and sequencing of the product can be performed for a more specific confirmation.
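The melt curve check described at the start of this discussion, one peak for a clean product and a second, lower-temperature peak when primer dimers are present, can be sketched in a few lines. This is an illustrative sketch only: the curves are simulated with hypothetical melting temperatures, and the peak-calling rule (local maxima of the negative derivative above a threshold) and its threshold are assumptions, not the procedure of any particular qPCR instrument.

```python
import math


def melt_peaks(temps, fluorescence, min_height):
    """Temperatures of local maxima in -dF/dT that reach min_height."""
    mids = []
    neg_slope = []
    for i in range(len(temps) - 1):
        step = temps[i + 1] - temps[i]
        mids.append((temps[i] + temps[i + 1]) / 2)
        neg_slope.append(-(fluorescence[i + 1] - fluorescence[i]) / step)
    peaks = []
    for i in range(1, len(neg_slope) - 1):
        is_local_max = (neg_slope[i] > neg_slope[i - 1]
                        and neg_slope[i] >= neg_slope[i + 1])
        if is_local_max and neg_slope[i] >= min_height:
            peaks.append(mids[i])
    return peaks


def simulated_melt(temps, products):
    """Simulated fluorescence for products given as (amount, Tm) pairs."""
    return [
        sum(amount / (1 + math.exp((t - tm) / 0.8)) for amount, tm in products)
        for t in temps
    ]


# Hypothetical curves: 65 to 95 degrees Celsius in 0.5 degree steps.
temps = [65 + 0.5 * i for i in range(61)]
clean = simulated_melt(temps, [(1.0, 85.25)])
with_dimer = simulated_melt(temps, [(0.3, 75.25), (1.0, 85.25)])

clean_peaks = melt_peaks(temps, clean, min_height=0.05)       # one peak
dimer_peaks = melt_peaks(temps, with_dimer, min_height=0.05)  # two peaks
```

Real melt curves are noisier than this simulation, so smoothing and a threshold chosen for the instrument would normally be needed before counting peaks.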
In this study, after optimisation, one primer pair was used for each gene assayed.
Efficiency
Primer efficiencies were determined using dilution series assays. Care should be taken that the starting concentration of the series is high enough to allow six or seven serial dilutions that still contain sufficient template to yield an accurate result. Quantifications at low template concentrations were either unsuccessful in the programmed 40-cycle PCRs or less reproducible.
C.3.6 Choice of reference genes
The Pst β-tubulin gene was used as reference gene. It has been used widely across many species, but there are controversial reports in the literature about the stability of many of the genes traditionally considered to have stable expression profiles, and specifically about their use as reference genes in qPCR (Murphy and Polak, 2002; Jain et al., 2006; Schmidt and Delaney, 2010). If the reference gene has not been tested before under the same experimental conditions, it is recommended that more than one reference gene be included when the relative quantification method is used (Thellin et al., 1999). Thellin et al. (1999) suggest using 18S and 28S rRNA as internal standards. The use of three to five reference genes is proposed for accurate normalisation (Derveaux et al., 2010). It is advised that more reference genes be used to evaluate relative gene expression in future studies.
RT-qPCR remains a process full of grey areas, but it has become a prime method in various biological research fields (Schmittgen and Livak, 2008). Although the variability in data quality and reporting has been addressed by the MIQE guidelines (Bustin et al., 2009), vague reporting of methodology and statistical analysis continues to mislead newcomers to the field.
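Normalisation against several reference genes, as recommended above, is commonly implemented by dividing the relative quantity of the target by the geometric mean of the relative quantities of the reference genes. The sketch below assumes that approach; it was not applied in this study, and the amplification factors and Cq values are hypothetical.

```python
import math


def relative_quantity(amplification_factor, cq_calibrator, cq_sample):
    """Relative quantity of one gene in a sample versus a calibrator."""
    return amplification_factor ** (cq_calibrator - cq_sample)


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalised_expression(target_rq, reference_rqs):
    """Target quantity divided by the geometric mean of the references."""
    return target_rq / geometric_mean(reference_rqs)


# Hypothetical values (not measurements from this study).
target_rq = relative_quantity(2.0, cq_calibrator=26.0, cq_sample=23.0)
reference_rqs = [
    relative_quantity(2.0, cq_calibrator=20.0, cq_sample=19.0),
    relative_quantity(2.0, cq_calibrator=18.0, cq_sample=18.0),
    relative_quantity(2.0, cq_calibrator=22.0, cq_sample=20.0),
]
normalised = normalised_expression(target_rq, reference_rqs)
```

With three reference genes, a shift in any single reference changes the normalisation factor less than it would in a single-reference model.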
The sensitivity of RT-qPCR makes it a powerful tool, but necessitates that every step is performed with great accuracy to keep variability, which is inevitably introduced into every part of the multi-step process, to a minimum. Technical repeats aim to identify outliers that are caused not by true biological variation but by the accumulation of technical inconsistency. Even when no obvious outliers can be identified, the measured CT values are always a combination of true biological variance and technical noise introduced during the process (Ling et al., 2012).
C.3.7 Results of efficiency corrected relative gene expression
A sufficient number of biological replicates is needed to draw sound conclusions. The trouble with RT-qPCR is that inter-plate variability can jeopardise conclusions. Appendix C, Figure C.16, displays the relative expression data that were standardised to the reference gene by the efficiency corrected method (Schmittgen and Livak, 2008) and log10 transformed. From these data, it is difficult to conclude a true biological result. Patterns seen across plates can indicate an experimental error. For example, high relative expression was seen for all genes in SA1 on plate four at time point 9. Similar behaviour was observed at time points 3 and 5, although to a lesser extent.
Appendix C, Figure C.17, shows relative gene expression profiles as determined by the efficiency corrected relative quantification method. It is clear from Figures C.16 and C.17 that the Pfaffl method is not suitable for data with so much variability. To achieve success in RT-qPCR assays in future, it is advised to use multiple reference genes and to deploy the software that has been developed for this purpose, as reviewed by Ruijter et al. (2013). Both throughput and accuracy can be improved by using a 384-well PCR platform.
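The efficiency corrected method relies on amplification efficiencies estimated from dilution series (Section C.3.5). A minimal sketch of that estimate is given below, assuming the standard-curve approach in which Cq is regressed on log10 of the template concentration and the amplification factor is 10 raised to the power of (-1/slope). The six-point, ten-fold dilution series and its Cq values are hypothetical.

```python
def standard_curve_slope(log10_concentration, cq):
    """Least-squares slope of Cq against log10 template concentration."""
    n = len(cq)
    mean_x = sum(log10_concentration) / n
    mean_y = sum(cq) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_concentration, cq))
    sxx = sum((x - mean_x) ** 2 for x in log10_concentration)
    return sxy / sxx


def amplification_factor(slope):
    """Amplification per cycle; 2.0 corresponds to 100 % efficiency."""
    return 10 ** (-1.0 / slope)


# Hypothetical ten-fold dilution series with six points.
log10_concentration = [0, -1, -2, -3, -4, -5]
cq = [18.0 + 3.32 * k for k in range(6)]

slope = standard_curve_slope(log10_concentration, cq)
factor = amplification_factor(slope)
efficiency_percent = (factor - 1.0) * 100.0
```

The lowest concentrations of a real series are the least reproducible, as noted in Section C.3.5, so points at the dilute end may need to be excluded before fitting the slope.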
[Figure C.16: one panel per gene (PST130_02001, PST130_02403, PST130_05023, PST130_06503, PST130_07513, PST130_09275, PST130_12487, PST130_12491, PST130_12956); x-axis: Days Post Inoculation (0, 1, 2, 3, 5, 9, 12); y-axis: Log(10) of the Relative Expression of SA1 to SA4; legend: plate (1 to 9).]
Figure C.16: High inter-run variability in relative expression patterns is due to the sum of the effects of inter-assay variability and the variability between different biological replicates. It is difficult to distinguish between the two sources of variability. This highlights the need for a calibration method when experiments include more than one qPCR run that needs to be compared.
[Figure C.17: one panel per gene (same nine genes as in Figure C.16); x-axis: Days Post Inoculation (0, 1, 2, 3, 5, 9, 12); y-axis: Log(10) of the Relative Expression of SA1 to SA4.]
Figure C.17: The Pfaffl method of relative gene expression shows the relative gene expression of SA1 to SA4. A positive value indicates a higher expression in SA1, while a negative value indicates a higher expression in SA4. This method does not correct for inter-run variability and risks leading to false conclusions.
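The calibration step that these figures call for can be sketched under the assumption, following the proposal of Derveaux et al. (2010) cited in Section C.3.1, that an identical calibrator sample is present on every plate. The sketch shifts each plate so that its calibrator Cq matches the mean calibrator Cq across plates. The data structure, names and values are hypothetical, and this correction was not possible in the present study because the control sample was not identical between plates.

```python
def calibrate_plates(plates):
    """Remove plate-specific Cq offsets using a shared calibrator sample.

    plates maps a plate identifier to a dictionary with the calibrator Cq
    ("calibrator") and the sample Cq values ("samples") measured on it.
    """
    mean_calibrator = (
        sum(plate["calibrator"] for plate in plates.values()) / len(plates)
    )
    corrected = {}
    for plate_id, plate in plates.items():
        offset = plate["calibrator"] - mean_calibrator
        corrected[plate_id] = {
            name: cq - offset for name, cq in plate["samples"].items()
        }
    return corrected


# Hypothetical example: plate 2 reads one cycle later than plate 1.
plates = {
    "plate1": {"calibrator": 20.0, "samples": {"SA1": 24.0, "SA4": 26.0}},
    "plate2": {"calibrator": 21.0, "samples": {"SA1": 25.0, "SA4": 27.0}},
}
corrected = calibrate_plates(plates)
```

After the correction, the same sample measured on both plates has the same Cq, so the remaining differences between plates can be attributed to the biological replicates.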
Appendix D

Analysis of the Current Stripe Rust Threat in South Africa

[Figure D.1: one histogram panel per isolate (13/SAZP1, 14/SADL1 to 14/SADL6, 14/SATT1 to 14/SATT5, 14/SAZP2, 14/SAZP3, 15/SAZP1 to 15/SAZP12). X-axis: frequency (0.00 to 1.00). Y-axis: count.]

Figure D.1: Read frequency graphs from heterokaryotic SNP sites for the South African field isolates (analysed in Chapter 7) that were collected between 2013 and 2015. See Table 7.2 for further identification purposes.

[Figure D.2: one histogram panel per isolate (14/ET2 to 14/ET5, 14/K2, 14/K4 to 14/K16). X-axis: frequency (0.00 to 1.00). Y-axis: count.]

Figure D.2: Read frequency graphs from heterokaryotic SNP sites for the East African field isolates (analysed in Chapter 7) that were collected in 2014. See Table 7.2 for further identification purposes.
[Figure D.3: circular dendrogram; tips are labelled with isolate names.]

Figure D.3: Circular relative distance maximum likelihood phylogenetic tree. The tree describes the relative relationship between the isolates described in Figure 7.3, where branch lengths are ignored and only topology is considered. The East Africa (B) group of isolates, which is absent from Figure 7.4, is displayed in this dendrogram. The key in Figure 7.3 also applies here.

Table D.1: Differential testing of South African Pst isolates previously defined as pathotype 6E16A- on an extended set of wheat seedling testers

[Table D.1: columns are wheat variety, resistance, and infection type for two isolates with two replicates each (Rep1, Rep2). The cell values are not recoverable from the extracted text.]

Rep: replicate; susc: susceptible; ; = fleck; 1 = very small pustule; 2-4 = size of pustule; c = chlorosis; n = necrosis

Table D.2: Differential testing of South African Pst isolates previously defined as pathotype 6E22A+ on an extended set of wheat seedling testers

[Table D.2: columns are wheat variety, resistance, and infection type for two isolates with two replicates each (Rep1, Rep2). The cell values are not recoverable from the extracted text.]

Rep: replicate; susc: susceptible; ; = fleck; 1 = very small pustule; 2-4 = size of pustule; c = chlorosis; n = necrosis
mixed: more than one infection type among individual plants in inoculation replicate; single: only a single plant scored

Bibliography

AgriOrbit, 2017. Uncertainty over Western Cape wheat cultivation conditions. URL https://agriorbit.com/uncertainty-western-cape-wheat-cultivation-conditions/. [Online; accessed 20/01/2018].
Ali, S., Gladieux, P., Leconte, M., Gautier, A., Justesen, A. F., Hovmøller, M. S., Enjalbert, J., and de Vallavieille-Pope, C. 2014. Origin, migration routes and worldwide population genetic structure of the wheat yellow rust pathogen Puccinia striiformis f.sp. tritici. PLoS Pathogens, 10:e1003903.
Ali, S., Rodriguez-Algaba, J., Thach, T., Sørensen, C. K., Hansen, J. G., Lassen, P., Nazari, K., Hodson, D. P., Justesen, A. F., and Hovmøller, M. S. 2017. Yellow rust epidemics worldwide were caused by pathogen races from divergent genetic lineages. Frontiers in Plant Science, 8:1057.
Allison, O. C. and Isenbeck, K. 1930. Biological specialization of Puccinia glumarum tritici Eriksson and Henning. Phytopathologische Zeitschrift, 2.
Altschul, S.
F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25:3389–3402.
Ames, B. N. 1979. Identifying environmental chemicals causing mutations and cancer. Science, 204:587–593.
Anderson, P. K., Cunningham, A. A., Patel, N. G., Morales, F. J., Epstein, P. R., and Daszak, P. 2004. Emerging infectious diseases of plants: pathogen pollution, climate change and agrotechnology drivers. Trends in Ecology & Evolution, 19:535–544.
Andrews, S., 2010. FastQC A Quality Control tool for High Throughput Sequence Data. URL http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. [Online; accessed 20/01/2018].
Anikster, Y. 1984. The formae speciales. In Bushnell, W. R. and Roelfs, A. P., editors, The Cereal Rusts. Orlando.
Badebo, A., Stubbs, R. W., van Ginkel, M., and Gebeyehu, G. 1990. Identification of resistance genes to Puccinia striiformis in seedlings of Ethiopian and CIMMYT bread wheat varieties and lines. Netherlands Journal of Plant Pathology, 96:199–210.
Badebo, A., Assefa, S., and Fehrmann, H. 2008. Yellow rust resistance in advanced lines and commercial cultivars of bread wheat from Ethiopia. East African Journal of Sciences, 2:29–34.
Bates, D., Mächler, M., Bolker, B., and Walker, S. 2014. Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.
Beddow, J. M., Pardey, P. G., Chai, Y., Hurley, T. M., Kriticos, D. J., Braun, H.-J., Park, R. F., Cuddy, W. S., and Yonow, T. 2015. Research investment implications of shifts in the global geography of wheat stripe rust. Nature Plants, 1:15132.
Bienko, M., Green, C. M., Crosetto, N., Rudolf, F., Zapart, G., Coull, B., Kannouche, P., Wider, G., Peter, M., Lehmann, A. R., Hofmann, K., and Dikic, I. 2005. Ubiquitin-binding domains in Y-family polymerases regulate translesion synthesis. Science, 310:1821–1824.
Bockus, W. W.
and Wiese, M. V., editors. 2010. Compendium of wheat diseases and pests. St. Paul, Minn, 3rd edition.
Bofkin, L. and Goldman, N. 2006. Variation in evolutionary processes at different codon positions. Molecular Biology and Evolution, 24:513–521.
Bolton, M. D., Kolmer, J. A., and Garvin, D. F. 2008. Wheat leaf rust caused by Puccinia triticina. Molecular Plant Pathology, 9:563–575.
Boshoff, W. H. P. and Pretorius, Z. A. 1999. A new pathotype of Puccinia striiformis f. sp. tritici on wheat in South Africa. Plant Disease, 83:591–591.
Boshoff, W. H. P., Pretorius, Z. A., and Van Niekerk, B. D. 2002. Establishment, distribution, and pathogenicity of Puccinia striiformis f. sp. tritici in South Africa. Plant Disease, 86:485–492.
Boshoff, W. H. P., Pretorius, Z. A., and Van Niekerk, B. D. 2003. Fungicide efficacy and the impact of stripe rust on spring and winter wheat in South Africa. South African Journal of Plant and Soil, 20:11–17.
Bozkurt, T. O., Schornack, S., Banfield, M. J., and Kamoun, S. 2012. Oomycetes, effectors, and all that jazz. Current Opinion in Plant Biology, 15:483–492.
Brown, J. K. M. 2003. Little else but parasites. Science, 299:1680–1681.
Brown, J. K. and Hovmøller, M. S. 2002. Aerial dispersal of pathogens on the global and continental scales and its impact on plant disease. Science, 297:537–541.
Bubić, I., Wagner, M., Krmpotić, A., Saulig, T., Kim, S., Yokoyama, W. M., Jonjić, S., and Koszinowski, U. H. 2004. Gain of virulence caused by loss of a gene in murine cytomegalovirus. Journal of Virology, 78:7536–7544.
Bueno-Sancho, V., Persoons, A., Hubbard, A., Cabrera-Quio, L. E., Lewis, C. M., Corredor-Moreno, P., Bunting, D. C. E., Ali, S., Chng, S., Hodson, D. P., Madariaga Burrows, R., Bryson, R., Thomas, J., Holdgate, S., and Saunders, D. G. O. 2017. Pathogenomic analysis of wheat yellow rust lineages detects seasonal variation and host specificity. Genome Biology and Evolution, 9:3282–3296.
Burns, M. J., Nixon, G.
J., Foy, C. A., and Harris, N. 2005. Standardisation of data from real-time quantitative PCR methods–evaluation of outliers and comparison of calibration curves. BMC Biotechnology, 5:31.
Bustin, S. A., Benes, V., Garson, J. A., Hellemans, J., Huggett, J., Kubista, M., Mueller, R., Nolan, T., Pfaffl, M. W., Shipley, G. L., Vandesompele, J., and Wittwer, C. T. 2009. The MIQE Guidelines: Minimum information for publication of quantitative real-time PCR experiments. Clinical Chemistry, 55:611–622.
Bustin, S. A. and Nolan, T. 2004. Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction. Journal of Biomolecular Techniques: JBT, 15:155.
Büschges, R., Hollricher, K., Panstruga, R., Simons, G., Wolter, M., Frijters, A., Daelen, R. v., Lee, T. v. d., Diergaarde, P., Groenendijk, J., Töpsch, S., Vos, P., Salamini, F., and Schulze-Lefert, P. 1997. The barley Mlo gene: A novel control element of plant pathogen resistance. Cell, 88:695–705.
Cantu, D., Govindarajulu, M., Kozik, A., Wang, M., Chen, X., Kojima, K. K., Jurka, J., Michelmore, R. W., and Dubcovsky, J. 2011. Next generation sequencing provides rapid access to the genome of Puccinia striiformis f. sp. tritici, the causal agent of wheat stripe rust. PLoS ONE, 6:e24230.
Cantu, D., Segovia, V., MacLean, D., Bayles, R., Chen, X., Kamoun, S., Dubcovsky, J., Saunders, D. G., and Uauy, C. 2013. Genome analyses of the wheat yellow (stripe) rust pathogen Puccinia striiformis f. sp. tritici reveal polymorphic and haustorial expressed secreted proteins as candidate effectors. BMC Genomics, 14:270.
Castanera, R., López-Varas, L., Borgognone, A., LaButti, K., Lapidus, A., Schmutz, J., Grimwood, J., Pérez, G., Pisabarro, A. G., Grigoriev, I. V., Stajich, J. E., and Ramírez, L. 2016. Transposable elements versus the fungal genome: Impact on whole-genome architecture and transcriptional profiles. PLoS Genetics, 12:e1006108.
Chen, J., Upadhyaya, N.
M., Ortiz, D., Sperschneider, J., Li, F., Bouton, C., Breen, S., Dong, C., Xu, B., Zhang, X., Mago, R., Newell, K., Xia, X., Bernoux, M., Taylor, J. M., Steffenson, B., Jin, Y., Zhang, P., Kanyuka, K., Figueroa, M., Ellis, J. G., Park, R. F., and Dodds, P. N. 2017. Loss of AvrSr50 by somatic exchange in stem rust leads to virulence for Sr50 resistance in wheat. Science, 358:1607–1610.
Chen, W., Wellings, C., Chen, X., Kang, Z., and Liu, T. 2014. Wheat stripe (yellow) rust caused by Puccinia striiformis f. sp. tritici: Puccinia striiformis, yellow rust. Molecular Plant Pathology, 15:433–446.
Chen, X. M., Line, R. F., and Leung, H. 1993. Relationship between virulence variation and DNA polymorphism in Puccinia striiformis. Phytopathology, 83:1489–1497.
Chen, X., Penman, L., Wan, A., and Cheng, P. 2010. Virulence races of Puccinia striiformis f. sp. tritici in 2006 and 2007 and development of wheat stripe rust and distributions, dynamics, and evolutionary relationships of races from 2000 to 2007 in the United States. Canadian Journal of Plant Pathology, 32:315–333.
Chen, X. 2005. Epidemiology and control of stripe rust [Puccinia striiformis f. sp. tritici] on wheat. Canadian Journal of Plant Pathology, 27:314–337.
Chen, Y.-E., Cui, J.-M., Su, Y.-Q., Yuan, S., Yuan, M., and Zhang, H.-Y. 2015. Influence of stripe rust infection on the photosynthetic characteristics and antioxidant system of susceptible and resistant wheat cultivars at the adult plant stage. Frontiers in Plant Science, 6:779.
Cheng, P., Ma, Z., Wang, X., Wang, C., Li, Y., Wang, S., and Wang, H. 2014. Impact of UV-B radiation on aspects of germination and epidemiological components of three major physiological races of Puccinia striiformis f. sp. tritici. Crop Protection, 65:6–14.
Cingolani, P., Platts, A., Wang, L. L., Coon, M., Nguyen, T., Wang, L., Land, S. J., Lu, X., and Ruden, D. M. 2012.
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly, 6:80–92.
Coram, T. E., Wang, M., and Chen, X. 2008. Transcriptome analysis of the wheat—Puccinia striiformis f. sp. tritici interaction. Molecular Plant Pathology, 9:157–169.
Cuomo, C. A., Bakkeren, G., Khalil, H. B., Panwar, V., Joly, D., Linning, R., Sakthikumar, S., Song, X., Adiconis, X., and Fan, L. 2017. Comparative analysis highlights variable genome content of wheat rusts and divergence of the mating loci. G3: Genes, Genomes, Genetics, 7:361–376.
DAFF, 2015. A profile of the South African wheat market value chain. URL http://www.nda.agric.za/doaDev/sideMenu/Marketing/AnnualPublications/CommodityProfiles/fieldcrops/WheatMarketValueChainProfile2015.pdf. [Online; accessed 20/01/2018].
DAFF, 2016. A profile of the South African wheat market value chain. URL http://www.nda.agric.za/doaDev/sideMenu/Marketing/AnnualPublications/CommodityProfiles/fieldcrops/WheatMarketValueChainProfile2016.pdf. [Online; accessed 20/01/2018].
Dangl, J. L. and Jones, J. D. 2001. Plant pathogens and integrated defence responses to infection. Nature, 411:826.
Davey, J. W., Hohenlohe, P. A., Etter, P. D., Boone, J. Q., Catchen, J. M., and Blaxter, M. L. 2011. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics, 12:499–510.
de Vallavieille-Pope, C., Huber, L., Leconte, M., and Bethenod, O. 2002. Preinoculation effects of light quantity on infection efficiency of Puccinia striiformis and P. triticina on wheat seedlings. Phytopathology, 92:1308–1314.
de Vallavieille-Pope, C., Ali, S., Leconte, M., Enjalbert, J., Delos, M., and Rouzet, J. 2012. Virulence dynamics and regional structuring of Puccinia striiformis f. sp. tritici in France between 1984 and 2009. Plant Disease, 96:131–140.
Dean, R., Van Kan, J. A. L., Pretorius, Z. A., Hammond-Kosack, K. E., Di Pietro, A., Spanu, P. D., Rudd, J.
J., Dickman, M., Kahmann, R., Ellis, J., and Foster, G. D. 2012. The top 10 fungal pathogens in molecular plant pathology. Molecular Plant Pathology, 13:414–430.
Denbel, W. 2014. Epidemics of Puccinia striiformis f. sp. tritici in Arsi and West Arsi zones of Ethiopia in 2010 and identification of effective resistance genes. Journal of Natural Sciences Research, 4:33–39.
Derveaux, S., Vandesompele, J., and Hellemans, J. 2010. How to do successful gene expression analysis using real-time PCR. Methods, 50:227–230.
Dobon, A., Bunting, D. C. E., Cabrera-Quio, L. E., Uauy, C., and Saunders, D. G. O. 2016. The host-pathogen interaction between wheat and yellow rust induces temporally coordinated waves of gene expression. BMC Genomics, 17:380.
Dodds, P. N. and Rathjen, J. P. 2010. Plant immunity: towards an integrated view of plant–pathogen interactions. Nature Reviews Genetics, 11:539.
Dodds, P. N., Lawrence, G. J., Catanzariti, A.-M., Teh, T., Wang, C.-I. A., Ayliffe, M. A., Kobe, B., and Ellis, J. G. 2006. Direct protein interaction underlies gene-for-gene specificity and coevolution of the flax resistance genes and flax rust avirulence genes. Proceedings of the National Academy of Sciences of the United States of America, 103:8888–8893.
Dodds, P. N., Rafiqi, M., Gan, P. H. P., Hardham, A. R., Jones, D. A., and Ellis, J. G. 2009. Effectors of biotrophic fungi and oomycetes: pathogenicity factors and triggers of host resistance. New Phytologist, 183:993–1000.
Dong, S., Raffaele, S., and Kamoun, S. 2015. The two-speed genomes of filamentous pathogens: waltz with plants. Current Opinion in Genetics & Development, 35:57–65.
Dou, D. and Zhou, J.-M. 2012. Phytopathogen effectors subverting host immunity: different foes, similar battleground. Cell Host & Microbe, 12:484–495.
Drake, J. W., Charlesworth, B., Charlesworth, D., and Crow, J. F. 1998. Rates of spontaneous mutation. Genetics, 148:1667–1686.
Du Plessis, A. 1933.
The history of small-grains culture in South Africa. Annals of the University of Stellenbosch, 8:1652–1752.
Duan, X., Tellier, A., Wan, A., Leconte, M., Vallavieille-Pope, C. d., and Enjalbert, J. 2010. Puccinia striiformis f.sp. tritici presents high diversity and recombination in the over-summering zone of Gansu, China. Mycologia, 102:44–53.
Duplessis, S., Cuomo, C. A., Lin, Y.-C., Aerts, A., Tisserant, E., Veneault-Fourrey, C., Joly, D. L., Hacquard, S., Amselem, J., Cantarel, B. L., Chiu, R., Coutinho, P. M., Feau, N., Field, M., Frey, P., Gelhaye, E., Goldberg, J., Grabherr, M. G., Kodira, C. D., Kohler, A., Kües, U., Lindquist, E. A., Lucas, S. M., Mago, R., Mauceli, E., Morin, E., Murat, C., Pangilinan, J. L., Park, R., Pearson, M., Quesneville, H., Rouhier, N., Sakthikumar, S., Salamov, A. A., Schmutz, J., Selles, B., Shapiro, H., Tanguay, P., Tuskan, G. A., Henrissat, B., Peer, Y. V. d., Rouzé, P., Ellis, J. G., Dodds, P. N., Schein, J. E., Zhong, S., Hamelin, R. C., Grigoriev, I. V., Szabo, L. J., and Martin, F. 2011. Obligate biotrophy features unraveled by the genomic analysis of rust fungi. Proceedings of the National Academy of Sciences, 108:9166–9171.
Edgerton, M. D. 2009. Increasing crop productivity to meet global needs for feed, food, and fuel. Plant Physiology, 149:7–13.
Egorov, T. A., Odintsova, T. I., Pukhalsky, V. A., and Grishin, E. V. 2005. Diversity of wheat anti-microbial peptides. Peptides, 26:2064–2073.
El Gueddari, N. E., Rauchhaus, U., Moerschbacher, B. M., and Deising, H. B. 2002. Developmentally regulated conversion of surface-exposed chitin to chitosan in cell walls of plant pathogenic fungi. New Phytologist, 156:103–112.
Elyasi-Gomari, S. and Petrenkova, V. P. 2011. Virulence of Puccinia striiformis f. sp. tritici in Khuzestan province of Iran. American Journal of Experimental Agriculture, 1:281.
Emanuelsson, O., Brunak, S., Heijne, G. v., and Nielsen, H. 2007. Locating proteins in the cell using TargetP, SignalP and related tools.
Nature Protocols, 2:953.
Enjalbert, J., Duan, X., Leconte, M., Hovmøller, M. S., and De Vallavieille-Pope, C. 2005. Genetic evidence of local adaptation of wheat yellow rust (Puccinia striiformis f. sp. tritici) within France: Geographic structure of yellow rust in France. Molecular Ecology, 14:2065–2073.
Evanno, G., Regnaut, S., and Goudet, J. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14:2611–2620.
FAS USDA, 2016. Grain and feed annual report—Republic of South Africa. URL https://gain.fas.usda.gov/Recent%20GAIN%20Publications/Grain%20and%20Feed%20Annual_Pretoria_South%20Africa%20-%20Republic%20of_3-24-2016.pdf. [Online; accessed 20/01/2018].
FAS USDA, 2017. United States Department of Agriculture—Foreign Agricultural Service: Production, supply and distribution report. URL https://apps.fas.usda.gov/psdonline/app/index.html#/app/home/statsByCountry. [Online; accessed 20/01/2018].
Fernández-Ortuño, D., Torés, J. A., Vicente, A. d., and Pérez-García, A. 2007. Multiple displacement amplification, a powerful tool for molecular genetic analysis of powdery mildew fungi. Current Genetics, 51:209–219.
Fitzmaurice, G., Davidian, M., Verbeke, G., and Molenberghs, G. 2008. Longitudinal data analysis.
Fleige, S. and Pfaffl, M. W. 2006. RNA integrity and the effect on the real-time qRT-PCR performance. Molecular Aspects of Medicine, 27:126–139.
Flood, J. 2010. The importance of plant health to food security. Food Security, 2:215–231.
Flor, H. 1956. The complementary genic systems in flax and flax rust. Advances in Genetics, 8:29–54.
Franceschetti, M., Maqbool, A., Jiménez-Dalmaroni, M. J., Pennington, H. G., Kamoun, S., and Banfield, M. J. 2017. Effectors of filamentous plant pathogens: Commonalities amid diversity. Microbiology and Molecular Biology Reviews, 81:e00066–16.
Garnica, D. P., Nemri, A., Upadhyaya, N. M., Rathjen, J. P., and Dodds, P. N. 2014.
The ins and outs of rust haustoria. PLoS Pathogens, 10:e1004329.
Gilroy, E. M., Breen, S., Whisson, S. C., Squires, J., Hein, I., Kaczmarek, M., Turnbull, D., Boevink, P. C., Lokossou, A., Cano, L. M., Morales, J., Avrova, A. O., Pritchard, L., Randall, E., Lees, A., Govers, F., van West, P., Kamoun, S., Vleeshouwers, V. G. A. A., Cooke, D. E. L., and Birch, P. R. J. 2011. Presence/absence, differential expression and sequence polymorphisms between PiAVR2 and PiAVR2-like in Phytophthora infestans determine virulence on R2 plants. New Phytologist, 191:763–776.
Glen, H. F. 2002. Cultivated plants of Southern Africa: Botanical names, common names, origins, literature.
Godfrey, D., Böhlenius, H., Pedersen, C., Zhang, Z., Emmersen, J., and Thordal-Christensen, H. 2010. Powdery mildew fungal effector candidates share N-terminal Y/F/WxC-motif. BMC Genomics, 11:317.
GRAIN SA, 2017. CEC Wheat per province: Production Info—Area Grown, Yields and Estimates. URL http://www.grainsa.co.za/report-documents?cat=14. [Online; accessed 20/01/2018].
Griffiths, A. J. F., Wessler, S. R., Carroll, S. B., and Doebley, J. F. 2015. Introduction to Genetic Analysis. New York, NY, eleventh edition.
Grubbs, F. E. 1969. Procedures for detecting outlying observations in samples. Technometrics, 11:1–21.
Grützmann, K., Szafranski, K., Pohl, M., Voigt, K., Petzold, A., and Schuster, S. 2014. Fungal alternative splicing is associated with multicellular complexity and virulence: a genome-wide multi-species study. DNA Research, 21:27–39.
Hacquard, S., Petre, B., Frey, P., Hecker, A., Rouhier, N., and Duplessis, S. 2011. The Poplar-Poplar rust interaction: Insights from genomics and transcriptomics. Journal of Pathogens, pages 1–11.
Hane, J. K. and Oliver, R. P. 2010. In silico reversal of repeat-induced point mutation (RIP) identifies the origins of repeat families and uncovers obscured duplicated genes. BMC Genomics, 11:655.
Harris, M. O., Friesen, T. L., Xu, S.
S., Chen, M. S., Giron, D., and Stuart, J. J. 2015. Pivoting from Arabidopsis to wheat to understand how agricultural plants integrate responses to biotic stress. Journal of Experimental Botany, 66:513–531.
Hartl, D. L. and Clark, A. G. 1998. Principles of population genetics.
Hawksworth, D., Kirk, P., Sutton, B., and Pegler, D., editors. 1995. Ainsworth & Bisby’s Dictionary of the Fungi. 8th edition.
Henikoff, S., Till, B. J., and Comai, L. 2004. TILLING. Traditional mutagenesis meets functional genomics. Plant Physiology, 135:630–636.
Higuchi, R., Fockler, C., Dollinger, G., and Watson, R. 1993. Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Biotechnology, 11:1026–1030.
Hogenhout, S. A., Van der Hoorn, R. A., Terauchi, R., and Kamoun, S. 2009. Emerging concepts in effector biology of plant-associated organisms. Molecular Plant-Microbe Interactions, 22:115–122.
Holland, N. T., Smith, M. T., Eskenazi, B., and Bastaki, M. 2003. Biological sample collection and processing for molecular epidemiological studies. Mutation Research/Reviews in Mutation Research, 543:217–234.
Hovmøller, M. S. and Justesen, A. F. 2007a. Rates of evolution of avirulence phenotypes and DNA markers in a northwest European population of Puccinia striiformis f. sp. tritici: Clonal evolution of virulence. Molecular Ecology, 16:4637–4647.
Hovmøller, M. S., Justesen, A. F., and Brown, J. K. M. 2002. Clonality and long-distance migration of Puccinia striiformis f.sp. tritici in north-west Europe. Plant Pathology, 51:24–32.
Hovmøller, M. S., Yahyaoui, A. H., Milus, E. A., and Justesen, A. F. 2008. Rapid global spread of two aggressive strains of a wheat rust fungus. Molecular Ecology, 17:3818–3826.
Hovmøller, M. S., Walter, S., and Justesen, A. F. 2010. Escalating threat of wheat rusts. Science, 329:369–369.
Hovmøller, M. S., Walter, S., Bayles, R.
A., Hubbard, A., Flath, K., Sommerfeldt, N., Leconte, M., Czembor, P., Rodriguez-Algaba, J., Thach, T., Hansen, J. G., Lassen, P., Justesen, A. F., Ali, S., and de Vallavieille-Pope, C. 2016. Replacement of the European wheat yellow rust population by new races from the centre of diversity in the near-Himalayan region. Plant Pathology, 65:402–411.
Hovmøller, M. S. and Justesen, A. F. 2007b. Appearance of atypical Puccinia striiformis f. sp. tritici phenotypes in north-western Europe. Australian Journal of Agricultural Research, 58:518–524.
Huang, X., Feng, H., and Kang, Z. 2012. Selection of reference genes for quantitative real-time PCR normalization in Puccinia striiformis f. sp. tritici. Journal of Agricultural Biotechnology, 20:181–187.
Hubbard, A., Pritchard, L., E, C., and S, H., 2014. United Kingdom Cereal Pathogen Virulence Survey. Annual Report. URL https://cereals.ahdb.org.uk/media/1131354/Annual-Report-UKCPVS-2014.pdf. [Online; accessed 20/01/2018].
Hubbard, A., Lewis, C. M., Yoshida, K., Ramirez-Gonzalez, R. H., de Vallavieille-Pope, C., Thomas, J., Kamoun, S., Bayles, R., Uauy, C., and Saunders, D. 2015. Field pathogenomics reveals the emergence of a diverse wheat yellow rust population. Genome Biology, 16:23.
Huerta-Espino, J., Singh, R. P., Germán, S., McCallum, B. D., Park, R. F., Chen, W. Q., Bhardwaj, S. C., and Goyeau, H. 2011. Global status of wheat leaf rust caused by Puccinia triticina. Euphytica, 179:143–160.
Huggett, J. F., Foy, C. A., Benes, V., Emslie, K., Garson, J. A., Haynes, R., Hellemans, J., Kubista, M., Mueller, R. D., and Nolan, T. 2013. The digital MIQE guidelines: minimum information for publication of quantitative digital PCR experiments. Clinical Chemistry, 59:892–902.
Hussein, S. and Pretorius, Z. A. 2005. Leaf and stripe rust resistance among Ethiopian grown wheat varieties and lines. SINET: Ethiopian Journal of Science, 28:23–32.
IndexMundi, 2017. South Africa wheat imports by year.
URL https://www.indexmundi.com/agriculture/?country=za&commodity=wheat&graph=imports. [Online; accessed 20/01/2018].
ITA USDC, 2017. South Africa - agricultural sector. URL https://www.export.gov/article?id=South-Africa-agricultural-equipment. [Online; accessed 20/01/2018].
Jain, M., Nijhawan, A., Tyagi, A. K., and Khurana, J. P. 2006. Validation of housekeeping genes as internal control for studying gene expression in rice by quantitative real-time PCR. Biochemical and Biophysical Research Communications, 345:646–651.
Jia, F., Lo, N., and Ho, S. Y. W. 2014. The impact of modelling rate heterogeneity among sites on phylogenetic estimates of intraspecific evolutionary rates and timescales. PLoS ONE, 9:e95722.
Jiao, M., Tan, C., Wang, L., Guo, J., Zhang, H., Kang, Z., and Guo, J. 2017. Basidiospores of Puccinia striiformis f. sp. tritici succeed to infect barberry, while urediniospores are blocked by non-host resistance. Protoplasma, 254:2237–2246.
Jin, Y. 2011. Role of Berberis spp. as alternate hosts in generating new races of Puccinia graminis and P. striiformis. Euphytica, 179:105–108.
Jin, Y., Szabo, L. J., and Carson, M. 2010. Century-old mystery of Puccinia striiformis life history solved with the identification of Berberis as an alternate host. Phytopathology, 100:432–435.
Johnson, R. 1978. Induced resistance to fungal diseases with special reference to yellow rust of wheat. Annals of Applied Biology, 89:107–110.
Johnson, R., Stubbs, R., Fuchs, E., and Chamberlain, N. 1972. Nomenclature for physiologic races of Puccinia striiformis infecting wheat. Transactions of the British Mycological Society, 58:475–480.
Joly, D. L., Feau, N., Tanguay, P., and Hamelin, R. C. 2010. Comparative analysis of secreted protein evolution using expressed sequence tags from four poplar leaf rusts (Melampsora spp.). BMC Genomics, 11:422.
Jombart, T., Devillard, S., and Balloux, F. 2010.
Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genetics, 11:94.
Jones, J. D. and Dangl, J. L. 2006. The plant immune system. Nature, 444:323–329.
Justesen, A. F., Ridout, C. J., and Hovmøller, M. S. 2002. The recent history of Puccinia striiformis f.sp. tritici in Denmark as revealed by disease incidence and AFLP markers. Plant Pathology, 51:13–23.
Kamoun, S. 2007. Groovy times: filamentous pathogen effectors revealed. Current Opinion in Plant Biology, 10:358–365.
Kang, Z. 2017. Stripe rust. New York, NY.
Karlen, Y., McNair, A., Perseguers, S., Mazza, C., and Mermod, N. 2007. Statistical significance of quantitative PCR. BMC Bioinformatics, 8:131.
Keet, J.-H., 2015. The invasion potential of selected Berberis species in South Africa. PhD thesis, University of the Free State.
Kim, D., Alptekin, B., and Budak, H. 2018. CRISPR/Cas9 genome editing in wheat. Functional & Integrative Genomics, 18:31–41.
Kimura, M. and Ohta, T. 1969. The average number of generations until fixation of a mutant gene in a finite population. Genetics, 61:763.
Kiran, K., Rawal, H. C., Dubey, H., Jaswal, R., Devanna, B., Gupta, D. K., Bhardwaj, S. C., Prasad, P., Pal, D., Chhuneja, P., Balasubramanian, P., Kumar, J., Swami, M., Solanke, A. U., Gaikwad, K., Singh, N. K., and Sharma, T. R. 2016. Draft genome of the wheat rust pathogen (Puccinia triticina) unravels genome-wide structural variations during evolution. Genome Biology and Evolution, 8:2702–2721.
Kiran, K., Rawal, H. C., Dubey, H., Jaswal, R., Bhardwaj, S. C., Prasad, P., Pal, D., Devanna, B. N., and Sharma, T. R. 2017. Dissection of genomic features and variations of three pathotypes of Puccinia striiformis through whole genome sequencing. Scientific Reports, 7:42419.
Kirk, P., Cannon, P., Minter, D., and Stalpers, J., editors. 2008. Dictionary of the Fungi. 10th edition.
Klug, W. S., editor. 2012. Concepts of Genetics. San Francisco, 10th edition.
Knott, D. 1989. Introduction. The Wheat Rusts-Breeding for Resistance (Monograph on Theoretical and Applied Genetics, volume 12. Kolmer, J. A. 2005. Tracking wheat rust on a continental scale. Current Opinion in Plant Biology, 8:441–449. BIBLIOGRAPHY 265 Kubista, M., Andrade, J. M., Bengtsson, M., Forootan, A., Jonák, J., Lind, K., Sindelka, R., Sjöback, R., Sjögreen, B., Strömbom, L., Ståhlberg, A., and Zoric, N. 2006. The real-time polymerase chain reaction. Molecular Aspects of Medicine, 27:95–125. Langmead, B., Trapnell, C., Pop, M., and Salzberg, S. L. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10:R25. Lee, W.-S., Hammond-Kosack, K. E., and Kanyuka, K. 2012. Barley stripe mosaic virus-mediated tools for investigating gene function in cereal plants and their pathogens: virus-induced gene silencing, host-mediated gene silencing, and virus-mediated overexpression of heterologous protein. Plant Physiology, 160: 582–590. Lei, Y., Wang, M., Wan, A., Xia, C., See, D. R., Zhang, M., and Chen, X. 2017. Viru- lence and molecular characterization of experimental isolates of the stripe rust pathogen (Puccinia striiformis) indicate somatic recombination. Phytopathology, 107:329–344. Leonard, K. J. and Szabo, L. J. 2005. Stem rust of small grains and grasses caused by Puccinia graminis. Molecular Plant Pathology, 6:99–111. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25: 2078–2079. Li, H. and Durbin, R. 2009. Fast and accurate short read alignment with Burrows- Wheeler transform. Bioinformatics, 25. Li, W. H., Wu, C. I., and Luo, C. C. 1985. A new method for estimating syn- onymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. 
Molecular Biology and Evolution, 2:150–174. Librado, P. and Rozas, J. 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 25:1451–1452. Ling, D., Pike, C. J., and Salvaterra, P. M. 2012. Deconvolution of the confounding variations for reverse transcription quantitative real-time polymerase chain reaction by separate analysis of biological replicate data. Analytical Biochemistry, 427:21–25. Ling, P., Wang, M., Chen, X., and Campbell, K. 2007. Construction and char- acterization of a full-length cDNA library for the wheat stripe rust pathogen (Puccinia striiformis f. sp. tritici). BMC Genomics, 8:145. BIBLIOGRAPHY 266 Little, R. and Manners, J. G. 1969. Somatic recombination in yellow rust of wheat (Puccinia striiformis): II. Germ tube fusions, nuclear number and nuclear size. Transactions of the British Mycological Society, 53:251–258. Liu, C., Pedersen, C., Schultz-Larsen, T., Aguilar, G. B., Madriz-Ordeñana, K., Hovmøller, M. S., and Thordal-Christensen, H. 2016. The stripe rust fun- gal effector PEC6 suppresses pattern-triggered immunity in a host species- independent manner and interacts with adenosine kinases. New Phytologist, pages 1–13. Livak, K. J. and Schmittgen, T. D. 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2−∆∆CT method. Methods, 25:402–408. Lorrain, C., Hecker, A., and Duplessis, S. 2015. Effector-mining in the poplar rust fungus Melampsora larici-populina secretome. Frontiers in Plant Science, 6:1051. Lowe, I., Cantu, D., and Dubcovsky, J. 2011. Durable resistance to the wheat rusts: Integrating systems biology and traditional phenotype-based research methods to guide the deployment of resistance genes. Euphytica, 179:69–79. Ma, J., Huang, X., Wang, X., Chen, X., Qu, Z., Huang, L., and Kang, Z. 2009. Identification of expressed genes during compatible interaction between stripe rust (Puccinia striiformis) and wheat using a cDNA library. BMC Genomics, 10: 586. 
Maddison, A. C. and Manners, J. G. 1972. Sunlight and viability of cereal rust uredospores. Transactions of the British Mycological Society, 59:429–443. Malinovsky, F. G., Fangel, J. U., and Willats, W. G. 2014. The role of the cell wall in plant immunity. Frontiers in Plant Science, 5:178. Mallard, S., Gaudet, D., Aldeia, A., Abelard, C., Besnard, A. L., Sourdille, P., and Dedryver, F. 2005. Genetic analysis of durable resistance to yellow rust in bread wheat. Theoretical and Applied Genetics, 110:1401–1409. Mandiyan, V., Andreev, J., Schlessinger, J., and Hubbard, S. R. 1999. Crystal structure of the ARF-GAP domain and ankyrin repeats of PYK2-associated protein β. The EMBO Journal, 18:6890–6898. Mao, F., Leung, W.-Y., and Xin, X. 2007. Characterization of EvaGreen and the implication of its physicochemical properties for qPCR applications. BMC Biotechnology, 7:76. Markell, S. and Milus, E. 2008. Emergence of a novel population of Puccinia striiformis f. sp. tritici in eastern United States. Phytopathology, 98:632–639. Mboup, M., Leconte, M., Gautier, A., Wan, A., Chen, W., de Vallavieille-Pope, C., and Enjalbert, J. 2009. Evidence of genetic recombination in wheat yellow rust populations of a Chinese oversummering area. Fungal Genetics and Biology, 46: 299–307. BIBLIOGRAPHY 267 McDonald, B. A. 2004. Population genetics of plant pathogens. The Plant Health Instructor. URL http://www.apsnet.org/edcenter/advanced/topics/ PopGenetics/Pages/default.aspx. [Online; accessed 20/01/2018]. McDonald, B. A. and Linde, C. 2002. Pathogen population genetics, evolutionary potential, and durable resistance. Annual Review of Phytopathology, 40:349–379. McDonald, J. H. and Kreitman, M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature, 351:652. McIntosh, R. A. A catalogue of gene symbols for wheat. In Proceedings of the 6th International Wheat Genetics Symposium, Kyoto, Japan, 1983. McIntosh, R. A., Wellings, C. R., and Park, R. F. 1995. 
Wheat rusts: an atlas of resistance genes. Melbourne. Mehta, D., Menke, A., and Binder, E. B. 2010. Gene expression studies in major depression. Current Psychiatry Reports, 12:135–144. Mendgen, K., Struck, C., Voegele, R. T., and Hahn, M. 2000. Biotrophy and rust haustoria. Physiological and Molecular Plant Pathology, 56:141–145. Milus, E., Seyran, E., and McNew, R. 2006. Aggressiveness of Puccinia striiformis f. sp. tritici isolates in the south-central United States. Plant Disease, 90:847–852. Milus, E. A., Kristensen, K., and Hovmøller, M. S. 2009. Evidence for increased aggressiveness in a recent widespread strain of Puccinia striiformis f. sp. tritici causing stripe rust of wheat. Phytopathology, 99:89–94. Miyata, T. and Yasunaga, T. 1980. Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. Journal of Molecular Evolution, 16:23–36. Moldenhauer, J., Moerschbacher, B. M., and van der Westhuizen, A. J. 2006. Histo- logical investigation of stripe rust (Puccinia striiformis f.sp. tritici) development in resistant and susceptible wheat cultivars. Plant Pathology, 55:469–474. Murphy, C. L. and Polak, J. M. 2002. Differentiating embryonic stem cells: GAPDH, but neither HPRT nor β-tubulin is suitable as an internal standard for measuring RNA levels. Tissue Engineering, 8:551–559. Naccache, S. N., Federman, S., Veeraraghavan, N., Zaharia, M., Lee, D., Samayoa, E., Bouquet, J., Greninger, A. L., Luk, K.-C., Enge, B., Wadford, D. A., Messenger, S. L., Genrich, G. L., Pellegrino, K., Grard, G., Leroy, E., Schneider, B. S., Fair, J. N., Martinez, M. A., Isa, P., Crump, J. A., DeRisi, J. L., Sittler, T., Hackett, J., Miller, S., and Chiu, C. Y. 2014. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Research, 24:1180–1192. BIBLIOGRAPHY 268 Nei, M. and Gojobori, T. 
1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution, 3:418–426. Niks, R. E. 1989. Morphology of infection structures of Puccinia striiformis var. dactylidis. European Journal of Plant Pathology, 95:171–175. Oerke, E.-C. and Dehne, H.-W. 2004. Safeguarding production—losses in major crops and the role of crop protection. Crop Protection, 23:275–285. Olsen, O., Wang, X., and von Wettstein, D. 1993. Sodium azide mutagenesis: pref- erential generation of AT–> GC transitions in the barley Ant18 gene. Proceedings of the National Academy of Sciences, 90:8043–8047. Panstruga, R. and Dodds, P. N. 2009. Terrific protein traffic: The mystery of effector protein delivery by filamentous plant pathogens. Science, 324:748–750. Panwar, V. and Bakkeren, G. 2017. Investigating gene function in cereal rust fungi by plant-mediated virus-induced gene silencing. In Wheat Rust Diseases, volume 1659, pages 115–124. New York, NY. Parker, I. M. and Gilbert, G. S. 2004. The evolutionary ecology of novel plant- pathogen interactions. Annual Review of Ecology, Evolution, and Systematics, 35: 675–700. Parlevliet, J. E. 2002. Durability of resistance against fungal, bacterial and viral pathogens; present situation. Euphytica, 124:147–156. Persoons, A., Morin, E., Delaruelle, C., Payen, T., Halkett, F., Frey, P., De Mita, S., and Duplessis, S. 2014. Patterns of genomic variation in the poplar rust fungus Melampsora larici-populina identify pathogenesis-related factors. Frontiers in Plant Science, 5:450. Petre, B., Saunders, D. G. O., Sklenar, J., Lorrain, C., Win, J., Duplessis, S., and Kamoun, S. 2015. Candidate effector proteins of the rust pathogen Melampsora larici-populina target diverse plant cell compartments. Molecular Plant-Microbe Interactions, 28:689–700. Petre, B., Lorrain, C., Saunders, D. G., Win, J., Sklenar, J., Duplessis, S., and Kamoun, S. 2016a. 
Rust fungal effectors mimic host transit peptides to translo- cate into chloroplasts: Effectors use molecular mimicry to target chloroplasts. Cellular Microbiology, 18:453–465. Petre, B., Saunders, D. G. O., Sklenar, J., Lorrain, C., Krasileva, K. V., Win, J., Duplessis, S., and Kamoun, S. 2016b. Heterologous expression screens in Nicotiana benthamiana identify a candidate effector of the wheat yellow rust pathogen that associates with processing bodies. PLoS ONE, 11:e0149035. Pfaffl, M. W. 2001. A new mathematical model for relative quantification in real-time RT–PCR. Nucleic Acids Research, 29:e45–e45. BIBLIOGRAPHY 269 Pfeifer, G., You, Y., and Besaratinia, A. 2005. Mutations induced by ultraviolet light. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 571:19–31. Pretorius, Z. A., Boshoff, W. H. P., and Kema, G. H. J. 1997. First report of Puccinia striiformis f. sp. tritici on wheat in South Africa. Plant Disease, 81:424–424. Pretorius, Z. A., Pakendorf, K. W., Marais, G. F., Prins, R., and Komen, J. S. 2007. Challenges for sustainable cereal rust control in South Africa. Australian Journal of Agricultural Research, 58:593. Pretorius, Z., Bender, C., and Visser, B. 2015. The rusts of wild rye in South Africa. South African Journal of Botany, 96:94–98. Prins, R. and Agenbag, G., 2013. The establishment of a molecular service labora- tory for wheat breeding in south africa. Poster presentation: 12th International Wheat Genetics Symposium, Yokohama, Japan. Prins, R., Pretorius, Z. A., Bender, C. M., and Lehmensiek, A. 2011. QTL mapping of stripe, leaf and stem rust resistance genes in a Kariega × Avocet S doubled haploid wheat population. Molecular Breeding, 27:259–270. Pritchard, J. K., Stephens, M., and Donnelly, P. 2000. Inference of population structure using multilocus genotype data. Genetics, 155:945–959. Pryce-Jones, E., Carver, T. I. M., and Gurr, S. J. 1999. 
The roles of cellulase enzymes and mechanical force in host penetration by Erysiphe graminis f. sp. hordei. Physiological and Molecular Plant Pathology, 55:175–182. Quinlan, A. R. and Hall, I. M. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26:841–842. Rambaut, A. and Grass, N. C. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics, 13:235–238. Ramburan, V. P., Pretorius, Z. A., Louw, J. H., Boyd, L. A., Smith, P. H., Boshoff, W. H. P., and Prins, R. 2004. A genetic analysis of adult plant resistance to stripe rust in the wheat cultivar Kariega. Theoretical and Applied Genetics, 108: 1426–1433. Rao, H. S. and Sears, E. 1964. Chemical mutagenesis in Triticum aestivum. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 1:387–399. Rapilly, F. 1979. Yellow rust epidemiology. Annual Review of Phytopathology, 17: 59–73. Ray, D. K., Mueller, N. D., West, P. C., and Foley, J. A. 2013. Yield trends are insufficient to double global crop production by 2050. PLoS ONE, 8:e66428. BIBLIOGRAPHY 270 Rodriguez-Algaba, J., Walter, S., Sørensen, C. K., Hovmøller, M. S., and Justesen, A. F. 2014. Sexual structures and recombination of the wheat rust fungus Puccinia striiformis on Berberis vulgaris. Fungal Genetics and Biology, 70:77–85. Roelfs, A. P., Singh, R. P., and Saari, E. E. 1992. Rust Diseases of Wheat: Concepts and Methods of Disease Management. Roelfs, A. P. and Hettel, G. 1992. Rust diseases of wheat. Rousset, F. 2008. GENEPOP’007: a complete re-implementation of the GENEPOP software for Windows and Linux. Molecular Ecology Resources, 8:103–106. Rovenich, H., Boshoven, J. C., and Thomma, B. P. 2014. Filamentous pathogen effector functions: of pathogens, hosts and microbiomes. Current Opinion in Plant Biology, 20:96–103. Ruijter, J. M., Pfaffl, M. W., Zhao, S., Spiess, A. N., Boggy, G., Blom, J., Rutledge, R. 
G., Sisti, D., Lievens, A., and De Preter, K. 2013. Evaluation of qPCR curve analysis methods for reliable biomarker discovery: Bias, resolution, precision, and implications. Methods, 59:32–46. Rutledge, R. G. and Cote, C. 2003. Mathematics of quantitative kinetic PCR and the application of standard curves. Nucleic Acids Research, 31:e93–e93. SAGL, 2012. The Southern African Grain Laboratory NPC: South African Winter Cereal Production. URL http://www.sagl.co.za/Portals/0/Wheat%20crop% 202011%202012/Average%20yield%20per%20province.pdf. [Online; accessed 20/01/2018]. Salcedo, A., Rutter, W., Wang, S., Akhunova, A., Bolus, S., Chao, S., Anderson, N., Soto, M. F. D., Rouse, M., Szabo, L., Bowden, R. L., Dubcovsky, J., and Akhunov, E. 2017. Variation in the AvrSr35 gene determines Sr35 resistance against wheat stem rust race Ug99. Science, 358:1604–1606. Salemi, M., Vandamme, A.-M., and Lemey, P. 2009. The phylogenetic handbook: a practical approach to phylogenetic analysis and hypothesis testing. Saunders, D. G. O., Win, J., Cano, L. M., Szabo, L. J., Kamoun, S., and Raffaele, S. 2012. Using hierarchical clustering of secreted protein families to classify and rank candidate effectors of rust fungi. PLoS ONE, 7:e29847. Scally, A. 2016. The mutation rate in human evolution and demographic inference. Current Opinion in Genetics & Development, 41:36–43. Schlötterer, C. 2004. The evolution of molecular markers—just a matter of fashion? Nature Reviews Genetics, 5:63–69. Schmidt, G. W. and Delaney, S. K. 2010. Stable internal reference genes for normal- ization of real-time RT-PCR in tobacco (Nicotiana tabacum) during development and abiotic stress. Molecular Genetics and Genomics, 283:233–241. BIBLIOGRAPHY 271 Schmittgen, T. D. and Livak, K. J. 2008. Analyzing real-time PCR data by the comparative CT method. Nature Protocols, 3:1101–1108. Schumann, G. L. and Leonard, K. 2000. Stem rust of wheat (black rust). The Plant Health Instructor. 
URL https://www.apsnet.org/edcenter/intropp/ lessons/fungi/Basidiomycetes/Pages/StemRust.aspx. Schwessinger, B., Sperschneider, J., Cuddy, W. S., Garnica, D. P., Miller, M. E., Taylor, J. M., Dodds, P. N., Figueroa, M., Park, R. F., and Rathjen, J. P. 2018. A near-complete haplotype-phased genome of the dikaryotic wheat stripe rust fungus Puccinia striiformis f. sp. tritici reveals high interhaplotype diversity. mBio, 9:e02275–17. Selitrennikoff, C. P. 2001. Antifungal proteins. Applied and Environmental Microbi- ology, 67:2883–2894. Sharma, I. 2012. Disease resistance in wheat, volume 1. Sharma-Poudyal, D., Chen, X. M., Wan, A. M., Zhan, G. M., Kang, Z. S., Cao, S. Q., Jin, S. L., Morgounov, A., Akin, B., and Mert, Z. 2013. Virulence characterization of international collections of the wheat stripe rust pathogen, Puccinia striiformis f. sp. tritici. Plant Disease, 97:379–386. Sharp, E. L. 1967. Atmospheric ions and germination of uredospores of Puccinia striiformis. Science, 156:1359–1360. Shaw, M. W. and Osborne, T. M. 2011. Geographic distribution of plant pathogens in response to climate change. Plant Pathology, 60:31–43. Shiferaw, B., Kassie, M., Jaleta, M., and Yirga, C. 2014. Adoption of improved wheat varieties and impacts on household food security in Ethiopia. Food Policy, 44:272–284. Simbolo, M., Gottardi, M., Corbo, V., Fassan, M., Mafficini, A., Malpeli, G., Lawlor, R. T., and Scarpa, A. 2013. DNA qualification workflow for next generation sequencing of histopathological samples. PLoS ONE, 8:1–8. Simmonds, N. W. 1991. Genetics of horizontal resistance to diseases of crops. Biological Reviews, 66:189–241. Smit, H., Tolmay, V., Barnard, A., Jordaan, J., Koekemoer, F., Otto, W., Pretorius, Z., Purchase, J., and Tolmay, J. 2010. An overview of the context and scope of wheat ( Triticum aestivum ) research in South Africa from 1983 to 2008. South African Journal of Plant and Soil, 27:81–96. Speed, T. 2004. Statistics and gene expression analysis. 
Biostatistical Genetics and Genetic Epidemiology, pages 1–13. Stamatakis, A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30:1312–1313. BIBLIOGRAPHY 272 Steele, K. A., Humphreys, E., Wellings, C. R., and Dickinson, M. J. 2001. Support for a stepwise mutation model for pathogen evolution in Australasian Puccinia striiformis f.sp. tritici by use of molecular markers. Plant Pathology, 50:174–180. Stergiopoulos, I. and de Wit, P. J. 2009. Fungal effector proteins. Annual Review of Phytopathology, 47:233–263. Stotz, H. U., Mitrousia, G. K., de Wit, P. J., and Fitt, B. D. 2014. Effector-triggered defence against apoplastic fungal pathogens. Trends in Plant Science, 19:491–500. Stubbs, R. W. 1988. Pathogenicity analysis of yellow (stripe) rust of wheat and its significance in a global context. Stubbs, R. 1985. Stripe rust. In Diseases, Distribution, Epidemiology, and Control, pages 61–101. Szabo, L. J. and Bushnell, W. R. 2001. Hidden robbers: the role of fungal haustoria in parasitism of plants. Proceedings of the National Academy of Sciences of the United States of America, 98:7654–7655. Sørensen, C. K., Justesen, A. F., and Hovmøller, M. S. 2012. 3-D imaging of temporal and spatial development of Puccinia striiformis haustoria in wheat. Mycologia, 104:1381–1389. Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. 2013. MEGA6: Molecular evolutionary genetics analysis version 6.0. Molecular Biology and Evolution, 30:2725–2729. Taylor, S., Wakem, M., Dijkman, G., Alsarraj, M., and Nguyen, M. 2010. A practical approach to RT-qPCR—publishing data that conform to the MIQE guidelines. Methods, 50:S1–S5. Taylor, S. C. and Mrkusich, E. M. 2014. The state of RT-quantitative PCR: firsthand observations of implementation of minimum information for the publication of quantitative real-time PCR experiments (MIQE). Journal of Molecular Microbiology and Biotechnology, 24:46–52. 
Thach, T., Ali, S., Justesen, A., Rodriguez-Algaba, J., and Hovmøller, M. 2015. Recovery and virulence phenotyping of the historic ‘Stubbs collection’ of the yellow rust fungus Puccinia striiformis from wheat: Long-term storage of rust fungi. Annals of Applied Biology, 167:314–326. Thach, T., Ali, S., de Vallavieille-Pope, C., Justesen, A., and Hovmøller, M. 2016. Worldwide population structure of the wheat rust fungus Puccinia striiformis in the past. Fungal Genetics and Biology, 87:1–8. Thellin, O., Zorzi, W., Lakaye, B., De Borman, B., Coumans, B., Hennen, G., Grisar, T., Igout, A., and Heinen, E. 1999. Housekeeping genes as internal standards: use and limits. Journal of Biotechnology, 75:291–295. BIBLIOGRAPHY 273 Thorvaldsdóttir, H., Robinson, J. T., and Mesirov, J. P. 2013. Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics, 14:178–192. Tomancak, P., Berman, B. P., Beaton, A., Weiszmann, R., Kwan, E., Hartenstein, V., Celniker, S. E., and Rubin, G. M. 2007. Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biology, 8:R145. Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., Pimentel, H., Salzberg, S. L., Rinn, J. L., and Pachter, L. 2012. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7:562–578. United Nations. World population prospects: The 2017 revision, key findings and advance tables. Technical report, United Nations, Department of Economic and Social Affairs, Population Division, 2017. Upadhyaya, N. M., Mago, R., Staskawicz, B. J., Ayliffe, M. A., Ellis, J. G., and Dodds, P. N. 2013. A bacterial type III secretion assay for delivery of fungal effector proteins into wheat. Molecular Plant-Microbe Interactions, 27:255–264. van der Hoorn, R. A. and Kamoun, S. 2008. From guard to decoy: A new model for perception of plant pathogen effectors. 
The Plant Cell Online, 20:2009–2017. Van der Plank, J. 1968. Disease resistance in plants. Van Niekerk, H. 2001. Southern Africa wheat pool. In The World Wheat Book: The History of Wheat Breeding. VanGuilder, H. D., Vrana, K. E., and Freeman, W. M. 2008. Twenty-five years of quantitative PCR for gene expression analysis. Biotechniques, 44:619. Vieira, M. L. C., Santini, L., Diniz, A. L., and Munhoz, C. d. F. 2016. Microsatellite markers: what they mean and why they are so useful. Genetics and Molecular Biology, 39:312–328. Visser, B., Herselman, L., and Pretorius, Z. A. 2016. Microsatellite characterisation of South African Puccinia striiformis races. South African Journal of Plant and Soil, 33:161–166. Vos, P., Hogers, R., Bleeker, M., Reijans, M., Van de Lee, T., Hornes, M., Friters, A., Pot, J., Paleman, J., and Kuiper, M. 1995. AFLP: a new technique for DNA fingerprinting. Nucleic Acids Research, 23:4407–4414. Wahl, I., Anikster, Y., Manisterski, J., and Segal, A. 1984. Evolution at the center of origin, volume 1. Walter, S., Ali, S., Kemen, E., Nazari, K., Bahri, B. A., Enjalbert, J., Hansen, J. G., Brown, J. K., Sicheritz-Pontén, T., Jones, J., de Vallavieille-Pope, C., Hovmøller, M. S., and Justesen, A. F. 2016. Molecular markers for tracking the origin and BIBLIOGRAPHY 274 worldwide distribution of invasive strains of Puccinia striiformis. Ecology and Evolution, 6:2790–2804. Wang, B., Sun, Y., Song, N., Zhao, M., Liu, R., Feng, H., Wang, X., and Kang, Z. 2017. Puccinia striiformis f. sp. tritici microRNA-like RNA 1 ( Pst -milR1), an important pathogenicity factor of Pst , impairs wheat resistance to Pst by suppressing the wheat pathogenesis-related 2 gene. New Phytologist, 215: 338–350. Wang, C.-F., Huang, L.-L., Buchenauer, H., Han, Q.-M., Zhang, H.-C., and Kang, Z.-S. 2007. Histochemical studies on the accumulation of reactive oxygen species (O−2 and H2O2) in the incompatible and compatible interaction of wheat—Puccinia striiformis f. sp. tritici. 
Physiological and Molecular Plant Pathol- ogy, 71:230–239. Wang, M. and Chen, X. 2013. First report of oregon grape (Mahonia aquifolium) as an alternate host for the wheat stripe rust pathogen (Puccinia striiformis f. sp. tritici) under artificial inoculation. Plant Disease, 97:839–839. Wang, X., Tang, C., Zhang, G., Li, Y., Wang, C., Liu, B., Qu, Z., Zhao, J., Han, Q., Huang, L., Chen, X., and Kang, Z. 2009. cDNA-AFLP analysis reveals differential gene expression in compatible interaction of wheat challenged with Puccinia striiformis f. sp. tritici. BMC Genomics, 10:289. Waterhouse, A. M., Procter, J. B., Martin, D. M., Clamp, M., and Barton, G. J. 2009. Jalview version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25:1189–1191. Wellings, C. R. 2007. Puccinia striiformis in Australia: a review of the incursion, evolution, and adaptation of stripe rust in the period 1979–2006. Australian Journal of Agricultural Research, 58:567. Wellings, C. R., McIntosh, R. A., and Walker, J. 1987. Puccinia striiformis f.sp. tritici in Eastern Australia possible means of entry and implications for plant quarantine. Plant Pathology, 36:239–241. Wellings, C. R., McIntosh, R. A., and Hussain, M. 1988. A new source of resistance to Puccinia striiformis f. sp. tritici in spring wheats (Triticum aestivum). Plant Breeding, 100:88–96. Wellings, C. R. 2011. Global status of stripe rust: a review of historical and current threats. Euphytica, 179:129–141. Willems, E., Leyns, L., and Vandesompele, J. 2008. Standardization of real-time PCR gene expression data from independent biological replicates. Analytical Biochemistry, 379:127–129. Winter, B. 2013. Linear models and linear mixed effects models in R with linguistic applications. arXiv preprint arXiv:1308.5499. BIBLIOGRAPHY 275 Wittwer, C. T., Herrmann, M. G., Moss, A. A., and Rasmussen, R. P. 1997. Contin- uous fluorescence monitoring of rapid cycle DNA amplification. Biotechniques, 22:130–139. Yang, Z. 
2007. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24:1586–1591. Yang, Z. and Nielsen, R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution, 17:32–43. Yin, C. and Hulbert, S. 2015. Host induced gene silencing (HIGS), a promising strategy for developing disease resistant crops. Gene Technology, 04:130. Yoshida, K., Saitoh, H., Fujisawa, S., Kanzaki, H., Matsumura, H., Yoshida, K., Tosa, Y., Chuma, I., Takano, Y., Win, J., Kamoun, S., and Terauchi, R. 2009. Association genetics reveals three novel avirulence genes from the rice blast fungal pathogen magnaporthe oryzae. The Plant Cell, 21:1573–1591. Yoshida, K., Schuenemann, V. J., Cano, L. M., Pais, M., Mishra, B., Sharma, R., Lanz, C., Martin, F. N., Kamoun, S., Krause, J., Thines, M., Weigel, D., and Burbano, H. A. 2013. The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine. eLife, 2. Yuan, J. S., Reed, A., Chen, F., and Stewart, C. N. 2006. Statistical analysis of real-time PCR data. BMC Bioinformatics, 7:85. Zadoks, J. C. 1961. Yellow rust on wheat studies in epidemiology and physiologic specialization. European Journal of Plant Pathology, 67:69–256. Zadoks, J., Chang, T., and Konzak, C. 1974. A decimal code for the growth stages of cereals. Weed research, 14:415–421. Zhang, Y., Qu, Z., Zheng, W., Liu, B., Wang, X., Xue, X., Xu, L., Huang, L., Han, Q., Zhao, J., and Kang, Z. 2008. Stage-specific gene expression during urediniospore germination in Puccinia striiformis f. sp tritici. BMC Genomics, 9: 203. Zhao, J., Zhang, H., Yao, J., Huang, L., and Kang, Z. 2011. Confirmation of Berberis spp. as alternate hosts of Puccinia striiformis f. sp. tritici on wheat in China. Mycosystema, 30:895–900. Zhao, J., Wang, L., Wang, Z., Chen, X., Zhang, H., Yao, J., Zhan, G., Chen, W., Huang, L., and Kang, Z. 2013. 
Identification of eighteen Berberis species as alternate hosts of Puccinia striiformis f. sp. tritici and virulence variation in the pathogen isolates from natural infection of barberry plants in China. Phytopathology, 103:927–934. BIBLIOGRAPHY 276 Zheng, W., Huang, L., Huang, J., Wang, X., Chen, X., Zhao, J., Guo, J., Zhuang, H., Qiu, C., Liu, J., Liu, H., Huang, X., Pei, G., Zhan, G., Tang, C., Cheng, Y., Liu, M., Zhang, J., Zhao, Z., Zhang, S., Han, Q., Han, D., Zhang, H., Zhao, J., Gao, X., Wang, J., Ni, P., Dong, W., Yang, L., Yang, H., Xu, J.-R., Zhang, G., and Kang, Z. 2013. High genome heterozygosity and endemic genetic recombination in the wheat stripe rust fungus. Nature Communications, 4:2673. Zillinsky, F. J. 1983. Common Diseases of Small Grain Cereals. A Guide to Identification.