Patterton, Hugh-G.
Mayne, Shannon L. N.
2015-11-11
2013
http://hdl.handle.net/11660/1590

English: Multi-subunit protein complexes are involved in many essential biochemical processes, including signal transduction, protein synthesis, RNA synthesis, DNA replication and protein degradation. An accurate description of the relative structural arrangement of the constituent sub-units in such complexes is crucial for understanding the molecular mechanism of the complex as a whole. Many complexes, however, lie in the mega-Dalton range and are not amenable to X-ray crystallographic or nuclear magnetic resonance analysis. Techniques that are suited to structural studies of such large complexes, such as cryo-electron microscopy, do not provide the resolution required for mechanistic insight. Mass spectrometry (MS) has increasingly been applied to identify the residues that are involved in chemical cross-links in compound protein assemblies, and has provided valuable insight into the molecular arrangement, orientation and contact surfaces of sub-units within such large complexes. This approach is known as MS3D, and involves the MS analysis of cross-linked di-peptides following the enzymatic cleavage of a chemically cross-linked complex. A major challenge of this approach is the identification of the cross-linked di-peptides in a composite mixture of peptides, as well as the identification of the residues involved in the cross-link. These analyses require bioinformatics tools with capabilities beyond those of general, MS-based proteomic analysis software. Many MS3D software tools have appeared, often designed for very specific experimental methods. We review all major MS3D bioinformatics programs currently available, considering their applicability to different workflows, specific experimental requirements, and the computational approach taken by each.
We also developed AnchorMS, a new bioinformatics tool for the identification of both the sequences and cross-linked residues of di-peptides within a post-digest peptide mixture based on MS1 and MS2 data. AnchorMS is intended as a component in the workflow of an MS3D experiment where the protein sequences, cross-linking reagent and protease are known. AnchorMS is freely available as a public web service at cbio.ufs.ac.za/AnchorMS via a simple, user-friendly web interface coded in PHP/XHTML. Experimental sample preparation information and MS data may be uploaded through the web form and analysed by AnchorMS. After analysis, the web interface displays the di-peptides detected, as well as the calculated maximum inter-residue distance between cross-linked residues. This distance information can be used in the optimization of sub-unit positioning within structural models using third-party software. The computational core of AnchorMS was developed as an open-source Python project. We describe in detail the overall structure and workflow of the code as well as the functionality implemented in each section of the code. AnchorMS creates a digital library of possible di-peptides and generates expected precursor and fragment mass spectra for each. To identify di-peptides, the observed mass spectra are matched against the library of expected mass spectra. Features that are unique to AnchorMS are highlighted, including those for the analysis of di-peptides where the sequences are identical, but the cross-linked residues differ. AnchorMS considers their possible co-fragmentation and employs a specialised second score for distinguishing between such precursors. A unique mathematical model for estimating the level of false positive matching was derived based on an in silico simulation of false positive spectrum matching using randomly generated di-peptide sequences.
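The library-and-match idea described in this paragraph can be illustrated with a minimal Python sketch. This is not the AnchorMS code: the function names, the trypsin rule, the single allowed missed cleavage, the lysine-only link sites and the 138.06808 Da linker mass (the mass added by a DSS/BS3-type cross-link) are all assumptions chosen for illustration, and only precursor (MS1) masses are matched.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
LINKER = 138.06808  # assumption: DSS/BS3-type cross-link mass shift


def peptide_mass(seq):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_peptides(protein, missed=1):
    """Cleave after K/R (not before P), allowing `missed` missed cleavages."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    peptides = set()
    for i in range(len(pieces)):
        for j in range(i, min(i + missed, len(pieces) - 1) + 1):
            peptides.add("".join(pieces[i:j + 1]))
    return peptides


def link_sites(pep):
    """Indices of lysines that are not C-terminal (assumed linkable sites)."""
    return [i for i, aa in enumerate(pep[:-1]) if aa == "K"]


def dipeptide_library(proteins, missed=1):
    """All cross-linked peptide pairs as (mass, pep_a, site_a, pep_b, site_b)."""
    peps = sorted({p for prot in proteins
                   for p in tryptic_peptides(prot, missed) if link_sites(p)})
    library = []
    for a, b in combinations_with_replacement(peps, 2):
        mass = peptide_mass(a) + peptide_mass(b) + LINKER
        for i in link_sites(a):
            for j in link_sites(b):
                if a == b and j < i:
                    continue  # skip mirrored duplicates of a homo-dimer
                library.append((mass, a, i, b, j))
    return library


def match_precursors(observed, library, ppm=10.0):
    """Return library entries whose mass lies within `ppm` of an observed mass."""
    hits = []
    for m_obs in observed:
        for mass, a, i, b, j in library:
            if abs(m_obs - mass) / mass * 1e6 <= ppm:
                hits.append((m_obs, a, i, b, j))
    return hits


if __name__ == "__main__":
    lib = dipeptide_library(["AKGR", "SKLR"])
    for hit in match_precursors([1070.656], lib):
        print(hit)
```

Entries that share both peptide sequences but differ in link site have identical precursor masses in this sketch, which is why fragment-level (MS2) evidence is needed to tell them apart.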
Subsets of the simulation data were modelled using disparate functions, which were subsequently combined to yield a composite model that describes expected false matching under various conditions. The refined calibration of this model against simulation data was performed using the R programming language. AnchorMS also implements this model as a dynamic false positive threshold, where score values greater than the threshold are considered likely to be true spectrum matches.

Afrikaans (translated): Protein complexes that consist of several sub-units are involved in many essential biochemical processes, including signal transduction, protein synthesis, RNA synthesis, DNA replication and protein degradation. An accurate description of the relative structural arrangement of the sub-units in such complexes is of cardinal importance for an understanding of the molecular mechanism of the complex as a whole. Complexes are, however, often larger than one mega-Dalton, and are therefore not suitable for X-ray crystallographic or nuclear magnetic resonance analyses. Techniques that are used for structural studies of such large complexes, such as cryo-electron microscopy, do not give the resolution needed for mechanistic insight. Mass spectrometry (MS) is increasingly applied to identify the amino acids involved in chemical cross-links in the compound protein complex, and provides valuable information on the molecular arrangement, orientation and contact surfaces of sub-units within such large complexes. This approach is known as MS3D, and involves the MS analysis of cross-linked peptide dimers after the enzymatic cleavage of a chemically cross-linked complex. A major challenge of this approach is the identification of the cross-linked peptide dimers in a composite peptide mixture, as well as the identification of the amino acid residues involved in the cross-link.
This analysis requires bioinformatics software with capabilities beyond those of general, MS-based proteome analysis software. Many MS3D software tools have appeared, often designed for very specific experimental methods. We give an overview of the most important MS3D bioinformatics programs currently available, taking into account their applicability to different workflows, specific experimental requirements, and the computational approach each takes. We developed AnchorMS, a new bioinformatics program for identifying both the sequences and the specific cross-linked amino acids of the peptide dimers within a digested peptide mixture, based on MS1 and MS2 data. AnchorMS is intended as a component in the workflow of an MS3D experiment where the protein sequences, cross-linking reagent and protease are known. AnchorMS is freely accessible as a public web service at cbio.ufs.ac.za/AnchorMS via a simple, user-friendly web interface coded in PHP/XHTML. Experimental sample preparation information and MS data are uploaded through the web form and analysed by AnchorMS. After the analysis, the web interface displays the observed peptide dimers as well as the calculated maximum distance between cross-linked amino acids. This information can be used to optimise sub-unit positions in structural models with the help of third-party software. The computational core of AnchorMS was developed as an open-source Python project. We describe in detail the general structure and workflow of the program, as well as the functionality implemented in each section of the code. As with other MS3D programs, AnchorMS compiles a digital library of possible analyte peptide dimers and generates expected MS spectra for each. To identify peptide dimers, the observed mass spectra are compared against the library of expected mass spectra.
The functionality that is unique to AnchorMS is highlighted, including that for the analysis of peptide dimers where the sequences are identical but the cross-linked amino acids differ. AnchorMS also considers their possible co-fragmentation, and applies a specialised second score to distinguish between such cases. A unique mathematical model was derived for estimating the level of false positive identifications, based on a computer simulation of spectra predicted for peptides with randomly generated sequences. Portions of the simulation data were modelled using functions that were combined to yield a composite model, which can describe the expected level of false positive identifications under different conditions. Refinement of this model was carried out using the R programming language. AnchorMS also implements this model as a dynamic false positive threshold, where score values greater than this threshold are regarded as correct identifications.

en
Dissertation (M.Sc. (Biochemistry))--University of the Free State, 2013
Mass spectrometry; Bioinformatics; Structure; Software; Protein; Python; Modelling; MS3D; Interactions; Cross-linking
A bioinformatic tool for analysing the structures of protein complexes by means of mass spectrometry of cross-linked proteins
Dissertation
University of the Free State