Doctoral Degrees (Mathematical Statistics and Actuarial Science)
Recent Submissions
Item Open Access Continuous-time Markov modelling of the effects of treatment regimens on HIV/AIDS immunology and virology (University of the Free State, 2019) Shoko, Claris; Chikobvu, Delson
As the Human immunodeficiency virus (HIV) enters the human body, its main target is the CD4+ cell, which it turns into a factory that produces millions of other HIV particles, thus compromising the immune system and resulting in opportunistic infections, for example tuberculosis (TB). Combination anti-retroviral therapy (cART) has become the standard of care for patients with HIV infection and has led to the reduction in acquired immunodeficiency syndrome (AIDS) related morbidity and mortality, an increase in CD4+ cell counts and a decrease in viral load count to undetectable levels. In modelling HIV/AIDS progression in patients, researchers mostly deal with either viral load only or CD4+ cell counts only, as they expect these two variables to be collinear. The purpose of this study is to fit a continuous-time Markov model that best describes mortality of HIV infected patients on cART by eventually including both CD4+ cell count monitoring and viral load monitoring in a single model, after treating for collinearity of these variables using the principal component approach. A cohort of 320 HIV infected patients on cART followed up at a Wellness Clinic in Bela Bela, South Africa, is used in this thesis. These patients are administered a triple therapy of two nucleoside reverse transcriptase inhibitors (NRTIs) and one non-nucleoside reverse transcriptase inhibitor (NNRTI). The thesis is divided into five sections. In the first section, a continuous-time homogeneous Markov model based on CD4+ cell count states is fitted. The model is used to analyse the effects of tuberculosis (TB) co-infection on the immunologic progression of HIV/AIDS patients on cART. TB co-infection was of interest because it is an opportunistic infection that takes advantage of the compromised immune system. Results from this section showed that once TB is diagnosed prior to treatment initiation and managed, mortality rates are reduced. However, if TB is diagnosed during the course of treatment, it increases the rates of immune deterioration in patients, leading to high rates of mortality. Therefore, this section proposes the need for routine TB screening before treatment initiation and at every stage of the follow-up period, to avoid loss of lives. The goal of cART is not only to boost the immune system but also to suppress the viral load to undetectable levels. Thus, in the second section, a non-homogeneous continuous-time Markov model based on viral load states is fitted. This model helped in revealing possibilities of viral rebound among patients on cART. Although there were no significant gender differences in HIV/AIDS virology, the model explained the progression of patients better than the model based on CD4+ cell count fitted in the first section. In the third section, determinants of viral rebound are analysed. Viral rebound was notable mainly after patients had attained a viral load suppressed to levels between 50 copies/mL and 10 000 copies/mL. The major attributes of viral rebound were non-adherence, lactic acid, resistance to treatment, and different combination therapies such as AZT-3TC-LPV/r and FTC-TDF-EFV.
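To make the continuous-time Markov machinery used throughout these sections concrete, a minimal sketch follows. The states, intensities and time horizon below are hypothetical placeholders chosen for illustration, not the fitted values or state definitions from the thesis.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical generator (transition intensity) matrix Q for a four-state
# CD4-based chain: state 1 = high CD4, state 2 = intermediate, state 3 = low,
# state 4 = death (absorbing). Rows sum to zero; off-diagonals are rates.
Q = np.array([
    [-0.10,  0.07,  0.02,  0.01],
    [ 0.05, -0.12,  0.05,  0.02],
    [ 0.02,  0.06, -0.13,  0.05],
    [ 0.00,  0.00,  0.00,  0.00],
])

# For a time-homogeneous chain, the transition probability matrix over an
# interval of length t is the matrix exponential P(t) = exp(Q t).
t = 6.0                      # e.g. six months on treatment (illustrative)
P = expm(Q * t)
print(np.round(P, 3))        # P[i, j] = Pr(state j at time t | state i at time 0)
print(P.sum(axis=1))         # each row sums to 1
```

Fitting such a model to panel data, as the thesis does, amounts to estimating the entries of Q from the observed state sequences, typically with covariates such as TB co-infection entering the intensities multiplicatively.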
This third section suggests the need to closely monitor HIV patients to ensure attainment of an undetectable viral load (below 50 copies/mL) during the first six months of treatment uptake, as this reduces the chances of viral rebound, leading to life gain by HIV/AIDS patients. The fourth section compares the use of viral load count and CD4+ cell count in monitoring HIV/AIDS disease progression in patients receiving cART, in order to establish the superiority of viral load over CD4+ cell count. This was done by fitting two separate models, one for CD4+ cell count states and the other for viral load states. Comparison of the fitted models was based on percentage prevalence plots for the fitted models and the observed data, and on likelihood ratio tests. The test confirmed that viral load monitoring is superior to CD4+ cell count monitoring. Viral load monitoring is very good at detecting virologic failure, thereby avoiding unnecessary switches of treatment lines. However, this section suggests the use of both CD4+ cell count monitoring and viral load monitoring, because CD4+ cell count monitoring helps in managing possibilities of the development of opportunistic infections. In the fifth section, continuous-time homogeneous Markov models are fitted, including both CD4+ cell count monitoring and viral load monitoring in one model. Since these variables are assumed to be collinear, principal component analysis was used to treat the collinearity between these two variables. The models are fitted in such a way that when the Markov states are based on CD4+ cell count, the principal component of viral load is included as a covariate, and when the Markov states are based on viral load, the principal component of CD4+ cell count is included as a covariate. Results from the models show an improvement in the power of the continuous-time Markov model to explain and predict mortality when both CD4+ cell count and viral load routine monitoring are included in one model.

Item Open Access Actuarial risk management of investment guarantees in life insurance (University of the Free State, 2010-11) Bekker, Kobus Nel; Dhaene, Jan; Finkelstein, Maxim
Investment guarantees in life insurance business have generated a lot of research in recent years due to the earlier mispricing of such products. These guarantees generally take the form of exotic options and are therefore difficult to price analytically, even in a simplified setting. A possible solution to the risk management problem of investment guarantees contingent on death and survival is proposed through the use of a conditional lower bound approximation of the corresponding embedded option value. The derivation of the conditional lower bound approximation is outlined in the case of regular premiums with asset-based charges, and the implementation is illustrated in a Black-Scholes-Merton setting. The derived conditional lower bound approximation also facilitates verifying economic scenario generator based pricing and valuation, as well as sensitivity measures for hedging solutions.

Item Open Access Parametric and nonparametric Bayesian statistical inference in animal science (University of the Free State, 2000-11) Pretorius, Albertus Lodewikus; Van der Merwe, Abrie J.
Chapter 1 illustrated an extension of the Gibbs sampler to solve problems arising in animal breeding theory. Formulae were derived and presented to implement the Gibbs sampler, whereafter marginal densities, posterior means, modes and credibility intervals were obtained from the Gibbs sampler.
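As a toy illustration of the Gibbs sampling idea mentioned above (alternating draws from the full conditional distributions until the chain settles on the joint posterior), the sketch below targets the posterior of a normal mean and variance under standard non-informative priors. It is an illustrative assumption-laden example, not the mixed linear model of the thesis; the data and prior choices are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=50)      # toy data
n, ybar = len(y), y.mean()

n_iter, mu, sigma2 = 5000, 0.0, 1.0
draws = np.empty((n_iter, 2))
for i in range(n_iter):
    # Full conditional of mu given sigma2 (flat prior on mu): N(ybar, sigma2/n)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # Full conditional of sigma2 given mu (Jeffreys prior 1/sigma2):
    # Inverse-Gamma(n/2, sum((y - mu)^2)/2), sampled as 1/Gamma
    sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / np.sum((y - mu) ** 2))
    draws[i] = mu, sigma2

print(draws[1000:].mean(axis=0))  # posterior means of mu and sigma2 after burn-in
```

The marginal densities and credibility intervals referred to in the abstract are then read off the retained draws, one coordinate at a time.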
In the Bayesian Method of Moments (BMOM) chapter we illustrated how this approach, based on a few relatively weak assumptions, is used to obtain maximum entropy densities, realized error terms and future values of the parameters for the mixed linear model. Given the data, it enables researchers to compute post-data densities for parameters and future observations when the form of the likelihood function is unknown. On introducing and proving simple assumptions relating to the moments of the realized error terms and the future, as yet unobserved, error terms, we derived post-data moments of parameters and future values of the dependent variable. Using these moments as side conditions, proper maxent densities for the model parameters were derived and could easily be computed. It was also shown that in the computed example, where use was made of the Gibbs sampler to compute finite sample post-data parameter densities, some BMOM maxent densities were very similar to the traditional Bayesian densities, whilst others were not. It should be appreciated that the BMOM approach yielded useful inverse inferences without using assumed likelihood functions, prior densities for their parameters and Bayes' theorem; it was also the case that the BMOM techniques extended in the present thesis to the mixed linear model provided valuable and significant solutions for applying traditional likelihood or Bayesian analysis to animal breeding problems. The important contribution of Chapters 3 and 4 revolved around the nonparametric modelling of the random effects. We have applied a general technique for Bayesian nonparametrics to this important class of models, the mixed linear model for animal breeding experiments. Our technique involved specifying a nonparametric prior for the distribution of the random effects and a Dirichlet process prior on the space of prior distributions for that nonparametric prior. The mixed linear model was then fitted with a Gibbs sampler, which turned an analytically intractable multidimensional integration problem into a feasible numerical one, overcoming most of the computational difficulties usually experienced with the Dirichlet process. This proposed procedure also represented a new application of the mixture of Dirichlet processes model to problems arising from animal breeding experiments. The application to and discussion of the breeding experiment from Kenya was helpful for understanding the importance and utility of the Dirichlet process, and inference for all the mixed linear model parameters. However, as mentioned before, a substantial statistical issue that still remains to be tackled is the great discrepancy between the resulting posterior densities of the random effects as the value of the precision parameter M changes. We believe that Bayesian nonparametrics have much to offer, and can be applied to a wide range of statistical procedures. In addition to the Dirichlet process prior, we will look in the future at other nonparametric priors such as Pólya tree priors and Bernoulli trips. Whilst our feeling in the final chapter was that study of the performance of non-informative priors was certainly to be encouraged, we found the group reference priors to be generally highly satisfactory, and felt reasonably confident in using them in situations in which further study was impossible. Results from the different theorems showed that the group orderings of the mixed model parameters are very important, since different orderings will frequently result in different reference priors.
This dependence of the reference prior on the groups chosen and their ordering was unavoidable. Our motivation and idea for the reference prior was basically to choose the prior which, in a certain asymptotic sense, maximized the information in the posterior that was provided by the data. The thesis has surveyed a range of current research in the area of Bayesian parametric and nonparametric inference in animal science. The work is ongoing and several problems remain unresolved. In particular, more work is required in the following areas: a full Bayesian nonparametric analysis involving covariate information; multivariate priors based on stochastic processes; multivariate error models involving Pólya trees; developing exchangeable processes to cover a larger class of problems; and nonparametric sensitivity analysis.

Item Open Access Aspects of Bayesian change-point analysis (University of the Free State, 2000-11) Schoeman, Anita Carina; Groenewald, P. C. N.
English: In chapter one we looked at the nature of structural change and defined structural change as a change in one or more parameters of the model in question. Bayesian procedures can be applied to solve inferential problems of structural change. Among the various methodological approaches within Bayesian inference, emphasis is put on the analysis of the posterior distribution itself, since the posterior distribution can be used for conducting hypothesis testing as well as obtaining a point estimate. The history of structural change in statistics, beginning in the early 1950's, is also discussed. Furthermore, the Bayesian approach to hypothesis testing was developed by Jeffreys (1935, 1961), where the centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. According to Kass and Raftery (1993), posterior odds = Bayes factor × prior odds, and the Bayes factor is the ratio of the posterior odds of H1 to its prior odds, regardless of the value of the prior odds. The intrinsic and fractional Bayes factors are defined and some advantages and disadvantages of the IBF's are discussed. In chapter two changes in the multivariate normal model are considered. Assuming that a change has taken place, one will want to be able to detect the change and to estimate its position as well as the other parameters of the model. To do a Bayesian analysis, prior densities should be chosen. Firstly the hyperparameters are assumed known, but as this is not usually true, vague improper priors are used (while the number of change-points is fixed). Another way of dealing with the problem of unknown hyperparameters is to use a hierarchical model where the second stage priors are vague. We also considered Gibbs sampling and gave the full conditional distributions for all the cases. The three cases that are studied are (1) a change in the mean with known or unknown variance, (2) a change in the mean and variance, by firstly using independent prior densities on the different variances and secondly assuming the variances to be proportional, and (3) a change in the variance. The same models are also considered when the number of change-points is unknown, in which case vague priors are not appropriate for comparing models of different dimensions. We then revert to partial Bayes factors, specifically the intrinsic and fractional Bayes factors, to obtain the posterior probabilities of the number of change-points. Furthermore we look at component analysis, i.e.
determining which components of a multivariate variable are mostly responsible for the changes in the parameters. The univariate case is then also considered in more detail, including multiple model comparisons and models with autocorrelated errors. A summary of approaches in the literature as well as four examples are included. In chapter three changes in the linear model, with (1) a change in the regression coefficient and a constant variance, (2) a change in only the variance and (3) a change in the regression coefficient and the variance, are considered. Bayes factors for the above-mentioned cases, multiple change-points, component analysis, switchpoint (continuous change-point) and autocorrelation are included, together with seven examples. In chapter four changes in some other standard models are considered. Bernoulli-type experiments include the Binomial model, the Negative binomial model, the Multinomial model and the Markov chain model. Exponential-type models include the Poisson model, the Gamma model and the Exponential model. Special cases of the Exponential model include the left truncated exponential model and the Exponential model with epidemic change. In all cases the partial Bayes factor is used to obtain posterior probabilities when the number of change-points is unknown. Marginal posterior densities of all parameters under the change-point model are derived. Eleven examples are included. In chapter five change-points in the hazard rate are studied. This includes an abrupt change in a constant hazard rate as well as a change from a decreasing hazard rate to a constant hazard rate or a change from a constant hazard rate to an increasing hazard rate. These hazard rates are obtained from combinations of Exponential and Weibull density functions. In the same way a bathtub hazard rate can also be constructed. Two illustrations are given. Some concluding remarks are made in chapter six, with discussions of other approaches in the literature and other possible applications not dealt with in this study.

Item Open Access A Bayesian analysis of multiple interval-censored failure time events with application to AIDS data (University of the Free State, 2003-05) Mokgatlhe, Lucky; Groenewald, C. N.; De Waal, Daniel J.
English: The measure of time to event (failure) for units on longitudinal clinical visits cannot always be ascertained exactly. Instead, only time intervals within which the event occurred may be recorded. That being the case, each unit's failure will be described by a single interval, resulting in grouped interval data over the sample. Yet, due to non-compliance with visits by some units, failure will be described by endpoints within which the event has occurred. These endpoints may encompass several intervals, hence overlapping intervals across units. Furthermore, some units may not realize the event of interest within the preset duration of the study, hence are censored. Finally, several events of interest can be investigated on a single unit, resulting in several failure times that inevitably are dependent. All these prescribe interval-censored survival data with multiple failure times. Three models for analysing interval-censored survival data with two failure times were applied to four sets of data. For the distribution-free methods, Cox's hazard with either a log-log transform or a logit transform on the baseline conditional survival probabilities was used to derive the likelihood.
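In generic notation (not necessarily the exact parameterisation used in the thesis), the grouped interval-censored likelihood and the complementary log-log link on the baseline conditional survival probabilities mentioned above can be written as follows:

```latex
% Subject i is only known to fail in the interval (t_{L_i}, t_{R_i}];
% p_k is the baseline conditional probability of surviving interval k given
% survival to its start, and a proportional-hazards (complementary log-log)
% link carries the covariates x_i.
\[
  L(\beta, p) \;=\; \prod_{i=1}^{n}
    \Bigl\{ S(t_{L_i} \mid x_i) - S(t_{R_i} \mid x_i) \Bigr\},
  \qquad
  S(t_j \mid x_i) \;=\; \prod_{k \le j} p_k^{\exp(x_i'\beta)},
\]
\[
  \log\Bigl[-\log\bigl\{ p_k^{\exp(x_i'\beta)} \bigr\}\Bigr]
    \;=\; \gamma_k + x_i'\beta,
  \qquad \gamma_k = \log(-\log p_k).
\]
% Right-censored subjects contribute S(t_{L_i} | x_i) only; replacing the
% complementary log-log link by a logit link gives the other transform mentioned above.
```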
The independence assumption model (IW) works under the assumption that the lifetimes are independent and that any dependence exists through the use of common covariates. The second model, which does not necessarily assume independence, computes the joint failure probabilities for two lifetimes by Bayes' rule of conditioning on the interval of failure for one lifetime, hence the conditional bivariate (CB) model. The use of Clayton and Farlie-Morgenstern bivariate copulas (CC) with an inbuilt dependence parameter was the other model. For the parametric models, the IW and CC methods were applied to the data sets on the assumption that the marginal distribution of the lifetimes is Weibull. The traditional classical estimation method of Newton-Raphson was used to find optimum parameter estimates, and their variances were stabilized using a sandwich estimator where possible. Bayesian methods combine the data with prior information. Thus, for either transform, two proper priors were derived, whose combination with the likelihood resulted in a posterior function. To estimate the entire distribution of a parameter from non-standard posterior functions, two Markov chain Monte Carlo (MCMC) methods were used. The Gibbs sampler method samples in turn from the conditional distribution of the parameter in question, while holding the other parameters constant. For intractably complex posterior functions, the Metropolis-Hastings method of sampling vectors of parameter values in blocks from a multivariate normal proposal density was used. The analysis of the ACTG175 data revealed that increases in levels of HIV RNA precede declines in CD4 cell counts. There is a strong dependence between the two failure times, which restricts the use of the independence model. The most preferred models are those using copulas and the conditional bivariate model. It was shown that ARVs actually improve a patient's lifetime at varying rates, with combination treatment performing better. The worrying issue is the resistance that the HIV virus develops against the drugs. This is evidenced by the adverse effect that previous use of ARVs has on patients, in that a new drug used on them has less effect. Finally, it is important that patients start therapy at early stages, since patients displaying signs of AIDS at entry respond negatively to drugs.

Item Open Access Bayesian control charts based on predictive distributions (University of the Free State, 2016-01) Van Zyl, Ruaan; Van der Merwe, A. J.
English: Control charts are statistical process control (SPC) tools that are widely used in the monitoring of processes, specifically taking into account stability and dispersion. Control charts signal when a significant change in the process being studied is observed. This signal can then be investigated to identify issues and to find solutions. It is generally accepted that SPC is implemented in two phases, Phase I and Phase II. In Phase I the primary interest is assessing process stability, often trying to bring the process into control by locating and eliminating any assignable causes, estimating any unknown parameters and setting up the control charts. After that the process moves on to Phase II, where the control limits obtained in Phase I are used for online process monitoring based on new samples of data. This thesis concentrates mainly on implementing a Bayesian approach to monitoring processes using SPC.
This is done by providing an overview of some non-informative priors and then specifically deriving the reference and probability-matching priors for the common coefficient of variation, the standardized mean and tolerance limits for a normal population. Using the Bayesian approach described in this thesis, SPC is performed, including derivations of control limits in Phase I and monitoring by the use of run-lengths and average run-lengths in Phase II, for the common coefficient of variation, the standardized mean, the variance and generalized variance, tolerance limits for normal populations, the two-parameter exponential distribution, the piecewise exponential model and capability indices. Results obtained using the Bayesian approach are compared to frequentist results.

Item Open Access On some methods of reliability improvement of engineering systems (University of the Free State, 2015-08) Mangara, Bernard Tonderayi; Finkelstein, Maxim
The purpose of this thesis was to study some methods of reliability improvement of engineering systems. The reason for selecting the theme “reliability improvement of engineering systems” was first to explore traditional methods of reliability improvement (that is, methods based on the notion that reliability could be assured by simply introducing a sufficiently high “safety factor” into the design of a component or a system) and then to propose new and original concepts of reliability improvement. The latter consist of approaches, methods and best practices that are used at the design phase of a component (system) in order to minimize the likelihood (risk) that the component (system) might not meet the reliability requirements, objectives and expectations. Therefore, chapter 1 of the thesis, “Introduction to the main methods and concepts of reliability for technical systems”, encompasses the introduction and the main traditional methods available for the improvement of technical/engineering systems. In chapter 2, “Reliability Component Importance Measures”, two new and original concepts on reliability improvement of engineering systems are introduced. These are: 1) the study of availability importance of components in coherent systems and 2) the optimal assignment of interchangeable components in coherent multi-state systems. In chapter 3, “Cannibalization Revisited”, two new and original concepts on reliability improvement of engineering systems are introduced. These are: 1) a theoretical model to show the effects of cannibalization on mission time availability of systems and 2) a new model for cannibalization and the corresponding example. In chapter 4, “On the Improvement of Steam Power Plant System Reliability”, a new and original model is developed that helps in determining the optimal maintenance strategies which will ensure maximum reliability of the coal-fired generating station. Conclusions are given, concerning the study conducted and the results thereof, at the end of each chapter. The conclusions for this thesis are annotated in chapter 5. A set of selected references that were consulted during the study performed for this doctor of philosophy thesis is provided at the end.

Item Open Access On the use of extreme value theory in energy markets (University of the Free State, 2007-05-09) Micali, V.; De Waal, D.
English: The intent of the thesis is to provide a set of statistical methodologies in the field of Extreme Value Theory (EVT) with a particular application to energy losses, in Gigawatt-hours (GWh), experienced by electrical generating units (GU’s).
Due to the complexity of the energy market, the thesis focuses on the volume loss only and does not expand into the price, cost or mixes thereof (although the strong relationship between volume and price is acknowledged, and some initial work on the energy price [SMP] is provided in Appendix B). Hence, occurrences of excessive unexpected energy losses incurred by these GU’s formulate the problem. Exploratory Data Analysis (EDA) structures the data and attempts to give an indication of the categorisation of the excessive losses. The size of the GU failure is also investigated from an aggregated perspective to relate to the Generation System. Here the effect of concomitant variables (such as the Load Factor imposed by the market) is emphasised. Cluster Analysis (2-Way Joining) provided an initial categorising technique. EDA highlights the shortfall of a scientific approach to determining when a large loss is sufficiently large that it affects the System. The usage of EVT shows that the GWh Losses tend to behave as a variable in the Fréchet domain of attraction. The Block Maxima (BM) and Peak-Over-Threshold (POT) methods, the latter in both semi-parametric and full parametric form, are investigated. The POT methodologies are both applicable. Of particular interest are the Q-Q plot results for the semi-parametric POT method, which fit the data satisfactorily (pp. 55-56). The Generalised Pareto Distribution (GPD) models the tail of the GWh Losses above a threshold well under the POT full parametric method. Different methodologies were explored for determining the parameters of the GPD. The method of 3-LM (linear combinations of Probability Weighted Moments) is used to arrive at initial estimates of the GPD parameters. A GPD is finally parameterised for the GWh Losses above 766 GWh. The Bayesian philosophy is also utilised in this thesis, as it provides a predictive distribution of the high quantiles of the large GWh Losses. Results are found in this part of the thesis in so far as it utilises the ratio of the Mean Excess Function (the expectation of a loss above a certain threshold) to its probability of exceeding the threshold as an indicator, establishing the minimum of this ratio. The technique was developed for the GPD by using the Fisher Information Matrix (FIM) and the Delta-Method. Prediction of high quantiles was done by using Markov Chain Monte Carlo (MCMC) and eliciting the GPD Maximal Data Information (MDI) prior. The last EVT methodology investigated in the thesis is the one that uses the Dirichlet process and the method of Negative Differential Entropy (NDE). The thesis also opened new areas of pertinent research.

Item Open Access Bayesian inference for linear and nonlinear functions of Poisson and binomial rates (University of the Free State, 2012) Raubenheimer, Lizanne; Van der Merwe, A. J.
This thesis focuses on objective Bayesian statistics, by evaluating a number of noninformative priors. Choosing the prior distribution is the key to Bayesian inference. The probability matching prior for the product of different powers of k binomial parameters is derived in Chapter 2. In the case of two and three independently distributed binomial variables, the Jeffreys, uniform and probability matching priors for the product of the parameters are compared. This research is an extension of the work by Kim (2006), who derived the probability matching prior for the product of k independent Poisson rates.
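As a minimal illustration of the simplest building block in this line of work, a single binomial proportion under the Jeffreys Beta(1/2, 1/2) prior, the sketch below computes an equal-tailed credible interval. The counts are invented, and the thesis's priors for products and linear combinations of parameters are more involved than this.

```python
from scipy.stats import beta

# Jeffreys prior Beta(1/2, 1/2) for a binomial proportion p:
# with x successes out of n trials the posterior is Beta(x + 1/2, n - x + 1/2).
x, n = 7, 50                               # illustrative counts, not thesis data
posterior = beta(x + 0.5, n - x + 0.5)

# Equal-tailed 95% Bayesian credible interval for p
lower, upper = posterior.ppf([0.025, 0.975])
print(f"posterior mean = {posterior.mean():.3f}, 95% CrI = ({lower:.3f}, {upper:.3f})")
```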
In Chapter 3 we derive the probability matching prior for a linear combination of binomial parameters. The construction of Bayesian credible intervals for the difference of two independent binomial parameters is discussed. The probability matching prior for the product of different powers of k Poisson rates is derived in Chapter 4. This is achieved by using the differential equation procedure of Datta & Ghosh (1995). The reference prior for the ratio of two Poisson rates is also obtained. Simulation studies are done to compare different methods for constructing Bayesian credible intervals. It seems that if one is interested in making Bayesian inference on the product of different powers of k Poisson rates, the probability matching prior is the best. On the other hand, if we want to obtain point estimates, credibility intervals or do hypothesis testing for the ratio of two Poisson rates, the uniform prior should be used. In Chapter 5 the probability matching prior for a linear contrast of Poisson parameters is derived; this prior is extended in such a way that it is also the probability matching prior for the average of Poisson parameters. This research is an extension of the work done by Stamey & Hamilton (2006). A comparison is made between the confidence intervals obtained by Stamey & Hamilton (2006) and the intervals derived by us when using the Jeffreys and probability matching priors. A weighted Monte Carlo method is used for the computation of the Bayesian credible intervals, in the case of the probability matching prior. In the last section of this chapter hypothesis testing for two means is considered. The power and size of the test, using Bayesian methods, are compared to tests used by Krishnamoorthy & Thomson (2004). For the Bayesian methods the Jeffreys prior, probability matching prior and two other priors are used. Bayesian estimation for binomial rates from pooled samples is considered in Chapter 6, where the Jeffreys prior is used. Bayesian credibility intervals for a single proportion and the difference of two binomial proportions estimated from pooled samples are considered.
The results are compared to those from other methods. In Chapters 7 and 8, Bayesian process control for the p-chart and the c-chart is considered. The Jeffreys prior is used for the Bayesian methods. Control chart limits, average run lengths and false alarm rates are determined. The results from the Bayesian method are compared to the results obtained from the classical (frequentist) method. Bayesian tolerance intervals for the binomial and Poisson distributions are studied in Chapter 9, where the Jeffreys prior is used.

Item Open Access Modelling electricity demand in South Africa (University of the Free State, 2014-01) Sigauke, Caston; Chikobvu, Delson
English: Peak electricity demand is an energy policy concern for all countries throughout the world, causing blackouts and increasing electricity tariffs for consumers. This calls for load curtailment strategies to either redistribute or reduce electricity demand during peak periods. This thesis attempts to address this problem by providing relevant information through a frequentist and Bayesian modelling framework for daily peak electricity demand using South African data. The thesis is divided into two parts. The first part deals with modelling of short-term daily peak electricity demand. This is done through the investigation of important drivers of electricity demand using (i) piecewise linear regression models, (ii) a multivariate adaptive regression splines (MARS) modelling approach, (iii) a regression with seasonal autoregressive integrated moving average (Reg-SARIMA) model, and (iv) a Reg-SARIMA model with generalized autoregressive conditional heteroskedastic errors (Reg-SARIMA-GARCH). The second part of the thesis explores the use of extreme value theory in modelling winter peaks, extreme daily positive changes in hourly peak electricity demand and same-day-of-the-week increases in peak electricity demand. This is done through fitting the generalized Pareto, generalized single Pareto and generalized extreme value distributions. One of the major contributions of this thesis is the quantification of the amount of electricity which should be shifted to off-peak hours.
This is achieved through accurate assessment of the level and frequency of future extreme load forecasts. This modelling approach provides a policy framework for load curtailment and determination of the number of critical peak days for power utility companies. This has not been done for electricity demand in the context of South Africa, to the best of our knowledge. The thesis further extends the autoregressive moving average-exponential generalized autoregressive conditional heteroskedasticity model to an autoregressive moving average-exponential generalized autoregressive conditional heteroskedasticity-generalized single Pareto distribution. The benefit of this hybrid model is in risk modelling of under- and over-demand predictions of peak electricity demand. Some of the key findings of this thesis are that (i) peak electricity demand is influenced by the tails of probability distributions as well as by means or averages, (ii) electricity demand in South Africa rises significantly for average temperature values below 18°C and rises slightly for average temperature values above 22°C, and (iii) modelling under- and over-demand electricity forecasts provides a basis for risk assessment and quantification of the risk associated with forecasting uncertainty, including demand variability.

Item Open Access Hierarchical Bayesian modelling for the analysis of the lactation of dairy animals (University of the Free State, 2006-03) Lombaard (née Viljoen), Carolina Susanna; Groenewald, P. C. N.
English: This thesis was written with the aim of modelling the lactation process in dairy cows and goats by applying a hierarchical Bayesian approach. Information on cofactors that could possibly affect lactation is included in the model through a novel approach using covariates. Posterior distributions of quantities of interest are obtained by means of Markov chain Monte Carlo methods. Prediction of future lactation cycle(s) is also performed. In chapter one lactation is defined, its characteristics considered, the factors that could possibly influence lactation mentioned, and the reasons for modelling lactation explained. Chapter two provides a historical perspective on lactation models, and considers typical lactation curve shapes and curves fitted to the lactation composition traits fat and protein of milk. Attention is also paid to persistency of lactation. Chapter three considers alternative methods of obtaining total yield and producing Standard Lactation Curves (SLAC’s). Attention is paid to methods used in fitting lactation curves and the assumptions about the errors. In chapter four the generalised Bayesian model approach used to simultaneously model more than one lactation trait, while also incorporating information on cofactors that could possibly influence lactation, is developed. Special attention is paid not only to the model for complete data, but also to how modelling is adjusted to make provision for cases where not all lactation cycles have been observed for all animals, also referred to as incomplete data. The use of the Gibbs sampler and the Metropolis-Hastings algorithm in determining marginal posterior distributions of model parameters, and of quantities that are functions of such parameters, is also discussed. Prediction of future lactation cycles using the model is also considered.
In chapter five the Bayesian approach, together with the Wood model, applied to 4564 lactation cycles of 1141 Jersey cows, is used to illustrate the approach to modelling and prediction of milk yield, percentage of fat and percentage of protein in milk composition in the case of complete data. The incorporation of cofactor information through the use of the covariate matrix is also considered in greater detail. The results from the Gibbs sampler are evaluated and convergence thereof investigated. Attention is also paid to the expected lactation curve characteristics as defined by Wood, as well as to obtaining the expected lactation curve of one of the levels of a cofactor when the influence of the other cofactors on the lactation curve has been eliminated. Chapter six considers the use of the Bayesian approach together with the general exponential and 4-parameter Morant models, as well as an adaptation of a model suggested by Wilmink, in modelling and predicting milk yield, fat content and protein content of milk for the Jersey data. In chapter seven a diagnostic comparison, by means of Bayes factors, of the results from the four models in the preceding two chapters, when used together with the Bayesian approach, is performed. As a result, the adapted form of the Wilmink model fared best of the models considered. Chapter eight illustrates the use of the Bayesian approach, together with the four lactation models considered in this study, to predict the lactation traits for animals similar to, but not contained in, the data used to develop the respective models. In chapter nine the Bayesian approach, together with the Wood model, applied to 755 lactation cycles of 493 Saanen does collected during either or both of two consecutive years, is used to illustrate the approach to modelling and predicting milk yield, percentage of fat and percentage of protein in milk in the case of incomplete data. Chapter ten provides a summary of the results and a perspective on the contribution of this research to lactation modelling.

Item Open Access Bayesian non-linear models for the bactericidal activity of tuberculosis drugs (University of the Free State, 2015-05) Burger, Divan Aristo; Schall, R.; Van der Merwe, A. J.
Trials of the early bactericidal activity (EBA) of tuberculosis (TB) treatments assess the decline, during the first few days to weeks of treatment, in colony forming unit (CFU) count of Mycobacterium tuberculosis in the sputum of patients with smear-microscopy-positive pulmonary TB. Profiles over time of CFU data have conventionally been modeled using linear, bilinear or bi-exponential regression. This thesis proposes a new biphasic nonlinear regression model for CFU data that comprises linear and bilinear regression models as special cases, and is more flexible than bi-exponential regression models. A Bayesian nonlinear mixed effects (NLME) regression model is fitted jointly to the data of all patients from clinical trials, and statistical inference about the mean EBA of TB treatments is based on the Bayesian NLME regression model. The posterior predictive distribution of relevant slope parameters of the Bayesian NLME regression model provides insight into the nature of the EBA of TB treatments; specifically, the posterior predictive distribution allows one to judge whether treatments are associated with mono-linear or bilinear decline of log(CFU) count, and whether CFU count initially decreases fast, followed by a slower rate of decrease, or vice versa.
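For context, the conventional regression forms named above can be written in generic notation as follows; these are standard textbook forms, not the thesis's biphasic model, which nests the linear and bilinear cases as special cases.

```latex
% Bilinear (broken-stick) decline of log CFU with a change point kappa,
% and the bi-exponential alternative; generic notation throughout.
\[
  \text{bilinear:}\quad
  \mathrm{E}\bigl[\log_{10}\mathrm{CFU}(t)\bigr]
    = \gamma_0 + \gamma_1 t + (\gamma_2 - \gamma_1)\,(t - \kappa)_{+},
  \qquad (u)_{+} = \max(u, 0),
\]
\[
  \text{bi-exponential:}\quad
  \mathrm{E}\bigl[\mathrm{CFU}(t)\bigr] = A\,e^{-\alpha t} + B\,e^{-\beta t},
  \qquad A, B, \alpha, \beta > 0 .
\]
```

In the bilinear form the slope is gamma_1 before the change point and gamma_2 after it, which is what allows the posterior predictive distribution of the slope parameters to distinguish mono-linear from bilinear decline.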
The fit of alternative specifications of residuals, random effects and prior distributions is explored. In particular, the conventional normal regression models for log(CFU) count versus time profiles are extended to provide a robust approach which accommodates outliers and potential skewness in the data. The deviance information criterion and compound Laplace-Metropolis Bayes factors are calculated to discriminate between models. The biphasic model is fitted to time-to-positivity data in the same way as for CFU data.

Item Open Access Bayesian tolerance intervals for variance component models (University of the Free State, 2012-01) Hugo, Johan; Van der Merwe, A. J.
English: The improvement of quality has become a very important part of any manufacturing process. Since variation observed in a process is a function of the quality of the manufactured items, estimating variance components and tolerance intervals presents a method for evaluating process variation. As opposed to confidence intervals, which provide information concerning an unknown population parameter, tolerance intervals provide information on the entire population and therefore address the statistical problem of inference about quantiles and other contents of a probability distribution that is assumed to adequately describe a process. According to Wolfinger (1998), the three kinds of commonly used tolerance intervals are the (β, γ) tolerance interval (where β is the content and γ is the confidence), the β-expectation tolerance interval (where β is the expected coverage of the interval), and the fixed-in-advance tolerance interval, in which the interval is held fixed and the proportion of process measurements it contains is estimated. Wolfinger (1998) presented a simulation-based approach for determining Bayesian tolerance intervals in the case of the balanced one-way random effects model. In this thesis, the Bayesian simulation method for determining the three kinds of tolerance intervals as proposed by Wolfinger (1998) is applied to the estimation of tolerance intervals in a balanced univariate normal model, a balanced one-way random effects model with standard N(0, σ²ε) measurement errors, a balanced one-way random effects model with Student t-distributed measurement errors, and a balanced two-factor nested random effects model. The proposed models will be applied to data sets from a variety of fields, including flatness measurements measured on ceramic parts, measurements of the amount of active ingredient found in medicinal tablets manufactured in small batches, measurements of iron concentration in parts per million determined by emission spectroscopy, and a South African data set collected at SANS Fibres (Pty.) Ltd. concerned with measuring the percentage increase in length before breaking of continuous filament polyester. In addition, methods are proposed for comparing two or more quantiles in the case of the balanced univariate normal model. Also, the Bayesian simulation method proposed by Wolfinger (1998) for the balanced one-way random effects model will be extended to include the estimation of tolerance intervals for averages of observations from new or unknown batches. The Bayesian simulation method proposed for determining tolerance intervals for the balanced one-way random effects model with Student t-distributed measurement errors will also be used for the detection of possible outlying part measurements.
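The Bayesian simulation idea behind such intervals can be sketched for the simplest case mentioned above, the balanced univariate normal model: draw the parameters from their posterior, compute the content quantile for each draw, and take the appropriate quantile of those values as the one-sided (β, γ) bound. The sketch below uses the standard non-informative posterior for a single normal sample with invented data; it is illustrative only and not the thesis's variance-component models, whose priors and hierarchies are richer.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(100.0, 4.0, size=30)        # toy process measurements
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

beta_content, gamma_conf, n_draws = 0.95, 0.90, 20000

# Posterior draws under the usual non-informative prior p(mu, sigma^2) ~ 1/sigma^2:
# sigma^2 | y ~ Inv-Gamma((n-1)/2, (n-1)s^2/2), mu | sigma^2, y ~ N(ybar, sigma^2/n).
sigma2 = 1.0 / rng.gamma((n - 1) / 2.0, 2.0 / ((n - 1) * s2), size=n_draws)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# For each draw, the beta-content quantile of the data distribution; the one-sided
# (beta, gamma) upper tolerance bound is the gamma-quantile of these values.
upper_quantiles = mu + norm.ppf(beta_content) * np.sqrt(sigma2)
tol_bound = np.quantile(upper_quantiles, gamma_conf)
print(f"one-sided ({beta_content}, {gamma_conf}) upper tolerance bound: {tol_bound:.2f}")
```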
One of the main advantages of the proposed Bayesian approach is that it allows explicit use of prior information. The use of prior information for a Bayesian analysis is, however, widely criticized, since common non-informative prior distributions such as a Jeffreys prior can have an unexpected dramatic effect on the posterior distribution. In recognition of this problem, it will also be shown that the proposed non-informative prior distributions for the quantiles and content of fixed-in-advance tolerance intervals, in the cases of the univariate normal model, the proposed random effects model for averages of observations from new or unknown batches and the balanced two-factor nested random effects model, are reference priors (as proposed by Berger and Bernardo (1992c)) as well as probability matching priors (as proposed by Datta and Ghosh (1995)). The unique and flexible features of the Bayesian simulation method were illustrated, since all mentioned models performed well for the determination of tolerance intervals.

Item Open Access Extending the reach of sequential regression multiple imputation (University of the Free State, 2015-06) Von Maltitz, Michael Johan; Raghunathan, T. E.; Schall, R.; Van der Merwe, A. J.
English: The purpose of this thesis is twofold. Firstly, it reviews a significant portion of the literature concerning multiple imputation and, in particular, sequential regression multiple imputation, and summarises this information, thereby allowing a reader to gain in-depth knowledge of this research field. Secondly, the thesis delves into one particular novel topic in sequential regression multiple imputation. The latter objective, of course, is not truly possible without the former, since the deeper the review of multiple imputation, the more likely it will be to identify and solve pressing concerns in the sequential regression multiple imputation subfield. The literature review will show that there is room in imputation research for work on a robust model for the sequential regression multiple imputation algorithm. This thesis pays particular attention to this robust model, formulating its estimation procedure within the context of sequential regression multiple imputation of continuous data, attempting to discover a statistic that would show when to use the robust model over the regular Normal specification, and then implementing the robust model in another estimation algorithm that might allow for better imputation of ordinal data. This thesis contributes to 'extending the reach of sequential regression multiple imputation' in two ways. Firstly, it is my wish for users of public data sets, particularly in South Africa, to become familiar with the (now internationally standard) topics presented in the first half of this thesis. The only way to start publicising sequential regression multiple imputation in South Africa is to lay out the evidence for and against this procedure in a logical manner, so that any reader of this thesis might be able to understand the procedures for analysing multiply imputed data, or tackle one of the many research problems uncovered in this text. In this way, this thesis will extend the reach of sequential regression multiple imputation to many more South African researchers.
Secondly, by working on a new robust model for use in the sequential regression multiple imputation algorithm, this thesis strengthens the sequential regression multiple imputation algorithm by extending its reach to incomplete data that is not necessarily Normally distributed, be it due to heavy tails, or inherent skewness, or both.
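To make the "sequential regression" idea concrete: each incomplete variable is regressed in turn on all the others, its missing entries are refilled with draws from the fitted predictive distribution, and the cycle is repeated; doing this several times yields multiple completed data sets. The sketch below is a bare-bones Gaussian version with made-up data and an illustrative helper name (impute_once); it is not the robust or ordinal models developed in the thesis, and a full implementation would also draw the regression coefficients and variance from their posterior at each cycle and use regressions appropriate to non-continuous variables.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: three correlated continuous variables with values missing at random.
n = 200
X = rng.multivariate_normal([0, 0, 0], [[1, .5, .3], [.5, 1, .4], [.3, .4, 1]], size=n)
mask = rng.random(X.shape) < 0.15          # True where a value is missing
X_obs = np.where(mask, np.nan, X)

def impute_once(X_obs, mask, n_cycles=10):
    """One completed data set via sequential (chained) Gaussian regressions."""
    Xc = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)   # initial fill: column means
    for _ in range(n_cycles):
        for j in range(Xc.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(Xc, j, axis=1)
            A = np.column_stack([np.ones(len(Xc)), others])
            # Least-squares fit of column j on the other columns (observed rows only)
            coef, *_ = np.linalg.lstsq(A[~miss], Xc[~miss, j], rcond=None)
            resid = Xc[~miss, j] - A[~miss] @ coef
            sigma = resid.std(ddof=A.shape[1])
            # Refill the missing entries with draws from the predictive distribution
            Xc[miss, j] = A[miss] @ coef + rng.normal(0.0, sigma, size=miss.sum())
    return Xc

completed = [impute_once(X_obs, mask) for _ in range(5)]    # five multiple imputations
print(np.mean([Xc.mean(axis=0) for Xc in completed], axis=0))
```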