Mathematical Statistics and Actuarial Science
Browsing Mathematical Statistics and Actuarial Science by Issue Date
Now showing 1 - 20 of 26
Item Open Access Parametric and nonparametric Bayesian statistical inference in animal science (University of the Free State, 2000-11) Pretorius, Albertus Lodewikus; Van der Merwe, Abrie J.
Chapter 1 illustrated an extension of the Gibbs sampler to solve problems arising in animal breeding theory. Formulae were derived and presented to implement the Gibbs sampler, whereafter marginal densities, posterior means, modes and credibility intervals were obtained from the Gibbs sampler. In the Bayesian Method of Moments chapter we illustrated how this approach, based on a few relatively weak assumptions, is used to obtain maximum entropy densities, realized error terms and future values of the parameters for the mixed linear model. Given the data, it enables researchers to compute post-data densities for parameters and future observations when the form of the likelihood function is unknown. On introducing and proving simple assumptions relating to the moments of the realized error terms and the future, as yet unobserved, error terms, we derived post-data moments of parameters and future values of the dependent variable. Using these moments as side conditions, proper maxent densities for the model parameters were derived and could easily be computed. It was also shown that in the computed example, where use was made of the Gibbs sampler to compute finite sample post-data parameter densities, some BMOM maxent densities were very similar to the traditional Bayesian densities, whilst others were not. It should be appreciated that the BMOM approach yielded useful inverse inferences without using assumed likelihood functions, prior densities for their parameters and Bayes' theorem; moreover, the BMOM techniques extended in the present thesis to the mixed linear model provided valuable and significant solutions in applying traditional likelihood or Bayesian analysis to animal breeding problems. The important contribution of Chapters 3 and 4 revolved around the nonparametric modelling of the random effects. We applied a general technique for Bayesian nonparametrics to this important class of models, the mixed linear model for animal breeding experiments. Our technique involved specifying a nonparametric prior for the distribution of the random effects and a Dirichlet process prior on the space of prior distributions for that nonparametric prior. The mixed linear model was then fitted with a Gibbs sampler, which turned an analytically intractable multidimensional integration problem into a feasible numerical one, overcoming most of the computational difficulties usually experienced with the Dirichlet process. This proposed procedure also represented a new application of the mixture of Dirichlet process model to problems arising from animal breeding experiments. The application to and discussion of the breeding experiment from Kenya was helpful for understanding the importance and utility of the Dirichlet process, and inference for all the mixed linear model parameters. However, as mentioned before, a substantial statistical issue that still remains to be tackled is the great discrepancy between resulting posterior densities of the random effects as the value of the precision parameter M changes. We believe that Bayesian nonparametrics have much to offer, and can be applied to a wide range of statistical procedures. In addition to the Dirichlet process prior, we will look in the future at other nonparametric priors such as Pólya tree priors and Bernoulli trips.
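As a rough illustration of the kind of Gibbs sampler referred to above, the sketch below cycles through the full conditionals of a balanced one-way random effects model y_ij = mu + a_i + e_ij. It is not the animal-breeding model of the thesis: the simulated data, the flat prior on mu and the vague inverse-gamma priors on the variance components are all assumptions made purely for the example.

```python
# Minimal sketch (assumed model and priors): Gibbs sampler for a balanced
# one-way random effects model y_ij = mu + a_i + e_ij, with
# a_i ~ N(0, s2a) and e_ij ~ N(0, s2e).
import numpy as np

rng = np.random.default_rng(1)
k, n = 20, 10                                   # groups (e.g. sires), records per group
true_a = rng.normal(0.0, 1.0, size=k)
y = 5.0 + true_a[:, None] + rng.normal(0.0, 2.0, size=(k, n))

mu, s2a, s2e = y.mean(), 1.0, 1.0
a = np.zeros(k)
draws = []
for it in range(5000):
    # mu | rest: normal, centred on the mean of the group-effect-adjusted data
    mu = rng.normal((y - a[:, None]).mean(), np.sqrt(s2e / y.size))
    # a_i | rest: precision-weighted combination of data and prior
    prec = n / s2e + 1.0 / s2a
    mean = (n / s2e) * (y.mean(axis=1) - mu) / prec
    a = rng.normal(mean, np.sqrt(1.0 / prec))
    # variance components | rest: inverse-gamma draws via 1/Gamma
    s2a = 1.0 / rng.gamma(0.001 + k / 2, 1.0 / (0.001 + 0.5 * np.sum(a ** 2)))
    resid = y - mu - a[:, None]
    s2e = 1.0 / rng.gamma(0.001 + y.size / 2, 1.0 / (0.001 + 0.5 * np.sum(resid ** 2)))
    if it >= 1000:                              # discard burn-in
        draws.append((mu, s2a, s2e))

post = np.array(draws)
print("posterior means (mu, s2a, s2e):", post.mean(axis=0).round(3))
```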
Whilst our feeling in the final chapter was that study of the performance of non-informative priors was certainly to be encouraged, we found the group reference priors to be generally highly satisfactory, and felt reasonably confident in using them in situations in which further study was impossible. Results from the different theorems showed that the group orderings of the mixed model parameters are very important, since different orderings will frequently result in different reference priors. This dependence of the reference prior on the groups chosen and their ordering was unavoidable. Our motivation and idea for the reference prior was basically to choose the prior which, in a certain asymptotic sense, maximized the information in the posterior that was provided by the data. The thesis has surveyed a range of current research in the area of Bayesian parametric and nonparametric inference in animal science. The work is ongoing and several problems remain unresolved. In particular, more work is required in the following areas: a full Bayesian nonparametric analysis involving covariate information; multivariate priors based on stochastic processes; multivariate error models involving Pólya trees; developing exchangeable processes to cover a larger class of problems; and nonparametric sensitivity analysis.

Item Open Access Aspects of Bayesian change-point analysis (University of the Free State, 2000-11) Schoeman, Anita Carina; Groenewald, P. C. N.
English: In chapter one we looked at the nature of structural change and defined structural change as a change in one or more parameters of the model in question. Bayesian procedures can be applied to solve inferential problems of structural change. Among the various methodological approaches within Bayesian inference, emphasis is put on the analysis of the posterior distribution itself, since the posterior distribution can be used for conducting hypothesis testing as well as obtaining a point estimate. The history of structural change in statistics, beginning in the early 1950s, is also discussed. Furthermore, the Bayesian approach to hypothesis testing was developed by Jeffreys (1935, 1961), where the centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. According to Kass and Raftery (1993), posterior odds = Bayes factor x prior odds, and the Bayes factor is the ratio of the posterior odds of H1 to its prior odds, regardless of the value of the prior odds. The intrinsic and fractional Bayes factors are defined and some advantages and disadvantages of the IBFs are discussed. In chapter two changes in the multivariate normal model are considered. Assuming that a change has taken place, one will want to be able to detect the change and to estimate its position as well as the other parameters of the model. To do a Bayesian analysis, prior densities should be chosen. Firstly the hyperparameters are assumed known, but as this is not usually true, vague improper priors are used (while the number of change-points is fixed). Another way of dealing with the problem of unknown hyperparameters is to use a hierarchical model where the second stage priors are vague. We also considered Gibbs sampling and gave the full conditional distributions for all the cases.
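A minimal sketch of the kind of change-point calculation described above: the posterior of a single change in the mean of normal data with known variance. It is purely illustrative (simulated data, flat priors on the two segment means, a uniform prior on the change position), not the multivariate or hierarchical analyses of the thesis; because every candidate change position carries the same two flat priors, the improper normalising constants cancel when the posterior over the position is formed.

```python
# Minimal sketch (assumed data and priors): posterior of one change-point k
# in the mean of normal observations with known variance sigma2.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0
y = np.concatenate([rng.normal(0.0, 1.0, 40), rng.normal(1.5, 1.0, 60)])
n = len(y)

def log_marginal(seg):
    """log of the normal likelihood integrated over a flat prior on the segment mean."""
    m = len(seg)
    ss = np.sum((seg - seg.mean()) ** 2)
    return -0.5 * (m - 1) * np.log(2 * np.pi * sigma2) - 0.5 * np.log(m) - ss / (2 * sigma2)

ks = np.arange(1, n)                     # change occurs after observation k
logp = np.array([log_marginal(y[:k]) + log_marginal(y[k:]) for k in ks])
post = np.exp(logp - logp.max())
post /= post.sum()                       # uniform prior on k
print("posterior mode of the change-point:", ks[post.argmax()],
      "with posterior probability", round(post.max(), 3))
```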
The three cases that are studied are (1) a change in the mean with known or unknown variance, (2) a change in the mean and variance, firstly using independent prior densities on the different variances and secondly assuming the variances to be proportional, and (3) a change in the variance. The same models above are also considered when the number of change-points is unknown. In this case vague priors are not appropriate when comparing models of different dimensions, so we revert to partial Bayes factors, specifically the intrinsic and fractional Bayes factors, to obtain the posterior probabilities of the number of change-points. Furthermore we look at component analysis, i.e. determining which components of a multivariate variable are mostly responsible for the changes in the parameters. The univariate case is then also considered in more detail, including multiple model comparisons and models with autocorrelated errors. A summary of approaches in the literature as well as four examples are included. In chapter three changes in the linear model, with (1) a change in the regression coefficient and a constant variance, (2) a change in only the variance and (3) a change in the regression coefficient and the variance, are considered. Bayes factors for the above-mentioned cases, multiple change-points, component analysis, switchpoints (continuous change-points) and autocorrelation are included, together with seven examples. In chapter four changes in some other standard models are considered. Bernoulli-type experiments include the Binomial model, the Negative binomial model, the Multinomial model and the Markov chain model. Exponential-type models include the Poisson model, the Gamma model and the Exponential model. Special cases of the Exponential model include the left truncated exponential model and the Exponential model with epidemic change. In all cases the partial Bayes factor is used to obtain posterior probabilities when the number of change-points is unknown. Marginal posterior densities of all parameters under the change-point model are derived. Eleven examples are included. In chapter five change-points in the hazard rate are studied. This includes an abrupt change in a constant hazard rate as well as a change from a decreasing hazard rate to a constant hazard rate, or a change from a constant hazard rate to an increasing hazard rate. These hazard rates are obtained from combinations of Exponential and Weibull density functions. In the same way a bathtub hazard rate can also be constructed. Two illustrations are given. Some concluding remarks are made in chapter six, with discussions of other approaches in the literature and other possible applications not dealt with in this study.

Item Open Access A Bayesian analysis of multiple interval-censored failure time events with application to AIDS data (University of the Free State, 2003-05) Mokgatlhe, Lucky; Groenewald, C. N.; De Waal, Daniel J.
English: The measure of time to event (failure) for units on longitudinal clinical visits cannot always be ascertained exactly. Instead only time intervals within which the event occurred may be recorded. That being the case, each unit's failure will be described by a single interval, resulting in grouped interval data over the sample. Yet, due to non-compliance with visits by some units, failure will be described by endpoints within which the event has occurred. These endpoints may encompass several intervals, hence overlapping intervals across units.
Furthermore, some units may not realize the event of interest within the preset duration of the study, and hence are censored. Finally, several events of interest can be investigated on a single unit, resulting in several failure times that inevitably are dependent. All these prescribe interval-censored survival data with multiple failure times. Three models for analysing interval-censored survival data with two failure times were applied to four sets of data. For the distribution-free methods, Cox's hazard with either a log-log transform or a logit transform on the baseline conditional survival probabilities was used to derive the likelihood. The independence assumption model (IW) works under the assumption that the lifetimes are independent and any dependence exists through the use of common covariates. The second model, which does not necessarily assume independence, computes the joint failure probabilities for two lifetimes by Bayes' rule, conditioning on the interval of failure for one lifetime, hence the conditional bivariate model (CB). The use of Clayton and Farlie-Morgenstern bivariate copulas (CC) with an inbuilt dependence parameter was the other model. For the parametric models the IW and CC methods were applied to the data sets on the assumption that the marginal distribution of the lifetimes is Weibull. The traditional classical estimation method of Newton-Raphson was used to find optimum parameter estimates, and their variances were stabilized using a sandwich estimator, where possible. Bayesian methods combine the data with prior information. Thus, for either transform, two proper priors were derived, whose combination with the likelihood resulted in a posterior function. To estimate the entire distribution of a parameter from non-standard posterior functions, two Markov chain Monte Carlo (MCMC) methods were used. The Gibbs sampler method samples, in turn, observations from the conditional distribution of each parameter in question, while holding the other parameters constant. For intractably complex posterior functions, the Metropolis-Hastings method of sampling vectors of parameter values in blocks from a multivariate normal proposal density was used. The analysis of the ACTG 175 data revealed that increases in levels of HIV RNA precede declines in CD4 cell counts. There is a strong dependence between the two failure times, restricting the use of the independence model. The most preferred models are the copula and conditional bivariate models. It was shown that ARVs actually improve a patient's lifetime at varying rates, with combination treatment performing better. The worrying issue is the resistance that the HIV virus develops against the drugs. This is evidenced by the adverse effect the previous use of ARVs has on patients, in that a new drug used on them has less effect. Finally, it is important that patients start therapy at an early stage, since patients displaying signs of AIDS at entry respond negatively to drugs.
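To make the copula construction above concrete, the sketch below writes the likelihood contribution of one subject whose two lifetimes are each observed only up to an interval, using Weibull marginal survival functions joined by a Clayton copula. The parameter values and the use of Weibull marginals are assumptions for illustration only, not the estimates or exact parameterisation of the thesis.

```python
# Minimal sketch (assumed parameters): interval-censored likelihood contribution
# for two dependent lifetimes with Weibull margins and a Clayton survival copula.
import numpy as np

def weibull_surv(t, shape, scale):
    return np.exp(-(t / scale) ** shape)

def clayton_joint_surv(s1, s2, theta):
    # Clayton copula applied to the two marginal survival probabilities (theta > 0)
    return (s1 ** (-theta) + s2 ** (-theta) - 1.0) ** (-1.0 / theta)

def interval_loglik(L1, R1, L2, R2, pars):
    """log P(L1 < T1 <= R1, L2 < T2 <= R2) for finite interval endpoints."""
    sh1, sc1, sh2, sc2, theta = pars
    S = lambda a, b: clayton_joint_surv(weibull_surv(a, sh1, sc1),
                                        weibull_surv(b, sh2, sc2), theta)
    # rectangle probability by inclusion-exclusion on the joint survival function
    p = S(L1, L2) - S(R1, L2) - S(L1, R2) + S(R1, R2)
    return np.log(max(p, 1e-300))

# one subject: T1 observed in (2, 4] visit-intervals, T2 observed in (3, 6]
print(interval_loglik(2.0, 4.0, 3.0, 6.0, (1.2, 5.0, 0.9, 10.0, 1.5)))
```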
Item Open Access Hierarchical Bayesian modelling for the analysis of the lactation of dairy animals (University of the Free State, 2006-03) Lombaard (née Viljoen), Carolina Susanna; Groenewald, P. C. N.
English: This thesis was written with the aim of modelling the lactation process in dairy cows and goats by applying a hierarchical Bayesian approach. Information on cofactors that could possibly affect lactation is included in the model through a novel approach using covariates. Posterior distributions of quantities of interest are obtained by means of Markov chain Monte Carlo methods. Prediction of future lactation cycle(s) is also performed. In chapter one lactation is defined, its characteristics considered, the factors that could possibly influence lactation mentioned, and the reasons for modelling lactation explained. Chapter two provides a historical perspective on lactation models, considers typical lactation curve shapes, and considers curves fitted to the lactation composition traits fat and protein of milk. Attention is also paid to persistency of lactation. Chapter three considers alternative methods of obtaining total yield and producing Standard Lactation Curves (SLACs). Attention is paid to methods used in fitting lactation curves and the assumptions about the errors. In chapter four the generalised Bayesian model approach used to simultaneously model more than one lactation trait, while also incorporating information on cofactors that could possibly influence lactation, is developed. Special attention is paid not only to the model for complete data, but also to how modelling is adjusted to make provision for cases where not all lactation cycles have been observed for all animals, also referred to as incomplete data. The use of the Gibbs sampler and the Metropolis-Hastings algorithm in determining marginal posterior distributions of model parameters, and of quantities that are functions of such parameters, is also discussed. Prediction of future lactation cycles using the model is also considered. In chapter five the Bayesian approach together with the Wood model, applied to 4564 lactation cycles of 1141 Jersey cows, is used to illustrate the approach to modelling and prediction of milk yield, percentage of fat and percentage of protein in milk in the case of complete data. The incorporation of cofactor information through the use of the covariate matrix is also considered in greater detail. The results from the Gibbs sampler are evaluated and convergence thereof investigated. Attention is also paid to the expected lactation curve characteristics as defined by Wood, as well as to obtaining the expected lactation curve of one of the levels of a cofactor when the influence of the other cofactors on the lactation curve has been eliminated. Chapter six considers the use of the Bayesian approach together with the general exponential and 4-parameter Morant models, as well as an adaptation of a model suggested by Wilmink, in modelling and predicting milk yield, fat content and protein content of milk for the Jersey data. In chapter seven a diagnostic comparison, by means of Bayes factors, of the results from the four models in the preceding two chapters, when used together with the Bayesian approach, is performed. As a result the adapted form of the Wilmink model fared best of the models considered. Chapter eight illustrates the use of the Bayesian approach, together with the four lactation models considered in this study, to predict the lactation traits for animals similar to, but not contained in, the data used to develop the respective models. In chapter nine the Bayesian approach together with the Wood model, applied to 755 lactation cycles of 493 Saanen does collected during either or both of two consecutive years, is used to illustrate the approach to modelling and predicting milk yield, percentage of fat and percentage of protein in milk in the case of incomplete data. Chapter ten provides a summary of the results and a perspective on the contribution of this research to lactation modelling.
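For reference, the Wood model referred to above is the gamma-type lactation curve y(t) = a * t^b * exp(-c*t). The sketch below fits it to simulated test-day yields by ordinary least squares on the log scale; the data, the weekly time grid and the plain least-squares fit are assumptions for illustration and stand in for the hierarchical Bayesian fit actually used in the thesis.

```python
# Minimal sketch (assumed data, not the Jersey or Saanen data): fitting the
# Wood lactation curve y(t) = a * t**b * exp(-c*t) via its log-linear form
# ln y = ln a + b ln t - c t.
import numpy as np

t = np.arange(1, 44, dtype=float)                    # weeks in milk (assumed)
true_a, true_b, true_c = 16.0, 0.25, 0.04
rng = np.random.default_rng(2)
y = true_a * t ** true_b * np.exp(-true_c * t) * np.exp(rng.normal(0, 0.05, t.size))

X = np.column_stack([np.ones_like(t), np.log(t), -t])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
a_hat, b_hat, c_hat = np.exp(coef[0]), coef[1], coef[2]

peak_week = b_hat / c_hat                            # time at which dy/dt = 0
peak_yield = a_hat * peak_week ** b_hat * np.exp(-b_hat)
print(f"a={a_hat:.2f}, b={b_hat:.3f}, c={c_hat:.4f}, "
      f"peak at week {peak_week:.1f}, peak yield {peak_yield:.1f}")
```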
Item Open Access On the use of extreme value theory in energy markets (University of the Free State, 2007-05-09) Micali, V.; De Waal, D.
English: The intent of the thesis is to provide a set of statistical methodologies in the field of Extreme Value Theory (EVT), with a particular application to energy losses, in gigawatt-hours (GWh), experienced by electrical generating units (GUs). Due to the complexity of the energy market, the thesis focuses on the volume loss only and does not expand into the price, cost or mixes thereof (although the strong relationship between volume and price is acknowledged, and some initial work on the energy price [SMP] is provided in Appendix B). Hence, occurrences of excessive unexpected energy losses incurred by these GUs formulate the problem. Exploratory Data Analysis (EDA) structures the data and attempts to give an indication of the categorisation of the excessive losses. The size of the GU failure is also investigated from an aggregated perspective to relate it to the Generation System. Here the effect of concomitant variables (such as the Load Factor imposed by the market) is emphasised. Cluster Analysis (2-Way Joining) provided an initial categorising technique. EDA highlights the lack of a scientific approach to determining when a large loss is sufficiently large that it affects the System. The usage of EVT shows that the GWh losses tend to behave as a variable in the Fréchet domain of attraction. The Block Maxima (BM) and Peak-Over-Threshold (POT) methods, the latter in both semi-parametric and fully parametric form, are investigated. The POT methodologies are both applicable. Of particular interest are the Q-Q plot results for the semi-parametric POT method, which fit the data satisfactorily (pp. 55-56). The Generalised Pareto Distribution (GPD) models the tail of the GWh losses above a threshold well under the fully parametric POT method. Different methodologies were explored in determining the parameters of the GPD. The method of 3-LM (linear combinations of Probability Weighted Moments) is used to arrive at initial estimates of the GPD parameters. A GPD is finally parameterised for the GWh losses above 766 GWh. The Bayesian philosophy is also utilised in this thesis, as it provides a predictive distribution of the large GWh losses (high quantiles). In this part of the thesis the ratio of the Mean Excess Function (the expectation of a loss above a certain threshold) to its probability of exceeding the threshold is utilised as an indicator, and the minimum of this ratio is established. The technique was developed for the GPD by using the Fisher Information Matrix (FIM) and the Delta-Method. Prediction of high quantiles was done by using Markov Chain Monte Carlo (MCMC) and eliciting the GPD Maximal Data Information (MDI) prior. The last EVT methodology investigated in the thesis is the one that uses the Dirichlet process and the method of Negative Differential Entropy (NDE). The thesis also opened new areas of pertinent research.
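A minimal sketch of the peaks-over-threshold workflow described above, using simulated heavy-tailed losses rather than the GWh data of the thesis: empirical mean excesses at candidate thresholds, a maximum likelihood GPD fit to the excesses, and a high quantile from the fitted tail. The threshold choice and the toy data are assumptions for illustration.

```python
# Minimal sketch (simulated losses): GPD fit to excesses over a threshold and a
# POT estimate of a high quantile.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(3)
losses = rng.pareto(2.5, 5000) * 300.0          # heavy-tailed toy "losses"

def mean_excess(x, u):
    exc = x[x > u] - u
    return exc.mean() if exc.size else np.nan

thresholds = np.quantile(losses, [0.80, 0.90, 0.95, 0.975])
print("mean excess at candidate thresholds:",
      [round(mean_excess(losses, u), 1) for u in thresholds])

u = np.quantile(losses, 0.95)                   # chosen threshold (assumption)
exc = losses[losses > u] - u
xi, loc, sigma = genpareto.fit(exc, floc=0.0)   # ML fit of the GPD to the excesses

# POT quantile exceeded with probability p (formula valid for xi != 0)
p, n_u, n = 0.001, exc.size, losses.size
q_hat = u + (sigma / xi) * (((n_u / n) / p) ** xi - 1.0)
print(f"xi={xi:.2f}, sigma={sigma:.1f}, estimated 99.9% quantile = {q_hat:.0f}")
```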
Item Open Access Actuarial risk management of investment guarantees in life insurance (University of the Free State, 2010-11) Bekker, Kobus Nel; Dhaene, Jan; Finkelstein, Maxim
Investment guarantees in life insurance business have generated a lot of research in recent years due to the earlier mispricing of such products. These guarantees generally take the form of exotic options and are therefore difficult to price analytically, even in a simplified setting. A possible solution to the risk management problem of investment guarantees contingent on death and survival is proposed through the use of a conditional lower bound approximation of the corresponding embedded option value. The derivation of the conditional lower bound approximation is outlined in the case of regular premiums with asset-based charges, and the implementation is illustrated in a Black-Scholes-Merton setting. The derived conditional lower bound approximation also facilitates verifying economic scenario generator based pricing and valuation, as well as sensitivity measures for hedging solutions.

Item Open Access Bayesian inference for linear and nonlinear functions of Poisson and binomial rates (University of the Free State, 2012) Raubenheimer, Lizanne; Van der Merwe, A. J.
This thesis focuses on objective Bayesian statistics, by evaluating a number of noninformative priors. Choosing the prior distribution is the key to Bayesian inference. The probability matching prior for the product of different powers of k binomial parameters is derived in Chapter 2. In the case of two and three independently distributed binomial variables, the Jeffreys, uniform and probability matching priors for the product of the parameters are compared. This research is an extension of the work by Kim (2006), who derived the probability matching prior for the product of k independent Poisson rates. In Chapter 3 we derive the probability matching prior for a linear combination of binomial parameters. The construction of Bayesian credible intervals for the difference of two independent binomial parameters is discussed. The probability matching prior for the product of different powers of k Poisson rates is derived in Chapter 4. This is achieved by using the differential equation procedure of Datta & Ghosh (1995). The reference prior for the ratio of two Poisson rates is also obtained. Simulation studies are done to compare different methods for constructing Bayesian credible intervals. It seems that if one is interested in making Bayesian inference on the product of different powers of k Poisson rates, the probability matching prior is the best. On the other hand, if we want to obtain point estimates, credibility intervals or do hypothesis testing for the ratio of two Poisson rates, the uniform prior should be used. In Chapter 5 the probability matching prior for a linear contrast of Poisson parameters is derived; this prior is extended in such a way that it is also the probability matching prior for the average of Poisson parameters. This research is an extension of the work done by Stamey & Hamilton (2006). A comparison is made between the confidence intervals obtained by Stamey & Hamilton (2006) and the intervals derived by us when using the Jeffreys and probability matching priors. A weighted Monte Carlo method is used for the computation of the Bayesian credible intervals in the case of the probability matching prior. In the last section of this chapter hypothesis testing for two means is considered. The power and size of the test, using Bayesian methods, are compared to tests used by Krishnamoorthy & Thomson (2004). For the Bayesian methods the Jeffreys prior, probability matching prior and two other priors are used. Bayesian estimation for binomial rates from pooled samples is considered in Chapter 6, where the Jeffreys prior is used.
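The Poisson-rate comparisons mentioned above lend themselves to a simple simulation. The sketch below draws from the posteriors of two Poisson rates under a Gamma(x + a, t) posterior, which corresponds to a = 0.5 for the Jeffreys prior and a = 1 for the uniform prior on the rate, and forms an equal-tail credible interval for the ratio of the rates. The counts and exposures are assumed toy values, and this is not the probability matching derivation of the thesis.

```python
# Minimal sketch (assumed data): simulation-based credible interval for the
# ratio of two Poisson rates under Jeffreys and uniform priors.
import numpy as np

rng = np.random.default_rng(4)
x1, t1 = 18, 200.0       # events and exposure, group 1 (assumed)
x2, t2 = 9, 150.0        # events and exposure, group 2 (assumed)

def ratio_interval(a, draws=200_000, level=0.95):
    lam1 = rng.gamma(x1 + a, 1.0 / t1, draws)     # numpy gamma uses scale = 1/rate
    lam2 = rng.gamma(x2 + a, 1.0 / t2, draws)
    r = lam1 / lam2
    lo, hi = np.quantile(r, [(1 - level) / 2, 1 - (1 - level) / 2])
    return np.median(r), lo, hi

for name, a in [("Jeffreys (a=0.5)", 0.5), ("uniform (a=1.0)", 1.0)]:
    med, lo, hi = ratio_interval(a)
    print(f"{name}: posterior median {med:.2f}, 95% equal-tail interval ({lo:.2f}, {hi:.2f})")
```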
Bayesian credibility intervals for a single proportion and the difference of two binomial proportions estimated from pooled samples are considered. The results are compared to those from other methods. In Chapters 7 and 8, Bayesian process control for the p-chart and the c-chart is considered. The Jeffreys prior is used for the Bayesian methods. Control chart limits, average run lengths and false alarm rates are determined. The results from the Bayesian method are compared to the results obtained from the classical (frequentist) method.
Bayesian tolerance intervals for the binomial and Poisson distributions are studied in Chapter 9, where the Jeffreys prior is used.

Item Open Access Bayesian tolerance intervals for variance component models (University of the Free State, 2012-01) Hugo, Johan; Van der Merwe, A. J.
English: The improvement of quality has become a very important part of any manufacturing process. Since variation observed in a process is a function of the quality of the manufactured items, estimating variance components and tolerance intervals presents a method for evaluating process variation. As opposed to confidence intervals, which provide information concerning an unknown population parameter, tolerance intervals provide information on the entire population, and therefore address the statistical problem of inference about quantiles and other contents of a probability distribution that is assumed to adequately describe a process. According to Wolfinger (1998), the three kinds of commonly used tolerance intervals are the (β, γ) tolerance interval (where β is the content and γ is the confidence), the β-expectation tolerance interval (where β is the expected coverage of the interval), and the fixed-in-advance tolerance interval, in which the interval is held fixed and the proportion of process measurements it contains is estimated. Wolfinger (1998) presented a simulation-based approach for determining Bayesian tolerance intervals in the case of the balanced one-way random effects model. In this thesis, the Bayesian simulation method for determining the three kinds of tolerance intervals as proposed by Wolfinger (1998) is applied to the estimation of tolerance intervals in a balanced univariate normal model, a balanced one-way random effects model with standard N(0, σ²ε) measurement errors, a balanced one-way random effects model with Student t-distributed measurement errors, and a balanced two-factor nested random effects model. The proposed models are applied to data sets from a variety of fields, including flatness measurements measured on ceramic parts, measurements of the amount of active ingredient found in medicinal tablets manufactured in small batches, measurements of iron concentration in parts per million determined by emission spectroscopy, and a South African data set collected at SANS Fibres (Pty.) Ltd. concerned with measuring the percentage increase in length before breaking of continuous filament polyester. In addition, methods are proposed for comparing two or more quantiles in the case of the balanced univariate normal model. Also, the Bayesian simulation method proposed by Wolfinger (1998) for the balanced one-way random effects model is extended to include the estimation of tolerance intervals for averages of observations from new or unknown batches. The Bayesian simulation method proposed for determining tolerance intervals for the balanced one-way random effects model with Student t-distributed measurement errors is also used for the detection of possible outlying part measurements. One of the main advantages of the proposed Bayesian approach is that it allows explicit use of prior information. The use of prior information in a Bayesian analysis is, however, widely criticized, since common non-informative prior distributions such as a Jeffreys prior can have an unexpected dramatic effect on the posterior distribution.
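A minimal sketch of the Wolfinger-type simulation approach referred to above, using the simple i.i.d. normal case rather than the variance component models of the thesis: posterior draws of (mu, sigma) under a Jeffreys-type prior are converted into the three kinds of tolerance intervals. The data, the prior and the content and confidence levels are assumptions made for the example.

```python
# Minimal sketch (assumed i.i.d. normal data, prior p(mu, sigma^2) ~ 1/sigma^2):
# the three kinds of Bayesian tolerance intervals from posterior simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = rng.normal(100.0, 4.0, size=30)            # assumed process measurements
n, ybar, ss = y.size, y.mean(), np.sum((y - y.mean()) ** 2)

draws = 100_000
sigma2 = ss / (2.0 * rng.gamma((n - 1) / 2.0, 1.0, draws))   # inverse-gamma draws
mu = rng.normal(ybar, np.sqrt(sigma2 / n))
sigma = np.sqrt(sigma2)

beta, gamma = 0.95, 0.90
# (beta, gamma) upper limit: gamma-quantile of the posterior of the beta-quantile
q_beta = mu + stats.norm.ppf(beta) * sigma
limit_bg = np.quantile(q_beta, gamma)
# beta-expectation upper limit: beta-quantile of the posterior predictive distribution
y_pred = rng.normal(mu, sigma)
limit_be = np.quantile(y_pred, beta)
# fixed-in-advance interval: posterior mean content of a pre-specified (L, U)
L, U = 92.0, 108.0
content = np.mean(stats.norm.cdf((U - mu) / sigma) - stats.norm.cdf((L - mu) / sigma))

print(f"(0.95, 0.90) upper limit: {limit_bg:.2f}")
print(f"0.95-expectation upper limit: {limit_be:.2f}")
print(f"estimated content of fixed interval ({L}, {U}): {content:.3f}")
```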
In recognition of this problem, it will also be shown that the proposed non-informative prior distributions for the quantiles and content of fixed-in-advance tolerance intervals, in the cases of the univariate normal model, the proposed random effects model for averages of observations from new or unknown batches, and the balanced two-factor nested random effects model, are reference priors (as proposed by Berger and Bernardo (1992c)) as well as probability matching priors (as proposed by Datta and Ghosh (1995)). The unique and flexible features of the Bayesian simulation method were illustrated, since all mentioned models performed well for the determination of tolerance intervals.

Item Open Access Performance of first-year accounting students: does time perspective matter? (University of the Free State, 2013) Joubert, Hanli; Viljoen, Marianne; Schall, Robert
English: Academic failure of first-year accounting students is a national and international problem. Existing research is inconclusive regarding the causes for the failure and does not make provision for the possible influence of dominant time perspectives on performance in accounting. This article investigates whether time perspective has an effect on the performance of first-year accounting students. A quantitative non-experimental predictive multivariate design is used and confounding variables are taken into consideration. The results of the study indicate significant relationships between performance in first-year accounting and gender, age and a past-negative time perspective. The most significant result of this study is that a past-negative time perspective, together with an unfavourable psychosocial background, might have led to failure in first-year accounting. It is suggested that students with a negative time perspective be identified and encouraged to participate in support programmes at the university.

Item Open Access Regularised iterative multiple correspondence analysis in multiple imputation (University of the Free State, 2013-07) Nienkemper, Johané; Von Maltitz, M. J.
English: Non-responses in survey data are a prevalent problem. Various techniques for the handling of missing data have been studied and published. The application of a regularised iterative multiple correspondence analysis (RIMCA) algorithm in single imputation (SI) has been suggested for the handling of missing data in survey analysis. Multiple correspondence analysis (MCA) as an imputation procedure is appropriate for survey data, since MCA is concerned with the relationships among the variables in the data. Therefore, missing data can be imputed by exploiting the relationship between observed and missing data. The RIMCA algorithm expresses MCA as a weighted principal component analysis (PCA) of a data triplet consisting of a weighted data matrix, a metric and a diagonal matrix containing row masses. Performing PCA on a triplet involves the generalised singular value decomposition of the weighted data matrix. Here, standard singular value decomposition (SVD) will not suffice, since constraints are imposed on the rows and columns because of the weighting. The success of this algorithm lies in the fact that all eigenvalues are shrunk and the last components are omitted; thus a 'double shrinkage' occurs, which reduces variance and stabilises predictions. RIMCA seems to overcome overfitting and underfitting problems with regard to categorical missing data in surveys.
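To give a feel for the shrink-and-reconstruct idea described above, the sketch below implements a simplified, continuous-data analogue of regularised iterative PCA imputation: missing cells are filled, the matrix is decomposed, retained singular values are shrunk, and the reconstruction overwrites only the missing cells until convergence. The shrinkage rule and noise-variance estimate are illustrative assumptions; RIMCA itself operates on the indicator matrix of a categorical dataset and is not reproduced here.

```python
# Minimal sketch (assumed shrinkage rule): regularised iterative SVD imputation
# of a continuous matrix with missing values, as a loose analogue of RIMCA.
import numpy as np

def regularised_iterative_impute(X, ncp=2, n_iter=200, tol=1e-8):
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)      # start from column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        Z = filled - mu
        U, d, Vt = np.linalg.svd(Z, full_matrices=False)
        # rough noise variance from the discarded components, then shrink the
        # retained singular values towards it ("double shrinkage" flavour)
        sigma2 = np.mean(d[ncp:] ** 2) / max(Z.shape[0] - 1, 1)
        d_shrunk = np.maximum(d[:ncp] ** 2 - sigma2, 0.0) / np.maximum(d[:ncp], 1e-12)
        Z_hat = (U[:, :ncp] * d_shrunk) @ Vt[:ncp]
        new = np.where(miss, Z_hat + mu, filled)            # overwrite missing cells only
        if np.sum((new - filled) ** 2) < tol:
            filled = new
            break
        filled = new
    return filled

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 5))      # correlated toy data
X[rng.random(X.shape) < 0.2] = np.nan                       # 20% MCAR missingness
print("any missing left:", np.isnan(regularised_iterative_impute(X)).any())
```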
The idea of applying the RIMCA algorithm in MI was appealing, since MI has advantages over SI, such as an increase in the accuracy of estimations and the attainment of valid inferences when combining multiple datasets. The aim of this study was to establish the performance of RIMCA in MI. This was achieved through two objectives: to determine whether RIMCA in MI outperforms RIMCA in SI, and to determine the accuracy of predictions made from RIMCA in MI as an imputation model. Real and simulated data were used. A simulation protocol was followed, creating data drawn from multivariate Normal distributions with both high and low correlation structures. Varying percentages of missing values and different missingness mechanisms (missing completely at random (MCAR) and missing at random (MAR)) were created in the data, as is done by Josse et al. (2012). The first objective was achieved by applying RIMCA in both SI and MI to real data and simulated data. The performance of RIMCA in SI and MI was compared with regard to the obtained mean estimates and confidence intervals. In the case of the real data, the estimates were compared to the mean estimates of the incomplete data, whereas for the simulated data the true mean values and confidence intervals could be compared to the estimates obtained from the imputation procedures. The second objective was achieved by calculating the apparent error rates of predictions made by the RIMCA algorithm in SI and MI in simulated datasets. Along with the apparent error rates, approximate overall success rates were calculated in order to establish the accuracy of imputations made by the SI and MI. The results of this study show that the confidence intervals provided by MI are wider in most of the cases, which confirms the incorporation of additional variance. It was found that for some of the variables the SI procedures were statistically different from the true confidence intervals, which shows that SI was not suitable for imputation in these instances. Overall the mean estimates provided by MI were closer to the true values, with respect to both the simulated and the real data. A summary of the bias, mean square errors and coverage for the imputation techniques over a thousand simulations was provided, which also confirmed that RIMCA in MI was a better model than RIMCA in SI in the contexts provided by this research.

Item Open Access Modelling electricity demand in South Africa (University of the Free State, 2014-01) Sigauke, Caston; Chikobvu, Delson
English: Peak electricity demand is an energy policy concern for all countries throughout the world, causing blackouts and increasing electricity tariffs for consumers. This calls for load curtailment strategies to either redistribute or reduce electricity demand during peak periods. This thesis attempts to address this problem by providing relevant information through a frequentist and Bayesian modelling framework for daily peak electricity demand using South African data. The thesis is divided into two parts. The first part deals with modelling of short-term daily peak electricity demand. This is done through the investigation of important drivers of electricity demand using (i) piecewise linear regression models, (ii) a multivariate adaptive regression splines (MARS) modelling approach, (iii) a regression with seasonal autoregressive integrated moving average (Reg-SARIMA) model, and (iv) a Reg-SARIMA model with generalized autoregressive conditional heteroskedastic errors (Reg-SARIMA-GARCH).
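A minimal sketch of the piecewise linear idea listed under (i) above: demand regressed on heating-degree and cooling-degree terms with knots placed at assumed temperatures. The simulated data, the knot positions and the ordinary least-squares fit are illustrative assumptions, not the Eskom data or the exact model of the thesis.

```python
# Minimal sketch (assumed knots at 18 degC and 22 degC, simulated data):
# piecewise linear regression of daily peak demand on average temperature.
import numpy as np

rng = np.random.default_rng(7)
temp = rng.uniform(5, 35, 730)                        # two years of daily averages
demand = 30_000 + 900 * np.maximum(18 - temp, 0) \
                 + 250 * np.maximum(temp - 22, 0) + rng.normal(0, 800, temp.size)

# design matrix: intercept, heating-degree term, cooling-degree term
X = np.column_stack([np.ones_like(temp),
                     np.maximum(18.0 - temp, 0.0),
                     np.maximum(temp - 22.0, 0.0)])
beta, *_ = np.linalg.lstsq(X, demand, rcond=None)
print("intercept, MW per degC below 18, MW per degC above 22:", np.round(beta, 1))
```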
The second part of the thesis explores the use of extreme value theory in modelling winter peaks, extreme daily positive changes in hourly peak electricity demand, and same-day-of-the-week increases in peak electricity demand. This is done through fitting the generalized Pareto, generalized single Pareto and generalized extreme value distributions. One of the major contributions of this thesis is quantification of the amount of electricity which should be shifted to off-peak hours. This is achieved through accurate assessment of the level and frequency of future extreme load forecasts. This modelling approach provides a policy framework for load curtailment and determination of the number of critical peak days for power utility companies. To the best of our knowledge, this has not been done for electricity demand in the context of South Africa. The thesis further extends the autoregressive moving average exponential generalized autoregressive conditional heteroskedasticity model to an autoregressive moving average exponential generalized autoregressive conditional heteroskedasticity generalized single Pareto distribution. The benefit of this hybrid model lies in risk modelling of under- and over-demand predictions of peak electricity demand. Some of the key findings of this thesis are (i) peak electricity demand is influenced by the tails of probability distributions as well as by means or averages, (ii) electricity demand in South Africa rises significantly for average temperature values below 18°C and rises slightly for average temperature values above 22°C, and (iii) modelling under- and over-demand electricity forecasts provides a basis for risk assessment and quantification of such risk associated with forecasting uncertainty, including demand variability.

Item Open Access The capability approach and measurement: operationalizing capability indicators in higher education (University of the Free State, 2015-01) Ruswa, Anesu; Walker, Melanie; Chikobvu, Delson
The thesis contributes to work in the field of operational measurement of Human Capabilities. Although a number of studies have examined the challenges posed in the measurement of Human Capabilities, there has not been a focus on the empirical merits of the methods and methodologies followed in the identification and measurement of valuable capabilities, especially in the Higher Education context. To this end, this study provides insights into the identification of valuable student capabilities through an exposition of the methods which can be followed to create and measure robust indicators of student capabilities. A quantitative inquiry determines which Human Capabilities students in Higher Education institutions have reason to value, and the results of this process are compared to the theoretical student capabilities literature. The thesis advocates for a human development approach over a human capital approach in evaluating the wellbeing of students. The study is significant in that it aids policy and decision makers in Higher Education to identify what students value and thus be in a position to fashion curricula, programmes and policies in a way which best benefits the subjects.
To achieve the above-mentioned goal, the thesis draws substantially on the work of Paul Anand, Amartya Sen, Flavio Comim, Enrica Chiappero Martinetti, Ingrid Robeyns, Melanie Walker and Sabina Alkire, among others, who have researched and advanced the field of operational measurement of human capabilities in the Higher Education environment.

Item Open Access Stochastic ordering with applications to reliability theory (University of the Free State, 2015-01) Khalema, Tokelo; Finkelstein, Maxim
Abstract not available

Item Open Access Bayesian non-linear models for the bactericidal activity of tuberculosis drugs (University of the Free State, 2015-05) Burger, Divan Aristo; Schall, R.; Van der Merwe, A. J.
Trials of the early bactericidal activity (EBA) of tuberculosis (TB) treatments assess the decline, during the first few days to weeks of treatment, in colony forming unit (CFU) count of Mycobacterium tuberculosis in the sputum of patients with smear-microscopy-positive pulmonary TB. Profiles over time of CFU data have conventionally been modeled using linear, bilinear or bi-exponential regression. This thesis proposes a new biphasic nonlinear regression model for CFU data that comprises linear and bilinear regression models as special cases, and is more flexible than bi-exponential regression models. A Bayesian nonlinear mixed effects (NLME) regression model is fitted jointly to the data of all patients from clinical trials, and statistical inference about the mean EBA of TB treatments is based on the Bayesian NLME regression model. The posterior predictive distribution of relevant slope parameters of the Bayesian NLME regression model provides insight into the nature of the EBA of TB treatments; specifically, the posterior predictive distribution allows one to judge whether treatments are associated with mono-linear or bilinear decline of log(CFU) count, and whether CFU count initially decreases fast, followed by a slower rate of decrease, or vice versa. The fit of alternative specifications of residuals, random effects and prior distributions is explored. In particular, the conventional normal regression models for log(CFU) count versus time profiles are extended to provide a robust approach which accommodates outliers and potential skewness in the data. The deviance information criterion and compound Laplace-Metropolis Bayes factors are calculated to discriminate between models. The biphasic model is fitted to time to positivity data in the same way as for CFU data.
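For orientation, the sketch below fits the conventional bi-exponential model mentioned in the abstract above, log10 CFU(t) for CFU(t) = A*exp(-alpha*t) + B*exp(-beta*t), to simulated sputum counts and reports a simple 0-2 day EBA summary. The data and starting values are assumptions, and this is the conventional reference model, not the new biphasic NLME model proposed in the thesis.

```python
# Minimal sketch (simulated counts): nonlinear least-squares fit of the
# bi-exponential decline model for log10 CFU over the first days of treatment.
import numpy as np
from scipy.optimize import curve_fit

def log10_cfu(t, logA, logB, alpha, beta):
    return np.log10(10 ** logA * np.exp(-alpha * t) + 10 ** logB * np.exp(-beta * t))

rng = np.random.default_rng(8)
days = np.arange(0, 15, dtype=float)
truth = log10_cfu(days, 6.3, 5.2, 0.45, 0.04)          # fast then slow elimination
obs = truth + rng.normal(0, 0.15, days.size)

p0 = (6.0, 5.0, 0.5, 0.05)                             # assumed starting values
params, _ = curve_fit(log10_cfu, days, obs, p0=p0, maxfev=10_000)
print("fitted (logA, logB, alpha, beta):", np.round(params, 3))

eba_0_2 = (log10_cfu(0, *params) - log10_cfu(2, *params)) / 2
print("EBA over days 0-2, in log10 CFU per ml per day:", round(eba_0_2, 3))
```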
Item Open Access Extending the reach of sequential regression multiple imputation (University of the Free State, 2015-06) Von Maltitz, Michael Johan; Raghunathan, T. E.; Schall, R.; Van der Merwe, A. J.
English: The purpose of this thesis is twofold. Firstly, it reviews a significant portion of the literature concerning multiple imputation and, in particular, sequential regression multiple imputation, and summarises this information, thereby allowing a reader to gain in-depth knowledge of this research field. Secondly, the thesis delves into one particular novel topic in sequential regression multiple imputation. The latter objective, of course, is not truly possible without the former, since the deeper the review of multiple imputation, the more likely it will be to identify and solve pressing concerns in the sequential regression multiple imputation subfield. The literature review will show that there is room in imputation research for work on a robust model for the sequential regression multiple imputation algorithm. This thesis pays particular attention to this robust model, formulating its estimation procedure within the context of sequential regression multiple imputation of continuous data, attempting to discover a statistic that would show when to use the robust model over the regular Normal specification, and then implementing the robust model in another estimation algorithm that might allow for better imputation of ordinal data. This thesis contributes to 'extending the reach of sequential regression multiple imputation' in two ways. Firstly, it is my wish for users of public data sets, particularly in South Africa, to become familiar with the (now internationally standard) topics presented in the first half of this thesis. The only way to start publicising sequential regression multiple imputation in South Africa is to lay out the evidence for and against this procedure in a logical manner, so that any reader of this thesis might be able to understand the procedures for analysing multiply imputed data, or tackle one of the many research problems uncovered in this text. In this way, this thesis will extend the reach of sequential regression multiple imputation to many more South African researchers. Secondly, by working on a new robust model for use in the sequential regression multiple imputation algorithm, this thesis strengthens the sequential regression multiple imputation algorithm by extending its reach to incomplete data that is not necessarily Normally distributed, be it due to heavy tails, or inherent skewness, or both.

Item Open Access On some methods of reliability improvement of engineering systems (University of the Free State, 2015-08) Mangara, Bernard Tonderayi; Finkelstein, Maxim
The purpose of this thesis was to study some methods of reliability improvement of engineering systems. The reason for selecting the theme "reliability improvement of engineering systems" was first to explore traditional methods of reliability improvement (that is, based on the notion that reliability could be assured by simply introducing a sufficiently high "safety factor" into the design of a component or a system) and then propose new and original concepts of reliability improvement. The latter consists of approaches, methods and best practices that are used at the design phase of a component (system) in order to minimize the likelihood (risk) that the component (system) might not meet the reliability requirements, objectives and expectations. Therefore, chapter 1 of the thesis, "Introduction to the main methods and concepts of reliability for technical systems", encompasses the introduction section and the main traditional methods available for the improvement of technical/engineering systems. In chapter 2, "Reliability Component Importance Measures", two new and original concepts on reliability improvement of engineering systems are introduced. These are: 1) the study of availability importance of components in coherent systems and 2) the optimal assignment of interchangeable components in coherent multi-state systems. In chapter 3, "Cannibalization Revisited", two new and original concepts on reliability improvement of engineering systems are introduced. These are: 1) a theoretical model to show the effects of cannibalization on mission time availability of systems and 2) a new model for cannibalization and the corresponding example.
In chapter 4, "On the Improvement of Steam Power Plant System Reliability", a new and original model is developed that helps in determining the optimal maintenance strategies which will ensure maximum reliability of the coal-fired generating station. Conclusions are given, concerning the study conducted and the results thereof, at the end of each chapter. The conclusions for this thesis are annotated in chapter 5. A set of selected references that were consulted during the study performed for this doctor of philosophy thesis is provided at the end.

Item Open Access Second-order estimation procedures for complete and incomplete heavy-tailed data (University of the Free State, 2016) Maribe, Gaonyalelwe; Verster, Andréhette
This thesis investigates the second-order refined peaks-over-threshold model called the Extended Pareto Distribution (EPD) introduced by Beirlant et al. (2009). Focus is placed on estimation of the Extreme Value Index (EVI). Firstly we investigate the effectiveness of the EPD in modelling heavy-tailed distributions and compare it to the Generalized Pareto Distribution (GPD) in terms of the bias, mean squared error and variance of the EVI. This is done through a simulation study, and the Maximum Likelihood (ML) method of estimation is used to make the comparison. In practice, data can be tampered with by some arbitrary process or study design. We therefore investigate the performance of the EPD in estimating the EVI for heavy-tailed data under the assumptions that the data is completely observable and uncontaminated, randomly right censored, and contaminated, respectively. We suggest an improved ML numerical procedure for the estimation of EPD parameters under the assumption that data is completely observable and uncontaminated. We further propose a Bayesian EPD estimator of the EVI and show through a simulation study that this estimator leads to much improved results compared with the ML EPD estimator. A small case study is conducted to assess the performance of the Bayesian EPD estimator and the ML EPD estimator using a real dataset from a Belgian reinsurance firm. We investigate the performance of some well-known parametric and semi-parametric estimators of the EVI adapted for censoring by a simulation study, and further illustrate their performance by applying them to a real survival dataset. A censored Bayesian EPD estimator for right-censored data is then proposed through an altered expression of the posterior density. The censored Bayesian EPD estimator is compared with the censored ML EPD estimator through a simulation study. Behaviour of the minimum density power divergence estimator (MDPDE) is assessed at uncontaminated and contaminated distributions respectively through an exhaustive simulation study including the other EPD estimators mentioned in this thesis. The comparison is made in terms of the bias and mean squared error. EVI estimates from the different estimators are then used to estimate quantiles; the results are reported concurrently with the EVI estimates. We illustrate the performance of all mentioned estimators on a real dataset from geopedology, in which a few abnormal soil measurements highly influence the estimates of the EVI and high quantiles.

Item Open Access Bayesian control charts based on predictive distributions (University of the Free State, 2016-01) Van Zyl, Ruaan; Van der Merwe, A. J.
English: Control charts are statistical process control (SPC) tools that are widely used in the monitoring of processes, specifically taking into account stability and dispersion.
Control charts signal when a significant change in the process being studied is observed. This signal can then be investigated to identify issues and to find solutions. It is generally accepted that SPC is implemented in two phases, Phase I and Phase II. In Phase I the primary interest is assessing process stability, often trying to bring the process in control by locating and eliminating any assignable causes, estimating any unknown parameters and setting up the control charts. After that the process moves on to Phase II, where the control limits obtained in Phase I are used for online process monitoring based on new samples of data. This thesis concentrates mainly on implementing a Bayesian approach to monitoring processes using SPC. This is done by providing an overview of some non-informative priors and then specifically deriving the reference and probability-matching priors for the common coefficient of variation, the standardized mean, and tolerance limits for a normal population. Using the Bayesian approach described in this thesis, SPC is performed, including derivations of control limits in Phase I and monitoring by the use of run-lengths and average run-lengths in Phase II, for the common coefficient of variation, the standardized mean, the variance and generalized variance, tolerance limits for normal populations, the two-parameter exponential distribution, the piecewise exponential model and capability indices. Results obtained using the Bayesian approach are compared to frequentist results.

Item Open Access A survey on participation and attitude to sports among undergraduate students in junior residences at the University of the Free State (University of the Free State, 2016-01) Mangoejane, Patricia Kekeletso; Chikobvu, Delson
The main objective of this study is to assess and quantify participation in sporting activities by students and to determine the factors influencing students' intentions to participate or not to participate in sports at the University of the Free State. The data are obtained from interviewing students participating or not participating in various sporting codes available at the University of the Free State (main campus in Bloemfontein, South Africa). A systematic random sampling technique was used, as the interviewing team knocked on every fifth door in a given residence to ensure that all corners of each residence were reached. The students found at the residence at that particular time were asked to fill in the questionnaire. Tables and charts are used for illustration of results. T-tests, F-tests, principal component analysis, cluster comparison analysis and item analysis are also performed for further analysis. Three hundred and eight (308) students (61% females and 39% males) living in junior residences were interviewed for this research. The majority of participants (75%) were non-whites (blacks, coloureds and Asians); this was in line with the University of the Free State enrolment structure of the year 2011 (75% non-whites and 25% whites). The reasons provided by the participants for their participation in sporting activities were keeping fit (91%), release of stress (89.35%), gaining a feeling of wellbeing (83%), increase in physical abilities (81%) and previous school sports involvement (67%). Students from the second academic year upwards mostly indicated that they relied on regular exercise to achieve academic success.
The researcher concludes that certain variables, namely gender, age group, race, marital status, preferred language of study, faculty of study, academic year of study, previous school sport participation, current sport participation, sporting codes participated in, reasons for sport participation and reasons for non-participation in sport, are the most important variables that Kovsie Sport and the management of sports should focus on in order to encourage students to participate in sporting activities. Through sports, students are also able to interact with one another and participate in different sporting codes offered by the university.

Item Open Access Exotic equity derivatives: a comparison of pricing models and methods with both stochastic volatility and interest rates (University of the Free State, 2017) Scheltema, Jaundre; Venter, Jan-Paul
The traditional Black Scholes methodology for exotic equity option pricing fails to capture the features of latent stochastic volatility and observed stochastic interest rate factors exhibited in financial markets today. The detailed study presented here shows how these shortcomings of the Black Scholes methodology have been addressed in the literature, by examining some of the developments of stochastic volatility models with constant and stochastic interest rates. A subset of these models, notably including models developed within the last two years, is then compared in a simulated study design against a complex Market Model. Each of the selected models was chosen as a "best" representative of its respective model class. The Market Model, which is specified through a system of Stochastic Differential Equations, is taken as a proxy for real-world market dynamics. All of the selected models are calibrated against the Market Model using a technique known as Differential Evolution, which is a globally convergent stochastic optimiser, and then used to price exotic equity options. The end results show that the Heston-Hull-CIR Model (H2CIR) outperforms the alternative Double Heston and 4/2 Models in producing exotic equity option prices closest to those of the Market Model. Various other commentaries are also given to assess each of the selected models with respect to parameter stability, computational run times and robustness in implementation, with the final conclusions supporting the H2CIR Model in preference over the other models. Additionally, a second research question is investigated that relates to Monte Carlo pricing methods. Here the Monte Carlo pricing schemes used under the Black Scholes and other pricing methodologies are extended to present a semi-exact simulation scheme built on results from the literature. This new scheme is termed the Brownian Motion Reconstruction scheme and is shown to outperform the Euler scheme when pricing exotic equity derivatives with relatively few monitoring or option exercise dates. Finally, a minor result in this study involves a new alternative numerical method to recover transition density functions from their respective characteristic functions, and this method is shown to be competitive against the popular Fast Fourier Transform method. It is hoped that the results in this thesis will assist investment and banking practitioners to obtain better clarity when assessing and vetting different models for use in the industry, and extend the current range of techniques that are used to price options.
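As a point of reference for the Monte Carlo comparison described above, the sketch below prices a path-dependent (arithmetic Asian) call under a Heston stochastic-volatility model with a full-truncation Euler scheme, the benchmark discretisation that the thesis' Brownian Motion Reconstruction scheme is compared against. All parameter values are assumptions for illustration; the thesis' own schemes and calibrated models are not reproduced here.

```python
# Minimal sketch (assumed parameters): full-truncation Euler simulation of the
# Heston model and Monte Carlo pricing of an arithmetic-average Asian call.
import numpy as np

rng = np.random.default_rng(9)
S0, r, T = 100.0, 0.06, 1.0
v0, kappa, theta, xi, rho = 0.04, 2.0, 0.04, 0.4, -0.7
K, steps, paths = 100.0, 252, 50_000
dt = T / steps

lnS = np.full(paths, np.log(S0))
v = np.full(paths, v0)
running_sum = np.zeros(paths)
for _ in range(steps):
    z1 = rng.standard_normal(paths)
    z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(paths)
    v_pos = np.maximum(v, 0.0)                       # full truncation of the variance
    lnS += (r - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1
    v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
    running_sum += np.exp(lnS)

payoff = np.maximum(running_sum / steps - K, 0.0)    # arithmetic-average call payoff
price = np.exp(-r * T) * payoff.mean()
stderr = np.exp(-r * T) * payoff.std(ddof=1) / np.sqrt(paths)
print(f"Asian call price ~ {price:.3f} (Monte Carlo s.e. {stderr:.3f})")
```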