A comparison of sentiment analysis techniques in a parallel and distributed NoSQL environment

dc.contributor.advisor: Kotze, J. E.
dc.contributor.advisor: Dollman, G. J.
dc.contributor.author: Van der Linde, Ian Daniel
dc.date.accessioned: 2020-12-07T08:15:40Z
dc.date.available: 2020-12-07T08:15:40Z
dc.date.issued: 2020-04
dc.description.abstract: Sentiment analysis has seen a revival due to the advent of social media platforms such as Facebook and Twitter. The data posted on these platforms can be mined for valuable insights into customer relations, political unrest, and product supply and demand. This information is embedded in typical Big Data: very large volumes delivered at high velocity, drawn from a wide variety of content and sources, and usually unstructured in nature. The challenge of analysing such data for decision support can be addressed through the use of sentiment analysis techniques in distributed environments designed to process and store large amounts of data in a horizontally scalable fashion. The performance characteristics of these techniques have, however, hardly been studied in distributed environments, and the impact of cluster size on such environments is largely undocumented. The aim of this research was to investigate the accuracy and performance of four sentiment analysis approaches (a lexicon-based classifier, a Naïve Bayes classifier, a Neural Network classifier, and a Support Vector Machine classifier) in a distributed environment with a cluster size of three to eight machines, while making use of a distributed NoSQL database backend to retrieve and store the data. The key investigations were to determine the nature of performance bottlenecks for each classifier in a distributed environment, how well each classifier scaled as more machines were added, and whether a relationship could be found between classifier accuracy and performance. It was determined that the accuracies of the four classifiers differ statistically significantly, both when compared pairwise and when compared collectively. It was also found that there is no clear relationship between accuracy and resource usage (i.e., a more performant technique does not necessarily have worse accuracy). [en_ZA]
dc.identifier.uri: http://hdl.handle.net/11660/10863
dc.language.iso: en [en_ZA]
dc.publisher: University of the Free State [en_ZA]
dc.rights.holder: University of the Free State [en_ZA]
dc.subject: Dissertation (M.Sc. (Computer Science and Informatics))--University of the Free State, 2020 [en_ZA]
dc.subject: Sentiment analysis [en_ZA]
dc.subject: NoSQL database [en_ZA]
dc.subject: Document classification [en_ZA]
dc.subject: Parallel computing [en_ZA]
dc.subject: Distributed computing [en_ZA]
dc.subject: Empirical analysis [en_ZA]
dc.title: A comparison of sentiment analysis techniques in a parallel and distributed NoSQL environment [en_ZA]
dc.type: Dissertation [en_ZA]
Files
Original bundle
Name: VanDerLindeID.pdf
Size: 2.84 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.76 KB
Format: Item-specific license agreed upon to submission