Supplementary Results


Opinion mining tools for artifact content analysis

Paper: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
Implementation: https://radimrehurek.com/gensim/models/ldamodel.html (Note: Not provided by the original authors.)
Input: corpus (collection of documents represented by vectors), number of topics
Output: an LDA model that can predict the topics of texts
Core technique: three-level hierarchical Bayesian model
Advantage: It provides well-defined inference procedures for previously unseen documents.
Limitations: (1) The number of topics is assumed to be known and fixed. (2) It allows certain words to be allocated to several different topics.
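
For illustration, a minimal sketch of training and querying an LDA model with the gensim implementation linked above (the toy documents and parameter values below are placeholders):

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # toy corpus: each document is a list of tokens
    docs = [
        ["app", "crashes", "on", "startup"],
        ["please", "add", "dark", "mode"],
        ["crash", "after", "latest", "update"],
    ]

    dictionary = Dictionary(docs)                     # token -> id mapping
    corpus = [dictionary.doc2bow(d) for d in docs]    # documents as sparse count vectors

    # the number of topics must be fixed in advance (see limitation (1))
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)

    # inference on a previously unseen document
    unseen = dictionary.doc2bow(["app", "crash", "on", "update"])
    print(lda.get_document_topics(unseen))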

Paper: https://link.springer.com/chapter/10.1007/978-3-642-20161-5_34
Implementation: https://github.com/minghui/Twitter-LDA (Note: Not provided by the original authors.)
Input: corpus (collection of tweets represented by vectors), number of topics
Output: a Twitter-LDA model that can predict the topic of a tweet
Core technique: a variant of LDA, which assumes an entire document (tweet) has only a single topic
Advantage: (1) It is designed for short tweets, whereas standard LDA may not work well on Twitter because of the short texts. (2) It removes noisy background topics and can capture more meaningful topics from tweets than LDA.
Limitations: Not mentioned in the original paper.
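
Twitter-LDA itself is distributed as a Java implementation; the sketch below only illustrates its single-topic-per-tweet assumption (it is not the model's actual collapsed Gibbs sampler) by assigning each tweet the dominant topic of a standard gensim LDA:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    tweets = [
        ["loving", "the", "new", "camera", "update"],
        ["battery", "drains", "way", "too", "fast"],
        ["camera", "quality", "is", "amazing"],
    ]

    dictionary = Dictionary(tweets)
    corpus = [dictionary.doc2bow(t) for t in tweets]
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)

    # one topic per tweet: keep only the most probable topic for each tweet
    for tweet, bow in zip(tweets, corpus):
        topic, prob = max(lda.get_document_topics(bow), key=lambda tp: tp[1])
        print(" ".join(tweet), "-> topic", topic, round(float(prob), 2))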

Paper: https://dl.acm.org/doi/10.1145/2950290.2983938
Implementation: https://www.ifi.uzh.ch/en/seal/people/panichella/tools/ARdoc.html
Input: text composed of sentences
Output: sentences classified into five categories: feature request, problem discovery, information seeking, information giving, and other
Core technique: J48 classifier based on textual, structural, and sentiment features
Advantage: It can correctly classify useful feedback (from a maintenance perspective) contained in app reviews, with both precision and recall ranging between 84% and 89%.
Limitations: The topic categorization is coarse-grained: the intentions of the writers concerning the mentioned topics are not identified.
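
ARdoc is distributed as a Java tool and API; the sketch below only illustrates the underlying classification idea, with scikit-learn's DecisionTreeClassifier standing in for Weka's J48 and TF-IDF textual features only (the structural and sentiment features used by ARdoc are omitted, and the sentences and labels are made up):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    sentences = [
        "The app crashes when I open the camera.",
        "Please add an option to export data as CSV.",
        "How do I change the notification sound?",
        "I use this app every day on my commute.",
    ]
    labels = ["problem discovery", "feature request",
              "information seeking", "information giving"]

    clf = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
    clf.fit(sentences, labels)
    print(clf.predict(["The login screen freezes on startup."]))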

Paper: https://ieeexplore.ieee.org/document/8918993
Implementation: https://github.com/rafaelkallis/ticket-tagger
Input: GitHub issue
Output: a tag for the issue: bug report, enhancement, or question
Core technique: fastText model by Facebook
Advantage: It can automatically assign labels with appreciable levels of precision and recall for all three categories.
Limitations: (1) Relatively high numbers of false positives for the Question category and of false negatives for the Enhancement category. (2) Relatively low recall for the Enhancement category.
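
A minimal sketch of supervised fastText classification in the spirit of Ticket Tagger (not the tool's own code); it assumes a hypothetical training file issues.train with one issue per line, prefixed with __label__bug, __label__enhancement, or __label__question:

    import fasttext

    # hyperparameter values are placeholders, not Ticket Tagger's settings
    model = fasttext.train_supervised(input="issues.train", epoch=25, wordNgrams=2)

    labels, probabilities = model.predict("App throws NullPointerException when saving a file")
    print(labels, probabilities)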

Paper: https://dl.acm.org/doi/10.1145/2950290.2950299
Implementation: https://www.ifi.uzh.ch/en/seal/people/panichella/tools/SURFTool.html
Input: app reviews
Output: sentences classified into five categories (feature request, problem discovery, information seeking, information giving, and other) for different generated topics (e.g., GUI, feature/functionality)
Core technique: a two-level classification model based on concept-related dictionaries
Advantage: The usefulness of summaries generated by SURF is verified by developers.
Limitations: A graphic summary of user feedback is missing.
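
SURF's concept-related dictionaries are not reproduced here; the sketch below only illustrates the general two-level idea (first a topic, then an intention category) with made-up keyword lists:

    # level 1: topic dictionaries (illustrative keywords only)
    TOPICS = {
        "GUI": {"screen", "button", "layout", "font"},
        "feature/functionality": {"export", "sync", "search", "login"},
    }
    # level 2: intention dictionaries (illustrative keywords only)
    INTENTIONS = {
        "feature request": {"add", "please", "wish", "should"},
        "problem discovery": {"crash", "bug", "error", "freeze"},
    }

    def classify(sentence):
        words = set(sentence.lower().split())
        topic = next((t for t, kw in TOPICS.items() if words & kw), "other")
        intention = next((i for i, kw in INTENTIONS.items() if words & kw), "other")
        return topic, intention

    print(classify("Please add a search button to the main screen"))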

Paper: https://doi.org/10.1007/s10664-019-09716-7
Implementation: https://github.com/seelprojects/MARC-3.0
Input: user reviews on mobile application stores
Output: extracted bug reports and feature requests, and non-functional requirements classified as Dependability, Performance, Supportability, or Usability
Core technique: dictionary-based multi-label classification
Advantage: It achieves an average precision of 70% (range [66% - 80%]) and an average recall of 86% (range [69% - 98%]) in identifying non-functional requirements.
Limitations: It relies only on the textual content of the reviews and their sentiment as classification features; other meta-data attributes, such as the star rating, author, app version, or submission time of the review, are not considered.
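
MARC's actual dictionaries and sentiment features are not reproduced here; the sketch below only illustrates dictionary-based multi-label classification, where a single review can receive several non-functional requirement labels (the keywords are made up):

    NFR_KEYWORDS = {
        "Performance": {"slow", "lag", "fast", "battery"},
        "Usability": {"confusing", "intuitive", "easy", "navigate"},
        "Dependability": {"crashes", "secure", "privacy", "reliable"},
        "Supportability": {"update", "support", "compatible", "version"},
    }

    def nfr_labels(review):
        words = set(review.lower().split())
        return {nfr for nfr, kw in NFR_KEYWORDS.items() if words & kw} or {"none"}

    # multi-label output, e.g. {"Performance", "Dependability", "Supportability"}
    print(nfr_labels("The app is slow and crashes after every update"))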

Paper: https://doi.org/10.1007/978-3-030-15538-4_4
Implementation: https://github.com/RELabUU/RE-SWOT
Input: user reviews for the reference app and its competitors
Output: extracted features classified into Strengths, Weaknesses, Threats, and Opportunities
Core technique: collocation finding algorithm
Advantage: It provides a visual and interactive interface.
Limitations: (1) Trends over time are missing. (2) During feature extraction, two-word collocations are not merged automatically. (3) The SWOT classification can sometimes be inaccurate.
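
RE-SWOT's own pipeline is not reproduced here; as a sketch of the collocation-finding step it builds on, NLTK's bigram collocation finder can surface candidate feature phrases from review text (toy reviews, with PMI as an example association measure):

    from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

    tokens = (
        "dark mode is great but dark mode drains battery "
        "offline maps work well offline maps need updates"
    ).split()

    finder = BigramCollocationFinder.from_words(tokens)
    finder.apply_freq_filter(2)                        # keep bigrams occurring at least twice
    print(finder.nbest(BigramAssocMeasures.pmi, 5))    # e.g. ('dark', 'mode'), ('offline', 'maps')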

Paper: https://dl.acm.org/doi/10.1109/MSR.2019.00058
Implementation: https://github.com/DEEPTIPExtraction/DEEPTip
Input: Stack Overflow posts
Output: extracted tips on API usage
Core technique: Convolutional Neural Network
Advantage: It can identify ~85% true tips.
Limitations: Not mentioned in the paper.
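
The sketch below is not the DEEPTip architecture or data set; it only shows a generic convolutional text classifier for a binary "tip / not a tip" decision in Keras, using random toy inputs:

    import numpy as np
    from tensorflow.keras import layers, models

    # toy data: 8 sentences encoded as padded sequences of 20 token ids (vocabulary size 1000)
    x = np.random.randint(1, 1000, size=(8, 20))
    y = np.array([1, 0, 1, 0, 1, 0, 0, 1])             # 1 = tip, 0 = not a tip

    model = models.Sequential([
        layers.Embedding(input_dim=1000, output_dim=32),   # token ids -> dense vectors
        layers.Conv1D(filters=64, kernel_size=3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x, y, epochs=2, verbose=0)
    print(model.predict(x[:2], verbose=0))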

Paper: https://dl.acm.org/doi/10.1109/ICSE.2019.00066
Implementation: https://pome-repo.github.io
Input: Stack Overflow posts
Output: extracted sentences in aspect categories (including community, compatibility, documentation, functional, performance, reliability, and usability), each with a sentiment polarity tag (positive / negative)
Core technique: Predefined patterns
Advantage: Well-defined aspect categories and high precision.
Limitations: Relatively low recall.
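
POME's actual pattern set is not reproduced here; the sketch below only illustrates pattern-based extraction of aspect sentences with a sentiment polarity tag, using a few made-up regular expressions:

    import re

    # (aspect, polarity, pattern) triples -- illustrative patterns, not POME's own
    PATTERNS = [
        ("performance", "negative", re.compile(r"\b(slow|memory leak|too heavy)\b", re.I)),
        ("usability", "positive", re.compile(r"\b(easy to use|intuitive|simple api)\b", re.I)),
        ("documentation", "negative", re.compile(r"\b(poorly documented|no documentation)\b", re.I)),
    ]

    def tag(sentence):
        return [(aspect, polarity) for aspect, polarity, pattern in PATTERNS
                if pattern.search(sentence)]

    # e.g. [("usability", "positive"), ("documentation", "negative")]
    print(tag("The library is easy to use but poorly documented."))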