
Documentation

tianyiwangnova edited this page Feb 3, 2021 · 29 revisions

ReviewMiner

class ReviewMiner(df: pd.DataFrame = None, id_column: str = None, review_column: str = None)


Parameters:

df: pd.DataFrame, default=None

A data frame in which each row is a comment/review. It should have at least an ID column that stores the unique ID of each comment and a review column that stores the actual comment/review text. You can initialize the class without df if you only want to use some of its methods to analyze external datasets, and assign a data frame later with <class>.df = <your_data_frame>.

id_column: str, default=None

the name of the column that stores the unique IDs of the comments.

review_column: str, default=None

the name of the column where the actual comments/reviews are stored.
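
As a rough illustration of the expected input, the sketch below builds a tiny data frame with an ID column and a review column and passes it to the constructor. The column names `id` and `comments`, the sample rows, the helper name `build_review_miner`, and the import path `reviewminer.core` are all assumptions made for illustration and are not confirmed by this page. The imports are deferred into the helper so the snippet can be read and loaded without the packages installed.

```python
# Hypothetical sample rows: one dict per comment, with a unique ID and the text.
sample_rows = [
    {"id": 1, "comments": "The bedroom was sunny and spacious."},
    {"id": 2, "comments": "Beautiful wardrobe, but the kitchen was dirty."},
]


def build_review_miner(rows, id_column="id", review_column="comments"):
    """Wrap the rows in a DataFrame and hand it to ReviewMiner.

    Assumes pandas is installed and that ReviewMiner can be imported from
    `reviewminer.core` (an assumed import path, not stated on this page).
    """
    import pandas as pd
    from reviewminer.core import ReviewMiner  # assumed import path

    df = pd.DataFrame(rows)
    # Positional order follows the documented signature:
    # ReviewMiner(df, id_column, review_column)
    return ReviewMiner(df, id_column, review_column)
```

With the packages available, `build_review_miner(sample_rows)` should return an instance whose methods can then be called as documented below.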


Methods:

  • one_time_analysis(report_interval: int = None)

One-time analysis that displays the popular aspects and opinions, the distribution of sentiment scores across comments, the sentiment scores for common aspects, and the aspects with the most negative comments.

  • Parameters:

report_interval: int, default=None

Extracting the aspects and opinions can take quite a while if the dataset is very large. While extracting, the function reports progress every report_interval comments. If there are more than 500 comments and no report interval is specified, it reports progress after every 10% of the comments. If there are no more than 500 comments and no report interval is specified, it reports only once, when all the comments have been processed.
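
The reporting rule described above can be sketched as a small function. This is only an illustration of the documented behavior, not the library's implementation: the name `resolve_report_interval` is hypothetical, and rounding 10% down with integer division is an assumption.

```python
def resolve_report_interval(n_comments, report_interval=None):
    """Return how many comments to process between progress reports.

    Hypothetical helper that mirrors the rule described in the text.
    """
    if report_interval is not None:
        # An explicit interval always wins.
        return report_interval
    if n_comments > 500:
        # More than 500 comments: report roughly every 10% of them
        # (rounding down is an assumption).
        return max(1, n_comments // 10)
    # 500 comments or fewer: a single report once everything is done.
    return n_comments


print(resolve_report_interval(1000))      # 100
print(resolve_report_interval(300))       # 300
print(resolve_report_interval(1000, 50))  # 50
```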

  • aspect_extractor(sentence: str)

Extract aspects (noun phrases and nouns) from a sentence

  • Parameters:

sentence: str

The sentence to analyze

  • Returns:

candidate_aspects: list

a list of aspects in the sentence

  • aspect_opinion_for_one_comment(comment: str)

Extract aspects and opinions for one comment (which can consist of many sentences)

  • Parameters:

comment: str

The comment to analyze

  • Returns:

aspect_opinion_dict: dict

a dictionary with the aspects as keys and, for each aspect, its opinions joined into a single space-separated string, e.g. {'bedroom': 'sunny spacious', 'wardrobe': 'beautiful'}
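
Because the opinions for each aspect come back as one space-separated string, downstream code usually has to split them again. The sketch below shows one way to tally opinion words per aspect across several such dictionaries. The helper name `tally_opinions` and the sample data are hypothetical; only the dictionary shape is taken from the description above.

```python
from collections import Counter, defaultdict


def tally_opinions(aspect_opinion_dicts):
    """Count how often each opinion word appears for each aspect."""
    tallies = defaultdict(Counter)
    for aspect_opinion_dict in aspect_opinion_dicts:
        for aspect, opinions in aspect_opinion_dict.items():
            # Opinions are stored as a single string separated by spaces.
            tallies[aspect].update(opinions.split())
    return tallies


# Hypothetical results for two comments, in the documented format.
comment_results = [
    {"bedroom": "sunny spacious", "wardrobe": "beautiful"},
    {"bedroom": "spacious"},
]

tallies = tally_opinions(comment_results)
print(tallies["bedroom"].most_common(1))  # [('spacious', 2)]
```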

  • aspect_opinon_for_all_comments(report_interval: int = None)

Extract aspects and opinions for all the comments in df

  • Parameters:

report_interval: int, default=None

Extracting the aspects and opinions can take quite a while if the dataset is very large. While extracting, the function reports progress every report_interval comments. If there are more than 500 comments and no report interval is specified, it reports progress after every 10% of the comments. If there are no more than 500 comments and no report interval is specified, it reports only once, when all the comments have been processed.
