mwkyuen/whyR_challenge2

Why R? 2020 Text Mining Hackathon Challenge 2: Segmentation

For this challenge, we used historical Hacker News comment data to create meaningful segments of users.

Challenge 2 - “Segmentation”

Based on historical comment data, create meaningful segments of users. Can you propose any statistical measures of goodness of fit to describe the quality of the segmentation solution?
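One commonly used measure of this kind is the silhouette coefficient, which compares how close each user is to their own segment versus the nearest other segment. The sketch below is a minimal, standard-library illustration and not necessarily the measure used in our submission. It assumes each user has already been reduced to a numeric feature vector (for example, mean sentiment and comment count); the function name `silhouette_score` and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (ranges from -1 to 1)."""
    # Group point indices by their segment label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two segments")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton segments
            continue
        # a: mean distance to the other members of the point's own segment.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other segment.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical per-user features: (feature_1, feature_2).
users = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_score(users, [0, 0, 1, 1])  # segments match the two groups
bad = silhouette_score(users, [0, 1, 0, 1])   # segments cut across the groups
```

Values near 1 suggest compact, well-separated segments; values near 0 or below suggest overlapping or misassigned users. Because the score depends on Euclidean distance, features on very different scales would normally be standardized first.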

Below we present some inspirations for potential characteristics that may eventually differentiate segments:

  • What is the sentiment of comments made by each group?
  • What are the common word associations within each group, based on their comments?
  • What are the keywords in the titles of the articles being commented on?
  • How many comments does each segment make?
  • What are the characteristics of each group?
  • What articles do they comment about and what do they write about?
  • What is the high-level summary of groups?

Please keep in mind that a segmentation solution should have balanced segment sizes and meaningful stories behind the groups of users.
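The balance requirement can also be quantified. One possible approach, shown here as a sketch and not as part of the challenge definition, is the normalized Shannon entropy of the segment sizes: 1 means all segments are the same size, and values closer to 0 mean a few segments dominate. The function name `size_balance` is hypothetical.

```python
from collections import Counter
from math import log


def size_balance(labels):
    """Normalized entropy of segment sizes: 1.0 is perfectly balanced."""
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        return 0.0  # a single segment has no balance to measure
    n = len(labels)
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(k)  # divide by the maximum possible entropy


even = size_balance([0, 0, 1, 1])        # two segments of equal size
skewed = size_balance([0] * 9 + [1])     # one segment holds 90% of users
```

A score like this only describes segment sizes, so it complements a separation measure and does not replace one.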

More info: github.com/WhyR2020/hackathon

Technologies

Solution

We submitted our solution as a presentation video that walks through each step we took to segment the users and the insights we found while exploring the data.

Video: https://youtu.be/B-cEalcxfb4

PowerPoint Presentation: Presentation.pptx
