Why R? 2020 Text Mining Hackathon Challenge 2: Segmentation

For this challenge, we used historical comments from Hacker News to segment users into meaningful groups.

Challenge 2 - “Segmentation”

Based on historical comment data, create meaningful segments of users. Can you propose any statistical measures of goodness of fit to describe the quality of the segmentation solution?
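One way to approach the goodness-of-fit question is sketched below. It is an illustration only, not the method used in our solution: the `features` matrix is synthetic, k-means is just one possible segmentation method, and `avg_silhouette` is a hypothetical helper written for this example. It reports two common measures, the share of variance explained by the segments and the average silhouette width.

```r
# Hypothetical per-user features (synthetic; not the hackathon data).
set.seed(42)
features <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2)
)

# Segment users with k-means (one possible choice of method).
fit <- kmeans(features, centers = 2, nstart = 10)

# Measure 1: share of the total sum of squares explained by the segments (0 to 1).
explained <- fit$betweenss / fit$totss

# Measure 2: average silhouette width, computed by hand from pairwise distances.
avg_silhouette <- function(x, cluster) {
  d <- as.matrix(dist(x))
  widths <- vapply(seq_len(nrow(d)), function(i) {
    own <- cluster == cluster[i]
    own[i] <- FALSE
    if (!any(own)) return(0)  # singleton segment: silhouette is defined as 0
    a <- mean(d[i, own])      # mean distance to the user's own segment
    others <- setdiff(unique(cluster), cluster[i])
    # mean distance to the nearest other segment
    b <- min(vapply(others, function(k) mean(d[i, cluster == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

sil <- avg_silhouette(features, fit$cluster)
print(c(explained = explained, silhouette = sil))
```

Both values approach 1 when segments are compact and well separated. On real comment data they would be computed on whatever user-level features the segmentation is built from.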

Below we present some inspirations for potential characteristics that may eventually differentiate segments:

What is the sentiment of comments made by each group?

What are the common word associations within each group, based on their comments?

What are the keywords of titles of articles under which comments are made?

How many comments does each segment make?

What are the characteristics of each group?

What articles do they comment about and what do they write about?

What is the high-level summary of groups?
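Several of the questions above come down to counting words per segment. The following is a minimal base R sketch under stated assumptions: the `comments` data frame, its segment labels, and the tiny `stop_words` list are all made up for illustration, and `tokenize` is a hypothetical helper rather than code from this repository.

```r
# Hypothetical comments already assigned to segments (synthetic data).
comments <- data.frame(
  segment = c("A", "A", "B", "B"),
  text = c("Rust is fast and safe", "I love Rust tooling",
           "Startup funding is hard", "Funding rounds take time"),
  stringsAsFactors = FALSE
)

# Lowercase, split on non-letters, then drop empty tokens and a tiny stop-word list.
stop_words <- c("is", "and", "i", "the", "a")
tokenize <- function(x) {
  tokens <- unlist(strsplit(tolower(x), "[^a-z]+"))
  tokens[nzchar(tokens) & !tokens %in% stop_words]
}

# Count words within each segment and keep the three most frequent ones.
top_words <- lapply(split(comments$text, comments$segment), function(txt) {
  head(sort(table(tokenize(txt)), decreasing = TRUE), 3)
})

print(top_words)
```

A real analysis would use a full stop-word list and likely a weighting such as TF-IDF so that words common to every segment do not dominate.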

Please keep in mind that a segmentation solution should have balanced segment sizes and a meaningful story behind each group of users.
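The balance requirement can be checked numerically. The sketch below is one possible measure, not something prescribed by the challenge: `segment_balance` is a hypothetical helper that returns the normalized entropy of the segment shares, and the `segments` vector is made-up example data.

```r
# Normalized entropy of segment shares: 1 means perfectly balanced segments,
# values near 0 mean one segment dominates. Hypothetical helper for illustration.
segment_balance <- function(segments) {
  shares <- prop.table(table(segments))
  if (length(shares) < 2) return(NA_real_)  # balance is undefined for one segment
  -sum(shares * log(shares)) / log(length(shares))
}

# Made-up segment assignment for ten users.
segments <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "C")

print(prop.table(table(segments)))  # share of users in each segment
print(segment_balance(segments))    # closer to 1 is more balanced
```

Reporting the raw shares next to the single score keeps the result easy to interpret.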

More info: github.com/WhyR2020/hackathon

Technologies

Solution

We submitted our solution as a video presentation that walks through each step we took to segment the users and shares the insights we found while exploring the data.

Video: https://youtu.be/B-cEalcxfb4

PowerPoint Presentation: Presentation.pptx

Repository contents:

data
model
plot
.gitignore
00_clean.R
01_eda.R
02_topic_modeling.R
03_sentiment_analysis.R
04_amount_of_content.R
05_articles.R
05_word_frequencies.R
Presentation.pptx
README.md
whyR_challenge2.Rproj

Repository: mwkyuen/whyR_challenge2
