O+M 2024-09-23 #4900

Closed
14 tasks done
FuhuXia opened this issue Sep 23, 2024 · 5 comments

@FuhuXia
Member

FuhuXia commented Sep 23, 2024

As part of the day-to-day operation of Data.gov, there are many Operation and Maintenance (O&M) responsibilities. Instead of having the entire team watch notifications and risk some slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role, which rotates each sprint. This is not meant to be a 24/7 responsibility; it covers East Coast business hours only. If you are unavailable, please note in Slack when you will be unavailable and ask someone to take on the role for that time.

Check the O&M Rotation Schedule for future planning.

Acceptance criteria

You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.

Daily Checklist

Note: Catalog Auto Tasks
You will need to update the chart values manually. Click the Action link in each issue, grab the values from the monitor task output, and check the runtime.

Weekly Checklist

Monthly Checklist

Ad-hoc Checklist

  • Audit/review applications on Cloud Foundry and determine what can be stopped and/or deleted.

Reference

@FuhuXia FuhuXia self-assigned this Sep 23, 2024
@FuhuXia
Member Author

FuhuXia commented Sep 23, 2024

Facebook bots have made 3 million requests to /_tracking over the past 3 weeks and have been populating the catalog's popular views. A PR has been created to exclude Facebook traffic from the stats.

image

+++++++++++++++++++++++++++++++

UPDATE:

PR deployed. Facebook bots disappeared from "non-bot _tracking by user agent" dashboard on NewRelic.

Before: Facebook bots accounted for 70% of /_tracking traffic.

image

After: the top sources of /_tracking traffic are normal user agents.

image
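For anyone revisiting this later, the kind of filter described above can be sketched roughly as follows. This is not the code from the deployed PR (which is not shown in this issue); the user-agent substrings, the function names, and the idea of gating on the request path are all assumptions made for illustration.

```python
import re

# Assumed substrings identifying Facebook/Meta crawlers. The actual list used
# in the deployed PR is not shown in this issue.
FACEBOOK_BOT_PATTERN = re.compile(
    r"facebookexternalhit|meta-externalagent|facebookcatalog|facebookbot",
    re.IGNORECASE,
)


def is_facebook_bot(user_agent):
    """Return True if the User-Agent header looks like a Facebook crawler."""
    return bool(user_agent) and bool(FACEBOOK_BOT_PATTERN.search(user_agent))


def should_count_view(path, user_agent):
    """Hypothetical gate: count a /_tracking hit only for non-Facebook agents."""
    return path == "/_tracking" and not is_facebook_bot(user_agent)
```

Matching on substrings rather than exact strings keeps the filter tolerant of version suffixes in the User-Agent header, at the cost of relying on the crawler identifying itself honestly.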

@FuhuXia
Member Author

FuhuXia commented Sep 23, 2024

Harvester /harvest/nist/ continues to be stuck. Need to investigate.

image

Update:

GitHub ticket created.

Update 2:

Issue fixed. The job now completes without getting stuck.

image

@FuhuXia
Member Author

FuhuXia commented Sep 24, 2024

Source /harvest/education-json has failed for the past 7 days. The source URL http://www.ed.gov/data.json returns 404.

Update:

The ed.gov site went through a redesign. The working URL is https://data.ed.gov/data.json.

Update 2:

The original http://www.ed.gov/data.json now redirects correctly to https://data.ed.gov/data.json, but we are leaving the source URL set to https://data.ed.gov/data.json.
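A quick way to triage a failing source like this one is to look at the final HTTP status and the URL that actually answered. The sketch below is only an illustration of that check using the Python standard library; `fetch_status` and `review_source` are hypothetical helper names, not part of the harvester.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return (status, final_url) for url, following redirects."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        # On an HTTP error we simply report the URL we asked for.
        return err.code, url


def review_source(configured_url, status, final_url):
    """Classify one harvest source check (hypothetical categories)."""
    if status == 404:
        return "broken: source returns 404"
    if status != 200:
        return f"error: unexpected status {status}"
    if final_url != configured_url:
        return f"redirected: consider pointing the source at {final_url}"
    return "ok"
```

For example, `review_source(url, *fetch_status(url))` with the old ed.gov URL would have reported the 404 at first and the redirect after the site was fixed.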

@FuhuXia
Member Author

FuhuXia commented Sep 25, 2024

Facebook bots are crawling the catalog in a silly manner, blindly following every link they find and making requests for various tag combinations.
The request rate seems to have stabilized at 12 requests per second.

image
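To put a sustained rate like this in perspective, a back-of-the-envelope conversion helps. The sketch below only does arithmetic on figures already mentioned in this issue (12 requests per second here, and 3 million /_tracking requests over 3 weeks in an earlier comment); the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(rate_per_second):
    """Daily request volume implied by a sustained per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


def average_rate(total_requests, days):
    """Average requests per second implied by a total over a number of days."""
    return total_requests / (days * SECONDS_PER_DAY)


print(requests_per_day(12))                   # 1036800
print(round(average_rate(3_000_000, 21), 2))  # 1.65
```

The two numbers measure different things (all crawl traffic versus /_tracking hits only), so they are not directly comparable; the point is just the order of magnitude.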

@FuhuXia
Member Author

FuhuXia commented Sep 30, 2024

We locked down the DB over the weekend for the PostgreSQL version bump, during which there was no harvesting and no DB write activity, and therefore no Solr replication. To my surprise, the Solr followers' memory usage stayed flat the whole time.

This shows that Solr replication activity is what is directly associated with the Solr ECS memory leak.

image

Update:

Based on the previous clue, we removed the 60%-30%-10% Solr traffic allocation in the AWS balance group console and gave each Solr follower an equal share of the traffic.

image
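As a rough illustration of what the reweighting changes, the sketch below turns a set of routing weights into the fraction of traffic each follower receives. The follower names are placeholders and the function is hypothetical; it only models the proportions, not how AWS actually routes requests.

```python
def traffic_shares(weights):
    """Map each target's routing weight to its fraction of total traffic."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: weight / total for name, weight in weights.items()}


# Placeholder follower names; the real target names are not given in this issue.
before = traffic_shares({"follower-a": 60, "follower-b": 30, "follower-c": 10})
after = traffic_shares({"follower-a": 1, "follower-b": 1, "follower-c": 1})
```

Under the old weights the busiest follower handled six times the traffic of the least busy one; with equal weights each follower gets one third.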

@FuhuXia FuhuXia closed this as completed Sep 30, 2024
@FuhuXia FuhuXia mentioned this issue Oct 18, 2024
14 tasks