O+M 2024-08-26 #4864

Closed
5 of 14 tasks
FuhuXia opened this issue Aug 26, 2024 · 8 comments

@FuhuXia
Member

FuhuXia commented Aug 26, 2024

As part of day-to-day operation of Data.gov, there are many Operation and Maintenance (O&M) responsibilities. Instead of having the entire team watching notifications and risking some notifications slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role which rotates each sprint. This is not meant to be a 24/7 responsibility, only East Coast business hours. If you are unavailable, please note when you will be unavailable in Slack and ask for someone to take on the role for that time.

Check the O&M Rotation Schedule for future planning.

Acceptance criteria

You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.

Daily Checklist

Note: Catalog Auto Tasks
You will need to update the chart values manually. Click the Action link in each issue and grab the values from monitor task output and check runtime.

Weekly Checklist

Monthly Checklist

Ad-hoc Checklist

  • Audit/review applications on Cloud Foundry and determine what can be stopped and/or deleted.

Reference

@GSA GSA deleted a comment Aug 26, 2024
@GSA GSA deleted a comment Aug 26, 2024
@FuhuXia
Member Author

FuhuXia commented Aug 27, 2024

catalog-gather and catalog-fetch in staging are stuck; no deployment can go through.

catalog-fetch          started           web:0/0, web:0/0   
catalog-gather         started           web:0/0, web:0/0   

@FuhuXia
Member Author

FuhuXia commented Aug 27, 2024

Deleted staging catalog-gather and catalog-fetch. Deployment is fine now.

@FuhuXia
Member Author

FuhuXia commented Aug 27, 2024

After deployment, catalog-gather and catalog-fetch are back in the stuck state: web:0/0, web:0/0.

@FuhuXia
Member Author

FuhuXia commented Aug 28, 2024

Setting the instance count to 1 (instead of 0) for catalog-gather and catalog-fetch is the workaround.
GSA/catalog.data.gov#1439
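
The stuck state above can be spotted mechanically. Below is a minimal Python sketch, assuming the app listing is plain text in the same column layout as the two lines pasted earlier in this thread. It flags started apps whose processes all report 0/0 and builds (but does not run) `cf scale <app> -i 1` commands for them. The helper names and the `catalog-web` sample row are hypothetical, and the linked PR may apply the workaround differently, for example through the manifest rather than the CLI.

```python
import re

# Matches process summaries such as "web:0/0" in an app listing line.
PROCESS_RE = re.compile(r"(\w+):(\d+)/(\d+)")


def find_zero_instance_apps(listing):
    """Return names of started apps whose processes all report 0/0 instances."""
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # Expect at least: name, requested state, one process summary.
        if len(fields) < 3 or fields[1] != "started":
            continue
        processes = PROCESS_RE.findall(line)
        if processes and all(run == "0" and total == "0" for _, run, total in processes):
            stuck.append(fields[0])
    return stuck


def scale_commands(app_names, instances=1):
    """Build `cf scale` argument lists; nothing is executed here."""
    return [["cf", "scale", name, "-i", str(instances)] for name in app_names]


# The first two rows are copied from the thread; the third is a made-up healthy app.
sample = """\
catalog-fetch          started           web:0/0, web:0/0
catalog-gather         started           web:0/0, web:0/0
catalog-web            started           web:2/2
"""

for command in scale_commands(find_zero_instance_apps(sample)):
    print(" ".join(command))
```

Keeping the command construction separate from execution makes it easy to review what would be scaled before touching staging.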

@FuhuXia
Member Author

FuhuXia commented Aug 28, 2024

Set IMLS to a manual schedule and marked it as a broken source, since there has been no response to our request to unblock harvesting traffic.
#4799 (comment)

@FuhuXia
Member Author

FuhuXia commented Aug 28, 2024

The monthly ioos harvest, with 30k+ records, just refreshed all its timestamps to 2024-08-25 in the source https://data.noaa.gov/waf/NOAA/ioos/iso/xml/

Sent 33790 objects to the fetch queue

UPDATE:
It finished after 3+ days. The scary part is that the timestamp changes appear to be mostly legitimate, caused by file content changes rather than simple refreshes.

image

Some (1127 to be exact, 3.5%) were caught as simple file refreshes, for example this one:
https://data.noaa.gov/waf/NOAA/ioos/iso/xml/world_equator.xml, timestamped 2024-08-25 but with <gmd:dateStamp> 2021-03-30

image
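
The mismatch in the example above (listing timestamp 2024-08-25, `<gmd:dateStamp>` 2021-03-30) can be checked with a short script. This is a rough Python sketch, not how the harvester itself decides what counts as a refresh: it assumes ISO 19139 records with the standard `gmd` and `gco` namespaces, and the function names and the XML fragment are hand-written for illustration rather than taken from world_equator.xml. An older dateStamp is only a hint that the file was re-written without a metadata change.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Standard ISO 19139 namespace URIs.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def metadata_date_stamp(xml_text):
    """Return the record's gmd:dateStamp as a date, or None if absent."""
    root = ET.fromstring(xml_text)
    stamp = root.find("gmd:dateStamp", NS)
    if stamp is None:
        return None
    # dateStamp holds either a gco:Date or a gco:DateTime child.
    for tag in ("gco:Date", "gco:DateTime"):
        node = stamp.find(tag, NS)
        if node is not None and node.text:
            return date.fromisoformat(node.text.strip()[:10])
    return None


def date_stamp_lags_listing(xml_text, listing_modified):
    """True when the metadata dateStamp is older than the listing timestamp."""
    stamp = metadata_date_stamp(xml_text)
    return stamp is not None and stamp < listing_modified


# Hand-written fragment using the dates quoted in the comment above.
sample_record = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:dateStamp><gco:Date>2021-03-30</gco:Date></gmd:dateStamp>
</gmd:MD_Metadata>"""

print(date_stamp_lags_listing(sample_record, date(2024, 8, 25)))
```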

@rshewitt
Contributor

apparently i don't have edit access to the audit log. ooooooooooof. can someone grant me that?

@FuhuXia
Member Author

FuhuXia commented Sep 3, 2024

> apparently i don't have edit access to the audit log. ooooooooooof. can someone grant me that?

Done.

6 participants
@FuhuXia @jbrown-xentity @rshewitt and others