Adoption SIG: Opinionated Catalog Ingestors Working Group #70

Open
fcorti opened this issue Feb 21, 2023 · 10 comments

@fcorti (Contributor) commented Feb 21, 2023

This working group follows up on the idea of defining Opinionated Catalog Ingestors to address the pain point around Software Catalog adoption.

@fcorti (Contributor, Author) commented Mar 1, 2023

We met on Feb 28, 2023.

Attendees: Francesco Corti, Mihai Tabara, minkimcello, Ravikumar Tadikonda, Taras Mankovski, Waldir Montoya, Phil K.

We discussed the idea and proposal described by the picture below.

[Image: proposal diagram (Screenshot 2023-03-01 at 10 37 33)]

The general feedback on the proposal was positive.
One point to discuss further is how to "extend" the metadata collected by the various "bundled ingestors".
We would like to discuss this in a future session.

Action: @fcorti to share a Doodle poll to set up another working session.

Please raise comments here if you have any thoughts, suggestions, or concerns.

@Sarabadu commented Mar 1, 2023

Related to the YAML editor: I've been dreaming about an entity page preview while writing new catalog files.

It's a bit of an edge case and maybe only useful while adopting/learning, but getting feedback while learning to add new kinds would be nice.

MihaiTabara self-assigned this Mar 16, 2023
@adamdmharvey (Member) commented Mar 16, 2023

I wanted to throw out the idea of a Cloud Ingestor. For example, we would find it very useful to ingest Kubernetes clusters (e.g., AWS EKS clusters) as resources in the catalog, which we could then tie elements to. That's not necessarily meant to duplicate the Kubernetes plugin (though there's good overlap).

Eliminating the manual stitching would be powerful: we could throw away the spreadsheets (which we don't want to maintain), wikis, and scripts that query Terraform files or just the AWS CLI, all of which lack Backstage's ownership-level data.

  • AWS
    • AWS Accounts Ingestor (e.g., if we have an AWS org, being able to ingest all of the accounts attached to it)
    • AWS EKS Ingestor
  • GCP Projects Ingestor

etc...
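As a rough illustration of the Cloud Ingestor idea, the TypeScript below maps one discovered EKS cluster to a Backstage `Resource` entity and links it to a Resource for its AWS account through `spec.dependsOn`. It is a minimal sketch, not an existing plugin: the `DiscoveredCluster` shape, the `kubernetes-cluster` type string, the `example.com/...` annotation keys, and the `aws-account-<id>` naming scheme are all hypothetical, and actually listing clusters from AWS is left out.

```typescript
// Hypothetical shape of a cluster found by a cloud scan; a real ingestor would
// fill this from the cloud provider's API (not shown here).
interface DiscoveredCluster {
  name: string;
  region: string;
  accountId: string;
  ownerTag?: string; // hypothetical "owner" tag on the cluster
}

// The subset of the Backstage catalog entity format used for a Resource.
interface ResourceEntity {
  apiVersion: "backstage.io/v1alpha1";
  kind: "Resource";
  metadata: { name: string; annotations: Record<string, string> };
  spec: { type: string; owner: string; dependsOn?: string[] };
}

// Entity names may only contain letters, digits, "-", "_" and ".", must start
// and end with an alphanumeric character, and are at most 63 characters long.
// Lowercasing is a choice of this sketch, not a catalog requirement.
function toEntityName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9\-_.]+/g, "-")
    .slice(0, 63)
    .replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, "");
}

function clusterToResource(c: DiscoveredCluster, fallbackOwner: string): ResourceEntity {
  return {
    apiVersion: "backstage.io/v1alpha1",
    kind: "Resource",
    metadata: {
      name: toEntityName(`eks-${c.region}-${c.name}`),
      annotations: {
        // Hypothetical annotation keys; use a prefix your organization owns.
        "example.com/aws-account-id": c.accountId,
        "example.com/aws-region": c.region,
      },
    },
    spec: {
      type: "kubernetes-cluster", // free-form type string chosen for this sketch
      // Ownership is what spreadsheets and CLI scripts lack; fall back to a
      // default owner when the cluster carries no owner tag.
      owner: c.ownerTag ?? fallbackOwner,
      // Link the cluster to a (hypothetical) Resource for its AWS account.
      dependsOn: [`resource:${toEntityName(`aws-account-${c.accountId}`)}`],
    },
  };
}

// Usage
const entity = clusterToResource(
  { name: "Payments Prod", region: "eu-west-1", accountId: "123456789012" },
  "group:platform-team",
);
console.log(entity.metadata.name); // eks-eu-west-1-payments-prod
```

An AWS Accounts or GCP Projects ingestor could follow the same pattern: discover the cloud object, derive a stable entity name, and attach an owner so the catalog can answer "who owns this?".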

@taras (Member) commented Mar 17, 2023

During the March 16th Adoption SIG meeting, I presented my proposal for a North Star for a future ingestion system. It describes the user experience of using the ingestion system, to help align the community on where we could be going. It may take a long time to realize this experience, but I hope it helps us talk about how we want things to be, especially as we move to event-driven ingestion.

[Images: "Backstage Ingestion North Star Proposal", slides 1 to 7]

@taras (Member) commented Mar 17, 2023

Cloud Ingestor

@adamdmharvey, the idea of a cloud ingestor is an interesting one. Historically, the catalog wasn't designed to ingest lots of frequently changing data, but doing so would unlock some interesting possibilities. For example, we could create a shared plugin to compute DORA metrics. The move to event-driven ingestion might nudge the community in this direction.

@awanlin (Contributor) commented Mar 17, 2023

Just popping in with a quick comment: we should have a .NET ingestor. It could pull in data from .csproj or .vbproj files (sorry, but there is still a lot of legacy out there) and create entities from them.

@fcorti this could go beside the "What Else" on your diagram 😉
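To make the .NET idea a little more concrete, here is a minimal TypeScript sketch that turns the contents of a `.csproj`/`.vbproj` file into a Backstage `Component` entity. It assumes the project file has already been read from a repository, and it uses simple regular expressions instead of a real XML parser, so it only handles plain `<Element>value</Element>` properties. The function names, the `example.com/target-framework` annotation key, the `production` lifecycle default, and the rule mapping `OutputType` to `service` or `library` are all assumptions for illustration.

```typescript
// The subset of the Backstage catalog entity format used for a Component.
interface ComponentEntity {
  apiVersion: "backstage.io/v1alpha1";
  kind: "Component";
  metadata: { name: string; tags: string[]; annotations: Record<string, string> };
  spec: { type: string; lifecycle: string; owner: string };
}

// Text of the first <tag>...</tag> element, or undefined. Naive: it ignores
// attributes, conditions, and imported .props/.targets files.
function firstElement(xml: string, tag: string): string | undefined {
  const m = xml.match(new RegExp(`<${tag}>\\s*([^<]*?)\\s*</${tag}>`));
  return m ? m[1] : undefined;
}

function projectToComponent(
  fileName: string,
  xml: string,
  owner: string,
  lifecycle = "production", // assumed default
): ComponentEntity {
  // MSBuild defaults AssemblyName to the project file name without extension.
  const assembly =
    firstElement(xml, "AssemblyName") ?? fileName.replace(/\.(cs|vb)proj$/i, "");
  // Assume Library when OutputType is missing (the SDK-style default).
  const outputType = firstElement(xml, "OutputType") ?? "Library";
  // SDK-style projects use TargetFramework; legacy ones use TargetFrameworkVersion.
  const framework =
    firstElement(xml, "TargetFramework") ?? firstElement(xml, "TargetFrameworkVersion");
  const language = /\.vbproj$/i.test(fileName) ? "vbnet" : "csharp";

  const annotations: Record<string, string> = {};
  if (framework !== undefined) annotations["example.com/target-framework"] = framework;

  return {
    apiVersion: "backstage.io/v1alpha1",
    kind: "Component",
    metadata: {
      // Replace characters that are not allowed in entity names.
      name: assembly.replace(/[^A-Za-z0-9\-_.]/g, "-"),
      tags: ["dotnet", language],
      annotations,
    },
    spec: {
      // Heuristic: executables become services, everything else a library.
      type: outputType.toLowerCase() === "library" ? "library" : "service",
      lifecycle,
      owner,
    },
  };
}

// Usage
const sample = `<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <AssemblyName>Contoso.Billing.Api</AssemblyName>
  </PropertyGroup>
</Project>`;
const component = projectToComponent("Billing.csproj", sample, "group:billing-team");
console.log(component.metadata.name, component.spec.type); // Contoso.Billing.Api service
```

A production ingestor would likely want a proper XML parser and MSBuild-aware evaluation (conditions, `Directory.Build.props`); the regex approach is only meant to show which properties could feed which entity fields.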

@fcorti (Contributor, Author) commented Mar 20, 2023

Quoting @awanlin: "@fcorti this could go beside the What Else on your diagram 😉"

Yep, I expect this list of ingestors to be prioritised (once agreed on) so that we can start with the most requested ones and leave the community room to implement one, a few, or many of them.

I personally like the idea of providing the "base" for implementing all the ingestors that adopters want and need. Ideally, over time, only the "legacy" ones would remain to be developed.

@cal5barton commented, replying to @adamdmharvey's Cloud Ingestor suggestion above:

We'd use this type of ingestor.

@niallthomson commented

I'm at AWS, and we've been seeing customers ingest infrastructure into the catalog as resources, but each implementation is pretty ad hoc. Have there been any developments on how this could be approached in a healthy way?

@webark commented Dec 16, 2023

People use infrastructure in a wide range of ways, so it might be hard to settle on a single representation of that data.

For us, at least, we don't want a single resource per record, but one per bucket of records. Right now we bucket by type and by the entityRef of the component that generated the infrastructure. This gives a kind of "environment-agnostic" view of the resources. We're looking for ways to associate which components use the resource buckets in the case of shared infrastructure (anything from a database to an S3 bucket or an ECS cluster) using tracing data, but we haven't started that work yet.
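The bucketing approach described above might look roughly like the TypeScript below: records are grouped by resource type plus the entityRef of the component that generated them, and the environment is kept only as an attribute of the bucket, which is what makes the view environment-agnostic. This is a minimal sketch under assumptions: the `InfraRecord` and `InfraBucket` shapes are hypothetical, and turning each bucket into a catalog `Resource` (or linking shared infrastructure through tracing data) is not shown.

```typescript
// Hypothetical shape of one discovered piece of infrastructure.
interface InfraRecord {
  id: string; // e.g. a cloud resource identifier
  type: string; // e.g. "s3-bucket", "rds-database"
  generatedBy: string; // entityRef of the component that created it
  environment: string; // e.g. "dev", "prod"
}

// One bucket; each would become a single catalog Resource.
interface InfraBucket {
  type: string;
  generatedBy: string;
  environments: string[]; // sorted and de-duplicated
  recordIds: string[];
}

function bucketRecords(records: InfraRecord[]): InfraBucket[] {
  const byKey: Record<string, InfraBucket> = {};
  const order: string[] = []; // keeps buckets in first-seen order
  for (const r of records) {
    // The bucket key deliberately ignores the environment.
    const key = r.type + "|" + r.generatedBy;
    if (!(key in byKey)) {
      byKey[key] = { type: r.type, generatedBy: r.generatedBy, environments: [], recordIds: [] };
      order.push(key);
    }
    const b = byKey[key];
    if (b.environments.indexOf(r.environment) === -1) b.environments.push(r.environment);
    b.recordIds.push(r.id);
  }
  return order.map((key) => {
    byKey[key].environments.sort();
    return byKey[key];
  });
}

// Usage
const buckets = bucketRecords([
  { id: "arn-1", type: "s3-bucket", generatedBy: "component:default/payments", environment: "prod" },
  { id: "arn-2", type: "s3-bucket", generatedBy: "component:default/payments", environment: "dev" },
  { id: "arn-3", type: "rds-database", generatedBy: "component:default/payments", environment: "prod" },
]);
console.log(buckets.length); // 2
```

Three records collapse into two buckets, and the `s3-bucket` bucket spans both `dev` and `prod`, so the catalog would show one entry per kind of infrastructure a component owns rather than one per environment.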


9 participants