Exposing OSCAL data with OpenTelemetry #2039

Open
3 tasks
gyliu513 opened this issue Sep 3, 2024 · 19 comments

Comments

@gyliu513

gyliu513 commented Sep 3, 2024

User Story

As an OSCAL user, I want to expose all of the OSCAL data in OTel format and view that data through OTel backends such as Grafana.

Goals

Enable OSCAL to embrace the OTLP protocol and expose its data to different platforms.

Dependencies

No response

Acceptance Criteria

  • All OSCAL website and readme documentation affected by the changes in this issue have been updated. Changes to the OSCAL website can be made in the docs/content directory of your branch.
  • A Pull Request (PR) is submitted that fully addresses the goals of this User Story. This issue is referenced in the PR.
  • The CI-CD build process runs without any reported errors on the PR. This can be confirmed by reviewing that all checks have passed in the PR.

(For reviewers: The wiki has guidance on code review and overall issue review for completeness.)

Revisions

No response

@gyliu513
Author

gyliu513 commented Sep 3, 2024

Does anyone know if OSCAL has any plan to integrate with OpenTelemetry? Thanks!

@iMichaela
Contributor

Does anyone know if OSCAL has any plan to integrate with OpenTelemetry? Thanks!

@gyliu513 - At this time, NIST does not have a plan to support OTel, but if the community is interested in researching this topic, we can support it.

OpenTelemetry provides a common framework for collecting telemetry data and exporting it to an observability backend of your choice. It uses a set of standardized, vendor-agnostic APIs, SDKs, and tools for ingesting, transforming, and transporting data. Since the telemetry data consists of the logs, metrics, and traces collected from a distributed system, I am assuming you are proposing OTel for the assessment results collection and insertion into the OSCAL Assessment Plans and/or POA&Ms?

@gyliu513
Author

gyliu513 commented Sep 4, 2024

I am assuming you are proposing OTel for the assessment results collection and insertion into the OSCAL Assessment Plans and/or POA&Ms?

@iMichaela Yes, this is what I was hoping we could integrate. Any suggestions for this? Thanks

@gyliu513
Author

gyliu513 commented Sep 5, 2024

@iMichaela do you know if there are any tools which can be used to collect OSCAL data automatically? Thanks

@aj-stein-gsa

User Story

As an OSCAL user, I want to expose all of the OSCAL data in OTel format and view that data through OTel backends such as Grafana.

This came up in a FedRAMP implementers meeting and sounds interesting (at least personally to me, I am one of those OTEL people in personal lab environments from time to time with Prometheus and Grafana). Do you have an idea of what kind of security information you would want to see and how it relates to security controls for a notional system?

@iMichaela do you know if there are any tools which can be used to collect OSCAL data automatically? Thanks

As someone who reviews a lot of tools and integrations, I have not seen any yet, but that is why I asked the previous question.

@iMichaela
Contributor

This came up in a FedRAMP implementers meeting and sounds interesting (at least personally to me, I am one of those OTEL people in personal lab environments from time to time with Prometheus and Grafana). Do you have an idea of what kind of security information you would want to see and how it relates to security controls for a notional system?

My assumption - per communication above - was that the intention is to collect logs, metrics, and traces/evidence required. It should match the information planned to be collected for control assessments, to satisfy the regulatory framework requirements.

@aj-stein-gsa - if you recall the ATARC pilot, I envision the need for providing inputs to guide the outputs. Personally, I need to do more reading, but I am also very interested in researching it. It would be a great OSCAL research topic. I am going to raise it with the CNCF OSCAL WGs as well.

@gyliu513
Author

gyliu513 commented Sep 5, 2024

Thanks @aj-stein-gsa and @iMichaela for the discussion here, really helpful. Let me share a use case here:

Suppose I have a VM and I am using the OTel Collector to collect some metrics for it, such as VM name, CPU, and memory. I also want to get OSCAL assessment results for this VM, and then correlate the two data sets to show the customer an overview of this VM entity.

But if we can provide a solution that uses OTel to collect data for OSCAL as well, then we can probably define a unified data collection layer and data correlation layer to handle this request.

Below is an example of OTel metrics data and OSCAL data for a VM; hope this helps.

An example OSCAL System Security Plan:

{
  "system-security-plan": {
    "metadata": {
      "title": "Virtual Machine System Security Plan",
      "last-modified": "2024-09-04T00:00:00Z",
      "version": "1.0",
      "oscal-version": "1.0.0"
    },
    "system-characteristics": {
      "system-name": "Example Virtual Machine",  // VM name here
      "system-description": "This is a virtual machine running critical applications.",
      "system-information": {
        "system-type": "Virtual Machine",
        "system-host": "VMware ESXi",
        "operating-system": "Ubuntu 22.04 LTS"
      }
    },
    "control-implementation": {
      "implemented-controls": [
        {
          "control-id": "AC-2",
          "description": "Implement access control for the VM.",
          "responsible-roles": ["VM Administrator"]
        },
        {
          "control-id": "SI-7",
          "description": "Ensure the integrity of VM's software and updates.",
          "responsible-roles": ["Security Officer"]
        }
      ]
    }
  }
}

Then I get an assessment result for my VM in OSCAL, as below:

{
  "assessment-results": {
    "metadata": {
      "title": "Virtual Machine Assessment Results",
      "last-modified": "2024-09-04T00:00:00Z",
      "version": "1.0",
      "oscal-version": "1.0.0"
    },
    "results": [
      {
        "control-id": "AC-2",
        "status": "satisfied",
        "findings": "User access control measures are in place and effective."
      },
      {
        "control-id": "SI-7",
        "status": "partially satisfied",
        "findings": "Software integrity checks are in place, but one outdated package was found."
      }
    ]
  }
}

And I get an OSCAL authorization decision (AD) as follows:

{
  "authorization-decision": {
    "metadata": {
      "title": "Virtual Machine Authorization Decision",
      "last-modified": "2024-09-04T00:00:00Z",
      "version": "1.0",
      "oscal-version": "1.0.0"
    },
    "authorization-result": {
      "decision": "authorized with conditions",
      "description": "The VM is authorized for use, but the outdated package must be updated within 30 days.",
      "justification": "No critical vulnerabilities were identified, but some remediation is required."
    }
  }
}

Here is the data for the VM that I get from OTel:

{
  "resourceMetrics": [
    {
      "resource": {
        "attributes": [
          {"key": "vm.name", "value": "Example Virtual Machine"},  // VM Name here
          {"key": "host.name", "value": "vm-host-01"},
          {"key": "os.type", "value": "linux"}
        ]
      },
      "scopeMetrics": [
        {
          "metrics": [
            {
              "name": "vm.cpu.usage",
              "description": "CPU usage of the VM",
              "unit": "percentage",
              "dataPoints": [
                {"timestamp": 1693804800, "value": 55.3}
              ]
            }
          ]
        }
      ]
    }
  ]
}

After correlation, the VM data will look as follows:

{
  "vm.name": "Example Virtual Machine",
  "oscal-controls": {
    "AC-2": {
      "status": "satisfied",
      "description": "User access control measures are correctly implemented."
    },
    "SI-7": {
      "status": "partially satisfied",
      "description": "Software integrity checks are in place, but one outdated package was found."
    }
  },
  "otel-metrics": {
    "cpu.usage": "55.3%",
    "memory.usage": "2GB",
    "network.throughput": "150Mbps"
  },
  "otel-traces": [
    {
      "trace-id": "1234567890abcdef",
      "span-id": "abcdef1234567890",
      "operation": "vm-login",
      "status": "ok",
      "start-time": "2024-09-04T10:00:00Z",
      "end-time": "2024-09-04T10:00:05Z"
    }
  ],
  "otel-logs": [
    {
      "timestamp": "2024-09-04T10:00:00Z",
      "log-level": "info",
      "message": "User admin logged into VM."
    }
  ]
}
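
The correlation step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `resource_attributes` and `correlate_vm` are hypothetical, the input shapes are the simplified samples from this comment (not schema-valid OSCAL or OTLP), and joining on the VM name is an assumed convention.

```python
from typing import Any


def resource_attributes(resource_metrics: dict[str, Any]) -> dict[str, Any]:
    """Flatten the simplified [{"key": ..., "value": ...}] list into a dict."""
    attrs = resource_metrics.get("resource", {}).get("attributes", [])
    return {a["key"]: a["value"] for a in attrs}


def correlate_vm(ssp: dict, assessment: dict, otel: dict) -> dict:
    """Join OSCAL results and OTel metrics for one VM, keyed on its name."""
    # Assumption: the SSP system-name equals the OTel "vm.name" attribute.
    vm_name = ssp["system-security-plan"]["system-characteristics"]["system-name"]

    # Index the (simplified) assessment results by control id.
    controls = {
        r["control-id"]: {"status": r["status"], "description": r["findings"]}
        for r in assessment["assessment-results"]["results"]
    }

    metrics: dict[str, Any] = {}
    for rm in otel.get("resourceMetrics", []):
        if resource_attributes(rm).get("vm.name") != vm_name:
            continue  # metrics that belong to some other resource
        for scope in rm.get("scopeMetrics", []):
            for metric in scope.get("metrics", []):
                points = metric.get("dataPoints", [])
                if points:
                    # Keep the last listed data point for each metric.
                    metrics[metric["name"]] = points[-1]["value"]

    return {"vm.name": vm_name, "oscal-controls": controls, "otel-metrics": metrics}
```

A name-based join is fragile if VM names are not unique, so a real correlation layer would probably need a stable identifier shared by both data sources.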

@gyliu513
Author

gyliu513 commented Sep 5, 2024

@iMichaela do you have some meeting notes or github links for CNCF OSCAL WGs? Thanks

@aj-stein-gsa

Thanks @aj-stein-gsa and @iMichaela for the discussion here, really helpful. Let me share a use case here:

Suppose I have a VM and I am using the OTel Collector to collect some metrics for it, such as VM name, CPU, and memory. I also want to get OSCAL assessment results for this VM, and then correlate the two data sets to show the customer an overview of this VM entity.

But if we can provide a solution that uses OTel to collect data for OSCAL as well, then we can probably define a unified data collection layer and data correlation layer to handle this request.

Neat, so you essentially want custom metrics to consume with an OTEL collector with perhaps a custom receiver?

@gyliu513
Author

gyliu513 commented Sep 5, 2024

Neat, so you essentially want custom metrics to consume with an OTEL collector with perhaps a custom receiver?

@aj-stein-gsa Yes, but maybe not only a receiver but also a processor, as there may be some semantic conventions required in the processor. We probably need an oscalreceiver?
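
To make the receiver idea a little more concrete, here is a rough sketch of the translation such a component might perform: turning the simplified assessment results from the example above into a gauge metric. Everything named here is an assumption for illustration: the function `assessment_to_otlp`, the metric name `oscal.control.status`, the attribute name `oscal.control.id`, and the numeric encoding of status strings. The output is shaped after the OTLP/JSON encoding but has not been validated against it, and a real Collector receiver would be written in Go.

```python
import time
from typing import Optional

# Hypothetical numeric encoding of the status strings used in the sample data.
STATUS_TO_VALUE = {"satisfied": 1.0, "partially satisfied": 0.5, "not satisfied": 0.0}


def assessment_to_otlp(assessment: dict, vm_name: str,
                       time_unix_nano: Optional[int] = None) -> dict:
    """Translate simplified assessment results into one gauge metric."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()

    points = []
    for result in assessment["assessment-results"]["results"]:
        value = STATUS_TO_VALUE.get(result["status"])
        if value is None:
            continue  # unknown status string: skip rather than guess
        points.append({
            # One data point per control, labeled with the control id.
            "attributes": [{"key": "oscal.control.id",
                            "value": {"stringValue": result["control-id"]}}],
            "timeUnixNano": str(time_unix_nano),
            "asDouble": value,
        })

    return {"resourceMetrics": [{
        "resource": {"attributes": [
            {"key": "vm.name", "value": {"stringValue": vm_name}}]},
        "scopeMetrics": [{
            "scope": {"name": "oscalreceiver-sketch"},  # hypothetical scope name
            "metrics": [{
                "name": "oscal.control.status",  # hypothetical metric name
                "description": "Assessment status per control (1 = satisfied)",
                "unit": "1",
                "gauge": {"dataPoints": points},
            }],
        }],
    }]}
```

The attribute and metric names are exactly the kind of thing a semantic convention would have to pin down before different tools could interoperate.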

@iMichaela
Contributor

@iMichaela do you have some meeting notes or github links for CNCF OSCAL WGs? Thanks

@jflowers leads the cncf/tag-security OSCAL Norms project.
I raised the issue today and I believe there is a lot of interest.

@iMichaela
Contributor

Thanks @aj-stein-gsa and @iMichaela for the discussion here, really helpful. Let me share a use case here:
Suppose I have a VM and I am using the OTel Collector to collect some metrics for it, such as VM name, CPU, and memory. I also want to get OSCAL assessment results for this VM, and then correlate the two data sets to show the customer an overview of this VM entity.
But if we can provide a solution that uses OTel to collect data for OSCAL as well, then we can probably define a unified data collection layer and data correlation layer to handle this request.

Neat, so you essentially want custom metrics to consume with an OTEL collector with perhaps a custom receiver?

I have a few comments on the data sample you provided, which might be fundamental to the problem. I'll put aside for the moment the incorrect OSCAL structure; I am only looking at the data you are trying to convey.

The system-security-plan/control-implementation/implemented-requirements/by-component/description here reads "Implement access control for the VM." This description MUST document how such a control/requirement is implemented. HOW AC-2 is implemented is what needs to be assessed, in ways that adhere to the assessor's plan for assessing it and to the evidence required by the regulatory framework. So this information is needed, and the otel oscalreceiver and/or the processor will need to use it to know what to check, what to collect as evidence, and what to use as input for the adjudication.

The assessment-results/results/assessment-log will collect logs; the assessment-results/results/observation will gather the relevant-evidence, date, collected, method, etc.; and the assessment-results/results/findings will link those to the relevant-observations, implementation-statement-uuid, etc.
Finally, the assessment-results/results/attestation would need to capture the outcome of the assessment (AD data).
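
As a rough sketch of how collected telemetry might land in those structures, the snippet below wraps a simplified log record in an observation and links it from a finding. The helper names `observation_from_log` and `finding_for_control` are hypothetical, the log record shape is the simplified one from the earlier example, and the key names reflect my reading of the OSCAL assessment-results model; they should be checked against the schema before use.

```python
import uuid


def observation_from_log(log_record: dict, evidence_href: str) -> dict:
    """Wrap one simplified OTel log record as an OSCAL-style observation."""
    return {
        "uuid": str(uuid.uuid4()),
        "description": log_record["message"],
        "methods": ["TEST"],  # assumption: automated collection counts as TEST
        "collected": log_record["timestamp"],
        "relevant-evidence": [
            {"href": evidence_href, "description": "Source OTel log record"}
        ],
    }


def finding_for_control(target_id: str, state: str, observations: list,
                        implementation_statement_uuid: str) -> dict:
    """Build a finding that points back at the supporting observations."""
    return {
        "uuid": str(uuid.uuid4()),
        "title": f"Automated check for {target_id}",
        "description": f"Derived from {len(observations)} collected observation(s).",
        "target": {
            "type": "objective-id",
            "target-id": target_id,
            "status": {"state": state},  # e.g. "satisfied" or "not-satisfied"
        },
        "implementation-statement-uuid": implementation_statement_uuid,
        "related-observations": [
            {"observation-uuid": o["uuid"]} for o in observations
        ],
    }
```

Deciding the `state` value is the adjudication step, which this sketch leaves to the caller; that is where the assessment plan would have to guide what the collector checks.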

In the example, you are providing otel metrics, but similar metrics might be already defined by different authorities (FedRAMP, CSA STAR, etc) and might need to be mapped or used as inputs...

I hope this is all doable, which is the reason for calling on CNCF experts.
I personally like the idea very much, but it is not straightforward.

@gyliu513
Author

gyliu513 commented Sep 5, 2024

The assessment-results/results/assessment-log will collect logs; the assessment-results/results/observation will gather the relevant-evidence, date, collected, method, etc.; and the assessment-results/results/findings will link those to the relevant-observations, implementation-statement-uuid, etc.
Finally, the assessment-results/results/attestation would need to capture the outcome of the assessment (AD data).

Thanks @iMichaela, what I provided is just an example to clarify my use case. There may be some errors, but please ignore them. :)

In the example, you are providing otel metrics, but similar metrics might be already defined by different authorities (FedRAMP, CSA STAR, etc) and might need to be mapped or used as inputs...

This is a good point. Yes, we can get the same data from different sources; I think that is why we need semantic conventions and data correlation to mitigate those issues.

@iMichaela
Contributor

iMichaela commented Sep 5, 2024

This is a good point. Yes, we can get the same data from different sources; I think that is why we need semantic conventions and data correlation to mitigate those issues.

And most likely, enforced control metrics will need native support in OSCAL or the use of a registry of extensions; otherwise tools might not know how to extract the information and use it automatically (pass it as input, use it for the final AD, etc.). Just to keep the records together, here is the CSA/cloud-audit-metrics project (their own JSON schema to align with the

@ogijaoh

ogijaoh commented Sep 19, 2024

What is the proposed change to OSCAL? I am not clearly seeing the need to change OSCAL. This seems like an effort to develop a tool that will make use of assessment result data in OSCAL formats. Do these efforts belong in this repository, or should a separate repository be started to accomplish this?

OSCAL is a set of data structures. OpenTelemetry is a set of tools for measuring the performance and behavior of software. The way I am understanding the discussion, it seems the way forward is to work telemetry outputs into the software applications that consume/create OSCAL-structured data, not modify the OSCAL structures themselves.

edited for grammar

@gyliu513
Author

@ogijaoh Thanks for the comments, totally agree with you.

I can see you are working for https://github.com/defenseunicorns/lula, which can generate machine-readable OSCAL artifacts. It seems this can be used as a source for generating OSCAL data, and we would need to build an oscalreceiver to get that data. Comments? Thanks

@iMichaela
Contributor

@ogijaoh - You are absolutely right. To my understanding, the proposal, as clarified after some discussion (see above), focuses on the ability to feed OSCAL assessment plan information into OpenTelemetry and the outputs of OTel tools into OSCAL assessment results, allowing the software applications that consume/create OSCAL-structured data to consume the information.
The OSCAL repo has been used by many as a place to dump ideas and complaints of all kinds. We are not pleased with it, but when an interesting idea comes, we do not want it to be lost.
I know the OSCAL schemas do not need to be modified for this purpose, but what I am trying to get out of this discussion is to allow the community to take this idea to OSCAL-DEFINE and explore it further. I am trying to figure out whether this will become an EPIC story with a pointer to where the research takes place, or whether I should close it if the community thinks it has no value.
So, is it necessary for the community to define an extension under an otel namespace (ns) and register it in the OSCAL registry the community is planning to stand up? Is there a need to map the OSCAL data to the OTel Collector's data, so that all tools the community builds implement this community-defined specification in the same way and are interoperable?
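
For illustration only, a namespaced extension of this kind might look like OSCAL props that carry OTel attribute names under a dedicated `ns` value. The namespace URI below is a placeholder (no such namespace is registered anywhere), and the helper names `attributes_to_props` and `props_to_attributes` are hypothetical; this is a sketch of the mapping question, not a proposed specification.

```python
# Placeholder namespace URI: an assumption, not a registered OSCAL extension.
OTEL_NS = "https://example.org/ns/oscal/otel"


def attributes_to_props(attributes: dict) -> list:
    """Express OTel resource attributes as OSCAL props in the otel namespace."""
    return [
        {"name": key, "ns": OTEL_NS, "value": str(value)}
        for key, value in sorted(attributes.items())
    ]


def props_to_attributes(props: list) -> dict:
    """Recover the OTel attributes, ignoring props from any other namespace."""
    return {p["name"]: p["value"] for p in props if p.get("ns") == OTEL_NS}
```

For example, `attributes_to_props({"vm.name": "Example Virtual Machine"})` yields a single prop named `vm.name` in the placeholder namespace. If every tool agreed on the namespace and the names inside it, each could round-trip the same data without private conventions.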

@ogijaoh

ogijaoh commented Sep 23, 2024

@gyliu513, I don't work for Lula, but I have worked with it. Using Lula as a use case, what content would you want created/accessed for OTEL? I have some familiarity with the processing of logs with Promtail, Loki, and Grafana (PLG). I'm not quite sure I understand your question, but I'll take a stab at describing the challenge as I see it to develop something like this. Please let me know if this is off the mark, or if OTEL provides other capabilities than the ones I am referencing in PLG.

If Lula were going to give us logs, thought would need to be given to how Lula would generate these logs for the different capabilities Lula has.

For instance, Lula's current output when validating system configuration/status against an expected configuration/status has information such as:

  • number of testable requirements referenced in the component
  • number of validations for the component
  • whether a validation was evaluated or not
  • the resulting status of a validation
  • overall status of each testable requirement (which can include several validations)

To use Loki / Grafana in the way I'm familiar with their use, Lula would have to output quite a few logs for each validation run in order to provide the full breadth of information it provides with its standard results output.
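
To give a sense of the volume involved, here is a sketch that emits one JSON log line per validation plus one summary line per requirement. The input shape, the function name `validation_log_lines`, and the rule for the overall status are all assumptions made for illustration; this is not Lula's actual output format or API.

```python
import json


def validation_log_lines(component: str, requirements: list) -> list:
    """Emit one JSON log line per validation and one summary per requirement."""
    lines = []
    for req in requirements:
        for validation in req["validations"]:
            lines.append(json.dumps({
                "component": component,
                "requirement": req["id"],
                "validation": validation["id"],
                "evaluated": validation["evaluated"],
                "status": validation["status"],
            }, sort_keys=True))

        # Assumed rule: a requirement is satisfied only if at least one
        # validation was evaluated and every evaluated one is satisfied.
        evaluated = [v["status"] for v in req["validations"] if v["evaluated"]]
        satisfied = bool(evaluated) and all(s == "satisfied" for s in evaluated)
        lines.append(json.dumps({
            "component": component,
            "requirement": req["id"],
            "validation_count": len(req["validations"]),
            "overall_status": "satisfied" if satisfied else "not-satisfied",
        }, sort_keys=True))
    return lines
```

Even this small shape produces N + 1 lines per requirement on every run, which is the log volume concern described above.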

Going away from logs for a minute...if the goal is to see a more visual display of what happened with a validation, or to see the latest validation results for a system, would it be possible to just write something that reads the assessment results file produced by Lula and displays the results? I don't know if this would be something done with a custom receiver or just by adding an assessment results file (or a repository of related OSCAL-structured files) as a data source, and then parsing that data source.
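
A minimal sketch of that "just read the file" approach, assuming the assessment results are OSCAL JSON in which each finding carries a `target.status.state` value. The function names `summarize_findings` and `summarize_file` are hypothetical, and the traversal has not been checked against files actually produced by Lula.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_findings(doc: dict) -> Counter:
    """Count findings by target state across all results in the document."""
    counts = Counter()
    for result in doc["assessment-results"].get("results", []):
        for finding in result.get("findings", []):
            # Findings without a state are counted as "unknown".
            state = finding.get("target", {}).get("status", {}).get("state", "unknown")
            counts[state] += 1
    return counts


def summarize_file(path: str) -> Counter:
    """Load an assessment results JSON file from disk and summarize it."""
    return summarize_findings(json.loads(Path(path).read_text(encoding="utf-8")))
```

A dashboard data source could call something like `summarize_file` on the latest file in a repository and display the counts, with no custom receiver involved.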

@ogijaoh

ogijaoh commented Sep 23, 2024

@iMichaela, copy that on allowing these conversations to start in this space.

To your first statement, this is different than how I was thinking about the problem. I was considering the use of OpenTelemetry capabilities as a means for understanding what has happened with assessments (see my response to @gyliu513 in this comment: #2039 (comment)).

If I understand you correctly, your interpretation of the above conversation is taking information in OTEL and creating OSCAL-structured content for consumption/use elsewhere. So not an attempt to understand the telemetry of tools in the OSCAL ecosystem, but rather translating content from OTEL formats into OSCAL-structured formats. Is that right?

Projects
Status: Needs Triage
Development

No branches or pull requests

4 participants