bug: modelmesh container have error logs when kserve runtime is running. #125

Open
Jooho opened this issue Nov 29, 2023 · 5 comments
@Jooho
Contributor

Jooho commented Nov 29, 2023

When kserve and modelmesh are running in the same namespace, the modelmesh container shows these errors:

{"instant":{"epochSecond":1701291690,"nanoOfSecond":966215444},"thread":"ll-elg-thread-2","level":"INFO","loggerName":"com.ibm.watson.modelmesh.ModelMesh","message":"Returning READY to readiness probe (did not find any other pods in terminating state)","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","contextMap":{},"threadId":35,"threadPriority":5}
Nov 29, 2023 9:01:37 PM io.grpc.netty.NettyServerTransport notifyTerminated
INFO: Transport failed
io.netty.handler.codec.http2.Http2Exception: Unexpected HTTP/1.x request: GET /stats/prometheus
at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:109)
at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.readClientPrefaceString(Http2ConnectionHandler.java:317)
at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:247)
at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:453)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
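
The trace above is consistent with a plain HTTP/1.x scrape (GET /stats/prometheus) arriving on a port that speaks gRPC over HTTP/2. An HTTP/2 server expects every new connection to begin with a fixed client preface, and anything else is rejected. The following Python sketch illustrates that check under stated assumptions. It is not Netty's implementation, and the helper name classify_first_bytes and its return strings are hypothetical.

```python
# Fixed client connection preface from the HTTP/2 specification (RFC 9113, section 3.4).
HTTP2_CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

# Common HTTP/1.x methods, used only to recognise a misdirected request.
HTTP1_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ", b"PATCH ")


def classify_first_bytes(data: bytes) -> str:
    """Classify the first bytes a client sends on a new connection (hypothetical helper)."""
    if data.startswith(HTTP2_CLIENT_PREFACE):
        return "http2"
    if data.startswith(HTTP1_METHODS):
        # Report the request line, similar in spirit to the message in the log.
        request_line = data.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
        return f"unexpected HTTP/1.x request: {request_line}"
    return "unknown"


# A Prometheus-style scrape sent as HTTP/1.1 to an HTTP/2-only port.
scrape = b"GET /stats/prometheus HTTP/1.1\r\nHost: example\r\n\r\n"
print(classify_first_bytes(scrape))
```

Because the rejection happens at the connection level, the log entry points to a scraper reaching a gRPC port rather than to a fault inside ModelMesh itself.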

There are 3 NetworkPolicy objects in the namespace:

  • allow-from-openshift-monitoring-ns
  • istio-expose-route-minimal
  • istio-mesh-minimal

If the allow-from-openshift-monitoring-ns network policy is deleted, the error message no longer shows up, so I think this NetworkPolicy is the culprit. However, I am not 100% sure, so it needs more debugging.

Reference:
https://github.com/orgs/opendatahub-io/projects/42?pane=issue&itemId=40292089

@skonto

skonto commented Dec 7, 2023

Setting the label istio-prometheus-ignore="true" on the modelmesh pod prevents it from being scraped on port 15020.
See:

Name:         istio-proxies-monitor
Namespace:    kserve-demo
... 
Spec:
  Namespace Selector:
  Pod Metrics Endpoints:
    Bearer Token Secret:
      Key:     
    Interval:  30s
    Path:      /stats/prometheus
  Selector:
    Match Expressions:
      Key:       istio-prometheus-ignore
      Operator:  DoesNotExist
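
To make the selector above concrete, here is a minimal Python sketch of how Kubernetes-style matchExpressions decide whether a pod is selected. It is only an illustration of the matching rules, not the Prometheus Operator's code. The function name pod_is_selected and the example pod labels are hypothetical.

```python
def pod_is_selected(pod_labels: dict, match_expressions: list) -> bool:
    """Return True if the pod labels satisfy every match expression (hypothetical helper)."""
    for expr in match_expressions:
        key = expr["key"]
        operator = expr["operator"]
        values = expr.get("values", [])
        if operator == "DoesNotExist":
            ok = key not in pod_labels
        elif operator == "Exists":
            ok = key in pod_labels
        elif operator == "In":
            ok = key in pod_labels and pod_labels[key] in values
        elif operator == "NotIn":
            # NotIn also matches pods that do not carry the key at all.
            ok = key not in pod_labels or pod_labels[key] not in values
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if not ok:
            return False
    return True


# Selector taken from the istio-proxies-monitor output above.
selector = [{"key": "istio-prometheus-ignore", "operator": "DoesNotExist"}]

# Hypothetical pod labels, for illustration only.
unlabelled_pod = {"app": "example"}
ignored_pod = {"app": "example", "istio-prometheus-ignore": "true"}

print(pod_is_selected(unlabelled_pod, selector))
print(pod_is_selected(ignored_pod, selector))
```

With DoesNotExist only the presence of the key matters, so any value of istio-prometheus-ignore excludes the pod from this PodMonitor.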

@vaibhavjainwiz
Member

vaibhavjainwiz commented Dec 11, 2023

Analysis
As part of setting up OMW (OpenShift Monitoring Workflow), a PodMonitor (istio-proxies-monitor) was created that allows metrics to be scraped directly from all pods in the KServe runtime namespace on the /stats/prometheus endpoint over the HTTP port.

ModelMesh already has a ServiceMonitor resource for its pod, which allows metrics to be scraped through the secure port. istio-proxies-monitor should not monitor the ModelMesh pod.

Solution
istio-proxies-monitor (PodMonitor) and istiod-monitor (ServiceMonitor) are supposed to monitor Istio components, not KServe. I verified that equivalent Istio PodMonitor and ServiceMonitor resources are already created in the istio-system namespace. So I think we could safely remove the istio-proxies-monitor (PodMonitor) and istiod-monitor (ServiceMonitor) from the KServe namespace.

[Image: model-serving-Page-16 drawio]

@Jooho
Contributor Author

Jooho commented Dec 11, 2023

> So I think we could safely remove the istio-proxies-monitor(PodMonitor) and istiod-monitor(ServiceMonitor) from Kserve namespace.

Do you know who created these two objects?

@vaibhavjainwiz
Member

> > So I think we could safely remove the istio-proxies-monitor(PodMonitor) and istiod-monitor(ServiceMonitor) from Kserve namespace.
>
> Do you know who created these two objects?

Today I did some more research and found the article below. According to point 7.1, these objects were added intentionally, so we should not remove the istio-proxies-monitor (PodMonitor) and istiod-monitor (ServiceMonitor) from the KServe namespace.
https://docs.openshift.com/container-platform/4.14/service_mesh/v2x/ossm-observability.html#ossm-integrating-with-user-workload-monitoring_observability

@vaibhavjainwiz
Member

vaibhavjainwiz commented Dec 11, 2023

After discussing with @skonto and @bartoszmajsak, we concluded that we need to add an extra label to the istio-proxies-monitor PodMonitor so that it skips ModelMesh pod monitoring.

Discussion thread : https://redhat-internal.slack.com/archives/C065ARTVA80/p1702293019814919?thread_ts=1701693652.733169&cid=C065ARTVA80
