efficiency: investigate bottleneck #1614

Open
oliver-sanders opened this issue Jan 4, 2024 · 3 comments


oliver-sanders commented Jan 4, 2024

See also: cylc/cylc-uiserver#547

This workflow has proven to be remarkably difficult for the UI Server (UIS) and UI to handle:

```
#!Jinja2

{% set members = 10 %}
{% set hours = 100 %}

[scheduler]
    allow implicit tasks = True

[task parameters]
    member = 0..{{members}}
    fcsthr = 0..{{hours}}
  [[templates]]
    member = member%(member)03d
    fcsthr = _fcsthr%(fcsthr)03d

[scheduling]
  initial cycle point = 2000
  runahead limit = P3
  [[xtriggers]]
    start = wall_clock(offset=PT7H15M)
  [[graph]]
    T00,T06,T12,T18 = """
        @start & prune[-PT6H]:finish => prune & purge
        @start => sniffer:ready_<member,fcsthr> => <member,fcsthr>_process? => finish
        <member,fcsthr>_process:fail? => fault
      """

[runtime]
    [[sniffer]]
        [[[outputs]]]
{% for member in range(0, members + 1) %}
    {% for hour in range(0, hours + 1) %}
            ready_member{{ member | pad(3, 0) }}_fcsthr{{ hour | pad(3, 0) }} = {{ member }}{{ hour }}
    {% endfor %}
{% endfor %}
```

For more information see: https://cylc.discourse.group/t/slow-load-of-cylc-workflows-disconnects/823/19
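To give a sense of scale, this illustrative TypeScript sketch expands the parameterised `<member,fcsthr>_process` tasks for a single cycle point. It assumes Cylc builds each name by concatenating the `[[templates]]` strings (`member%(member)03d` then `_fcsthr%(fcsthr)03d`) and that both parameter ranges are inclusive; `zeroPad3` and `processTasksPerCycle` are hypothetical helpers, not part of Cylc.

```typescript
// Illustrative sketch: count/generate the `<member,fcsthr>_process` task names
// for ONE cycle point. Assumes templates are joined by plain concatenation.

// Hypothetical helper: zero-pad a non-negative integer to 3 digits (like "%03d").
function zeroPad3(n: number): string {
  let s = String(n);
  while (s.length < 3) {
    s = "0" + s;
  }
  return s;
}

// Hypothetical helper: both ranges are inclusive (`0..{{members}}`, `0..{{hours}}`).
function processTasksPerCycle(members: number, hours: number): string[] {
  const tasks: string[] = [];
  for (let m = 0; m <= members; m++) {
    for (let h = 0; h <= hours; h++) {
      tasks.push(`member${zeroPad3(m)}_fcsthr${zeroPad3(h)}_process`);
    }
  }
  return tasks;
}

const fullExample = processTasksPerCycle(10, 100); // values from the workflow above
const reducedExample = processTasksPerCycle(10, 20); // `hours = 20` variant
console.log(fullExample.length, fullExample[0], fullExample[fullExample.length - 1]);
console.log(reducedExample.length);
```

With `members = 10` and `hours = 100` this gives 11 × 101 = 1,111 `_process` tasks per cycle point, the same number as the custom outputs the Jinja2 loop generates for `sniffer`; the reduced `hours = 20` variant gives 11 × 21 = 231.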

Investigation so far has confirmed:

  • The scheduler is not a source of delay.
  • The UIS chokes on the update for several seconds.
    • During this time, updates to other workflows are suspended.
  • The UI chokes on the deltas for several seconds.
  • The browser takes a couple of seconds to update.

This issue focuses on the UI side of things.

Suggested remediation (UI only; please update with new suggestions):

oliver-sanders added this to the Pending milestone Jan 4, 2024

oliver-sanders commented Jan 4, 2024

IMO, the UI side of this issue is more concerning than the UIS side because UIS delay loads the server, whereas UI delay hits the user's browser.

The bulk of the time is being taken in the data store processing the deltas; this should be the first target for improvement. Profiling is required to highlight problem areas. Given that the table view is only slightly faster to load than the tree view, family tree computation is unlikely to be the cause.


oliver-sanders commented Jan 8, 2024

Profiling Experiments

1 - JS Profiling

Profile the time it takes to load the tree view for the workflow in the OP, with hours turned down to 20. The workflow is started in paused mode.

Results:

  • 5.68 seconds of scripting time.
    • 1.5 seconds of which is accounted for by the UPDATE_DELTAS call chain (i.e. workflow data store stuff)
      • 0.61 seconds of which is applyInheritance.
      • 0.90 seconds of which is shared between createTreeNode and addChild.

The remainder appears to be Vue.js.
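A quick check of how these figures break down (rounded; "other" is everything outside the UPDATE_DELTAS chain, which the profile suggests is mostly Vue.js):

```latex
\begin{aligned}
t_{\text{store}} &\approx 0.61 + 0.90 = 1.51 \approx 1.5\,\text{s}, & \tfrac{1.5}{5.68} &\approx 26\% \\
t_{\text{other}} &\approx 5.68 - 1.5 = 4.18\,\text{s}, & \tfrac{4.18}{5.68} &\approx 74\% \\
\tfrac{0.61}{1.5} &\approx 41\%\ (\texttt{applyInheritance}), & \tfrac{0.90}{1.5} &= 60\%\ (\texttt{createTreeNode} + \texttt{addChild})
\end{aligned}
```

In this measurement, roughly three quarters of the scripting time falls outside the data store.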

2 - View Load Time

Open a view, then measure the time it takes to open the same view in a new workspace tab.

  • Because you've already opened the view, there's nothing for the data store to do.
  • This gives you an idea of mount/render time.

Manual timings to the nearest second:

  • Tree - 13s
  • Table - 2s
  • Simple Tree - 0s

3 - Component Loading

Start with the "simple tree" view and add in the components used by the regular "tree" view one by one, measuring the impact on load time for each.

  • Simple Tree ~0s
    • With <Task /> icons ~3s
      • With <Job /> icons ~4.5s
        • With <v-btn /> buttons (used for expand/collapse) ~8.5s
          • With <v-icon /> icon (used for expand/collapse icons) ~11.5s

Using these timings to extract the cost per component:

  • 4s <v-btn />
  • 3s <Task />
  • 3s <v-icon />
  • 1.5s <Job />

Note: These costs are for 1,000 tasks, e.g. 0.003s per <Task /> icon.
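The per-component costs are just the differences between consecutive cumulative timings. A minimal TypeScript sketch of that arithmetic, using the approximate timings above and assuming roughly one instance of each component per task (1,000 tasks, per the note); `Step`, `ComponentCost`, `marginalCosts` and `TASK_COUNT` are illustrative names only:

```typescript
// Cumulative load times from experiment 3 (approximate, manual timings).
interface Step {
  component: string;         // component added at this step
  cumulativeSeconds: number; // load time with this and all earlier components
}

interface ComponentCost {
  component: string;
  totalSeconds: number;       // marginal cost across all tasks
  perInstanceSeconds: number; // assumes ~one instance per task
}

const steps: Step[] = [
  { component: "Simple Tree", cumulativeSeconds: 0 },
  { component: "<Task />", cumulativeSeconds: 3 },
  { component: "<Job />", cumulativeSeconds: 4.5 },
  { component: "<v-btn />", cumulativeSeconds: 8.5 },
  { component: "<v-icon />", cumulativeSeconds: 11.5 },
];

// Assumption taken from the note: the measurements cover ~1,000 tasks.
const TASK_COUNT = 1000;

// Marginal cost = difference between consecutive cumulative timings.
function marginalCosts(measured: Step[], taskCount: number): ComponentCost[] {
  const result: ComponentCost[] = [];
  for (let i = 1; i < measured.length; i++) {
    const totalSeconds = measured[i].cumulativeSeconds - measured[i - 1].cumulativeSeconds;
    result.push({
      component: measured[i].component,
      totalSeconds,
      perInstanceSeconds: totalSeconds / taskCount,
    });
  }
  return result;
}

for (const cost of marginalCosts(steps, TASK_COUNT)) {
  console.log(`${cost.component}: ${cost.totalSeconds}s total, ${cost.perInstanceSeconds.toFixed(4)}s each`);
}
```

This reproduces the 3s / 1.5s / 4s / 3s split listed above, i.e. about 0.003s per `<Task />` and 0.004s per `<v-btn />`.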

Conclusions

  1. The store is a little sluggish; we should look into possible optimisations.
  2. The real killer is the component count in the Tree view.
  3. There is potential for easy gains by simplifying the expand/collapse system.

Remediation:

oliver-sanders added a commit to oliver-sanders/cylc-ui that referenced this issue Jan 9, 2024
* Reduce the overheads of `applyInheritance` for families with lots of
  children.
* Use a map for O(1) lookup times when checking whether child nodes
  are present in the store.
* This should change scaling from `O(mn)` to `O(n)` where `n` is the
  number of tasks in a family when created and `m` is the number of
  children after an update (n~=m for most cases).
* Partially addresses cylc#1614
* For the reduced example (`hours = 20`) this reduces the time taken
  by ~85% from ~0.6s to <0.1s (though these times will be subject to
  much jitter).
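The lookup change described in this commit can be illustrated with a small, self-contained TypeScript sketch. The function names are hypothetical and this is not the cylc-ui implementation; it only contrasts scanning an array of child IDs for every incoming child (O(mn)) with building an ID index once and doing average O(1) lookups (O(m + n), i.e. O(n) when n ≈ m). A prototype-free `Object.create(null)` object is used as the index so IDs such as `constructor` cannot match inherited properties; a `Set` would work equally well.

```typescript
// Illustrative sketch only: hypothetical helpers, not cylc-ui's data store.
// Both return the incoming child IDs that are not yet present, in order.

// Linear scan: indexOf walks existingIds for every incoming ID,
// so n incoming children against m existing ones costs O(m * n).
function missingChildrenLinear(existingIds: string[], incomingIds: string[]): string[] {
  return incomingIds.filter((id) => existingIds.indexOf(id) === -1);
}

// Indexed: build an ID -> true lookup once (O(m)), then each membership
// check is O(1) on average, so the whole pass is O(m + n).
function missingChildrenIndexed(existingIds: string[], incomingIds: string[]): string[] {
  // Prototype-free object so IDs like "constructor" are not found by accident.
  const present: Record<string, true> = Object.create(null);
  for (const id of existingIds) {
    present[id] = true;
  }
  return incomingIds.filter((id) => !present[id]);
}

const existing = ["a", "b", "c"];
const incoming = ["b", "d", "a", "e"];
const viaScan = missingChildrenLinear(existing, incoming); // ["d", "e"]
const viaIndex = missingChildrenIndexed(existing, incoming); // ["d", "e"]
```

In a long-lived store the index would more likely be maintained alongside the children, so that each update pays only for the incoming nodes rather than rebuilding the lookup.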
oliver-sanders commented

The three optimisations up so far make a reasonable dent in the CPU time.

The time is going into two places:

  • Unpacking deltas into the data store.
  • Iterating the store, mounting and rendering components in the views.

The data store time is more concerning than the view time. Views can be optimised (e.g. the table view reduces the number of nodes on screen through pagination, and the tree view could use a virtual scroller in the future to similar effect), but the data store time will always remain, so the store should be the main target of optimisation.
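A virtual scroller keeps the number of mounted tree nodes roughly constant by rendering only the rows that intersect the viewport. This sketch shows the core windowing calculation under simplifying assumptions (fixed row height, flat list of rows); it is a generic illustration, not cylc-ui code, and `visibleRows`/`RowWindow` are hypothetical names.

```typescript
// Generic virtual-scrolling window calculation (illustrative sketch).
// Assumes every row has the same fixed height in pixels.
interface RowWindow {
  start: number;    // first row index to mount (inclusive)
  end: number;      // last row index to mount (exclusive)
  offsetPx: number; // vertical offset at which the mounted slice is drawn
}

function visibleRows(
  scrollTopPx: number,
  viewportPx: number,
  rowPx: number,
  totalRows: number,
  overscan = 5, // extra rows above/below to avoid blank gaps while scrolling
): RowWindow {
  // Rows that actually intersect the viewport.
  const firstVisible = Math.floor(scrollTopPx / rowPx);
  const endVisible = Math.ceil((scrollTopPx + viewportPx) / rowPx);
  // Pad with overscan rows and clamp to the list bounds.
  const start = Math.min(totalRows, Math.max(0, firstVisible - overscan));
  const end = Math.min(totalRows, endVisible + overscan);
  return { start, end, offsetPx: start * rowPx };
}

const slice = visibleRows(2400, 600, 24, 1111);
console.log(slice.start, slice.end, slice.end - slice.start, slice.offsetPx);
```

For example, with 1,111 rows of 24px, a 600px viewport scrolled to 2,400px and the default overscan of 5, only rows 95 to 129 (35 rows) are mounted, drawn at an offset of 2,280px, regardless of how many tasks the store holds.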
