core: cache some embedded DT info + generic bisect helper function #7067

etienne-lms · 2024-10-02T17:06:22Z

Parse the embedded DTB prior using it and cache information that can take time for libdft functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info. Cached information are:

parent node of a node (see fdt_parent_offset()).
possibly heavily used #address-cells and #size-cells properties of a parent node.
phandle of a node (see fdt_node_offset_by_phandle()).

The feature saves from few dozen to few hundreds of milliseconds of boot time depending on the DTB size (more specifically depending on the number of nodes in the DTB), for example, about 400ms boot time saved on STM32MP15/pager, and about 600ms saved on our STM32MP25 downstream config.

The patch series adds a generic bisect helper function and a test case in PTA test selftests. The test is proceed by CI on qemuv7 and qemuv8 paltform. These both platforms also embed a very small DTB in OP-TEE core so the DT cached info are also slightly tested (their embedded DTB only contains 7 nodes and 1 phandle).

Implement a generic array bisecting helper function, bisect_equal(). Change-Id: Ic0915b994fbed45fc3f3d7efd2b50991676166a1 Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

Add a bisect test sequence to OP-TEE selftest support in the test PTA. The test checks bisect operation for present and absent cells, on arrays from 0 to 65 cells. Change-Id: I281da151eb9f6d5191e9f325e26a84f7e0f9593f Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

Parse embedded DTB before drivers get information from and save information on parent nodes (parent node offset, address and size cells value) in order to speed up fdt_parent_offset() libfdt function that parses the DTB from it's root node to fetch information mainly. Also change fdt_reg_base_address(), fdt_reg_size() and fdt_fill_device_info() to leverage this support. The cached information is sorted by node offset value to allow an efficient bisect sequence to retrieved desired information. The cached resources are released once OP-TEE core initialization is completed. This feature is enabled upon configuration switch CFG_DT_CACHED_NODE_INFO. Change-Id: I32c084afb384f9aafad16a6f642d39d07ed55069 Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

Parse embedded DTB before drivers get information from and save information on nodes phandles if any in order to speed up fdt_node_offset_by_phandle() libfdt function that parses the DTB from its root node to find a node offset based on the node phandle. The cached information is sorted by phandle value to allow an efficient bisect sequence to retrieved desired information. The cached resources are released once OP-TEE core initialization is completed. This feature is added to CFG_DT_CACHED_NODE_INFO support Change-Id: Ibf9f463c3c3e01e0225555ca5386f6c3104e646e Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

Explicitly default enable CFG_DT_CACHED_NODE_INFO on plat-stm32mp1. Change-Id: I9b63c3dc5073b5d89d7509a86670a6bb130aed76 Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

Explicitly default enable CFG_DT_CACHED_NODE_INFO on plat-stm32mp2. Change-Id: I4cd1bf1da1a243f9e2d67bba635f9b52d6826b1a Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

etienne-lms · 2024-10-03T07:25:08Z

CI / Code style tet failure is due to false positive warnings on trace messages exceeding 80 char/line:

8deb8ab9d core: pta: test bisect implementation
WARNING: line length of 105 exceeds 80 columns
#98: FILE: core/pta/tests/misc.c:620:
+						LOG("-> Found unexpected target %d (size %zu, hide %zu)",

WARNING: line length of 102 exceeds 80 columns
#105: FILE: core/pta/tests/misc.c:627:
+						LOG("- Failed to find target %d (size %zu, hide %zu)",

WARNING: line length of 108 exceeds 80 columns
#110: FILE: core/pta/tests/misc.c:632:
+						LOG("- Wrongly found %d for target %d (size %zu, hide %zu)",

total: 0 errors, 3 warnings, 0 checks, 111 lines checked

jforissier

Parse the embedded DTB prior using it and cache information that can take time for libdft functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info.

Have you considered using a hash table? I have a feeling it would be even faster.

More comments below.

jforissier · 2024-10-03T10:05:18Z

core/include/kernel/dt.h

+ * @parent_offset: Output parent node offset upon success
+ * @return 0 on success and -1 on failure
+ *
+ * This function is supported when CFG_DT_CACHED_NODE_INFO is enabled.


I believe the cache feature should be more transparent. Ideally hook the cache into the FDT lib code. Or if we don't want to touch libfdt, make a wrapper. But the caller should not need to know if the cache is enabled or not.

I'm fine patch libfdt but I'm pretty confident such change would never hit libfdt mainstream.
The way fdt_reg_base_address() is implemented in our kernel/dt.h make that fully leverage adresse cell (to prevent a node property parsing) makes that this very function needs to be patched.
Same for fdt_reg_size() (to prevent 2 parsing of node properties) and fdt_fill_device_info() (2 parent node lookup + 2 node properties parsing).

That said, I can change this inline description to:

- * Find the offset of a parent node in the parent node cache + * Find the offset of a parent node in the parent node cache if any * (...) - * - * This function is supported when CFG_DT_CACHED_NODE_INFO is enabled.

I've removed the fixup commit that held your comment. The commit was noisy and buggy so I it replaced with another fixup commit that should work better.
Please feel free to use this comment thread if needed. I won't tag it 'Resolved' :)

jforissier · 2024-10-03T10:06:16Z

lib/libutils/isoc/include/bisect.h

+ * @cmp: Trilean comparision helper function applicable to @array
+ * @return: Pointer to the cell in @array matching @target, NULL if none found
+ */
+void *bisect_equal(const void *array, size_t n, size_t cell_size,


Strange name, how about bisect_find()?

I agree the name may sound strange.
I choose bisect_equal() since the function find a cell that matches the target in an equal case regarding cmp() callback.
I think there could be bisect_roundup(), bisect_rounddow() and bisect_nearest() for returning the closest matching cell.
That said, maybe bisect() straight could be applicable for the exact matching case.

etienne-lms

Parse the embedded DTB prior using it and cache information that can take time for libdft functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info.

Have you considered using a hash table? I have a feeling it would be even faster.

Why do you think a hash table would speed up the processed? We would still need to parse the hash table to find our target.

etienne-lms · 2024-10-03T11:47:03Z

core/include/kernel/dt.h

+ * @parent_offset: Output parent node offset upon success
+ * @return 0 on success and -1 on failure
+ *
+ * This function is supported when CFG_DT_CACHED_NODE_INFO is enabled.


I'm fine patch libfdt but I'm pretty confident such change would never hit libfdt mainstream.
The way fdt_reg_base_address() is implemented in our kernel/dt.h make that fully leverage adresse cell (to prevent a node property parsing) makes that this very function needs to be patched.
Same for fdt_reg_size() (to prevent 2 parsing of node properties) and fdt_fill_device_info() (2 parent node lookup + 2 node properties parsing).

That said, I can change this inline description to:

- * Find the offset of a parent node in the parent node cache + * Find the offset of a parent node in the parent node cache if any * (...) - * - * This function is supported when CFG_DT_CACHED_NODE_INFO is enabled.

etienne-lms · 2024-10-03T11:51:21Z

lib/libutils/isoc/include/bisect.h

+ * @cmp: Trilean comparision helper function applicable to @array
+ * @return: Pointer to the cell in @array matching @target, NULL if none found
+ */
+void *bisect_equal(const void *array, size_t n, size_t cell_size,


I agree the name may sound strange.
I choose bisect_equal() since the function find a cell that matches the target in an equal case regarding cmp() callback.
I think there could be bisect_roundup(), bisect_rounddow() and bisect_nearest() for returning the closest matching cell.
That said, maybe bisect() straight could be applicable for the exact matching case.

Fix build error when CFG_EMBED_DTB is disabled. Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

jforissier · 2024-10-03T12:30:02Z

Parse the embedded DTB prior using it and cache information that can take time for libdft functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info.

Have you considered using a hash table? I have a feeling it would be even faster.

Why do you think a hash table would speed up the processed? We would still need to parse the hash table to find our target.

Ah yes, this is about speeding up the first lookup... So I'm not so sure. We are comparing:

[Bisecting] Parse the DT, for each value, copy the value into an array; then sort the array and bisect to find a value.
[Hashing] Parse the DT, for each value, apply a hash function, copy the hash and the value into an array of linked lists (buckets), then to find a value apply the hash function and read from the appropriate linked list.

With many lookups I believe the hash wins (especially with many values and a large enough hash table to have few collisions), but with only one lookup it's not obvious.

etienne-lms · 2024-10-04T07:17:58Z

[Hashing] Parse the DT, for each value, apply a hash function, copy the hash and the value into an array of linked lists (buckets), then to find a value apply the hash function and read from the appropriate linked list.

Parsing a list to find a hash value seems less efficient (to me) than an array bisect operation. I must have missed something.

jforissier · 2024-10-04T10:07:34Z

[Hashing] Parse the DT, for each value, apply a hash function, copy the hash and the value into an array of linked lists (buckets), then to find a value apply the hash function and read from the appropriate linked list.

Parsing a list to find a hash value seems less efficient (to me) than an array bisect operation. I must have missed something.

Hopefully there is no collision and in this case you reach the value you are looking for immediately (that is, the linked lists have a single element). A linked list is one possible implementation to address collisions: https://en.wikipedia.org/wiki/Hash_table#Separate_chaining

jenswi-linaro

Comment for "libutils: add bisect helper functions", this looks very much like the standard function bsearch(). Can we import bsearch() from newlib instead?

etienne-lms · 2024-10-07T07:57:21Z

Thanks for the pointer. I was looking for a bisect open source implementation but failed to find a compliant one. I missed to look into newlib source tree. I'll remove my implementation in favor to this far more mature one.

jenswi-linaro

Comments for "core: dt: cache embedded DTB node parent information"

jenswi-linaro · 2024-10-07T07:50:53Z

core/kernel/dt.c

+
+	cuint = fdt_getprop(fdt, node_offset, "#address-cells", NULL);
+	if (cuint)
+		addr_cells = (int)fdt32_to_cpu(*cuint);


Why is the (int) cast needed?
If #address-cells isn't found, shouldn't the value used in the parent node be used instead?
Same below for #size-cells.

I see that libfdt does not go 1 parent-level up to get these information. It would make sense but I don't think this is how DT support is implemented.

jenswi-linaro · 2024-10-07T07:52:52Z

core/kernel/dt.c

+	parent_node_cache.alloced_count = 0;
+}
+
+static TEE_Result enlarge_parent_node_cache(void)


grow_parent_node_cache()?

jenswi-linaro · 2024-10-07T07:55:36Z

core/kernel/dt.c

+static struct {
+	struct parent_node_cache *array;
+	size_t count;
+	size_t alloced_count;


Is this field needed? It seems that count is always increased right after alloced_count has been increased.

I allocated struct by chunk of 64 instances, for some efficiency. Therefore I count the allocated and filled cells separately.

jenswi-linaro · 2024-10-07T08:02:26Z

core/kernel/dt.c

+ * @address_cells: Parent node #address-cells property value or 0
+ * @size_cells: Parent node #size-cells property value or 0
+ */
+struct parent_node_cache {


I find the parent part in the name confusing since the struct describes a node. How about struct cached_node?

The struct intended to relate to parent node information of a node, hence this name.
I'll fix that to use a common struct for both parent node info and node phandle info. I'll use cached_node.

jenswi-linaro · 2024-10-07T08:14:48Z

Please drop the Change-Id: tags.

etienne-lms · 2024-10-07T13:46:07Z

I've tested use of a hash table, using a simplistic hash algo (some XOR on offset/phandle value bytes). It gives the same boot perfs on the platforms i've tested. Only adds some extra heap consumption.

By the way I've also tested few other schemes and I found that on my platforms with at most several hundreds of nodes in the DT (tested worse case with 1 thousand), scanning a list or an array of the preloaded structures, without any bisect or hash indices optimization, give the overall same results. All in one, I wonder if bisect or hash indices are really useful here, seen the relatively small size of the database (hundreds of nodes in the DTB).

I think I should simplify this to the minimum even is less elegant than using hashed or sorted arrays.

jforissier · 2024-10-07T13:51:51Z

@etienne-lms that's interesting. If there is no clear benefit in performance I would also think the simplest algorithm is the best.

jenswi-linaro · 2024-10-07T16:40:51Z

So you mean that not caching is as fast as caching?

etienne-lms · 2024-10-07T19:16:42Z

No, I meant parsing an array or list of cached info is as fast as finding the cached info using bisect or or hash table.

jenswi-linaro · 2024-10-08T05:41:03Z

OK, because I suspect there's some overhead in collecting the list.

etienne-lms · 2024-10-09T10:07:03Z

Actually the overhead for bisect or hashing is very small. I'v removed these optims because they don't add much value (until a platform comes with a DTB with several thousands of node) and cost some more memory. We'll be able to integrate such optims if needed in the futur.

Since the changes needed in updating this series are quite big and revert commits, I preferred to create a new P-R instead of appending fixup commits. Please have a look at #7073. Consequently, I close this P-R.

etienne-lms added 6 commits October 2, 2024 15:23

libutils: add bisect helper functions

0e2d87a

Implement a generic array bisecting helper function, bisect_equal(). Change-Id: Ic0915b994fbed45fc3f3d7efd2b50991676166a1 Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

plat-stm32mp1: conf: default enable CFG_DT_CACHED_NODE_INFO

536b589

Explicitly default enable CFG_DT_CACHED_NODE_INFO on plat-stm32mp1. Change-Id: I9b63c3dc5073b5d89d7509a86670a6bb130aed76 Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

plat-stm32mp2: conf: default enable CFG_DT_CACHED_NODE_INFO

e524167

Explicitly default enable CFG_DT_CACHED_NODE_INFO on plat-stm32mp2. Change-Id: I4cd1bf1da1a243f9e2d67bba635f9b52d6826b1a Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

jforissier reviewed Oct 3, 2024

View reviewed changes

etienne-lms commented Oct 3, 2024

View reviewed changes

[review] core: dt: cache embedded DTB node ...

3aac7fb

Fix build error when CFG_EMBED_DTB is disabled. Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>

etienne-lms force-pushed the dt-cached-info branch from 6467b58 to 3aac7fb Compare October 3, 2024 12:26

jenswi-linaro reviewed Oct 7, 2024

View reviewed changes

etienne-lms mentioned this pull request Oct 9, 2024

core: cache some embedded DT info (take 2) #7073

Open

etienne-lms closed this Oct 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core: cache some embedded DT info + generic bisect helper function #7067

core: cache some embedded DT info + generic bisect helper function #7067

etienne-lms commented Oct 2, 2024

etienne-lms commented Oct 3, 2024

jforissier left a comment

jforissier Oct 3, 2024

etienne-lms Oct 3, 2024

etienne-lms Oct 3, 2024

jforissier Oct 3, 2024

etienne-lms Oct 3, 2024

etienne-lms left a comment

etienne-lms Oct 3, 2024

etienne-lms Oct 3, 2024

jforissier commented Oct 3, 2024

etienne-lms commented Oct 4, 2024

jforissier commented Oct 4, 2024

jenswi-linaro left a comment

etienne-lms commented Oct 7, 2024

jenswi-linaro left a comment

jenswi-linaro Oct 7, 2024

etienne-lms Oct 7, 2024

jenswi-linaro Oct 7, 2024

etienne-lms Oct 7, 2024

jenswi-linaro Oct 7, 2024

etienne-lms Oct 7, 2024

jenswi-linaro Oct 7, 2024

etienne-lms Oct 7, 2024

jenswi-linaro commented Oct 7, 2024

etienne-lms commented Oct 7, 2024

jforissier commented Oct 7, 2024

jenswi-linaro commented Oct 7, 2024

etienne-lms commented Oct 7, 2024

jenswi-linaro commented Oct 8, 2024

etienne-lms commented Oct 9, 2024

core: cache some embedded DT info + generic bisect helper function #7067

core: cache some embedded DT info + generic bisect helper function #7067

Conversation

etienne-lms commented Oct 2, 2024

etienne-lms commented Oct 3, 2024

jforissier left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

etienne-lms left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jforissier commented Oct 3, 2024

etienne-lms commented Oct 4, 2024

jforissier commented Oct 4, 2024

jenswi-linaro left a comment

Choose a reason for hiding this comment

etienne-lms commented Oct 7, 2024

jenswi-linaro left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jenswi-linaro commented Oct 7, 2024

etienne-lms commented Oct 7, 2024

jforissier commented Oct 7, 2024

jenswi-linaro commented Oct 7, 2024

etienne-lms commented Oct 7, 2024

jenswi-linaro commented Oct 8, 2024

etienne-lms commented Oct 9, 2024