-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: cache some embedded DT info + generic bisect helper function #7067
Conversation
Implement a generic array bisecting helper function, bisect_equal(). Change-Id: Ic0915b994fbed45fc3f3d7efd2b50991676166a1 Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>
Add a bisect test sequence to OP-TEE selftest support in the test PTA. The test checks bisect operation for present and absent cells, on arrays from 0 to 65 cells. Change-Id: I281da151eb9f6d5191e9f325e26a84f7e0f9593f Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>
Parse embedded DTB before drivers get information from and save information on parent nodes (parent node offset, address and size cells value) in order to speed up fdt_parent_offset() libfdt function that parses the DTB from it's root node to fetch information mainly. Also change fdt_reg_base_address(), fdt_reg_size() and fdt_fill_device_info() to leverage this support. The cached information is sorted by node offset value to allow an efficient bisect sequence to retrieved desired information. The cached resources are released once OP-TEE core initialization is completed. This feature is enabled upon configuration switch CFG_DT_CACHED_NODE_INFO. Change-Id: I32c084afb384f9aafad16a6f642d39d07ed55069 Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>
Parse embedded DTB before drivers get information from and save information on nodes phandles if any in order to speed up fdt_node_offset_by_phandle() libfdt function that parses the DTB from its root node to find a node offset based on the node phandle. The cached information is sorted by phandle value to allow an efficient bisect sequence to retrieved desired information. The cached resources are released once OP-TEE core initialization is completed. This feature is added to CFG_DT_CACHED_NODE_INFO support Change-Id: Ibf9f463c3c3e01e0225555ca5386f6c3104e646e Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>
Explicitly default enable CFG_DT_CACHED_NODE_INFO on plat-stm32mp1. Change-Id: I9b63c3dc5073b5d89d7509a86670a6bb130aed76 Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>
Explicitly default enable CFG_DT_CACHED_NODE_INFO on plat-stm32mp2. Change-Id: I4cd1bf1da1a243f9e2d67bba635f9b52d6826b1a Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>
CI / Code style tet failure is due to false positive warnings on trace messages exceeding 80 char/line:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parse the embedded DTB prior using it and cache information that can take time for libdft functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info.
Have you considered using a hash table? I have a feeling it would be even faster.
More comments below.
core/include/kernel/dt.h
Outdated
* @parent_offset: Output parent node offset upon success | ||
* @return 0 on success and -1 on failure | ||
* | ||
* This function is supported when CFG_DT_CACHED_NODE_INFO is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the cache feature should be more transparent. Ideally hook the cache into the FDT lib code. Or if we don't want to touch libfdt, make a wrapper. But the caller should not need to know if the cache is enabled or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine patch libfdt but I'm pretty confident such change would never hit libfdt mainstream.
The way fdt_reg_base_address()
is implemented in our kernel/dt.h make that fully leverage adresse cell (to prevent a node property parsing) makes that this very function needs to be patched.
Same for fdt_reg_size()
(to prevent 2 parsing of node properties) and fdt_fill_device_info()
(2 parent node lookup + 2 node properties parsing).
That said, I can change this inline description to:
- * Find the offset of a parent node in the parent node cache
+ * Find the offset of a parent node in the parent node cache if any
* (...)
- *
- * This function is supported when CFG_DT_CACHED_NODE_INFO is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the fixup commit that held your comment. The commit was noisy and buggy so I it replaced with another fixup commit that should work better.
Please feel free to use this comment thread if needed. I won't tag it 'Resolved' :)
* @cmp: Trilean comparision helper function applicable to @array | ||
* @return: Pointer to the cell in @array matching @target, NULL if none found | ||
*/ | ||
void *bisect_equal(const void *array, size_t n, size_t cell_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange name, how about bisect_find()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the name may sound strange.
I choose bisect_equal()
since the function find a cell that matches the target in an equal case regarding cmp()
callback.
I think there could be bisect_roundup()
, bisect_rounddow()
and bisect_nearest()
for returning the closest matching cell.
That said, maybe bisect()
straight could be applicable for the exact matching case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parse the embedded DTB prior using it and cache information that can take time for libdft functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info.
Have you considered using a hash table? I have a feeling it would be even faster.
Why do you think a hash table would speed up the processed? We would still need to parse the hash table to find our target.
core/include/kernel/dt.h
Outdated
* @parent_offset: Output parent node offset upon success | ||
* @return 0 on success and -1 on failure | ||
* | ||
* This function is supported when CFG_DT_CACHED_NODE_INFO is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine patch libfdt but I'm pretty confident such change would never hit libfdt mainstream.
The way fdt_reg_base_address()
is implemented in our kernel/dt.h make that fully leverage adresse cell (to prevent a node property parsing) makes that this very function needs to be patched.
Same for fdt_reg_size()
(to prevent 2 parsing of node properties) and fdt_fill_device_info()
(2 parent node lookup + 2 node properties parsing).
That said, I can change this inline description to:
- * Find the offset of a parent node in the parent node cache
+ * Find the offset of a parent node in the parent node cache if any
* (...)
- *
- * This function is supported when CFG_DT_CACHED_NODE_INFO is enabled.
* @cmp: Trilean comparision helper function applicable to @array | ||
* @return: Pointer to the cell in @array matching @target, NULL if none found | ||
*/ | ||
void *bisect_equal(const void *array, size_t n, size_t cell_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the name may sound strange.
I choose bisect_equal()
since the function find a cell that matches the target in an equal case regarding cmp()
callback.
I think there could be bisect_roundup()
, bisect_rounddow()
and bisect_nearest()
for returning the closest matching cell.
That said, maybe bisect()
straight could be applicable for the exact matching case.
Fix build error when CFG_EMBED_DTB is disabled. Signed-off-by: Etienne Carriere <etienne.carriere@foss.st.com>
6467b58
to
3aac7fb
Compare
Ah yes, this is about speeding up the first lookup... So I'm not so sure. We are comparing:
With many lookups I believe the hash wins (especially with many values and a large enough hash table to have few collisions), but with only one lookup it's not obvious. |
Parsing a list to find a hash value seems less efficient (to me) than an array bisect operation. I must have missed something. |
Hopefully there is no collision and in this case you reach the value you are looking for immediately (that is, the linked lists have a single element). A linked list is one possible implementation to address collisions: https://en.wikipedia.org/wiki/Hash_table#Separate_chaining |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment for "libutils: add bisect helper functions", this looks very much like the standard function bsearch(). Can we import bsearch()
from newlib instead?
Thanks for the pointer. I was looking for a bisect open source implementation but failed to find a compliant one. I missed to look into newlib source tree. I'll remove my implementation in favor to this far more mature one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments for "core: dt: cache embedded DTB node parent information"
|
||
cuint = fdt_getprop(fdt, node_offset, "#address-cells", NULL); | ||
if (cuint) | ||
addr_cells = (int)fdt32_to_cpu(*cuint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the (int)
cast needed?
If #address-cells
isn't found, shouldn't the value used in the parent node be used instead?
Same below for #size-cells
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that libfdt does not go 1 parent-level up to get these information. It would make sense but I don't think this is how DT support is implemented.
parent_node_cache.alloced_count = 0; | ||
} | ||
|
||
static TEE_Result enlarge_parent_node_cache(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grow_parent_node_cache()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
static struct { | ||
struct parent_node_cache *array; | ||
size_t count; | ||
size_t alloced_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this field needed? It seems that count
is always increased right after alloced_count
has been increased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I allocated struct by chunk of 64 instances, for some efficiency. Therefore I count the allocated and filled cells separately.
* @address_cells: Parent node #address-cells property value or 0 | ||
* @size_cells: Parent node #size-cells property value or 0 | ||
*/ | ||
struct parent_node_cache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the parent
part in the name confusing since the struct describes a node. How about struct cached_node
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The struct intended to relate to parent node information of a node, hence this name.
I'll fix that to use a common struct for both parent node info and node phandle info. I'll use cached_node
.
Please drop the |
I've tested use of a hash table, using a simplistic hash algo (some XOR on offset/phandle value bytes). It gives the same boot perfs on the platforms i've tested. Only adds some extra heap consumption. By the way I've also tested few other schemes and I found that on my platforms with at most several hundreds of nodes in the DT (tested worse case with 1 thousand), scanning a list or an array of the preloaded structures, without any bisect or hash indices optimization, give the overall same results. All in one, I wonder if bisect or hash indices are really useful here, seen the relatively small size of the database (hundreds of nodes in the DTB). I think I should simplify this to the minimum even is less elegant than using hashed or sorted arrays. |
@etienne-lms that's interesting. If there is no clear benefit in performance I would also think the simplest algorithm is the best. |
So you mean that not caching is as fast as caching? |
No, I meant parsing an array or list of cached info is as fast as finding the cached info using bisect or or hash table. |
OK, because I suspect there's some overhead in collecting the list. |
Actually the overhead for bisect or hashing is very small. I'v removed these optims because they don't add much value (until a platform comes with a DTB with several thousands of node) and cost some more memory. We'll be able to integrate such optims if needed in the futur. Since the changes needed in updating this series are quite big and revert commits, I preferred to create a new P-R instead of appending fixup commits. Please have a look at #7073. Consequently, I close this P-R. |
Parse the embedded DTB prior using it and cache information that can take time for libdft functions to find. Cached info is stored in sorted arrays so that we can bisect to find the target info. Cached information are:
fdt_parent_offset()
).#address-cells
and#size-cells
properties of a parent node.fdt_node_offset_by_phandle()
).The feature saves from few dozen to few hundreds of milliseconds of boot time depending on the DTB size (more specifically depending on the number of nodes in the DTB), for example, about 400ms boot time saved on STM32MP15/pager, and about 600ms saved on our STM32MP25 downstream config.
The patch series adds a generic bisect helper function and a test case in PTA test selftests. The test is proceed by CI on qemuv7 and qemuv8 paltform. These both platforms also embed a very small DTB in OP-TEE core so the DT cached info are also slightly tested (their embedded DTB only contains 7 nodes and 1 phandle).