Unnecessary re-compile in riak_kv_wm_object #30

Open
martinsumner opened this issue May 14, 2024 · 2 comments
Comments

@martinsumner

To process a PUT via riak_kv_wm_object, there are three regular expressions compiled:

Links are a deprecated feature, yet the expressions are still compiled even if the links are empty.

Index field splitting is required, but could be done more efficiently with string:split (although the output differs subtly in this case if the input is not a binary).

@martinsumner

The majority of the time is spent compiling the regex, not applying it.
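Since compilation dominates, one option is to compile the pattern once and reuse it. The following is only a minimal sketch of that idea, not what riak_kv_wm_object does: the module name, the function names and the persistent_term key are all hypothetical, and persistent_term needs OTP 21.3 or later.

```erlang
-module(index_split_cache).
-export([split_terms/1]).

%% Hypothetical persistent_term key; not a name used by riak_kv.
-define(RE_KEY, {?MODULE, index_split_re}).

%% Return the compiled pattern, compiling it only on first use.
%% Note: in an Erlang string literal, \s is the escape for a space,
%% so ",\s" is the two-character pattern comma + space.
compiled_re() ->
    case persistent_term:get(?RE_KEY, undefined) of
        undefined ->
            {ok, MP} = re:compile(",\s"),
            persistent_term:put(?RE_KEY, MP),
            MP;
        MP ->
            MP
    end.

%% Split comma/space separated index terms using the cached pattern.
split_terms(Terms) ->
    re:split(Terms, compiled_re(), [{return, binary}]).
```

With this in place, index_split_cache:split_terms(<<"a, b">>) returns [<<"a">>, <<"b">>], and only the first call pays the compile cost. Two processes racing on the first call may both compile, which is harmless here because they store the same value.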

string:split(Terms, ", ", all) is marginally quicker than re:split(Terms, ",\s", [{return, binary}]) if Terms is already a binary; otherwise an iolist_to_binary/1 call is required first, and this will be slower.

There may be subtle functional differences between string:split and re:split.
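One difference that is easy to demonstrate is the return type when the input is a list rather than a binary. The helper below is a hypothetical sketch (the module and function names are invented for illustration) that runs both splits on the same input so the results can be compared side by side.

```erlang
-module(split_compare).
-export([compare/1]).

%% Return {StringSplitResult, ReSplitResult} for the same input.
%% Both calls split on the literal separator comma + space.
compare(Terms) ->
    {string:split(Terms, ", ", all),
     re:split(Terms, ", ", [{return, binary}])}.
```

For a binary input the two agree: split_compare:compare(<<"a, b">>) gives {[<<"a">>, <<"b">>], [<<"a">>, <<"b">>]}. For a string input they do not: split_compare:compare("a, b") gives {["a", "b"], [<<"a">>, <<"b">>]}, because string:split preserves the input type while re:split with {return, binary} always returns binaries. There may be other differences this does not cover.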

@martinsumner

Overall, re:split may be better than string:split ...

lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(Term, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
13094
61> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(Term, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
13073
62> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(Term, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
13160
63> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermList, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
70358
64> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermList, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
72131
65> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermList, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
70717
66> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermListB, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
45961
67> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermListB, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
46929
68> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermListB, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
47076
69> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(Term, ", ", all) end)) end, lists:seq(1, 10000))).
8858
70> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(Term, ", ", all) end)) end, lists:seq(1, 10000))).
8020
71> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(Term, ", ", all) end)) end, lists:seq(1, 10000))).
8301
72> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(iolist_to_binary(TermList), ", ", all) end)) end, lists:seq(1, 10000))).
92594
73> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(iolist_to_binary(TermList), ", ", all) end)) end, lists:seq(1, 10000))).
82369
74> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(iolist_to_binary(TermList), ", ", all) end)) end, lists:seq(1, 10000))).
95715
75> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(TermListB, ", ", all) end)) end, lists:seq(1, 10000))).
71398
76> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(TermListB, ", ", all) end)) end, lists:seq(1, 10000))).
70223
77> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(TermListB, ", ", all) end)) end, lists:seq(1, 10000))).
70935

Term is a single IndexTerm. TermList is 10 comma/space-separated terms. TermListB is TermList converted to a binary (i.e. iolist_to_binary(TermList)). Each figure is the total time in microseconds over 10,000 calls, as reported by timer:tc.
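The shell one-liners above can be wrapped in a small helper to make the measurements easier to repeat. This is just a sketch with hypothetical module and function names; it sums per-call timer:tc results in the same way, so it inherits the same limitations (timer overhead on very short calls, no warm-up).

```erlang
-module(split_bench).
-export([total_micros/2]).

%% Sum of per-call timings, in microseconds, over N calls of Fun.
total_micros(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    lists:sum([element(1, timer:tc(Fun)) || _ <- lists:seq(1, N)]).
```

For example, split_bench:total_micros(fun() -> string:split(<<"a, b">>, ", ", all) end, 10000) returns a total comparable to the figures above, although the absolute numbers will depend on the machine and OTP version.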
