-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster pattern matching for built-in types #5711
Comments
One thing I would like about the second one is if data_to_constr converted the constr tag to a sum of product as well. Basically this |
Oh yeah, that's on the menu, I just didn't implement it, because it's a bunch of boilerplate and I'm not sure if we're going to pick this option rather than the other one, hence might potentially result in time waste. I'm going to speed up pattern matching for tuples too. Also we do remember about the request for the scrutinee to come last, I didn't reflect it here, because I didn't want to change anything mid-way. Once we've settled on an approach, we'll discuss the idea with the team and likely change the implementation accordingly. |
I had to think about this a bit. If we abstract over the matcher with a function we lose this property (since function arguments get forced). But maybe we recover it with inlining, plus we could push more applications into case branches. (Although this is tricky in UPLC since we need to know how many lambdas to go under. Boo. An argument I didn't think of for making the bindings for case variables explicit in the term...)
Is this visible in other benchmarks?
Right, this is just "pattern matching on Data" and not "pattern-matching on datatypes encoded using Data", which would also be convenient but we still don't know how to type it.
I don't know either. It's funny there is no performance difference, that was a potential decision point. Did we get any input from @mjaskelioff about whether the change to allow builtins to return an application spine would affect the formalism at all? |
Returning an application spine would affect the formalism, but I don't see it as being too problematic. The main complexity in the builtin formalisation is in consuming the arguments, rather than in the return type. From the point of view of specification (formal or not) it seems like a win for (2) as (2) is simpler to explain than (1), were we would need to introduce spines. |
But we're not inlining
Yup.
Yeah, that sucks.
None of us did.
Yes, but the benchmarking machine is unreliable, so no.
Well I'm sure there's some, it's just that it's not big enough for us to be able to detect it using broken tools. |
I don't have much to say, I find this (about option1) a bit disturbing:
And also this:
That sounds a bit like premature future-proofing that we may never need. I'm sure in the past I've been worried about the possibility of builtins returning more general terms for some reason, but I don't recall the details at the moment. |
About the possibility of buitlins returning applications: could we return a saturated application wrapped in a |
Outcome of our discussion:
|
Hang on, doesn't the SOP approach also violate our old assumption that the only types which appear in the signatures of builtins are either a) builtin types or b) type variables? Now we would be able to have an SOP type there. |
Yes. We would have to go for extending the notion of a) and saying that a SOP of builtin types is a builtin type. |
In any case, what this discussion shows is that these are not just any new builtins: both ideas require a change to the language. So the discussion needs to go deeper and cannot rely solely on implementations: we need to understand the consequences on the language design. What would be the specification of the new features? How would they affect the type system of TPLC? |
Yes, I had missed that returning SOPs was an extension! |
As per our Slack discussion that wouldn't work, because then types would suggest that putting SOPs into lists or pairs was possible, but that is not supported and we have no plans on supporting it, since it's a pain and we don't have a use case anyway.
Yes, I should've stated explicitly that I'm not simply proposing a few new builtins, but I thought it was obvious from
and
.
Good thing I sent an email about this issue to the Plutus mailing list! I think the type system consequences for (1) are negligible, we're just allowing builtins to apply functions, seems very straightforward and mirrors the functionality of regular functions. (2) is somewhat weird and it'd fantastic if type theory experts had a deep dive.
Yes, we'll need to add SOP to builtin signatures to the specification and the metatheory. In the implementation it's trivial and I did it in the PR. I think we should allow for SOP values to be arguments, because even though it's probably useless, it's not wrong and forbidding it would be more work than not doing that, at least in the implementation. I'll let the specification and the metatheory folks decide how they want it to be there. |
Okay, still it's good to be clear that there's an extension either way. If anything the SOP extension seems more dubious to me so I've tipped back in favour of the pattern-matching builtins 🙃 |
Note that as far as I remember from the specification/formalization point of view we also need to allow for |
This is the one place where I disagree. If we add SOP to improve (time and space) efficiency, and we make a change to builtins to improve efficiency, it is not so odd if these interact. |
I'm also still in favor of supporting SOPs especially to support the quoted text above in the future. An alternative I can think of is integer -> constr i []. But all these cases require SOP typing with builtins. So I'm leaning toward returning SOP with builtins. |
That's fair, I didn't consider this perspective, thank you.
This is also a good point. With the reservation that while we fully recognize how helpful such functionality would be, we still have no idea to how implement it in an even remotely sensible way. I'll create a separate issue about that and we'll keep thinking about it, but don't hold your breath.
This reasoning surprises me, because the type system part kinda directly follows from allowing SOPs to appear from within builtins. It's the operational semantics of this interaction that I'm being paranoid over, but if we accept those, I feel like the type system extension is negligible in comparison? It's literally just allowing more Plutus types to appear in signatures of builtins, to my taste we could allow all of them there (that's what the implementation amounts to, pretty much) and it'd be fine (I do agree with the view that having a specific type of builtin signatures that aren't hardcoded to be Plutus types is good design and that the implementation should introduce such machinery). |
This is a bit vague, but I feel like historically I have thought of our builtin types and functions as being imported wholesale from another universe of types and functions. i.e. there's another universe with just integers, etc., and that's where integer builtin functions come from, and we just embed them. For polymorphic builtins, they're just so polymorphic that they work with everything (hand waving!). But if we allow actual PLC types from the PLC type universe to appear in builtin signatures, that story is completely dead. Maybe that's not a problem at all, I'm unsure if this metaphor I want to use has any actual consequences! |
What is currently done in the metatheory for builtins is to define a
universe of *argument types*. Currently this universe consists of either
type variables of kind *, or builtin-compatible types (atomic constants,
type variables of kind #, and (pair or list) applied to builtin-compatible
types. We could extend this universe with SOPs, but we would need to define
precisely which SOP types we would allow to appear in an argument: are SOPs
only top-level (like *-kinded variables) or can they appear anywhere? (say
in a list of SOPs).
Note that the universe is called "argument types" because the return type
can be in a different universe. (Currently the result type is an argument
type).
Also, the universe is sort of independent of PLC types (the same argument
type is interpreted in different type systems)
So, speaking from the POV of the metatheory, I don't think the typing is
going to be a big problem. It's a matter of defining appropriate universes.
…On Fri, Jan 26, 2024 at 6:44 AM Michael Peyton Jones < ***@***.***> wrote:
It's the operational semantics of this interaction that I'm being paranoid
over, but if we accept those, I feel like the type system extension is
negligible in comparison?
This is a bit vague, but I feel like historically I have thought of our
builtin types and functions as being imported wholesale from another
universe of types and functions. i.e. there's another universe with just
integers, etc., and that's where integer builtin functions come from, and
we just embed them. For polymorphic builtins, they're just so polymorphic
that they work with everything (hand waving!). But if we allow actual PLC
types from the PLC type universe to appear in builtin signatures, that
story is completely dead. Maybe that's not a problem at all, I'm unsure if
this metaphor I want to use has any actual consequences!
—
Reply to this email directly, view it on GitHub
<#5711 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAHDDMXGGKXO3T2A5OKLOOLYQN3IHAVCNFSM6AAAAABBVSBKNCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJRG42TSMZUG4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
It's much larger compared to SOPs and in terms of the number of files affected. If you look into the actual diff and type
I agree. But I've now created an issue arguing that we need to be able to return applications from builtins to get faster processing of
It's not easily a Overall, I just don't think that the builtins machinery should do evaluator's job when we can avoid that. I think that the
Fair point. I can't quite determine what is it that I find unsettling about making builtins return SOPs. Maybe I'm just being silly.
I've now given the idea of using SOPs for that purpose a serious consideration, please checkout out this issue for a discussion on the builtin that you propose and this issue for a more general topic of getting faster processing of I may be very wrong, so if you think there's a flaw in my reasoning or I simply missed something, please do let me know in one of those issues. But so far I prefer the approach of returning iterated applications from builtins and that's what I'll end up implementing unless somebody finds a reason to prefer SOPs or the team votes in favor of doing SOPs even if we don't have an exact reason to.
The implementation supports all of PLC types already (or can be trivially made to), but to the best of my knowledge that is not used in the existing builtins indeed. Note however that allowing for a builtin to return iterated application is only useful if we pass functions to builtins and functions have
The implementation doesn't make any difference between |
Done by #6530. There's still work that needs to be done w.r.t. costing and the spec, but it's out of the scope of this issue. |
This issue is for discussing approaches to improving performance of pattern matching for built-in types.
The current approach is inefficient. Details can be found in this Note, the gist is that currently pattern matching over lists is implemented as (pseudocode)
where
chooseList
,head
andtail
are all built-in functions (chooseList
returns either its second or its third argument depending on whether its first argument is null or not, respectively). Therefore in case ofnil
we end up performing one builtin call and in case ofcons
we end up performing three builtin calls. Note how we also have to pass a unit argument around in order not to always evaluate all branches: Plutus is strict, hencechooseList xs z (f (head xs) (tail xs))
would have the wrong semantics.There have been two proposals on how to implement faster pattern matching for built-in types:
Both PRs are documented, see the initial comment on each PR if you want to understand the details of design and implementation.
Both approaches work and give us comparable speedup on built-in-list-specific benchmarks.
This is what we get for (1):
This is what we get for (2):
It may appear that (2) is faster, however the benchmarking machine apparently is capable (see this slack discussion) of being wrong by 6% and probably even more, hence we can't really rely on these results, but it's still clear that both the approaches give us meaningful improvement of about the same scale.
If we analyze performance analytically, this is what we'll find:
unit
arguments around in order for pattern matching not to force all the branches at once and only force the picked one, while (2), being backed by SOPs, has no such restriction sincecase
is specifically designed to be a proper pattern matching construct (unlike function application which is strict in Plutus). E.g. for (1) we have the following diff:while for (2) it's
I.e. (2) wins for this one.
matchList
is literally a single built-in function call, while in (2) it's a built-in function call + acase
:which affects not only performance, but also size, which is an even scarcer resource.
I.e. (1) wins for this one.
f x y
we simply return all of those terms separately and let the evaluator handle reducing the application -- we have to do that, because each evaluator deals with reducing applications differently), because one doesn't need to case on the result to figure out if it contains a single output or multiple of them.I.e. (2) wins for this one.
Maybe it's fine, but I believe (1) wins for this one.
I.e. (1) wins for this one.
Overall, there's no clear winner. Performance comparison is unclear with both approaches delivering meaningful improvement, although my very subjective feeling is that (2) slightly wins when it comes to performance. But also (1) makes builtins more expressive, maybe for some another reason we'll end up needing to return function applications from builtins eventually anyway?
I'm personally torn between the two options. Which one should we choose?
The text was updated successfully, but these errors were encountered: