Tracking issue: Additional checks, both semver and non-semver #5

Open
obi1kenobi opened this issue Jul 18, 2022 · 30 comments
Labels
A-lint Area: new or existing lint C-enhancement Category: raise the bar on expectations E-help-wanted Call for participation: Help is requested to fix this issue. E-mentor Call for participation: Mentorship is available for this issue.

Comments

@obi1kenobi
Owner

obi1kenobi commented Jul 18, 2022

This is a list of all not-yet-implemented checks that would be useful to have. Some of these require new schema and adapter implementations as well, tracked in #241.

Beyond semver violations, some changes are not breaking and don't even require a minor version, yet can still frustrate downstream crates when shipped without a minor or major version bump. Crates should be able to opt into warnings for such changes on an individual basis.

For example, based on this poll (with small sample size: ~40 respondents), ~40% of users expect that upgrading to a new patch version of a crate should not generate new lints or compiler warnings. The split between expecting a new minor version and a new major version was approximately 3-to-1.

Major version required

Minor version recommended

Project-defined whether major / minor / patch version required

This applies to changes that are technically breaking but that projects often treat as non-major, for example:

  • Raising the Minimum Supported Rust Version (MSRV) for the crate
  • Changing the size of a type
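To illustrate why a change in type size can count as breaking, here is a minimal sketch. `HeaderV1` and `HeaderV2` are hypothetical stand-ins for the same public type before and after a release; they are not taken from cargo-semver-checks.

```rust
use std::mem::size_of;

/// Hypothetical type as published in the old release.
pub struct HeaderV1 {
    id: u32,
}

/// The same hypothetical type after a private field was added.
/// Adding a private field next to an existing private field is
/// otherwise non-breaking, but it changes the size of the type.
pub struct HeaderV2 {
    id: u32,
    timestamp: u64,
}

/// Returns the sizes of both versions so they can be compared.
pub fn header_sizes() -> (usize, usize) {
    (size_of::<HeaderV1>(), size_of::<HeaderV2>())
}
```

Downstream code that depends on the size, for example through `std::mem::transmute` or a buffer declared as `[u8; size_of::<Header>()]`, could stop compiling after such a change, which is presumably why projects differ on how to version it.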

General opt-in warnings

Opt-in warnings for difficult-to-reverse changes

  • Removing #[non_exhaustive] from an item
    • Per semver, removing #[non_exhaustive] can be done in a patch release, but adding it back would then require a new major version.
  • Adding an enum variant in a #[non_exhaustive] enum
    • Per semver, adding variants to a non-exhaustive enum can be done in a patch release, but removing them again afterward would require a new major version.
  • Removing the last non-pub field in an exhaustive public struct
    • Structs that are not #[non_exhaustive] and have only public fields can be constructed with a struct literal. Removing the ability to construct a struct with a struct literal is a breaking change and requires a new major version.
  • Making an item importable in more than one way
  • Making a trait object-safe if it previously was not
    • Object safety then becomes part of the API contract, and breaking object safety is semver-major.
  • A 1-ZST (1-byte-aligned zero-sized-type) type no longer being a 1-ZST
  • Leaking or re-exporting another crate's type in one's own API
    • for example, having a function that returns a value of another crate's API
    • this can cause coupling to the other crate's version, and can be a pain
    • there are legitimate reasons to do this sometimes, but it should be an intentional decision and probably worth flagging in review
  • Making a type implement Send/Sync/Sized/Unpin or other auto traits when it previously didn't.
    • this is possible to do indirectly, e.g. by removing the last field that prevented the type from (auto-)implementing those traits
    • reverting this is a breaking change
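The auto-trait case in the last item can be sketched as follows. `CacheV1` and `CacheV2` are hypothetical versions of one type, and `Probe` / `NotSend` are helper names invented for this illustration; the probe relies on inherent associated constants taking priority over trait-provided ones, a trick commonly used on stable Rust.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

/// Hypothetical version 1: the private `Rc` field makes the type `!Send`.
pub struct CacheV1 {
    hits: u64,
    shared: Rc<()>,
}

/// Hypothetical version 2: the private field is gone, so the type
/// silently starts implementing `Send` without any signature change.
pub struct CacheV2 {
    hits: u64,
}

/// Helper for this sketch only: reports whether `T` implements `Send`.
struct Probe<T>(PhantomData<T>);

/// Fallback used when the inherent constant below does not apply.
trait NotSend {
    const IS_SEND: bool = false;
}
impl<T> NotSend for Probe<T> {}

/// The inherent constant wins whenever `T: Send` holds.
impl<T: Send> Probe<T> {
    const IS_SEND: bool = true;
}

pub const V1_IS_SEND: bool = <Probe<CacheV1>>::IS_SEND;
pub const V2_IS_SEND: bool = <Probe<CacheV2>>::IS_SEND;
```

Once downstream code moves a `CacheV2` across threads, re-adding a non-`Send` field would break that code, which is the difficult-to-reverse part.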

More checks to triage here

@obi1kenobi obi1kenobi pinned this issue Jul 21, 2022
@obi1kenobi obi1kenobi changed the title Tracking issue: Useful non-semver warnings Tracking issue: Additional checks, both semver and non-semver Jul 22, 2022
@epage epage added A-lint Area: new or existing lint E-help-wanted Call for participation: Help is requested to fix this issue. labels Aug 9, 2022
@obi1kenobi obi1kenobi added the E-mentor Call for participation: Mentorship is available for this issue. label Aug 10, 2022
@CAD97

CAD97 commented Aug 23, 2022

Own public types should be Debug:

That's already available in the compiler as #[warn(missing_debug_implementations)], isn't it?

@obi1kenobi
Owner Author

That's already available in the compiler as #[warn(missing_debug_implementations)], isn't it?

Oh, neat, TIL. It appears to be allowed by default and has to be enforced by manually enabling the check. In that case, perhaps the wish-listed query should be checking that #![deny(missing_debug_implementations)] is set instead.
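For reference, a small sketch of opting into the lint. The `api` module and `Settings` type are hypothetical; the lint is attached to a module here so the snippet stays self-contained, whereas a real crate would more likely use the crate-level `#![deny(missing_debug_implementations)]` form mentioned above.

```rust
// The lint is allow-by-default, so it has to be enabled explicitly.
#[warn(missing_debug_implementations)]
pub mod api {
    /// Public type that satisfies the lint by deriving `Debug`.
    #[derive(Debug)]
    pub struct Settings {
        pub verbose: bool,
    }

    // A public type without a `Debug` impl, such as
    //     pub struct Opaque;
    // is what the lint is meant to flag when exported from a library crate.
}
```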

@epage
Collaborator

epage commented Aug 23, 2022

This also gets into a conversation that I think we only had over Zulip, so it's good to summarize it here.

Especially if we want this in cargo some day, I think we should clearly define the scope.

cargo clippy is meant for linting an API as it exists

cargo semver-checks would be meant for linting changes in an API

  • missing_debug_implementations is an example of something that imo doesn't belong in cargo semver-checks
  • Linting that a lint is enabled is both getting a bit meta and again something that should be out of scope

Misc notes

  • Making it easier to add lints to clippy is a conversation with the clippy folks and they are interested in solving it
  • User-generated lints in either type of tool shipped with rustup would likely be marked as unstable initially. A path to stability depends on how comfortable people are with stabilizing the query language and the data model, which together form a large surface area
  • In the meantime, there could be room for a linter that handles user-defined lints.

@obi1kenobi
Owner Author

One possible way forward would be something like:

  • Extract the data model components (the Trustfall schema and adapter) into a library crate (essentially #67, "Allow usage as a library, not just as a binary").
  • Make cargo-semver-checks be just a set of semver queries + a binary that wraps that library crate to execute those queries.
  • Make one or more other tools for the other use cases: any queries that don't fit within the current cargo-semver-checks / clippy domains, custom user-specified queries, etc.

That way, we could easily experiment with querying for more things without bloating the scope of cargo-semver-checks and without making the integration into cargo messy.

I think extracting the data model into a library crate is pretty straightforward and I would be happy to do it if that's what we decide is the best path forward.

@aDotInTheVoid

aDotInTheVoid commented Aug 24, 2022

@obi1kenobi
Owner Author

Thanks, added to the list! If you'd like to try your hand at it, this lint is probably easier than pub fn changed return type since the actual check is less complex, and I'd be happy to mentor.

@oskgo

oskgo commented Aug 30, 2022

I think "trait added method" might be a bit more complicated.

The way I see it, adding default methods is fine, and even adding non-default methods is fine, so long as the trait cannot be implemented by an external user. This can be the case, for example, when using a private supertrait or blanket impls. Sealed traits in particular are a common pattern in Rust.
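A minimal sketch of the private-supertrait sealing pattern referred to here; `Shape`, `Square`, and the `sealed` module are hypothetical names chosen for illustration.

```rust
mod sealed {
    // Public trait inside a private module: other crates cannot name it,
    // so they cannot implement it.
    pub trait Sealed {}
}

/// Only types that implement `sealed::Sealed` can implement `Shape`,
/// which restricts implementations to this crate.
pub trait Shape: sealed::Sealed {
    fn area(&self) -> f64;
    /// Imagine this non-default method was added in a later release.
    /// No external implementer exists that could be broken by it.
    fn perimeter(&self) -> f64;
}

pub struct Square(pub f64);

impl sealed::Sealed for Square {}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }

    fn perimeter(&self) -> f64 {
        4.0 * self.0
    }
}
```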

@epage
Collaborator

epage commented Aug 30, 2022

even adding non-default methods is fine,

I believe trait added methods are a minor compatibility break. The standard library team is running into this problem with moving functions from the extension trait in itertools to Iterator which is causing them to write a new feature to support this due to the pervasiveness of itertools.

@obi1kenobi
Owner Author

obi1kenobi commented Aug 30, 2022

Non-defaulted items of any kind in a trait that is implementable outside its crate are semver-major, because any implementers must add the new items: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-item-no-default

Defaulted items in a trait are trickier. They are definitely at least minor, but could be major as well; some such circumstances are described in the semver reference which shows this as a "possibly-breaking" change: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-default-item

I believe trait added methods are a minor compatibility break. The standard library team is running into this problem with moving functions from the extension trait in itertools to Iterator which is causing them to write a new feature to support this due to the pervasiveness of itertools.

I believe this might be due to the introduced ambiguity between the built-in Iterator trait and its itertools analogue, which is captured in the breaking example of the possibly-breaking entry I linked above.
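The ambiguity can be sketched with stand-in traits. `Upstream` plays the role of a trait like `Iterator` and `Ext` the role of an extension trait like `Itertools`; all names are hypothetical, and the future method is left commented out so the snippet compiles.

```rust
pub struct Counter(pub u32);

/// Stand-in for an upstream trait such as `Iterator`.
pub trait Upstream {
    fn current(&self) -> u32;
    // Hypothetical defaulted method added in a later release:
    //     fn doubled(&self) -> u32 { self.current() * 2 }
}

/// Stand-in for a downstream extension trait such as `Itertools`.
pub trait Ext {
    fn doubled(&self) -> u32;
}

impl Upstream for Counter {
    fn current(&self) -> u32 {
        self.0
    }
}

impl Ext for Counter {
    fn doubled(&self) -> u32 {
        self.0 * 2
    }
}

/// Compiles today. With both traits in scope, it would be rejected as
/// ambiguous once `Upstream` also provides a method named `doubled`.
pub fn method_call(c: &Counter) -> u32 {
    c.doubled()
}

/// Fully qualified syntax names the trait and stays unambiguous.
pub fn qualified_call(c: &Counter) -> u32 {
    <Counter as Ext>::doubled(c)
}
```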

@jplatte

jplatte commented Aug 30, 2022

it's possible to go from e.g. taking &str to taking S: Into<String> without breaking

This is not true: changing an argument type from a concrete type to a generic will break calls like the_function(foo.into()), which only work for non-generic functions because the parameter type guides type inference. There are cases where changing a parameter type or a return type is non-breaking, though:

  • Removing trait bounds on parameter types, e.g. x: impl Foo + Bar to x: impl Foo
  • Adding trait bounds to an existential return type, e.g. -> impl Foo to -> impl Foo + Send
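The inference breakage described above can be reproduced with a small sketch. `Label`, `shout_v1`, and `shout_v2` are hypothetical names; the failing call is left as a comment so the snippet compiles.

```rust
/// Hypothetical downstream type that converts into `&str`.
pub struct Label(pub &'static str);

impl From<Label> for &'static str {
    fn from(label: Label) -> Self {
        label.0
    }
}

/// Old signature: the concrete parameter type guides inference at call sites.
pub fn shout_v1(text: &str) -> String {
    text.to_uppercase()
}

/// New signature: the parameter is generic, so the call site no longer
/// tells the compiler what `.into()` should produce.
pub fn shout_v2<S: Into<String>>(text: S) -> String {
    let owned: String = text.into();
    owned.to_uppercase()
}

pub fn caller_v1(label: Label) -> String {
    // `.into()` resolves to `&str` because `shout_v1` demands it.
    shout_v1(label.into())
}

pub fn caller_v2(label: Label) -> String {
    // `shout_v2(label.into())` would fail with "type annotations needed",
    // so the caller has to spell out the intermediate type.
    let text: &str = label.into();
    shout_v2(text)
}
```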

@CAD97

CAD97 commented Aug 30, 2022

I want to note that while Iterator/Itertools is a good example of the issue, it's a symptom of the wider semver-minor upgrade hazard of adding any new items.

This happens because of how name lookup works in Rust, since Rust allows arbitrary namespace mixins.

  • Adding an item to a trait is name resolution inference breaking, as it could conflict with another trait item where both traits are implemented for the same type.^[Preventable if the trait is sealed and implemented only for types you control.^[If implemented on upstream types, still potentially breaking, if upstream updates; generally we blame downstream if the inference breakage does not happen without downstream, even if it is triggered by updating upstream.]]
  • Adding a (public) item to a struct/enum is name resolution inference breaking, as it could conflict with a trait item implemented for the type, changing the name from referring to the trait associated item to referring to the struct/enum associated item.
  • Adding a (public) item to a module is name resolution inference breaking, as downstream could be glob importing your module's contents and another module's contents which defines the same name, causing the name to be ambiguous between your and the other module.
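As a sketch of the second case in the list above, the snippet below models one type before and after it gains an inherent method. `WidgetV1`, `WidgetV2`, and `Describe` are hypothetical; in a real scenario the trait would live downstream and the inherent method would be added upstream.

```rust
/// Stand-in for a trait that downstream code implements for the type.
pub trait Describe {
    fn describe(&self) -> &'static str;
}

/// The type before the upstream change: only the trait method exists.
pub struct WidgetV1;

/// The type after the upstream change: an inherent method with the same name.
pub struct WidgetV2;

impl Describe for WidgetV1 {
    fn describe(&self) -> &'static str {
        "trait method"
    }
}

impl Describe for WidgetV2 {
    fn describe(&self) -> &'static str {
        "trait method"
    }
}

impl WidgetV2 {
    /// Newly added inherent method; it takes priority in method resolution.
    pub fn describe(&self) -> &'static str {
        "inherent method"
    }
}

pub fn call_v1() -> &'static str {
    WidgetV1.describe()
}

pub fn call_v2() -> &'static str {
    // Same call syntax, but it now silently resolves to the inherent method.
    WidgetV2.describe()
}
```

Here nothing fails to compile; the call simply changes meaning, which is arguably harder to notice than a compile error.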

In other words, in a pedantic mode, semver-checks would be justified in requiring a minor bump for any new public item. Even weakening generic requirements might cause inference issues, so e.g. --strict-pedantic should probably require a minor bump for any change to the public API's types; IIUC this matches the intent of semver-minor's "new feature" trigger as well, since the API by construction has new API surface.

In practice, API evolution in this manner is necessarily considered perfectly acceptable, and is imho very rarely worth warning for. It comes down to a subjective evaluation of how likely it is both that a name conflict is possible and that some downstream crate has both names in scope simultaneously. In most cases this is reasonably rare because of the convention to avoid glob imports^[If you want a version of the lint which can fire without firing on every API change, consider linting only for new trait associated items reachable through a module called prelude, since that's likely designed for glob importing.], and there isn't really a good analytical way to determine the risk of a non-globbed name conflict that would give a lint cutoff better than yes/no.

Iterator is an especially interesting case because it's a language item trait in the prelude. User types don't have this exacerbating factor (being implicitly available everywhere) for this concern.

@CAD97

CAD97 commented Aug 30, 2022

Adding trait bounds to an existential return type, e.g. -> impl Foo to -> impl Foo + Send

Note that RPIT already "leaks" autotraits (Send/Sync/Unpin), so that isn't actually a return type refinement.

Actually refining the return type is not-inference-breaking, though you still run the risk of being name-resolution-breaking (e.g. refining to a concrete type or even adding a new guaranteed trait could cause a name conflict with newly applicable extension traits).
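A short sketch of the auto-trait leakage. `evens`, `require_send`, and `downstream` are hypothetical names; the point is that the signature never mentions `Send`, yet callers can rely on it.

```rust
/// The signature promises only `Iterator`, but the concrete hidden type
/// happens to be `Send`, and callers can observe that.
pub fn evens() -> impl Iterator<Item = u32> {
    (0..10).filter(|n| n % 2 == 0)
}

/// Helper for this sketch: accepts only `Send` values.
fn require_send<T: Send>(value: T) -> T {
    value
}

/// Downstream-style code that depends on the leaked `Send` impl.
pub fn downstream() -> u32 {
    require_send(evens()).sum()
}
```

If a later version of `evens` captured something like an `Rc` in the returned iterator, `downstream` would stop compiling even though the declared return type is unchanged.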

@epage
Collaborator

epage commented Aug 30, 2022

In practice, API evolution in this manner is necessarily considered perfectly acceptable, and is imho very rarely worth warning for

The fact that there is a lot of nuance to semver and some parts that are contextual is why I feel like #58 is going to be important.

@obi1kenobi
Owner Author

type no longer implements pub trait

How hard is this one to implement? I assume this covers things like removing the From impl from an error type? I'd like to try my hand at that one.

It does cover removing From from a type. Unfortunately, From is generic, and queries over generics are blocked on #241 as the hardest-to-design bit of schema in that issue. I wouldn't recommend it as a starting point, since it's likely to turn into yak shaving.

A better first issue would be something like #368 where the design is reasonably clear already.

In the meantime, I'm going to migrate the adapter implementation from Trustfall v0.2 to Trustfall v0.3 and take advantage of the massively improved ergonomics therein. If you'd like, I can loop you into that as well!

@thomaseizinger
Contributor

thomaseizinger commented May 8, 2023

Leaking or re-exporting another crate's type in one's own API

* for example, having a function that returns a value of another crate's API

* this can cause coupling to the other crate's version, and can be a pain

* there are legitimate reasons to do this sometimes, but it should be an intentional decision and probably worth flagging in review

Two thoughts regarding this:

  1. It would be great if we could somehow specify the intention that a certain item is meant to be hidden from the public API. For example, it is very easy and common to leak a dependency via a From impl for that dependency's Error type.
  2. A reasonable default for the above could be to lint against all dependencies that are < 1.0 and appear in the public API. For a crate that is itself < 1.0, this could be allowed by default but as soon as you bump to 1.0, it should be a warn. If a crate wants to stabilise their public API, they can then opt-in to that lint ahead of time.

See https://rust-lang.github.io/api-guidelines/necessities.html#public-dependencies-of-a-stable-crate-are-stable-c-stable.

@epage
Collaborator

epage commented May 9, 2023

It would be great if we could somehow specify the intention that a certain item is meant to be hidden from the public API. For example, it is very easy and common to leak a dependency via a From impl for that dependency's Error type.

Except there is no way to convey this intention to your users. If you implement a public trait on a public type, then that is a compatibility boundary. I avoid From for error types for this very reason.
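One way to avoid the `From` leak is a crate-private conversion, sketched below. The `dep` module is a stand-in for an external dependency, and `Error`, `from_dep`, and `port_from_config` are hypothetical names; this is an illustration of the idea, not a pattern prescribed in this thread.

```rust
/// Stand-in for an external dependency; in a real project this is another crate.
mod dep {
    #[derive(Debug)]
    pub struct ParseError(pub String);

    pub fn parse_port(input: &str) -> Result<u16, ParseError> {
        input
            .trim()
            .parse::<u16>()
            .map_err(|_| ParseError(format!("invalid port: {input}")))
    }
}

/// This crate's own public error type.
#[derive(Debug)]
pub struct Error {
    message: String,
}

// The leaky variant would be:
//     impl From<dep::ParseError> for Error { ... }
// A public trait impl on a public type is part of the API, so it would
// tie this crate's API to the dependency's error type.

impl Error {
    /// Crate-private conversion: usable internally, invisible downstream.
    pub(crate) fn from_dep(err: dep::ParseError) -> Self {
        Error { message: err.0 }
    }

    pub fn message(&self) -> &str {
        &self.message
    }
}

pub fn port_from_config(input: &str) -> Result<u16, Error> {
    dep::parse_port(input).map_err(Error::from_dep)
}
```

The trade-off is losing `?` conversion outside the crate, in exchange for keeping the dependency out of the public API.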

@thomaseizinger
Contributor

It would be great if we could somehow specify the intention that a certain item is meant to be hidden from the public API. For example, it is very easy and common to leak a dependency via a From impl for that dependency's Error type.

Except there is no way to convey this intention to your users. If you implement a public trait on a public type, then that is a compatibility boundary. I avoid From for error types for this very reason.

What I meant was, I want to specify to cargo semver-checks that I want crate XYZ not in my public API. If I make a mistake and still include it, it should generate a warning.

@epage
Collaborator

epage commented May 9, 2023

@Nemo157
Contributor

Nemo157 commented Nov 30, 2023

Would it be useful to have a list of known undetected breakages to test against too? RustCrypto/elliptic-curves#984 isn't detected currently and doesn't appear to match any of the checks in the list; it's something like "trait associated type added new required bound".

@obi1kenobi
Owner Author

Thanks! I updated the list to add that check together with the analogous one for removing bounds from an associated type.

Would it be useful to have a list of known undetected breakages to test against too?

Could you say more about this? I'm curious what form this list would take, and how it would be related to / different from the list in this issue.

@Nemo157
Contributor

Nemo157 commented Nov 30, 2023

Rather than being a list of checks, just a list of version pairs that have seen known ecosystem breakage, but pass all current checks. Maybe even something that can run in CI automatically to see if they start being detected.

@obi1kenobi
Owner Author

Sorry, I'm still having a bit of trouble understanding the exact suggestion, and who the target audience is / how they benefit.

Would this list of version pairs be something posted in this issue, or part of cargo-semver-checks itself in some way?

When you say "run in CI automatically," is that referring to cargo-semver-checks' own CI, or in the CI of users of cargo-semver-checks?

Sorry I'm having a hard time following here. If it's easier to "show, not tell" I'd be happy to look at a PR too.

@Skgland

Skgland commented Dec 18, 2023

Based on rust-lang/rfcs#3535 (comment) which I reproduced in https://github.com/Skgland/rust-semver-break.

It is currently possible in some cases to match non-exhaustive structs exhaustively, resulting in a breaking change if such a struct is changed to have more states (i.e. by adding a field with more than one value).

This is the case if the struct is StructuralPartialEq (constants of the type can be used as a pattern in match) and all possible values of the struct have an accessible constant.

This appears to be missing from this list, though I doubt that it is feasible to detect.
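A single-crate sketch of the hazard follows. `Mode` and `describe` are hypothetical names; since `#[non_exhaustive]` only restricts other crates, the `match` below stands in for code that a downstream crate could write against the published constants.

```rust
/// Hypothetical upstream type. `#[non_exhaustive]` signals that fields may be
/// added later, and deriving `PartialEq` and `Eq` lets constants of this type
/// be used as patterns.
#[non_exhaustive]
#[derive(Debug, PartialEq, Eq)]
pub struct Mode {
    pub enabled: bool,
}

impl Mode {
    // Every possible value of the struct has an accessible constant.
    pub const ON: Mode = Mode { enabled: true };
    pub const OFF: Mode = Mode { enabled: false };
}

/// No wildcard arm is needed: constant patterns take part in exhaustiveness
/// checking, and the two constants cover every value of `Mode`.
pub fn describe(mode: &Mode) -> &'static str {
    match *mode {
        Mode::ON => "on",
        Mode::OFF => "off",
    }
}
```

Adding a field such as `level: u8` to `Mode` would leave values that neither constant covers, so the `match` would no longer be exhaustive.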

@obi1kenobi
Owner Author

Wow, that's quite the semver hazard! Thanks for pointing it out.

My preference, as I mentioned in the linked issue, would be to either error or lint on this inside rustc or clippy, since a #[non_exhaustive] type having exhaustive semantics seems to me like an accidental language or compiler bug.

If that doesn't pan out, we can look into our options here and see if we can check for StructuralPartialEq in some way.

@parasyte

"Auto trait impls for impl Trait in return type" came up recently on URLO: "Implicit Unpin on impl Any, breaking change possible"

@obi1kenobi
Owner Author

I have figured out how to properly check if a trait is sealed or not, and I've opened #870 with a list of related lints that are now ready to be implemented!

This took 9 months to get right, and I'm excited it's finally there! 🙌
