Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should std::env::set_current_dir be unsafe? #37984

Closed
fweimer opened this issue Nov 24, 2016 · 26 comments
Closed

Should std::env::set_current_dir be unsafe? #37984

fweimer opened this issue Nov 24, 2016 · 26 comments
Labels
C-bug Category: This is a bug. I-needs-decision Issue: In need of a decision. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@fweimer
Copy link

fweimer commented Nov 24, 2016

The current directory is a per-process resource (at least on POSIX). Changing it could affect what shared objects are loaded by a library, and therefore, there might be an indirect impact on memory safety. Should it be marked unsafe?

@sfackler
Copy link
Member

Any API that loads libraries at runtime and pulls symbols from them is inherently unsafe for a multitude of reasons unrelated to set_current_directory.

@fweimer fweimer changed the title Should std::env::set_current_directory be unsafe? Should std::env::set_current_dir be unsafe? Nov 24, 2016
@fweimer
Copy link
Author

fweimer commented Nov 24, 2016

@sfackler, loading from the current directory is safe if you know what this directory is because you just called set_current_dir. The load path vulnerabilities on Windows and other systems would not apply as a result.

The reference to library loading was an attempt at a formal argument towards lack of memory safety for set_current_dir. Practically speaking, I think it is more important that when processing command line arguments (or files read from the current directory), you typically have a dependency on the current directory of the process. But currently, some other thread can easily call set_current_dir to another value temporarily (perhaps in preparation of starting another process), and there is currently no indication that this is a very bad thing to do.

@sfackler
Copy link
Member

You are always going to be asserting that some symbol is a function with a certain signature or a static of a certain type.

In any case, the convention is to place the unsafety at the closest layer to where the actual badness could result. It is safe to get a raw pointer from a slice, but not safe to dereference it, for example.

It is also unreasonable to make a function unsafe after it is already stable (and has been for over a year).

@retep998
Copy link
Member

retep998 commented Nov 24, 2016

@fweimer If you want to load a specific library, use an absolute path! I can't think of any time when you absolutely must use a relative path to the current directory. Get the current directory once when your process first starts up and then use absolute paths from then on.

@fweimer
Copy link
Author

fweimer commented Nov 25, 2016

@sfackler Is it really impossible already to correct past mistakes? Would it make sense to add a warning to the documentation? Or do you disagree that the issue exists at all?

@retep998
Copy link
Member

set_current_dir is not unsafe because it does not directly cause any memory unsafety. The only unsafety that can result is from other unsafe code assuming the current directory will not change, in which case it is that code's fault for assuming the current directory won't change, and not set_current_dir's fault.

The only way I could see set_current_dir needing unsafe is if it modifies the current directory without any sort of synchronization, but I'm pretty sure chdir is synchronized and I know SetCurrentDirectory definitely is.

@fweimer
Copy link
Author

fweimer commented Nov 25, 2016

@retep998 There is no common object to synchronize on, though. If a function receives a relative pathname (or gets it from the command line or the environment), it has to perform pathname resolution based on the current directory. If there is anything else in the process which temporarily alters the current directory for any purpose, this is not reliable and can give incorrect results. And because there is no obvious lock to take around the current directory change, it will always be risky to call this function.

The conclusions I and others (Rich Felker in particular, with his library-safe functions concepts) is that library code cannot call chdir, ever. I was therefore surprised to see it included in the Rust standard library, without further comment.

@retep998
Copy link
Member

@fweimer That's not memory unsafe. Rust has very specific rules about what is considered unsafe, and setting the current directory is not unsafe by those rules. It's a simple atomic operation to either get or set the current directory. As long as the safety of unsafe code never relies on the incorrect assumption that the current directory will never change, then everything is perfectly fine. If unsafe code ever does make that assumption, then it is that code's fault for doing so

As I said before, it is relatively simple to avoid depending on the current directory. Anything that can take a relative path can also take an absolute path, so just stick to absolute paths everywhere. You can get the current directory once when your program first starts up, cache the result, and then every time you get a relative path as input you can just make it absolute using that current directory you cached. That way if some other code changes the current directory you won't be affected.

@fweimer
Copy link
Author

fweimer commented Nov 25, 2016

@retep998 I was following this precedent:

static mut var: u32 = 0;

fn main() {
   var = 5;
}

This does not compile:

error[E0133]: use of mutable static requires unsafe function or block
 --> t.rs:4:4
  |
4 |    var = 5;
  |    ^^^ use of mutable static

This cannot cause memory safety violations on its own, either.

@hanna-kruppe
Copy link
Contributor

It can (at least, the general feature of mutating statics) -- accesses from multiple threads cause data races, which cause UB, which can in turn cause anything including memory safety violations.

@retep998
Copy link
Member

Meanwhile set_current_dir corresponds to chdir and SetCurrentDirectory and I know for sure that SetCurrentDirectory does not have any data races. Every time it gets or sets the current directory it uses an internal lock. chdir is likely the same.

@steveklabnik steveklabnik added A-libs T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Nov 25, 2016
@vadimcn
Copy link
Contributor

vadimcn commented Dec 2, 2016

A similar problem exists with std::env::set_var and, really, any function that changes the global state (including modification of files). We can't possibly make them all "unsafe".
But I agree, it would be prudent to add a warning in documentation for set_current_dir.

@fweimer
Copy link
Author

fweimer commented Dec 3, 2016

@vadimcn, regarding std::env::set_var, there is a related issue #24741, to which I added a comment.

I must say I find it rather strange that Rust considers writing to a u32 variable potentially unsafe and bans it, while it provides a function which can change the process environment (which is clearly unsafe, see issue #27970).

@vadimcn
Copy link
Contributor

vadimcn commented Dec 3, 2016

@fweimer, concurrent modification of an int is still considered a UB by C/C++ and LLVM.

@retep998
Copy link
Member

retep998 commented Dec 3, 2016

There's several ways to concurrently modify an integer. Unsynchronized which is totally undefined behavior, behind a mutex which is fine, and using atomics which is also fine.

For the current directory it's just a matter of figuring out which method it uses to do the modification. Since on Windows (and probably non-windows) it works with the current directory behind a mutex, it is therefore safe. If it did the modification without any synchronization, then that would be a huge issue, but it doesn't, so everything is fine.

@tbu-
Copy link
Contributor

tbu- commented Mar 7, 2017

@retep998

The WinAPI docs say it's unsafe to call from many threads. Is it actually locked as you describe?

Multithreaded applications and shared library code should not use the SetCurrentDirectory function and should avoid using relative path names. The current directory state written by the SetCurrentDirectory function is stored as a global variable in each process, therefore multithreaded applications cannot reliably use this value without possible data corruption from other threads that may also be reading or setting this value. This limitation also applies to the GetCurrentDirectory and GetFullPathName functions. The exception being when the application is guaranteed to be running in a single thread, for example parsing file names from the command line argument string in the main thread prior to creating any additional threads. Using relative path names in multithreaded applications or shared library code can yield unpredictable results and is not supported.
SetCurrentDirectory function (Windows)

@retep998
Copy link
Member

retep998 commented Mar 8, 2017

@tbu- The current directory is stored in the Process Environment Block (PEB) and every time the current directory is get or set (or any part of the PEB is accessed), it acquires the FastPebLock. I know this because I stepped into the code with a debugger to see what exactly it does. There is no memory unsafety involved in simply getting or setting the current directory. The danger is purely in unsafe code which assumes a given path won't suddenly point somewhere else.

@Mark-Simulacrum Mark-Simulacrum added C-bug Category: This is a bug. I-needs-decision Issue: In need of a decision. labels Jul 26, 2017
@Zoxc
Copy link
Contributor

Zoxc commented Nov 5, 2017

@retep998 Which Windows versions did you test on?

I'm not a fan of relying on undocumented behavior for memory safety. Microsoft could have trivially said that these functions were thread safe, but they explicitly documented that they were not. Anything can happen if SetCurrentDirectory, GetCurrentDirectory or relative paths is used in a process were multiple threads are present.

@retep998
Copy link
Member

retep998 commented Nov 5, 2017

@Zoxc FastPebLock was in the PEB at least as far back as Windows XP. Therefore I am confident that all Windows versions that Rust supports have accesses to the current directory properly synchronized. While you will not get a data race from messing with the current directory, you can get race conditions, where you access the same path twice and get completely different results because the current directory changed. That race condition is what the documentation is referring to, not actual memory unsafety.

@Zoxc
Copy link
Contributor

Zoxc commented Nov 5, 2017

The current directory state written by the SetCurrentDirectory function is stored as a global variable in each process, therefore multithreaded applications cannot reliably use this value without possible data corruption from other threads that may also be reading or setting this value.

This does sound like more than a race condition.

I don't expect Microsoft to remove that lock though. Ideally we'd get them to document it.

@tbu-
Copy link
Contributor

tbu- commented Sep 12, 2018

There are no data races in the implementation on Windows, and neither on the other platforms. Rust supports global mutable variables via atomics. I think this issue can be closed.

@KodrAus
Copy link
Contributor

KodrAus commented Nov 26, 2020

We discussed this in the recent Libs meeting and the answer to the question that preempts this one: could std::env::set_current_dir be unsafe? has to be: no. It's stable and we can't change it.

It's been a few years since we've had activity here so I'll go ahead and close this one, but if any new information comes up we can open new, targeted issues for them.

@KodrAus KodrAus closed this as completed Nov 26, 2020
@briansmith
Copy link
Contributor

@tbu- wrote in 2017:

The WinAPI docs say it's unsafe to call from many threads. Is it actually locked as you describe?

[...] The current directory state written by the SetCurrentDirectory function is stored as a global variable in each process, therefore multithreaded applications cannot reliably use this value without possible data corruption from other threads that may also be reading or setting this value. [...] Using relative path names in multithreaded applications or shared library code can yield unpredictable results and is not supported.
SetCurrentDirectory function (Windows)

As of today, the current version of the SetCurrentDirectory documentation at https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-setcurrentdirectory?redirectedfrom=MSDN says:

The current directory is shared by all threads of the process: If one thread changes the current directory, it affects all threads in the process. Multithreaded applications and shared library code should avoid calling the SetCurrentDirectory function due to the risk of affecting relative path calculations being performed by other threads. Conversely, multithreaded applications and shared library code should avoid using relative paths so that they are unaffected by changes to the current directory performed by other threads.

The references to corruption have been removed. This is consistent with the locking behavior observed in comments above. My guess is that the locking wasn't originally there, and was added later. So there might be some version of Windows that doesn't have the locking. However, it might be the case that those versions are so old that Rust doesn't support them. Importantly, what matters is not whether there is locking on a particular version of Windows (whichever one we did the debugging on) but that the locking is present in all versions of Windows supported by Rust. (One might infer from the change to the Microsoft docs that all the versions of Windows that are supported by Microsoft do have the locking.)

KodrAus wrote in 2020:

We discussed this in the recent Libs meeting and the answer to the question that preempts this one: could std::env::set_current_dir be unsafe? has to be: no. It's stable and we can't change it.

Currently we think SetCurrentDirectory is memory-safe and unless/until we find a version of Windows supported by Rust where SetCurrentDirectory isn't memory-safe (i.e. there is no locking), it's unlikely any changes to the standard library would be made. However, If we were to find a version of Windows where SetCurrentDirectory isn't memory-safe then I think it would be worth reconsidering makeing env::set_current_dir #[deprecated] at least on Windows, and maybe adding a unsafe fn env::set_current_dir_unchecked() variant, or doing something else.

I'm commenting on this here because env::set_env has a similar issue where it really does need to be deprecated because it really is not memory-safe; see #27970, and we shouldn't just conclude that our hands are tied and we're forced to do nothing, for the sake of backward compatibility.

@ChrisDenton
Copy link
Member

I understand you're making a broader point on libs policy but I wanted to address this specific issue. There has been no change in behaviour here for either supported or unsupported versions of Windows. As the commit that amended the docs says:

Existing text erroneously implied that the current directory is not thread safe.

The old documentation was overzealous in warning about the dangers of implicitly relying on the current directory to be unchanging in a multithreaded program. It was not meant to imply memory safety issues, hence the change in wording. In short, this was a documentation bug that has been fixed.

@briansmith
Copy link
Contributor

The old documentation was overzealous in warning about the dangers of implicitly relying on the current directory to be unchanging in a multithreaded program. It was not meant to imply memory safety issues, hence the change in wording. In short, this was a documentation bug that has been fixed.

Understood. One thing that isn't clear though, is how far back the MS team went to verify the change. I.e. is it their policy to consider versions that are no longer supported to have never existed (especially since, in practice, it is hard for any other policy to be a good use of time, as that's the whole purpose of dropping support)? I do feel some comfort in seeing that that commit was apparently authored by @oldnewthing.

@oldnewthing
Copy link

@briansmith The lock has been there since Win95 and NT 3.51. (My archives don't go back to NT 3.1 but I doubt the story in 3.1 is any different.)

You are correct that when an old version of Windows is dropped, we stop updating the documentation for that obsolete version. For example, InterlockedIncrement originally returned only the sign of the result, but since NT 4.0, it has returns the incremented value. We stopped documenting the old behavior since all applicable versions of Windows are out of support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-needs-decision Issue: In need of a decision. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests