Update Windows trampolines for compatibility with Python 3.7 and earlier #8649

Open

nahco314

Summary

uv-trampoline uses Python's zipimport feature in a slightly hacky way to provide exe files, but the current method causes errors in Python 3.7 and earlier.

Specifically, uv-trampoline stores the Python executable path and other information as an archive comment in the zip file. However, zipimport in Python 3.7 and earlier does not support archive comments (https://docs.python.org/3/library/zipimport.html#:~:text=Changed%20in%20version%203.8%3A%20Previously%2C%20ZIP%20archives%20with%20an%20archive%20comment%20were%20not%20supported).
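
To illustrate the failure mode, here is a minimal Rust sketch. It assumes that a reader without archive-comment support looks for the end-of-central-directory (EOCD) record in exactly the last 22 bytes of the file. The helper names are hypothetical and this is not the actual zipimport code.

```rust
/// Signature that opens a zip end-of-central-directory (EOCD) record.
const EOCD_SIGNATURE: [u8; 4] = *b"PK\x05\x06";
/// Size of an EOCD record that carries no archive comment.
const EOCD_MIN_SIZE: usize = 22;

/// Hypothetical helper: true if an EOCD record starts exactly 22 bytes
/// before the end of `data`, which is where a reader that ignores archive
/// comments is assumed to look for it.
fn eocd_at_fixed_offset(data: &[u8]) -> bool {
    data.len() >= EOCD_MIN_SIZE
        && data[data.len() - EOCD_MIN_SIZE..][..4] == EOCD_SIGNATURE
}

/// Hypothetical helper: build an empty zip archive (an EOCD record only)
/// followed by the given archive comment.
fn empty_zip_with_comment(comment: &[u8]) -> Vec<u8> {
    let mut zip = Vec::new();
    zip.extend_from_slice(&EOCD_SIGNATURE);
    // Disk numbers, entry counts, central directory size and offset: all zero.
    zip.extend_from_slice(&[0u8; 16]);
    // Comment length (u16, little-endian), then the comment itself.
    zip.extend_from_slice(&(comment.len() as u16).to_le_bytes());
    zip.extend_from_slice(comment);
    zip
}
```

In this sketch, data placed before the zip portion leaves the EOCD at the fixed offset, while any non-empty trailing comment moves it away from the end of the file.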

When I actually run the installed exe with Python 3.7 on Windows, I get the following puzzling error.

$ pycowsay
  File "C:\Users\nahco\.local\bin\pycowsay.exe", line 1
SyntaxError: Non-UTF-8 code starting with '\x83' in file C:\Users\nahco\.local\bin\pycowsay.exe on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

This PR adds support for Python 3.7 and earlier by moving the path to Python and the other metadata so that they sit before the zip portion, next to the exe file content.

Is this necessary?

Of course, Python 3.7 is EOL.
But it is still used in practice, and as long as it is installable with uv, it is worth supporting. The change needed is simple and should not add much to the maintenance cost.

Also, if this change is unacceptable, I think we should display an error indicating that the exe is not supported in Python 3.7 or earlier. In that case I am still willing to create a PR.

Test Plan

Simply install and run some tool using Python 3.7.
This is the test plan for now because I don't yet understand how tests are written in the uv codebase, but I will fix it if there are any problems.

@zanieb
Member

zanieb commented Oct 28, 2024

Thanks for contributing!

Unfortunately, I also worked on the trampoline today and this will conflict with #8637 – I presume that can be reconciled though.

I'm a +1 on this. We'll want to hear from @konstin as well.

cc @samypr100 if you're interested, I know you've done a fair bit of work on the trampolines.

@zanieb zanieb added the compatibility Compatibility with a specification or another tool label Oct 28, 2024
@zanieb zanieb requested a review from konstin October 28, 2024 22:19
@zanieb
Member

zanieb commented Oct 28, 2024

I am a bit confused since I thought the trampolines were written when 3.7 was not EOL — did we break support at some point?

@nahco314
Author

I'm not sure, but at the time the trampoline in posy was created, 3.7 must have been somewhat old, so it may simply have gone unnoticed.

Member

@konstin konstin left a comment

I'm -1 on supporting Python 3.7. Python 3.7 was already EOL when uv was released, and users should upgrade asap rather than patching the support into tools. Otherwise the code looks good.

print_last_error_and_exit("Failed to set the file pointer to the start of zip EOCD");
});

let read_bytes = file_handle.read(&mut eocd_buf).unwrap_or_else(|_| {
Member


// Size of the central directory (in bytes)
let cd_size = u32::from_le_bytes(eocd_buf[12..16].try_into().unwrap_or_else(|_| {
eprintln!("Slice length is not equal to 4 bytes");
Member

This can be unreachable!()
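
As a sketch of what that suggestion could look like: slicing a fixed 22-byte buffer at `12..16` always yields exactly 4 bytes, so the conversion cannot fail and the error branch can be marked with `unreachable!()`. The function name `central_directory_size` is hypothetical and not taken from the PR.

```rust
use std::convert::TryInto;

/// Read the central directory size (bytes 12..16, little-endian) from a
/// 22-byte EOCD buffer.
fn central_directory_size(eocd_buf: &[u8; 22]) -> u32 {
    let bytes: [u8; 4] = eocd_buf[12..16]
        .try_into()
        // The slice bounds are constants, so the length is always 4.
        .unwrap_or_else(|_| unreachable!("a 12..16 slice is always 4 bytes long"));
    u32::from_le_bytes(bytes)
}
```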

@T-256
Contributor

T-256 commented Oct 29, 2024

Previously raised issue: #2445.
For some of my work I'm stuck on Python 3.7. On Windows, I cannot run apps/scripts directly after activating the venv, nor via uv run.
There is a limited workaround: uv run python -m <MODULE> (only for packages whose script name matches the module name).

I can verify that uv otherwise works fully with Python 3.7 on Windows; the trampoline is the only remaining issue.

@samypr100
Collaborator

-1 for 3.7 support.
I'm not sure the added complexity is worth it now that 3.8 is also effectively EOL, since we'd be supporting two EOL versions. It also adds overhead for developers, who would have to test that 3.7 doesn't break the trampolines, to @notatallshaw's point in #7418 (comment).

@T-256
Contributor

T-256 commented Oct 29, 2024

That's correct, both 3.7 and 3.8 are EOL, but I think uv can still support them, as in:

/// Check if the request is for a version supported by uv.
///
/// If not, an `Err` is returned with an explanatory message.
pub(crate) fn check_supported(&self) -> Result<(), String> {

@zanieb
Member

zanieb commented Oct 29, 2024

One problem here is that I need to be able to sniff executables to tell if they're trampolines. I feel like that's trivial with the magic number at the end but kind of annoying here? If we're inverting the format I guess I'd expect

|       `launcher.exe`        |
| :-------------------------: |
| `<b'U', b'V', b'U', b'V'>`  |
| `<len(path to python.exe)>` |
|   `<path to python.exe>`    |
|  `<zipped python script>`   |

right?

@nahco314
Author

nahco314 commented Oct 29, 2024

It would be difficult to put uv's magic number at a trivial position, since the exe's own magic number must be at the beginning of the file and the zip EOCD at the end.
The inverted layout is also difficult because the length of the exe file is not trivially known. (It is known in advance for a given build, but that may not be very useful since it can change from version to version.)

I think it is possible to insert the magic number in the last 2 bytes of the EOCD (where the length of the comment is stored). Whether this will work is implementation-dependent, but a normal zip implementation should work fine.

$ ./.venv/Scripts/black.exe --version
black, 23.3.0 (compiled: yes)
Python (CPython) 3.7.9

$ od -c ./.venv/Scripts/black.exe | tail -n 3
0125700 005 006  \0  \0  \0  \0 001  \0 001  \0   9  \0  \0  \0   9 001
0125720  \0  \0   U   V
0125724
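
A static check along those lines might look like the following sketch. It assumes the trampoline ends with a 22-byte EOCD record whose comment-length field holds the bytes `UV`, as in the dump above. The name `looks_like_uv_script_trampoline` is hypothetical.

```rust
/// Signature that opens a zip end-of-central-directory (EOCD) record.
const EOCD_MAGIC: &[u8] = b"PK\x05\x06";
/// Marker stored in the two bytes that normally hold the comment length.
const UV_MARKER: &[u8] = b"UV";
/// Size of an EOCD record without a trailing comment.
const EOCD_LEN: usize = 22;

/// Hypothetical check: the file ends with an EOCD record whose last two
/// bytes are the `UV` marker.
fn looks_like_uv_script_trampoline(data: &[u8]) -> bool {
    if data.len() < EOCD_LEN {
        return false;
    }
    let eocd = &data[data.len() - EOCD_LEN..];
    eocd.starts_with(EOCD_MAGIC) && eocd.ends_with(UV_MARKER)
}
```

Requiring both the EOCD signature and the marker at fixed positions relative to the end of the file keeps the check static, with no need to scan the file.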

@zanieb
Member

zanieb commented Oct 29, 2024

Yeah sorry, I forgot that there's the entire trampoline executable itself at the front.

> (The length of the exe file is known in advance, but this may not be very useful since it can change from version to version.)

We could hard-code this, I guess. Or pad to a specific size.
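
The padding option could be sketched as follows: if the launcher stub is padded to a fixed size, anything written right after it sits at a known offset from the start of the file. `PADDED_LAUNCHER_SIZE`, `pad_launcher`, and `magic_after_launcher` are hypothetical, and the size is an arbitrary placeholder rather than a value from uv.

```rust
/// Hypothetical fixed size for the launcher stub; not a value taken from uv.
const PADDED_LAUNCHER_SIZE: usize = 64 * 1024;

/// Pad `launcher` with zero bytes up to `PADDED_LAUNCHER_SIZE`.
///
/// Returns `None` if the launcher is already larger than the fixed size.
fn pad_launcher(mut launcher: Vec<u8>) -> Option<Vec<u8>> {
    if launcher.len() > PADDED_LAUNCHER_SIZE {
        return None;
    }
    launcher.resize(PADDED_LAUNCHER_SIZE, 0);
    Some(launcher)
}

/// Return the 4 bytes stored immediately after the padded launcher, if any.
fn magic_after_launcher(file: &[u8]) -> Option<&[u8]> {
    file.get(PADDED_LAUNCHER_SIZE..PADDED_LAUNCHER_SIZE + 4)
}
```

The trade-off in this sketch is that the fixed size has to be chosen generously, since a launcher that outgrows it can no longer be padded.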

I basically need to be able to inspect the trampoline, as in

#[derive(Debug)]
pub struct Launcher {
    pub kind: LauncherKind,
    pub python_path: PathBuf,
}
impl Launcher {
    /// Read [`Launcher`] metadata from a trampoline executable file.
    ///
    /// Returns `Ok(None)` if the file is not a trampoline executable.
    /// Returns `Err` if the file looks like a trampoline executable but is formatted incorrectly.
    ///
    /// Expects the following metadata to be at the end of the file:
    ///
    /// ```text
    /// - file path (no greater than 32KB)
    /// - file path length (u32)
    /// - magic number(4 bytes)
    /// ```
    ///
    /// This should only be used on Windows, but should just return `Ok(None)` on other platforms.
    ///
    /// This is an implementation of [`uv-trampoline::bounce::read_trampoline_metadata`] that
    /// returns errors instead of panicking. Unlike the utility there, we don't assume that the
    /// file we are reading is a trampoline.
    #[allow(clippy::cast_possible_wrap)]
    pub fn try_from_path(path: &Path) -> Result<Option<Self>, Error> {
        let mut file = File::open(path)?;
        // Read the magic number
        let Some(kind) = LauncherKind::try_from_file(&mut file)? else {
            return Ok(None);
        };
        // Seek to the start of the path length.
        let Ok(_) = file.seek(io::SeekFrom::End(
            -((MAGIC_NUMBER_SIZE + PATH_LENGTH_SIZE) as i64),
        )) else {
            return Err(Error::InvalidLauncher(
                "Unable to seek to the start of the path length".to_string(),
            ));
        };
        // Read the path length
        let mut buffer = [0; PATH_LENGTH_SIZE];
        file.read_exact(&mut buffer)
            .map_err(|err| Error::InvalidLauncherRead("path length".to_string(), err))?;
        let path_length = {
            let raw_length = u32::from_le_bytes(buffer);
            if raw_length > MAX_PATH_LENGTH {
                return Err(Error::InvalidLauncher(format!(
                    "Only paths with a length up to 32KBs are supported but the Python executable path has a length of {raw_length}"
                )));
            }
            // SAFETY: Above we guarantee the length is less than 32KB
            raw_length as usize
        };
        // Seek to the start of the path
        let Ok(_) = file.seek(io::SeekFrom::End(
            -((MAGIC_NUMBER_SIZE + PATH_LENGTH_SIZE + path_length) as i64),
        )) else {
            return Err(Error::InvalidLauncher(
                "Unable to seek to the start of the path".to_string(),
            ));
        };
        // Read the path
        let mut buffer = vec![0u8; path_length];
        file.read_exact(&mut buffer)
            .map_err(|err| Error::InvalidLauncherRead("executable path".to_string(), err))?;
        let path = PathBuf::from(String::from_utf8(buffer).map_err(|_| {
            Error::InvalidLauncher("Python executable path was not valid UTF-8".to_string())
        })?);
        Ok(Some(Self {
            kind,
            python_path: path,
        }))
    }
}
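
For illustration, the same trailing layout can be parsed from an in-memory buffer using only the standard library. This is a simplified sketch and not uv's implementation: `read_trailing_metadata` is a hypothetical name, the constants mirror the ones used above but are redefined here, and the expected magic number is passed in by the caller.

```rust
const MAGIC_NUMBER_SIZE: usize = 4;
const PATH_LENGTH_SIZE: usize = 4;
const MAX_PATH_LENGTH: usize = 32 * 1024;

/// Parse `<path><path length: u32 LE><magic: 4 bytes>` from the end of `data`.
///
/// Returns `None` if the magic number does not match or the layout is invalid.
fn read_trailing_metadata(data: &[u8], magic: &[u8; MAGIC_NUMBER_SIZE]) -> Option<String> {
    // The magic number is the last four bytes of the file.
    if !data.ends_with(magic) {
        return None;
    }
    let rest = &data[..data.len() - MAGIC_NUMBER_SIZE];
    // The path length sits directly before the magic number.
    let split = rest.len().checked_sub(PATH_LENGTH_SIZE)?;
    let (rest, len) = rest.split_at(split);
    let path_length = u32::from_le_bytes([len[0], len[1], len[2], len[3]]) as usize;
    if path_length > MAX_PATH_LENGTH {
        return None;
    }
    // The path sits directly before its length and must be valid UTF-8.
    let start = rest.len().checked_sub(path_length)?;
    String::from_utf8(rest[start..].to_vec()).ok()
}
```

Unlike the function above, this sketch collapses every failure into `None` instead of distinguishing "not a trampoline" from "malformed trampoline".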

Notably with #8637 there isn't any zip payload at all in this variant, which perhaps simplifies things — we just won't be able to inspect the script variants.

@nahco314
Author

Anyway, I will read the changes in #8637.

@samypr100
Collaborator

samypr100 commented Oct 30, 2024

Generally speaking, I'd assume you'd always want the magic at the beginning or end (not in between) to avoid false matches when the same bytes happen to appear elsewhere in the file (e.g. a particular exe contains the magic midway), which can happen often depending on the magic.

@nahco314
Author

If we parse the EOCD and read the magic number from the middle of the file, the probability of false positives is as low as with the current method, or lower.
Perhaps the problem is that the implementation would be somewhat more complicated. If we want to avoid that, padding the exe file sounds like a good idea.

@nahco314
Author

One idea: In the case of UVSC, embedding "UV" in the comment length section of the EOCD and using the EOCD's magic number (PK\005\006) for detection allows for a robust and reliable method that only requires static reading.
For UVPY, simply placing "UVPY" at the end is sufficient.
