Move StateDict to Haliax, clean it up a lot. #90

Open: wants to merge 18 commits into main

Conversation

@dlwh (Member) commented Jun 1, 2024

No description provided.

@@ -0,0 +1,20 @@
# Serialization

@dlwh (Member Author):

(The main "draft-y" part of this PR is just the docs.)

*,
key,
use_bias=True,
out_first: bool = False,

@rjpower (Contributor):

Nit: this seems like it only affects the internals... for the purpose of simplifying the torch integration, would it be harmful to make out_first default to True (or remove/ignore it entirely and always output in torch-compatible mode)?

If it's a performance issue, maybe transform to the out_first=False mode during loading?

@dlwh (Member Author):

It's more that GPT2's linear was out_first=False and I modeled Haliax's linear on that. I need to go through and make the default True...
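
For readers skimming the thread, here is a minimal sketch of what the two layouts mean for an exported weight, assuming the usual torch.nn.Linear convention of (out_features, in_features); the key name is made up for the example.

```python
# Hypothetical illustration only, not Haliax's actual export code.
import jax.numpy as jnp

In, Out = 4, 3

# out_first=False: GPT-2 / Conv1D-style layout, weight stored as (In, Out).
w_in_first = jnp.ones((In, Out))

# out_first=True: torch.nn.Linear layout, weight stored as (Out, In).
w_out_first = w_in_first.T

# A torch-compatible state dict wants the (Out, In) layout, so a module built
# with out_first=False has to transpose on export (or, per the suggestion above,
# transpose back to its preferred layout at load time).
state_dict = {"linear.weight": w_out_first}
assert state_dict["linear.weight"].shape == (Out, In)
```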

@dlwh dlwh requested a review from rjpower June 10, 2024 04:46
@dlwh dlwh marked this pull request as ready for review June 10, 2024 04:46

@rjpower (Contributor) left a comment:

LGTM, just some nits & thoughts!

Similarly, to load weights from the state dict, you might need to implement `from_state_dict`. For the case of
`BackpackLMHeadModel`, we didn't because we can just not call the parent class's `from_state_dict` method.

@rjpower (Contributor):

> we didn't because we can just not call the parent class's

Suggested rewording: "For BackpackLMHeadModel, we can use the default implementation of from_state_dict."

(It's not obvious to me why you don't need from_state_dict here, given the remapping that happens on the update_state_dict side).

@dlwh (Member Author):

Oops, that's what I get for writing docs late at night.

The issue is that the BackpackLM implementation in Levanter enforces tied weights, but they're not required to be tied in HF. So we have to write the weight twice in the state dict, but we only bother to read it in once. Not ideal, but honestly backpacks are not a high priority and might get chopped.

It's also not the most ideal example, but it's about the only place left where we need update_state_dict.
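
A hedged sketch of the asymmetry described above: the tied weight is written under two keys on export, but only one copy is read back in. The key names and helper functions below are illustrative, not the actual Levanter/Haliax code.

```python
import jax.numpy as jnp

def update_state_dict(embedding_weight, state_dict, prefix=""):
    # HF checkpoints don't require tying, so the shared weight is written twice:
    # once for the embedding table and once for the LM head.
    state_dict[prefix + "word_embeddings.weight"] = embedding_weight
    state_dict[prefix + "lm_head.weight"] = embedding_weight
    return state_dict

def load_tied_weight(state_dict, prefix=""):
    # On the way back in, the module enforces tying, so reading one copy is
    # enough; hence no from_state_dict override is needed for this case.
    return state_dict[prefix + "word_embeddings.weight"]

sd = update_state_dict(jnp.zeros((10, 4)), {}, prefix="backpack.")
assert (load_tied_weight(sd, prefix="backpack.") == sd["backpack.lm_head.weight"]).all()
```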

return t


def _flatten_to_unflatten(t, state_dict, prefix):

@rjpower (Contributor):

Maybe a quick comment:

"""Flatten the torch compatiblestate_dict before loading into t, and then recover the unflattened layers."""

This applies [haliax.state_dict.from_state_dict][] followed by [haliax.state_dict.unflatten_linear_layers][].
"""
if unflatten_linear:
    t = _flatten_to_unflatten(t, state_dict, prefix)

@rjpower (Contributor):

You're probably doing this for a good reason, but it seems epsilon-cleaner to unflatten the state_dict and then restore it into t.

@dlwh (Member Author):

Because the modules themselves know what has to be flattened/unflattened (in terms of their specific member vars), I'd need to implement parallel logic to do it on the state dict side. I did think this solution was a bit too clever, but it works and it lets me be lazy.

@dlwh (Member Author):

(Unless I'm missing something!)

@rjpower (Contributor):

Ah, no, that makes sense: if the flattening is module-specific, then it does seem like kind of a pain to try to hoist something separate alongside. This is fine; it just stuck out to me.
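
To summarize the trick under discussion, here is a hedged sketch of what a helper like _flatten_to_unflatten could look like. It reuses the haliax.state_dict.from_state_dict and haliax.state_dict.unflatten_linear_layers names quoted in the docs above; flatten_linear_layers and the exact signatures are assumptions, so the real implementation may differ.

```python
# Sketch only; the signatures of the haliax.state_dict helpers are assumed.
from haliax.state_dict import flatten_linear_layers, from_state_dict, unflatten_linear_layers

def _flatten_to_unflatten(t, state_dict, prefix):
    """Load a torch-compatible (flattened) state_dict into t, then restore t's
    original unflattened linear layout."""
    # 1. Flatten t's Linear layers so its fields line up with the 2-D,
    #    torch-style entries in the state dict.
    flat_t = flatten_linear_layers(t)
    # 2. Let each module read its own entries; the modules know which of their
    #    fields were flattened, which is why the logic lives on the module side
    #    rather than in a parallel pass over the state dict.
    flat_t = from_state_dict(flat_t, state_dict, prefix)
    # 3. Recover the original named-axis layout, using t as the template.
    return unflatten_linear_layers(t, flat_t)
```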

(Three additional review threads on src/haliax/_src/state_dict.py were marked resolved; the underlying diff is outdated.)


def default_eqx_module_from_state_dict(mod: Mod, state_dict: StateDict, prefix: Optional[str] = None) -> Mod:
    key_map: Dict[str, Optional[str]] = getattr(mod, "_state_dict_key_map", lambda: {})()  # type: ignore

@rjpower (Contributor):

Nit: it feels a little magical to have this (e.g. "block": None) vs having the user override to_state_dict entirely and explicitly specifying the names they want. Just a thought, feel free to ignore.

@dlwh (Member Author):

Hrm, yeah. I guess the parallel in my head was that apply_prefix(None, suffix) == suffix, and the None here means the same thing.
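
For context, a hedged sketch of the key-map convention being discussed; the class and field names are invented for the example, not taken from the PR.

```python
import equinox as eqx

class Model(eqx.Module):
    # Field names here are made up for the example.
    transformer: eqx.Module
    embeddings: eqx.Module

    def _state_dict_key_map(self):
        # "embeddings" is renamed to "wte" in the torch-style checkpoint, while
        # mapping "transformer" to None means its name contributes no path
        # component at all: its children's keys appear one level up, mirroring
        # apply_prefix(None, suffix) == suffix.
        return {"embeddings": "wte", "transformer": None}
```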

(Two further review threads on src/haliax/_src/state_dict.py were marked resolved.)
Stack all keys matching prefix in a new state dict, returning a state dict that has all keys matching
prefix stacked, but otherwise the same.

Stacked in this case means roughly "compatible with a torch.nn.Sequential", which means that the

@rjpower (Contributor):

This comment is the same for stacked and unstacked below. Maybe:

"The unstacked format is compatible with torch.nn.Sequential, with keys of the form (...). The stacked/vectorized format is required for haliax.nn.Stacked and vectorizes all such tensors into a single shared key.".

@dlwh (Member Author):

Oops, thanks.
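
A hedged illustration of the two formats in the suggested wording; the key names and shapes are invented for the example.

```python
import jax.numpy as jnp

n_layers, hidden = 2, 4

# Unstacked (torch.nn.Sequential-style): one entry per layer, with the layer
# index embedded in the key.
unstacked = {
    "h.0.mlp.weight": jnp.zeros((hidden, hidden)),
    "h.1.mlp.weight": jnp.ones((hidden, hidden)),
}

# Stacked/vectorized (haliax.nn.Stacked-style): all layers share a single key,
# with a leading layer axis.
stacked = {
    "h.mlp.weight": jnp.stack([unstacked[f"h.{i}.mlp.weight"] for i in range(n_layers)]),
}

assert stacked["h.mlp.weight"].shape == (n_layers, hidden, hidden)
```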
