Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Support LLVM12 #503

Merged
merged 15 commits into from
Jul 26, 2021
Merged

[WIP] Support LLVM12 #503

merged 15 commits into from
Jul 26, 2021

Conversation

Jcd1230
Copy link
Contributor

@Jcd1230 Jcd1230 commented Jul 10, 2021

My attempt at getting Terra to build with LLVM12 on ArchLinux.
On my machine at least, this works and it seems all the tests run successfully most of the time. A few tests seem to be segfaulting once every few runs:

  • shallowfreeze.t
  • cconv.t
  • compile_time_array3.t
  • bug_372_perf.t

Not sure what the implications are of removing the setUseOrcMCJITReplacement line (and the OrcMCJITReplacement.h include), but it at least builds and runs!

Have not tested any other OS, I'm not very experienced with CMake so I would appreciate any help if someone wants to try and make my CMakeLists.txt changes work across multiple OSes and/or make it still be compatible with older LLVM versions (Not sure the best way to test this myself)

@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 10, 2021

Ah, I'll put in some #ifs so that this only affects builds with LLVM12

@Jcd1230 Jcd1230 changed the title Use LLVM12 (Linked dynamically) [WIP] Support LLVM12 Jul 10, 2021
@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 11, 2021

Think I got the conditionals right to let it build correctly on appveyor with the older LLVM versions, and it still builds for me with LLVM12.
Still haven't figured out how to get the static linking working; I cleaned up my cmake changes, I can't get it to build without linking clang-cpp, which does not seem to have a static version. Adding clang-cpp fixes the build, but I also have to link LLVM dynamically or else I get the same runtime error as this issue: #357. Since that references Arch as well, the static/dynamic linking build issues might be Arch only.

@elliottslaughter
Copy link
Member

For what it's worth, macOS with LLVM 12 built from source is giving me:

ld: library not found for -lclang-cpp
clang: error: linker command failed with exit code 1 (use -v to see invocation)

So I think we need a more sophisticated way to figure out when we need this. It may make sense to split out that aspect of the PR from the LLVM 12 parts.

@elliottslaughter
Copy link
Member

When I remove that, it builds, and ctest -j4 gives me:

99% tests passed, 4 tests failed out of 537

Total Test time (real) =   7.96 sec

The following tests FAILED:
	 20 - arrayt2.t (Subprocess aborted)
	 56 - cconv.t (Subprocess aborted)
	 73 - compile_time_array3.t (SEGFAULT)
	376 - objc2.t (Subprocess aborted)

So that looks pretty good. I haven't taken a look at the specific tests to figure out what's going wrong there yet.

@elliottslaughter
Copy link
Member

Please note: I pushed a rebase to pick up some CI upgrades that should hopefully fix the compiler issues we were seeing.

@elliottslaughter
Copy link
Member

I was digging a bit more into why this patch works, given that it seemingly removes the OrcMCJITReplacement that was enabling us to use ORCv1 (apparently since LLVM 5).

As best I can tell, ORCv1 really is gone, along with OrcMCJITReplacement [1]. This is confirmed by the ORCv1 to v2 transition docs (web archive).

Therefore, it seems that the removal of the call to setUseOrcMCJITReplacement is putting us back on... MCJIT itself. Yes, that's right: the JIT we used before the OrcMCJITReplacement for ORCv1 in the first place. Which is hilarious, given noise on the LLVM mailing list that MCJIT was supposed to be deprecated and scheduled for removal five years ago. I guess MCJIT outlived ORCv1.

Anyway, I guess it's fair game to use MCJIT since it's still around and LLVM has seen fit to maintain it for longer than its supposed replacement. But I think this does mean that at some point we need to go back to the ORC transition documents and take a harder look at what needs to be done. I don't think it'll be as easy as just adding or removing a function call.

I'm fine with this PR as-is, because I still think it's progress, but note that because we're technically reverting to an older JIT, I'm not sure what issues might pop up that we (and the LLVM team at large) have likely been ignoring since version 5. Someone will actually need to look at the failing tests and figure out what's going on there.

[1]: There are a couple of vestiges hanging around like the OrcMCJITReplacement.h and LLVMLinkInOrcMCJITReplacement function, but these don't actually lead anywhere.

@elliottslaughter
Copy link
Member

Just FYI, I reverted the CMake change since it's breaking in CI too (see here):

/usr/bin/ld: cannot find -lclang-cpp
collect2: error: ld returned 1 exit status

I think we're going to have to look into a separate flag, possibly TERRA_STATIC_LINK_CLANG in order to support this. But at this point it would be better as a separate PR.

@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 15, 2021

Well, I've been playing around with my LLVM12 version and I was running into some weird, sporadic segfaults.
It seems to be significantly correlated to when I was passing structs by value.
While I was debugging this, I found out that saveobj did not work ever when I was passing a struct by value.
I boiled it down to this test case, which fails with my build of Terra/LLVM12.

Summary: Seems to have a problem passing structs by value IF the struct size is >128 bits
(Other examples, such as 5 int32s or 3 pointers in the struct also cause the crash)

struct T {
	a: int8[17] --16 works but 17 fails. 128 bit maximum?
}

terra useStruct(t: T)
end

terra main()
	var example = T {}
	useStruct(example)
end

terralib.saveobj("a.out", "executable", {main=main})

Causes the output:

Attribute 'byval' type does not match parameter!
  call void @"$useStruct"(%1* byval(half) %1)
in function main
LLVM ERROR: Broken function found, compilation aborted!

I couldn't find any simple example of a runtime segfault when calling the function rather than compiling it, but I suspect the issue is related.

I tried the binary release and it passes the test just fine, so I guess this is probably an incompatibility with LLVM12

@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 16, 2021

To add on to my previous comment,
The type in byval() is not always half, it seems to just be like a random type, like something is uninitialized.
After quite a few tries I got

  • double
  • float
  • bfloat
  • ppc_fp128
  • x86_fp80
    And some other crazy stuff like this error:
Attribute 'byval' type does not match parameter!
  call void @"$useStruct"(%1* /usr/lib/libLLVM-12.so(_ZN4llvm11raw_ostream21flush_tied_then_writeEPKcm+0x7a) [0x7faedb2fe76a]
[0x55f68bc047]

So, definitely something wrong going on with passing large structs by value.

@elliottslaughter
Copy link
Member

Are you saying the code generated by Terra is nondeterministic? That would definitely be bad, regardless of what exactly that generated code is.

You might find it easier to verify with a .ll output, though I'm not sure LLVM will let you get that far if it validates the IR eagerly.

Fixes "Attribute 'byval' type does not match parameter!" errors
@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 18, 2021

Well! That was an adventure. I think I got it all figured out & fixed.
I eventually stumbled onto this [1] golang issue that ran into the same error message as me. I took a look into the commit [2] that fixed it, and it had to with replacing Attribute::ByVal... after a while of trying to copy that fix I found this LLVM doc page [3] with some helpful info about LLVM attributes, in particular:

Because attributes are no longer represented as a bit mask, you will need to convert any code which does treat them as a bit mask to use the new query methods on the Attribute class.
So, I had to replace Attribute::ByVal and Attribute:SRet with the corresponding methods (::getWithByValType/::getWithSRetType), and that seems to fix it!
Notably, the new way requires the types of those parameters to be passed in when the attribute is created - Hence why it seemed like the type was garbage/uninitialized before.

Heres a small test that I ran to check that the fix worked:

struct Big {
	a: int8[1000]
}
terra example(t: Big)
end
print(example:disas())

Terra Binary Release:

define dso_local void @"$example"(%0* byval) {
entry:
  ret void
}

This branch now:

define dso_local void @"$example"(%0* byval(%0) %0) {
entry:
  ret void
}

(Note the extra (%0) %0 after the parameter.. Not sure if that is a problem or if thats just how it should be now)

[1] golang/go#43870
[2] lumontec/gollvm@6b4233f
[3] https://llvm.org/docs/HowToUseAttributes.html

@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 18, 2021

Looks like this might have fixed the last few tests that were failing randomly too. Ran them all 5 times and no failures!

@elliottslaughter
Copy link
Member

I think something's missing an LLVM version bound.

@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 19, 2021

Just pushed it a bit hastily, wrong name for the attribute in the old LLVM. I might try to set up a docker container so I can test the older versions myself.

@elliottslaughter
Copy link
Member

Looks like LLVM 12 is unhappy again.

If you want to try out Docker, this Dockerfile may help you get started. Though you will have to bump it up to Ubuntu 21.04 in order to get LLVM 12.

https://github.com/terralang/terra/blob/master/docker/Dockerfile.ubuntu

(I certainly agree it's faster to iterate this way than to push everything through CI, though I will keep approving runs on my end.)

src/llvmheaders.h Outdated Show resolved Hide resolved
@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 21, 2021

In case it was not noted, the CMake build was fine, it was just the regular Makefile that was causing problems...
It seemed to be some combination of the Makefile missing libPolly from the LLVM libs, as well as the linker+unpacklibraries.lua script not doing a great job of including everything it needed.
At least for me, this fixes all the undefined references on Arch as well as an Ubuntu 18.04 docker container, and fixes the tests that were erroring with 'JIT not linked'.
Not sure if this is going to mess up the older versions, since my best guess is I think the linker output from (-Wl,)-t has changed (since older Ubuntu's). Also not sure if this is going to end up including more stuff than the CMake build does.

@elliottslaughter
Copy link
Member

I don't object to the Makefile changes in principle, but it does seem to have messed something up in the Ubuntu 16.04/LLVM 3.8 case (and possible others, but the CI got killed before it checked the rest). So if there's a way to harmonize that better, I'd appreciate it. But we use ar in the CMake code path (which we do test on LLVM 3.8), so hypothetically it should be possible to make it work....

@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 21, 2021

Well, first off looks like no pushd/popd apparently on sh the CI, I'll switch it to just cd

/bin/sh: 6: popd: not found
/bin/sh: 4: pushd: not found

More importantly, the linker output is indeed different, so I will have to add something to normalize it.
My version outputs just the library names, whereas the older versions output library/object:

(/usr/lib/llvm-3.8/lib/libLLVMSupport.a)Errno.cpp.o

(which is what unpacklibraries.lua was relying on.)
The new linkers can output that as well if you pass -t twice, however I was still running into issues doing it that way.

@elliottslaughter
Copy link
Member

In case it helps, I think there was an attempt to fix this in #459 at some point, but I don't remember if anyone made an effort to make this fully backwards compatible with the old versions.

Changes the way LLVM & Clang object files are detected & extracted in
the default Makefile to handle a wider variety of linker `-t` output formats.
@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 22, 2021

Hm... Not sure extracting all the static libs is necessarily a good idea. Looks like its failing because its never pulling in LLVMObjCARCOpts.a, which is on the linker command line, but the trace does not output it, so it does not get extracted into llvm_objects/
It works in my container with manually extracting libLLVMObjCARCOpts.a to that spot, but I'm wondering why it needs that in the first place since the trace does not output it. I guess since I made it extract and use whole static libraries, its ending up needing more dependencies since its pulling in extra files it doesn't need.

If you have any ideas let me know, otherwise I think I'm going to try reverting it back to the old method but with a special case for the new linker output.

@elliottslaughter
Copy link
Member

I think that's fine.

My general attitude about the Makefile is that we need to not break it, but it doesn't need to be pretty. It's formally deprecated, and while it won't be going away any time soon, all new feature work/polish/cleanup beyond the bare minimum is going into CMake. So if there's a reasonable but slightly ugly solution for Make that gets us past this, I'd just go for it.

@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 22, 2021

I decided to go the other route of just including all the LLVM libs, which let me clean up some of the craziness that was going on. I'm guessing this will probably increase the binary size, since its no longer filtering out the libs it does not need, and it makes it required to link to libffi and libedit, but it seems like this is closer to what the CMake build is doing, and is a lot simpler of a solution. Removed the REEXPORT_LLVM_COMPONENTS option as well, since it shouldn't be necessary either as LLVM is included in its entirety always now.

Let me know if you think thats too drastic, though to me it seems like a good way of supporting the Makefile builds easier going forward.

@Jcd1230
Copy link
Contributor Author

Jcd1230 commented Jul 22, 2021

Hm, looks like it needs libPolly. Oddly I didn't need that to build it with LLVM12. Polly is listed in the LLVMExports cmake file, not sure if theres a way to determine that with llvm-config. I'll be looking into it

@elliottslaughter
Copy link
Member

If it can be made to work I'm fine with this approach.

@elliottslaughter elliottslaughter merged commit 13b8b11 into terralang:master Jul 26, 2021
@elliottslaughter
Copy link
Member

Thanks so much for working on this and pushing it all the way through! I've gone ahead and merged the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants