get info registers #1

Open
wants to merge 4 commits into
base: master

i4ki
Contributor

@i4ki i4ki commented Nov 18, 2016

PoC to discuss several things:

  • language used
  • compiler
  • Code style
  • Technique (for getting register info)

and so on.

@geyslan

Signed-off-by: Tiago Natel de Moura <tiago.natel@neoway.com.br>
Signed-off-by: Tiago Natel de Moura <tiago.natel@neoway.com.br>
Signed-off-by: Tiago Natel de Moura <tiago.natel@neoway.com.br>
@i4ki i4ki mentioned this pull request Nov 21, 2016
@i4ki i4ki requested a review from geyslan January 6, 2017 21:28
@geyslan
Member

geyslan commented Jan 7, 2017

@tiago4orion, hi there! 😄

language used

For this kind of software, C, undoubtedly. We can try to favor C over assembly, but assembly will still be necessary.

compiler

Smaller C is an interesting choice. For now, it's ok.

Code style

Perhaps a mix of Linux kernel, Pike, Ritchie...

Technique (for getting register info)

As it stands, we're destroying the data in eax when we use it for the return value. But that can easily be circumvented by restoring it after dumping the registers into the structure.

If we use functions, I think they should be inlined, so that they don't mess with the stack.

Another way could be:

#include <stdio.h>

typedef struct Regs {
        int eax;
        int ebx;
} Regs;

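void read_regs(Regs *regs)
{
        /* GCC extension: bind each local to a named hardware register. */
        register int eax asm ("eax");
        register int ebx asm ("ebx");

        /* Copy the register contents into the caller's structure. */
        regs->eax = eax;
        regs->ebx = ebx;
}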

int main(void)
{
        Regs regs;

        read_regs(&regs);

        printf("eax=%08x\n", (unsigned int)regs.eax);
        printf("ebx=%08x\n", (unsigned int)regs.ebx);

        return 0;
}

I think that this line is misguided: https://github.com/c0defellas/exp/blob/xxx/xxx/regs.c#L68

Won't it return the address of the code that calls eip()? https://github.com/c0defellas/exp/blob/xxx/xxx/regs.c#L16

That's why I told you about inline functions. #defines perhaps are better in this case.
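
To illustrate the concern about eip(), here is a minimal sketch. It assumes GCC or Clang: `__builtin_return_address` and `__attribute__((noinline))` are compiler extensions, not C89, and `where_called_from` and `call_count` are hypothetical names, not code from this PR. A real (non-inlined) function can only report the address its call returns to, so the result depends on the call site rather than on any "original" eip:

```c
/* Sketch only: relies on GCC/Clang extensions, not on ANSI C. */

/* Counter with a visible side effect, so the calls cannot be merged. */
static volatile int call_count;

/*
 * A non-inlined function cannot observe the instruction pointer its
 * caller had before the call; the best it can do is report where it
 * will return to, i.e. the instruction right after the call.
 */
__attribute__((noinline))
static void *where_called_from(void)
{
        call_count++;
        return __builtin_return_address(0);
}
```

Calling `where_called_from()` from two different places yields two different addresses, which is the behavior being questioned above.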

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

For this kind of software, C, undoubtedly. We can try to favor C over assembly, but assembly will still be necessary.

Yeah, mixing C and assembly in the same file isn't a good idea. It was only the fastest way to get things done. The worst problem with this approach is that the C files will contain architecture-dependent code (32-bit x86 in this case), making the whole project non-portable.

Maybe the best option is to just glue the C and assembly files together at link time. But then, the next question: what would be a good project layout?

What do you think of:

λ> find .
./src
./src/port
./src/port/main.c
./src/386
./src/386/register.asm
./src/amd64
./src/amd64/register.asm
./Makefile
λ> 

Smaller C is an interesting choice. For now, it's ok.

SmallerC is good for a start, but not for long-term support. We can also support gcc or another compiler from day zero, and adjust CI to build with both.

As it stands, we're destroying the data in eax when we use it for the return value. But that can easily be circumvented by restoring it after dumping the registers into the structure.

We're not destroying eax, just setting the return value of the function to what we want. But maybe I did not understand what you said. We can also set the data structure directly, but isn't that much more complex and error-prone? Wouldn't changing the order of the fields in the struct silently break the code?
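
On the field-order worry, one option is to make the layout assumption explicit so that a reordering fails at compile time instead of silently. This is only a sketch in C89: `Regs32`, the `REGS_OFF_*` constants and `STATIC_CHECK` are hypothetical names, and it assumes a 4-byte `unsigned int`.

```c
#include <stddef.h>

/* Hypothetical register block that assembly would fill at fixed offsets. */
typedef struct Regs32 {
        unsigned int eax;
        unsigned int ebx;
        unsigned int ecx;
        unsigned int edx;
} Regs32;

/* Offsets the (hypothetical) assembly side is written against. */
#define REGS_OFF_EAX 0
#define REGS_OFF_EBX 4
#define REGS_OFF_ECX 8
#define REGS_OFF_EDX 12

/* C89-friendly compile-time check: a negative array size on failure. */
#define STATIC_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

STATIC_CHECK(check_off_eax, offsetof(Regs32, eax) == REGS_OFF_EAX);
STATIC_CHECK(check_off_ebx, offsetof(Regs32, ebx) == REGS_OFF_EBX);
STATIC_CHECK(check_off_ecx, offsetof(Regs32, ecx) == REGS_OFF_ECX);
STATIC_CHECK(check_off_edx, offsetof(Regs32, edx) == REGS_OFF_EDX);
```

If someone swaps two fields, the build breaks at the matching `STATIC_CHECK` line, which points directly at the offset that the assembly side would have gotten wrong.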

Another way could be:

Sorry, but I don't know these asm semantics. Is it something newer than C89? I'm sure it's not ANSI C.
I tested your snippet with gcc and it works, but it does not compile with -std=c89.

The use of the register keyword is also harmful. It's simply ignored by many compilers (including the Plan 9 compilers).

Regarding using inline to avoid corrupting the stack, the problem is that it's not ANSI C. It's hard to make it work properly across compilers. But it can be enabled on compilers that support it =)

I think that this line is misguided: https://github.com/c0defellas/exp/blob/xxx/xxx/regs.c#L68

Yeah, thanks for catching this. This is my default way of getting eip in shellcode, but I forgot that in this case it must be the original eip.

That's why I told you about inline functions. #defines perhaps are better in this case.

Maybe this should be addressed in another way. I think neither inlining functions nor using macros will work. Separating C and assembly will require a compiler-independent solution.

Thank you!

@geyslan
Member

geyslan commented Jan 7, 2017

Maybe the best option is to just glue the C and assembly files together at link time.

👍

What would be a good project layout?

Really nice. I liked it. port means portable, am I right?
A few more suggestions:

  • ./src/common
  • ./src/carch (common arch)
  • ./src/march (multiple arch)
  • ./src/aarch (all arch)

We're not destroying eax, just setting the return value of the function to what we want. But maybe I did not understand what you said.

We're relying on the calling convention, so nothing is guaranteed here for other archs and conventions.

i32 eax() {
        /* by calling convention */
}

To get the eax register we make two calls (newreg() and eax()), so we're possibly tainting the registers, depending on the arch. If this is only a dump triggered when something crashes, I think that tainting after retrieving the real value isn't a problem. But if we want to debug and put things back the way they were, it's better to rethink it.
Maybe the best approach is to save the registers onto the stack before retrieving them, without touching calling conventions.
x86 has instructions to push the general-purpose registers onto the stack. Their advantage is that they don't taint other registers either.
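
As a rough sketch of capturing the registers in place, without a call in between: the macro below assumes GCC or Clang extended asm (AT&T syntax) on x86, and every name in it (`saved_regs`, `SAVE_REGS`, `HAVE_SAVE_REGS`, `REG`) is hypothetical. It uses plain `mov` stores into a struct rather than pusha, because pusha is not available in 64-bit mode. On other architectures it just zeroes the struct.

```c
#include <string.h>

#if defined(__x86_64__)
typedef unsigned long long reg_t;
#define HAVE_SAVE_REGS 1
#define REG(n) "%%r" n          /* rax, rbx, ... */
#elif defined(__i386__)
typedef unsigned int reg_t;
#define HAVE_SAVE_REGS 1
#define REG(n) "%%e" n          /* eax, ebx, ... */
#else
typedef unsigned long reg_t;
#define HAVE_SAVE_REGS 0        /* no capture on this architecture */
#endif

struct saved_regs {
        reg_t ax, bx, cx, dx, si, di, bp, sp;
};

#if HAVE_SAVE_REGS
/*
 * Expands at the point of use, so no call/return happens between the
 * code being inspected and the stores. Only memory is written; no
 * register is modified by the asm itself.
 */
#define SAVE_REGS(r)                                    \
        __asm__ __volatile__(                           \
                "mov " REG("ax") ", %0\n\t"             \
                "mov " REG("bx") ", %1\n\t"             \
                "mov " REG("cx") ", %2\n\t"             \
                "mov " REG("dx") ", %3\n\t"             \
                "mov " REG("si") ", %4\n\t"             \
                "mov " REG("di") ", %5\n\t"             \
                "mov " REG("bp") ", %6\n\t"             \
                "mov " REG("sp") ", %7"                 \
                : "=m"((r).ax), "=m"((r).bx),           \
                  "=m"((r).cx), "=m"((r).dx),           \
                  "=m"((r).si), "=m"((r).di),           \
                  "=m"((r).bp), "=m"((r).sp))
#else
#define SAVE_REGS(r) memset(&(r), 0, sizeof(r))
#endif
```

Usage would be `struct saved_regs r; SAVE_REGS(r);`. The values are whatever the compiler happened to leave in the registers at that point, so this only shows the mechanism; it does not solve the question of restoring state afterwards.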

Sorry, but I don't know these asm semantics. Is it something newer than C89? I'm sure it's not ANSI C.

ISO/IEC 9899:TC3

  • J.5.10 The asm keyword
    The asm keyword may be used to insert assembly language directly into the translator output (6.8).
    The most common implementation is via a statement of the form: asm (character-string-literal);

The use of the register keyword is also harmful. It's simply ignored by many compilers (including the Plan 9 compilers).

Could you point me to some references?

Regarding using inline to avoid corrupting the stack, the problem is that it's not ANSI C. It's hard to make it work properly across compilers.

It's in C99. Is C89 a bit oldish, perhaps? Let's discuss the standard versions, pointing out their benefits and drawbacks.

It's hard to make it (inline) work properly across compilers. But it can be enabled on compilers that support it =)

Inline is complicated too. From wikipedia

... it serves as a compiler directive that suggests (but does not require) that the compiler substitute the body of the function inline by performing inline expansion...

Now I'm only seeing macros as a solution. What do you think about inline vs macro?

Yeah, thanks for catching this.

👍 I realized that that code is shellcode pimp style. 😄

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

J.5.10 The asm keyword
The asm keyword may be used to insert assembly language directly into the translator output (6.8).
The most common implementation is via a statement of the form: asm (character-string-literal);

By ANSI C, as the statement above says, the asm keyword only puts the assembly verbatim in-place.
But your function is:

void read_regs(Regs *regs)
{
        register int eax asm ("eax");
        register int ebx asm ("ebx");

        regs->eax = eax;
        regs->ebx = ebx;
}

eax isn't a valid ISA mnemonic, so the semantics of your function above aren't ANSI C; maybe C99.

To get the eax register we make two calls (newreg() and eax()), so we're possibly tainting the registers, depending on the arch. If this is only a dump triggered when something crashes, I think that tainting after retrieving the real value isn't a problem. But if we want to debug and put things back the way they were, it's better to rethink it.
Maybe the best approach is to save the registers onto the stack before retrieving them, without touching calling conventions.
x86 has instructions to push the general-purpose registers onto the stack. Their advantage is that they don't taint other registers either.

newreg is the constructor of object Regs, it was intended to initialize the struct, not to get registers every time. But I agree with you that relying on the calling convention is not a good idea.

OK, to align expectations I'll re-post the initial proposal for the software here, to help us define what the next PoC should be.

@geyslan What do you think of starting to code software like this:

$ ./xxx
> r32<ENTER>
eax=00000000    esp=0000ff00
ebx=00000000    ebp=00000000
ecx=00000000    esi=00000000
edx=00000000    edi=00000000
eip=00004dc0

> r16
ax=0000    sp=ff00
bx=0000    bp=0000
cx=0000    si=0000
dx=0000    di=0000
cs=1234    ds=0000
es=0000    fs=0000
gs=0000    

> rs
cs=1234    ds=0000
es=0000    fs=0000
gs=0000    

> mov ax, 1337h
> r ax
ax=1337

> load ./cat
Binary loaded at address 0xffff1000
> mov eax, $LOADADDR # 0xffff1000
> mov byte bl, [eax]
> mov byte bh, [eax+1]
> ;;; test ELF
> mov ebx, [eax+entry]
> mov eax, ebx
> jmp [eax]
> script
while (eip != main) {
    disas(eip)
    r32
}
> 

You are right about pusha/popa, we'll definitely need them, but I was thinking of using them to save/restore registers before/after every call into debugger code. But maybe the correct approach is to get the register data directly from memory. The debugger will have a parser for assembly syntax and keep the plain list of opcodes to execute in memory. The entire state of the debugger will have to be saved every time we switch between usercode <-> dbg code.
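
To make the "register data from memory" idea a bit more concrete, here is a small sketch in C89 of how the r32/r16 style output could be produced from a saved state block instead of from the live CPU. Everything here (`cpu_state`, `reg_view`, `format_reg`) is a hypothetical name for illustration, and only four registers are modeled.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Saved user-code registers, filled whenever we switch into dbg code. */
struct cpu_state {
        unsigned long eax, ebx, ecx, edx;
};

/* One printable name mapped onto a slot of the saved state. */
struct reg_view {
        const char *name;
        size_t offset;          /* offset of the backing 32-bit slot */
        int hex_digits;         /* 8 for r32 names, 4 for r16 names */
};

static const struct reg_view views[] = {
        { "eax", offsetof(struct cpu_state, eax), 8 },
        { "ebx", offsetof(struct cpu_state, ebx), 8 },
        { "ecx", offsetof(struct cpu_state, ecx), 8 },
        { "edx", offsetof(struct cpu_state, edx), 8 },
        { "ax",  offsetof(struct cpu_state, eax), 4 },
        { "bx",  offsetof(struct cpu_state, ebx), 4 },
        { "cx",  offsetof(struct cpu_state, ecx), 4 },
        { "dx",  offsetof(struct cpu_state, edx), 4 }
};

/* Writes "name=value" into buf (at least 16 bytes); -1 if unknown. */
static int format_reg(const struct cpu_state *st, const char *name,
                      char *buf)
{
        size_t i;

        for (i = 0; i < sizeof(views) / sizeof(views[0]); i++) {
                const struct reg_view *v = &views[i];
                unsigned long value;

                if (strcmp(v->name, name) != 0)
                        continue;
                value = *(const unsigned long *)
                        ((const char *)st + v->offset);
                /* 16-bit names are just the low half of the 32-bit slot. */
                value &= (v->hex_digits == 4) ? 0xffffUL : 0xffffffffUL;
                sprintf(buf, "%s=%0*lx", v->name, v->hex_digits, value);
                return 0;
        }
        return -1;
}
```

With this shape, commands like `r ax` never touch the real registers; they only read the saved block, so the only arch-dependent part left is the code that fills and restores it.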

I think we'll need to make a new PoC with a basic interpreter working to validate the ideas. You may well be right about adding a macro to save/restore registers before entering/exiting dbg managed code.
Another problem is that a bare-metal implementation will also require saving/restoring registers in each keyboard handling routine.

Thanks =)

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

The use of the register keyword is also harmful. It's simply ignored by many compilers (including the Plan 9 compilers).
Could you point me to some references?

Sorry, I forgot to answer that question. The register keyword is only a hint to the compiler, an optimization feature that can (and probably will) be ignored.
https://github.com/tiago4orion/Papers/blob/master/ansi_C/ansi.c.txt#L4160

The content of the variable will be stored in a register only if the compiler handles the register keyword and has a free register available. Why is it harmful? Because it can also destroy global variables stored in registers, if multiple register variables exist at the same time (globals and locals). The specification says nothing about multiple register variables at a time.

The storage location of a variable is architecture/compiler dependent. We cannot rely on that.

Ken Thompson's compilers just ignore register as a storage class for local variables, but handle extern register for globals. Extern register has great benefits for kernel programming, but is very unsafe if not carefully coded.
About Plan9 compilers, see: http://doc.cat-v.org/plan_9/4th_edition/papers/compiler

I'm not 100% sure, but using the register keyword could also have the drawback of disabling specific gcc code optimizations. The GCC manual says that after setting the storage class of a variable to register, that register stays reserved even when the variable isn't used anymore (dead code elimination does not free it).

@geyslan
Member

geyslan commented Jan 7, 2017

You should try __asm__ instead of asm.

void read_regs(Regs *regs)
{
        register int eax __asm__ ("eax");
        register int ebx __asm__ ("ebx");

        regs->eax = eax;
        regs->ebx = ebx;
}

gcc -std=c89 works nicely. However, I didn't find a reference to it in the standard. See http://wiki.osdev.org/Inline_Assembly#Using_C99.

OK, to align expectations I'll re-post the initial proposal for the software here, to help us define what the next PoC should be.

Got it. About parsing assembly directly, what do you think of a set of options for setting and loading registers and memory? It may be hard to write a parser or grammar for every architecture/ISA.

Instead of

mov ax, 1337h

we could run sr (set register)

sr ax 1337h

The parsing would be simpler. And the core functions to set a register wouldn't be complex, because they will bind to the correct assembly in the specified architecture tree.
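
Just to show how small the parsing side could be, here is a sketch in C89 for the `sr ax 1337h` form above. `sr_cmd` and `parse_sr` are hypothetical names; it only accepts a hex value with an `h` suffix, does not validate the register name, ignores overflow, and ignores anything after the value.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Result of parsing one "sr" (set register) command. */
struct sr_cmd {
        char reg[8];            /* register name as typed, e.g. "ax" */
        unsigned long value;    /* parsed immediate */
};

/* Parses "sr <reg> <hex>h"; returns 0 on success, -1 on bad input. */
static int parse_sr(const char *line, struct sr_cmd *out)
{
        char num[32];
        size_t len, i;

        if (strncmp(line, "sr ", 3) != 0)
                return -1;
        if (sscanf(line + 3, "%7s %31s", out->reg, num) != 2)
                return -1;

        /* The value must be hex digits followed by a trailing 'h'. */
        len = strlen(num);
        if (len < 2 || num[len - 1] != 'h')
                return -1;
        for (i = 0; i + 1 < len; i++) {
                if (!isxdigit((unsigned char)num[i]))
                        return -1;
        }
        num[len - 1] = '\0';
        out->value = strtoul(num, NULL, 16);
        return 0;
}
```

The arch-specific part would then be a function that takes the parsed `sr_cmd` and applies it to the saved register state, which keeps the parser itself identical across architectures.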

Another problem is that a bare-metal implementation will also require saving/restoring registers in each keyboard handling routine.

I think it will be necessary. We'll have to manage signals.

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

gcc -std=c89 works nicely. However, I didn't find a reference to it in the standard. See http://wiki.osdev.org/Inline_Assembly#Using_C99.

We need to stick with a standard or stick with a compiler implementation. I prefer the first option, and I have strong feelings about C99 ... it generates complex binaries only to provide syntactic sugar in source code (like for(int i = 0; i < x; i++) ... and other syntax constructs inherited from C++).

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

Got it. About parsing assembly directly, what do you think of a set of options for setting and loading registers and memory? It may be hard to write a parser or grammar for every architecture/ISA.

Interesting. I need to think more about it. This will need to generate arch-dependent code anyway, but it may simplify things.

@geyslan
Member

geyslan commented Jan 7, 2017

@rscampos @raphaelsc

Guys, just so you know about this discussion.

asm("__geteip: mov eax, [esp]\n"
" ret\n"
"call __geteip");
}


that's awesome.... i didn't know asm allowed to define function like that =)

Member


That inlined assembly works as a label definition. ;)

Contributor Author


That's the old school asm directive. C and asm were cool before gcc introduced the extended inline asm thing...

Contributor Author


GCC dislikes that because people unaware of the calling convention could mess things up trying to set local C variables from inside asm directives... To do that, you need to write directly to ebp+8, ebp+16, and so on (arch dependent), and this could change depending on compiler optimizations (and compiler flags). The order of the variables on the stack could also change when the compiler packs local variables.
So, when using the oldish asm directive, it's best to leave it alone inside the function (as I did). Never put C and asm together inside the same function...

@geyslan
Member

geyslan commented Jan 7, 2017

Interesting. I need to think more about it. This will need to generate arch-dependent code anyway, but it may simplify things.

Perhaps the syntax may be
sr ax=h1337
sr bx=b00101101
sr eax=(0xdeadc0de) using parens ( and ) for loading effective address
sm (0xdeadc0de)=h1337
sm (0xdeadc0de)=ebx
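
A possible reading of the value literals in that syntax, as a sketch in C89: `h` prefixes hex, `b` prefixes binary, and `0x` is also hex. `parse_literal` is a hypothetical name; register operands such as `ebx` and the parenthesized memory form are deliberately left out, and overflow is not checked.

```c
#include <ctype.h>

/*
 * Parses "h1337", "b00101101" or "0xdeadc0de" into *out.
 * Returns 0 on success, -1 on an unknown prefix or a bad digit.
 */
static int parse_literal(const char *s, unsigned long *out)
{
        unsigned long base;
        unsigned long value = 0;
        unsigned long digit;

        if (s[0] == 'h') {
                base = 16;
                s += 1;
        } else if (s[0] == 'b') {
                base = 2;
                s += 1;
        } else if (s[0] == '0' && s[1] == 'x') {
                base = 16;
                s += 2;
        } else {
                return -1;
        }

        if (*s == '\0')
                return -1;      /* prefix with no digits */

        for (; *s != '\0'; s++) {
                unsigned char c = (unsigned char)*s;

                if (isdigit(c))
                        digit = (unsigned long)(c - '0');
                else if (isxdigit(c))
                        digit = (unsigned long)(tolower(c) - 'a') + 10;
                else
                        return -1;
                if (digit >= base)
                        return -1;      /* e.g. '2' in a binary literal */
                value = value * base + digit;
        }
        *out = value;
        return 0;
}
```

Since the prefix decides the base before any digit is read, `b00101101` is always binary and never a hex number starting with the digit b, which keeps the grammar unambiguous.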

C99 ... it generates complex binaries only to provide syntactic sugar in source code (like for(int i = 0; i < x; i++) ... and other syntax constructs inherited from C++).

We could stick with C89 and build the internal hacks as we need them.

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

@geyslan Good =)

The bad point is that we cannot just paste assembly code in the debugger interpreter and get things executed on the fly =(

@geyslan
Member

geyslan commented Jan 7, 2017

@tiago4orion I see, it's an issue. I have thought that we could use commands or a generic assembly for all archs. However, I think the last idea (our own assembly) will demand more work.

@geyslan
Member

geyslan commented Jan 7, 2017

Who knows, maybe both of them? Noobs like me will be able to use the "easier" commands that translate themselves into actual assembly. I liked it. =)

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

@CoolerVoid Hey my friend. Happy to see you here =)
Black magic this wcc =) What I want here is something like the old MS debug command. Do you remember that?

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

Who knows, maybe both of them? Noobs like me will be able to use the "easier" commands that translate themselves into actual assembly. I liked it. =)

Your proposed syntax could be the architecture independent way of using the debugger =)

@i4ki
Contributor Author

i4ki commented Jan 7, 2017

@CoolerVoid I'll definitely watch this presentation. Thanks !

@geyslan
Member

geyslan commented Jan 10, 2017

@c0defellas

Just reading A New C Compiler from Ken, got this:

4.2. Calling convention
Rule: ‘‘It is a mistake to use the manufacturer’s special call instruction.’’ The relationship between the
(virtual) frame pointer and the stack pointer is known by the compiler. It is just extra work to mark this
known point with a real register. If the stack grows towards lower addresses, then there is no need for an argument pointer. It is also at a known offset from the stack pointer. If the convention is that the caller saves the registers, then the entry point saves no registers. There is therefore no advantage to a special call instruction.
On the National 32100 computer programs compiled with the simple ‘‘jsr’’ instruction would run in about half the time of programs compiled with the ‘‘call’’ instruction.

It's very nice. 👍

@geyslan
Member

geyslan commented Jan 11, 2017

@CoolerVoid I think the debug that @tiago4orion mentioned is this: https://www.youtube.com/watch?v=s1uOGjt0YJk.

@geyslan geyslan removed their request for review June 17, 2024 13:03

3 participants