Disclaimer: The following code sucks. Badly. Please don't consider this typical of what I write; it's a quick hack I threw together in the last hour or so using parts of earlier code written many years ago.
This is the start of the decompiler for Retro 9.2.10. It's based in part on a decompiler from 8.x, and has a few bits from earlier releases.
: h. ( n- ) base @ swap hex . base ! ;
Displays numbers in hexadecimal format, saving and restoring the base.
: lit 1 ;
' lit 1+ @ ' lit 1+ + cell+ constant dolit
This one provides a constant pointing to the internal routine that pushes numbers to the stack. It's definitely not portable, but it works. A bit of explanation is in order. Retro 9.x and earlier were coded in assembly for x86 processors. The compiled code is (generally) a series of call, jump, and return instructions. But the calls/jumps are relative, so we can't just fetch the target; we need to convert the stored offset into an absolute address.
Compiled code is basically:
call dolit
dd 1
ret
The call opcode is one byte on x86, so ' lit 1+ gets us to the call's operand (a 4-byte relative offset). We @ this, then add the address the offset was fetched from (' lit 1+ +), and finally step past the 4-byte operand itself (cell+), since the offset is relative to the end of the instruction. Kind of messy, but it works.
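The same arithmetic can be sketched outside of Retro. This is a minimal Python illustration, not part of the decompiler: the function name `call_target`, the `base` address, and the sample bytes are all made up for the example. It assumes a 32-bit little-endian x86 `call` (opcode `$e8`) followed by a signed 4-byte displacement.

```python
import struct

def call_target(memory: bytes, base: int, addr: int) -> int:
    """Resolve the absolute target of a `call rel32` located at `addr`.

    `memory` holds the bytes that start at absolute address `base`.
    Both are hypothetical stand-ins for the running Retro image.
    """
    offset_addr = addr + 1                    # skip the 1-byte opcode ($e8)
    (rel,) = struct.unpack_from("<i", memory, offset_addr - base)  # signed displacement
    return offset_addr + rel + 4              # displacement counts from the end of the operand

# Hypothetical layout of a word like `lit`: call <target>, dd 1, ret
code = bytes([0xE8, 0x10, 0x00, 0x00, 0x00,   # call +0x10
              0x01, 0x00, 0x00, 0x00,         # dd 1
              0xC3])                          # ret
target = call_target(code, 0x1000, 0x1000)
print(hex(target))  # 0x1015
```

The three terms in the return expression correspond to `' lit 1+ +` (the address of the operand), `@` (the fetched displacement), and `cell+` (the 4-byte adjustment).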
variable xt
variable d
variable flag
: scan ( - )
last repeat @ 0; dup :xt @ xt @ =if d ! -1 flag ! ;then again
;
: xt->d ( a-df )
xt ! 0 flag ! 0 d ! scan d @ flag @ ;
To resolve names, we have a word to get the dictionary entry and a success/failure flag from a base address. The use of variables here is due to laziness.
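The scan amounts to a walk down a linked list of dictionary entries, newest first, stopping at the first entry whose xt matches. Here is a rough Python sketch of that idea. The `Entry` class and its field names are assumptions inferred from the `:xt` and `:name` accessors used here, not Retro's actual header layout, and the sample addresses are invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Entry:
    """Hypothetical dictionary header: a name, a code address, a link to the older entry."""
    name: str
    xt: int
    link: Optional["Entry"]

def xt_to_entry(last: Optional[Entry], xt: int) -> Tuple[Optional[Entry], bool]:
    """Return (entry, True) for the newest entry whose xt matches, else (None, False)."""
    entry = last
    while entry is not None:      # a null link marks the end of the dictionary
        if entry.xt == xt:
            return entry, True
        entry = entry.link        # follow the link to the previous definition
    return None, False

# Invented three-word dictionary, oldest first
dup_ = Entry("dup", 0x2000, None)
swap_ = Entry("swap", 0x2010, dup_)
big_ = Entry("big", 0x2020, swap_)

entry, found = xt_to_entry(big_, 0x2010)
```

Returning the pair directly plays the role of the `d` and `flag` variables in the Retro version.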
variable addr
This holds the current address being decompiled.
: shift ( n- ) addr +! ;
Advance the addr variable by some arbitrary number of bytes. Each handler will invoke this to cover the byte range being decompiled.
: .name ( - )
1 shift here addr @ @ addr @ + cell+
dup dolit =if ." <>" drop cr ;then
dup xt->d
if :name unpack type cr ;then drop h. cr ;
An ugly way to display either a word name or the address being called/jumped to.
: tab 9 emit ;
: comment tab ." ( " later ." )" cr ;
Output formatting. Ideally tab would be more intelligent and try to keep columns lined up properly, but that's not crucial. The comment word uses later to wrap output in parentheses.
: call, ." call " .name 4 shift ;
: jmp, ." jmp " .name 4 shift ;
: ret, ." ret" cr 1 shift ;
: lodsd, ." lodsd" tab comment ." drop" 1 shift ;
: inc, ." inc eax" comment ." 1+" 1 shift ;
: dec, ." dec eax" comment ." 1-" 1 shift ;
: stc, ." stc" comment ." false" 1 shift ;
: clc, ." clc" comment ." true" 1 shift ;
: push, ." push eax" 1 shift comment ." >r" ;
: mov, ." mov [esi],eax" 2 shift cr ;
: pop, ." pop eax" 1 shift comment ." r>" ;
: or, ." or eax, eax" 2 shift comment ." or" ;
: swap, 1+ ." xchg eax, [esi]" comment ." swap" 2 shift ;
: nip, 2 + ." add esi, 4" comment ." nip" 3 shift ;
: .name 1 shift here addr @ @ addr @ + cell+ h. ;
: jz, ." jz " .name ." <>if" 1 shift cr ;
: jng, ." jng " .name ." <if" 1 shift cr ;
: jnz, ." jnz " .name ." =if" 1 shift cr ;
: jnl, ." jnl " .name ." >if" 1 shift cr ;
: cmp, ." cmp eax, [esi]" 1+ comment ." (if)" 2 shift ;
: nop, ." nop" comment ." then" 1 shift ;
: add, ." add eax,[esi]" comment 2 shift ;
: sub, ." sub [esi],eax" comment 2 shift ;
: mul, ." mul dword [esi]" comment 2 shift ;
: show_it dup c@ >r dup 1+ r type r> dup 1+ shift + ;
: string, ." string" tab comment show_it ;
A bunch of words to handle various sequences used. This was all ripped (with minor changes) from 8.x.
: lookup ( n-n )
addr @ h. tab
dup $e8 =if call, ;then
dup $e9 =if jmp, ;then
dup $c3 =if ret, ;then
dup $ad =if lodsd, ;then
dup $40 =if inc, ;then
dup $48 =if dec, ;then
dup $f9 =if stc, ;then
dup $f8 =if clc, ;then
dup $50 =if push, ;then
dup $89 =if mov, ;then
dup $58 =if pop, ;then
dup $09 =if or, ;then
dup $87 =if swap, ;then
dup $83 =if nip, ;then
dup $74 =if jz, ;then
dup $7e =if jng, ;then
dup $75 =if jnz, ;then
dup $7d =if jnl, ;then
dup $3b =if cmp, ;then
dup $90 =if nop, ;then
dup $03 =if add, ;then
dup $29 =if sub, ;then
dup $f7 =if mul, ;then
dup $eb =if string, ;then
." Unknown: " dup h. 1 shift cr ;
I really hate this part. This is a huge if/then block to call the handlers for each starting byte. It'd be better to use a jump table (which was done in 8.x), but this was quicker to implement for testing purposes.
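For comparison, a table-driven dispatcher might look roughly like the following. This is a Python sketch of the general idea only, not the 8.x jump table. The names `TABLE` and `decode` are invented, and only a handful of the single-byte opcodes from the handlers above are included.

```python
# opcode -> (assembly text, Forth equivalent or "", instruction length in bytes)
# Entries are copied from the single-byte handlers above; the table is illustrative, not complete.
TABLE = {
    0xC3: ("ret", "", 1),
    0xAD: ("lodsd", "drop", 1),
    0x40: ("inc eax", "1+", 1),
    0x48: ("dec eax", "1-", 1),
    0xF9: ("stc", "false", 1),
    0xF8: ("clc", "true", 1),
    0x90: ("nop", "then", 1),
}

def decode(code: bytes, count: int) -> list:
    """Decode up to `count` instructions, one output line per instruction."""
    lines = []
    pos = 0
    for _ in range(count):
        if pos >= len(code):
            break
        op = code[pos]
        entry = TABLE.get(op)            # a single lookup replaces the if/then chain
        if entry is None:
            lines.append(f"{pos:x}\tUnknown: {op:x}")
            pos += 1                     # skip one byte, as the fallback case does
            continue
        asm, forth, length = entry
        line = f"{pos:x}\t{asm}"
        if forth:
            line += f"\t( {forth} )"
        lines.append(line)
        pos += length
    return lines

listing = decode(bytes([0x40, 0xFF, 0xC3]), 3)
```

Adding an opcode then means adding one table row rather than another branch. Handlers that need to read operands (call, jmp, string) would need a function in the table instead of a fixed tuple.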
: see ( n"- ) ' addr ! for addr @ c@ lookup drop next ;
A wrapper word to invoke the decompiler. Pass it the number of instructions to decompile, and follow it with the name of a word:
: big 1 2 + . ;
5 see big
Ok, so that's that. It's at least partially functional, but still has lots of work needed.