Saturday, April 24, 2010

Simplifying Things

Now that the VM code has been condensed and simplified, I'm turning my focus to the image. No new features, but I want a simple build process, with things bound a bit more tightly.

The work in progress can be found at http://drop.io/retroforth

So far I have a working VM, a single file for building a new base image, and a makefile to automate the process. I've moved the files, sockets, and canvas words into this, so the core now includes all the basics needed. I'll probably move 'words' here as well.

Overall it seems to be working out nicely, so I'll keep going.

Wednesday, April 21, 2010

Shrinking the VM

I'm working on a simplified C implementation of Ngaro; no major changes, but I'm hoping for something a bit easier to use as a base for future experiments. (And hopefully a bit easier to adapt as a skeleton for new ports, etc.)

The new code is one file (retro.c), and is 841 lines long. It still has all the major stuff (console i/o, file i/o, sockets, etc.), but drops the statistics tracking, endian conversion, and runtime disassembly tracing. It works nicely on my OS X box; I'll be testing on other platforms in the near future.

Expect to see this checked into the repository within the next day or so.

Saturday, April 17, 2010

Online Demonstration

It's live and all links point to it now.

The code for rebuilding the retroImage.js is now in the repository.

Preparing to deploy updated online demonstration

One of the more fun parts of Retro is the online demo. Thanks to the Ngaro JS implementation, it's really easy to test Retro without needing to install it.

Unfortunately, the existing demo is now quite old. It's running a 10.2/10.3 hybrid image, which is far out of date. As part of the work done on 10.5, the various forthlets have been updated to run on the latest image, and I'm now doing final tests of a new image + current Ngaro at http://rx-core.org/jsvm/

Soon this will be rolled out to all mirrors and replace the existing demos. Also, it'll be integrated into the repo, so deploying updates should be easier in the future.

Decompiling code in 9.2.10

Disclaimer: The following code sucks. Badly. Please don't consider this typical of what I write; it's a quick hack I threw together in the last hour or so using parts of earlier code written many years ago.

This is the start of the decompiler for Retro 9.2.10. It's based in part on a decompiler from 8.x, and has a few bits from earlier releases.

: h. ( n- ) base @ swap hex . base ! ;

Displays numbers in hexadecimal format, saving and restoring the base.

: lit 1 ;
' lit 1+ @ ' lit 1+ + cell+ constant dolit

This one provides a constant pointing to the internal routine that pushes numbers to the stack. It's definitely not portable, but it works. A bit of explanation is in order. Retro 9.x and earlier were coded in assembly for x86 processors. The compiled code is (generally) a series of call, jump, and return instructions. But the calls and jumps are relative, so we can't just fetch the target; we need to adjust it to the actual base address.

Compiled code is basically:

call dolit
dd 1
ret

The call opcode is one byte on x86, so ' lit 1+ gets us to the call's operand (a 4 byte relative offset). We @ this, then add the address of the operand (' lit 1+ +), and finally adjust by the size of the operand (cell+), since the offset is relative to the instruction following the call. Kind of messy, but it works.
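The address arithmetic can be checked outside of Forth. This is a rough Python model, not Retro code: it assumes a little-endian 32-bit image held in a byte buffer, and the name call_target and the sample addresses are made up for illustration.

```python
import struct

CELL = 4  # size of a 32-bit cell, the amount cell+ adds on x86


def call_target(image: bytes, xt: int) -> int:
    """Resolve the absolute target of the `call rel32` located at address xt."""
    operand = xt + 1                                   # ' lit 1+   (skip the $e8 opcode byte)
    (rel,) = struct.unpack_from("<i", image, operand)  # @          (signed 32-bit displacement)
    return operand + rel + CELL                        # + cell+    (relative to the next instruction)


# Hypothetical image: a call at address 0x10 whose target is 0x40.
image = bytearray(0x50)
image[0x10] = 0xE8
struct.pack_into("<i", image, 0x11, 0x40 - (0x10 + 5))
print(hex(call_target(bytes(image), 0x10)))
```

The displacement is signed, so the same calculation handles calls to lower addresses as well.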

variable xt
variable d
variable flag
: scan ( - )
last repeat @ 0; dup :xt @ xt @ =if d ! -1 flag ! ;then again
;

: xt->d ( a-df )
xt ! 0 flag ! 0 d ! scan d @ flag @ ;

To resolve names, we have a word to get the dictionary entry and a success/failure flag from a base address. The use of variables here is due to laziness.
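The search itself is just a walk along the dictionary's linked list. Here is a small Python model of the idea (not Retro code); the Entry class, its field names, and the sample addresses are assumptions for illustration, and it returns the entry and flag directly rather than going through variables.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entry:
    name: str
    xt: int
    prev: Optional["Entry"]  # link to the previously defined word


def xt_to_d(last: Optional[Entry], xt: int) -> Tuple[Optional[Entry], bool]:
    """Walk from the newest entry backwards; return (entry, found flag)."""
    entry = last
    while entry is not None:
        if entry.xt == xt:
            return entry, True
        entry = entry.prev
    return None, False


# Hypothetical three-word dictionary, newest word last.
first = Entry("dup", 0x100, None)
second = Entry("swap", 0x120, first)
last = Entry("drop", 0x140, second)

entry, found = xt_to_d(last, 0x120)
print(entry.name if found else "?")
```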

variable addr

This holds the current address being decompiled.

: shift ( n- ) addr +! ;

Advance the addr variable by some arbitrary number of bytes. Each handler will invoke this to cover the byte range being decompiled.

: .name ( - )
1 shift here addr @ @ addr @ + cell+
dup dolit =if ." <>" drop cr ;then
dup xt->d
if :name unpack type cr ;then drop h. cr ;

An ugly way to display either a word name or the address being called/jumped to.

: tab 9 emit ;
: comment tab ." ( " later ." )" cr ;

Output formatting. Ideally tab would be more intelligent and try to keep columns lined up properly, but that's not crucial. The comment word uses later to wrap output in parentheses.

: call, ." call " .name 4 shift ;
: jmp, ." jmp " .name 4 shift ;
: ret, ." ret" cr 1 shift ;
: lodsd, ." lodsd" tab comment ." drop" 1 shift ;
: inc, ." inc eax" comment ." 1+" 1 shift ;
: dec, ." dec eax" comment ." 1-" 1 shift ;
: stc, ." stc" comment ." false" 1 shift ;
: clc, ." clc" comment ." true" 1 shift ;
: push, ." push eax" 1 shift comment ." >r" ;
: mov, ." mov [esi],eax" 2 shift cr ;
: pop, ." pop eax" 1 shift comment ." r>" ;
: or, ." or eax, eax" 2 shift comment ." or" ;
: swap, 1+ ." xchg eax, [esi]" comment ." swap" 2 shift ;
: nip, 2 + ." add esi, 4" comment ." nip" 3 shift ;
: .name 1 shift here addr @ @ addr @ + cell+ h. ;
: jz, ." jz " .name ." <>if" 1 shift cr ;
: jng, ." jng " .name ." <if" 1 shift cr ;
: jnz, ." jnz " .name ." =if" 1 shift cr ;
: jnl, ." jnl " .name ." >if" 1 shift cr ;
: cmp, ." cmp eax, [esi]" 1+ comment ." (if)" 2 shift ;
: nop, ." nop" comment ." then" 1 shift ;
: add, ." add eax,[esi]" comment 2 shift ;
: sub, ." sub [esi],eax" comment 2 shift ;
: mul, ." mul dword [esi]" comment 2 shift ;
: show_it dup c@ >r dup 1+ r type r> dup 1+ shift + ;
: string, ." string" tab comment show_it ;

A bunch of words to handle various sequences used. This was all ripped (with minor changes) from 8.x.

: lookup ( n-n )
addr @ h. tab
dup $e8 =if call, ;then
dup $e9 =if jmp, ;then
dup $c3 =if ret, ;then
dup $ad =if lodsd, ;then
dup $40 =if inc, ;then
dup $48 =if dec, ;then
dup $f9 =if stc, ;then
dup $f8 =if clc, ;then
dup $50 =if push, ;then
dup $89 =if mov, ;then
dup $58 =if pop, ;then
dup $09 =if or, ;then
dup $87 =if swap, ;then
dup $83 =if nip, ;then
dup $74 =if jz, ;then
dup $7e =if jng, ;then
dup $75 =if jnz, ;then
dup $7d =if jnl, ;then
dup $3b =if cmp, ;then
dup $90 =if nop, ;then
dup $03 =if add, ;then
dup $29 =if sub, ;then
dup $f7 =if mul, ;then
dup $eb =if string, ;then
." Unknown: " dup h. 1 shift cr ;

I really hate this part. This is a huge if/then block to call the handlers for each starting byte. It'd be better to use a jump table (which was done in 8.x), but this was quicker to implement for testing purposes.
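For comparison, a table-driven dispatch looks roughly like this. It's a Python sketch of the general idea, not the 8.x jump table; the HANDLERS table and disassemble function are made-up names, and only a few of the single byte opcodes from the list above are included.

```python
# opcode byte -> (mnemonic, number of bytes consumed)
HANDLERS = {
    0xC3: ("ret", 1),
    0xAD: ("lodsd", 1),
    0x40: ("inc eax", 1),
    0x48: ("dec eax", 1),
    0x90: ("nop", 1),
}


def disassemble(code: bytes, count: int, addr: int = 0) -> list:
    """Decode `count` instructions starting at addr using the table."""
    lines = []
    for _ in range(count):
        opcode = code[addr]
        # Unknown bytes are reported and skipped one byte at a time,
        # the same fallback the if/then chain uses.
        mnemonic, size = HANDLERS.get(opcode, ("Unknown: %x" % opcode, 1))
        lines.append(mnemonic)
        addr += size
    return lines


print(disassemble(bytes([0x40, 0xAD, 0xFF, 0xC3]), 4))
```

Adding an opcode then means adding one table entry instead of another comparison in the chain.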

: see ( n"- ) ' addr ! for addr @ c@ lookup drop next ;

A wrapper word to invoke the decompiler. Pass it the number of instructions to decompile, and follow it by the name of a word:

: big 1 2 + . ;
5 see big

Ok, so that's that. It's at least partially functional, but still has lots of work needed.

Monday, April 12, 2010

A bit of life in 9.x

A user on comp.lang.forth is working on a fork of the 9.x codebase with a focus on moving even more out of the initial assembly kernel. While I'm not supporting continued use of 9.x, I am writing a crude decompiler/disassembler for use with the fork.

Sunday, April 11, 2010

More on Prefixes

In the last post, I briefly touched on the overall implementation of prefixes. I'll try to cover them in a bit more detail now.

Retro's listener will call notfound when a word name is not found and the input token cannot be converted to a number in the current base. Prior to 10.6, notfound just displayed an error message. In 10.6, it has been rewritten and extended to do additional checks related to prefixes.

The code:

label: ___ " ___" $,
t: get ( $-$ ) dup, @, ___ # 2 # +, !, 1+, ;
t: try ( - )
TIB # get find
if d->xt @, ___ # find
if dup, d->xt @, swap, d->class @, with-class
pop, pop, 2drop ; then
drop,
then drop, ;
t: filter ( - ) TIB # getLength 2 # >if try then ;
t: notfound ( - ) filter char: ? # emit cr ;

Ok, first thing to note is that the syntax looks a bit unusual. This is written using the metacompiler, which introduces some new rules:

  • t: instead of :
  • # following numbers/constants
  • , following opcode names
Getting this out of the way, we can proceed to break it down.

label: ___ " ___" $,

This line creates a constant pointing to a string containing three underscores. This string is modified later, with the last underscore replaced by a prefix character.

t: get ( $-$ ) dup, @, ___ # 2 # +, !, 1+, ;

get receives a string from the stack, moves the first character into the last position of ___, and then returns the rest of the provided string.
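As a quick illustration of what get does to a token, here is a Python model (not metacompiler code). The function name and the template argument are assumptions; Retro patches the ___ string in place, while this sketch returns a new string because Python strings are immutable.

```python
def get(token: str, template: str = "___"):
    """Split a token into (prefix handler name, rest of the token)."""
    handler = template[:-1] + token[0]  # first character replaces the last underscore
    rest = token[1:]                    # the 1+ step: skip past the prefix character
    return handler, rest


print(get("&foo"))
```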

t: try ( - )
TIB # get find
if d->xt @, ___ # find
if dup, d->xt @, swap, d->class @, with-class
pop, pop, 2drop ; then
drop,
then drop, ;

This is the largest part of the prefix handling. It passes the token in TIB to get and then searches for the name (sans the prefix character) in the dictionary.

If found, the xt of the word is returned, and a search is done for __X, where X is the prefix character. If this is found, the xt and class of the prefix handler are pushed to the stack, and with-class is called to execute everything. Also, if found, control is passed directly back to the listener.

If not found, the stack is cleaned up and control passes back to the notfound handler.

t: filter ( - ) TIB # getLength 2 # >if try then ;

To ensure that the prefixes are not searched for with single character tokens we have a simple filter word. A subtle note here: >if and <if in Retro are inclusive, so this is actually checking for a length of 2 or greater, not a length greater than 2.

t: notfound ( - ) filter char: ? # emit cr ;

And finally the notfound word itself. It calls filter, which then calls try, and so on. If the token + prefix are not found, a ? is displayed and control is returned to the listener.
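Putting the pieces together, the whole lookup can be modelled in a few lines of Python. This is only a sketch of the control flow, not the Retro implementation: the words and handlers dictionaries, the memory table, and the sample name count are invented for the example, and the handlers act immediately rather than compiling anything.

```python
def notfound(token, words, handlers):
    """Model of notfound: try a prefix lookup, fall back to '?'."""
    if len(token) >= 2:                                  # filter: length of 2 or greater
        handler_name, name = "__" + token[0], token[1:]  # get: split off the prefix
        if name in words and handler_name in handlers:   # try: both must be found
            return handlers[handler_name](words[name])   # pass the word's xt to the handler
    return "?"


# Hypothetical setup: one variable named count stored at address 100.
memory = {100: 7}
words = {"count": 100}
handlers = {
    "__&": lambda xt: xt,          # &name leaves the address
    "__@": lambda xt: memory[xt],  # @name fetches the stored value
}

print(notfound("@count", words, handlers))
print(notfound("&count", words, handlers))
print(notfound("@nope", words, handlers))
```

A token fails (and gets the ?) if it is too short, if the rest of the name isn't in the dictionary, or if no handler exists for the prefix character.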

Ok, now for the prefixes that are provided by default:

t: __& ( a-n ) .data ;

Usage: &name
Action: Return the address (xt) of the word.
Similar To: ['] name

t: __+ ( a- ) .data ' +! # .word ;

Usage: +name
Action: Add a value from the stack to the value stored at name's xt field.
Similar To: name +!

t: __- ( a- ) .data ' -! # .word ;

Usage: -name
Action: Subtract a value from the value stored at name's xt field.
Similar To: name -!

t: __@ ( a-n ) .data ' @ # .word ;

Usage: @name
Action: Return the value stored at the address (xt) field of the word.
Similar To: name @

t: __! ( a-n ) .data ' ! # .word ;

Usage: !name
Action: Store a value to the address field of a name.
Similar To: name !

If you want to create your own prefixes, you can do this as well. For a simple example:

: __~ ( a- ) .data ` :see ; immediate prefix

This will create a new ~ prefix that shows decompiled code for the word. Note the prefix at the end of the definition. This is optional, but moves the new prefix handler to the prefixes vocabulary, allowing it to be grouped cleanly with the other prefixes.

Saturday, April 10, 2010

Prefixes

I introduced prefixes in the 9.x releases. They were a basic shortcut technique: add a single character before a name to alter the behavior. The original code used a non-editable table for determining which characters were valid and the new behaviors the prefixes would introduce. When adding prefixes to the current Retro, I took a different approach.

Prefixes are handled by words, with a consistent naming format: __X where X is the prefix character. I also added a prefixes vocabulary, allowing the prefixes to be toggled on and off simply.

The notfound error handler was extended to search for prefix words if a name isn't found in the dictionary. If the prefix is found, and the rest of the name is found, the word is passed to the prefix handler.

The end result is a clean, extensible way of handling, adding, and replacing prefixes. Plus, they can easily be disabled or enabled as desired.

The State of the Forth - April 2010

Though I haven't updated this blog in almost a year, I am still actively working on Retro.

The stable release is now 10.5. Since the last update here, Retro has seen a lot of major changes:
  • profiling
  • files
  • sockets
  • fully cross-platform canvas (javascript, sdl)
  • nestable vocabularies
  • time device
  • reorganized library
  • new examples
Platform support currently stands at:
  • Windows (via Mono and .NET)
  • Cellphones (J2ME/MIDP)
  • Linux
  • BSD
  • BeOS
  • Mac OS X
  • AIX
  • iPhoneOS (jailbroken devices)
  • Browsers (Opera, Firefox, Safari, Chrome)
There is also an Emacs Lisp implementation that can run 10.4 and older images; I hope to have it updated for 10.6 and beyond in the near future.

10.6 (in the new git repository) has added support for word prefixes, and some cleanups to the core.