I tried to port Smalltalk to the CPC. It didn't work out in the end.

PulkoMandy · 15:32, 24 February 24

Read about it on my website

Well, it didn't work this time. I may retry this later with a different approach...

BSC · 21:00, 24 February 24

Interesting write-up. Just one question regarding your memory allocator, since you said it can't free memory. Have you considered that you might not run out of memory if your allocator supported freeing memory (and did the necessary bookkeeping and reordering)? I mean, I am aware that I don't know exactly what that parser is doing and why it needs to allocate memory, but it seems counter-intuitive that it eats all of your free memory.

Another thing: do you think it is possible to write the image directly to disk instead? It sounds like you are not doing that, but building the image in memory. Even then, it might be relatively easy to build the image in the extra 64K (or more) of memory.

Anyhow, keep it up! I am looking forward to the next iteration.

GUNHED · 00:42, 25 February 24

This would be a nice project for SOS; it already has overlapping windows and all that.

PulkoMandy · 09:50, 25 February 24

The original code has no calls to free(). It is designed to have all the classes and methods in RAM, so they can cross-reference each other, and then in a second step, store it all to a file.

I think it really needs at least all the class descriptions to be in memory. Maybe it will be able to compile the methods one by one and stream them to the output file? Or maybe all methods from one class at a time? That could be made to fit in RAM. But I think it will need some bigger changes to the compiler.

I am not out of ideas. For example, I could move most of the code into the C000-FFFF memory range to free more low RAM for the heap: either put it in a bank, or remove the printing to screen and use the printer port for debug output (with an emulator that sends the output to a text file, or with a real CPC and printer for more dramatic effect).

But still, in the end, the generated "image" file, which contains all the classes and methods, will be larger than the CPC main RAM (it is about 100K). So, yes, maybe I can make some changes to the compiler and get it to run through. But then I also have to get the bytecode interpreter running; otherwise, this is all quite useless.

The goal of this experiment was to see if it was reasonable to get Little Smalltalk running on a 64K machine, maybe as a BASIC replacement/alternative. I stopped when I got the answer (NO). The interpreter code for the virtual machine would fit, but the interpreted Smalltalk image would not.

I don't give up on Smalltalk for the CPC yet, but now that I know even the smallest existing implementation would need memory banks, I may as well go with the full Smalltalk-80 version anyway. And I need to design my VM in a way that it can manage the Smalltalk image being loaded into multiple banks. That will certainly be an interesting project for when I have more time. And I will need to do it from the ground up for the VM, since none of the existing implementations work that way, as far as I know. I don't know if that would still result in a somewhat usable system, or if it would be way too slow.

asertus · 11:12, 25 February 24

What about a 128K version, given that the "normal" CPC with a disk drive is the 6128?

PulkoMandy · 11:26, 25 February 24

With 100K for the image, let's say 16K for the interpreter (it's larger currently, but it's SDCC-compiled code), and 16K for the screen (not counting AMSDOS and user-written code), I don't think even this minimal version of Smalltalk can fit on a 128K machine without any hardware expansion. That's why we can call this a failed experiment.
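To make the arithmetic explicit, here is the budget as a tiny C check. The numbers are only the rough estimates from the post, and AMSDOS workspace and user code are left out just as in the post, so the real requirement would be higher still.

```c
#include <stdbool.h>

/* Rough estimates quoted in the post, in kilobytes. AMSDOS workspace and
   user-written code are NOT counted, so the real requirement is higher. */
enum {
    IMAGE_KB       = 100, /* Smalltalk image */
    INTERPRETER_KB = 16,  /* VM, assuming it can be shrunk to 16K */
    SCREEN_KB      = 16,  /* standard CPC screen memory */
    MACHINE_KB     = 128  /* unexpanded CPC 6128 */
};

/* Total RAM needed for the minimal configuration. */
static int required_kb(void)
{
    return IMAGE_KB + INTERPRETER_KB + SCREEN_KB;
}

/* True if the minimal configuration fits in an unexpanded 6128. */
static bool fits_in_6128(void)
{
    return required_kb() <= MACHINE_KB;
}
```

This gives 132K against 128K, so the minimal configuration is already over budget before the operating system and the user's own code are counted.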

GUNHED · 18:28, 25 February 24

Why not go to 576 KB of RAM (or even more, if the RAM is there)? Most users who do more than just play games probably have at least a 512 KB RAM expansion today, IMHO.

In your detailed description (see the link in the first post) it's explained that Smalltalk uses a virtual machine. So on the CPC, Smalltalk could be implemented by providing such a VM on any OS with banking capabilities.

andycadley · 22:17, 25 February 24

The problem, I assume, is not using more than 64K in general. It's managing the memory model such that calls between routines still work. You could use 24-bit pointers for all data and subroutines, but that would carry a massive performance penalty. So you very rapidly start needing a compiler that is smart enough to allocate and manage memory in the most optimal way.

PulkoMandy · 22:38, 25 February 24

It's not really a problem to get it running, since Smalltalk runs on a virtual machine. The virtual machine "just" needs to know in which bank the called method is, and page it in. So, yes, 24-bit pointers and a VM that knows what to do with them.
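A minimal sketch of what "24-bit pointers and a VM that knows what to do with them" could look like, assuming 16K banks that the VM pages into a fixed window. The names (`far_ptr`, `page_in`, ...) are made up for illustration, and the banks are simulated with a host array so the code runs anywhere; on a real CPC, `page_in` would write to the gate array instead of just updating a variable.

```c
#include <stdint.h>

/* Hypothetical 24-bit "far" pointer: one byte selects a 16K bank, two
   bytes give the offset inside it. */
typedef struct {
    uint8_t  bank;
    uint16_t offset; /* 0..BANK_SIZE-1 */
} far_ptr;

#define BANK_SIZE 16384u
#define NUM_BANKS 8u

/* Simulated banks; on a CPC they would be paged into a fixed window. */
static uint8_t  banks[NUM_BANKS][BANK_SIZE];
static int      current_bank  = -1; /* bank currently paged in */
static unsigned bank_switches = 0;  /* how often we had to switch */

/* Make sure the bank holding p is visible, switching only if needed. */
static uint8_t *page_in(far_ptr p)
{
    if (p.bank != current_bank) {
        current_bank = p.bank; /* real VM: select the bank in hardware here */
        bank_switches++;
    }
    return &banks[p.bank][p.offset];
}

static uint8_t far_read(far_ptr p)             { return *page_in(p); }
static void    far_write(far_ptr p, uint8_t v) { *page_in(p) = v; }
```

Keeping track of the current bank means consecutive accesses to the same bank cost nothing extra; the `bank_switches` counter shows the cost that the memory layout has to minimize.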

But then you have to care about performance. In Smalltalk-76, they didn't have the luxury of memory banks. Instead, the image was stored on a hard disk, with the RAM used as a cache for the most recently used objects. And they managed to get something usable out of it. So, with banks, it should be even easier?

To make it optimal, you can organize the memory banks so that methods and objects that are frequently used together eventually end up in the same bank. This can be handled as part of the memory allocator and garbage collector, if the VM is implemented in a way that moving objects around in memory is possible (that means an extra level of indirection when accessing them, essentially).
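The "extra level of indirection" can be sketched as an object table: references hold a small handle, and only the table records the object's bank and offset. This is a hypothetical layout, not taken from any existing Smalltalk VM, and the banks are again simulated with a host array.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical object table: the VM refers to objects by a handle (an
   index), and only the table knows where each object currently lives. */
typedef struct {
    uint8_t  bank;
    uint16_t offset;
    uint16_t size;
} obj_entry;

#define MAX_OBJECTS 256
#define NUM_BANKS   4
#define BANK_SIZE   16384u

static uint8_t   banks[NUM_BANKS][BANK_SIZE];
static obj_entry table[MAX_OBJECTS];

/* Resolve a handle to the object's current bytes (one extra lookup). */
static uint8_t *deref(uint16_t handle)
{
    obj_entry *e = &table[handle];
    return &banks[e->bank][e->offset];
}

/* Move an object, e.g. next to objects it is often used with. All
   references are handles, so only the table entry has to change. */
static void move_object(uint16_t handle, uint8_t new_bank, uint16_t new_offset)
{
    obj_entry *e = &table[handle];
    memmove(&banks[new_bank][new_offset], deref(handle), e->size);
    e->bank = new_bank;
    e->offset = new_offset;
}
```

Because nothing outside the table stores the object's address, the allocator or garbage collector is free to regroup objects into the same bank; the price is one table lookup on every access.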

GUNHED · 02:56, 26 February 24

So, would it help to have the VM running on the CPC?

If yes, what does the VM need to be able to do? (I have read quite a lot of text, but a link or a short list of features would help.) Sorry in case I ask too much here, but it's a new topic for me.

PulkoMandy · 08:44, 26 February 24

The VM is like an emulator for a very simple CPU. For Little Smalltalk V4 there isn't a lot of documentation (the author never finished the new version of the book that goes with it). For Little Smalltalk V1 there is the book "A Little Smalltalk", which goes into a bit more detail.

A list of opcodes:

- Push Instance: puts one of the receiver's instance variables on the stack
- Push Argument: puts a method argument on the stack
- Push Temporary: puts a temporary variable on the stack
- Push Literal: puts a literal value on the stack
- Push Constant: puts a constant on the stack (numbers 0 to 9, true, false, or nil/NULL)
- Push Block: creates an execution context for a "block". This allows a piece of code to be called later with this context in use, so it can reference other objects from there
- Assign Instance: set a value in an object instance variable (taking a value from the stack)
- Assign Temporary: set a value (from the stack) in one of the temporary variables
- Mark Arguments: pop as many values as needed from the stack and put them in the "arguments" array to pass them to a method
- Send Message: call a method (with the arguments above)
- Send Unary, Send Binary: optimized cases for some common messages with no argument (isNil, ...) or one argument (+, <, <=, ...)
- Primitives: call some native code. Print a character on screen, read a char from keyboard, basic math functions, and some accessors to data (for example: get the class corresponding to an object, or get the size of an object, create a new object). File IO is also implemented here.
- "Special" operations: return from a method or a block and clean up the stack, duplicate an element on the stack, branches and conditions
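To give an idea of how these opcodes drive a simple stack machine, here is a toy dispatch loop for three of them (Push Constant, Send Binary, and two "special" operations). The numbering, the instruction struct, and the integer-only stack are simplifications invented for this sketch; the real interpreter is in interp.c and works with object references.

```c
#include <stdint.h>

/* Toy subset of the opcodes listed above. The names follow the list,
   but the numbering and instruction format are made up for this sketch. */
enum opcode  { PUSH_CONSTANT, SEND_BINARY, DO_SPECIAL };
enum binop   { BIN_ADD, BIN_LESS };
enum special { SPECIAL_RETURN, SPECIAL_DUP };

typedef struct { uint8_t op, arg; } insn;

/* Run a tiny "method" on an integer-only stack and return its result.
   A real VM pushes object references and falls back to a full message
   send when the fast path does not apply. */
static int run(const insn *code)
{
    int stack[16];
    int sp = 0; /* number of items on the stack */

    for (const insn *pc = code;; pc++) {
        switch (pc->op) {
        case PUSH_CONSTANT:          /* small constants only in this toy */
            stack[sp++] = pc->arg;
            break;
        case SEND_BINARY: {          /* fast path for + and < on integers */
            int b = stack[--sp];
            int a = stack[--sp];
            stack[sp++] = (pc->arg == BIN_ADD) ? a + b : (a < b);
            break;
        }
        case DO_SPECIAL:
            if (pc->arg == SPECIAL_DUP) { /* duplicate top of stack */
                stack[sp] = stack[sp - 1];
                sp++;
                break;
            }
            return stack[sp - 1];    /* SPECIAL_RETURN: answer the top */
        default:
            return -1;               /* opcode not handled by this toy */
        }
    }
}
```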

The bytecode is usually encoded as a 4-bit opcode + a 4-bit argument. When the argument does not fit in 4 bits, the opcode is instead encoded in 8 bits (with the 4 high bits being 0000, which is a reserved opcode, so it doesn't conflict with the 4-bit version) and the argument is encoded in the next byte or bytes (I don't remember).
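The short and extended encodings can be decoded and encoded like this. The sketch assumes the extended form is followed by exactly one argument byte (the post is unsure whether it is one byte or more) and uses opcode 0 as the extension marker, as described; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded instruction: 4-bit opcode, argument of up to 8 bits. */
typedef struct {
    uint8_t opcode;
    uint8_t arg;
} decoded;

#define EXTENDED 0x0 /* reserved high nibble that marks the long form */

/* Decode one instruction at code[*pc] and advance *pc past it.
   Short form:    [opcode:4 | arg:4]
   Extended form: [0000 | opcode:4] [arg:8]  (one argument byte assumed) */
static decoded decode(const uint8_t *code, size_t *pc)
{
    uint8_t byte = code[(*pc)++];
    decoded d = { (uint8_t)(byte >> 4), (uint8_t)(byte & 0x0F) };

    if (d.opcode == EXTENDED) {
        d.opcode = d.arg;      /* the real opcode is in the low nibble */
        d.arg = code[(*pc)++]; /* the argument is the following byte */
    }
    return d;
}

/* Encode one instruction (opcode 1-15), using the short form when the
   argument fits in 4 bits. Returns the number of bytes written. */
static size_t encode(uint8_t opcode, uint8_t arg, uint8_t *out)
{
    if (arg < 16) {
        out[0] = (uint8_t)((opcode << 4) | arg);
        return 1;
    }
    out[0] = (uint8_t)((EXTENDED << 4) | opcode);
    out[1] = arg;
    return 2;
}
```

The encoder picks the 1-byte form whenever the argument fits in 4 bits, which is the common case and keeps methods compact.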

The VM implementation can decide how exactly to store its internal data. But more importantly it has to manage dynamic memory. There are opcodes to create new objects, but the bytecode doesn't explicitly track when an object is not needed anymore. This has to be implemented either with reference counting, or garbage collection. Previous versions of Little Smalltalk used reference counting, but V4 uses garbage collection.
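As an illustration of what a garbage collector has to do, here is a tiny mark-and-sweep collector over a fixed pool of objects. It is a generic textbook technique chosen for brevity, not necessarily the algorithm used in Little Smalltalk V4's memory.c, and the `object` layout with two reference slots is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tiny mark-and-sweep collector over a fixed pool, for illustration only.
   Each object holds up to two references (pool indices, or NIL). */
#define POOL_SIZE 8
#define NIL (-1)

typedef struct {
    bool in_use;
    bool marked;
    int  ref[2];
} object;

static object pool[POOL_SIZE];

/* Mark everything reachable from object i (recursive for brevity). */
static void mark(int i)
{
    if (i == NIL || pool[i].marked)
        return;
    pool[i].marked = true;
    mark(pool[i].ref[0]);
    mark(pool[i].ref[1]);
}

/* Free every object not reachable from the roots; return how many. */
static int collect(const int *roots, size_t nroots)
{
    int freed = 0;
    for (size_t r = 0; r < nroots; r++)
        mark(roots[r]);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].in_use && !pool[i].marked) {
            pool[i].in_use = false;
            freed++;
        }
        pool[i].marked = false; /* reset for the next collection */
    }
    return freed;
}
```

Objects that only reference each other in a cycle, with nothing else pointing at them, are freed by this collector; plain reference counting cannot reclaim such cycles. The recursive `mark` is for brevity; on a Z80 with a small stack, an iterative version would be safer.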

Before we can get to executing bytecode, the interpreter needs to load it from a file. The file is a compacted representation of a tree of objects, classes and methods. The interpreter parses it and creates the corresponding objects, classes and methods, and then calls the "bootMethod", and from there, it starts running bytecode.
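The idea of a "compacted representation of a tree of objects" can be illustrated with a loader that reads objects referring to each other by index and turns those indices back into pointers. The file format here is invented for the example and is not the real Little Smalltalk image format; a real loader would then go on to find the bootMethod and start the interpreter.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Invented format, NOT the real Little Smalltalk image format:
     byte N                  number of objects
     per object: byte class  index of its class object
                 byte nfields
                 byte field[nfields]  indices of referenced objects
   Indices let objects refer to each other (even in cycles) in a flat file. */
typedef struct obj {
    struct obj  *cls;
    uint8_t      nfields;
    struct obj **fields;
} obj;

/* Returns an array of N objects with all references resolved, or NULL. */
static obj *load_image(const uint8_t *data, size_t len, uint8_t *count)
{
    size_t pos = 0;
    if (len < 1)
        return NULL;
    uint8_t n = data[pos++];
    obj *objs = calloc(n ? n : 1, sizeof *objs);
    if (!objs)
        return NULL;

    for (uint8_t i = 0; i < n; i++) {
        if (pos + 2 > len)
            goto fail;
        uint8_t cls = data[pos++];
        uint8_t nf  = data[pos++];
        if (cls >= n || pos + nf > len)
            goto fail;
        objs[i].cls = &objs[cls];
        objs[i].nfields = nf;
        objs[i].fields = calloc(nf ? nf : 1, sizeof *objs[i].fields);
        if (!objs[i].fields)
            goto fail;
        for (uint8_t f = 0; f < nf; f++) {
            uint8_t ref = data[pos++];
            if (ref >= n)
                goto fail;
            objs[i].fields[f] = &objs[ref];
        }
    }
    *count = n;
    return objs;

fail: /* free whatever was allocated so far */
    for (uint8_t i = 0; i < n; i++)
        free(objs[i].fields);
    free(objs);
    return NULL;
}
```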

The C source code isn't very long; you can read it here: https://github.com/crcx/littlesmalltalk/tree/master/lst4/source

- interp.c: the bytecode interpreter
- memory.c: the garbage collector
- main.c: ties it all together.

The initial objects and classes are defined in text form here: https://github.com/crcx/littlesmalltalk/blob/master/lst4/ImageBuilder/imageSource
The ImageBuilder tool parses this and creates the binary "image" file that the interpreter needs to start.

After taking a closer look, I have found out two things:

- The binary "image" file is defined such that the "Method" objects contain not only the bytecode, but also the ASCII source code for the method. This should not be needed; I think it can be removed to make the image smaller. I have tested this quickly and managed to parse 16% of the source file on the CPC (vs 10% before this change).
- The binary "image" file is not platform-dependent, contrary to what I thought originally. So I don't really need to generate it on the CPC. I can do it on a computer with large linear RAM, and transfer it afterwards.

The image has about 4000 "objects" (that counts objects, but also classes, methods, integers, ... everything is an object in Smalltalk). Each object needs at least 4 bytes in RAM: a pointer to the class, and a size. But it needs more if it does something at all (methods need space for their bytecode, objects need space for their fields, integers need space for their value, classes need space for their method list, etc).
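The 4-byte minimum follows from 16-bit pointers on the Z80. A quick sketch of that header and the resulting lower bound for the roughly 4000 objects; the struct and its field names are illustrative, with `uint16_t` standing in for a Z80 pointer.

```c
#include <stdint.h>

/* Minimal object header: class pointer + size. On the Z80 a pointer is
   16 bits, modelled here with uint16_t. Field names are illustrative. */
typedef struct {
    uint16_t cls;  /* address of the object's class */
    uint16_t size; /* number of fields or bytes that follow */
} obj_header;

#define IMAGE_OBJECTS 4000u /* approximate object count from the post */

/* Lower bound on RAM used by headers alone, before any object payload. */
static unsigned long header_bytes(unsigned long objects)
{
    return objects * (unsigned long)sizeof(obj_header);
}
```

Headers alone come to 16000 bytes, roughly a quarter of a 64K address space, before a single byte of bytecode, field, or string data is stored.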

GUNHED · 19:04, 26 February 24

Thanks for the detailed explanation. It's quite some stuff, but should be doable on the CPC.

PulkoMandy · 22:39, 27 February 24

OK, it got me thinking about garbage collectors and memory banking, so I wrote another article:

https://pulkomandy.tk/_/_Development/_Ideas%20for%20a%20garbage%20collector%20for%20memory-banked%20systems

Let me know what you think (if it makes any sense... maybe I'll re-read it tomorrow and notice I wrote something stupid that can't work).

zhulien · 01:24, 02 March 24

I read about your journey and wonder: what if you put the stack at, let's say, #3FFF, and instead of using the heap as a single contiguous block, treat it as an array of 16 KB blocks paged in between #4000 and #7FFF? This is one way.
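For the "array of 16 KB blocks between #4000 and #7FFF" approach, the VM needs to turn a block number into a value for the RAM configuration register. A sketch, assuming the standard 6128 / Dk'tronics-compatible scheme and only the first 512K of expansion (blocks 0 to 31); larger expansions also use other port addresses, which are not covered here. The function names are hypothetical.

```c
#include <stdint.h>

/* RAM configuration value (for port &7Fxx) that maps 16K expansion block
   'block' (0-31) into the &4000-&7FFF window, assuming the usual 6128 /
   Dk'tronics-compatible scheme limited to the first 512K:
     bits 7-6 = 11  RAM configuration function
     bits 5-3       64K expansion bank (0-7)
     bits 2-0       configuration 4-7: page 0-3 of that bank at &4000 */
static uint8_t ram_config_for_block(unsigned block)
{
    unsigned bank64 = (block / 4) & 7; /* which 64K expansion bank */
    unsigned page   = block % 4;       /* which 16K page inside it */
    return (uint8_t)(0xC0 | (bank64 << 3) | (4 + page));
}

/* CPU address of byte 'offset_in_block' once its block is mapped in. */
static uint16_t window_address(uint16_t offset_in_block)
{
    return (uint16_t)(0x4000 + (offset_in_block & 0x3FFF));
}
```

On the real machine the returned value would be written to port &7Fxx (for example with an OUT instruction), and writing &C0 restores the normal layout.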

The other way is more akin to how CP/M Plus works: put all code to be interpreted into a second 64 KB. If you are using an upper ROM for the actual Smalltalk itself, then you can easily have 48 KB available (minus the first 100 bytes or so). Or, with a little CP/M Plus-style gymnastics or Ramlam... close to 64 KB without going to further banks.

zhulien · 01:30, 02 March 24

BTW, you can use 16-bit packed far addresses instead of 24-bit pointers if it helps; just make sure your pointers point to aligned memory. A 4 MB CPC expansion gives you 256 x 16 KB blocks that are quite easy to manage if you accept the limitation that a class cannot exceed 16 KB.
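With 256 blocks of 16 KB, the block number takes the whole high byte of a 16-bit reference, leaving 8 bits for the position inside the block. That means 16384 / 256 = 64-byte granularity, which is where the alignment requirement comes from. A sketch with hypothetical names:

```c
#include <stdint.h>

/* Packed 16-bit reference for a 4 MB expansion seen as 256 blocks of 16K:
   high byte = block number, low byte = offset inside the block in units
   of 64 bytes (16384 / 256). Objects must start on 64-byte boundaries. */
#define BLOCK_SIZE 16384u
#define ALIGNMENT  (BLOCK_SIZE / 256u) /* 64 bytes */

typedef uint16_t packed_ref;

/* offset must be < BLOCK_SIZE and a multiple of ALIGNMENT */
static packed_ref pack(uint8_t block, uint16_t offset)
{
    return (packed_ref)((block << 8) | (offset / ALIGNMENT));
}

static uint8_t  ref_block(packed_ref r)  { return (uint8_t)(r >> 8); }
static uint16_t ref_offset(packed_ref r) { return (uint16_t)((r & 0xFF) * ALIGNMENT); }
```

The trade-off is that every object (or group of objects) has to start on a 64-byte boundary, so this suits allocating pages of small objects better than giving each tiny object its own slot.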

Or alternatively, being a VM makes it relatively simple to have 64 banks of 64 KB, which could allow more than 16 KB per class or heap allocation.

zhulien · 01:39, 02 March 24

One other note is that you don't have to work in 16 KB blocks on the CPC; you can work in 64 KB banks for the majority of things that are not BASIC. Or, if you call the firmware from an external bank, be sure to bank switch first. The CPC is quite flexible in its bank switching and even for multitasking... although I am told MSX and Enterprise 128 are even more flexible.

You can context switch on a CPC, for example, just by swapping entire 64 KB banks at the right time (push registers, save stack, swap 64K, restore stack, restore registers); the CPC is very happy with that and it's super fast.

And... if you can make the code ROMable, you get that whole bank available for the VM... basically an array of 64 KB banks, with a ROM that can Ramlam the RAM under it.

PulkoMandy · 09:15, 02 March 24

The objects in Smalltalk are usually extremely small; a max size of 256 bytes would not be too much of a constraint. But there are a lot of objects (a string is an object, an integer is an object, a method's bytecode is an object, ...) and also a lot of references between them. If the memory management isn't good enough, and objects are spread apart everywhere, every operation will require a bank switch.

As for the current state: I have noticed that the Smalltalk image I was trying to build includes not only the bytecode, but also the source code for all methods. I removed that, and now the image file is a more reasonable 35 kilobytes. I have not tried to load it yet (I think the memory usage will be a bit higher). That's still a bit large for working without banks, but maybe I can get something to run. That would be easier for me: once it runs in a very simple way, I can start optimizing it and making it better. But I don't feel confident writing a super complicated thing from the start.

The first version will probably be unable to free any memory at all. Believe it or not, this is how some people preferred to use their LISP machines back when that was a thing: save to disk frequently, and when the system runs out of memory and crashes, reload the last saved state. That was apparently better than having the garbage collector slow everything down.
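A minimal sketch of an allocator that cannot free anything: a bump (arena) allocator that just advances a pointer and gives up when the heap is full. The heap size and function names are arbitrary choices for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bump ("arena") allocator: each allocation advances a pointer and
   nothing is ever freed individually. HEAP_SIZE is an arbitrary example. */
#define HEAP_SIZE 8192u

static uint8_t heap[HEAP_SIZE];
static size_t  heap_top = 0;

/* Return 'size' bytes, or NULL once the heap is exhausted. */
static void *bump_alloc(size_t size)
{
    if (size > HEAP_SIZE - heap_top)
        return NULL; /* out of memory: time to reload a saved state */
    void *p = &heap[heap_top];
    heap_top += size;
    return p;
}

/* Forget everything at once, e.g. before reloading a saved image. */
static void bump_reset(void) { heap_top = 0; }
```

`bump_reset` is the equivalent of reloading the last saved state: all memory is reclaimed at once, without tracking individual objects.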

PulkoMandy · 15:58, 02 March 24

Also, I think I understand the possibilities of the various banking schemes pretty well, given that I designed and built a memory expansion myself. So my problem here is not that I don't have ideas for how to make it work, it's rather that there are a lot of options and I have not yet decided what is the best one.

Moreover, if I find the existing banking schemes too limiting, I could easily design a more flexible memory extension. In fact I already have the Nova which opens a few more possibilities (so it is even harder to decide what to do now).

I will try first to make it fit without banking. But the C code built with SDCC is too big to fit in a ROM I think. So I already have to optimize it quite a bit if I want to try that...

zhulien · 18:30, 02 March 24

Can you build the C code at 0100h and run it from the second 64K bank? Then you have close to 64 KB for that, and about 40 KB from the main bank for code and data. Yes, there is some bank switching, but given 64 KB in the second bank, you might be able to allocate 16 KB for buffers to allow fast transfers between the two.

m_dr_m · 22:20, 11 March 24

Fantastic project! (I was tempted myself)

Would be useful for prototyping and tools (hmm, for that, maybe something like Free Pascal would be more appropriate).
As a side note, I use a limited VM (well, ARM-like bytecode interpreter with some access modes dedicated to object programming) in "Emotion Trouble" and "Ayane Ayane LaCarree" (which was almost a bad idea, as it's more difficult to debug for now).

Let us know how we can help!

PulkoMandy · 23:31, 11 March 24

I have not given up yet on this, just briefly distracted by other projects.

The current status: I removed a lot of things from the C code (the garbage collector, most of the opcodes from the virtual machine) to get the code under 16K so I can fit it in a ROM. I hope I can then fit the initial Smalltalk image in main RAM to start with something "simple".

Then optimize the code a bit, make space for one more feature, and so on, until I get something usable.
I don't know yet if that will work. But I am unable to cut this project into smaller chunks that I can test one at a time, so I hope this will fit.

I have seen your experiments, it saved me some time on other languages I knew would not be worth trying. I will come back when I have news...

m_dr_m · 21:47, 12 March 24

It may deserve its own thread:
* Have you tried to port SDCC itself? (since crossdev is aberrant in general and abhorrent in my book!)
* Have you tried the latest LLVM-Z80 to see how it compares?

PulkoMandy · 23:35, 12 March 24

SDCC already needs a lot of resources to run on a modern PC. No chance of running it on the CPC at all.

Cloudstrife has tried LLVM to compile C++ for the CPC. It works, and we used it to build a player for Reality AdLib Tracker 2 music for the Willy/OPL3LPT sound card. The player is very slow, but it runs.

If you want something that runs on the CPC, the best choice among actively developed languages would probably be David Given's Cowgol. I have not tried it yet, but it aims to be self-hosting on 8-bit machines.

stevensixkiller · 21:40, 13 March 24

I thought there was no working Z80 backend for LLVM. Did he use https://github.com/grapereader/llvm-z80 ?

PulkoMandy · 23:37, 13 March 24

I don't remember which version it was. The generated code might have required some patching (occasional use of non-existing opcodes or something like that) and we also changed the compiled code a bit (making variables static instead of stack allocated, etc). Given the speed of execution, it was not really worth digging further...
