Ripascii - a Basic ASCII code archiver.

Started by copychr$, 12:46, 23 October 13

copychr$

-----------------------
Downloads & Updates:

The Ripascii download: [attachurl=2]
Please copy the demo.dsk into the ripascii folder. It will show a complete file set for one program; spiders.bas

From post #1 - Prepared files: [attachurl=3]
-----------------------

There are two versions of this program: Ripascii, presented here, and its clone Ripadsk.
They both produce exactly the same output: archives holding text and HTML versions of Amstrad Basic code, along with identical log files.
Basic Code and MODE 1 and 2 Listings can be obtained for most Bas programs.

They both use Windows Dos and do enough unconventional things with Basic code to warrant full information on use.
That is, they are exceedingly simple to run but require some set-up and an explanation of the output.
As Ripascii and Ripadsk are clones, everything needed can be found in the existing Ripadsk thread:
ripadsk - a utility to automate code archiving.

Please do go there and skim the walk-through, as most things work the same.
I'm sorry for that inconvenience. It would have been smarter to present this program first, as there are fewer complications.

Some subjects there will be cross linked from this thread and marked **.
E.g. nothing can work without the presence, on the path, of GnuWin32 SED.exe**, a must download. ripadsk - a utility to automate code archiving.

The difference with Ripadsk is that the Baslist.exe programs are not used at all.
Both Kevin and Markus have posted their versions on: BASlist Java Tool to list BASIC files
With Baslist.exe, original Basic code is automatically extracted and output as ASCII text.
Unfortunately that output is not yet dependable and, promising as they are, both tools are still in development.

Ripascii on the contrary is a functional application, but uses as source prepared ASCII versions of BAS programs on dsk.
Results are just as automatic and very fast with (almost) perfect output now, however there is that unavoidable manual step.

copychr$

#1
Before I hear any whining about that last, this is actually the second prong of attack to keep 8-bit code as the apex of computing.
That plan, as laid out in a secret diagram, is only being divulged here for a second time ;-)

[attach=2]

A Basic helper will be along to speed the making of those ASCII files on dsk, in the emulator or CPC.
By hand: LOAD"PROGRAM.BAS" followed by SAVE"PROGRAM.TXA",A

So, whether with Baslist.exe or with a little work, it is possible to reproduce and archive most Basic Locomotive code.
The output of complete archives is there to pick and choose from, or use as seen fit.
These can easily be discarded as it takes only seconds to run a dsk through again.

Any programming language would allow for a more user friendly and refined approach, but the basic output might not be very different.
Also, Dos has a bit of a "do the least, do the most" effect, as things are mostly straight file access and disk writes.

Simple-minded, Ripascii can only pull ASCII files with the extension "*.TXA".
You can download a zip from the top of the first post with about 60 of such prepared files.
These are all programs from the ACU 1989-92 collection that are still "rejected" under Baslist.exe, but can now be seen here.
Please note that required data files, *.bin, etc. may be missing, as only Bas programs are present.

One program, snakx.bas, is already faulty when saved as ascii. It uses raw control characters which maim the text output.

Gryzor

Niiiice!


Hope people get to use it...

copychr$

#3
For the longest time I had sort of ass-umed that ASCII versions from the CPC would just print out good.
And they do. But when non-ASCII characters are mixed in, this does not hold. Our vaunted "ASCII" files have become MS Windows-ANSI files.

This is the only stumbling block encountered when using original Bas files saved with the ",A" switch.
 
The problems concern non-ASCII Basic characters that have an ASC(n) value between 128 and 255 (&H80-&HFF) or ASCII control characters 0-31 (&H00-&H1F) and 127 (&H7F). Specifically, when the coder has used the raw characters to make things look sexy.
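
A quick way to single out such programs on the PC side is to scan the extracted file for those byte ranges. This is only a minimal sketch in Python, not anything Ripascii does itself; the function name and the sample listing are made up for illustration, and CR/LF are assumed to be ordinary line ends rather than raw characters inside the code.

```python
def find_problem_bytes(data: bytes):
    """Return (offset, value) for each byte that is not plain printable ASCII.

    Flags 128-255, 127 and 0-31. CR and LF are skipped on the assumption
    that they are normal line ends in the extracted text file.
    """
    hits = []
    for offset, value in enumerate(data):
        is_line_end = value in (0x0D, 0x0A)
        is_control = value < 0x20 or value == 0x7F
        is_high = value >= 0x80
        if (is_control and not is_line_end) or is_high:
            hits.append((offset, value))
    return hits


# Hypothetical two-line listing: a raw char #164 and a raw BEL inside strings.
sample = b'10 PRINT "\xa4 1989"\r\n20 PRINT "\x07"\r\n'
for offset, value in find_problem_bytes(sample):
    print(f"offset {offset}: CHR$({value}) = &{value:02X}")
```

For a real file, read it with open(name, "rb").read() so that Windows does not translate anything on the way in.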

A common example is finding PRINT"©" instead of PRINT CHR$(164). When this char #164 appears in a Windows-ANSI text file, it is rendered as "¤", the correct equivalent. On putting back such a char to the CPC, it is mapped to the original symbol - "©" - which will be listed correctly again.

But there is an added layer of weirdness yet, and we'll need to dig down a bit.
The programs and text files on this charset.dsk [attachurl=2] can reproduce all that follows below:

Leaving aside the control characters 0-31, program "makechr$.bas" writes all Basic CHR$ from 32 to 255 to a text file on dsk.
All characters are reproduced normally on the CPC, but when the text file "baschar$.txt" is extracted to a default Win-ANSI file on the PC, this is the result:
[attachurl=3]
All our significant or favourite chars between 127-255 have been turned to dung.

Eight of these are particular:

[attachimg=3]

[attachurl=5]

When comparing this Win-ANSI source to the ANSI character set from this site:
ANSI character set and equivalent Unicode and HTML characters
we can see these eight chars described as not used and/or without any referenced symbols.
Char #127 (an ASCII control char) still gets a symbol on the PC.
Char #160 is a non-breaking space and char #173 a soft hyphen.
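
The "not used" slots can also be listed without the table. A minimal sketch in Python, assuming its "cp1252" codec is a fair stand-in for the default Win-ANSI code page; the function name is made up. Python refuses to decode five byte values, and together with #127, #160 and #173 that makes eight. That these five are the same ones shown in the attachment above is my assumption.

```python
def undefined_in_cp1252():
    """List the byte values that Python's cp1252 codec has no character for."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            missing.append(value)
    return missing


print(undefined_in_cp1252())  # [129, 141, 143, 144, 157]
```

Windows itself is more forgiving than this codec and passes the bytes through, which fits with the chars surviving a trip back to the CPC even though nothing is drawn for them.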

More ...

copychr$

#4
With the second program "ansinull.bas" these eight chars are seen like this on the CPC:

[attachimg=1]
ansinull.bas_LIST.gif

[attachimg=2]
ansinull.bas_RUN.gif

But after extracting the ASCII version of the program the result is:

[attachimg=3]
[attachurl=5]

Looking at it, that code is now totally useless; information is completely missing.
However, those "invisible" chars are still encoded in the text and, when put back to the CPC, give an identical listing; all eight chars are accounted for.

For text that is more or less it. Even though the original Basic code is garbled or invisible it can be put back to the CPC and will run.
If one is aware of these occurrences or has software to check, things would be workable.
Anyway, programs with "funny" chars need to be singled out.

robcfg

There's nothing wrong with the ascii code.


If you take a look with a hex editor, the characters are there with their correct value. Thing is, ASCII covers only the first 128 chars; from there on, there can be different representations of the characters.


Try to load the ascii code back, it should work.

copychr$

Hi robcfg. Thanks for checking it out.
You are right, all non-ASCII chars can be put back from text. Good for us ;-)

I singled out these eight chars, because most are plain invisible or have a quirk.
That is a problem if one wants to publish such code as text because, well, nothing gets published.
Any changes would be up to the authors, and impossible for 10-liners.

Only rendering non-ASCII chars presents any problems. These posts try to run down every irregularity for the default Win-ANSI environment and web publishing.
With some more links, there should hopefully be enough to sort things out for the curious.
And control chars can be pretty messy too  :)


copychr$

What remains is to show text versions of Basic code in HTML files and copy good code from those pages.
That comes down to choosing the right char set and that in turn depends on the text source.

If the files were Unicode (or converted to it), that choice would be "utf-8":

What is Unicode?
http://www.unicode.org/Public/5.2.0/ucd/UnicodeData.txt

This will not work with default Win-ANSI text files, as non-ASCII chars are not recognized and are marked "?".
No information can be copied back from the browser.

With default text files from the PC, only the "iso-8859-1" char set gives desired results.
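
The difference between the two char sets can be checked outside the browser. A minimal sketch in Python, using only its built-in codecs; the variable names are made up, and Python's replacement character (U+FFFD) stands in here for the "?" a browser shows.

```python
# Every byte value a CPC text file can hold above plain ASCII.
raw = bytes(range(128, 256))

# iso-8859-1 gives each byte exactly one character, so nothing is lost.
as_latin1 = raw.decode("iso-8859-1")
round_trip = as_latin1.encode("iso-8859-1")
print(round_trip == raw)  # True

# Read as utf-8, a lone high byte is not a valid sequence and gets replaced.
as_utf8 = raw.decode("utf-8", errors="replace")
print("\ufffd" in as_utf8)  # True

# Once replaced, the original byte value cannot be recovered.
print(bytes([164]).decode("utf-8", errors="replace") == "\ufffd")  # True
```

This is the same reason nothing can be copied back from a page declared as utf-8: the information is already gone by the time it is displayed.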

http://www.w3schools.com/tags/ref_entities.asp
http://www.w3schools.com/tags/ref_ascii.asp

and more general info on html character sets:
http://www.w3schools.com/tags/ref_charactersets.asp

This page shows all characters seen before, but in a web view: [attachurl=2]
And the result of copy/paste from the browser: [attachurl=3]

Only two things are different here from working directly with text sources:
- Char #160 (cpc: a ^), a non breaking space, is correctly extracted to text files and can then be put back to the CPC.
    However, it does not copy from a browser window and so can not be put back. The result is a simple "space" #32.
- The soft hyphen, char #173, does not get a symbol under iso-8859-1 but will copy back to text and CPC.

When selecting code or fragments in the browser with the mouse, the last line does not get a CR/LF!
That must be added manually, otherwise the last line of code will be truncated on putting back. Use "Select All" or CTRL+A to avoid this problem.
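
If pasted fragments are saved with a script rather than by hand, the missing line end can be patched automatically. A minimal sketch in Python; the function name is made up, and it assumes the CPC wants CR/LF at the end of every line including the last.

```python
def ensure_final_crlf(text: str) -> str:
    """Return text with exactly one CR/LF added if the last line lacks it."""
    if text.endswith("\r\n"):
        return text
    # Drop a stray lone CR or LF before adding the proper pair.
    return text.rstrip("\r\n") + "\r\n"


pasted = '10 PRINT "A"\r\n20 GOTO 10'
print(repr(ensure_final_crlf(pasted)))
```

Running it on text that already ends correctly changes nothing, so it is safe to apply to everything.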

A cursory viewing of ripascii archives in Ubuntu/Firefox does not reveal any problems. gedit shows encoding information for any "invisible" ANSI null chars.


arnoldemu

How should these characters be displayed? One possible way is like this:

[CTRL+A] - to show the keys to press to make the char.

Another would be to try and find a similar character and place that into the output - knowing that if it was to be typed into an emulator then the appropriate cpc character is substituted.

Perhaps now we should lobby the utf-8/utf-16 people to add all our cpc characters ! :)

Do any of our cpc characters have utf-8 equivalents?

Sykobee (Briggsy)

#9
There should be suitable HTML elements for displaying CPC characters - e.g., the block characters appear in Unicode as code points 0x2596 through 0x259F, with some more dotted around. Checkerboard is 0x2592, etc.

And that's great for display in HTML, but cutting and pasting would need a javascript interception at copy time, or emulator transcoding at paste time, to translate back to CPC format.

Alternatively, create an Amstrad CPC TrueType font with the correct glyphs, and use HTML5 font stuff to force listings to show using it.

copychr$

#10
Thank you both for your interest.

All this "old-timer" character output falls between the cracks.
When mixing it up against later character sets, there are a lot of "holes" where no symbol exists to visualize low-byte chars.

I'll be doing Control chars next, and overall results from extracting to text and putting back most Basic chars is very good.
Irregularities that pose a real problem are few enough to enumerate.

Like you both say, the difficulty is to even be aware there is any code present when viewing some text files.
In Linux, gedit shows a numerical reference when no symbol is available, which is an immediate heads-up.

Some text search/substitution scheme might alert to or mark up invisible and problem characters.
For HTML it is worse, for example no chars below #32 can be shown at all under iso-8859-1 or utf-8.
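
One possible shape for such a mark-up scheme, as a minimal sketch in Python. The "{n}" notation and the function name are my own invention, nothing Ripascii produces. It is meant for viewing only: a listing that already contains curly braces would make the marks ambiguous, so the output should not be put back to the CPC.

```python
def mark_up(data: bytes) -> str:
    """Show printable ASCII as-is and every other byte as {decimal value}."""
    out = []
    for value in data:
        if value in (0x0D, 0x0A) or 0x20 <= value < 0x7F:
            out.append(chr(value))
        else:
            out.append("{%d}" % value)
    return "".join(out)


# Hypothetical line holding a raw char #164 and a raw BEL.
print(mark_up(b'10 PRINT "\xa4\x07"\r\n'))
```

The numbers are the same ones CHR$(n) would take, which makes it easy to rewrite a line by hand if needed.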

No, it is time for a new universal char set; cpc-8.
Use chars with all negative numbers and let the unicode boys try to figure it out ;-)

copychr$

#11
A.k.a. Control Characters.
Most of what follows can be reproduced from this download: [attachurl=2]

All Basic characters seen so far can be extracted to text, and consequently be put back to the CPC where that ASCII code will save correctly to BAS and run.
That includes two characters which I've found described as being either normal ASCII or Control Characters:
"Space", SP, char #32 and "Delete", DEL, char #127.
In teletype-like transmissions SP is non-printing and used to advance a printhead/tape by one position, a "space-feed".
DEL, referred to as Rubout, is printable. It overprints the existing character, as in X-ing out.

For a useful view of pure ASCII chars and, if one scrolls down there, a description of all control chars sorted by function:
ASCII Code Table and Descriptions

The use of raw control chars in Basic code is rare to start with. If the coder chooses to execute a control function, like BEEP or reverse video, the ASCII version will be correct. If the symbol has been chosen for "decorative" purposes, he may have inadvertently picked one that cooks his ASCII code, like LF or CR.

Manually, control chars are obtained by using CTRL+char. There is a list in the 6128 manual, chap 7, part 3.
The site linked above can also be consulted. The control chars and corresponding keyb symbols are lined up there.
- NUL, 0 dec. + 64 corresponds to "@"-64 dec.
- CR, 13 dec. + 64 = "M"-77 dec.
CTRL+M will give a carriage return even right in Notepad!
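
The "+ 64" rule is easy to put into a small helper, for instance to print key hints next to a listing. A minimal sketch in Python; the function name is made up, and it follows the general ASCII convention only. The list in the 6128 manual remains the authority for what the CPC keyboard actually does.

```python
def ctrl_key(code: int) -> str:
    """Return the CTRL+key name for a control code 0-31 (code + 64)."""
    if not 0 <= code <= 31:
        raise ValueError("not a control code in the range 0-31")
    return "CTRL+" + chr(code + 64)


for code in (0, 7, 13, 26):
    print(code, ctrl_key(code))
```

This prints CTRL+@ for NUL, CTRL+G for BEL, CTRL+M for CR and CTRL+Z for SUB.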

In code, the "trick" for printing Basic Control Characters to screen, is to precede with CHR$(1), SOH, "start of heading".
The result will be a literal char and an escape of the control effect. E.g. PRINT CHR$(1);CHR$(7) will show a Bell char on screen, but not the audible beep as normal.

"showctrl.bas" takes a look at CHR$(0-31):

[attachimg=2]

All but CHR$(0) can be copied into code with Shift/Copy.

Next, using the same set-up as before, program "makectrl.bas" writes control chars from 0 dec. to 31 dec. to a text file.
Extracted, the result is: [attachurl=4]

Using Notepad.exe only, as for viewing all extracted code, the following chars appear as text modifiers:
char #0, "null", NUL; appears as -" "- when -""- might be expected.
char #1, "start of heading", SOH; appears as one extra-wide blank char.
char #9, "horizontal tab", TAB; inserts text editor's default tab.
char #10, "line feed - new line", LF; effective new line char.
char #26, "substitute", SUB; interpreted as Ctrl+Z/EOF, so it has been moved to the end of the list.
A few chars are invisible.

It seems every text editor has its own interpretation of ASCII control chars. Opening the above file with WORDpad.exe gives a wildly divergent view. Saved back as text and then seen in NOTEpad, several discrepancies with the original file will remain.

More ...

copychr$

#12
After putting back the above "control$.txt" file, it becomes impossible to access its content with Openin/Line Input.
All Control Characters that can execute will do so and the resulting mess requires a Reset.

With "testctrl.bas" however, we can check out every raw control char when inserted in a working line of code.
Program: [attachurl=2]
It was not written to run, but it does after a fashion. Weird things happen along the way. Several lines are remmed out in order for the program to reach an ending.
Only the Listing is of interest for comparison after text extraction and putting back.

Three characters have been moved to the end of that Listing:

[attachimg=2]

LF and CR result in a "Direct command found" message when putting back. SUB, ctrl+z, truncates any ASCII file already in the CPC.
All three are text-killers and prohibit re-use of ASCII code.
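
Checking an extracted file for these three before trusting it could look something like this. A minimal sketch in Python, with a made-up function name and sample. It rests on two assumptions of mine: that normal line ends are always the CR/LF pair, and that any run of &1A at the very end of the file is just padding.

```python
def find_text_killers(data: bytes):
    """Return (offset, name) for a lone CR, a lone LF or an embedded SUB."""
    body = data.rstrip(b"\x1a")  # assumption: trailing SUB bytes are padding
    hits = []
    for i, value in enumerate(body):
        if value == 0x0D and body[i + 1:i + 2] != b"\n":
            hits.append((i, "CR"))   # CR not followed by LF
        elif value == 0x0A and body[i - 1:i] != b"\r":
            hits.append((i, "LF"))   # LF not preceded by CR
        elif value == 0x1A:
            hits.append((i, "SUB"))  # anything after this would be cut off
    return hits


# Hypothetical listing: raw SUB in line 10, raw CR in line 20, padded end.
sample = b'10 PRINT "A\x1aB"\r\n20 PRINT "C\rD"\r\n\x1a\x1a'
print(find_text_killers(sample))
```

A file that was already truncated by ctrl+z on the CPC will of course come out clean here, since the missing part never made it into the text.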

After saving as ASCII and extracting the Listing: [attachurl=4]

The text ends in line 780. Ctrl+z itself is not present, but has prevented completion of the text version.
Don't take my word for it, but all these lines, except the last three, will put back to the CPC in good order.

After Loading and Listing, comparison with the original shows all remaining chars are accounted for, and the truncated program will run.

Please note that I did not manage to insert a CHR$(0), NUL, in the code. Also, this char disappears when text files are saved and is replaced by a "Space", #32.

Looking at the same file in an HTML version: [attachurl=5]
one is struck by the total absence of information on all these, admittedly flaky, characters.

But still it is possible to copy every invisible char from the browser and correctly paste it back to text and then put it to the CPC.

Notepad++.exe shows a much more informative view for the text version of the above Listing:

copychr$

#13
ASCII.BAS
Program: [attachurl=2]

Program Code: [attachurl=4]

As promised, a dedicated Basic application to speed up Conversion of Basic programs to ASCII files.
It is suitable for creating ASCII program versions for other purposes; the default ".TXA" extension recognized by ripascii can be modified at the top of the code.

It replaces all command line intervention and can send the ASCII versions to a separate drive.
This is most straight-forward and avoids filling up a dsk before all code could be treated.

The presence of ASCII.BAS is required in only one drive. It is re-called with Ctrl+A or B after every "Save" cycle.
No file management is provided, though one can Run, Load and List.

Help file: [attachurl=3]

[attachimg=4]

If one works dsks from the emulator right inside a ripascii folder, it is possible to check on current results without delay.
Clearly name a blank target dsk and convert away from the source dsk holding Bas files.

The same applies to coding projects, especially with large files, version reconciliation or cold code being revisited.
If one numbers archives as work progresses and keeps the latest around, full back-up and Listings remain available from dsks and files in there.

To keep things looping along some KEY DEFs are used, including one on the RETURN key for Saving.
If this string hangs use the ENTER key instead. CALL &BB00 or running the program again will sort the problem.
Best use Cancel for "missing disk" messages. Back out of an unwanted "Save" with CTRL+A or B key combination.

Keep an eye on the Drive Letters and be sure to wait for the "Ready" before and after saving.
It's easy to get carried away, but Need-for-Speed it ain't ;-)

DSK download: [attachurl=6]
