6128 glitching out (memory issues?)

Started by llopis, 22:18, 06 February 17

Bryce

Quote from: llopis on 21:56, 20 June 18
Update on this case (for those of you following along):


I decided to use the logic probe to see if I could detect a RAM chip that wasn't working like the others. It turns out pin 14 (Dout) of both chips in the third row (which was already one of the suspects after the resistance test) is always high. All the other chips show normal activity in pin 14.


The rest of the pins look like you'd expect: CAS is off for bank 2 but on for bank 1, Din flickers everywhere...


So the evidence mounts against row 3. The thing that still surprises me is why the whole row is acting that way. If this were a normal RAM failure, I would expect a single chip to fail, right? (On the other hand, it explains why hardwiring the PAL to use bank 2 made no difference.)

When two failures occur on a circuit, the first thing you do is check whether they share any other common components. In this case, both Dout pins go to the GA, the 74LS244 and the 74LS373, so any one of these other components could be partially defective and causing the bits to stay high.
Bryce.

llopis

Quote from: Bryce on 07:57, 21 June 18
When two failures occur on a circuit, the first thing you do is check whether they share any other common components. In this case, both Dout pins go to the GA, the 74LS244 and the 74LS373, so any one of these other components could be partially defective and causing the bits to stay high.
That makes sense. So I decided to do some more investigating today.


First of all, I was not exactly right. Those pins aren't stuck high. They must have a tiny amount of low voltage on them, because there's a hint of the low light on the probe. They definitely look a bit different from the pure GRN pin. Still, they're clearly unusual compared to all the other RAM chips.


Then I replaced the GA with a working one, and it made no difference.


I checked connectivity on the board following the traces and looking for corrosion or unusual stuff. Everything looked good.


I poked around the latch and buffer ICs you mentioned and everything looked about right. I didn't go as far as powering just those chips to verify their logic, but I would bet they're working fine.


Oh wait, I just realized why the whole "row" output looks bad: Because only one chip is outputting any data! The other one has the CAS bit always off, so of course both pins look similar. Doh! OK, that reinforces the theory of that RAM chip being faulty. What puzzles me is why when I swapped the RAM banks it also loaded up with the same pattern. Maybe I didn't do it correctly. Hmm... more stuff to investigate :-)

llopis

I was puzzled with my inability to start the computer with the second RAM bank by hardwiring the PAL chip, so I did some tests.


- I swapped PAL chips with a working CPC. No difference.
- However, the working CPC *without* the PAL chip creates the exact same pattern I'm seeing on screen (see picture earlier in the thread). That's very interesting.
- Just to confirm, I was able to boot the CPC with bank 0 and bank 1 by hardwiring the PAL chip.
- The CAS0/1 signal being sent when I hardwire the PAL chip on the broken CPC is very different than the working one. I suppose it could be because the CPU is doing very different things, so maybe that's OK.


Sooo... I'm not convinced at all I'm going to find a bad RAM chip anymore. I'll continue investigating.

llopis

Update:
I decided to use the probe carefully on all the RAM chips and record the results. I discovered multiple things.


Normal config

The signals on a good chip functioning normally should be pulsing on all the pins except for 1 (no signal), 8 (H), and 16 (L).
All the chips are like that except IC134 (2 H and 14 H) and IC129 (14 L).
That's kind of an odd discovery. I can understand Dout being stuck at H or L on a bad chip, but pin 2 is Din! I looked for shorts or connectivity to Vcc on that line, but everything seemed fine. With Din stuck at H, it makes sense that Dout on that chip is also H, so that alone doesn't mean the IC is bad.
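Since the check above boils down to "every pin should pulse except 1, 8 and 16", recording the probe results lends itself to a quick comparison. The following is a small Python sketch to run on a PC, not anything posted in the thread; the state codes ("P" pulsing, "H", "L", "-" no signal) and the `anomalies` helper are made up for illustration. It assumes the standard 4164 pinout, where pin 8 is Vcc and pin 16 is Vss.

```python
# Hypothetical helper for logging logic-probe readings on 4164 DRAM chips.
# Expected states follow the post: pin 1 shows no signal, pin 8 (Vcc) is H,
# pin 16 (Vss) is L, and every other pin should be pulsing ("P").
EXPECTED = {pin: "P" for pin in range(1, 17)}
EXPECTED.update({1: "-", 8: "H", 16: "L"})


def anomalies(readings: dict[int, str]) -> list[tuple[int, str]]:
    """Return (pin, observed state) for every pin that differs from EXPECTED.

    Pins missing from `readings` are treated as not yet measured and skipped.
    """
    return [
        (pin, state)
        for pin, state in sorted(readings.items())
        if state != EXPECTED[pin]
    ]


# Readings as described in the post (all other pins behaving normally).
ic134 = {pin: EXPECTED[pin] for pin in range(1, 17)} | {2: "H", 14: "H"}
ic129 = {pin: EXPECTED[pin] for pin in range(1, 17)} | {14: "L"}

print("IC134:", anomalies(ic134))  # IC134: [(2, 'H'), (14, 'H')]
print("IC129:", anomalies(ic129))  # IC129: [(14, 'L')]
```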

More odd things: Every so often, as I was measuring things, the screen would go garbled. Sometimes turning the CPC off and back on again would still result in a garbled screen (sometimes border sometimes no border). Eventually it would go back to the gray screen with black border. I don't know if my measurements with the logic probe were interfering, but it was odd.


PAL chip removed
I remembered getting a similar screen even with the PAL chip missing. So I removed it and measured all the RAM ICs again. All signals are normal except that pin 14 is L and pin 15 (CAS) is not connected. That makes sense, since the PAL generates the CAS signals.


PAL chip hardwired to Bank 0
With the PAL chip removed, I added jumper wires to select Bank 0. Same grey screen and black borders. The signals were the same as with the PAL chip in place (those two RAM chips with their odd Dout and Din).


PAL chip hardwired to Bank 1
Same grey screen and black borders. All the RAM chip signals seemed normal!!!


That last bit really puzzles me. IC129 at least seems defective, since it always gives an L on Dout. Switching to Bank 1 makes Dout for that row of chips look fine, yet the computer still doesn't work.


Anybody have any more suggestions of any other tests to run? (I'm still waiting for X-MEM to arrive). Should I go ahead and swap out that IC129 at least? (I have sockets and RAM chips already).

llopis

After writing that last night, I decided that IC129 was looking really bad and I had nothing to lose by replacing it.
It was the first time I had to desolder a chip, and I have to admit it was a bit challenging! I tried doing it without cutting the leads, as practice for chips I don't want to destroy. I had to use a combination of a manual pump and flux/desoldering braid. In the end, the trick for the stubborn leads was to heat the lead from one side of the board and apply the pump from the other.
Anyway, I put on a 16-pin socket, a new memory and... still same screen on boot. But Dout on IC129 is pulsing as normal now, so that's a step in the right direction.
Interestingly, Din on IC134 seems fine now, but Dout is permanently low. So that might be the next replacement. It's just weird that two RAM chips died at the same time, isn't it? I wonder if I'm going to have to end up redoing them all like Bryce mentioned in another thread...  :o

llopis

Sooooo... I replaced IC134 as well. This time it was super easy to desolder, and the hand pump did the job perfectly. Either the practice on the other chip paid off, or there was something different about this one (maybe because it was all the way at the bottom).
I put in the new memory chip and... BINGO! It works!


Two open questions I'm still unclear about:

       
  • Why did two RAM chips fail suddenly at once? Especially since it started intermittently with that glitching on the screen.
  • How come the computer didn't boot up when I had the PAL hardwired to Bank 1?
I suppose there could be a chip in Bank 1 that's not working. I better run some tests.

llopis

Oh interesting, the computer doesn't start up when hardwired to use Bank 1 (same grey screen with black border). So there are more defective RAM chips!
I tested them with the logic probe, and all signals seem fine. No obvious bits stuck on high or low. This might really have to wait for X-MEM this time (or I could write a BASIC program now that I think about it).


I wonder if the reason more than one chip went bad is because I damaged some when I tried doing the resistance measurements while the computer was on...

llopis

I got a little sidetracked because the drive belt in that computer was DESTROYED (I mean, as in shake the drive and the dry pieces were falling off!). And that's for a drive that was working two years ago!


Anyway, I decided to write a minimal program to test the second bank. If my understanding is correct, each memory IC is 64K x 1 bit, so every byte in a bank is spread across all eight chips of that bank. So testing any address on the second bank should hit all the second-bank chips, right?
I went with something super simple:

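org #3000
ld b, &7F      ; B = &7F: high byte of the Gate Array port (&7Fxx)
ld c, &C7      ; C = &C7: RAM configuration 7 (bits 7-6 = 11, bits 2-0 = 111)
out (c), c     ; OUT (C),C puts BC on the address bus, so &C7 goes to port &7FC7
ret            ; return to the caller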

It maps page 7 (the last 16K of the second 64K bank) into the second 16K block of the address space, &4000-&7FFF. Changing &C7 to &C4, &C5 or &C6 selects one of the other three pages of that bank instead.
Then all I had to do was poke some values to &4000 and read them back with peek.
I did that and... everything seems to come back normal. I always read back the values I wrote. I even wrote a quick loop to write to every address between &4000 and &7FFF, and they all return the correct data.
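For reference, here is a Python model of the RAM configurations that the values &C0 to &C7 select on a 6128. It is a sketch based on the commonly documented configuration table, not something posted in the thread; the names `RAM_CONFIGS`, `config_byte` and `page_at` are hypothetical. Pages 0-3 are the first 64K (Bank 0) and pages 4-7 the second (Bank 1).

```python
# Sketch of the CPC 6128 RAM configurations selected by writing &C0 + n
# to port &7Fxx, as the OUT (C),C routine above does.
# Pages 0-3 are the base 64K (Bank 0); pages 4-7 are the second 64K (Bank 1).
# Each tuple lists the page visible at &0000, &4000, &8000 and &C000.
RAM_CONFIGS = {
    0: (0, 1, 2, 3),
    1: (0, 1, 2, 7),
    2: (4, 5, 6, 7),
    3: (0, 3, 2, 7),
    4: (0, 4, 2, 3),
    5: (0, 5, 2, 3),
    6: (0, 6, 2, 3),
    7: (0, 7, 2, 3),
}


def config_byte(config: int) -> int:
    """Port value: bits 7-6 = 11 select the RAM-config function, bits 2-0 the config."""
    if not 0 <= config <= 7:
        raise ValueError("RAM configuration must be 0-7")
    return 0xC0 | config


def page_at(config: int, address: int) -> int:
    """Return which 16K page is visible at a Z80 address under a configuration."""
    return RAM_CONFIGS[config][(address >> 14) & 3]


print(hex(config_byte(7)))                        # 0xc7
print(page_at(7, 0x4000))                         # 7
print([page_at(c, 0x4000) for c in range(4, 8)])  # [4, 5, 6, 7]
```

One thing this makes visible: &C7 only exposes one 16K page of Bank 1 at &4000, so a complete pass over the bank means repeating the test with &C4 to &C7. Each page still touches all eight chips of the bank, just not every cell address inside them.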
Any thoughts why those tests pass but starting the CPC from bank 1 failed? (and I tested it worked fine in the other CPC).

llopis

I finally got the drive belts and the drive works like a charm.
However, the second memory bank has some definite problems. I tried running some 128K games and they fail (or show blank graphics in places).
I dug out some memory tests from past posts and I see they're definitely coming up with errors, but I don't know how to narrow it down to which ICs are failing. Any idea?
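One way to narrow errors down to chips: each 4164 in a bank stores one data bit of every byte, so XORing the value written with the value read back shows which bit, and therefore which chip, is failing. Below is a Python sketch of that bookkeeping; `failing_bits`, `tally` and the `BIT_TO_CHIP` table are hypothetical, and the table entries are placeholders to be filled in from the 6128 schematic (the IC number for each data bit is not given in the thread).

```python
from collections import Counter


def failing_bits(expected: int, actual: int) -> list[int]:
    """Data bits (0-7) that differ between the byte written and the byte read back."""
    diff = (expected ^ actual) & 0xFF
    return [bit for bit in range(8) if (diff >> bit) & 1]


def tally(errors) -> Counter:
    """Count failures per data bit over (address, expected, actual) records."""
    counts = Counter()
    for _address, expected, actual in errors:
        counts.update(failing_bits(expected, actual))
    return counts


# HYPOTHETICAL table: replace the placeholders with the real IC numbers for
# each data bit of the bank under test, taken from the 6128 schematic.
BIT_TO_CHIP = {bit: f"<D{bit} chip of bank under test>" for bit in range(8)}

# Example error log (made-up values, e.g. copied from a test's output).
errors = [(0x4000, 0x00, 0x08), (0x4041, 0xF7, 0xFF), (0x4100, 0x55, 0x55)]
counts = tally(errors)
print(counts.most_common())                                 # [(3, 2)]
print([BIT_TO_CHIP[bit] for bit, _ in counts.most_common()])
```

If one bit dominates the tally, that bit's chip in the bank under test is the prime suspect.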


Syx's memory test only detects a 32KB expansion.


The other one runs through a few iterations and eventually turns those bars red (not orange like shown in the picture). Unfortunately I don't know what the numbers at the bottom of the screen are or how they can help me track that down.


On top of that, I'm puzzled why my BASIC program didn't detect an error when swapping banks and writing to them.


Any thoughts?

gerald

Quote from: llopis on 15:57, 15 July 18
The other one runs through a few iterations and eventually turns those bars red (not orange like shown in the picture). Unfortunately I don't know what the numbers at the bottom of the screen are or how they can help me track that down.
The first group of numbers is the test result for the C3 mode.
The list at the end of the screen shows the read-back tags used to detect the actual memory expansion and any shadowed (mirrored) memory.
Both look fine.
Unfortunately, this test does not report bit-level information.

Quote from: llopis on 15:57, 15 July 18
On top of that, I'm puzzled why my BASIC program didn't detect an error when swapping banks and writing to them.


Any thoughts?
Did you test the whole range or only one address? If you tested the whole range, did you use the same pattern for all locations?
My test does two passes:
- one that fills the banks with a bank-unique value (fast, but it will miss addressing errors).
- a second that uses a pseudo-random pattern, which makes sure that addressing errors will be detected.
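Why the two passes differ can be shown with a small model. This Python sketch assumes a hypothetical fault where address line A6 is stuck low inside one bank, so addresses that differ only in bit 6 share a cell. It is not gerald's test code, just an illustration of the same two-pass idea; the class and function names are invented.

```python
import random


class Ram:
    """One 64K bank. `stuck_low_mask` models address lines stuck low (hypothetical fault)."""

    def __init__(self, stuck_low_mask: int = 0):
        self.cells = bytearray(0x10000)
        self.addr_mask = ~stuck_low_mask & 0xFFFF

    def write(self, address: int, value: int) -> None:
        self.cells[address & self.addr_mask] = value

    def read(self, address: int) -> int:
        return self.cells[address & self.addr_mask]


def pass_unique_value(banks) -> bool:
    """Pass 1: fill each bank with its own constant, then verify (fast)."""
    for n, bank in enumerate(banks):
        for addr in range(0x10000):
            bank.write(addr, n + 1)
    return all(bank.read(addr) == n + 1
               for n, bank in enumerate(banks)
               for addr in range(0x10000))


def pass_pseudo_random(banks, seed: int = 1234) -> bool:
    """Pass 2: fill from a seeded pseudo-random stream, then replay it to verify."""
    rng = random.Random(seed)
    for bank in banks:
        for addr in range(0x10000):
            bank.write(addr, rng.randrange(256))
    rng = random.Random(seed)  # same seed, so the same sequence is regenerated
    return all(bank.read(addr) == rng.randrange(256)
               for bank in banks
               for addr in range(0x10000))


banks = [Ram(), Ram(stuck_low_mask=0x40)]  # bank 1 has A6 stuck low
print(pass_unique_value(banks))   # True: the constant fill cannot see aliasing
print(pass_pseudo_random(banks))  # False: aliased cells hold the wrong value
```

The constant fill passes because every aliased cell was supposed to hold the same value anyway; the pseudo-random fill is regenerated from the same seed during verification, so any cell overwritten through the wrong address shows up.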

llopis

Quote from: gerald on 16:15, 15 July 18
Did you test the whole range or only one address? If you tested the whole range, did you use the same pattern for all locations?
My test does two passes:
- one that fills the banks with a bank-unique value (fast, but it will miss addressing errors).
- a second that uses a pseudo-random pattern, which makes sure that addressing errors will be detected.
Good point. My test naively wrote the same bit pattern to all addresses. I'll change it to write different ones depending on the address.

llopis

I improved my naive program to write random bytes 1KB at a time and compare against values in Bank 0. Surprisingly, it passes without detecting any errors.


(Yes, I know it's a dumb program to write in BASIC because it's so slow, but I didn't remember the ROM calls for printing strings).


Any idea what might be going wrong? Is the logic of my test wrong? Everything else seems to detect something wrong with Bank 1, so clearly I'm missing something. Well, doh, I just saw the incorrect data check! Re-running... <blush>


Edit: Yeah, it's failing right away. Yay!

llopis



That BASIC program was indeed showing that RAM on Bank 1 wasn't working properly. But I had to write to a bunch of memory like that in order to have the error come up. Just writing to one address and reading back from it wasn't enough. That always worked.


I did a few more experiments and I was able to reduce the problem to this:


When writing to an address, apart from the data being correctly written to that address, bit 3 of the data (i.e. mask 0x08) also gets written to the address with bit 6 toggled.


In other words, bit 3 of the data gets written to both [address1] and [address2], where the two addresses differ only in bit 6 (address2 = address1 XOR 0x40).
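The fault described above can be modelled to show why writing one address and reading it straight back never caught it, while filling a range first did. This Python sketch is an illustration only: `CoupledRam` assumes exactly the reported behaviour (bit 3 also copied to the address with bit 6 toggled), and `find_coupling` is one possible way to isolate such a fault (clear a region, write one byte, look for any other cell that changed), not necessarily the experiments run here.

```python
import random


class CoupledRam:
    """Model of one 64K bank with the reported fault (assumed behaviour).

    Every write stores `value` correctly, but when `faulty` is True, bit 3 of
    `value` is also copied into the cell whose address differs in bit 6.
    """
    DATA_MASK = 0x08  # bit 3 of the data
    ADDR_XOR = 0x40   # bit 6 of the address

    def __init__(self, faulty: bool = True):
        self.cells = bytearray(0x10000)
        self.faulty = faulty

    def write(self, address: int, value: int) -> None:
        self.cells[address] = value
        if self.faulty:
            twin = address ^ self.ADDR_XOR
            keep = self.cells[twin] & ~self.DATA_MASK & 0xFF
            self.cells[twin] = keep | (value & self.DATA_MASK)

    def read(self, address: int) -> int:
        return self.cells[address]


def write_then_read(ram, start: int, end: int) -> bool:
    """Check each cell immediately after writing it (misses the coupling)."""
    for addr in range(start, end):
        value = (addr ^ (addr >> 8)) & 0xFF
        ram.write(addr, value)
        if ram.read(addr) != value:
            return False
    return True


def fill_then_verify(ram, start: int, end: int, seed: int = 42) -> list[int]:
    """Fill the whole range first, then read back; return failing addresses."""
    rng = random.Random(seed)
    expected = {addr: rng.randrange(256) for addr in range(start, end)}
    for addr, value in expected.items():
        ram.write(addr, value)
    return [addr for addr, value in expected.items() if ram.read(addr) != value]


def find_coupling(ram, region: range, probe: int, value: int = 0xFF):
    """Clear a region, write one byte, and report every other cell that changed.

    Returns (address XOR probe, byte read) pairs, so (0x40, 0x08) means
    "bit 3 appeared at the address with bit 6 toggled".
    """
    for addr in region:
        ram.write(addr, 0x00)
    ram.write(probe, value)
    return [(addr ^ probe, ram.read(addr))
            for addr in region
            if addr != probe and ram.read(addr) != 0]


print(write_then_read(CoupledRam(), 0x0000, 0x4000))               # True
print(len(fill_then_verify(CoupledRam(), 0x0000, 0x4000)) > 0)     # True
print(find_coupling(CoupledRam(), range(0x0000, 0x4000), 0x1000))  # [(64, 8)]
```

In this model only addresses with bit 6 clear fail the fill-then-verify pass, because their twin is written later and overwrites bit 3; the twin itself is never disturbed afterwards.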


So it seems that IC122 is faulty and it's incorrectly writing to both locations. I replaced it (I'm getting good at that now) and... everything works perfectly now. All memory tests pass and 128K games load and play just fine.


So another happy ending. Thanks for all the help and suggestions!



gerald

Quote from: llopis on 18:05, 16 July 18
But I had to write to a bunch of memory like that in order to have the error come up. Just writing to one address and reading back from it wasn't enough. That always worked.
Most, if not all, memory tests fill the memory and then read it back  ;)
Writing then immediately reading a memory location will just tell you if that cell is OK, not the address decoding.

Nice that you got it sorted.

And remember : it's all about the journey, not the destination  :)
