NAND BBT corruption on MPC83xx

Matthew L. Creech mlcreech at gmail.com
Wed Jul 6 05:58:46 EST 2011


On Fri, Jun 17, 2011 at 5:34 PM, Scott Wood <scottwood at freescale.com> wrote:
>
> It seems that the generic code always passes -1 with PAGEPROG, and only
> provides the actual page address on SEQIN.
>
> I don't think the ECC readback is needed, and the fact that it looks like
> it has always been broken would seem to confirm that.  It's broken in
> other ways, too -- it assumes a particular ECC layout.  Let's get rid of it.
>
> As for the corruption, could it be degradation from repeated reads of that
> one page?
>

I modified nanddump to do repeated reads, and compare the data
obtained from the first iteration with that obtained later (to detect
bit-flips).  I tried 3 different variations:

- one which reads the first page (2k) of the last block
- one which reads the second page (2k) of the last block
- one which reads the entire last block (128k), just for comparison

As I understand it, read-disturb would primarily come into play when
the second page is read, since it's adjacent to the first page (please
correct me if I'm wrong there).  Anyway, all 3 of these tests were run
for at least 50 million read cycles, with no bit-flips detected.  So
I'm somewhat doubtful that this is the cause of the BBT corruption
I've been seeing.
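
For reference, the compare-against-first-read loop described above can be
sketched roughly as follows. This is not the actual nanddump modification,
only a minimal illustration; the function names (count_bit_flips, read_full,
repeated_read_test) are hypothetical, and it assumes the device can be read
with plain pread() on an already-open file descriptor. It only sees whatever
data the read path returns, so bit-flips corrected by ECC before the data
reaches userspace would not show up.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std= modes */

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the bits that differ between two buffers of equal length. */
static unsigned long count_bit_flips(const uint8_t *a, const uint8_t *b,
                                     size_t len)
{
    unsigned long flips = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = a[i] ^ b[i];

        while (diff) {
            flips += diff & 1u;
            diff >>= 1;
        }
    }
    return flips;
}

/* Read exactly len bytes at offset; 0 on success, -1 on error or EOF. */
static int read_full(int fd, uint8_t *buf, size_t len, off_t offset)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, buf + done, len - done, offset + (off_t)done);

        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/*
 * Read the same region `iterations` times in total. The first read is the
 * baseline; every later read is compared against it. Returns the total
 * number of differing bits, or -1 on allocation or I/O failure.
 */
static long repeated_read_test(int fd, off_t offset, size_t len,
                               unsigned long iterations)
{
    uint8_t *baseline = malloc(len);
    uint8_t *current = malloc(len);
    long total = -1;

    if (!baseline || !current)
        goto out;
    if (read_full(fd, baseline, len, offset) != 0)
        goto out;

    total = 0;
    for (unsigned long i = 1; i < iterations; i++) {
        if (read_full(fd, current, len, offset) != 0) {
            total = -1;
            goto out;
        }
        total += (long)count_bit_flips(baseline, current, len);
    }
out:
    free(baseline);
    free(current);
    return total;
}
```

With the 2k pages and 128k blocks mentioned above, the first-page test
would use len = 2048 at offset (device size - 128k), the second-page test
the same length 2048 bytes further in, and the whole-block test
len = 128k at the first offset.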

====

Separately, I set up 2 test devices to run while I was away last week.
One of them contained 2 patches:

- Mike Hench's patch, which eliminates the ECC readback block of code
  in fsl_elbc_nand.c
- Adam Thomson's patch, which initializes oob_poi correctly
  (http://lists.infradead.org/pipermail/linux-mtd/2011-June/036427.html)

When I returned, the device with these patches had seen no problems at
all and had no additional bad blocks.  The device without these patches
had more than 200 blocks newly marked as bad in the BBT over the course
of 10 days.  After a reboot, this latter device failed to boot, as
shown here:

http://mcreech.com/work/bbt-ecc-error4.txt

I'm currently running another test to determine which of the two patches
actually fixed this problem (which might take a few days), but it
seems like removing that block of code in fsl_elbc_nand.c is a good
idea.

-- 
Matthew L. Creech


More information about the Linuxppc-dev mailing list