Resending ... Re: [PATCH] [2.6] PPC64: log firmware errors during boot.

Thu Jul 22 07:30:41 EST 2004

----- Forwarded message from Mail Delivery System <Mailer-Daemon at bilge> -----
------ This is a copy of the message, including all the headers. ------

Return-path: <linas at bilge>
Received: from linas by bilge with local (Exim 3.36 #1 (Debian))
	id 1Bj0dp-0005gG-00; Fri, 09 Jul 2004 14:02:37 -0500
Date: Fri, 9 Jul 2004 14:02:37 -0500
To: Jake Moilanen <moilanen at austin.ibm.com>
Cc: paulus at samba.org, linuxppc64-dev at lists.linuxppc.org,
	linux-kernel at vger.kernel.org
Subject: Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
Message-ID: <20040709190237.GE17333 at bilge>
References: <20040629191046.Q21634 at forte.austin.ibm.com> <16610.39955.554139.858593 at cargo.ozlabs.ibm.com> <20040706084116.11ab7988.moilanen at austin.ibm.com> <20040708110337.N21634 at forte.austin.ibm.com> <20040708125545.41aae667.moilanen at austin.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040708125545.41aae667.moilanen at austin.ibm.com>
User-Agent: Mutt/1.5.6+20040523i
From: Linas Vepstas <linas at bilge>

On Thu, Jul 08, 2004 at 12:55:45PM -0500, Jake Moilanen was heard to remark:
> On Thu, 8 Jul 2004 11:03:37 -0500
> linas at austin.ibm.com wrote:
>
> > Actually, they don't seem to be queued at all; when I turned on
> > logging earlier, a whole pile of messages popped out that weren't
> > visible before.
>
> If you are seeing a different pile of messages, I would imagine the
> messages that popped out are not coming from event-scan then.  They
> might be last_error messages, which do not come in through event-scan.
> I can see them not being logged in early boot.

Yep.  They were due to EEH not being enabled on empty slots.
Apparently, they had been generated during boot for years,
but no one noticed them, because we had this logging turned
off.  So once burned, twice shy... if we got the messages earlier,
we'd be less likely to overlook the root problem ...

> One problem I can see is if we make an rtas call before the VM
> is up.  The kmalloc for last_error won't like that.

Ugh, yes, EEH is initialized very early, before the VM system is up.
I'll have to prepare a patch that checks malloc_sizes->cs_cachep
for NULL, and doesn't call kmalloc() if it is.  Is there a better way
to poll to find out if the VM is up?
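
Roughly what I have in mind, as a userspace sketch rather than the
actual patch.  Everything named here is made up for illustration:
allocator_ready stands in for the malloc_sizes->cs_cachep != NULL test,
calloc()/free() stand in for kmalloc()/kfree(), and ERR_BUF_SIZE,
get_err_buf() and put_err_buf() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer size for the last_error data. */
#define ERR_BUF_SIZE 1024

/*
 * Assumption: stand-in for "malloc_sizes->cs_cachep != NULL", i.e.
 * the allocator has been set up and it is safe to allocate.
 */
static bool allocator_ready = false;

/*
 * Return a zeroed buffer for the error log, or NULL if the allocator
 * is not up yet (or the allocation fails).  A NULL return means the
 * caller skips fetching last_error for this call.
 */
static char *get_err_buf(void)
{
	if (!allocator_ready)
		return NULL;	/* too early in boot: do not allocate */

	return calloc(1, ERR_BUF_SIZE);	/* stand-in for kmalloc() */
}

/* Release a buffer obtained from get_err_buf(); NULL is a no-op. */
static void put_err_buf(char *buf)
{
	free(buf);	/* stand-in for kfree() */
}
```

With this shape the early rtas calls just skip the last_error step, so
anything raised before the allocator comes up is still not logged.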

--linas

----- End forwarded message -----
