[PATCH] rpaphp broken in ameslab

Fri Jul 2 15:57:41 EST 2004

linas at austin.ibm.com writes:

> The detailed explanations are in bugzilla.  I'll crawl through the
> archives and pull these; but this makes me unhappy :(  I personally
> would be happier with a process that requires less make-work; these
> patches should have been picked up at the time they were authored,
> not post-factum.

I agree, but it is the person who knows the code who should decide (a)
whether the patch in bugzilla is what should go upstream and (b)
whether the patch has actually fixed the bug, and take the action to
push the patch upstream.  That is logically the job of whoever knows
the code best, and they should do it as soon as the patch is known to
have actually fixed the bug.

> > I want to see a notifier list exported by eeh.c as I proposed in a
> > previous email before that goes upstream.
>
> Its currently implemented as a work queue. Is that acceptable?
> To keep gregkh happy, I'll move the work-queue to
> drivers/pci/hotplug/rpaphp_eeh.c, will this work?

It's not the work queue that is the problem, it is that the EEH code
is taking a decision about what hotplug should do.  I am saying that
the EEH code should offer to provide notifications to any interested
code about slot isolation events.  The slot isolation event is a fact,
the request to do an unplug operation is policy.  Let's leave the
policy up to the rpaphp driver and/or userspace.

Specifically, I think we should leave the workqueue in eeh.c, replace
eeh_disable_slot with a notifier list, and move the check for ethernet
devices to rpaphp.c.  The eeh_register_disable_func call then changes
to notifier_chain_register(&eeh_slot_isolation_list, &my_notify_block)
or somesuch.

> Yes well, I and others were operating under the assumption that you
> were actually monitoring bugzilla, and pulling the patches & explanations
> from there, merging and sending them upstream.  Clearly, this did not
> occur.  The process broke down.  How will we do better next time?

I guess we need to have it clearly stated who is the designated person
for each area of the code.  Within arch/ppc64 and include/asm-ppc64, I
would like to have people who take responsibility for legacy iSeries,
RAS, DLPAR, EEH, PCI, NUMA, oprofile/perfctr, and Powermac support.
Stephen Rothwell has been pretty much looking after the iSeries stuff,
and Ben H is clearly the powermac maintainer.  So far, Anton seems to
be the expert on NUMA and oprofile/perfctr.  We have a whole team
doing DLPAR work and I look to people like John Rose to say what
should go upstream in that area.  It's been an informal process so far
which has worked in some areas but not in others.

> Its an inefficient, error-prone process.  I can generate more patches
> a day than I am able to track by email.  Ergo, patch-tracking is a
> bottleneck.

The discipline of sending patches with explanations is that it forces
you to write a description of the changes that will give someone who
doesn't know the history a reasonable idea of the motivation behind
them.  That turns out to be an important part of keeping the kernel
maintainable - the fact that you can look at a year-old change and get
a reasonable idea of what the patch author was trying to achieve is
invaluable.  That is far more important than having people churn out
patches at the maximum possible rate.  One of the big reasons why the
ameslab BK tree doesn't work as a source of patches to send upstream
is that many people haven't been putting in much in the way of
changeset descriptions.

Paul.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/