iommu_alloc failure and panic

Mark Haverkamp markh at osdl.org
Sat Jan 28 11:00:47 EST 2006


On Sat, 2006-01-28 at 10:48 +1100, Anton Blanchard wrote:
> Hi,
> 
> > I have been testing large numbers of storage devices.  I have 16000 scsi
> > LUNs configured.  (4000 fiberchannel LUNS seen 4 times).  They are
> > configured as 4000 multipath devices using device mapper.  I have 100 of
> > those devices configured as a logical volume using LVM2.  Once I start
> > bonnie++ using one of those logical volumes I start seeing iommu_alloc
> > failures and then a panic.  I don't have this issue when accessing the
> > individual devices or the individual multipath devices.  Only when
> > conglomerated into a logical volume.
> > 
> > Here is the syslog output:
> > 
> > Jan 26 14:56:41 linux kernel: iommu_alloc failed, tbl c00000000263c480 vaddr c0000000d7967000 npages 10
> 
> This stuff should be OK since the lpfc driver should handle the iommu
> filling up. Im guessing since you have so many LUNs you can get a whole
> lot of outstanding IO, enough to fill up the TCE table.
> 
> > DAR: C000000600711710
> 
> > NIP [C00000000000F7D0] .validate_sp+0x30/0xc0
> > LR [C00000000000FA2C] .show_stack+0xec/0x1d0
> > Call Trace:
> > [C0000000076DBBC0] [C00000000000FA18] .show_stack+0xd8/0x1d0 (unreliable)
> > [C0000000076DBC60] [C000000000433838] .schedule+0xd78/0xfb0
> > [C0000000076DBDE0] [C000000000079FC0] .worker_thread+0x1b0/0x1c0
> 
> This is interesting, an oops in validate_sp which is a pretty boring
> function. We have seen this in the past when you overflow your kernel
> stack. The bottom of the kernel stack contains the threadinfo struct and
> in that we have the task_cpu() data.
> 
> When we overflow the stack and overwrite the threadinfo we end up with 
> a crazy number for task_cpu() and then oops when doing 
> hardirq_ctx[task_cpu(p)]. Can you turn on DEBUG_STACKOVERFLOW and
> DEBUG_STACK_USAGE and see if you get any stack overflow warnings? 

It looks like they are already on:

CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y

> The
> second option will also allow you to do sysrq t and dump the most stack
> each process has used at that point.

The machine is frozen after the panic. 


Mark.


> 
> Anton
-- 
Mark Haverkamp <markh at osdl.org>




More information about the Linuxppc64-dev mailing list