[PATCH] mlx4_en: map entire pages to increase throughput

Rick Jones rick.jones2 at hp.com
Tue Jul 17 07:08:30 EST 2012


I was thinking more along the lines of an additional comparison,
explicitly using netperf TCP_RR or something like it, not just the
packets per second from a bulk transfer test.
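
Just to make concrete what I mean by a TCP_RR-style comparison: a strict
one-request-one-response loop, nothing pipelined.  A minimal sketch of that
pattern (made-up address, port and run time; it assumes a peer on the other
end echoing each byte back - this is not netperf's code):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        struct sockaddr_in sa = { .sin_family = AF_INET,
                                  .sin_port = htons(5001) }; /* made-up port */
        char req[1] = { 'x' }, rsp[1];
        long trans = 0;
        time_t end;

        inet_pton(AF_INET, "192.0.2.1", &sa.sin_addr);  /* made-up address */
        if (fd < 0 || connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
                return 1;
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

        end = time(NULL) + 30;                  /* 30 second run */
        while (time(NULL) < end) {
                if (write(fd, req, 1) != 1)     /* one request out... */
                        break;
                if (read(fd, rsp, 1) <= 0)      /* ...wait for the response */
                        break;
                trans++;
        }
        printf("%ld transactions, %.0f per second\n", trans, trans / 30.0);
        close(fd);
        return 0;
}

That gives a latency-bound transactions-per-second figure to put next to the
packets per second of a bulk transfer test.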

rick

> I used a uperf profile that is similar to TCP_RR. It writes, then reads
> some bytes. I kept the TCP_NODELAY flag.
>
> Without the patch, I saw the following:
>
> packet size	ops/s		Gb/s
> 1		337024		0.0027
> 90		276620		0.199
> 900		190455		1.37
> 4000		68863		2.20
> 9000		45638		3.29
> 60000		9409		4.52
>
> With the patch:
>
> packet size	ops/s		Gb/s
> 1		451738		0.0036
> 90		345682		0.248
> 900		272258		1.96
> 4000		127055		4.07
> 9000		106614		7.68
> 60000		30671		14.72
>

So, on the surface it looks like it did good things for PPS, though it 
would be nice to know what the CPU utilizations/service demands were as 
a sanity check - does uperf not have that sort of functionality?
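
For anyone wanting to do that sanity check by hand, the arithmetic I have in
mind is simply CPU time burned per transaction - something along these lines
(illustrative arithmetic only, not how uperf or netperf report it):

/* Service demand expressed as CPU microseconds per transaction.
 * util is overall CPU utilization (0..1), ncpus the CPU count,
 * secs the measurement interval.  Illustrative only. */
static double service_demand_usec_per_tran(double util, int ncpus,
                                           double secs, long transactions)
{
        return (util * ncpus * secs * 1e6) / transactions;
}

If ops/s went up but so did the CPU cost per transaction, the picture would
be a bit less rosy.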

I'm guessing there were several writes outstanding at a time - the 1-byte 
"packet size" (sic - that is payload, not packet, and without TCP_NODELAY 
not necessarily even a single payload per segment) suggests as much.  How 
many writes does the profile have outstanding before it does a read?  And 
does it take care to build up to that number of writes, to avoid batching 
during slow start even with TCP_NODELAY set?
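
To make that question concrete: a strict one-write-one-read loop behaves very
differently from one that queues a burst of writes before reading.  A
hypothetical burst variant (illustrative only, not uperf's code) would look
something like:

#include <time.h>
#include <unistd.h>

/* Issue 'burst' one-byte writes before draining the matching responses.
 * With burst > 1, several requests can end up sharing segments and
 * interrupts (especially while cwnd-limited, e.g. during slow start)
 * even with TCP_NODELAY set, which inflates ops/s relative to a strict
 * one-for-one exchange.  fd is an already-connected socket to a peer
 * that echoes each byte back. */
static long burst_rr_loop(int fd, int burst, int seconds)
{
        char req = 'x', rsp;
        long trans = 0;
        int i;
        time_t end = time(NULL) + seconds;

        while (time(NULL) < end) {
                for (i = 0; i < burst; i++)
                        if (write(fd, &req, 1) != 1)
                                return trans;
                for (i = 0; i < burst; i++) {
                        if (read(fd, &rsp, 1) <= 0)
                                return trans;
                        trans++;
                }
        }
        return trans;
}

And if the loop jumps straight to the full burst while the connection is
still in slow start, the stack will batch those writes into fewer, larger
segments regardless, which is why ramping the burst up gradually matters.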

rick jones

