Benjamin Herrenschmidt writes: > Do we have any indication that it performs better than the C one ? I would expect it to, given that the assembler one has two branches in the per-byte loop compared to 3 in the C version. Paul.