Today in Micronaut optimization: The code on the left is almost twice as fast as the code on the right… until it's compiled by C2, and then it's slower. It's amusing to watch the benchmark *slow down* during the warmup phase.
I don't do this level of optimization very often, but my guess is that the CPU really does not like a data dependency between loop iterations.
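
To make "data dependency between loop iterations" concrete, here is a minimal sketch. It is not the Micronaut code from the screenshot; the class and method names are made up for illustration. In `hashChained` every iteration has to wait for the previous `h`, so the multiplies form one serial chain. `hashUnrolled` computes the same value but puts only one multiply per two elements on that chain.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only, not the code from the post.
final class LoopDependencyDemo {

    // Loop-carried dependency: iteration i cannot start its multiply
    // until iteration i-1 has produced h.
    static int hashChained(byte[] data) {
        int h = 0;
        for (byte b : data) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    // Same result, shorter dependency chain. Because int arithmetic wraps,
    // 31 * (31 * h + a) + b == 961 * h + (31 * a + b), and the part in
    // parentheses does not depend on h, so it can be computed in parallel
    // with the previous iteration.
    static int hashUnrolled(byte[] data) {
        int h = 0;
        int i = 0;
        for (; i + 1 < data.length; i += 2) {
            int independent = 31 * (data[i] & 0xff) + (data[i + 1] & 0xff);
            h = 961 * h + independent;
        }
        if (i < data.length) {
            // Odd length: fold in the last element the plain way.
            h = 31 * h + (data[i] & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hashChained(data) == hashUnrolled(data));
    }
}
```

Whether the second form is actually faster depends on what the JIT does with each loop, which is the whole point of the post: the only way to know is to measure after warmup, for example with JMH.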