毁三观的 Java for 循环语句优化！

据说，这是一道谷歌的面试题。

就是有 3 个 for 循环，有的程序员会把 3 个一样的 for 循环合成一个来写，有的把一个 for 循环，扯成 3 个来写。比如下面的这道题。

//第一种情况
for(int i=0;i<100;i++){
    //opr 1
}
for(int i=0;i<100;i++){
    //opr 2
}
for(int i=0;i<100;i++){
    //opr 3
}
//第二种情况
for(int i=0;i<100;i++){
    //opr 1
    //opr 2
    //opr 3
}

那么从效率等方面来说，上述两种 Java for 循环有区别吗?

我相信大部分人都认同，第二种情况效率更高。但实际是这样吗？我们来实验一下！

先说结论：单个多操作的循环性能和多个单操作循环差不多。大多数情况前者会好一点点。不过可以忽略。

现在我们来看我们的实验。分三种情况。

第一种，循环里直接执行简单操作。

public static long oneLoop(int times) {
    long start = System.nanoTime();
    for (int i = 0, x = 0; i < times; i++) {
        x++; x++; x++;
    }
    long end = System.nanoTime();
    return end - start;
}
public static long threeLoop(int times) {
    long start = System.nanoTime();
    for (int i = 0, x = 0; i < times; i++) {
        x++;
    }
    for (int i = 0, x = 0; i < times; i++) {
        x++;
    }
    for (int i = 0, x = 0; i < times; i++) {
        x++;
    }
    long end = System.nanoTime();
    return end - start;
}

第二种，循环里调用简单方法。

private static int plusOne(int i) {
    return ++i;
}
public static long oneLoopCallMethod(int times) {
    long start = System.nanoTime();
    for (int i = 0, x = 0; i < times; i++) {
        x = plusOne(x);
        x = plusOne(x);
        x = plusOne(x);
    }
    long end = System.nanoTime();
    return end - start;
}
public static long threeLoopCallMethod(int times) {
    long start = System.nanoTime();
    for (int i = 0, x = 0; i < times; i++) {
        x = plusOne(x);
    }
    for (int i = 0, x = 0; i < times; i++) {
        x = plusOne(x);
    }
    for (int i = 0, x = 0; i < times; i++) {
        x = plusOne(x);
    }
    long end = System.nanoTime();
    return end - start;
}

第三种，循环里模拟比较复杂的调用。

private static final Random R = new Random();
private static final char[] LETTERS = "abcdefghijklmnopqrstuvwxyz".toCharArray();
// 方法1
private static String randomWord() {
    int length = R.nextInt(10);
    char[] chars = new char[length];
    for (int i = 0; i < length; i++) {
        chars[i] = LETTERS[R.nextInt(LETTERS.length)];
    }
    return new String(chars);
}
// 方法2
private static String randomUpperWord() {
    return randomWord().toUpperCase();
}
// 方法3
private static int wordLength() {
    return randomWord().length();
}
// 测试
public static long oneLoopThreeOperation(int times) {
    long start = System.nanoTime();
    for (int i = 0; i < times; i++) {
        randomWord();
        randomUpperWord();
        wordLength();
    }
    long end = System.nanoTime();
    return end - start;
}
// 对应测试拆分成3个for循环
public static long threeLoopThreeOperation(int times) {
    long start = System.nanoTime();
    for (int i = 0; i < times; i++) {
        randomWord();
    }
    for (int i = 0; i < times; i++) {
        randomUpperWord();
    }
    for (int i = 0; i < times; i++) {
        wordLength();
    }
    long end = System.nanoTime();
    return end - start;
}

测试框架很简单，单元测试内循环 10000 次，测总时间（纳秒）。然后每个单元测试重复100次，丢弃前 10 次预热结果，计平均时间。

long[] result = new long[6];
int loops= 100;
for (int i = 0; i < loops; i++) {
    if (i >= 10) { // 丢弃前10次预热
        result[0] += oneLoop(10000);
        result[1] += threeLoop(10000);
        result[2] += oneLoopCallMethod(10000);
        result[3] += threeLoopCallMethod(10000);
        result[4] += oneLoopThreeOperation(10000);
        result[5] += threeLoopThreeOperation(10000);
    }
}
// 打印以上时间/90

结果有点出乎意料。

简单自增操作，单个for比三个for快一点。
调用同一个方法，单个for比三个for慢了接近一倍。
接下来复杂场景下的结果恢复正常，单个循环还是要好一些。

为了搞清楚为什么第二种方法为什么用三个for循环反而快，用 -XX:+PrintCompilation 参数打印了 TIJ 的编译情况。

真实的原因，因为用一个循环，只会内联 plusOne() 方法。

for (int i = 0, x = 0; i < times; i++) {
    x = plusOne(x);
    x = plusOne(x);
    x = plusOne(x);
}

但如果是相同的三次循环，TIJ 有可能直接把整个循环编译了。后面两次就会节省时间。

for (int i = 0, x = 0; i < times; i++) {
    x = plusOne(x);
}

for (int i = 0, x = 0; i < times; i++) { // TIJ编译了整个循环
    x = plusOne(x);
}

for (int i = 0, x = 0; i < times; i++) {
    x = plusOne(x);
}

有网友在 github 上，对这个测试改成 C++ 扩大到跑10亿(1E9)次计数。我想这个测试以后可能会被各个语言给玩坏！

对于这个结论，有 CPU 设计团队的人员回复到。针对于 for 的性能，cpu 使用的是 branch predictor 中的 loop 模块。这个模块会根据历史信息，去预测 for 循环未来的行为。

并从 3 个方面给出了分析：

执行性能：一个for循环比三个for循环，cpu在第一次训练过程中，会少出现两次misp，但是，在以后再次遇到这个for语句的预测过程中两种写法是一样的。总结：一个for循环会节省几十个时钟周期，具体数字和cpu架构和芯片频率有关，大约5ns左右。
占用CPU资源：一个for循环会比三个for循环更少。这个资源只是预测中一个子模块的资源，并不是执行单元的资源。虽然，节省三分之一，但是，并不是很多。
Java代码可读性：代码的可读性和可维护性，对于软件工程师很重要的。以上三个for循环，所多占用的CPU资源和耽误的时间，对于整个程序来讲，实在微不足道。对于初学者，写出整洁的代码是十分重要的。

本文由创作，采用知识共享署名4.0 国际许可协议进行许可。本站文章除注明转载/出处外，均为本站原创或翻译，转载前请务必署名。最后编辑时间为: 2021/04/17 03:22