MoSal

joined 1 year ago
MoSal@lemm.ee 5 points 3 months ago

> This will not work on Unicode

Correct.

> Greek and Arabic have cases as well

Who told you that?

MoSal@lemm.ee 1 point 1 year ago (last edited 1 year ago)

Okay. I updated mold to v2.0.0, added "-Z", "time-passes" to the flags to get link times, and ran cargo with --timings to get CPU utilization graphs. I tested on two projects of mine (the one from yesterday is "X").

Each link time is the best of 3-4 runs, with only whitespace in main.rs changed between runs.
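Picking the best of several runs amounts to taking the minimum of the measured times. A minimal sketch, assuming the per-run link times have already been collected in seconds; the sample values and the `best_of` helper are hypothetical and not taken from the measurements in this comment:

```rust
/// Return the smallest (fastest) sample, or None if there are no samples.
fn best_of(samples: &[f64]) -> Option<f64> {
    samples.iter().copied().reduce(f64::min)
}

fn main() {
    // Hypothetical link times in seconds from four runs of one configuration.
    let runs = [8.412, 8.130, 8.297, 8.356];
    match best_of(&runs) {
        Some(best) => println!("best link time: {best:.3} s"),
        None => println!("no samples"),
    }
}
```

Taking the minimum rather than the mean filters out runs slowed down by unrelated system activity, which is presumably the reason for reporting the best run.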

| lto="fat" | lld (s) | mold (s) |
|---|---|---|
| Project X (cu=1) | 105.923 | 106.380 |
| Project X (cu=8) | 103.512 | 103.513 |
| Project S (cu=1) | 94.290 | 94.969 |
| Project S (cu=8) | 100.118 | 100.449 |

Observations (lto="fat"): As expected, there is little multi-core utilization. Using more than one codegen unit may even cause a regression in link time, as with Project S. The choice of linker between lld and mold appears to be of no significance.


| lto="thin" | lld (s) | mold (s) |
|---|---|---|
| Project X (cu=1) | 46.596 | 47.118 |
| Project X (cu=8) | 34.167 | 33.839 |
| Project X (cu=16) | 36.296 | 36.621 |
| Project S (cu=1) | 41.817 | 41.404 |
| Project S (cu=8) | 32.062 | 32.162 |
| Project S (cu=16) | 35.780 | 36.074 |

Observations (lto="thin"): Here, we see parallel LLVM_lto_optimize runs kicking in. I also tested with codegen-units=16. In that case, there were so many parallel LLVM_lto_optimize runs that the synchronization overhead caused a regression on a humble workstation powered by an Intel i7-7700K processor (only 4 physical, 8 logical cores). The cu=16 results will probably look different on a more powerful setup. Still, the choice of linker between lld and mold appears to be of no significance.


| lto=false | lld (s) | mold (s) |
|---|---|---|
| Project X (cu=1) | 29.160 | 29.231 |
| Project X (cu=8) | 8.130 | 8.293 |
| Project X (cu=16) | 7.076 | 6.953 |
| Project S (cu=1) | 11.996 | 12.069 |
| Project S (cu=8) | 4.418 | 4.462 |
| Project S (cu=16) | 4.357 | 4.455 |

Observations (lto=false): Here, with no heavy LLVM_lto_optimize runs involved, codegen-units becomes the dominant factor. Going above codegen-units=8 does not hurt link time. Still, the choice of linker between lld and mold appears to be of no significance.
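To put a number on how much codegen-units matters here, the speedup over cu=1 can be computed directly from the table. A small sketch using the Project X, lto=false, lld column, assuming the times are in seconds; the `speedup` helper is only an illustration:

```rust
/// Speedup of a run relative to a baseline run (values above 1.0 mean faster).
fn speedup(baseline_secs: f64, secs: f64) -> f64 {
    baseline_secs / secs
}

fn main() {
    // Project X, lto=false, lld column: (codegen-units, link time in seconds).
    let rows = [(1u32, 29.160), (8, 8.130), (16, 7.076)];
    let baseline = rows[0].1;
    for &(cu, secs) in rows.iter() {
        println!("cu={cu}: {:.2}x", speedup(baseline, secs));
    }
}
```

Most of the gain comes from the step from cu=1 to cu=8; going to cu=16 adds comparatively little on this machine.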


| lto="off" | lld (s) | mold (s) |
|---|---|---|
| Project X (cu=1) | 29.109 | 29.201 |
| Project X (cu=8) | 5.896 | 6.117 |
| Project X (cu=16) | 3.479 | 3.637 |
| Project S (cu=1) | 11.732 | 11.742 |
| Project S (cu=8) | 2.354 | 2.355 |
| Project S (cu=16) | 1.517 | 1.499 |

Observations (lto="off"): Same observations as lto=false. Still, the choice of linker between lld and mold appears to be of no significance.
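The "no significance" claim can be checked by expressing the mold time as a percentage difference from the lld time. A rough sketch over the Project X, cu=8 rows of the four tables, assuming the times are in seconds; `relative_diff_percent` is a hypothetical helper name:

```rust
/// Percentage by which mold differs from lld (positive means mold was slower).
fn relative_diff_percent(lld_secs: f64, mold_secs: f64) -> f64 {
    (mold_secs - lld_secs) / lld_secs * 100.0
}

fn main() {
    // Project X, cu=8: (lto setting, lld seconds, mold seconds).
    let rows = [
        ("fat", 103.512, 103.513),
        ("thin", 34.167, 33.839),
        ("false", 8.130, 8.293),
        ("off", 5.896, 6.117),
    ];
    for &(lto, lld, mold) in rows.iter() {
        println!("lto={lto}: {:+.2}%", relative_diff_percent(lld, mold));
    }
}
```

For these rows the difference stays within a few percent in either direction, which is small next to the effect of the lto and codegen-units settings.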


Debug builds link in under 0.4 seconds.

MoSal@lemm.ee 6 points 1 year ago

codegen-units=1, debug=true, varying lto

lto = "fat"

| Flags | Clean build time (m:ss) | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 2:31 | 90.8207MiB | 7.3374MiB |
| ["-Z", "gcc-ld=lld"] | 2:31 | 91.9731MiB | 7.3332MiB |
| linker = "clang" | 2:32 | 90.8207MiB | 7.3375MiB |
| linker = "clang"; fuse-ld="mold" | 2:31 | 92.1107MiB | 7.3334MiB |

lto = "thin"

| Flags | Clean build time (m:ss) | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:33 | 96.9630MiB | 8.1695MiB |
| ["-Z", "gcc-ld=lld"] | 1:32 | 98.3889MiB | 8.1777MiB |
| linker = "clang" | 1:33 | 96.9631MiB | 8.1695MiB |
| linker = "clang"; fuse-ld="mold" | 1:32 | 98.6903MiB | 8.1797MiB |

lto = false

| Flags | Clean build time (m:ss) | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:32 | 113.5656MiB | 8.0601MiB |
| ["-Z", "gcc-ld=lld"] | 1:30 | 115.1210MiB | 8.1122MiB |
| linker = "clang" | 1:32 | 113.5656MiB | 8.0602MiB |
| linker = "clang"; fuse-ld="mold" | 1:31 | 115.4679MiB | 8.0663MiB |

lto = "off"

| Flags | Clean build time (m:ss) | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:33 | 113.5666MiB | 8.0601MiB |
| ["-Z", "gcc-ld=lld"] | 1:31 | 115.1231MiB | 8.1122MiB |
| linker = "clang" | 1:32 | 113.5667MiB | 8.0602MiB |
| linker = "clang"; fuse-ld="mold" | 1:31 | 115.4697MiB | 8.0662MiB |

codegen-units=8, debug=true, varying lto

lto = "fat"

| Flags | Clean build time (m:ss) | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 2:21 | 104.9842MiB | 7.6304MiB |
| ["-Z", "gcc-ld=lld"] | 2:19 | 106.1436MiB | 7.6264MiB |
| linker = "clang" | 2:21 | 104.9882MiB | 7.6344MiB |
| linker = "clang"; fuse-ld="mold" | 2:19 | 106.2864MiB | 7.6325MiB |

lto = "thin"

| Flags | Clean build time (m:ss) | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:12 | 134.1112MiB | 9.0445MiB |
| ["-Z", "gcc-ld=lld"] | 1:09 | 136.1897MiB | 9.0660MiB |
| linker = "clang" | 1:12 | 134.1113MiB | 9.0446MiB |
| linker = "clang"; fuse-ld="mold" | 1:09 | 136.4466MiB | 9.0494MiB |

lto = false

| Flags | Clean build time (m:ss) | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:14 | 158.1049MiB | 9.0328MiB |
| ["-Z", "gcc-ld=lld"] | 1:11 | 159.9998MiB | 9.1129MiB |
| linker = "clang" | 1:14 | 158.1050MiB | 9.0328MiB |
| linker = "clang"; fuse-ld="mold" | 1:12 | 160.3123MiB | 9.0428MiB |

lto = "off"

| Flags | Clean build time (m:ss) | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 0:57 | 145.9463MiB | 9.4586MiB |
| ["-Z", "gcc-ld=lld"] | 0:54 | 148.6021MiB | 9.6001MiB |
| linker = "clang" | 0:57 | 145.9464MiB | 9.4587MiB |
| linker = "clang"; fuse-ld="mold" | 0:55 | 148.8842MiB | 9.4668MiB |

mold appears to be similar to lld, but not faster.

With the caveat that this is not a proper benchmark since:

  • I didn't measure link time alone.
  • I didn't bother running each case multiple times and picking the fastest run (since I perceived the differences to be insignificant).

As a side note, lto = false appears to be practically useless.
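To compare the clean build times numerically, the table values first have to be converted to seconds. A minimal sketch, assuming the times are written as minutes:seconds; the `parse_mss` helper is hypothetical, and the inputs are the codegen-units=8, default-linker rows:

```rust
/// Parse a "m:ss" string into whole seconds. Returns None if it is malformed.
fn parse_mss(s: &str) -> Option<u32> {
    let (minutes, seconds) = s.split_once(':')?;
    let minutes: u32 = minutes.parse().ok()?;
    let seconds: u32 = seconds.parse().ok()?;
    if seconds >= 60 {
        return None;
    }
    Some(minutes * 60 + seconds)
}

fn main() {
    // codegen-units=8, default linker: (lto setting, clean build time).
    let builds = [("thin", "1:12"), ("false", "1:14"), ("off", "0:57")];
    for &(lto, time) in builds.iter() {
        match parse_mss(time) {
            Some(secs) => println!("lto={lto}: {secs} s"),
            None => println!("lto={lto}: could not parse {time:?}"),
        }
    }
}
```

In these rows lto = false builds about as slowly as lto = "thin" while producing a larger pre-strip binary, which fits the side note above.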