What version are you using (fq -v)?
$ fq -v
0.15.1 (linux amd64 go1.24.5)
How was fq installed?
From Arch Linux repository:
What did you do?
I tried to find out the complexity of the addition operation for binary data in fq.
What result did you expect?
I made a little jq prototype myself that has binary data support. Here, adding binary data takes linear time:
$ hyperfine -M 2 -L n 10000,100000,1000000 "cargo run --release -p jaq -- -n 'add(limit({n}; repeat(0 | tobytes))) | empty'"
Benchmark 1: cargo run --release -p jaq -- -n 'add(limit(10000; repeat(0 | tobytes))) | empty'
Time (mean ± σ): 193.1 ms ± 155.1 ms [User: 65.7 ms, System: 25.0 ms]
Range (min … max): 83.4 ms … 302.8 ms 2 runs
Benchmark 2: cargo run --release -p jaq -- -n 'add(limit(100000; repeat(0 | tobytes))) | empty'
Time (mean ± σ): 173.6 ms ± 54.4 ms [User: 108.2 ms, System: 31.1 ms]
Range (min … max): 135.1 ms … 212.0 ms 2 runs
Benchmark 3: cargo run --release -p jaq -- -n 'add(limit(1000000; repeat(0 | tobytes))) | empty'
Time (mean ± σ): 661.6 ms ± 0.0 ms [User: 627.4 ms, System: 32.7 ms]
Range (min … max): 661.5 ms … 661.6 ms 2 runs
What did you see instead?
In fq, the complexity seems to be at least quadratic. This can be seen between the second and third benchmark: Multiplying the number of input elements by 10 multiplies the runtime by more than 40. Compared to my prototype, the last benchmark is also more than 100 times slower in fq.
$ hyperfine -M 2 -L n 10000,100000,1000000 "fq -n 'add(limit({n}; repeat(0 | tobytes))) | empty'"
Benchmark 1: fq -n 'add(limit(10000; repeat(0 | tobytes))) | empty'
Time (mean ± σ): 146.1 ms ± 0.9 ms [User: 298.1 ms, System: 19.9 ms]
Range (min … max): 145.4 ms … 146.7 ms 2 runs
Benchmark 2: fq -n 'add(limit(100000; repeat(0 | tobytes))) | empty'
Time (mean ± σ): 1.972 s ± 0.001 s [User: 7.597 s, System: 0.162 s]
Range (min … max): 1.971 s … 1.973 s 2 runs
Benchmark 3: fq -n 'add(limit(1000000; repeat(0 | tobytes))) | empty'
Time (mean ± σ): 86.166 s ± 0.195 s [User: 520.231 s, System: 2.294 s]
Range (min … max): 86.028 s … 86.304 s 2 runs
My guess is that when fq does l + r for binary data l, r, it creates a new l' = l in memory distinct from l, which results in quadratic runtime.
jq avoids this e.g. for string types, by "reusing" the l if the current position is the only place where l is referenced:
$ hyperfine -M 2 -L n 10000,100000,1000000 "jq -n 'add(limit({n}; repeat([0] | implode))) | empty'"
Benchmark 1: jq -n 'add(limit(10000; repeat([0] | implode))) | empty'
Time (mean ± σ): 10.1 ms ± 0.3 ms [User: 9.7 ms, System: 0.2 ms]
Range (min … max): 9.9 ms … 10.3 ms 2 runs
Benchmark 2: jq -n 'add(limit(100000; repeat([0] | implode))) | empty'
Time (mean ± σ): 83.2 ms ± 1.1 ms [User: 82.0 ms, System: 1.1 ms]
Range (min … max): 82.5 ms … 84.0 ms 2 runs
Benchmark 3: jq -n 'add(limit(1000000; repeat([0] | implode))) | empty'
Time (mean ± σ): 793.8 ms ± 3.3 ms [User: 786.3 ms, System: 4.2 ms]
Range (min … max): 791.5 ms … 796.2 ms 2 runs
Summary
jq -n 'add(limit(10000; repeat([0] | implode))) | empty' ran
8.22 ± 0.26 times faster than jq -n 'add(limit(100000; repeat([0] | implode))) | empty'
78.44 ± 2.27 times faster than jq -n 'add(limit(1000000; repeat([0] | implode))) | empty'
Here, we see linear runtime as well. The same works in fq, too. So perhaps this technique could be applied to binary data in fq as well.
What version are you using (
fq -v)?How was fq installed?
From Arch Linux repository:
What did you do?
I tried to find out the complexity of the addition operation for binary data in fq.
What result did you expect?
I made a little jq prototype myself that has binary data support. Here, adding binary data takes linear time:
What did you see instead?
In fq, the complexity seems to be at least quadratic. This can be seen between the second and third benchmark: Multiplying the number of input elements by 10 multiplies the runtime by more than 40. Compared to my prototype, the last benchmark is also more than 100 times slower in fq.
My guess is that when fq does
l + rfor binary datal, r, it creates a newl' = lin memory distinct froml, which results in quadratic runtime.jq avoids this e.g. for string types, by "reusing" the
lif the current position is the only place wherelis referenced:Here, we see linear runtime as well. The same works in fq, too. So perhaps this technique could be applied to binary data in fq as well.