Quadratic complexity for adding binary data

### What version are you using (`fq -v`)?

<pre>
$ fq -v
0.15.1 (linux amd64 go1.24.5)
</pre>

### How was fq installed?

From Arch Linux repository:

    pacman -S fq

### What did you do?

I tried to find out the complexity of the addition operation for binary data in fq.

### What result did you expect?

I made a little jq prototype myself that has binary data support. Here, adding binary data takes linear time:

~~~
$ hyperfine -M 2 -L n 10000,100000,1000000 "cargo run --release -p jaq -- -n 'add(limit({n}; repeat(0 | tobytes))) | empty'"
Benchmark 1: cargo run --release -p jaq -- -n 'add(limit(10000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     193.1 ms ± 155.1 ms    [User: 65.7 ms, System: 25.0 ms]
  Range (min … max):    83.4 ms … 302.8 ms    2 runs
 
Benchmark 2: cargo run --release -p jaq -- -n 'add(limit(100000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     173.6 ms ±  54.4 ms    [User: 108.2 ms, System: 31.1 ms]
  Range (min … max):   135.1 ms … 212.0 ms    2 runs
 
Benchmark 3: cargo run --release -p jaq -- -n 'add(limit(1000000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     661.6 ms ±   0.0 ms    [User: 627.4 ms, System: 32.7 ms]
  Range (min … max):   661.5 ms … 661.6 ms    2 runs
~~~

### What did you see instead?

In fq, the complexity seems to be at least quadratic. This can be seen between the second and third benchmark: Multiplying the number of input elements by 10 multiplies the runtime by more than 40. Compared to my prototype, the last benchmark is also more than 100 times slower in fq.

~~~
$ hyperfine -M 2 -L n 10000,100000,1000000 "fq -n 'add(limit({n}; repeat(0 | tobytes))) | empty'"
Benchmark 1: fq -n 'add(limit(10000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     146.1 ms ±   0.9 ms    [User: 298.1 ms, System: 19.9 ms]
  Range (min … max):   145.4 ms … 146.7 ms    2 runs
 
Benchmark 2: fq -n 'add(limit(100000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):      1.972 s ±  0.001 s    [User: 7.597 s, System: 0.162 s]
  Range (min … max):    1.971 s …  1.973 s    2 runs
 
Benchmark 3: fq -n 'add(limit(1000000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     86.166 s ±  0.195 s    [User: 520.231 s, System: 2.294 s]
  Range (min … max):   86.028 s … 86.304 s    2 runs
~~~

My guess is that when fq does `l + r` for binary data `l, r`, it creates a new `l' = l` in memory distinct from `l`, which results in quadratic runtime.
jq avoids this e.g. for string types, by "reusing" the `l` if the current position is the only place where `l` is referenced:

~~~
$ hyperfine -M 2 -L n 10000,100000,1000000 "jq -n 'add(limit({n}; repeat([0] | implode))) | empty'"
Benchmark 1: jq -n 'add(limit(10000; repeat([0] | implode))) | empty'
  Time (mean ± σ):      10.1 ms ±   0.3 ms    [User: 9.7 ms, System: 0.2 ms]
  Range (min … max):     9.9 ms …  10.3 ms    2 runs
 
Benchmark 2: jq -n 'add(limit(100000; repeat([0] | implode))) | empty'
  Time (mean ± σ):      83.2 ms ±   1.1 ms    [User: 82.0 ms, System: 1.1 ms]
  Range (min … max):    82.5 ms …  84.0 ms    2 runs
 
Benchmark 3: jq -n 'add(limit(1000000; repeat([0] | implode))) | empty'
  Time (mean ± σ):     793.8 ms ±   3.3 ms    [User: 786.3 ms, System: 4.2 ms]
  Range (min … max):   791.5 ms … 796.2 ms    2 runs
 
Summary
  jq -n 'add(limit(10000; repeat([0] | implode))) | empty' ran
    8.22 ± 0.26 times faster than jq -n 'add(limit(100000; repeat([0] | implode))) | empty'
   78.44 ± 2.27 times faster than jq -n 'add(limit(1000000; repeat([0] | implode))) | empty'
~~~

Here, we see linear runtime as well. The same works in fq, too. So perhaps this technique could be applied to binary data in fq as well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quadratic complexity for adding binary data #1168

What version are you using (`fq -v`)?

How was fq installed?

What did you do?

What result did you expect?

What did you see instead?

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Quadratic complexity for adding binary data #1168

Description

What version are you using (fq -v)?

How was fq installed?

What did you do?

What result did you expect?

What did you see instead?

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

What version are you using (`fq -v`)?