Skip to content

Quadratic complexity for adding binary data #1168

@01mf02

Description

@01mf02

What version are you using (fq -v)?

$ fq -v
0.15.1 (linux amd64 go1.24.5)

How was fq installed?

From Arch Linux repository:

pacman -S fq

What did you do?

I tried to find out the complexity of the addition operation for binary data in fq.

What result did you expect?

I made a little jq prototype myself that has binary data support. Here, adding binary data takes linear time:

$ hyperfine -M 2 -L n 10000,100000,1000000 "cargo run --release -p jaq -- -n 'add(limit({n}; repeat(0 | tobytes))) | empty'"
Benchmark 1: cargo run --release -p jaq -- -n 'add(limit(10000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     193.1 ms ± 155.1 ms    [User: 65.7 ms, System: 25.0 ms]
  Range (min … max):    83.4 ms … 302.8 ms    2 runs
 
Benchmark 2: cargo run --release -p jaq -- -n 'add(limit(100000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     173.6 ms ±  54.4 ms    [User: 108.2 ms, System: 31.1 ms]
  Range (min … max):   135.1 ms … 212.0 ms    2 runs
 
Benchmark 3: cargo run --release -p jaq -- -n 'add(limit(1000000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     661.6 ms ±   0.0 ms    [User: 627.4 ms, System: 32.7 ms]
  Range (min … max):   661.5 ms … 661.6 ms    2 runs

What did you see instead?

In fq, the complexity seems to be at least quadratic. This can be seen between the second and third benchmark: Multiplying the number of input elements by 10 multiplies the runtime by more than 40. Compared to my prototype, the last benchmark is also more than 100 times slower in fq.

$ hyperfine -M 2 -L n 10000,100000,1000000 "fq -n 'add(limit({n}; repeat(0 | tobytes))) | empty'"
Benchmark 1: fq -n 'add(limit(10000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     146.1 ms ±   0.9 ms    [User: 298.1 ms, System: 19.9 ms]
  Range (min … max):   145.4 ms … 146.7 ms    2 runs
 
Benchmark 2: fq -n 'add(limit(100000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):      1.972 s ±  0.001 s    [User: 7.597 s, System: 0.162 s]
  Range (min … max):    1.971 s …  1.973 s    2 runs
 
Benchmark 3: fq -n 'add(limit(1000000; repeat(0 | tobytes))) | empty'
  Time (mean ± σ):     86.166 s ±  0.195 s    [User: 520.231 s, System: 2.294 s]
  Range (min … max):   86.028 s … 86.304 s    2 runs

My guess is that when fq does l + r for binary data l, r, it creates a new l' = l in memory distinct from l, which results in quadratic runtime.
jq avoids this e.g. for string types, by "reusing" the l if the current position is the only place where l is referenced:

$ hyperfine -M 2 -L n 10000,100000,1000000 "jq -n 'add(limit({n}; repeat([0] | implode))) | empty'"
Benchmark 1: jq -n 'add(limit(10000; repeat([0] | implode))) | empty'
  Time (mean ± σ):      10.1 ms ±   0.3 ms    [User: 9.7 ms, System: 0.2 ms]
  Range (min … max):     9.9 ms …  10.3 ms    2 runs
 
Benchmark 2: jq -n 'add(limit(100000; repeat([0] | implode))) | empty'
  Time (mean ± σ):      83.2 ms ±   1.1 ms    [User: 82.0 ms, System: 1.1 ms]
  Range (min … max):    82.5 ms …  84.0 ms    2 runs
 
Benchmark 3: jq -n 'add(limit(1000000; repeat([0] | implode))) | empty'
  Time (mean ± σ):     793.8 ms ±   3.3 ms    [User: 786.3 ms, System: 4.2 ms]
  Range (min … max):   791.5 ms … 796.2 ms    2 runs
 
Summary
  jq -n 'add(limit(10000; repeat([0] | implode))) | empty' ran
    8.22 ± 0.26 times faster than jq -n 'add(limit(100000; repeat([0] | implode))) | empty'
   78.44 ± 2.27 times faster than jq -n 'add(limit(1000000; repeat([0] | implode))) | empty'

Here, we see linear runtime as well. The same works in fq, too. So perhaps this technique could be applied to binary data in fq as well.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions