
Benchmark Workflows

SecureVault includes two benchmark workflows. Their reports live under secure-vault/benchmark and their executable scripts under secure-vault/scripts/benchmark.

Use them for different questions:

  • Use the retrieval benchmark when you want to show search responsiveness after documents are already indexed.
  • Use the pipeline benchmark when you want to show end-to-end semantic indexing and ranking quality with the live Google embedding path.

What is in the repository

  • Evaluator-facing reports and methodology notes live in secure-vault/benchmark.
  • Executable benchmark scripts live in secure-vault/scripts/benchmark.
  • retrieval/ contains the latency benchmark entrypoint and report builders.
  • pipeline/ contains the end-to-end accuracy benchmark entrypoint and report builders.
  • shared/runtime.ts loads .env.local or .env from secure-vault/ and checks MariaDB vector support before a run starts.
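
The env loading step can be pictured with a small sketch. It is not the contents of shared/runtime.ts: the function names parseEnvFile and resolveEnv are hypothetical, and the precedence shown (shell exports first, then .env.local, then .env) is an assumption based on the prerequisites described on this page.

```typescript
// Hypothetical sketch; not the actual contents of shared/runtime.ts.

// Parse KEY=VALUE lines, skipping blank lines and # comments.
function parseEnvFile(text: string): Record<string, string> {
  const parsed: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" on the line, or no key in front of it
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    const first = value.charAt(0);
    const quoted =
      value.length >= 2 &&
      (first === '"' || first === "'") &&
      value.charAt(value.length - 1) === first;
    if (quoted) value = value.slice(1, -1);
    parsed[key] = value;
  }
  return parsed;
}

// Assumed precedence: shell exports > .env.local > .env.
// Pass null for a file that does not exist.
function resolveEnv(
  shell: Record<string, string | undefined>,
  envLocalText: string | null,
  envText: string | null
): Record<string, string> {
  const fileText =
    envLocalText !== null ? envLocalText : envText !== null ? envText : "";
  const merged = parseEnvFile(fileText);
  for (const key of Object.keys(shell)) {
    const value = shell[key];
    if (value !== undefined) merged[key] = value; // exported values win
  }
  return merged;
}
```

Under these assumptions, a DATABASE_HOST exported in the shell overrides the same key in .env.local, and .env is only read when .env.local is absent.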

Which benchmark to run

| Benchmark | Command | Best for | Includes live embeddings |
| --- | --- | --- | --- |
| Retrieval latency | npm run benchmark:semantic | Showing semantic and hybrid search response time after indexing | No |
| Pipeline accuracy | npm run benchmark:semantic:pipeline | Showing indexing behavior and retrieval quality end to end | Yes |

Latest benchmark results

The latest checked-in reports were generated on 2026-04-20.

Retrieval latency report:

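| Benchmark | Samples | Avg | P50 | P95 | P99 | Max | Avg results |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Semantic retrieval only | 15 | 1015.31 ms | 1005.73 ms | 1104.04 ms | 1104.04 ms | 1104.04 ms | 10.00 |
| Hybrid retrieval | 15 | 1013.42 ms | 1016.98 ms | 1063.83 ms | 1063.83 ms | 1063.83 ms | 10.00 |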

Pipeline accuracy report:

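| Suite | Benchmark | Top-1 Accuracy | Top-3 Recall | MRR | Avg query time | Avg indexing time |
| --- | --- | --- | --- | --- | --- | --- |
| Controlled | Semantic | 100.0% | 100.0% | 1.000 | 456.50 ms | 5691.43 ms |
| Controlled | Hybrid | 100.0% | 100.0% | 1.000 | 781.83 ms | 5691.43 ms |
| Stress | Semantic | 50.0% | 100.0% | 0.750 | 749.99 ms | 6800.25 ms |
| Stress | Hybrid | 50.0% | 100.0% | 0.750 | 562.53 ms | 6800.25 ms |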

Use these numbers as the current evaluator-facing snapshot. Re-run the benchmark scripts before demos when hardware, MariaDB version, embedding settings, or dataset size changes.

Prerequisites

Run both commands from secure-vault/.

Both benchmarks need:

  • MariaDB running and reachable through DATABASE_HOST, DATABASE_PORT, DATABASE_USER, DATABASE_PASSWORD, and DATABASE_NAME
  • MariaDB vector support available for VEC_FromText(...) and vec_distance_cosine(...)
  • a local env file in secure-vault/.env.local or secure-vault/.env if the values are not already exported in your shell
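
A minimal sketch of checking these prerequisites before a run, assuming a query function is injected so the example does not depend on any particular MariaDB client. The names missingDatabaseVars, hasVectorSupport, and QueryFn are hypothetical; the probe statement uses the two SQL functions named above.

```typescript
// Hypothetical pre-run check; the real check in shared/runtime.ts may differ.
type QueryFn = (sql: string) => Promise<unknown>;

const REQUIRED_DATABASE_VARS = [
  "DATABASE_HOST",
  "DATABASE_PORT",
  "DATABASE_USER",
  "DATABASE_PASSWORD",
  "DATABASE_NAME",
];

// Returns the names of connection variables that are unset.
// This sketch also treats empty strings as missing.
function missingDatabaseVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_DATABASE_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value === "";
  });
}

// One statement that exercises both vector functions the benchmarks need.
const VECTOR_PROBE_SQL =
  "SELECT VEC_DISTANCE_COSINE(VEC_FromText('[1,0]'), VEC_FromText('[0,1]')) AS distance";

// Resolves to true when the server accepts the probe and false when the
// query is rejected, for example because the functions do not exist.
async function hasVectorSupport(query: QueryFn): Promise<boolean> {
  try {
    await query(VECTOR_PROBE_SQL);
    return true;
  } catch {
    return false;
  }
}
```

Injecting the query function keeps the check easy to test with a stub and leaves the choice of database client to the caller.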

The pipeline benchmark also needs:

  • SEMANTIC_INDEXING_ENABLED=true
  • SEMANTIC_INDEXING_PROVIDER=google
  • GEMINI_API_KEY set

Recommended local configuration for the pipeline benchmark:

```env
SEMANTIC_INDEXING_ENABLED=true
SEMANTIC_INDEXING_EXECUTION_MODE=inline
SEMANTIC_INDEXING_PROVIDER=google
GEMINI_API_KEY=<your-key>
```

inline is the simpler local mode. If you intentionally switch to queued, the semantic config also requires Redis to be configured and available.
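
The pipeline requirements above can be expressed as a small validation sketch. The function name validatePipelineEnv is hypothetical, and REDIS_URL is an assumed variable name used only for illustration; check the semantic config for the Redis settings it actually reads.

```typescript
// Hypothetical validation sketch; returns a list of problems, empty when
// the pipeline benchmark prerequisites described above are met.
function validatePipelineEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  if (env["SEMANTIC_INDEXING_ENABLED"] !== "true") {
    problems.push("SEMANTIC_INDEXING_ENABLED must be true");
  }
  if (env["SEMANTIC_INDEXING_PROVIDER"] !== "google") {
    problems.push("SEMANTIC_INDEXING_PROVIDER must be google");
  }
  if (!env["GEMINI_API_KEY"]) {
    problems.push("GEMINI_API_KEY must be set");
  }
  // Only the queued mode needs Redis. REDIS_URL is an assumed name.
  if (env["SEMANTIC_INDEXING_EXECUTION_MODE"] === "queued" && !env["REDIS_URL"]) {
    problems.push("queued mode requires Redis to be configured");
  }
  return problems;
}
```

Collecting every problem in one pass, rather than stopping at the first, lets a single failed run report all missing settings.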

Retrieval benchmark

Use this benchmark for evaluator-facing latency numbers after indexing is already complete.

Command:

```powershell
cd secure-vault
npm run benchmark:semantic
```

What it does:

  • checks that MariaDB vector functions are available
  • seeds a synthetic benchmark user, files, embedding jobs, and embedding chunks
  • runs semantic-only and hybrid retrieval through the real application search path
  • writes markdown and JSON reports
  • deletes the seeded benchmark data in a cleanup step
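
The sequence above can be sketched as a small orchestration function. The step names are hypothetical and the real entrypoint under retrieval/ may be organized differently. This page does not state whether the script cleans up after a failed run; the sketch shows one way to guarantee it with try/finally.

```typescript
// Hypothetical step interface; each step stands in for one bullet above.
interface RetrievalBenchmarkSteps {
  checkVectorSupport(): Promise<void>;
  seed(): Promise<void>;
  runQueries(): Promise<number[]>; // per-query latencies in milliseconds
  writeReports(latenciesMs: number[]): Promise<void>;
  cleanup(): Promise<void>;
}

async function runRetrievalBenchmark(
  steps: RetrievalBenchmarkSteps
): Promise<number[]> {
  // Fail fast before any data is written.
  await steps.checkVectorSupport();
  try {
    await steps.seed();
    const latenciesMs = await steps.runQueries();
    await steps.writeReports(latenciesMs);
    return latenciesMs;
  } finally {
    // Runs whether the steps above succeed or throw, so seeded rows do not
    // linger. This assumes cleanup is safe to call after a partial seed.
    await steps.cleanup();
  }
}
```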

Useful flags:

```powershell
npm run benchmark:semantic -- --themes 6 --files-per-theme 500 --chunks-per-file 3 --queries-per-theme 5
```

Available options:

  • --themes
  • --files-per-theme
  • --chunks-per-file
  • --queries-per-theme
  • --warmup-runs
  • --query-top-k
  • --file-batch-size
  • --chunk-batch-size
  • --output-dir
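
To estimate how much data a flag combination seeds, the sizing flags can be multiplied out. This sketch assumes the flags combine the way their names suggest (files per theme, chunks per file, queries per theme); the function name datasetSize is hypothetical.

```typescript
interface RetrievalSizingFlags {
  themes: number;
  filesPerTheme: number;
  chunksPerFile: number;
  queriesPerTheme: number;
}

// Assumes each "per" flag multiplies with its parent count.
function datasetSize(flags: RetrievalSizingFlags) {
  const files = flags.themes * flags.filesPerTheme;
  return {
    files,
    chunks: files * flags.chunksPerFile,
    queries: flags.themes * flags.queriesPerTheme,
  };
}

// The example command above: 6 themes, 500 files each, 3 chunks per file.
const exampleSize = datasetSize({
  themes: 6,
  filesPerTheme: 500,
  chunksPerFile: 3,
  queriesPerTheme: 5,
});
// exampleSize is { files: 3000, chunks: 9000, queries: 30 }
```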

Default output files:

Read the generated report as a responsiveness benchmark, not an accuracy benchmark. It excludes live embedding latency by design.

Pipeline benchmark

Use this benchmark when you need evidence that SecureVault can index benchmark documents end to end and still retrieve the correct file.

Command:

```powershell
cd secure-vault
npm run benchmark:semantic:pipeline
```

What it does:

  • checks MariaDB vector support
  • validates that semantic indexing is enabled with the Google provider
  • generates temporary benchmark PDFs
  • chunks and embeds them through the real semantic pipeline
  • stores vectors in MariaDB
  • runs semantic-only and hybrid retrieval for benchmark queries
  • writes markdown and JSON reports

Useful flags:

```powershell
npm run benchmark:semantic:pipeline -- --suite stress --themes 4 --files-per-theme 3
```

Available options:

  • --themes
  • --files-per-theme
  • --suite controlled|stress|both
  • --output-dir

Default output files:

The benchmark runs two suites by default:

  • controlled for clean, direct phrasing and easier topical separation
  • stress for paraphrased queries and confusable same-theme documents
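
A minimal sketch of how the --suite value could map to the suites that run. The function name resolveSuites is hypothetical, and rejecting unknown values is an assumption about how the script behaves.

```typescript
type Suite = "controlled" | "stress";

// Maps the --suite value to the suites to run. Omitting the flag is
// treated like "both", matching the default described above.
function resolveSuites(flag?: string): Suite[] {
  switch (flag === undefined ? "both" : flag) {
    case "controlled":
      return ["controlled"];
    case "stress":
      return ["stress"];
    case "both":
      return ["controlled", "stress"];
    default:
      throw new Error("Unknown --suite value: " + flag);
  }
}
```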

Use the pipeline report for ranking quality discussions. Pair it with the retrieval benchmark if you also want to show search speed.

Reading the reports

Use the retrieval report to answer "How fast is search after indexing?"

  • focus on average latency and P95
  • compare semantic-only and hybrid retrieval at the same dataset size
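
When reading the percentile columns, keep the sample count in mind. The sketch below uses the nearest-rank percentile definition, which is an assumption: the report builders may compute percentiles differently. Under that definition, a run with 15 samples reports the same value for P95, P99, and Max, which is the pattern in the latest retrieval report.

```typescript
// Nearest-rank percentile over samples sorted in ascending order.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sortedAsc.length); // 1-based rank
  return sortedAsc[Math.max(rank, 1) - 1];
}

function summarizeLatencies(latenciesMs: number[]) {
  const sorted = latenciesMs.slice().sort((a, b) => a - b);
  const total = sorted.reduce((sum, value) => sum + value, 0);
  return {
    samples: sorted.length,
    avg: total / sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    max: sorted[sorted.length - 1],
  };
}
```

With 15 samples, the P95 rank is ceil(0.95 × 15) = 15 and the P99 rank is ceil(0.99 × 15) = 15, so both land on the slowest query. Raise the sample count if you need the tail percentiles to differ from the maximum.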

Use the pipeline report to answer "How well does the system retrieve the right file?"

  • focus on Top-1 Accuracy, Top-3 Recall, and MRR together
  • treat controlled as the cleaner correctness benchmark
  • treat stress as the more realistic retrieval benchmark
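
The three ranking metrics can be computed from the rank at which the expected file appears for each query. This is a sketch of the standard definitions, assuming one expected file per query; it is not taken from the pipeline report builders, and the function name rankingMetrics is hypothetical.

```typescript
// ranks: 1-based position of the expected file for each query,
// or null when the expected file is not in the results at all.
function rankingMetrics(ranks: Array<number | null>) {
  if (ranks.length === 0) throw new Error("no queries");
  let top1 = 0;
  let top3 = 0;
  let reciprocalSum = 0;
  for (const rank of ranks) {
    if (rank === null) continue; // a miss contributes zero to every metric
    if (rank === 1) top1 += 1;
    if (rank <= 3) top3 += 1;
    reciprocalSum += 1 / rank;
  }
  return {
    top1Accuracy: top1 / ranks.length,
    top3Recall: top3 / ranks.length,
    mrr: reciprocalSum / ranks.length,
  };
}

// Half of the queries rank the expected file first, half rank it second.
const exampleMetrics = rankingMetrics([1, 2, 1, 2]);
// exampleMetrics is { top1Accuracy: 0.5, top3Recall: 1, mrr: 0.75 }
```

This is one rank pattern that produces 50.0% Top-1 Accuracy, 100.0% Top-3 Recall, and an MRR of 0.750 together. It shows why the three metrics are best read as a set: the expected file is always near the top even when it is not always first.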
