Benchmark Workflows
SecureVault includes two benchmark workflows under secure-vault/benchmark and secure-vault/scripts/benchmark.
Use them for different questions:
- Use the retrieval benchmark when you want to show search responsiveness after documents are already indexed.
- Use the pipeline benchmark when you want to show end-to-end semantic indexing and ranking quality with the live Google embedding path.
What is in the repository
- Evaluator-facing reports and methodology notes live in secure-vault/benchmark.
- Executable benchmark scripts live in secure-vault/scripts/benchmark.
  - retrieval/ contains the latency benchmark entrypoint and report builders.
  - pipeline/ contains the end-to-end accuracy benchmark entrypoint and report builders.
  - shared/runtime.ts loads .env.local or .env from secure-vault/ and checks MariaDB vector support before a run starts.
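The env handling described above can be sketched as follows. This is a minimal illustration, assuming plain KEY=VALUE files and assuming that values already exported in the shell take precedence over file values. The names parseEnvText and mergeEnv are hypothetical; the actual shared/runtime.ts may use a library such as dotenv and may differ in detail.

```typescript
// Hypothetical helper: parse KEY=VALUE lines from the text of an env file.
// Blank lines, comment lines starting with "#", and lines without "=" are skipped.
function parseEnvText(text: string): Record<string, string> {
  const values: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    const last = value.charAt(value.length - 1);
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1);
    }
    values[key] = value;
  }
  return values;
}

// Assumed precedence: a value exported in the shell wins over the file value.
function mergeEnv(
  shellEnv: Record<string, string | undefined>,
  fileEnv: Record<string, string>
): Record<string, string> {
  const merged: Record<string, string> = { ...fileEnv };
  for (const key of Object.keys(shellEnv)) {
    const value = shellEnv[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

With this sketch, a DATABASE_HOST exported in the shell would override the one in .env.local, while keys that appear only in the file are still picked up.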
Which benchmark to run
| Benchmark | Command | Best for | Includes live embeddings |
|---|---|---|---|
| Retrieval latency | npm run benchmark:semantic | Showing semantic and hybrid search response time after indexing | No |
| Pipeline accuracy | npm run benchmark:semantic:pipeline | Showing indexing behavior and retrieval quality end to end | Yes |
Latest benchmark results
The latest checked-in reports were generated on 2026-04-20.
Retrieval latency report:
- Source report: secure-vault/benchmark/semantic-search-benchmark-latest.md
- Dataset: 1,000 files, 3,000 embedding chunks, 15 query cases
- Configuration: 5 themes, 200 files per theme, 3 chunks per file, result limit of 10
| Benchmark | Samples | Avg | P50 | P95 | P99 | Max | Avg results |
|---|---|---|---|---|---|---|---|
| Semantic retrieval only | 15 | 1015.31 ms | 1005.73 ms | 1104.04 ms | 1104.04 ms | 1104.04 ms | 10.00 |
| Hybrid retrieval | 15 | 1013.42 ms | 1016.98 ms | 1063.83 ms | 1063.83 ms | 1063.83 ms | 10.00 |
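The P95, P99, and Max columns are identical in both rows. This document does not state how the report computes percentiles, but under the common nearest-rank definition, 15 samples would place both P95 and P99 on the largest sample. The sketch below illustrates that definition; the function name percentile is hypothetical and the real report builder may use a different method.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. This method is an assumption,
// not taken from the benchmark scripts.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // 1-based rank into the sorted samples, clamped to at least 1.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

With 15 samples, the rank for P95 is ceil(14.25) = 15 and the rank for P99 is ceil(14.85) = 15, so both equal the maximum. Tail percentiles only start to separate from Max at larger sample counts, which is worth keeping in mind when quoting P95 from a 15-query run.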
Pipeline accuracy report:
- Source report: secure-vault/benchmark/semantic-pipeline-accuracy-latest.md
- Configuration: 4 themes, 3 files per theme, controlled and stress suites, Google embeddings
- Controlled dataset: 12 indexed PDFs and 12 evaluated queries
- Stress dataset: 36 indexed PDFs and 12 evaluated queries
| Suite | Benchmark | Top-1 Accuracy | Top-3 Recall | MRR | Avg query time | Avg indexing time |
|---|---|---|---|---|---|---|
| Controlled | Semantic | 100.0% | 100.0% | 1.000 | 456.50 ms | 5691.43 ms |
| Controlled | Hybrid | 100.0% | 100.0% | 1.000 | 781.83 ms | 5691.43 ms |
| Stress | Semantic | 50.0% | 100.0% | 0.750 | 749.99 ms | 6800.25 ms |
| Stress | Hybrid | 50.0% | 100.0% | 0.750 | 562.53 ms | 6800.25 ms |
Use these numbers as the current evaluator-facing snapshot. Re-run the benchmark scripts before demos when hardware, MariaDB version, embedding settings, or dataset size changes.
Prerequisites
Run both commands from secure-vault/.
Both benchmarks need:
- MariaDB running and reachable through DATABASE_HOST, DATABASE_PORT, DATABASE_USER, DATABASE_PASSWORD, and DATABASE_NAME
- MariaDB vector support available for VEC_FromText(...) and vec_distance_cosine(...)
- a local env file in secure-vault/.env.local or secure-vault/.env if the values are not already exported in your shell
The pipeline benchmark also needs:
- SEMANTIC_INDEXING_ENABLED=true
- SEMANTIC_INDEXING_PROVIDER=google
- GEMINI_API_KEY set
Recommended local path for the pipeline benchmark:
SEMANTIC_INDEXING_ENABLED=true
SEMANTIC_INDEXING_EXECUTION_MODE=inline
SEMANTIC_INDEXING_PROVIDER=google
GEMINI_API_KEY=<your-key>
inline is the simpler local mode. If you intentionally switch to queued, the semantic config also requires Redis to be configured and available.
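Before starting a pipeline run, it can help to check these prerequisites in one place. The following is a minimal sketch, not the validation the benchmark scripts actually perform. The function name checkPipelineEnv and the redisAvailable parameter are hypothetical, and treating empty values as missing is an assumption.

```typescript
type Env = Record<string, string | undefined>;

const DATABASE_KEYS = [
  "DATABASE_HOST",
  "DATABASE_PORT",
  "DATABASE_USER",
  "DATABASE_PASSWORD",
  "DATABASE_NAME",
];

// Returns a list of human-readable problems; an empty list means the
// environment satisfies the prerequisites listed in this section.
function checkPipelineEnv(env: Env, redisAvailable: boolean): string[] {
  const problems: string[] = [];
  for (const key of DATABASE_KEYS) {
    // Assumption: an empty string counts as "not set".
    if (!env[key]) problems.push(key + " is not set");
  }
  if (env["SEMANTIC_INDEXING_ENABLED"] !== "true") {
    problems.push("SEMANTIC_INDEXING_ENABLED must be true");
  }
  if (env["SEMANTIC_INDEXING_PROVIDER"] !== "google") {
    problems.push("SEMANTIC_INDEXING_PROVIDER must be google");
  }
  if (!env["GEMINI_API_KEY"]) {
    problems.push("GEMINI_API_KEY is not set");
  }
  // The queued execution mode additionally needs Redis.
  if (env["SEMANTIC_INDEXING_EXECUTION_MODE"] === "queued" && !redisAvailable) {
    problems.push("queued execution mode requires Redis to be configured and available");
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets you report every missing setting in a single pass.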
Retrieval benchmark
Use this benchmark for evaluator-facing latency numbers after indexing is already complete.
Command:
cd secure-vault
npm run benchmark:semantic
What it does:
- checks that MariaDB vector functions are available
- seeds a synthetic benchmark user, files, embedding jobs, and embedding chunks
- runs semantic-only and hybrid retrieval through the real application search path
- writes markdown and JSON reports
- deletes the seeded benchmark data in a cleanup step
Useful flags:
npm run benchmark:semantic -- --themes 6 --files-per-theme 500 --chunks-per-file 3 --queries-per-theme 5
Available options:
- --themes
- --files-per-theme
- --chunks-per-file
- --queries-per-theme
- --warmup-runs
- --query-top-k
- --file-batch-size
- --chunk-batch-size
- --output-dir
Default output files:
- secure-vault/benchmark/semantic-search-benchmark-latest.md
- secure-vault/benchmark/semantic-search-benchmark-latest.json
Read the generated report as a responsiveness benchmark, not an accuracy benchmark. It excludes live embedding latency by design.
Pipeline benchmark
Use this benchmark when you need evidence that SecureVault can index benchmark documents end to end and still retrieve the correct file.
Command:
cd secure-vault
npm run benchmark:semantic:pipeline
What it does:
- checks MariaDB vector support
- validates that semantic indexing is enabled with the Google provider
- generates temporary benchmark PDFs
- chunks and embeds them through the real semantic pipeline
- stores vectors in MariaDB
- runs semantic-only and hybrid retrieval for benchmark queries
- writes markdown and JSON reports
Useful flags:
npm run benchmark:semantic:pipeline -- --suite stress --themes 4 --files-per-theme 3
Available options:
- --themes
- --files-per-theme
- --suite controlled|stress|both
- --output-dir
Default output files:
- secure-vault/benchmark/semantic-pipeline-accuracy-latest.md
- secure-vault/benchmark/semantic-pipeline-accuracy-latest.json
The benchmark runs two suites by default:
- controlled for clean, direct phrasing and easier topical separation
- stress for paraphrased queries and confusable same-theme documents
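The --suite behavior described above (both suites when the flag is omitted, one suite when it is named) could be resolved along these lines. This is only a sketch: resolveSuites is a hypothetical name, and the actual pipeline CLI may parse and validate the flag differently.

```typescript
type Suite = "controlled" | "stress";

// Maps the value of --suite to the list of suites to run.
// An omitted flag is treated like "both", matching the documented default.
function resolveSuites(flag?: string): Suite[] {
  if (flag === undefined || flag === "both") {
    return ["controlled", "stress"];
  }
  if (flag === "controlled" || flag === "stress") {
    return [flag];
  }
  throw new Error("Unknown --suite value: " + flag);
}
```

Rejecting unknown values early avoids spending embedding calls on a run that would not produce the intended report.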
Use the pipeline report for ranking quality discussions. Pair it with the retrieval benchmark if you also want to show search speed.
Reading the reports
Use the retrieval report to answer "How fast is search after indexing?"
- focus on average latency and P95
- compare semantic-only and hybrid retrieval at the same dataset size
Use the pipeline report to answer "How well does the system retrieve the right file?"
- focus on Top-1 Accuracy, Top-3 Recall, and MRR together
- treat controlled as the cleaner correctness benchmark
- treat stress as the more realistic retrieval benchmark
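To see why the three ranking metrics are best read together, the sketch below computes them from the rank of the correct file for each query. It assumes the standard definitions (Top-1 Accuracy is the share of queries with the correct file ranked first, Top-3 Recall is the share with it in the first three results, MRR is the mean of 1/rank). The function name rankingMetrics is hypothetical and the report builders may compute these values differently.

```typescript
interface RankingMetrics {
  top1Accuracy: number;
  top3Recall: number;
  mrr: number;
}

// ranks: 1-based rank of the correct file for each query,
// or null when the correct file was not retrieved at all.
function rankingMetrics(ranks: Array<number | null>): RankingMetrics {
  const total = ranks.length;
  if (total === 0) {
    throw new Error("rankingMetrics requires at least one query");
  }
  let top1 = 0;
  let top3 = 0;
  let reciprocalRankSum = 0;
  for (const rank of ranks) {
    if (rank === null) continue; // a miss contributes 0 to every metric
    if (rank === 1) top1 += 1;
    if (rank <= 3) top3 += 1;
    reciprocalRankSum += 1 / rank;
  }
  return {
    top1Accuracy: top1 / total,
    top3Recall: top3 / total,
    mrr: reciprocalRankSum / total,
  };
}
```

Under these definitions, 12 queries with six correct files at rank 1 and six at rank 2 give a Top-1 Accuracy of 0.5, a Top-3 Recall of 1.0, and an MRR of 0.75, which matches the shape of the stress rows in the table above. Read this way, a 50% Top-1 Accuracy with 100% Top-3 Recall means the correct file is ranked just below a confusable neighbor rather than missing from the results.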
Related files
- Benchmark package overview: secure-vault/benchmark/README.md
- Retrieval methodology: secure-vault/benchmark/semantic-search-benchmark-guide.md
- Pipeline methodology: secure-vault/benchmark/semantic-pipeline-accuracy-guide.md
- Retrieval CLI: secure-vault/scripts/benchmark/retrieval/cli.ts
- Pipeline CLI: secure-vault/scripts/benchmark/pipeline/cli.ts