There are two main ways to implement SemBench queries in your system, depending on how your system is deployed.
Code-based systems (Python packages like LOTUS) implement queries as Python methods in a runner class.
Query-based systems (SQL engines like BigQuery) write queries as standalone SQL files with a minimal runner.
Newer scenarios also support a Code* mode, where Python query files live in files/ and are loaded dynamically.
All approaches inherit from the same GenericRunner base class and produce results in a unified format.
This guide walks you through the SemBench architecture step by step.
Note: The file contents shown below are simplified excerpts for illustration purposes. Please refer to the original source files for the complete implementation reference.
The CLI entry point is src/run.py. You specify which system, scenario, and queries to run.
The script dynamically loads the correct runner class for each system using get_runner_class().
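The dynamic loading step can be pictured with a small sketch. It is not the code in src/run.py: the helper names `runner_module_path` and `load_class` are hypothetical, and the dotted module path is an assumption derived from the per-scenario runner location described later in this guide.

```python
import importlib


def runner_module_path(system: str, scenario: str) -> str:
    """Build a dotted module path from the directory layout in this guide.

    Assumption: src/ is on the import path, so that
    src/scenario/{scenario}/runner/{system}_runner/{system}_runner.py
    is importable under the name returned here.
    """
    return f"scenario.{scenario}.runner.{system}_runner.{system}_runner"


def load_class(module_path: str, class_name: str) -> type:
    """Import a module by its dotted path and return the named class from it."""
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

Under these assumptions, a loader would call something like `load_class(runner_module_path("lotus", "movie"), "LotusRunner")`, where the class name is also hypothetical. Resolving the class by name at runtime means the CLI does not need to import every system's dependencies up front.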
Every system runner inherits from GenericRunner (located at src/runner/generic_runner.py).
It defines the abstract interface (get_system_name(), execute_query()), sets up paths to data, queries, and results, and provides load_data() for reading CSVs.
It also provides two query-discovery mechanisms:
_discover_query_impl(query_id) finds a Python method named _execute_q{id} via reflection (used by Code mode),
and _discover_query_text(query_id) locates a Q{id}.sql file in the system’s query directory (used by SQL mode).
You should not need to modify this file.
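To make the two discovery mechanisms concrete, here is a minimal sketch of how they could work. It is an illustration under assumptions, not the contents of src/runner/generic_runner.py: the class `GenericRunnerSketch`, its constructor argument, the `None` return for missing queries, and the `DemoRunner` subclass are all hypothetical. Only the method names and the `_execute_q{id}` and `Q{id}.sql` conventions come from the description above.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Optional


class GenericRunnerSketch(ABC):
    """Simplified stand-in for GenericRunner (hypothetical, for illustration)."""

    def __init__(self, query_dir: Path):
        # Assumption: the real class derives this path from system and scenario.
        self.query_dir = Path(query_dir)

    @abstractmethod
    def get_system_name(self) -> str: ...

    @abstractmethod
    def execute_query(self, query_id: int): ...

    def _discover_query_impl(self, query_id: int) -> Optional[Callable]:
        # Reflection: look up a method named _execute_q{id} on this instance.
        method = getattr(self, f"_execute_q{query_id}", None)
        return method if callable(method) else None

    def _discover_query_text(self, query_id: int) -> Optional[str]:
        # File lookup: read Q{id}.sql from the query directory, if present.
        path = self.query_dir / f"Q{query_id}.sql"
        return path.read_text(encoding="utf-8") if path.is_file() else None


class DemoRunner(GenericRunnerSketch):
    """Hypothetical runner that tries Code mode first, then SQL mode."""

    def get_system_name(self) -> str:
        return "demo"

    def execute_query(self, query_id: int):
        impl = self._discover_query_impl(query_id)
        if impl is not None:
            return impl()
        text = self._discover_query_text(query_id)
        if text is not None:
            return text  # a real SQL runner would send this to the engine
        raise NotImplementedError(f"Q{query_id} is not implemented")

    def _execute_q1(self):
        return "result of Q1"
```

With this layout, adding a query means adding one method or one file; no registration table has to be kept in sync.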
Each system should implement its own intermediate base runner between GenericRunner and the per-scenario runners.
Create it at src/runner/generic_{system}_runner/generic_{system}_runner.py.
This is where you put shared logic specific to your system: engine initialization, hyper-parameter configuration,
token usage extraction, and monetary cost calculation.
For example, GenericLotusRunner (at src/runner/generic_lotus_runner/) initializes the LOTUS LM, configures reasoning effort per model, calls _discover_query_impl() to dispatch to _execute_q*() methods, and extracts token stats from lotus.settings.lm.stats.
GenericBigQueryRunner (at src/runner/generic_bigquery_runner/) sets up the BigQuery client, calls _discover_query_text() to load .sql templates, uses Jinja2 to substitute <<variables>>, and queries inference logs for cost tracking.
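The division of labor between an intermediate base runner and a per-scenario runner might look roughly like the sketch below. Everything in it is hypothetical: the class names, the `TokenUsage` structure, the result dictionary, and the prices (which are placeholders, not real pricing). It inherits from nothing so that it stays self-contained; a real implementation would extend GenericRunner and use the system's own statistics.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0


# Placeholder prices in USD per one million tokens (not real pricing).
PRICES_PER_MILLION = {"example-model": {"input": 1.00, "output": 4.00}}


def monetary_cost(usage: TokenUsage, model: str) -> float:
    """Convert token counts into a dollar amount using the price table."""
    price = PRICES_PER_MILLION[model]
    total = usage.input_tokens * price["input"] + usage.output_tokens * price["output"]
    return total / 1_000_000


class GenericExampleRunner:
    """Hypothetical intermediate base runner holding system-wide logic."""

    def __init__(self, model: str = "example-model"):
        self.model = model          # engine / hyper-parameter configuration
        self.usage = TokenUsage()   # filled in while a query runs

    def execute_query(self, query_id: int) -> dict:
        # Dispatch to the per-scenario _execute_q{id} method by name.
        impl = getattr(self, f"_execute_q{query_id}", None)
        if impl is None:
            raise NotImplementedError(f"Q{query_id} is not implemented")
        self.usage = TokenUsage()
        result = impl()
        return {
            "query_id": query_id,
            "result": result,
            "cost_usd": monetary_cost(self.usage, self.model),
        }


class MovieExampleRunner(GenericExampleRunner):
    """Hypothetical per-scenario runner: only the queries live here."""

    def _execute_q1(self):
        # A real query would call semantic operators; this fakes the usage.
        self.usage = TokenUsage(input_tokens=1000, output_tokens=500)
        return ["row-1", "row-2"]
```

Keeping cost and token accounting in the intermediate class means each per-scenario runner contains only query logic, which is the separation the paragraph above describes.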
There are two approaches depending on your system’s mode:
Code mode: Create a per-scenario runner at src/scenario/{scenario}/runner/{system}_runner/{system}_runner.py and implement _execute_q*() methods.
Each method loads data, applies semantic operators (e.g., sem_filter, sem_join, sem_map), and returns a DataFrame.
See the movie LOTUS runner for examples.
SQL / Code* mode: Write standalone query files in files/{scenario}/query/{system}/.
SQL queries (Q{id}.sql) are templates with <<variable>> placeholders that are substituted at runtime.
Code* queries (Q{id}.py) are Python files with a run() function that receives the data directory.
The per-scenario runner itself (at src/scenario/{scenario}/runner/{system}_runner/{system}_runner.py) is minimal: it only inherits from your system's base runner and sets defaults.
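The two file-based modes can be sketched as follows. This is an illustration under assumptions, not the SemBench implementation: the placeholder substitution uses a plain regular expression as a stand-in for whatever templating the system's base runner uses, the helper names are hypothetical, and the exact signature of run() is assumed to be a single data-directory argument.

```python
import importlib.util
import re
from pathlib import Path

# Matches placeholders of the form <<name>>, allowing inner whitespace.
_PLACEHOLDER = re.compile(r"<<\s*(\w+)\s*>>")


def render_sql_template(template: str, variables: dict) -> str:
    """Replace every <<name>> placeholder with the matching value."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for placeholder <<{name}>>")
        return str(variables[name])

    return _PLACEHOLDER.sub(replace, template)


def run_code_star_query(query_file: Path, data_dir: Path):
    """Load a Q{id}.py file as a module and call its run() function."""
    spec = importlib.util.spec_from_file_location(query_file.stem, query_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Assumption: run() takes the data directory as its only argument.
    return module.run(data_dir)
```

For instance, `render_sql_template("SELECT * FROM <<dataset>>.movies", {"dataset": "demo"})` yields `SELECT * FROM demo.movies`. Failing loudly on a missing variable is a deliberate choice in this sketch, since a silently empty placeholder would produce a query that runs against the wrong table or not at all.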
The table below shows how each system implements queries for each scenario. Code = Python methods in the runner; Code* = external Python query files loaded dynamically; SQL = standalone SQL files; Hybrid = combination of approaches.
| System | Movie | Animals | MMQA | Ecomm | Medical | Cars |
|---|---|---|---|---|---|---|
| LOTUS | Code | Code | Code | Code* | Code* | Code* |
| Palimpzest | Code | Code | Code | Code* | Code* | Code* |
| BigQuery | SQL | SQL | SQL | SQL | SQL | SQL |
| ThalamusDB | Hybrid | Hybrid | Hybrid | Hybrid | Hybrid | Hybrid |
The submission process for uploading benchmark results to the SemBench leaderboard is currently under discussion. Stay tuned for updates!