Data Analysis Tools
The worker containers come pre-installed with a comprehensive set of tools for navigating and analyzing civil engineering project data.
Python Libraries
A Python 3 virtual environment (/opt/venv) is available with the following packages:
IFC Models
| Package | Purpose |
|---|---|
| IfcOpenShell | Read and query IFC models (supports IFC2X3 and IFC4 schemas) |
Example agent usage:
```python
import ifcopenshell
import ifcopenshell.util.element  # the util submodule must be imported explicitly

model = ifcopenshell.open("/data/PID_Karavanke/.../model.ifc")
elements = model.by_type("IfcBuildingElementProxy")
for el in elements:
    # get_psets returns a dict keyed by property set name
    psets = ifcopenshell.util.element.get_psets(el)
    klasifikacija = psets.get("KAR_Klasifikacija", {})
    print(klasifikacija.get("ElementTip"), klasifikacija.get("Funkcija"))
```

IFC Schema Versions
The Karavanke dataset contains two IFC schemas: IFC2X3 (209 files) and IFC4 (64 files). IfcOpenShell handles both transparently.
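To see which schema a given file declares without loading the whole model, one option is to read the `FILE_SCHEMA` entry from the STEP header. The sketch below is an illustration only: the helper names `sniff_ifc_schema` and `count_schemas` are hypothetical, and it assumes the header fits within the first 64 KiB of the file.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the schema identifier in a STEP header, e.g. FILE_SCHEMA(('IFC4'));
_SCHEMA_RE = re.compile(r"FILE_SCHEMA\s*\(\s*\(\s*'([^']+)'", re.IGNORECASE)


def sniff_ifc_schema(path, max_bytes=65536):
    """Return the schema name declared in the file header, or None if absent."""
    with open(path, "rb") as fh:
        head = fh.read(max_bytes).decode("utf-8", errors="replace")
    match = _SCHEMA_RE.search(head)
    return match.group(1).upper() if match else None


def count_schemas(root):
    """Tally the declared schema of every .ifc file below root."""
    return Counter(sniff_ifc_schema(p) for p in Path(root).rglob("*.ifc"))
```

Calling `count_schemas("/data")` returns a `Counter` keyed by schema name. Once a model is opened with IfcOpenShell, `model.schema` should report the same identifier.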
Excel Tables
| Package | Purpose |
|---|---|
| openpyxl | Read and write Excel files (.xlsx) |
| pandas | Tabular data analysis, filtering, aggregation |
Example agent usage:
```python
import pandas as pd

# Load one worksheet into a DataFrame
df = pd.read_excel("/data/.../ListaKampad_Elea.xlsx", sheet_name="Blockbuch")

# Keep only the rows whose kampada type is KPP
kpp = df[df["Tip kampade"] == "KPP"]
print(f"Found {len(kpp)} KPP kampadas")
```

PDF Documents
| Package | Purpose |
|---|---|
| pdfplumber | Extract text and tables from PDF files |
| PyMuPDF (fitz) | Fast PDF rendering, text extraction, and OCR support |
| pytesseract | OCR engine for scanned documents |
The system includes Tesseract OCR for handling scanned PDFs that don't contain selectable text.
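A common pattern for scanned PDFs is to try the text layer first and fall back to OCR only for pages where it is empty. The following is a minimal sketch of that idea, not a fixed part of the toolchain: the function names, the 20-character threshold, and the `eng` language default are all assumptions (another language would need the matching Tesseract language data installed).

```python
def needs_ocr(text, min_chars=20):
    """Treat a page as scanned when its text layer is missing or nearly empty."""
    return text is None or len(text.strip()) < min_chars


def extract_page_texts(pdf_path, ocr_lang="eng", resolution=300):
    """Return one string per page, using OCR only where the text layer is empty."""
    # Imported lazily so needs_ocr stays usable without the PDF stack installed.
    import pdfplumber
    import pytesseract

    texts = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text()
            if needs_ocr(text):
                # Render the page to a PIL image and run Tesseract on it.
                image = page.to_image(resolution=resolution).original
                text = pytesseract.image_to_string(image, lang=ocr_lang)
            texts.append(text or "")
    return texts
```

OCR is much slower than reading an embedded text layer, which is why the sketch checks each page before rendering it.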
Example agent usage:
```python
import pdfplumber

with pdfplumber.open("/data/.../technical_report.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()      # plain text of the page
        tables = page.extract_tables()  # list of tables, each a list of rows
```

System Tools
The following command-line tools are available for fast data navigation:
| Tool | Purpose | Example |
|---|---|---|
| ripgrep (rg) | Fast text search across files | rg "kampada A271" /data/ --type pdf |
| jq | JSON processing and filtering | jq '.elements[] \| .name' output.json |
| sqlite3 | SQLite database queries | sqlite3 /artefacts/state/session_registry.db '.tables' |
| git | Version control (for agent workspace) | git log --oneline |
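The same SQLite databases can also be queried from Python through the standard-library `sqlite3` module. As a rough sketch, the hypothetical helper below lists table names in much the same way as the shell's `.tables` command; it has not been checked against the actual contents of `session_registry.db`.

```python
import sqlite3


def list_tables(db_path):
    """Return user table names, similar to the sqlite3 shell's .tables command."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("/artefacts/state/session_registry.db")` returns a sorted list of names. Note that `sqlite3.connect` creates an empty database file if the path does not exist, so the path should be checked first.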
ripgrep for Fast File Discovery
ripgrep is particularly useful for quickly finding relevant files across the 1,672-file dataset:
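```bash
# Find all files mentioning a specific kampada
rg -l "A271" /data/

# Search for a term in PDF-extracted text files (txt is a built-in ripgrep type)
rg -l "stropna plosca" /data/ --type txt

# Count occurrences per file for one file type
# (--type-add only defines a type; --type selects it)
# Note: .xlsx files are ZIP containers, so cell text is usually not matched directly
rg -c "KPP" /data/ --type-add 'xlsx:*.xlsx' --type xlsx
```

Tool Availability by Container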
| Tool | Controller | Worker |
|---|---|---|
| Node.js 22 | Yes | Yes |
| Python 3 + venv | No | Yes |
| IfcOpenShell | No | Yes |
| pandas, openpyxl | No | Yes |
| pdfplumber, PyMuPDF | No | Yes |
| pytesseract + Tesseract | No | Yes |
| ripgrep, jq, sqlite3 | Yes | Yes |
| git | Yes | Yes |
INFO
All AI query execution happens on workers, which have the full toolchain. The controller only serves the web UI and routes requests.
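To confirm that a given container actually has the toolchain listed in the table, a short check like the one below can be run from inside it. This is a sketch under assumptions: the import names and executable names are inferred from the tables on this page (for example `fitz` for PyMuPDF and `rg` for ripgrep), and the helper names are hypothetical.

```python
import importlib.util
import shutil

# Import names and executables inferred from the tables on this page; adjust as needed.
PYTHON_MODULES = ["ifcopenshell", "openpyxl", "pandas", "pdfplumber", "fitz", "pytesseract"]
CLI_TOOLS = ["node", "rg", "jq", "sqlite3", "git", "tesseract"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_tools(names):
    """Return the executables that are not on PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(PYTHON_MODULES) or "none")
    print("Missing CLI tools:", missing_tools(CLI_TOOLS) or "none")
```

On a worker, run with `/opt/venv/bin/python`, both lists would be expected to come back empty; on the controller, the Python modules would be expected to show up as missing.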