Add publication-ready documentation and reproducible experiment package.

Rewrite the README with secure setup instructions, add dedicated setup/security docs, and include the standalone local-volatility instability experiment materials for reproducible analysis.

Made-with: Cursor
2026-04-02 16:30:56 +02:00
parent b3663258e4
commit 3dacc0a418
12 changed files with 613 additions and 3 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,24 @@
+# Built Python extension dropped next to qengine/__init__.py for local dev
+/qengine/*.so
+/qengine/*.dylib
+/qengine/__pycache__/
+
+/skbuild-build/
+
+/build/
+/.idea/
+**/__pycache__/
+/docs/html/
+/docs/latex/
+
+# Local reference tree (optional clone)
+/CPP-design-pattern-derivatives-pricing/
+
+# Local environment and secrets
+.env
+.env.*
+!.env.example
+
+# Local tooling caches
+/.pycache/
+/.mplconfig/
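The `.env`, `.env.*` and `!.env.example` rules above rely on gitignore's "last matching pattern wins" behaviour, which is what lets the negated `!.env.example` line re-include the template after `.env.*` excluded it. The Python sketch below mimics that rule for basename-only patterns using `fnmatch`. It is a simplified illustration (directory anchoring, `**` and trailing-slash semantics are not modelled), and `is_ignored` is a hypothetical helper, not part of the repository; `git check-ignore -v <path>` gives the authoritative answer.

```python
from fnmatch import fnmatchcase

# Hypothetical subset of the .gitignore rules above (basename patterns only).
ENV_RULES = [".env", ".env.*", "!.env.example"]


def is_ignored(filename: str, rules: list[str]) -> bool:
    """Simplified gitignore check: the last matching pattern decides.

    A pattern starting with '!' re-includes files excluded by an earlier rule.
    Directory anchoring ('/'), '**' and trailing-slash rules are not modelled.
    """
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatchcase(filename, pattern):
            ignored = not negated
    return ignored


if __name__ == "__main__":
    for name in [".env", ".env.local", ".env.example", "README.md"]:
        print(f"{name:14s} ignored={is_ignored(name, ENV_RULES)}")
```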
--- a/README.md
+++ b/README.md
@@ -1,5 +1,79 @@
-# pricing
+# option_pricing

-Monte Carlo pricing of European options under Black–Scholes
+C++/Python quantitative finance engine for option pricing, implied-volatility analysis, and market-data ingestion.

-### Project structure
+## What is included
+
+- `cpp/`: core C++ pricing library (Monte Carlo + Black-Scholes closed form), DB ingestion hooks, and pybind11 bindings.
+- `qengine/`: Python package exposing the native extension (`import qengine`).
+- `src/ImpliedVolatility/`: SVI calibration and implied-volatility tooling.
+- `src/data/`: data ingestion, SQL schema, and analytics helpers.
+- `tests/`: C++ unit tests (GoogleTest).
+- `scripts/`: operational scripts, including PostgreSQL setup.
+- `docs/`: setup and security guides, plus the Doxygen configuration (generated API docs are git-ignored).
+
+## Quickstart
+
+### 1) Create a Python environment (from the repository root)
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install --upgrade pip
+pip install -e .
+pip install pandas yfinance sqlalchemy psycopg2-binary matplotlib scipy
+```
+
+### 2) Configure environment variables
+
+```bash
+cp .env.example .env
+```
+
+Then edit `.env` with your local database credentials.
+
+### 3) Create database and schema
+
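+Use the idempotent setup script (`set -a` exports every variable defined in `.env`, so the Python process can read it):
+
+```bash
+set -a; source .env; set +a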
+python scripts/setup_postgres.py
+```
+
+This script creates/updates:
+- database role (`DB_USER`)
+- database (`DB_NAME`)
+- tables/indexes from `src/data/sql/schema.sql`
+
+### 4) Build C++ extension and run tests
+
+```bash
+cmake -S . -B build
+cmake --build build -j
+ctest --test-dir build --output-on-failure
+```
+
+### 5) Run Yahoo options ingestion
+
+```bash
+set -a; source .env; set +a
+python src/data/ingestion/ingest_yahoo_options.py
+```
+
+`PIPELINE_SYMBOLS` in `.env` controls which symbols are ingested (comma-separated, e.g. `SPY,AAPL,QQQ`).
+
+## Security and publication notes
+
+- No credentials are stored in source code.
+- `.env` files are git-ignored; only `.env.example` is committed.
+- Before publishing, rotate any credentials that were ever committed; removing a file in a later commit does not erase it from git history.
+- Prefer least-privilege DB users for runtime ingestion jobs.
+
+## Generating C++ API docs
+
+```bash
+cmake --build build --target docs
+```
+
+Generated output goes to `docs/html/` and is ignored in version control.
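The README states that `PIPELINE_SYMBOLS` is a comma-separated list such as `SPY,AAPL,QQQ`. How `ingest_yahoo_options.py` actually parses it is not shown in this commit, so the sketch below is only one tolerant interpretation: trim whitespace, upper-case tickers, skip empty entries, and drop duplicates while keeping the original order. `parse_pipeline_symbols` is a hypothetical helper.

```python
from __future__ import annotations

import os


def parse_pipeline_symbols(raw: str | None) -> list[str]:
    """Split a comma-separated symbol list such as 'SPY, aapl,,QQQ'.

    Hypothetical helper: trims whitespace, upper-cases tickers, skips empty
    entries, and removes duplicates while preserving first-seen order.
    """
    if not raw:
        return []
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for part in raw.split(","):
        symbol = part.strip().upper()
        if symbol:
            seen.setdefault(symbol, None)
    return list(seen)


if __name__ == "__main__":
    symbols = parse_pipeline_symbols(os.environ.get("PIPELINE_SYMBOLS"))
    print(symbols or "PIPELINE_SYMBOLS is unset or empty")
```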
--- a/docs/SECURITY.md
+++ b/docs/SECURITY.md
@@ -0,0 +1,27 @@
+# Security Checklist
+
+## Secrets handling
+
+- Never commit `.env` or any file containing credentials.
+- Use `.env.example` for non-sensitive defaults only.
+- Set DB credentials through environment variables.
+- Rotate credentials if they have ever appeared in git history.
+
+## Database hardening
+
+- Use a dedicated runtime user with least required privileges.
+- Keep administrative users separate from ingestion users.
+- Restrict DB network access to trusted hosts/VPC/private network.
+- Enable SSL/TLS for non-local database connections.
+
+## Publication readiness
+
+Before making the repository public:
+
+1. Confirm `git status` has no secret files staged.
+2. Search the working tree and full history (for example with `git log -p`) for potential secret patterns:
+   - passwords
+   - API keys
+   - tokens
+3. Verify `.gitignore` covers local secret files (`.env`, `.env.*`).
+4. Regenerate credentials used during development.
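Step 2 of the publication checklist (search for passwords, API keys and tokens) can be partly automated. The sketch below is a deliberately small regex scanner; the two patterns (sensitive-looking `key=value` assignments and PostgreSQL URLs with an inline password) are illustrative assumptions rather than an exhaustive rule set, so expect both misses and false positives. `find_secret_candidates` is a hypothetical helper.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Illustrative patterns only (assumptions, not a complete rule set).
SECRET_PATTERNS = {
    # e.g. DB_PASSWORD=..., api_key: ..., GITHUB_TOKEN="..." (non-empty value)
    "assignment": re.compile(
        r"(?i)\b\w*(password|passwd|secret|api[_-]?key|token)\w*\s*[:=]\s*['\"]?[^\s'\"#]+"
    ),
    # e.g. postgresql+psycopg2://user:password@host/db
    "postgres_url": re.compile(r"postgres(?:ql)?(?:\+\w+)?://[^:/\s]+:([^@\s]+)@"),
}


def find_secret_candidates(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, pattern_name, line) for every suspicious line."""
    hits: list[tuple[int, str, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage: python scan_secrets.py FILE [FILE ...]
    for path in map(Path, sys.argv[1:]):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, name, line in find_secret_candidates(text):
            print(f"{path}:{lineno}: [{name}] {line}")
```

An empty value such as `DB_PASSWORD=` in `.env.example` is not reported, since the assignment pattern requires at least one character after `=`.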
--- a/docs/SETUP.md
+++ b/docs/SETUP.md
@@ -0,0 +1,60 @@
+# Setup Guide
+
+This guide describes a clean local setup for development and reproducible runs.
+
+## Prerequisites
+
+- Python 3.10+
+- CMake 3.20+ (`ctest --test-dir` requires 3.20)
+- A C++20 compiler
+- PostgreSQL 14+ (or Docker)
+- On macOS, Homebrew packages for the C++ build (PostgreSQL client, linear algebra, Python bindings):
+  - `libpq`
+  - `libpqxx`
+  - `eigen`
+  - `pybind11`
+
+## Python dependencies
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install --upgrade pip
+pip install -e .
+pip install pandas yfinance sqlalchemy psycopg2-binary matplotlib scipy
+```
+
+## Environment configuration
+
+```bash
+cp .env.example .env
+```
+
+Edit `.env` and set:
+
+- `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`
+- `PIPELINE_SYMBOLS`
+- admin credentials used only by the setup script (`POSTGRES_ADMIN_*`)
+
+## Database bootstrap
+
+```bash
+set -a; source .env; set +a
+python scripts/setup_postgres.py
+```
+
+The script is idempotent and safe to rerun. `set -a` exports the variables from `.env` so the Python process can read them.
+
+## Build and test C++
+
+```bash
+cmake -S . -B build
+cmake --build build -j
+ctest --test-dir build --output-on-failure
+```
+
+## Generate Doxygen docs
+
+```bash
+cmake --build build --target docs
+```
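SETUP.md lists the `DB_*` variables, and SECURITY.md asks for TLS on non-local connections. The sketch below shows one way to turn those variables into a SQLAlchemy/psycopg2 URL under a few assumptions: the variable names are the ones listed above, the password is percent-encoded so characters such as `@` or `/` cannot break the URL, and libpq's `sslmode=require` is appended whenever the host is not local. `build_db_url` and the local-host rule are hypothetical; the scripts in this commit may build their connections differently.

```python
from __future__ import annotations

from collections.abc import Mapping
from urllib.parse import quote

REQUIRED = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD")
LOCAL_HOSTS = {"localhost", "127.0.0.1"}  # assumption: what counts as "local"


def build_db_url(env: Mapping[str, str]) -> str:
    """Build a postgresql+psycopg2 URL from DB_* variables.

    Raises KeyError naming every missing variable, so a forgotten export
    fails loudly instead of silently connecting with defaults.
    """
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    host = env["DB_HOST"]
    url = f"postgresql+psycopg2://{user}:{password}@{host}:{int(env['DB_PORT'])}/{env['DB_NAME']}"
    if host not in LOCAL_HOSTS:
        # libpq option: refuse unencrypted connections to remote servers.
        url += "?sslmode=require"
    return url
```

The result can be passed to `sqlalchemy.create_engine(build_db_url(os.environ))`; avoid printing or logging it, since it embeds the password.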
--- a/standalone_numerical_experiments/local_volatility_instability/INDEPENDENT_STANDALONE.txt
+++ b/standalone_numerical_experiments/local_volatility_instability/INDEPENDENT_STANDALONE.txt
@@ -0,0 +1,6 @@
+This folder is intentionally self-contained.
+
+- No imports from the parent option_pricing package (no qengine, src/, cpp bindings).
+- Third-party dependencies: numpy, matplotlib (see requirements.txt).
+- Run: python run_experiment.py [--out lv_rmse.png]
+- Safe to copy elsewhere or run in isolation.
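The folder's independence claim (no imports from `qengine`, `src/` or the C++ bindings) can be checked mechanically. The sketch below parses Python source with the standard-library `ast` module and reports top-level modules outside an allow-list. The allow-list (the standard-library modules used by the two scripts, `numpy`, `matplotlib`, and the sibling module `gatheral_local_vol`) is an assumption read off this commit; `foreign_imports` and `check_folder` are hypothetical helpers.

```python
from __future__ import annotations

import ast
from pathlib import Path

# Assumed allow-list, taken from the imports visible in this commit.
ALLOWED = frozenset({
    "__future__", "argparse", "os", "sys", "typing",
    "numpy", "matplotlib", "gatheral_local_vol",
})


def foreign_imports(source: str, allowed: frozenset[str] = ALLOWED) -> set[str]:
    """Return top-level module names imported by `source` but not allowed."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
            else:
                # Relative import: would reach into an enclosing package.
                found.add("." * node.level + (node.module or ""))
    return found - allowed


def check_folder(folder: Path) -> dict[str, set[str]]:
    """Map each .py file in `folder` to its disallowed imports (empty dict = clean)."""
    report = {
        p.name: foreign_imports(p.read_text(encoding="utf-8"))
        for p in sorted(folder.glob("*.py"))
    }
    return {name: mods for name, mods in report.items() if mods}
```

Running `check_folder(Path("standalone_numerical_experiments/local_volatility_instability"))` should return an empty dict if the folder is standalone under this allow-list.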
--- a/standalone_numerical_experiments/local_volatility_instability/figures/lv_relerr.png
+++ b/standalone_numerical_experiments/local_volatility_instability/figures/lv_relerr.png
--- a/standalone_numerical_experiments/local_volatility_instability/figures/lv_rmse.png
+++ b/standalone_numerical_experiments/local_volatility_instability/figures/lv_rmse.png
--- a/standalone_numerical_experiments/local_volatility_instability/figures/lv_sigma2.png
+++ b/standalone_numerical_experiments/local_volatility_instability/figures/lv_sigma2.png
--- a/standalone_numerical_experiments/local_volatility_instability/gatheral_local_vol.py
+++ b/standalone_numerical_experiments/local_volatility_instability/gatheral_local_vol.py
@@ -0,0 +1,108 @@
+"""
+Gatheral local variance in total-variance / log-moneyness form (practitioner's guide).
+
+sigma^2 = (d_T w) / ( 1 - (y/w) d_y w
+            + (1/4)(-1/4 - 1/w + y^2/w^2) (d_y w)^2
+            + (1/2) d_yy w )
+
+where w = omega is total implied variance, y is log-moneyness (convention as in the note).
+"""
+
+from __future__ import annotations
+
+import numpy as np
+
+
+def local_variance_from_derivatives(
+    y: np.ndarray,
+    w: np.ndarray,
+    dy_w: np.ndarray,
+    dyy_w: np.ndarray,
+    dT_w: np.ndarray,
+    *,
+    eps: float = 1e-14,
+) -> np.ndarray:
+    """Vectorized Gatheral formula. Invalid / near-singular points become nan."""
+    y = np.asarray(y, dtype=float)
+    w = np.asarray(w, dtype=float)
+    dy_w = np.asarray(dy_w, dtype=float)
+    dyy_w = np.asarray(dyy_w, dtype=float)
+    dT_w = np.asarray(dT_w, dtype=float)
+
+    out = np.full_like(y, np.nan, dtype=float)
+    ok = np.isfinite(w) & (np.abs(w) > eps) & np.isfinite(dy_w) & np.isfinite(dyy_w) & np.isfinite(dT_w)
+
+    denom = np.full_like(w, np.nan)  # nan outside `ok`, never uninitialized memory
+    denom[ok] = (
+        1.0
+        - (y[ok] / w[ok]) * dy_w[ok]
+        + 0.25 * (-0.25 - 1.0 / w[ok] + (y[ok] ** 2) / (w[ok] ** 2)) * (dy_w[ok] ** 2)
+        + 0.5 * dyy_w[ok]
+    )
+
+    ok2 = ok & (np.abs(denom) > eps)
+    out[ok2] = dT_w[ok2] / denom[ok2]
+    return out
+
+
+def quadratic_total_variance(
+    y: np.ndarray,
+    alpha: float,
+    beta: float,
+    gamma: float,
+    T: float,
+) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
+    """
+    Return (w, d_T w, d_y w, d_yy w) for w(y,T) = T * (alpha + beta*y + gamma*y^2), as in the note:
+
+      d_T w = alpha + beta*y + gamma*y^2
+      d_y w = T * (beta + 2*gamma*y)
+      d_yy w = 2*gamma*T
+    """
+    y = np.asarray(y, dtype=float)
+    f = alpha + beta * y + gamma * y ** 2
+    w = T * f
+    dT_w = f
+    dy_w = T * (beta + 2.0 * gamma * y)
+    dyy_w = np.full_like(y, 2.0 * gamma * T)
+    return w, dT_w, dy_w, dyy_w
+
+
+def analytic_local_variance_quadratic(
+    y: np.ndarray,
+    alpha: float,
+    beta: float,
+    gamma: float,
+    T: float,
+) -> np.ndarray:
+    """Analytic sigma^2 for the quadratic surface: exact derivatives substituted into Gatheral's formula."""
+    y = np.asarray(y, dtype=float)
+    w, dT_w, dy_w, dyy_w = quadratic_total_variance(y, alpha, beta, gamma, T)
+    return local_variance_from_derivatives(y, w, dy_w, dyy_w, dT_w)
+
+
+def central_first_derivative_uniform(w: np.ndarray, h: float) -> np.ndarray:
+    """Interior (w[i+1]-w[i-1])/(2h); endpoints nan."""
+    w = np.asarray(w, dtype=float)
+    out = np.full_like(w, np.nan)
+    out[1:-1] = (w[2:] - w[:-2]) / (2.0 * h)
+    return out
+
+
+def second_derivative_uniform(w: np.ndarray, h: float) -> np.ndarray:
+    """Interior second difference / h^2; endpoints nan."""
+    w = np.asarray(w, dtype=float)
+    out = np.full_like(w, np.nan)
+    out[1:-1] = (w[2:] - 2.0 * w[1:-1] + w[:-2]) / (h ** 2)
+    return out
+
+
+def add_multiplicative_noise(
+    w: np.ndarray,
+    sigma_noise: float,
+    rng: np.random.Generator,
+) -> np.ndarray:
+    """tilde w(y_i) = w(y_i) * (1 + eps_i), with eps_i i.i.d. ~ N(0, sigma_noise^2)."""
+    w = np.asarray(w, dtype=float)
+    eps = rng.normal(0.0, sigma_noise, size=w.shape)
+    return w * (1.0 + eps)
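Two quick sanity checks follow from the formulas in this module. For the quadratic surface with β = 0, at y = 0 the inputs are w = αT, ∂_y w = 0, ∂_yy w = 2γT and ∂_T w = α, so Gatheral's denominator collapses to 1 + γT and σ²(0) = α/(1 + γT). Second, central differences are exact for quadratics (up to rounding), so on a noise-free grid the FD reconstruction should match the analytic σ² at every interior point; the RMSE in the experiment is then driven by the noise rather than by truncation error. The sketch re-implements the formula so it runs on its own; `gatheral_sigma2` is a hypothetical stand-in for `local_variance_from_derivatives`, and the parameters mirror the defaults in `run_experiment.py`.

```python
from __future__ import annotations

import numpy as np


def gatheral_sigma2(y, w, dy_w, dyy_w, dT_w):
    """Gatheral local variance in (y, w) form; same formula as the module docstring."""
    denom = (
        1.0
        - (y / w) * dy_w
        + 0.25 * (-0.25 - 1.0 / w + y**2 / w**2) * dy_w**2
        + 0.5 * dyy_w
    )
    return dT_w / denom


alpha, beta, gamma, T = 0.04, 0.0, 0.1, 1.0  # defaults from run_experiment.py

# Check 1: closed form at the money (y = 0, beta = 0): alpha / (1 + gamma*T).
at_money = gatheral_sigma2(0.0, alpha * T, 0.0, 2 * gamma * T, alpha)
print(at_money, alpha / (1 + gamma * T))

# Check 2: central differences reproduce the exact derivatives of a quadratic.
y = np.linspace(-0.5, 0.5, 201)
h = y[1] - y[0]
f = alpha + beta * y + gamma * y**2
w = T * f
dy_fd = (w[2:] - w[:-2]) / (2 * h)
dyy_fd = (w[2:] - 2 * w[1:-1] + w[:-2]) / h**2
exact = gatheral_sigma2(y, w, T * (beta + 2 * gamma * y), 2 * gamma * T, f)[1:-1]
fd = gatheral_sigma2(y[1:-1], w[1:-1], dy_fd, dyy_fd, f[1:-1])
print("max |FD - exact| on interior:", np.max(np.abs(fd - exact)))
```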
--- a/standalone_numerical_experiments/local_volatility_instability/lv_rmse.png
+++ b/standalone_numerical_experiments/local_volatility_instability/lv_rmse.png
--- a/standalone_numerical_experiments/local_volatility_instability/requirements.txt
+++ b/standalone_numerical_experiments/local_volatility_instability/requirements.txt
@@ -0,0 +1,2 @@
+numpy>=1.20
+matplotlib>=3.5
--- a/standalone_numerical_experiments/local_volatility_instability/run_experiment.py
+++ b/standalone_numerical_experiments/local_volatility_instability/run_experiment.py
@@ -0,0 +1,309 @@
+#!/usr/bin/env python3
+"""
+Local-volatility instability experiment (Gatheral total variance in log-moneyness).
+
+We compare the analytic local variance σ²(y) from a quadratic total variance
+w(y,T) = T(α + βy + γy²) to σ² reconstructed from a noisy discrete surface
+w̃(y_i) = w(y_i)(1 + ε_i) using finite differences in y, for several levels of
+multiplicative noise σ_noise. The script produces a single figure: RMSE of the FD
+reconstruction vs σ_noise (log–log), with a slope-1 reference line RMSE = σ_noise.
+
+Dependencies: numpy, matplotlib only (see INDEPENDENT_STANDALONE.txt).
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import sys
+from typing import Literal
+
+# Prevent accidental imports from the parent repository
+_REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
+if _REPO_ROOT in sys.path:
+    sys.path.remove(_REPO_ROOT)
+
+import matplotlib as mpl
+import matplotlib.pyplot as plt
+import numpy as np
+
+from gatheral_local_vol import (
+    add_multiplicative_noise,
+    analytic_local_variance_quadratic,
+    central_first_derivative_uniform,
+    local_variance_from_derivatives,
+    quadratic_total_variance,
+    second_derivative_uniform,
+)
+
+# ---------------------------------------------------------------------------
+# Defaults (quadratic total variance, positive w on y ∈ [-0.5, 0.5])
+# ---------------------------------------------------------------------------
+
+ALPHA = 0.04
+BETA = 0.0
+GAMMA = 0.1
+T_MATURITY = 1.0
+Y_MIN = -0.5
+Y_MAX = 0.5
+N_GRID = 201
+
+
+def ensure_parent_dir(path: str) -> None:
+    parent = os.path.dirname(os.path.abspath(path))
+    if parent:
+        os.makedirs(parent, exist_ok=True)
+
+
+def log_uniform_sigma_grid(n_points: int, sigma_min: float, sigma_max: float) -> np.ndarray:
+    """
+    Return max(4, `n_points`) values of σ_noise with log₁₀(σ) equally spaced.
+
+    Log-uniform spacing spreads the points evenly along a log–log axis; linspace(σ_min, σ_max) would crowd them near σ_max.
+    """
+    n_points = max(4, n_points)
+    if sigma_min <= 0 or sigma_max <= 0 or sigma_max < sigma_min:
+        raise ValueError("Require 0 < sigma_min <= sigma_max.")
+    return np.logspace(np.log10(sigma_min), np.log10(sigma_max), n_points)
+
+
+def relative_pointwise_error(
+    sigma2_analytic: np.ndarray, sigma2_fd: np.ndarray, eps: float = 1e-12
+) -> np.ndarray:
+    return (sigma2_fd - sigma2_analytic) / np.maximum(np.abs(sigma2_analytic), eps)
+
+
+def rmse_absolute(
+    sigma2_analytic: np.ndarray,
+    sigma2_fd: np.ndarray,
+    interior: slice,
+) -> float:
+    """RMSE of (σ²_FD − σ²_analytic) on interior indices."""
+    sa = np.asarray(sigma2_analytic, dtype=float)[interior]
+    sf = np.asarray(sigma2_fd, dtype=float)[interior]
+    m = np.isfinite(sa) & np.isfinite(sf)
+    if not np.any(m):
+        return float("nan")
+    d = sf[m] - sa[m]
+    return float(np.sqrt(np.mean(d * d)))
+
+
+def rmse_relative(
+    sigma2_analytic: np.ndarray,
+    sigma2_fd: np.ndarray,
+    interior: slice,
+    eps: float = 1e-12,
+) -> float:
+    """RMSE over grid points of relative error (σ²_FD − σ²_analytic) / |σ²_analytic|."""
+    re = relative_pointwise_error(sigma2_analytic, sigma2_fd, eps=eps)[interior]
+    m = np.isfinite(re)
+    if not np.any(m):
+        return float("nan")
+    return float(np.sqrt(np.mean(re[m] ** 2)))
+
+
+def local_variance_one_draw(
+    y: np.ndarray,
+    h: float,
+    alpha: float,
+    beta: float,
+    gamma: float,
+    T: float,
+    sigma_noise: float,
+    rng: np.random.Generator,
+    dT_mode: Literal["exact", "noisy_ratio"],
+) -> tuple[np.ndarray, np.ndarray]:
+    """One noisy surface and FD local variance; returns (σ²_analytic, σ²_FD)."""
+    w_true, dT_w_true, _, _ = quadratic_total_variance(y, alpha, beta, gamma, T)
+    sigma2_a = analytic_local_variance_quadratic(y, alpha, beta, gamma, T)
+
+    w_tilde = add_multiplicative_noise(w_true, sigma_noise, rng)
+    dy = central_first_derivative_uniform(w_tilde, h)
+    dyy = second_derivative_uniform(w_tilde, h)
+
+    if dT_mode == "exact":
+        dT = dT_w_true
+    elif dT_mode == "noisy_ratio":
+        dT = w_tilde / T
+    else:
+        raise ValueError(dT_mode)
+
+    sigma2_fd = local_variance_from_derivatives(y, w_tilde, dy, dyy, dT)
+    return sigma2_a, sigma2_fd
+
+
+def rmse_curves_averaged(
+    y: np.ndarray,
+    h: float,
+    alpha: float,
+    beta: float,
+    gamma: float,
+    T: float,
+    sigma_grid: np.ndarray,
+    rng: np.random.Generator,
+    dT_mode: Literal["exact", "noisy_ratio"],
+    interior: slice,
+    trials_per_sigma: int,
+) -> tuple[np.ndarray, np.ndarray]:
+    """
+    For each σ in `sigma_grid`, average RMSE (relative and absolute) over
+    `trials_per_sigma` independent noise draws.
+    """
+    rel: list[float] = []
+    abs_: list[float] = []
+    trials_per_sigma = max(1, trials_per_sigma)
+
+    for sig in sigma_grid:
+        tr: list[float] = []
+        ta: list[float] = []
+        for _ in range(trials_per_sigma):
+            sa, sf = local_variance_one_draw(
+                y, h, alpha, beta, gamma, T, float(sig), rng, dT_mode
+            )
+            tr.append(rmse_relative(sa, sf, interior))
+            ta.append(rmse_absolute(sa, sf, interior))
+        rel.append(float(np.nanmean(tr)))
+        abs_.append(float(np.nanmean(ta)))
+
+    return np.asarray(rel, dtype=float), np.asarray(abs_, dtype=float)
+
+
+def plot_rmse_vs_noise(
+    sigma_grid: np.ndarray,
+    rmse_rel: np.ndarray,
+    rmse_abs: np.ndarray,
+    *,
+    h: float,
+    T: float,
+    dT_mode: str,
+    trials_per_sigma: int,
+) -> mpl.figure.Figure:
+    """
+    Log–log plot: RMSE (relative, and absolute in σ²) vs σ_noise, with reference line RMSE = σ_noise.
+    """
+    fig, ax = plt.subplots(figsize=(5.8, 3.8), constrained_layout=True)
+
+    x = np.asarray(sigma_grid, dtype=float)
+    pos = x > 0
+    n = len(x)
+    ms = 3.5 if n > 50 else 4.5
+
+    ax.loglog(
+        x[pos],
+        rmse_rel[pos],
+        "o-",
+        ms=ms,
+        lw=1.25,
+        label=r"RMSE of relative error $(\sigma^2_{\mathrm{FD}}-\sigma^2_{\mathrm{analytic}})/|\sigma^2_{\mathrm{analytic}}|$",
+        zorder=3,
+    )
+    ax.loglog(
+        x[pos],
+        rmse_abs[pos],
+        "s--",
+        ms=ms - 1,
+        lw=1.0,
+        alpha=0.9,
+        label=r"RMSE of $\sigma^2$ error $|\sigma^2_{\mathrm{FD}}-\sigma^2_{\mathrm{analytic}}|$",
+        zorder=2,
+    )
+
+    s_lo, s_hi = float(x[pos].min()), float(x[pos].max())
+    ax.loglog([s_lo, s_hi], [s_lo, s_hi], ":", color="0.4", lw=2.0, zorder=1, label=r"reference slope 1: $\mathrm{RMSE}=\sigma_{\mathrm{noise}}$")
+
+    ax.set_xlabel(r"$\sigma_{\mathrm{noise}}$ (multiplicative noise on $\tilde{w}$)")
+    ax.set_ylabel("RMSE (interior $y$)")
+    subtitle = f"$T={T}$, $h={h:.4f}$, $\\partial_T w$: {dT_mode}"
+    if trials_per_sigma > 1:
+        subtitle += f", mean over {trials_per_sigma} draws per $\\sigma$"
+    ax.set_title("FD local variance: RMSE vs noise\n" + subtitle, fontsize=10)
+    ax.grid(True, which="both", alpha=0.35)
+    ax.legend(loc="best", fontsize=8, framealpha=0.95)
+
+    return fig
+
+
+def configure_matplotlib_style() -> None:
+    """Conservative defaults suitable for print."""
+    mpl.rcParams.update(
+        {
+            "figure.dpi": 120,
+            "savefig.dpi": 300,
+            "font.size": 10,
+            "axes.labelsize": 10,
+            "axes.titlesize": 10,
+            "legend.fontsize": 8,
+            "axes.grid": True,
+        }
+    )
+
+
+def main() -> None:
+    configure_matplotlib_style()
+
+    parser = argparse.ArgumentParser(
+        description="RMSE of finite-difference local variance vs multiplicative noise (single figure).",
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+    parser.add_argument("--seed", type=int, default=42, help="RNG seed.")
+    parser.add_argument(
+        "--out",
+        type=str,
+        default="lv_rmse.png",
+        help="Output image path.",
+    )
+    parser.add_argument(
+        "--dT-mode",
+        choices=("exact", "noisy_ratio"),
+        default="exact",
+        help="How ∂_T w is obtained: exact uses the true ∂_T w; noisy_ratio uses w̃/T (exact for the noise-free quadratic).",
+    )
+    parser.add_argument("--rmse-points", type=int, default=35, help="Number of σ_noise values (log-uniform, at least 4).")
+    parser.add_argument("--rmse-sigma-min", type=float, default=1e-5, help="Smallest σ_noise.")
+    parser.add_argument("--rmse-sigma-max", type=float, default=5e-4, help="Largest σ_noise.")
+    parser.add_argument(
+        "--rmse-trials",
+        type=int,
+        default=50,
+        help="Independent noisy surfaces per σ_noise; RMSE is averaged.",
+    )
+    args = parser.parse_args()
+
+    rng = np.random.default_rng(args.seed)
+    y = np.linspace(Y_MIN, Y_MAX, N_GRID)
+    h = float(y[1] - y[0])
+    interior = slice(1, -1)
+
+    sigma_grid = log_uniform_sigma_grid(args.rmse_points, args.rmse_sigma_min, args.rmse_sigma_max)
+    rmse_rel, rmse_abs = rmse_curves_averaged(
+        y,
+        h,
+        ALPHA,
+        BETA,
+        GAMMA,
+        T_MATURITY,
+        sigma_grid,
+        rng,
+        args.dT_mode,
+        interior,
+        args.rmse_trials,
+    )
+
+    fig = plot_rmse_vs_noise(
+        sigma_grid,
+        rmse_rel,
+        rmse_abs,
+        h=h,
+        T=T_MATURITY,
+        dT_mode=args.dT_mode,
+        trials_per_sigma=args.rmse_trials,
+    )
+
+    ensure_parent_dir(args.out)
+    fig.savefig(args.out, bbox_inches="tight")
+    print(f"Wrote {args.out}")
+    plt.close(fig)
+
+
+if __name__ == "__main__":
+    main()
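The slope-1 behaviour in the figure has a simple first-order explanation. With w̃_i = w_i(1 + ε_i), the noise in the second difference is (w_{i+1}ε_{i+1} − 2w_iε_i + w_{i−1}ε_{i−1})/h², whose standard deviation is approximately √6·w·σ_noise/h² when w varies slowly across three grid points. That error is linear in σ_noise (hence slope 1 on log–log axes) but carries the large factor 1/h²: with the defaults (h = 0.005, w ≈ 0.04 at the money) it is roughly 3.9·10³·σ_noise, so even σ_noise = 10⁻⁴ moves ½∂_yy w by about 0.2 against a denominator near 1.1. The Monte Carlo sketch below checks the √6·w·σ/h² estimate in isolation; it assumes a constant w and i.i.d. Gaussian ε, and `second_difference_noise_std` is a hypothetical helper, not part of `run_experiment.py`.

```python
from __future__ import annotations

import numpy as np


def second_difference_noise_std(w: float, sigma_noise: float, h: float,
                                n_samples: int, seed: int = 0) -> float:
    """Empirical std of the second difference of w*(1+eps) on a flat surface.

    With w constant, the exact second difference is zero, so everything left
    is noise: (w*eps[i+1] - 2*w*eps[i] + w*eps[i-1]) / h**2.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma_noise, size=(n_samples, 3))
    w_tilde = w * (1.0 + eps)
    d2 = (w_tilde[:, 2] - 2.0 * w_tilde[:, 1] + w_tilde[:, 0]) / h**2
    return float(np.std(d2))


w, h = 0.04, 0.005  # at-the-money w and grid spacing of the default experiment
for sigma in (1e-5, 1e-4):
    empirical = second_difference_noise_std(w, sigma, h, n_samples=200_000)
    predicted = np.sqrt(6.0) * w * sigma / h**2
    print(f"sigma={sigma:.0e}  empirical={empirical:.4g}  predicted={predicted:.4g}")
```

By the same argument the central first difference only amplifies noise by about w/(√2·h) ≈ 5.7, which is why the ∂_yy w term dominates the instability.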