Files
pricing/docs/Requirements.md
2026-03-04 00:44:52 +01:00

521 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Quant Finance Hybrid Project — Requirements Elicitation + Mathematical Specification (C++ Core, Python Research Layer)
**Project codename:** `quant-engine`
**Goal:** Build a reproducible, testable, performance-aware pricing library for European options under BlackScholes, with Monte Carlo (MC) as the primary numerical method, and an optional PDE/FEM track for cross-validation and CSE-grade numerical analysis.
---
## 1. Stakeholders and Intended Use
### 1.1 Primary stakeholder
- **You (developer/researcher):** want a sustainable workflow: derivations → implementation → verification → benchmarking → reporting.
### 1.2 Secondary stakeholders (implicit)
- **Course staff / graders:** expect numerics rigor (convergence, stability, error bars), clean code structure, and evidence of understanding.
- **Recruiters / interviewers (optional):** expect engineering maturity (modular design, tests, profiling) and mathematical correctness.
### 1.3 Use cases
1. Price a European call/put via Monte Carlo with confidence intervals.
2. Compute Greeks (Delta, Vega) with stable estimators.
3. Reduce variance via antithetic + control variates; quantify variance reduction.
4. Benchmark performance (single-thread vs multi-thread; C++ vs Python wrapper).
5. (Optional) Solve BlackScholes PDE numerically (FD or FEM) and compare to MC and analytic pricing.
---
## 2. Scope and Non-Scope
### 2.1 In-scope (baseline)
- BlackScholes risk-neutral dynamics
- European options (call/put)
- MC pricing with error estimation
- Variance reduction: antithetic variates, control variates
- Greeks: Delta, Vega via pathwise derivative
- Deterministic validation against closed-form BlackScholes formula
- Clean C++ library with a small CLI runner and Python experiment scripts
### 2.2 In-scope (advanced extensions)
- Quasi-MC (Sobol) optional
- OpenMP parallel MC
- Calibration (implied vol) optional
- PDE solver (FD or FEM/Galerkin) optional for cross-validation
### 2.3 Explicit non-scope (for now)
- Exotic options requiring full path simulation (barrier/asian) **unless** you later extend
- Jump-diffusion, local vol, Heston calibration (future phases)
- Full market data infrastructure (DB ingestion, tick-level)
---
## 3. Deliverables
### 3.1 Code deliverables
- `libqengine` (C++): pricing + Greeks + variance reduction
- `qengine_cli` (C++): command-line program to run experiments
- `pyqengine` (Python layer): plotting + experiment orchestration (optional binding via pybind11)
- Test suite (unit + numerical regression tests)
- Benchmark scripts + reproducible configs
### 3.2 Documentation deliverables
- A technical report in Markdown/LaTeX including:
- model assumptions
- derivations
- estimator definitions
- error/variance analysis
- convergence plots
- performance benchmarks
- limitations + future work
---
## 4. Mathematical Model Specification
### 4.1 Risk-neutral model (BlackScholes)
Under the risk-neutral measure $\mathbb{Q}$,
$$
dS_t = r S_t\,dt + \sigma S_t\,dW_t,
$$
where:
- $S_t > 0$ is the asset price
- $r$ is the continuously-compounded risk-free rate
- $\sigma > 0$ is volatility
- $W_t$ is standard Brownian motion.
### 4.2 Terminal distribution (exact sampling)
Apply Itôs lemma to $\log S_t$:
$$
d(\log S_t) = \left(r - \tfrac12\sigma^2\right)dt + \sigma\,dW_t.
$$
Integrating from $0$ to $T$ yields:
$$
S_T = S_0 \exp\!\left(\left(r - \tfrac12\sigma^2\right)T + \sigma\sqrt{T}\,Z\right),\quad Z\sim \mathcal{N}(0,1).
$$
### 4.3 Payoffs
- European call: $h(S_T)=(S_T-K)^+ = \max(S_T-K,0)$
- European put: $h(S_T)=(K-S_T)^+ = \max(K-S_T,0)$
### 4.4 Pricing equation (risk-neutral valuation)
$$
V_0 = e^{-rT}\,\mathbb{E}^{\mathbb{Q}}[h(S_T)].
$$
---
## 5. Numerical Methods Specification (Primary Track: Monte Carlo)
### 5.1 Basic Monte Carlo estimator
Define discounted payoff random variable:
$$
X = e^{-rT}h(S_T).
$$
With i.i.d. samples $Z_i\sim\mathcal{N}(0,1)$, compute $S_T^{(i)}$ and $X_i$, then:
$$
\widehat{V}_N = \frac{1}{N}\sum_{i=1}^N X_i.
$$
**Properties**
- Unbiased: $\mathbb{E}[\widehat{V}_N]=V_0$
- Variance: $\mathrm{Var}(\widehat{V}_N)=\mathrm{Var}(X)/N$
- CLT: $\sqrt{N}(\widehat{V}_N-V_0)\Rightarrow \mathcal{N}(0,\mathrm{Var}(X))$
### 5.2 Standard error and confidence interval
Estimate variance with sample variance:
$$
\widehat{\mathrm{Var}}(X)=\frac{1}{N-1}\sum_{i=1}^N (X_i-\overline{X})^2.
$$
Standard error of the mean:
$$
\mathrm{SE}(\widehat{V}_N)=\sqrt{\widehat{\mathrm{Var}}(X)/N}.
$$
Approximate 95% confidence interval:
$$
\widehat{V}_N \pm 1.96\,\mathrm{SE}(\widehat{V}_N).
$$
### 5.3 Variance reduction requirements
#### 5.3.1 Antithetic variates
For each $Z$, also use $-Z$. Define:
$$
\widehat{V}^{\text{anti}}_N=\frac{1}{N}\sum_{i=1}^N \frac{X(Z_i)+X(-Z_i)}{2}.
$$
**Expected outcome:** lower variance for monotone payoffs (calls/puts) due to negative correlation.
#### 5.3.2 Control variates (with known expectation)
Choose a control variable $Y$ correlated with $X$ and with known mean.
A standard choice:
$$
Y=e^{-rT}S_T,\qquad \mathbb{E}[Y]=S_0.
$$
Estimator:
$$
\widehat{V}^{\text{cv}}=\frac{1}{N}\sum_{i=1}^N \left(X_i - \beta (Y_i-\mathbb{E}[Y])\right).
$$
Optimal coefficient:
$$
\beta^*=\frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)}.
$$
In practice estimate $\beta^*$ from samples.
**Acceptance criterion:** demonstrate variance reduction factor empirically, e.g.
$$
\frac{\widehat{\mathrm{Var}}(\widehat{V}_N)}{\widehat{\mathrm{Var}}(\widehat{V}^{\text{cv}}_N)} > 1.
$$
### 5.4 Greeks (pathwise derivative)
Let $V(S_0,\sigma,r,T)=e^{-rT}\mathbb{E}[h(S_T)]$.
#### 5.4.1 Delta
Since $S_T = S_0 \exp(\cdots)$, we have:
$$
\frac{\partial S_T}{\partial S_0}=\frac{S_T}{S_0}.
$$
For a call, $h'(S_T)=\mathbf{1}_{\{S_T>K\}}$ almost everywhere.
Thus:
$$
\Delta = \frac{\partial V}{\partial S_0}
= e^{-rT}\mathbb{E}\left[\mathbf{1}_{\{S_T>K\}}\frac{S_T}{S_0}\right].
$$
MC estimator:
$$
\widehat{\Delta}_N=\frac{e^{-rT}}{N}\sum_{i=1}^N \mathbf{1}_{\{S_T^{(i)}>K\}}\frac{S_T^{(i)}}{S_0}.
$$
#### 5.4.2 Vega
Differentiate:
$$
\log S_T = \log S_0 + (r-\tfrac12\sigma^2)T + \sigma\sqrt{T}Z.
$$
So:
$$
\frac{\partial \log S_T}{\partial \sigma} = -\sigma T + \sqrt{T}Z,
\qquad
\frac{\partial S_T}{\partial \sigma}=S_T(-\sigma T + \sqrt{T}Z).
$$
Hence for a call:
$$
\text{Vega} = e^{-rT}\mathbb{E}\left[\mathbf{1}_{\{S_T>K\}}\,S_T(-\sigma T+\sqrt{T}Z)\right].
$$
**Greeks acceptance criterion:** compare MC Greeks against analytic Greeks (BlackScholes closed form) within statistical error.
---
## 6. Analytical Reference (BlackScholes Closed-Form)
To validate MC, implement closed form pricing.
Define:
$$
d_1=\frac{\ln(S_0/K)+(r+\tfrac12\sigma^2)T}{\sigma\sqrt{T}},\quad
d_2=d_1-\sigma\sqrt{T}.
$$
Call:
$$
C = S_0\Phi(d_1) - K e^{-rT}\Phi(d_2).
$$
Put:
$$
P = K e^{-rT}\Phi(-d_2) - S_0\Phi(-d_1).
$$
Where $\Phi$ is standard normal CDF.
**Acceptance criterion:** MC estimates converge to analytic price; error decreases ~ $O(N^{-1/2})$.
---
## 7. Optional PDE Track (for CSE/Numerics Alignment)
### 7.1 BlackScholes PDE
Option price $V(t,S)$ satisfies:
$$
\frac{\partial V}{\partial t}
+\frac12\sigma^2 S^2\frac{\partial^2 V}{\partial S^2}
+rS\frac{\partial V}{\partial S}
-rV=0,\quad V(T,S)=h(S).
$$
### 7.2 Weak form (Galerkin idea)
Choose test functions $w(S)$ and integrate over domain $S\in[S_{\min},S_{\max}]$ (truncation required).
A typical weak form after multiplying by $w$ and integrating (with integration by parts on the second derivative term) yields a bilinear form involving:
- mass term $\int V w \, dS$
- diffusion term $\int \sigma^2 S^2 V_S w_S \, dS$
- convection term $\int r S V_S w \, dS$
- reaction term $\int r V w \, dS$
Time discretization (e.g. backward Euler / CrankNicolson) gives linear systems per timestep.
**Why include this:** deterministic solver for 1D that lets you cross-check MC and show FEM competence.
---
## 8. Functional Requirements (FR)
### FR-1 Pricing API
Provide an API to compute:
- European call/put price under BlackScholes
- methods: `MC`, `MC+anti`, `MC+cv`, `MC+anti+cv`
Inputs:
- $S_0,K,r,\sigma,T$
- number of samples $N$
- RNG seed
Outputs:
- price estimate
- standard error
- confidence interval
- runtime metrics
### FR-2 Greeks API
Compute:
- Delta and Vega (at minimum)
Outputs:
- estimate + standard error
### FR-3 Analytic reference
Compute closed-form BlackScholes price and Greeks (Delta, Vega) for validation.
### FR-4 Experiment runner
CLI to run:
- convergence experiment over $N$ grid (e.g. 1e3…1e7)
- variance reduction comparison
- parameter sweeps over $K$, $\sigma$, $T$
### FR-5 Reproducibility
- Deterministic results under fixed seed
- Logs include git commit hash (optional), compiler flags, CPU info (optional), params
---
## 9. Non-Functional Requirements (NFR)
### NFR-1 Performance
- Baseline MC engine should handle $N \ge 10^7$ samples in reasonable time on VPS hardware (exact thresholds depend on CPU).
- Vectorization-friendly and allocation-free inner loop.
### NFR-2 Numerical correctness
- Pass regression tests comparing to analytic price within tolerance tied to SE:
- e.g. $|\widehat{V}_N - V_{\text{BS}}| \le 3\,\mathrm{SE}$ for large $N$.
### NFR-3 Maintainability
- Clear module boundaries: model vs payoff vs engine vs stats.
- Minimal header coupling.
- Consistent naming, docs, and unit tests.
### NFR-4 Portability
- Build with CMake on Linux; optional macOS.
- No reliance on proprietary libraries.
---
## 10. System Architecture Requirements
### 10.1 C++ core modules
- `models/`
- `black_scholes.hpp/.cpp`
- `payoffs/`
- `payoff.hpp`
- `call_payoff.hpp/.cpp`
- `put_payoff.hpp/.cpp`
- `mc/`
- `mc_engine.hpp` (templated or type-erased interface)
- `variance_reduction.hpp` (anti/cv)
- `stats.hpp` (mean, variance, CI)
- `analytics/`
- `bs_closed_form.hpp/.cpp` (price + Greeks)
- `cli/`
- `main.cpp` experiment runner
### 10.2 Python layer (optional)
- `experiments/`
- `convergence.py`
- `variance_reduction.py`
- `plots.py`
Binding options:
- pybind11 wrapper around core functions OR
- run CLI and parse CSV output (simpler and still professional).
---
## 11. Step-by-Step Implementation Plan (What to implement, in order)
### Step 1 — Deterministic foundation (no variance reduction yet)
**Implement**
- BlackScholes terminal sampler: `ST(Z)`
- Payoffs: call/put
- MC estimator: price
- Stats: mean, sample variance, SE, CI
**Verification**
- sanity: $S_T$ sample mean close to $S_0 e^{rT}$
- price monotonicity:
- call increases with $S_0$, decreases with $K$
- put decreases with $S_0$, increases with $K$
### Step 2 — Closed-form BlackScholes
**Implement**
- $\Phi$ (normal CDF) using `std::erfc`/`std::erf`
- call/put analytic price
- analytic Delta/Vega (optional now, required later)
**Verification**
- compare MC price to analytic for increasing $N$
- show empirical $1/\sqrt{N}$ error trend on log-log plot (Python)
### Step 3 — Antithetic variates
**Implement**
- paired sampling: evaluate $Z$ and $-Z$ per iteration
- update stats on paired average payoff
**Verification**
- variance reduction factor > 1 for calls/puts
### Step 4 — Control variates
**Implement**
- compute $Y_i = e^{-rT}S_T^{(i)}$, with known mean $S_0$
- estimate $\beta^*$ via sample cov/var
- output both raw and CV prices + variances
**Verification**
- variance reduction factor consistent across parameters
- show dependence on correlation $\rho_{XY}$
### Step 5 — Greeks (pathwise)
**Implement**
- Delta estimator for call/put
- Vega estimator for call/put
- SE for Greeks (same sample variance approach)
**Verification**
- compare to analytic Greeks (BlackScholes) within SE bands
### Step 6 — Performance engineering
**Implement**
- avoid per-iteration allocations
- optional OpenMP parallelization
- benchmark scaling with threads
**Verification**
- measure throughput paths/sec
- show speedup and parallel efficiency
### Step 7 (Optional) — PDE Solver (FD or FEM)
**Implement**
- domain truncation $S\in[S_{\min},S_{\max}]$
- boundary conditions (e.g. call: $V(t,0)=0$, $V(t,S_{\max})\approx S_{\max}-K e^{-r(T-t)}$)
- time stepping and spatial discretization (Galerkin/FEM or FD)
**Verification**
- PDE solution matches analytic at $t=0$
- compare PDE vs MC vs analytic
---
## 12. Testing Strategy
### 12.1 Unit tests (fast)
- `terminal_price(Z)` correctness for fixed Z
- payoff correctness at boundary cases
- normal CDF sanity checks
### 12.2 Numerical regression tests (slower)
- Fix seed and moderate N (e.g. 1e6) and verify price within tolerance window
- Verify CI covers analytic price most of the time (statistical test across runs)
### 12.3 Property tests (optional)
- invariants like put-call parity:
$$
C - P = S_0 - K e^{-rT}.
$$
Use analytic or MC estimates to test parity (MC will have noise).
---
## 13. Risk Register (What can go wrong)
- **RNG issues:** poor seeding or non-determinism in parallel runs.
- **Estimator bugs:** incorrect discounting, wrong variance formula, wrong control mean.
- **Numerical instability:** CDF implementation errors for extreme $d_1,d_2$.
- **Performance pitfalls:** unnecessary allocations, virtual dispatch in hot loop.
- **PDE truncation error:** wrong boundary conditions dominate solution.
Mitigation: keep a known parameter set where analytic results are stable; build regression tests early.
---
## 14. Acceptance Criteria (Definition of Done)
Baseline completion:
1. MC pricing returns price + SE + CI.
2. Analytic BlackScholes price implemented.
3. Convergence plot shows MC error ~ $N^{-1/2}$.
4. Antithetic and control variate implemented with demonstrated variance reduction.
5. Delta and Vega implemented and validated vs analytic Greeks.
6. CLI runs experiments and outputs CSV/JSON for Python plotting.
Advanced completion (optional):
7. OpenMP scaling plot + efficiency.
8. PDE solver cross-checks analytic solution.
9. Python binding or orchestration layer produces publication-quality plots.
---
## 15. Implementation Notes (Engineering choices)
### 15.1 RNG
- Use `std::mt19937_64` + `std::normal_distribution<double>`
- For parallel runs: per-thread RNG with deterministic seed schedule.
### 15.2 Hot loop design
- Prefer templates or function objects to avoid virtual calls inside the MC loop.
- Keep payoff polymorphism at the boundary (setup), not inside the innermost loop.
### 15.3 Data output
- Output CSV with columns:
- method, N, price, se, ci_low, ci_high, runtime_ms, seed, params...
This makes Python plotting trivial.
---
## 16. Suggested Reading Map (aligned with requirements)
- **BlackScholes theory / derivations:** Shreve, *Stochastic Calculus for Finance II*
- **Monte Carlo + variance reduction + Greeks:** Glasserman, *Monte Carlo Methods in Financial Engineering*
- **C++ architecture for pricing libraries:** Joshi, *C++ Design Patterns and Derivatives Pricing*
- **PDE + FEM track:** any numerical PDE/FEM text; treat BS PDE as a parabolic convectiondiffusionreaction equation.
---
## Appendix A — Analytic Greeks (for validation)
### Call Delta
$$
\Delta_{\text{call}}=\Phi(d_1).
$$
### Put Delta
$$
\Delta_{\text{put}}=\Phi(d_1)-1.
$$
### Vega (call = put)
$$
\text{Vega}= S_0 \phi(d_1)\sqrt{T},
$$
where $\phi$ is standard normal PDF:
$$
\phi(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}.
$$
---
## Appendix B — Put-Call Parity
$$
C - P = S_0 - K e^{-rT}.
$$
Use this as an additional correctness check.
---