This repository implements a production-style quantitative valuation pipeline for equity options, combining high-performance pricing models with a full data and calibration workflow.

The system goes beyond a standalone pricer: it integrates market data ingestion, structured storage, numerical pricing, and volatility surface calibration into a single reproducible framework.

The goal of this project

The goal of this project is to serve as a modular foundation for quantitative modeling and experimentation in option pricing and financial time series.

Rather than implementing a single model, the system is designed to support:

benchmarking different pricing approaches (analytical, simulation-based, and data-driven),
comparing numerical methods under realistic market data conditions,
and extending toward more advanced workflows such as statistical learning and model calibration.

A key objective is to create an environment where new ideas from research can be implemented, tested, and evaluated within a consistent pipeline, rather than in isolated scripts or notebooks.

This includes:

integrating alternative pricing methodologies into a shared framework,
analyzing model behavior across time and market regimes,
and building reproducible pipelines for both numerical and data-driven approaches.

Ultimately, the project aims to bridge:

theoretical models (e.g. stochastic processes, volatility parameterizations),
numerical methods (simulation, calibration),
and data-driven techniques (time-series analysis, machine learning),

within a single, extensible system that moves closer to a production-grade pipeline.

What the system does

The system supports the following workflow:

Ingest listed option market data (Yahoo Finance)
Normalize and store it in a relational database (PostgreSQL)
Compute implied volatilities from observed prices
Calibrate parametric volatility surfaces (SVI)
Run pricing models (Black-Scholes, Monte Carlo)
Expose fast pricing routines via Python for analysis and research

This project aims to unify these components into a coherent system, with clear interfaces between:

Data layer (ingestion, storage, schema)
Model layer (C++ pricing engines)
Analytics layer (Python calibration and diagnostics)
Execution layer (reproducible pipelines)
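
As a rough illustration of the implied-volatility and Black-Scholes steps in the workflow above, the sketch below prices a European call in closed form and inverts the price for volatility by bisection. It is a minimal pure-Python example, not the repository's C++ engine: the function names (norm_cdf, bs_call, implied_vol), the no-dividend assumption, and the bisection bracket are all assumptions made for illustration.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call (assumes no dividends)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol(price, spot, strike, rate, maturity, lo=1e-6, hi=5.0, tol=1e-8):
    """Invert bs_call for volatility by bisection.

    Returns None when the price lies outside the range attainable with
    volatilities in [lo, hi] (for example, a quote below intrinsic value).
    """
    low_price = bs_call(spot, strike, rate, lo, maturity)
    high_price = bs_call(spot, strike, rate, hi, maturity)
    if not (low_price <= price <= high_price):
        return None
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The call price is increasing in volatility, so bisection is safe.
        if bs_call(spot, strike, rate, mid, maturity) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Bisection is slower than Newton-type updates but cannot diverge, and returning None for unattainable prices gives downstream calibration a simple way to filter out bad quotes.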

Technology choices

The architecture deliberately combines multiple technologies, each chosen for a specific role:

C++ (C++20)
Used for performance-critical pricing components (Monte Carlo, closed-form models) and clean domain modeling.
Python
Used for orchestration, data processing, calibration (SVI), and rapid experimentation.
pybind11
Bridges C++ and Python, enabling high-performance models to be used in flexible workflows.
PostgreSQL + SQLAlchemy
Provides structured, queryable storage for market data and supports reproducible calibration pipelines.
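
To make the Monte Carlo component more concrete, here is a small sketch of a European call estimator under geometric Brownian motion. It is written in pure Python for readability and is only an assumption about what such an engine computes; the repository's actual implementation lives in C++, and the function name mc_european_call and its parameters are hypothetical.

```python
import math
import random


def mc_european_call(spot, strike, rate, vol, maturity, n_paths=100_000, seed=42):
    """Monte Carlo estimate of a European call under geometric Brownian motion.

    Returns (discounted price estimate, standard error of the estimate).
    """
    rng = random.Random(seed)  # seeded generator for reproducible runs
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)

    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        terminal = spot * math.exp(drift + diffusion * z)  # exact terminal draw
        payoff = max(terminal - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff

    mean = total / n_paths
    # Population variance of the payoffs, floored at zero against round-off.
    variance = max(total_sq / n_paths - mean * mean, 0.0)
    std_error = discount * math.sqrt(variance / n_paths)
    return discount * mean, std_error
```

The per-path loop is exactly the kind of hot path that is slow in interpreted Python, which is the motivation for keeping it in compiled code and exposing it through bindings.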

Key challenges addressed

This project tackles several non-trivial challenges:

Bridging performance and usability
Integrating a C++ pricing engine into a Python-driven research pipeline.
Data consistency and reproducibility
Designing a schema and ingestion process that supports reliable downstream calibration.
Implied volatility inversion and calibration
Implementing stable numerical inversion and robust SVI fitting under noisy market data.
System design over isolated models
Ensuring that data, models, and workflows interact cleanly as a unified system.
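
For reference, the raw SVI parameterization expresses total implied variance at log-moneyness k as w(k) = a + b (rho (k - m) + sqrt((k - m)^2 + sigma^2)). The sketch below evaluates that formula and defines a plain least-squares objective that a calibrator could minimize. The function names and the choice of an unweighted squared-error objective are assumptions for illustration; the project's calibration code may use different weights, constraints, or optimizers.

```python
import math


def svi_total_variance(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(k) at log-moneyness k."""
    return a + b * (rho * (k - m) + math.sqrt((k - m) ** 2 + sigma ** 2))


def svi_implied_vol(k, maturity, a, b, rho, m, sigma):
    """Implied volatility from total variance: sqrt(w(k) / T)."""
    w = svi_total_variance(k, a, b, rho, m, sigma)
    if w < 0.0:
        raise ValueError("negative total variance: parameters are not admissible")
    return math.sqrt(w / maturity)


def sum_squared_error(params, log_moneyness, total_variances):
    """Unweighted least-squares misfit between SVI and observed total variances."""
    a, b, rho, m, sigma = params
    return sum(
        (svi_total_variance(k, a, b, rho, m, sigma) - w) ** 2
        for k, w in zip(log_moneyness, total_variances)
    )
```

With noisy quotes, an objective like this is typically minimized under bounds such as b >= 0, |rho| < 1, and sigma > 0, so that the fitted slice stays well defined.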

Future directions

Planned improvements focus on moving further toward production-grade systems:

Arbitrage-free implied volatility surface construction
More robust calibration and smoothing techniques
Performance optimization (parallel Monte Carlo, batching)
Extension to additional data sources and APIs
Improved testing of end-to-end data and calibration pipelines
Comparison of classical stochastic models with data-driven approaches for pricing and volatility forecasting
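
One ingredient of arbitrage-free surface construction is the absence of calendar-spread arbitrage, which requires total implied variance to be non-decreasing in maturity at each fixed log-moneyness. The following sketch checks that condition on a shared log-moneyness grid. It is a hypothetical diagnostic rather than part of the current codebase; the function name and the input layout are assumptions, and it does not cover butterfly arbitrage within a single slice.

```python
def calendar_arbitrage_violations(slices, tol=1e-12):
    """Find grid points where total variance decreases with maturity.

    slices: list of (maturity, total_variances) pairs, where every
    total_variances list is sampled on the same log-moneyness grid.
    Returns a list of (shorter_maturity, longer_maturity, grid_index).
    """
    ordered = sorted(slices, key=lambda s: s[0])
    violations = []
    # Compare each slice with the next-longer maturity.
    for (t_short, w_short), (t_long, w_long) in zip(ordered, ordered[1:]):
        for i, (w0, w1) in enumerate(zip(w_short, w_long)):
            if w1 < w0 - tol:
                violations.append((t_short, t_long, i))
    return violations
```

An empty result means no calendar violations were detected on the sampled grid; it does not prove the whole surface is arbitrage-free.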

What is included

cpp/: core C++ pricing library (Monte Carlo + Black-Scholes closed form), DB ingestion hooks, and pybind11 bindings.
qengine/: Python package exposing the native extension (import qengine).
src/ImpliedVolatility/: SVI calibration and implied-volatility tooling.
src/data/: data ingestion, SQL schema, and analytics helpers.
tests/: C++ unit tests (GoogleTest).
scripts/: operational scripts, including PostgreSQL setup.
docs/: Doxygen configuration and generated API docs (ignored in git for publication).

Quickstart

1) Clone and create a Python environment

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
pip install pandas yfinance sqlalchemy psycopg2-binary matplotlib scipy

2) Configure environment variables

cp .env.example .env

Then edit .env with your local database credentials.

3) Create database and schema

Use the idempotent setup script:

source .env
python scripts/setup_postgres.py

This script creates/updates:

database role (DB_USER)
database (DB_NAME)
tables/indexes from src/data/sql/schema.sql

4) Build C++ extension and run tests

cmake -S . -B build
cmake --build build -j
ctest --test-dir build --output-on-failure

5) Run Yahoo options ingestion

source .env
python src/data/ingestion/ingest_yahoo_options.py

PIPELINE_SYMBOLS in .env controls which symbols are ingested (comma-separated, e.g. SPY,AAPL,QQQ).
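
A comma-separated variable such as PIPELINE_SYMBOLS could be read along the following lines. This is a sketch only: the helper names, the upper-casing, the de-duplication, and the "SPY" fallback are assumptions, and the actual ingestion script may parse the value differently.

```python
import os


def parse_symbols(raw):
    """Split a comma-separated symbol list, trimming blanks and duplicates."""
    symbols = []
    for token in raw.split(","):
        symbol = token.strip().upper()
        if symbol and symbol not in symbols:
            symbols.append(symbol)  # keep first-seen order
    return symbols


def symbols_from_env(default="SPY"):
    """Read PIPELINE_SYMBOLS from the environment, with a hypothetical default."""
    return parse_symbols(os.environ.get("PIPELINE_SYMBOLS", default))
```

For example, parse_symbols("SPY, aapl,,QQQ") yields ["SPY", "AAPL", "QQQ"].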

Generating C++ API docs

cmake --build build --target docs

📚 Further Analysis

A more detailed discussion of numerical stability, implied volatility inversion, and calibration challenges is available here:

👉 Project blog

This includes deeper analysis of:

implied volatility instability from raw market data
calibration challenges under noisy inputs
numerical experiments and diagnostics (see in particular Observations and further analysis)