ddoebel 0174a9b38c
Some checks failed
C++ CI / build (push) Has been cancelled
Update README.md
Add references to further analysis
2026-04-02 23:02:39 +00:00
2026-03-03 21:52:20 +00:00
2026-04-02 23:02:39 +00:00

Option Pricing Engine with Market Data Pipeline

📌 Project Description

This repository implements a production-style quantitative valuation pipeline for equity options, combining high-performance pricing models with a full data and calibration workflow.

The system goes beyond a standalone pricer: it integrates market data ingestion, structured storage, numerical pricing, and volatility surface calibration into a single reproducible framework.

The goal of this project

The goal of this project is to serve as a modular foundation for quantitative modeling and experimentation in option pricing and financial time series.

Rather than implementing a single model, the system is designed to support:

  • benchmarking different pricing approaches (analytical, simulation-based, and data-driven),
  • comparing numerical methods under realistic market data conditions,
  • and extending toward more advanced workflows such as statistical learning and model calibration.

A key objective is to create an environment where new ideas from research can be implemented, tested, and evaluated within a consistent pipeline, rather than in isolated scripts or notebooks.

This includes:

  • integrating alternative pricing methodologies into a shared framework,
  • analyzing model behavior across time and market regimes,
  • and building reproducible pipelines for both numerical and data-driven approaches.

Ultimately, the project aims to bridge:

  • theoretical models (e.g. stochastic processes, volatility parameterizations),
  • numerical methods (simulation, calibration),
  • and data-driven techniques (time-series analysis, machine learning),

within a single, extensible system. Moving closer to a production-grade pipeline.

What the system does

The system supports the following workflow:

  • Ingest listed option market data (Yahoo Finance)
  • Normalize and store it in a relational database (PostgreSQL)
  • Compute implied volatilities from observed prices
  • Calibrate parametric volatility surfaces (SVI)
  • Run pricing models (Black-Scholes, Monte Carlo)
  • Expose fast pricing routines via Python for analysis and research

This project aims to unify these components into a coherent system, with clear interfaces between:

  • Data layer (ingestion, storage, schema)
  • Model layer (C++ pricing engines)
  • Analytics layer (Python calibration and diagnostics)
  • Execution layer (reproducible pipelines)

Technology choices

The architecture deliberately combines multiple technologies, each chosen for a specific role:

  • C++ (C++20)
    Used for performance-critical pricing components (Monte Carlo, closed-form models) and clean domain modeling.

  • Python
    Used for orchestration, data processing, calibration (SVI), and rapid experimentation.

  • pybind11
    Bridges C++ and Python, enabling high-performance models to be used in flexible workflows.

  • PostgreSQL + SQLAlchemy
    Provides structured, queryable storage for market data and supports reproducible calibration pipelines.


Key challenges addressed

This project tackles several non-trivial challenges:

  • Bridging performance and usability
    Integrating a C++ pricing engine into a Python-driven research pipeline.

  • Data consistency and reproducibility
    Designing a schema and ingestion process that supports reliable downstream calibration.

  • Implied volatility inversion and calibration
    Implementing stable numerical inversion and robust SVI fitting under noisy market data.

  • System design over isolated models
    Ensuring that data, models, and workflows interact cleanly as a unified system.


Future directions

Planned improvements focus on moving further toward production-grade systems:

  • Arbitrage-free implied volatility surface construction
  • More robust calibration and smoothing techniques
  • Performance optimization (parallel Monte Carlo, batching)
  • Extension to additional data sources and APIs
  • Improved testing of end-to-end data and calibration pipelines
  • comparing classical stochastic models vs data-driven approaches for pricing or volatility forecasting

What is included

  • cpp/: core C++ pricing library (Monte Carlo + Black-Scholes closed form), DB ingestion hooks, and pybind bindings.
  • qengine/: Python package exposing the native extension (import qengine).
  • src/ImpliedVolatility/: SVI calibration and implied-volatility tooling.
  • src/data/: data ingestion, SQL schema, and analytics helpers.
  • tests/: C++ unit tests (GoogleTest).
  • scripts/: operational scripts, including PostgreSQL setup.
  • docs/: Doxygen configuration and generated API docs (ignored in git for publication).

Quickstart

1) Clone and create a Python environment

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
pip install pandas yfinance sqlalchemy psycopg2-binary matplotlib scipy

2) Configure environment variables

cp .env.example .env

Then edit .env with your local database credentials.

3) Create database and schema

Use the idempotent setup script:

source .env
python scripts/setup_postgres.py

This script creates/updates:

  • database role (DB_USER)
  • database (DB_NAME)
  • tables/indexes from src/data/sql/schema.sql

4) Build C++ extension and run tests

cmake -S . -B build
cmake --build build -j
ctest --test-dir build --output-on-failure

5) Run Yahoo options ingestion

source .env
python src/data/ingestion/ingest_yahoo_options.py

PIPELINE_SYMBOLS in .env controls which symbols are ingested (comma-separated, e.g. SPY,AAPL,QQQ).

Generating C++ API docs

cmake --build build --target docs

📚 Further Analysis

A more detailed discussion of numerial stability, implied volatility inversion, and calibration challenges is available here

👉 Project blog

This includes deeper analysis of:

  • implied volatility instability from raw market data
  • calibration challenges under noisy inputs
  • numerical experiments and diagnostics (see in particular Observations and further analysis)
Description
Monte Carlo pricing of European options under Black–Scholes
Readme MIT 1.1 MiB
Languages
Python 67.8%
C++ 28.9%
CMake 3.3%