
Kronos treats financial candlestick sequences as a learnable "language". A dedicated tokenizer first quantizes continuous, multi-dimensional OHLCV series into hierarchical discrete tokens; a decoder-only autoregressive Transformer is then pretrained over those token sequences to unify forecasting, generation, and downstream quant tasks. Model weights and tokenizers can be pulled from Hugging Face, and a Predictor interface packages normalization, truncation, sampling, and inverse transforms into a reusable pipeline. When you need a domain fit for a specific asset universe or frequency, you can structure data and backtests with Qlib and run two-stage finetuning (tokenizer, then predictor) via torchrun, keeping training and evaluation reproducible and regression-testable.
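To make the "candles as tokens" idea concrete, here is a minimal sketch of discretizing OHLCV into a small token vocabulary and inverting it back. This is not the actual Kronos tokenizer (which is hierarchical and learned); the function names, bin count, and clipping range are all illustrative assumptions.

```python
import numpy as np

def quantize_ohlcv(x: np.ndarray, n_bins: int = 256, clip: float = 4.0):
    """Toy stand-in for a learned tokenizer (NOT the Kronos tokenizer):
    z-score normalize each OHLCV channel, clip the tails, and uniformly
    bucket values into n_bins discrete token ids a language model could eat.
    x: (T, 5) array of OHLCV. Returns token ids (T, 5) plus the stats
    needed for the inverse transform."""
    mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8
    z = np.clip((x - mean) / std, -clip, clip)
    tokens = np.round((z + clip) / (2 * clip) * (n_bins - 1)).astype(int)
    return tokens, (mean, std)

def dequantize(tokens: np.ndarray, stats, n_bins: int = 256, clip: float = 4.0):
    """Inverse transform: token ids back to approximate OHLCV values."""
    mean, std = stats
    z = tokens / (n_bins - 1) * (2 * clip) - clip
    return z * std + mean

# Round-trip on a fake OHLCV block: quantization error stays small.
rng = np.random.default_rng(0)
x = np.abs(rng.normal(100.0, 5.0, size=(64, 5)))
tok, stats = quantize_ohlcv(x)
x_hat = dequantize(tok, stats)
err = float(np.abs(x - x_hat).max())
```

Once values live in a finite vocabulary like this, next-token prediction with a decoder-only Transformer applies directly, which is what makes the pretraining stage uniform across assets and frequencies.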
| ✕ Traditional Pain Points | ✓ Innovative Solutions |
|---|---|
| Feeding raw financial series into general models often fails under noise and scale shifts; assumptions drift quickly across markets and frequencies. | Kronos uses a two-stage discrete tokenizer plus autoregressive pretraining to convert continuous OHLCV into a learnable token language that is more stable and transferable. |
| Classic time-series pipelines scatter bucketing, normalization, sampling, and evaluation across ad-hoc scripts, making experiments hard to reproduce and share. | A Predictor interface and finetuning scripts harden training/inference/evaluation into configurable pipelines for A/B comparisons and regression tests. |
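The reproducibility claim above boils down to one pattern: put every knob in a single config object and make one `predict()` own truncation, sampling, and the inverse transform, so two runs with equal configs are directly comparable. The sketch below is hypothetical, none of these class or field names come from the Kronos codebase, and a seeded random walk stands in for the real model.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class PredictorConfig:
    max_context: int = 512     # truncate history to this many steps
    pred_len: int = 16         # autoregressive horizon
    temperature: float = 1.0   # sampling temperature
    seed: int = 0              # fixed seed -> reproducible A/B runs

class ToyPredictor:
    """Illustrative pipeline wrapper (NOT the Kronos Predictor API):
    normalization, truncation, sampling, and inverse transform in one place."""

    def __init__(self, cfg: PredictorConfig):
        self.cfg = cfg

    def predict(self, history: np.ndarray) -> np.ndarray:
        cfg = self.cfg
        rng = np.random.default_rng(cfg.seed)
        ctx = history[-cfg.max_context:]          # truncation
        mean, std = ctx.mean(), ctx.std() + 1e-8
        z = (ctx - mean) / std                    # normalization
        out, last = [], z[-1]
        for _ in range(cfg.pred_len):             # stand-in for the model:
            last = last + rng.normal(0.0, cfg.temperature * 0.1)
            out.append(last)
        return np.array(out) * std + mean         # inverse transform

cfg = PredictorConfig(pred_len=8)
series = np.linspace(100.0, 110.0, 1000)
preds_a = ToyPredictor(cfg).predict(series)
preds_b = ToyPredictor(cfg).predict(series)       # identical config -> identical output
```

Freezing the config and seeding the sampler is what turns "run the script again" into an actual regression test: any output drift between two identical configs is a bug, not noise.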
```bash
# 1. Clone the repository and create a virtual environment
git clone https://github.com/shiyu-coder/Kronos.git && cd Kronos
python -m venv .venv && . .venv/bin/activate

# 2. Install dependencies
pip install -U pip && pip install -r requirements.txt

# 3. Run the bundled prediction example
python examples/prediction_example.py

# 4. (Optional) Prepare finetuning data with Qlib
pip install pyqlib && python finetune/qlib_data_preprocess.py

# 5. (Optional) Two-stage finetuning: tokenizer first, then predictor
torchrun --standalone --nproc_per_node=2 finetune/train_tokenizer.py
torchrun --standalone --nproc_per_node=2 finetune/train_predictor.py
```

| Core Scene | Target Audience | Solution | Outcome |
|---|---|---|---|
| A forecasting baseline for quant research | quant researchers | model multi-asset candlesticks as token sequences and benchmark forecasts | faster iteration with reproducible evaluation and fewer ad-hoc scripts |
| Representation learning across markets | multi-venue teams | align frequencies and scales through a unified tokenizer | reduce drift-driven rework and make transfer learning operational |
| Signals plus backtests as one pipeline | strategy engineering | turn forecasts into tradable signals and run backtests | a train→infer→backtest loop that supports regression and version comparisons |
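The last row's train→infer→backtest loop can be sketched end to end: turn one-step-ahead forecasts into long/flat signals and score them against realized returns with a turnover cost. This is an illustrative toy, not the Qlib backtester; the function names, threshold, and cost figure are assumptions.

```python
import numpy as np

def forecasts_to_signals(pred_ret: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Long (1) when the predicted return clears the threshold, else flat (0)."""
    return (pred_ret > threshold).astype(int)

def backtest(signals: np.ndarray, realized_ret: np.ndarray,
             cost: float = 1e-4) -> np.ndarray:
    """Cumulative PnL of a signal: position * return, minus turnover costs.
    Toy stand-in for a real backtester such as Qlib's."""
    turnover = np.abs(np.diff(signals, prepend=0))     # position changes
    pnl = signals * realized_ret - turnover * cost
    return pnl.cumsum()

# Fake daily returns and noisy "forecasts" of them, for illustration only.
rng = np.random.default_rng(42)
realized = rng.normal(0.0, 0.01, size=250)
predicted = realized + rng.normal(0.0, 0.01, size=250)

sig = forecasts_to_signals(predicted)
curve = backtest(sig, realized)
```

Because the signal rule and cost model are plain functions, swapping in a new model checkpoint changes only `predicted`, which is exactly what makes version-to-version regression comparisons cheap.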
