
How Our AI Learns to Trade: Training an Ensemble Classifier to Predict Crypto Market Signals

Behind every BUY and SELL signal CapTradeAI generates is a machine learning pipeline that turns raw market data into calibrated predictions. Here's a technical look at how it's built, trained, and deployed.

Machine Learning · Ensemble Models · Feature Engineering · Quantitative Trading

Predicting whether a crypto asset will rise or fall is one of the hardest problems in computational finance. Our approach is not to predict price — it's to predict the right action: BUY, HOLD, or SELL. This distinction shapes everything from how we label training data to how we deploy the model in production.

The Core Problem: Framing Trading as Classification

Most people think of price prediction as a regression problem — forecast a number, act on it. In practice, regression targets are noisy, hard to threshold, and don't map cleanly to executable decisions. We model trading as a three-class classification problem instead:

BUY

The asset is expected to rise meaningfully over the next N periods. Enter a position.

HOLD

Insufficient signal. Market conditions are ambiguous — stay flat or maintain existing position.

SELL

The asset is expected to decline. Exit the position or do not enter.

This framing has a key advantage: the model learns to be selective. Rather than always producing an opinion, a well-trained classifier assigns high confidence to BUY or SELL only when the evidence genuinely supports it — and defaults to HOLD otherwise. In a live trading system, being wrong is costly. Being quiet when uncertain is not.

Labeling Historical Data Without Leakage

Supervised learning requires ground-truth labels. For a trading classifier, this means looking at each historical bar and deciding, with hindsight, whether a BUY, HOLD, or SELL would have been the correct action.

We label based on forward returns over a short horizon: if the asset's price N bars later is sufficiently higher, we label the bar BUY; if it is sufficiently lower, SELL; otherwise HOLD. The thresholds are derived from the distribution of returns in the training window, producing balanced class proportions across different market regimes.
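A minimal sketch of this labeling scheme (the function name, default horizon, and quantile cutoffs here are illustrative, not the production values):

```python
import numpy as np
import pandas as pd

def label_forward_returns(close: pd.Series, horizon: int = 12,
                          lower_q: float = 0.3, upper_q: float = 0.7) -> pd.Series:
    """Label each bar BUY / HOLD / SELL from its forward return over `horizon` bars.

    Thresholds are quantiles of the forward-return distribution in this window
    only, so they adapt to the regime and keep class proportions balanced.
    Call this on the training window alone, never on the full dataset,
    to avoid leaking test-set returns into the cutoffs.
    """
    fwd_ret = close.shift(-horizon) / close - 1.0
    lo, hi = fwd_ret.quantile([lower_q, upper_q])
    labels = pd.Series("HOLD", index=close.index)  # default: insufficient signal
    labels[fwd_ret >= hi] = "BUY"
    labels[fwd_ret <= lo] = "SELL"
    # The last `horizon` bars have no forward return (NaN) and stay HOLD;
    # in practice they are dropped from the training set.
    return labels
```

Because the forward return is consumed here and nowhere else, the label horizon never appears as an input feature.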

⚠️ The Data Leakage Problem

A critical error in many trading ML pipelines is including features derived from future data in the training set. Common examples include future prices used to compute rolling highs/lows, or label quantiles calculated over the full dataset before splitting train/test.

We explicitly audit and exclude all such features. The label horizon (future return) is only ever used to construct the target — never as an input feature. Cross-validation is done on strict temporal folds: the model never sees future data during training or validation, even indirectly.

Feature Engineering: 184 Market Signals

The model ingests 184 engineered features derived exclusively from information available at the moment of prediction. These span five broad categories:

Momentum & Price Trend

Returns over windows from 3 to 100 bars, momentum strength scores, rate-of-change, and trend consistency across multiple moving averages. These capture the direction and persistence of price movement.

Volatility Structure

Rolling return standard deviations at 10, 25, 30, 50, and 100-bar windows. Ratios between these windows — such as vol100/vol30 — capture whether volatility is expanding or contracting, which is a strong regime signal. This class of features ranks among the most important for the model.
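As an illustration, a volatility feature block along these lines can be built from rolling return standard deviations (column names here are hypothetical):

```python
import pandas as pd

def vol_ratio_features(close: pd.Series) -> pd.DataFrame:
    """Rolling return volatility at several windows, plus expansion ratios.

    A ratio like vol_100 / vol_30 below 1 means recent volatility exceeds
    the long-horizon baseline, i.e. volatility is expanding; above 1,
    it is contracting. Both are informative regime signals.
    """
    ret = close.pct_change()
    out = pd.DataFrame(index=close.index)
    for window in (10, 25, 30, 50, 100):
        out[f"vol_{window}"] = ret.rolling(window).std()
    out["vol_ratio_100_30"] = out["vol_100"] / out["vol_30"]
    out["vol_ratio_50_10"] = out["vol_50"] / out["vol_10"]
    return out
```

Note that the rolling windows only look backward, so the features are safe to compute bar-by-bar in a live session.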

Technical Indicators

RSI at multiple periods (7, 14, 28), MACD histogram and signal crossovers, Bollinger Band position and width, Stochastic %K/%D, ADX with directional movement index, Awesome Oscillator, and full Ichimoku cloud components (Tenkan-sen, Kijun-sen, Senkou Span A/B).
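Indicators like these are standard rolling constructions. For example, a Wilder-smoothed RSI takes only a few lines of pandas (an illustrative sketch, not our exact implementation):

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index: smoothed gains vs. smoothed losses, scaled 0-100.

    Uses Wilder's exponential smoothing (alpha = 1/period). Values near 100
    indicate sustained buying pressure; near 0, sustained selling pressure.
    """
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, adjust=False).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)
```

Running the same formula at periods 7, 14, and 28 yields three features that capture overbought/oversold conditions at different time scales.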

Volume & Flow

Volume ratio (current vs. 20-bar moving average), On-Balance Volume normalized by its historical range, VWAP deviation, volume-weighted momentum, and volume spike detection. Volume confirms or contradicts price moves — a BUY signal on low volume is treated very differently than one on a volume surge.

Temporal & Portfolio Context

Hour-of-day and day-of-week encoded as sine/cosine pairs (capturing cyclical market patterns), month/quarter, portfolio cash ratio, and portfolio value. These allow the model to learn that, for example, low-liquidity hours should be treated differently even with the same technical picture.
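The cyclical encoding maps each hour onto a point on the unit circle, so that 23:00 and 00:00 end up close in feature space rather than 23 units apart (the helper name is ours, for illustration):

```python
import numpy as np
import pandas as pd

def cyclical_time_features(ts: pd.Series) -> pd.DataFrame:
    """Encode hour-of-day and day-of-week as sine/cosine pairs.

    A raw integer hour makes 23 and 0 look maximally distant; the sin/cos
    pair makes them adjacent, which is what the market cycle actually is.
    """
    out = pd.DataFrame(index=ts.index)
    hour = ts.dt.hour
    dow = ts.dt.dayofweek
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    out["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    return out
```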

All 184 features are computed from data available at prediction time — no look-ahead, no future prices, no derived quantities that would be unknowable in a live session.

The Ensemble Architecture

No single model family consistently dominates on financial time series. Tree-based models handle non-linearity and feature interactions well; gradient boosting methods tend to be excellent at capturing complex patterns with limited data; others are more robust to distribution shift. We combine all of them.

The production model is a voting ensemble of four base learners:

Random Forest

Bagged decision trees trained on random feature subsets. Provides stable probability estimates and is naturally resistant to overfitting. Strong at capturing global structure in the feature space.

XGBoost

Gradient boosted trees with L1/L2 regularization. Learns sequentially from residual errors. Excellent at identifying high-precision decision boundaries for BUY and SELL signals.

CatBoost

Gradient boosting with ordered boosting to reduce target leakage during training. Particularly robust on datasets with class imbalance — important for a HOLD-dominated label distribution.

LightGBM

Leaf-wise tree growth with histogram-based binning. Highly efficient on large feature sets. Tends to find fine-grained patterns that level-wise methods miss, and trains an order of magnitude faster than comparable methods.

Each model produces a probability distribution over {SELL, HOLD, BUY}. The ensemble averages these distributions, producing a single calibrated probability for each class. The final decision is made by taking the class with the highest probability — but only if it clears a confidence threshold.

Confidence Thresholds: Only Acting When Certain

Raw model probabilities are not enough. A model can be well-calibrated on aggregate but still produce many low-confidence signals that shouldn't trigger trades. We use a separate ConfidencePredictor that is trained after the main ensemble to estimate per-class precision at different probability thresholds.

For each class (BUY, HOLD, SELL), the ConfidencePredictor finds the minimum probability threshold at which that class achieves a target precision on held-out data. These thresholds are stored in the model artifact and loaded automatically at inference time — no manual tuning needed when retraining.
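The threshold search amounts to scanning candidate thresholds and keeping the smallest one that reaches the precision target on held-out predictions (function name and defaults here are illustrative):

```python
import numpy as np

def min_threshold_for_precision(probs: np.ndarray, y_true: np.ndarray,
                                cls: int, target_precision: float = 0.6):
    """Smallest threshold at which acting on class `cls` whenever its
    predicted probability clears the threshold reaches `target_precision`
    on held-out data. Returns None if no threshold reaches the target.

    probs: (n_samples, n_classes) predicted probabilities on held-out data.
    y_true: (n_samples,) integer class labels.
    """
    for thr in np.round(np.arange(0.05, 1.00, 0.01), 2):
        mask = probs[:, cls] >= thr
        if not mask.any():
            break  # no predictions left to act on at this threshold
        if (y_true[mask] == cls).mean() >= target_precision:
            return float(thr)
    return None
```

Running this once per class on the validation folds yields the three per-class thresholds stored in the model artifact.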

# Example thresholds from a trained model
SELL threshold: 33.08%   # act on SELL if confidence ≥ 33.08%
HOLD threshold:  9.97%   # HOLD is default — low bar
BUY  threshold: 32.98%   # act on BUY if confidence ≥ 32.98%

In practice, BUY and SELL thresholds end up close to each other, reflecting the symmetric difficulty of predicting positive and negative moves. The HOLD threshold is intentionally low — when neither BUY nor SELL clears their bar, HOLD is the correct answer.

Regime-Aware Signal Gating

Financial markets are non-stationary. A model trained on bull market data will systematically underperform in sideways or bear regimes — and vice versa. We address this with a second layer of filtering: the regime gate.

Before a BUY or SELL signal is forwarded to the execution layer, the current market regime is assessed using a combination of global index trends, volatility structure, and asset-specific momentum. The signal threshold is then dynamically adjusted:

Regime   | Threshold Adjustment | Rationale
---------|----------------------|------------------------------------------------
Bull     | Base threshold       | Trending market — model signals are most reliable
Sideways | Base + 2%            | Range-bound — require slightly stronger signal
Bear     | Base + 5%            | High risk — only act on high-conviction signals

This two-stage gate — model confidence first, regime adjustment second — significantly reduces false positives without sacrificing true positives in favorable conditions. The base threshold is always derived directly from the model artifact, ensuring the regime adjustment is anchored to the model's actual calibration.
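Put together, the gate reduces to a few lines. This is a simplified sketch (names and the HOLD handling are ours); the base threshold is the per-class value loaded from the model artifact:

```python
# Regime adjustments as in the table above (assumed values, as fractions).
REGIME_ADJUSTMENT = {"bull": 0.00, "sideways": 0.02, "bear": 0.05}

def passes_regime_gate(signal: str, confidence: float,
                       base_threshold: float, regime: str) -> bool:
    """True if a BUY/SELL signal clears the regime-adjusted threshold.

    HOLD never triggers a trade, so it is never gated.
    """
    if signal == "HOLD":
        return True
    return confidence >= base_threshold + REGIME_ADJUSTMENT[regime]
```

The same signal at the same confidence can pass in a bull regime and be rejected in a sideways one, which is the intended behavior.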

Training, Validation, and Continuous Retraining

The model is trained on years of historical OHLCV data across multiple crypto assets. We use time-series cross-validation with expanding windows — each fold trains on all data up to a cutoff and validates on the period immediately following it. Standard k-fold cross-validation is not appropriate here because it allows information from the future to contaminate the training set.
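scikit-learn's TimeSeriesSplit implements exactly this expanding-window scheme: each fold trains on everything up to a cutoff and validates on the block immediately after it.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for the feature matrix, already sorted in time order.
X = np.arange(1000).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    # Every training index strictly precedes every validation index,
    # so no future information can contaminate a fold.
    assert train_idx.max() < test_idx.min()
```

By contrast, a shuffled k-fold split would place future bars in the training set of earlier folds, which is precisely the contamination described above.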

Evaluation Metrics

We evaluate on macro-averaged F1 score across all three classes. Accuracy alone is misleading when HOLD dominates the label distribution — a model that predicts HOLD 100% of the time would achieve high accuracy while being useless for trading.

  • F1 (macro): primary training objective — balances precision and recall across all three classes
  • BUY/SELL precision: secondary — low precision = too many false signals = unnecessary trades and fees
  • Temporal stability: F1 variance across CV folds — a model that works in some regimes but not others is not production-ready
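A quick demonstration of why accuracy alone misleads on a HOLD-dominated label set:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# A degenerate "model" that always predicts HOLD on labels that are 80% HOLD.
y_true = np.array(["HOLD"] * 80 + ["BUY"] * 10 + ["SELL"] * 10)
y_pred = np.array(["HOLD"] * 100)

acc = accuracy_score(y_true, y_pred)                              # 0.80: looks fine
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)  # ~0.30: useless
```

Accuracy rewards the degenerate strategy; macro F1 averages per-class F1 scores, so the zero scores on BUY and SELL drag it down and expose the model as untradeable.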

As markets evolve, older training data becomes less representative. We retrain on a rolling basis, tracking model performance over time and triggering retrains when degradation is detected. Each retraining run goes through the full pipeline: feature extraction, leakage audit, cross-validation, confidence threshold calibration, and a final out-of-sample evaluation before the model is promoted to production.

From Model Output to Executed Trade

The model's role ends at producing a probability distribution. What happens next is handled by the agent's execution layer, which applies additional checks before any order is placed:

1. Model confidence check

Signal must exceed the per-class threshold learned from training data. Signals below threshold are discarded regardless of direction.

2. Regime gate

Threshold is raised in sideways or bear conditions. The signal must clear this higher bar to proceed.

3. Multi-strategy consensus

The AI signal is one vote in a broader multi-strategy system. Volume analysis, trend following, and other signals must align before an order is placed.

4. Position and risk sizing

Order size is determined by portfolio allocation rules, not the model. A high-confidence signal doesn't mean an oversized position — risk management is always in control.

5. Profit-based trailing stop

Once a position is open, the exit is managed by a trailing stop that locks in profits as the trade moves favorably. The model is not involved in exit decisions — a separate mechanism handles that.

Why This Approach Works

The most common failure mode in trading ML is a model that looks great on backtests but breaks in production. This happens for predictable reasons: data leakage, look-ahead bias, overfitting to a specific market regime, or a mismatch between the features used in training and the features available at inference time.

Our pipeline is designed to close each of these gaps explicitly. Features are audited for leakage. Training uses strict temporal cross-validation. The confidence thresholds are calibrated on held-out data. And at inference, we verify that every feature the model was trained on is being computed with the same formula — not zero-filled or approximated in a way that would silently degrade predictions.
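The inference-time feature check can be as simple as refusing to predict when any feature the model expects is missing or NaN, rather than silently zero-filling (a minimal sketch with hypothetical names):

```python
import math

def feature_parity_check(expected: list[str], live_row: dict) -> list[str]:
    """Names of expected features that are missing or NaN in the live row.

    A non-empty result should block the prediction: a silently zero-filled
    feature degrades the model without any visible error.
    """
    bad = []
    for name in expected:
        value = live_row.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            bad.append(name)
    return bad
```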

The result is a model whose performance on held-out historical data is a reliable indicator of its live performance — which is the most important property a trading model can have.

Continuous improvement: every new model version is evaluated against the previous one on out-of-sample data before deployment. The bar to enter production is not "better than random" — it's "better than what we already have." That standard keeps the system honest.

See the AI in Action

Every signal described in this article is running live in the CapTradeAI agent — around the clock, across multiple assets.