Find Alpha Where the Market Breaks
If you're building ML models for algorithmic trading, you already know the problem:
High-frequency market data is noisy, expensive to clean, and painfully slow to label.
The MagSeven High-Freq Anomaly Dataset skips all of that.
Instead of raw OHLCV files, you get a curated collection of real market stress events: moments where volatility and liquidity spike together, which gives models a concentrated signal to learn from.
We scan minute-level market data (1m and 5m aggregation) for the Magnificent Seven tech stocks and extract only statistically significant volatility + volume shock events, ready for immediate use.
This is not a general-purpose price dataset. It is a feature-engineered anomaly corpus built specifically for ML and quantitative research.
What You Get
200+ High-Quality Anomaly Events from:
- AAPL, AMZN, GOOGL, META, MSFT, NVDA, TSLA
Strict Dual-Factor Detection Logic:
- Volume Z-Score > 5.0
- Volatility Z-Score > 2.0
- (Triggered simultaneously)
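As a rough illustration of the dual-factor rule above, the sketch below flags a bar only when both z-scores exceed their thresholds on the same bar. The two thresholds come from this page; the 20-bar trailing window, the use of the bar range (high minus low) as the volatility proxy, and the function names are assumptions made for the example, not the dataset's actual pipeline.

```python
from statistics import mean, stdev

VOLUME_Z_THRESHOLD = 5.0      # threshold stated in the dataset description
VOLATILITY_Z_THRESHOLD = 2.0  # threshold stated in the dataset description


def trailing_zscore(values, index, window):
    """Z-score of values[index] against the `window` bars that precede it."""
    history = values[index - window:index]
    spread = stdev(history)
    if spread == 0:
        return 0.0  # flat history: report "no signal" instead of dividing by zero
    return (values[index] - mean(history)) / spread


def detect_events(volumes, ranges, window=20):
    """Return bar indices where both z-scores exceed their thresholds at once.

    `ranges` is an assumed per-bar volatility proxy (for example high - low),
    and `window` is an assumed lookback length.
    """
    if window < 2:
        raise ValueError("window must cover at least two bars")
    events = []
    for i in range(window, len(volumes)):
        volume_z = trailing_zscore(volumes, i, window)
        volatility_z = trailing_zscore(ranges, i, window)
        if volume_z > VOLUME_Z_THRESHOLD and volatility_z > VOLATILITY_Z_THRESHOLD:
            events.append(i)
    return events
```

Requiring both conditions on the same bar is what separates a combined shock from a plain volume spike or a quiet price jump.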
ML-Ready Feature Engineering:
- Normalized price and volume (relative to event window start)
- Velocity and acceleration factors
- Money flow and volume momentum
- Bar intensity and market phase labels
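To make the first two feature families concrete, here is a minimal sketch under stated assumptions: normalization is taken as the ratio to the first bar in the event window, velocity as the bar-to-bar change in normalized price, and acceleration as the change in velocity. These definitions and the function names are illustrative guesses, not the dataset's documented formulas.

```python
def normalize_to_start(series):
    """Express each value as a ratio to the first value in the window."""
    base = series[0]
    if base == 0:
        raise ValueError("window starts at zero; cannot normalize")
    return [value / base for value in series]


def first_difference(series):
    """Bar-to-bar change; the first bar has no predecessor, so it gets 0.0."""
    return [0.0] + [curr - prev for prev, curr in zip(series, series[1:])]


def window_features(closes, volumes):
    """Build per-bar features for one event window (assumed definitions)."""
    norm_price = normalize_to_start(closes)
    velocity = first_difference(norm_price)     # change in normalized price
    acceleration = first_difference(velocity)   # change in velocity
    return {
        "norm_price": norm_price,
        "norm_volume": normalize_to_start(volumes),
        "velocity": velocity,
        "acceleration": acceleration,
    }
```

Anchoring every window at 1.0 puts events from differently priced stocks on a comparable scale.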
Forward-Looking Outcome Labels
Each event includes:
- future_return_30m
- max_upside_30m
- max_drawdown_30m
- Directional classification: BULLISH or BEARISH
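One plausible way to derive labels like these from the bars that follow an event is sketched below. It assumes each label is a simple return relative to the price at the event bar, computed over the next 30 one-minute closes, and that direction follows the sign of the final return; the dataset's exact definitions may differ, and `outcome_labels` is a hypothetical helper.

```python
def outcome_labels(entry_price, forward_closes):
    """Label an event from the closes of the bars that follow it.

    `forward_closes` is assumed to hold the closes for the 30 minutes
    after the event bar, oldest first.
    """
    if not forward_closes:
        raise ValueError("need at least one forward bar")
    # Simple return of each forward close relative to the event price.
    returns = [close / entry_price - 1.0 for close in forward_closes]
    future_return = returns[-1]
    return {
        "future_return_30m": future_return,
        "max_upside_30m": max(returns),    # best return seen inside the horizon
        "max_drawdown_30m": min(returns),  # worst return seen inside the horizon
        "direction": "BULLISH" if future_return > 0 else "BEARISH",
    }
```

Under these assumptions `max_upside_30m` can be negative when price never trades above the event level, so check how the real fields are bounded before relying on their sign.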
Multi-Scale Context Windows
- 1-minute microstructure context
- 5-minute short-term trend context
Clean JSONL Format
Stream directly into Python, Pandas, PyTorch, or TensorFlow dataloaders, with no preprocessing required.
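Because JSONL stores one JSON object per line, events can be read lazily with the standard library alone. The sketch below uses an in-memory buffer as a stand-in for the real file, and the `symbol` and `direction` field names are assumptions made for the example; check them against the preview before use.

```python
import io
import json


def iter_events(lines):
    """Yield one dict per non-empty JSONL line, without loading the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an opened dataset file; the field names here are illustrative only.
sample = io.StringIO(
    '{"symbol": "NVDA", "direction": "BULLISH"}\n'
    '{"symbol": "TSLA", "direction": "BEARISH"}\n'
)

events = list(iter_events(sample))
bullish_symbols = [event["symbol"] for event in events if event["direction"] == "BULLISH"]
```

The same generator accepts any iterable of lines, so an opened file handle can be passed in place of `sample`.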
Common Use Cases
- Anomaly Detection: Train autoencoders or contrastive models on normal vs. anomaly regimes.
- Directional Prediction: Supervised Buy / Sell / Hold classification focused only on high-impact events.
- Market Microstructure Research: Study liquidity shocks, volatility clustering, and intraday regime changes.
- Execution and Slippage Stress Testing: Evaluate strategies under extreme but realistic market conditions.
Start Modeling Immediately
Get the full dataset with Forward-Looking Outcome Labels and Multi-Scale Context Windows.
Or preview the data structure:
Contact & Support
If you have any questions about this dataset, licensing, or access to the full version, feel free to reach out:
Email: [email protected]
Please note that this email is intended for dataset-related inquiries only. We aim to respond within 1-2 business days.