Angie's interpretation of https://www.twosigma.com/articles/why-human-intuition-is-essential-in-machine-learning/
| Situation | Example Case | Warning Signs 🚨 | Human Intuition Solution | Takeaway |
| ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| When a problem is (definitely) not a machine learning problem | Housing price predictions across geographic areas (NYC vs. Albany) | • Data violates I.I.D. assumptions<br>• Economic contexts are fundamentally different | • Identify ML unsuitability<br>• Seek alternative solutions | 🧠🤜<br>understand limitations AND imagine alternative approaches |
| When a problem is (probably) not a machine learning problem | • (unseen situation) Predicting elections and natural disasters<br>• (low effective size of data) NYC housing boilers (1.4M measurements from 100 units)<br>• (low signal in data) Crop loss prediction in Malawi (homogeneous annual weather) with satellite image data<br>• Financial market movements | • Unprecedented events with no historical parallels<br>• Effective dataset much smaller than it appears<br>• Little signal in the available features | • Recognize data limitations<br>• Switch to expert-based systems<br><br>• Identify cases requiring human expertise<br>• Determine true data granularity | |
| When you need to find nails but all you have is a hammer | Predicting customer "wallet" (total sales opportunity) | • Ground truth unavailable<br>• Traditional metrics ineffective | • Reframe as quantile prediction<br>• Adapt algorithm approach | |
| When you need to decide which "wrong" problem is the right one to solve | Using night luminosity to measure economic development | • Direct economic data lacking<br>• Need for innovative measurement | • Identify data-rich proxies<br>• Apply transfer learning | |
| When the "right" model is predicting the wrong things | Mobile app ad clicks (flashlight apps) | • High accidental clicks<br>• No intent correlation | • Question optimization targets<br>• Apply business logic | |
| When a model's predictions might be biased | Airport customer prediction (targeting workers vs travelers) | • Easy but unfair predictions<br>• Systematic targeting bias | • Monitor prediction fairness<br>• Design bias controls | |
| When your model is too good to be true | Ad-tech fraud detection (sudden performance increase) | • Unexpected performance jump<br>• Too-perfect patterns | • Apply domain skepticism<br>• Investigate anomalies | |
| When you might be missing something | P2P lending default prediction | • Information leakage<br>• Hidden temporal bias | • Identify collection bias<br>• Structure temporal validation | |
| When it's unclear whether—or how much—your model can generalize | Medical diagnosis across different facilities | • Facility-specific patterns<br>• Equipment calibration differences | • Assess true generalization<br>• Design context controls | |
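
For the boiler and multi-facility rows, a sketch of grouped evaluation, assuming each row carries a unit or facility ID. Holding out whole groups keeps readings from the same boiler (or patients from the same hospital) out of both training and test at once, and counting distinct groups gives a rough sense of how much independent data there really is. The function names and toy IDs are hypothetical; scikit-learn's `LeaveOneGroupOut` and `GroupKFold` implement the same splitting idea.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per group.

    groups[i] is the (hypothetical) unit/facility ID of row i. Every row of
    the held-out group goes to the test set, so no group is seen in both
    training and evaluation.
    """
    rows_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        rows_by_group[g].append(i)
    for held_out in sorted(rows_by_group):
        test_idx = rows_by_group[held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


def count_independent_units(groups):
    """Number of distinct groups.

    When rows within a group are strongly correlated, n rows from k units
    carry roughly as much information as k independent rows, so this is
    closer to the effective sample size than the raw row count.
    """
    return len(set(groups))


# Toy example: 5 readings that come from only 2 boilers.
groups = ["boiler_a", "boiler_a", "boiler_a", "boiler_b", "boiler_b"]
for held_out, train_idx, test_idx in leave_one_group_out(groups):
    print(held_out, train_idx, test_idx)
# boiler_a [3, 4] [0, 1, 2]
# boiler_b [0, 1, 2] [3, 4]
print(count_independent_units(groups))  # 2
```

By the same logic, 1.4M measurements from 100 units behave more like 100 samples than 1.4M, which is why a random row-level split can look deceptively good.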
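
For the wallet row, a minimal sketch of what "reframe as quantile prediction" can mean: observed sales only lower-bound a customer's wallet, so instead of fitting the mean, one can fit a high quantile of spend using the pinball (quantile) loss. The helper names, toy spend values, and quantile levels are illustrative assumptions, not the article's implementation.

```python
def pinball_loss(y_true, y_pred, q):
    """Average quantile (pinball) loss at quantile level q in (0, 1)."""
    if not 0 < q < 1:
        raise ValueError("q must be in (0, 1)")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-predictions cost q per unit; over-predictions cost (1 - q) per unit.
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


def best_constant_quantile(values, q):
    """Constant prediction minimizing pinball loss (an empirical q-quantile).

    Searching over the observed values keeps the sketch simple; it is
    O(n^2) and only meant for illustration.
    """
    return min(values, key=lambda c: pinball_loss(values, [c] * len(values), q))


spend = [10, 20, 30, 40, 100]  # hypothetical observed spend per customer
print(best_constant_quantile(spend, 0.5))  # 30 (the median)
print(best_constant_quantile(spend, 0.9))  # 100
```

Raising `q` pushes the estimate toward the top of observed spend, which is the intuition behind treating the wallet as a high quantile. In a real model the same loss can drive a regressor, e.g. scikit-learn's `GradientBoostingRegressor(loss="quantile", alpha=0.9)`.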
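
For the airport row, a sketch of a simple fairness monitor, assuming you can tag who is being targeted (here, hypothetical "worker" vs. "traveler" labels). It compares the share of positive predictions per group and reports the largest gap. This is demographic-parity-style monitoring, only one of several possible fairness checks, and the names and toy data are illustrative.

```python
from collections import Counter


def selection_rates(predictions, groups):
    """Share of positive predictions (e.g., 'show the ad') within each group."""
    totals = Counter(groups)
    positives = Counter(g for p, g in zip(predictions, groups) if p)
    return {g: positives[g] / totals[g] for g in totals}


def max_rate_gap(rates):
    """Largest selection-rate difference between any two groups."""
    return max(rates.values()) - min(rates.values())


preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["worker"] * 4 + ["traveler"] * 4
rates = selection_rates(preds, groups)
print(rates)                # {'worker': 0.75, 'traveler': 0.25}
print(max_rate_gap(rates))  # 0.5
```

A large gap does not prove the model is wrong, but it is the cue to ask whether the easy-to-predict group is actually the audience you meant to reach.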
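
For the P2P lending row, a sketch of a temporal split, assuming each loan record has an issue date. Training only on loans issued before a cutoff and testing on loans issued after it mimics deployment and exposes time-dependent drift that a random split would hide. The record fields and cutoff date are hypothetical.

```python
from datetime import date


def temporal_split(records, cutoff, date_key="issue_date"):
    """Split records so every training record precedes every test record."""
    train = [r for r in records if r[date_key] < cutoff]
    test = [r for r in records if r[date_key] >= cutoff]
    return train, test


# Hypothetical loan records; only the issue date matters for the split.
loans = [
    {"id": 1, "issue_date": date(2015, 3, 1)},
    {"id": 2, "issue_date": date(2016, 7, 1)},
    {"id": 3, "issue_date": date(2017, 1, 15)},
]
train, test = temporal_split(loans, cutoff=date(2016, 6, 1))
print([r["id"] for r in train], [r["id"] for r in test])  # [1] [2, 3]
```

One possible source of hidden temporal bias to check alongside the split: recently issued loans may not have had time to default yet, so their labels can look healthier than they will turn out to be.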
### synthesizing with [[📝product-process]]
| Level vs Nature | **Individual** 👤<br>Single model/problem focus | **Population** 👥<br>*Multi-agent/systemic view* |
| ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Fixed Planning (HOW) 🧭<br>*Just figuring out how things work* | #3 🔨 "Hammer-Nail"<br>*"How can I make this tool work?"*<br><br>#5 ⚠️ "Wrong Predictions"<br>*"How is my model misbehaving?"*<br><br>#8 🕵️ "Missing Data Patterns"<br>*"How is my data incomplete?"* | #2 📊 "Probably Not ML"<br>*"How sufficient is our collective data?"*<br><br>#7 🎯 "Too Good Performance"<br>*"How are these results suspiciously perfect?"* |
| Inverse Planning (HOW + WHY) 🗺️<br>*Understanding both method & purpose* | #1 🚫 "Definitely Not ML"<br>*"Should we even use ML here?"*<br><br>#4 👺 "Wrong Problem Selection"<br>*"What's the real problem to solve?"* | #6 ⚖️ "Bias Detection"<br>*"How & why are groups affected differently?"*<br><br>#7 🤔 "Too Good Performance"<br>*"Why is this suspiciously perfect?"*<br><br>#9 🌐 "Generalization"<br>*"How & why does this work across contexts?"* |
### HFT vs. traditional trading firms
| Aspect | High-Frequency Trading (HFT) Firms | Traditional Trading Firms |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| **Technology Infrastructure** | • Cutting-edge hardware and software<br>• Co-location services near exchanges<br>• Ultra-low latency networks<br>• Custom-built trading systems | • Standard trading platforms<br>• Regular market data feeds<br>• Conventional IT infrastructure<br>• Often third-party systems |
| **Trading Approach** | • Thousands of trades per second<br>• Hold positions for milliseconds to seconds<br>• Focus on tiny price discrepancies<br>• Fully automated systems | • Lower trading frequency<br>• Hold positions for days to months<br>• Focus on larger price movements<br>• Mix of human and automated trading |
| **Business Model** | • Tiny profits per trade, huge volume<br>• Market making and arbitrage<br>• Extremely latency-sensitive<br>• Focus on liquid markets | • Larger profit per trade<br>• Multiple revenue streams<br>• Advisory and research services<br>• Can trade less liquid markets |
| **Risk Management** | • Real-time automated risk checks<br>• Typically flat by end of day (no overnight positions)<br>• Focus on technical risks<br>• Automated circuit breakers | • Traditional risk metrics<br>• Can hold overnight positions<br>• Market and credit risk focus<br>• Human oversight of positions |
| **Personnel** | • Quantitative developers<br>• Software engineers<br>• Network specialists<br>• Small, technical teams | • Traders<br>• Research analysts<br>• Sales teams<br>• Portfolio managers<br>• Larger, diverse teams |
| **Capital Requirements** | • High initial tech investment<br>• Lower trading capital needs<br>• Focus on operational costs | • Lower tech investment<br>• Higher trading capital needs<br>• Focus on position sizes |
| **Market Impact** | • Provides market liquidity<br>• Narrows bid-ask spreads<br>• Increases market efficiency | • Large orders can move markets<br>• Contributes to price discovery<br>• Influences longer-term trends |

[[eg(a2s)_asssa.png]]
[[📝product-process]]
1. Synthesize the Yichen and Marc conversations into a rewrite
2. Send email to cv about Two Sigma (scientific understanding of intelligence)
3. Send email to cvs with a table summarizing the quest for intelligence