angie's interpretation of https://www.twosigma.com/articles/why-human-intuition-is-essential-in-machine-learning/

| Situation | Example Case | Warning Signs 🚨 | Human Intuition Solution | |
| --- | --- | --- | --- | --- |
| When a problem is (definitely) not a machine learning problem | Housing price predictions across geographic areas (NYC vs Albany) | • Data violates I.I.D. assumptions<br>• Economic contexts fundamentally different | • Identify ML unsuitability<br>• Seek alternative solutions | 🧠🤜<br>understand limitations AND imagine alternative approaches |
| When a problem is (probably) not a machine learning problem | • (unseen situation) Elections and natural disasters prediction<br>• (low effective size of data) NYC housing boilers (1.4M measurements from 100 units)<br>• (low signal in data) Crop loss prediction in Malawi (homogeneous annual weather) with satellite image data<br>• Financial market movements | • Unprecedented events with no historical parallels<br>• Effective dataset much smaller than it appears | • Recognize data limitations<br>• Switch to expert-based systems<br>• Identify cases requiring human expertise<br>• Determine true data granularity | |
| When you need to find nails but all you have is a hammer | Predicting customer "wallet" (total sales opportunity) | • Ground truth unavailable<br>• Traditional metrics ineffective | • Reframe as quantile prediction<br>• Adapt algorithm approach | |
| When you need to decide which "wrong" problem is the right one to solve | Using night luminosity to measure economic development | • Direct economic data lacking<br>• Need for innovative measurement | • Identify data-rich proxies<br>• Apply transfer learning | |
| When the "right" model is predicting the wrong things | Mobile app ad clicks (flashlight apps) | • High accidental clicks<br>• No intent correlation | • Question optimization targets<br>• Apply business logic | |
| When a model's predictions might be biased | Airport customer prediction (targeting workers vs travelers) | • Easy but unfair predictions<br>• Systematic targeting bias | • Monitor prediction fairness<br>• Design bias controls | |
| When your model is too good to be true | Ad-tech fraud detection (sudden performance increase) | • Unexpected performance jump<br>• Too-perfect patterns | • Apply domain skepticism<br>• Investigate anomalies | |
| When you might be missing something | P2P lending default prediction | • Information leakage<br>• Hidden temporal bias | • Identify collection bias<br>• Structure temporal validation | |
| When it's unclear whether, or how much, your model can generalize | Medical diagnosis across different facilities | • Facility-specific patterns<br>• Equipment calibration differences | • Assess true generalization<br>• Design context controls | |

### synthesizing with [[📝product-process]]

(#n = row n of the table above)

| Level vs Nature | **Individual** 👤<br>*Single model/problem focus* | **Population** 👥<br>*Multi-agent/systemic view* |
| --- | --- | --- |
| Fixed Planning (HOW) 🧭<br>*Just figuring out how things work* | #3 🔨 "Hammer-Nail"<br>*"How can I make this tool work?"*<br><br>#5 ⚠️ "Wrong Predictions"<br>*"How is my model misbehaving?"*<br><br>#8 🕵️ "Missing Data Patterns"<br>*"How is my data incomplete?"* | #2 📊 "Probably Not ML"<br>*"How sufficient is our collective data?"*<br><br>#7 🎯 "Too Good Performance"<br>*"How are these results suspiciously perfect?"* |
| Inverse Planning (HOW + WHY) 🗺️<br>*Understanding both method & purpose* | #1 🚫 "Definitely Not ML"<br>*"Should we even use ML here?"*<br><br>#4 👺 "Wrong Problem Selection"<br>*"What's the real problem to solve?"* | #6 ⚖️ "Bias Detection"<br>*"How & why are groups affected differently?"*<br><br>#7 🤔 "Too Good Performance"<br>*"Why is this suspiciously perfect?"*<br><br>#9 🌐 "Generalization"<br>*"How & why does this work across contexts?"* |

| Aspect | High-Frequency Trading (HFT) Firms | Traditional Trading Firms |
| --- | --- | --- |
| **Technology Infrastructure** | • Cutting-edge hardware and software<br>• Co-location services near exchanges<br>• Ultra-low latency networks<br>• Custom-built trading systems | • Standard trading platforms<br>• Regular market data feeds<br>• Conventional IT infrastructure<br>• Often third-party systems |
| **Trading Approach** | • Thousands of trades per second<br>• Hold positions for seconds/milliseconds<br>• Focus on tiny price discrepancies<br>• Fully automated systems | • Lower trading frequency<br>• Hold positions for days/months<br>• Focus on larger price movements<br>• Mix of human and automated trading |
| **Business Model** | • Tiny profits per trade, huge volume<br>• Market making and arbitrage<br>• Extremely latency-sensitive<br>• Focus on liquid markets | • Larger profit per trade<br>• Multiple revenue streams<br>• Advisory and research services<br>• Can trade less liquid markets |
| **Risk Management** | • Real-time automated risk checks<br>• Zero overnight positions<br>• Focus on technical risks<br>• Automated circuit breakers | • Traditional risk metrics<br>• Can hold overnight positions<br>• Market and credit risk focus<br>• Human oversight of positions |
| **Personnel** | • Quantitative developers<br>• Software engineers<br>• Network specialists<br>• Small, technical teams | • Traders<br>• Research analysts<br>• Sales teams<br>• Portfolio managers<br>• Larger, diverse teams |
| **Capital Requirements** | • High initial tech investment<br>• Lower trading capital needs<br>• Focus on operational costs | • Lower tech investment<br>• Higher trading capital needs<br>• Focus on position sizes |
| **Market Impact** | • Provides market liquidity<br>• Reduces bid-ask spreads<br>• Increases market efficiency | • Can move markets<br>• Creates price discovery<br>• Influences longer-term trends |

[[eg(a2s)_asssa.png]]

[[📝product-process]]

1. synthesize yichen, marc conv to re-write
2. send mail to cv on two sigma (scientific understanding of intelligence)
3. send mail to cvs with a table summarizing quest for intelligence
4.
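
note to self on the "effective dataset much smaller than it appears" warning sign (boilers: 1.4M measurements from 100 units): one way to make this concrete might be to split validation data by unit instead of by row, so no unit appears on both sides. A minimal sketch under assumptions: the helper names (`group_split`, `naive_split`, `shared_units`) and the toy sizes are made up for illustration, and this is not the method described in the Two Sigma article.

```python
import random


def group_split(unit_ids, test_fraction=0.2, seed=0):
    """Split row indices so each unit is entirely in train or entirely in test."""
    units = sorted(set(unit_ids))
    random.Random(seed).shuffle(units)
    n_test_units = max(1, round(len(units) * test_fraction))
    test_units = set(units[:n_test_units])
    train_idx = [i for i, u in enumerate(unit_ids) if u not in test_units]
    test_idx = [i for i, u in enumerate(unit_ids) if u in test_units]
    return train_idx, test_idx


def naive_split(n_rows, test_fraction=0.2, seed=0):
    """Row-level random split that ignores which unit a row came from."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test_rows = round(n_rows * test_fraction)
    return idx[n_test_rows:], idx[:n_test_rows]


def shared_units(unit_ids, train_idx, test_idx):
    """Units that contribute rows to both train and test (a leakage signal)."""
    return {unit_ids[i] for i in train_idx} & {unit_ids[i] for i in test_idx}


# Toy stand-in for the boiler data: 100 units, 50 readings each (hypothetical sizes).
unit_ids = [unit for unit in range(100) for _ in range(50)]

g_train, g_test = group_split(unit_ids)
r_train, r_test = naive_split(len(unit_ids))

print("units shared, grouped split:", len(shared_units(unit_ids, g_train, g_test)))
print("units shared, row-level split:", len(shared_units(unit_ids, r_train, r_test)))
```

With the row-level split, units show up on both sides, so a test score can look better than it would on a unit the model has never seen. The grouped split treats the data as roughly 100 independent examples, which is closer to its true granularity.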