two_sigma_human_intution - amoon.world🌙

angie's interpretation of https://www.twosigma.com/articles/why-human-intuition-is-essential-in-machine-learning/

| Situation | Example Case | Warning Signs 🚨 | Human Intuition Solution | |
| --- | --- | --- | --- | --- |
| When a problem is (definitely) not a machine learning problem | Housing price predictions across geographic areas (NYC vs Albany) | • Data violates I.I.D. assumptions<br>• Economic contexts fundamentally different | • Identify ML unsuitability<br>• Seek alternative solutions | 🧠🤜 understand limitations AND imagine alternative approaches |
| When a problem is (probably) not a machine learning problem | • (unseen situation) Elections and natural disasters prediction<br>• (low effective size of data) NYC housing boilers (1.4M measurements from 100 units)<br>• (low signal in data) Crop loss prediction in Malawi (homogenous annual weather) with satellite image data<br>• Financial market movements | • Unprecedented events with no historical parallels<br>• Effective dataset much smaller than appears | • Recognize data limitations<br>• Switch to expert-based systems<br>• Identify cases requiring human expertise<br>• Determine true data granularity | |
| When you need to find nails but all you have is a hammer | Predicting customer "wallet" (total sales opportunity) | • Ground truth unavailable<br>• Traditional metrics ineffective | • Reframe as quantile prediction<br>• Adapt algorithm approach | |
| When you need to decide which "wrong" problem is the right one to solve | Using night luminosity to measure economic development | • Direct economic data lacking<br>• Need for innovative measurement | • Identify data-rich proxies<br>• Apply transfer learning | |
| When the "right" model is predicting the wrong things | Mobile app ad clicks (flashlight apps) | • High accidental clicks<br>• No intent correlation | • Question optimization targets<br>• Apply business logic | |
| When a model's predictions might be biased | Airport customer prediction (targeting workers vs travelers) | • Easy but unfair predictions<br>• Systematic targeting bias | • Monitor prediction fairness<br>• Design bias controls | |
| When your model is too good to be true | Ad-tech fraud detection (sudden performance increase) | • Unexpected performance jump<br>• Too-perfect patterns | • Apply domain skepticism<br>• Investigate anomalies | |
| When you might be missing something | P2P lending default prediction | • Information leakage<br>• Hidden temporal bias | • Identify collection bias<br>• Structure temporal validation | |
| When it's unclear whether, or how much, your model can generalize | Medical diagnosis across different facilities | • Facility-specific patterns<br>• Equipment calibration differences | • Assess true generalization<br>• Design context controls | |

### synthesizing with [[📝product-process]]

| Level vs Nature | **Individual** 👤<br>*Single model/problem focus* | **Population** 👥<br>*Multi-agent/systemic view* |
| --- | --- | --- |
| Fixed Planning (HOW) 🧭<br>*Just figuring out how things work* | #3 🔨 "Hammer-Nail" *"How can I make this tool work?"*<br>#5 ⚠️ "Wrong Predictions" *"How is my model misbehaving?"*<br>#8 🕵️ "Missing Data Patterns" *"How is my data incomplete?"* | #2 📊 "Probably Not ML" *"How sufficient is our collective data?"*<br>#7 🎯 "Too Good Performance" *"How are these results suspiciously perfect?"* |
| Inverse Planning (HOW + WHY) 🗺️<br>*Understanding both method & purpose* | #1 🚫 "Definitely Not ML" *"Should we even use ML here?"*<br>#4 👺 "Wrong Problem Selection" *"What's the real problem to solve?"* | #6 ⚖️ "Bias Detection" *"How & why are groups affected differently?"*<br>#7 🤔 "Too Good Performance" *"Why is this suspiciously perfect?"*<br>#9 🌐 "Generalization" *"How & why does this work across contexts?"* |

| Aspect | High-Frequency Trading (HFT) Firms | Traditional Trading Firms |
| --- | --- | --- |
| **Technology Infrastructure** | • Cutting-edge hardware and software<br>• Co-location services near exchanges<br>• Ultra-low latency networks<br>• Custom-built trading systems | • Standard trading platforms<br>• Regular market data feeds<br>• Conventional IT infrastructure<br>• Often third-party systems |
| **Trading Approach** | • Thousands of trades per second<br>• Hold positions for seconds/milliseconds<br>• Focus on tiny price discrepancies<br>• Fully automated systems | • Lower trading frequency<br>• Hold positions for days/months<br>• Focus on larger price movements<br>• Mix of human and automated trading |
| **Business Model** | • Tiny profits per trade, huge volume<br>• Market making and arbitrage<br>• Extremely latency-sensitive<br>• Focus on liquid markets | • Larger profit per trade<br>• Multiple revenue streams<br>• Advisory and research services<br>• Can trade less liquid markets |
| **Risk Management** | • Real-time automated risk checks<br>• Zero overnight positions<br>• Focus on technical risks<br>• Automated circuit breakers | • Traditional risk metrics<br>• Can hold overnight positions<br>• Market and credit risk focus<br>• Human oversight of positions |
| **Personnel** | • Quantitative developers<br>• Software engineers<br>• Network specialists<br>• Small, technical teams | • Traders<br>• Research analysts<br>• Sales teams<br>• Portfolio managers<br>• Larger, diverse teams |
| **Capital Requirements** | • High initial tech investment<br>• Lower trading capital needs<br>• Focus on operational costs | • Lower tech investment<br>• Higher trading capital needs<br>• Focus on position sizes |
| **Market Impact** | • Provides market liquidity<br>• Reduces bid-ask spreads<br>• Increases market efficiency | • Can move markets<br>• Creates price discovery<br>• Influences longer-term trends |

[[eg(a2s)_asssa.png]]

[[📝product-process]]

1. synthesize yichen, marc conv to re-write
2. send mail to cv on two sigma (scientific understanding of intelligence)
3. send mail to cvs with a table summarizing quest for intelligence
4.
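
side note on the customer "wallet" row ("Reframe as quantile prediction"): one common way to make quantile prediction concrete is the pinball (quantile) loss, which penalizes under-prediction more heavily when the target quantile is high. The sketch below is only an illustration of that loss, not the method from the article; the spend figures, the function names `pinball_loss` and `best_constant`, and the choice of 0.9 as the "wallet" quantile are all assumptions made up for the example.

```python
from typing import Sequence


def pinball_loss(actual: Sequence[float], predicted: Sequence[float], q: float) -> float:
    """Mean pinball (quantile) loss at quantile level q, with 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        diff = y - y_hat
        # Under-prediction (diff >= 0) costs q per unit;
        # over-prediction (diff < 0) costs (1 - q) per unit.
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(actual)


def best_constant(actual: Sequence[float], candidates: Sequence[float], q: float) -> float:
    """Return the candidate constant prediction with the lowest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(actual, [c] * len(actual), q))


# Hypothetical observed spend for a group of similar customers (made-up data).
spend = [10.0, 12.0, 15.0, 20.0, 22.0, 30.0, 45.0, 60.0, 80.0, 120.0]

# q = 0.5 recovers a median-like "typical spend" estimate.
median_like = best_constant(spend, spend, q=0.5)

# q = 0.9 (an assumed choice) pushes the estimate toward what the
# biggest spenders in the group pay, a rough stand-in for "wallet".
wallet_like = best_constant(spend, spend, q=0.9)

print(median_like, wallet_like)
```

the point of the reframing: even without ground-truth wallet labels, a high quantile of observed spend among comparable customers gives a target the usual training machinery can optimize, as long as the loss is swapped from squared error to something like the one above.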