2025-04-12 summarize https://chatgpt.com/c/67f6704d-cf8c-8002-a575-eace37131e72 newsboy - | Aspect | 🌲 Hierarchical Testing (Moon) | 🗞️ Newsvendor Model (Scarf) | Cognitive or Managerial Interpretation | - the idea of "cognitive inventory" - [[🗄️🧠moshe]]'s [[chen23_need_modeling_analysis.pdf]]

2025-04-11 visualizing the [pilot sizing rule for test two choose one stopping rule](https://claude.ai/chat/b8704906-619e-4e84-ba02-6074792ae38b)

# **Test Two and Choose One: A Bayesian Pilot Sizing Rule for Entrepreneurial Experimentation**

![[chicken2chicken 2025-04-11-8]]

**Abstract:** This study addresses a fundamental question in entrepreneurship: _How long should a startup test an idea before deciding to scale or pivot?_ We develop a Bayesian decision rule for **pilot experiment sizing** that balances the _value of information_ against its _cost_. Building on Proposition 3 of Moon (2025), we show that the **optimal pilot sample size** grows when the entrepreneur’s prior belief about success far exceeds reality, but shrinks with high per-unit test cost ([[moon25_testing_day2.pdf]]). The core contribution is a simple _“Test Two, Choose One”_ rule: **test the two leading alternatives in parallel, keep expanding the pilots until the expected learning gain from another unit equals its cost, then choose the better alternative to fully launch**. This rule translates Bayesian sequential analysis into a clear guideline, helping entrepreneurs avoid both under-testing and over-testing. The framework is _theory-forward_, deriving its rule from Bayesian principles, yet yields **actionable guidance** for managers (e.g. lean testing for startups vs. large pilots for incumbents). Empirical illustrations from the EV industry (Tesla’s phased strategy, a startup “TAXIE” vs. Toyota’s pilot) demonstrate how belief-driven testing can improve decision-making. **Managerially,** the paper offers a rigorous but intuitive principle to _fail fast but not too fast_, enhancing strategic experimentation under uncertainty.

**Explained for a 6-year-old:** _When you have a great idea, you shouldn’t **jump in without testing**, but you also shouldn’t **keep testing forever**. Imagine you have two cookies and you’re not sure which one is yummier. You **take a bite of each** (that’s your test) until you’re pretty sure which cookie tastes better. Then you **pick your favorite**. We do the same with new ideas: **try two small experiments, learn which one is best, then choose that one**!_ 🍪🍪👉👍

## 🗄️1. Table of Contents (Question & Answer)

|**Section (Key Aspect)**|**Research Question** (❓)|**Answer (Key Insight)**|**Literature Link** (🧱)|
|---|---|---|---|
|**Introduction** (🧍‍♀️🌏💭)|**Q:** What problem does this research address in entrepreneurial strategy?|**A:** It tackles _how entrepreneurs should decide the size of a pilot test under uncertainty_. Many startups either launch too large a pilot (wasting money) or too small (missing learnings). We ask how to find the **“just right” sample size** for market experiments. This is a core challenge in entrepreneurship – making decisions with limited knowledge by **balancing risk vs. learning** ([[Afeyan Murray Pisano BE Foreword 08 24.pdf]]).|_Entrepreneurial decision-making under uncertainty requires systematic experimentation rather than gut instinct_ ([[Afeyan Murray Pisano BE Foreword 08 24.pdf]]). The need for a rule reflects calls to bring scientific rigor (Bayesian updating) into startup strategy (e.g. the _lean startup_ ethos of test-and-learn).|
|**Theoretical Background** (🗺️💭)|**Q:** What existing theories and gaps inform this study’s approach?|**A:** We build on _decision theory under uncertainty_ (choosing with incomplete info) and _value-of-learning_ principles (the worth of gaining more data). Classic theory says gather information until its **marginal value equals marginal cost** (the core of the Bayesian **value of information**). In practice, entrepreneurial methods like **Lean Startup** urge _“fail fast”_ experimentation, testing multiple ideas. However, there is a gap: **no clear rule for how much to test** each idea. By uniting Bayesian logic with entrepreneurial contexts, we address _how to quantify “enough testing”_.|_Decision analysis_ provides formal criteria for information gathering (e.g. running experiments until value ≈ cost), while _entrepreneurial frameworks_ emphasize iterating ([Experimentation and Startup Performance: Evidence from A/B testing](https://www.nber.org/system/files/working_papers/w26278/w26278.pdf)). Prior work suggests a “Mendelian” approach of trying alternative hypotheses and selecting the best ([Experimentation and Startup Performance: Evidence from A/B testing](https://www.nber.org/system/files/working_papers/w26278/w26278.pdf)), but without prescribing pilot size. This paper bridges that gap with an operational rule.|
|**Model & Key Proposition** (📐💭)|**Q:** How does the model determine the optimal pilot sample size in go-to-market tests?|**A:** We model an entrepreneur with a **subjective prior** on market success (e.g. a belief that the success probability is μ) who can run a pilot of size _n_. The pilot yields data (successes/failures), which updates the belief (Bayesian learning). **Proposition 3** finds that the _incremental benefit_ of a larger pilot is higher when the **belief gap** (entrepreneur’s optimism vs. reality) is large, and lower when the per-unit test cost is high ([[moon25_testing_day2.pdf]]). In other words, **if you suspect you might be very wrong (μ ≠ φ_true) and tests are cheap, run a bigger pilot**; but **if tests are costly or you’re fairly confident, keep the pilot small** ([[moon25_testing_day2.pdf]]). This yields a _monotonic guideline_: sample size increases with the value of learning, until **diminishing returns** set in.|In formal terms, the optimal n equates **marginal information gain with marginal cost**, echoing classic Bayesian experimental design ([[moon25_testing_day2.pdf]]); a minimal numerical sketch of this sizing logic follows the table.
Empirically, Moon (2025) shows large pilots are justified when an entrepreneur’s prior is overly optimistic and tests are inexpensive, whereas high testing costs or modest priors favor smaller pilots ([[moon25_testing_day2.pdf]]).|
|**Decision Rule: “Test Two, Choose One”** (🧭📐)|**Q:** What is the new decision rule proposed, and how does it work in practice?|**A:** The rule is: **Simultaneously test the two best ideas, and continue each pilot until the next test unit’s expected learning benefit equals its cost. Then stop and pick the better-performing idea**. In practice, an entrepreneur would pilot two alternatives (e.g. two product versions or strategies), _monitor results_, and **stop expanding the tests when additional data no longer pays off**. At that stopping point, whichever alternative has the higher updated expected payoff is chosen for full launch. This rule operationalizes the Bayesian logic into a simple heuristic: _keep experimenting until additional insight is worth no more than it costs_, then exploit the winner. It extends the “test two, choose one” intuition by formally **linking pilot size to beliefs**.|This stopping rule merges Bayesian sequential analysis with entrepreneurial decision-making. It reflects an **optimal stopping** condition common in statistics (Wald, 1947) and is exemplified in our model by the point where adding the _third trial_ no longer improves the expected payoff ([[moon25_testing_day2.pdf]]). Conceptually, it aligns with the strategy of testing multiple ideas and committing to the best ([Experimentation and Startup Performance: Evidence from A/B testing](https://www.nber.org/system/files/working_papers/w26278/w26278.pdf)), but now we provide a concrete threshold for _when_ to commit.|
|**Implications & Applications** (💸🧍‍♀️)|**Q:** How does this rule impact real-world entrepreneurial experimentation in different contexts?|**A:** **For startups,** it means _lean testing_: run small, affordable experiments until your learning saturates, then either pivot or double down. This helps **fail fast without failing wastefully**, preserving cash by not over-investing in pilots. **For established firms,** the rule justifies larger-scale pilots when feasible – they can leverage greater resources to gain precise market feedback (e.g. extensive beta tests) instead of relying on intuition. **For policymakers,** it suggests funding innovation with _stage-gated experiments_: support at least two competing pilot projects and require evidence (data) before scaling one. In all cases, decisions become more evidence-based: e.g. the startup “TAXIE” kept its pilot very small and exited early to cut its losses ([[moon25_testing_day2.pdf]]), whereas Toyota ran a much larger Prius pilot fleet to inform its go-to-market strategy ([[moon25_testing_day2.pdf]]). The rule provides a **structured approach to experimentation** across these scenarios.|**Startup strategy:** Aligns with the _“fail fast”_ mantra – experiment in small batches and pivot early if metrics look poor.
**Corporate R&D:** Resonates with stage-gate processes (Cooper, 1990) by quantifying the gate criteria (stop when additional data’s value < cost). **Policy:** Echoes practices like DARPA’s multi-project trials or VC incubation (funding parallel experiments, then scaling the winners). These applications illustrate the rule’s versatility, from startups ([[moon25_testing_day2.pdf]]) to corporate labs ([[moon25_testing_day2.pdf]]).|
|**Conclusion** (💭💸)|**Q:** What are the broader contributions of this study to theory and management practice?|**A:** This work contributes a **clear decision principle** at the intersection of Bayesian theory and entrepreneurial practice. The “Test Two and Choose One” rule distills complex statistical decision analysis into a **practical guide** that entrepreneurs and managers can readily use. Theoretically, it advances the emerging _Bayesian Entrepreneurship_ framework ([[Afeyan Murray Pisano BE Foreword 08 24.pdf]]) by formalizing how prior beliefs and costs can be translated into an actionable experiment-size rule – effectively providing a missing piece of a unifying theory of entrepreneurial experimentation. Practically, it improves how new ventures and innovators operate: instead of guessing or following fads, they have a **rigorous yet intuitive rule of thumb** for sequential investment decisions. In sum, the study **changes our thinking** about startup experiments from ad-hoc trials to **structured, optimal learning processes**, enhancing both scholarly understanding and real-world innovation management.|By anchoring entrepreneurial experimentation in a Bayesian optimal stopping rule, the paper offers what the field has lacked: a principled framework akin to Porter’s 5-Forces in strategy or the efficient market hypothesis in finance ([[Afeyan Murray Pisano BE Foreword 08 24.pdf]]). It transforms _experimentation_ from art into a more exact science of decision-making under uncertainty, with immediate relevance for entrepreneurs, corporate strategists, and innovation policymakers.|
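To make the sizing logic in the table concrete, here is a minimal numerical sketch (my illustration, not the paper's replication code) of the marginal-value-vs.-marginal-cost calculation: a Beta prior on the success probability φ, a pilot of n Bernoulli trials, and the expected value of sample information (EVSI) net of testing cost. The payoff numbers, the Beta(7, 3) prior, and the `evsi` / `optimal_pilot_size` names are illustrative assumptions.

```python
# A sketch of Proposition 3's logic under assumed numbers: pick the pilot size n
# whose expected value of sample information, net of testing cost, is largest.
import numpy as np
from scipy.stats import betabinom

def evsi(n, a, b, revenue=100.0, launch_cost=60.0):
    """Expected value of sample information from a pilot of n Bernoulli trials.

    Beta(a, b) prior on the success probability phi; launching pays
    revenue * phi - launch_cost in expectation, not launching pays 0.
    """
    prior_mean = a / (a + b)
    value_now = max(0.0, revenue * prior_mean - launch_cost)  # best decision with no pilot
    if n == 0:
        return 0.0
    k = np.arange(n + 1)                # possible numbers of pilot successes
    pk = betabinom.pmf(k, n, a, b)      # predictive probability of each k
    post_mean = (a + k) / (a + b + n)   # posterior mean of phi after the pilot
    value_after = np.sum(pk * np.maximum(0.0, revenue * post_mean - launch_cost))
    return value_after - value_now      # gain from deciding *after* the pilot

def optimal_pilot_size(a, b, unit_cost, n_max=200):
    """Pilot size maximizing EVSI minus total testing cost (0 means: skip the pilot)."""
    net = [evsi(n, a, b) - unit_cost * n for n in range(n_max + 1)]
    return int(np.argmax(net))

if __name__ == "__main__":
    print(optimal_pilot_size(a=7, b=3, unit_cost=0.1))  # cheap tests -> larger pilot
    print(optimal_pilot_size(a=7, b=3, unit_cost=1.0))  # costly tests -> tiny or no pilot
```

Re-running `optimal_pilot_size` with a higher `unit_cost` shrinks the recommended pilot, which is the comparative static Proposition 3 describes; a vaguer prior (more left to learn) pushes it the other way.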
## 🗄️2. Comparison with Existing Theories

To highlight the paper’s **📐 productizing contribution** (a tangible decision tool), we compare our approach to three key frameworks: classic decision-making under uncertainty, cost-benefit of learning models, and prevailing entrepreneurial testing approaches.

|**Aspect**|**Decision Under Uncertainty** (🌏💭 classical)|**Cost-Benefit of Learning** (🧭📐 Bayesian)|**Entrepreneurial Testing Models** (🧍‍♀️💭 practice)|**Our Approach: “Test Two, Choose One”** (🧭📐💸)|
|---|---|---|---|---|
|**Core Focus**|One-shot choice with unknown outcomes. Emphasizes **predicting** or assuming outcomes based on prior beliefs or risk tolerance (no iterative learning).|Sequential decision-making as an optimization problem: **how much to learn** before deciding. Focuses on the **value of information** and finding an optimal stopping point mathematically.|Iterative experimentation in venture development: try ideas, get feedback, **pivot or persevere** (Lean Startup, effectuation). Emphasizes learning by doing but often without formal criteria for stopping.|Experimentation as a _designed process_: **Run two parallel tests and stop at a data-driven threshold**. Focuses on _both_ exploring alternatives _and_ a clear rule for when to stop exploring (and exploit).|
|**Assumptions**|The decision-maker has a prior belief or distribution of outcomes, but additional testing either isn’t considered or is exogenous. Often assumes risk-neutral expected value maximization _without new info_.|Rational agent with known prior distribution can sample repeatedly. Assumes each data point has a cost and the agent can compute expected gains from more data. Bayesian updating is central; environment is stationary.|Startups face high uncertainty; learning is necessary. Assumes **low initial knowledge**, ability to run cheap tests (MVPs). Often assumes entrepreneurs will use heuristics (like “affordable loss”) to limit risk, not formal calculus.|Entrepreneur can identify two top alternatives and assign modest testing budgets. Assumes a prior for each alternative’s success (subjective belief) and ability to observe outcomes. **Uses Bayesian updates** but in an accessible way (e.g. success rates in pilots). Accepts real-world constraints (time, money) by stopping when marginal gain ≈ cost.|
|**Mechanism/Decision Rule**|**Expected Value Rule:** If the expected payoff of an option (given current info) exceeds the status quo or threshold, choose it; otherwise, don’t. (E.g., invest if _E_[Outcome] > 0). No explicit role for running experiments – either you decide now or not at all.|**Optimal Stopping / Experimentation:** Calculate the **expected marginal benefit** of each additional sample. Continue sampling until this falls below the marginal cost (the point of indifference). Often solved via dynamic programming or closed-form solution for specific distributions (e.g. Wald’s Sequential Probability Ratio Test).|**Test-and-Pivot Heuristics:** Perform a small experiment (MVP). If results are promising, continue or scale; if not, pivot to a new idea. No strict formula – relies on entrepreneur’s judgment of what constitutes “promising” or how many iterations to try. (Rules of thumb: “fail fast, fail cheap,” or use **vanity metrics** cautiously).|**Belief-Driven Threshold Rule:** _Iteratively update_ the probability of success for two competing ideas as data comes in. **Stop when** an additional trial would not significantly change the relative confidence (i.e., when **expected learning ≈ cost**). Then **select the alternative with higher posterior success probability**. This is essentially a two-armed bandit approach with a stopping criterion, but packaged as a simple rule (“test two, pick one”); a minimal sketch follows this comparison.|
|**Strengths**|**Simplicity:** Easy to apply when no experimentation is possible; works if prior is very informative or environment is static. Provides a baseline decision (go/no-go) quickly.|**Optimality (Normative):** Maximizes expected utility by formally accounting for information value. Minimizes regret by not stopping too early or late. Well-developed mathematically (robust theory for when to stop gathering data).|**Adaptability & Speed:** Encourages action and learning in highly uncertain environments. Can lead to faster discovery of a viable model or early termination of bad ideas (“pivoting”). Culturally resonates (celebrates learning from failure).|**Actionable + Quantitative:** Offers a **clear operational guide** (when to stop testing) that is grounded in theory. Balances exploration and exploitation effectively. More _user-friendly_ than solving dynamic programs, yet more structured than ad-hoc heuristics. Facilitates **parallel exploration** of ideas (two at once) which increases chances of finding a good opportunity.|
|**Limitations**|Ignores learning – may either avoid useful experiments or commit to a path with unvalidated assumptions. Essentially static; can lead to **large errors** if prior is wrong (since it never checks via data).|Can be **complex to implement** for entrepreneurs without statistical tools – calculating expected value of information isn’t trivial. Assumes a known model for learning; real markets may violate model assumptions. Risk of “analysis paralysis” if overemphasized.|Lacks precision – **when to stop testing** or how big a test should be is vague. Entrepreneurs might quit too early or iterate too long. Success depends on founder’s intuition and ability to interpret ambiguous feedback.|Relies on having two clear alternatives to test (may not suit cases where only one idea is on the table). The rule’s optimality depends on reasonable priors; poor estimates can still mislead (garbage in, garbage out). Also assumes tests of each alternative can be run independently and in parallel.|
|**Productizing Contribution** 📐|(Baseline approach, no new “product” – just a decision criterion based on expectation; no learning mechanism to productize.)|(Provides a formula for stopping, but typically requires a **specialist** to compute – not in a readily digestible form for many entrepreneurs.)|(Emphasizes mindset and process, but the _lack of a formula_ means it’s hard to make it a repeatable tool; mostly a philosophy.)|**Delivers a usable tool**: a rule-of-thumb that any innovator can apply: _“Always test at least two options, and stop testing when your confidence stops improving relative to cost.”_ It turns theory into a **practical decision aid** – much like a product that can be taught in incubators or used in corporate innovation playbooks.|

**Interpretation:** Our approach complements these frameworks by offering a _middle ground_: it retains rigorous underpinning from cost-benefit analysis (like **optimal stopping**) but simplifies it into a heuristic that even a resource-constrained startup can follow. Unlike the traditional one-shot decision paradigm (which might lead entrepreneurs astray by not testing) or purely intuitive experimentation (which might not know when to stop), **“Test Two and Choose One” provides a concrete, step-by-step strategy to maximize learning efficiency**.
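As referenced in the comparison table, here is a minimal sketch of the belief-driven threshold rule: Beta-Bernoulli updating for two alternatives, a one-step look-ahead estimate of the expected gain from one more paired pilot unit, and a stop-and-pick step. The true success rates, payoff scale, unit cost, and function names are illustrative assumptions, and the myopic look-ahead is a simple stand-in for the paper's full sequential analysis rather than the authors' implementation.

```python
# "Test two, choose one" as a simulation under assumed numbers: keep testing both
# alternatives while one more paired observation is expected to be worth its cost.
import random

def one_step_gain(a1, b1, a2, b2, payoff_scale=100.0):
    """Expected payoff improvement from one more paired observation (myopic look-ahead)."""
    m1, m2 = a1 / (a1 + b1), a2 / (a2 + b2)
    best_now = max(m1, m2)
    exp_best_after = 0.0
    for x1 in (0, 1):                      # predictive outcome for alternative 1
        for x2 in (0, 1):                  # predictive outcome for alternative 2
            p = (m1 if x1 else 1 - m1) * (m2 if x2 else 1 - m2)
            m1_next = (a1 + x1) / (a1 + b1 + 1)
            m2_next = (a2 + x2) / (a2 + b2 + 1)
            exp_best_after += p * max(m1_next, m2_next)
    return payoff_scale * (exp_best_after - best_now)

def test_two_choose_one(true_p=(0.55, 0.35), unit_cost=0.05, max_n=500, seed=7):
    """Run paired pilots until the expected learning gain <= the cost of two more units."""
    random.seed(seed)
    a, b = [1, 1], [1, 1]                  # uniform Beta(1, 1) priors for both alternatives
    n = 0
    while n < max_n and one_step_gain(a[0], b[0], a[1], b[1]) > 2 * unit_cost:
        for i in (0, 1):                   # one more pilot unit on each alternative
            success = random.random() < true_p[i]
            a[i] += int(success)
            b[i] += int(not success)
        n += 1
    means = [a[i] / (a[i] + b[i]) for i in (0, 1)]
    winner = 0 if means[0] >= means[1] else 1
    return n, winner, means

if __name__ == "__main__":
    n, winner, means = test_two_choose_one()
    print(f"stopped after {n} paired pilot units; chose alternative {winner}; posterior means {means}")
```

Raising `unit_cost` makes the loop stop sooner, while closely matched alternatives keep it testing longer, since another observation is more likely to change which one looks best.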
## 🗄️3. Practical Implications and Examples

The proposed rule has broad implications across different domains of innovation. The table below outlines three applications (💸), showing how **resource-constrained startups, large corporations, and public innovation programs** can each leverage the “test two, choose one” approach. We include real or illustrative examples – such as the startup **TAXIE** vs. **Toyota** – to demonstrate each use case.

|**Domain**|**Implication of the Rule**|**Example Application**|
|---|---|---|
|**Resource-Constrained Startups** (💡 small ventures)|**Lean Testing for Maximum Learning:** Startups with limited budgets should run **small, focused experiments** guided by this rule. They test their top two ideas in a **minimal viable pilot** form, monitor results, and stop once additional tests would yield little new insight. This prevents wasting scarce funds on excessive trials and encourages _failing fast_ on a bad idea. Essentially, it gives a quantitative edge to the Lean Startup approach – knowing _how many users or trials are enough_ before deciding.|**EV Rideshare Startup “TAXIE”:** A new electric taxi startup has two concepts (A: premium service, B: budget service). Using the rule, the team runs tiny pilots of both (just a few cars each) and measures driver and rider response. They find concept B is not catching on and further testing would be too costly for little gain, so they stop early. _TAXIE keeps its pilot “n” very small_ (just enough to learn viability) ([[moon25_testing_day2.pdf]]), saving money. They pivot or shut down before sinking big costs – a real-world example of _failing fast but smart_.|
|**Corporate Pilot Programs** (🏢 large firms)|**Structured Scaling of Pilots:** Big companies can use the rule to avoid both extremes: _over-cautious testing_ (pilots too small to reveal much) and _over-confident launches_. Instead of relying on bureaucracy or gut, they apply the threshold: keep expanding a pilot until the data’s incremental value plateaus. Corporations can afford a larger n, so they will push experiments further than startups, which yields more precise market forecasts. The result is a **stage-gated innovation process** grounded in evidence – de-risking major product launches with thorough but efficient pilots.|**Toyota’s Hybrid Pilot:** Toyota, planning the Prius, effectively followed this approach. They tested early hybrid prototypes in increasing fleet sizes. Because their per-unit test cost was relatively low given their scale, they ran a **large pilot fleet** to gather extensive data. They stopped scaling the pilot when learning gains diminished and costs caught up. This led to high confidence in the Prius’s market acceptance. **Toyota’s big pilot (large n)** improved the accuracy of its demand estimates ([[moon25_testing_day2.pdf]]), demonstrating that an incumbent can invest in learning up to the optimal point. The Prius became a success with risks mitigated by this rigorous testing.|
|**Public Policy – Innovation Funding** (🏛️💰 government/NGOs)|**Evidence-Based Funding Decisions:** Policymakers can integrate the rule into innovation programs (e.g., grants, incubators, R&D subsidies). Rather than fund one big project outright, agencies can support **two parallel pilot projects** (e.g., two teams tackling a clean energy challenge) and use a _data-driven down-select_. They continue funding both until additional tests yield marginal benefits equal to cost, then concentrate resources on the project that shows superior results. This ensures taxpayer or grant money is used efficiently – exploring multiple ideas but only scaling the one proven most viable. It also introduces accountability, as decisions to continue or stop are based on measurable learning.|**ARPA-E Dual Prototypes:** The Advanced Research Projects Agency-Energy often funds multiple competing prototypes (say, for new battery technology). Using this rule, ARPA-E could structure a program where **two teams each get seed funds to test their tech**. Both run experiments (field trials) to demonstrate performance. Funding is extended incrementally: as long as each additional test (e.g., scaling a prototype plant) yields valuable information about real-world viability, both projects continue. Once the data suggests further tests won’t change the ranking of the two (one is clearly better or learning has plateaued), the agency stops and fully funds the **better alternative**. This way, public funds effectively _“test two, choose one”_ to back the winner, promoting innovation while limiting sunk costs on the loser. (Analogous approaches could apply in startup accelerators or research grant competitions.)|
**Bottom line:** Across these scenarios, the rule provides a **common playbook**: always maintain at least two options in testing (to avoid getting locked into a false favorite), and let a cost-benefit threshold decide when you’ve learned enough to make the big decision. Startups gain a disciplined way to allocate precious runway, corporates get a safeguard against costly go/no-go errors, and policymakers can justify decisions with experimental evidence. This **democratizes rigorous experiment strategy** – it’s not just for statisticians, but a practical guide for everyday innovators.

## 🖼️1. Need–Solution Mapping

([image](https://chatgpt.com/c/67f8c756-eda4-8002-894c-8434b0806f8b))

_Figure 1: **Mapping the Problem to the Solution.** The diagram illustrates the core challenge (💜 unclear pilot sizing decision) and our proposed solution (💚 belief-driven stopping rule). On the left, entrepreneurs face a **gap** – traditional approaches either use a fixed pilot size or intuition to decide how much to test, which often fails to account for how strongly the entrepreneur believes in the idea or how uncertain the market is. This can lead to two pitfalls: **over-testing** (wasting time/money when the insight gained per test drops off) or **under-testing** (prematurely deciding without enough evidence). On the right, the solution is a **Bayesian stopping rule** that ties the pilot size to the entrepreneur’s own belief and the cost of tests. The rule says to **continue experimenting until the next bit of information is just worth its cost, then stop**. It effectively plugs the belief-uncertainty gap by tailoring the amount of testing to how confident or skeptical one is. In practice, this means no more one-size-fits-all pilots – instead, “just enough” testing based on your prior and data. The arrow (“addresses”) signifies that the solution directly tackles the problem: it replaces guesswork with a principled threshold, ensuring efficient learning._
## 🖼️2. Methodology Visual: Information vs. Cost Trade-off

([image](https://chatgpt.com/c/67f8c756-eda4-8002-894c-8434b0806f8b))

_Figure 2: **Optimal Experimentation Trade-off – When to Stop Testing.** This figure captures the central decision logic of our methodology. The blue curve shows the **marginal information gain** from each additional pilot observation (each new test user, trial, or datapoint), which **diminishes** as the sample size grows – the first few tests are very informative, while later ones add less insight ([[moon25_testing_day2.pdf]]). The horizontal red dashed line is the **marginal cost** per test unit (e.g., the cost to run one more trial). At small sample sizes (left side of the graph), the blue curve is above the cost line, indicating that each new test is worth it (information gain > cost). As we increase the sample size n, the blue curve drops due to **diminishing returns** in learning – fewer unknowns remain, so new observations do not change our beliefs as much ([[moon25_testing_day2.pdf]]). The optimal pilot size n* is found where the blue and red lines intersect (marked by the red dot): at this point, the **value of an extra test equals its cost**. Pushing beyond n* would mean paying more for data than it’s worth, so the rule says **stop at n***. In the illustrated example, n* ≈ 11 trials. Importantly, if the entrepreneur’s prior belief is very high or very low (i.e. if there’s a big belief gap to resolve), the blue curve starts higher and declines more slowly – moving the intersection to the right (larger n*). Conversely, if tests are very expensive (raising the red line) or if there’s little uncertainty, n* shifts left (fewer tests). This visual thus explains **Proposition 3’s logic**: choose the pilot size where **marginal learning = marginal cost** ([[moon25_testing_day2.pdf]]). The red-highlighted area under the blue curve before n* represents the total learning gained, and it is cut off when further area would fall below the cost line, illustrating an efficient stopping point. Overall, this trade-off figure demonstrates how our method finds the sweet spot between experimenting and executing – a key to strategic innovation management._

📄 **Title & Abstract (with 6-year-old explanation)** – _see top of document._ The title encapsulates the rule (_Test Two and Choose One_) and its context (Bayesian pilot sizing for startups). The abstract provides a comprehensive overview, emphasizing the motivation, theoretical basis, contribution, and practical significance. Finally, the concept is distilled in a child-friendly metaphor: **testing two cookies to pick the yummiest** – conveying in simple terms the essence of a belief-driven threshold for experimentation. This playful explanation underscores the intuition behind our rule: _try enough to learn, then choose the best_.
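As a small numeric companion to Figure 2, the sketch below (illustrative assumptions again: Beta prior, linear launch payoff, the hypothetical `n_star` helper) computes the stopping point as the pilot size where net learning value peaks, which coincides with the marginal-gain-equals-marginal-cost crossing when returns diminish smoothly, and checks the two comparative statics described in the caption: a costlier test shifts n* left, a vaguer (more uncertain) prior shifts it right.

```python
# Comparative statics for the Figure 2 trade-off under assumed numbers.
import numpy as np
from scipy.stats import betabinom

def evsi(n, a, b, revenue=100.0, launch_cost=60.0):
    """Expected value of a size-n pilot under a Beta(a, b) prior (same setup as the earlier sketch)."""
    value_now = max(0.0, revenue * a / (a + b) - launch_cost)
    if n == 0:
        return 0.0
    k = np.arange(n + 1)
    pk = betabinom.pmf(k, n, a, b)
    post_mean = (a + k) / (a + b + n)
    return float(np.sum(pk * np.maximum(0.0, revenue * post_mean - launch_cost))) - value_now

def n_star(a, b, unit_cost, n_max=300):
    """Pilot size where net learning value (EVSI minus total testing cost) peaks."""
    net = [evsi(n, a, b) - unit_cost * n for n in range(n_max + 1)]
    return int(np.argmax(net))

if __name__ == "__main__":
    print("baseline       :", n_star(a=7, b=3, unit_cost=0.10))
    print("costlier tests :", n_star(a=7, b=3, unit_cost=0.50))  # cost line up -> smaller n*
    print("vaguer prior   :", n_star(a=2, b=1, unit_cost=0.10))  # more to learn -> larger n*
```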