Abstract - Story Operators: Decomposing the Original $\to$ Sequel Transformation in Embedding Space
I treat a book as a point in a sentence-embedding space and a literary transformation as an operation on points. Given an original novel and its sequel, I ask what it takes, geometrically, to turn the first into the second. Using all-mpnet-base-v2 paragraph embeddings drawn from a precomputed index of the PG19 corpus, I form the displacement $d=\bar{x}_{\rm seq}-\bar{x}_{\rm orig}$ and greedily decompose it along a content basis obtained by PCA over the two books' own paragraphs. Each component is an interpretable axis anchored by real passages at its poles. Across thirteen verified author pairs from Project Gutenberg, the decomposition reveals a small taxonomy of sequels: formulaic (a tiny, low-rank change: Doyle's Holmes collections, $\|d\|=0.12$), concentrated (one dominant axis: Alcott's Little Women $\to$ Little Men, 75% on a single move), and compositional (many small axes: Twain, Burroughs's Barsoom, Nesbit). For the canonical case, Tom Sawyer $\to$ Huckleberry Finn, the dominant recovered axis is structural -- the collapse of sheltering domesticity into a picaresque road -- rather than the famous surface themes of vernacular voice or slavery, which ride later, smaller axes; and the transformation routes through adventure-journey space rather than diluting toward generic realism. I corroborate the recovered geometry against Twain's documented authorial intent (his 1875--76 letters to Howells), which names the first-person picaresque move years in advance, and I quantify, with an explicit representation caveat, how much of the realized transformation his stated intentions span. All computations are reproducible from the released scripts and data.
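The displacement-and-decomposition step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the released scripts: the function name `decompose_displacement`, the parameter `k`, and the synthetic Gaussian "embeddings" are all hypothetical, and real inputs would be all-mpnet-base-v2 paragraph embeddings. It also assumes that, because the PCA axes are orthonormal, a greedy decomposition reduces to ranking axes by the squared projection of $d$; the paper's actual greedy procedure may differ.

```python
import numpy as np

def decompose_displacement(orig, seq, k=10):
    """Decompose d = mean(seq) - mean(orig) along pooled PCA axes.

    orig, seq: (n_paragraphs, dim) arrays of paragraph embeddings.
    Returns d, plus the top-k axes, their coefficients, and each axis's
    share of ||d||^2, all ordered from largest share to smallest.
    """
    d = seq.mean(axis=0) - orig.mean(axis=0)

    # Content basis: PCA over both books' paragraphs, via SVD of centered data.
    pooled = np.vstack([orig, seq])
    centered = pooled - pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:k]                      # rows are orthonormal principal axes

    coeffs = axes @ d                  # projection of d onto each axis
    share = coeffs**2 / (d @ d)        # fraction of ||d||^2 per axis

    # Greedy order (assumed): with an orthonormal basis, picking the axis that
    # removes the most residual at each step equals sorting by share.
    order = np.argsort(-share)
    return d, axes[order], coeffs[order], share[order]

# Synthetic stand-in data (NOT real embeddings): the "sequel" is shifted
# along the first coordinate so that d is non-trivial.
rng = np.random.default_rng(0)
orig = rng.normal(size=(200, 16))
seq = rng.normal(size=(180, 16)) + 0.5 * np.eye(16)[0]

d, axes, coeffs, share = decompose_displacement(orig, seq, k=16)
print("||d|| =", np.linalg.norm(d))
print("top shares:", share[:3], "total:", share.sum())
```

In these terms, a "concentrated" sequel would show one large entry at the head of `share`, while a "compositional" one would spread its mass over many small entries. Reading an axis as interpretable would additionally require looking up the paragraphs with the most extreme projections onto it, which this sketch omits.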
1️⃣ One-sentence summary
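This paper proposes a method that represents novels as sentence-embedding vectors and uses principal component analysis (PCA) to decompose the geometric displacement between an original and its sequel, thereby revealing the type of sequel (formulaic, concentrated, or compositional); taking Tom Sawyer $\to$ Huckleberry Finn as a case study, it shows that the method can uncover structural changes latent in the author's intent, such as the shift from a domestic narrative to a picaresque road adventure.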