Project Euler 328
Project Euler 328
题目
Lowest-cost Search
We are trying to find a hidden number selected from the set of integers \(\{1, 2, \dots, n\}\) by asking questions. Each number (question) we ask, has a cost equal to the number asked and we get one of three possible answers:
- “Your guess is lower than the hidden number”, or
- “Yes, that’s it!”, or
- “Your guess is higher than the hidden number”.
Given the value of n, an optimal strategy minimizes the total cost (i.e. the sum of all the questions asked) for the worst possible case. E.g.
If \(n=3\), the best we can do is obviously to ask the number “\(\mathbf{2}\)“. The answer will immediately lead us to find the hidden number (at a total cost = \(2\)).
If \(n=8\), we might decide to use a “binary search” type of strategy: Our first question would be “\(\mathbf{4}\)“ and if the hidden number is higher than \(4\) we will need one or two additional questions.
Let our second question be “\(\mathbf{6}\)“. If the hidden number is still higher than \(6\), we will need a third question in order to discriminate between \(7\) and \(8\).
Thus, our third question will be “\(\mathbf{7}\)“ and the total cost for this worst-case scenario will be \(4+6+7=\color{red}{\mathbf{17}}\).
We can improve considerably the worst-case cost for \(n=8\), by asking “\(\mathbf{5}\)“ as our first question.
If we are told that the hidden number is higher than \(5\), our second question will be “\(\mathbf{7}\)“, then we’ll know for certain what the hidden number is (for a total cost of \(5+7=\color{blue}{\mathbf{12}}\).
If we are told that the hidden number is lower than \(5\), our second question will be “\(\mathbf{3}\)“ and if the hidden number is lower than \(3\) our third question will be “\(\mathbf{1}\)“, giving a total cost of \(5+3+1=\color{blue}{\mathbf{9}}\).
Since \({\color{blue}{\mathbf{12}}}>\color{blue}{\mathbf{9}}\), the worst-case cost for this strategy is \({\color{red}{\mathbf{12}}}\). That’s better than what we achieved previously with the “binary search” strategy; it is also better than or equal to any other strategy.
So, in fact, we have just described an optimal strategy for \(n=8\).
Let \(C(n)\) be the worst-case cost achieved by an optimal strategy for \(n\), as described above.
Thus \(C(1) = 0, C(2) = 1, C(3) = 2\) and \(C(8) = 12\).
Similarly, \(C(100) = 400\) and \(\sum \limits_{n = 1}^{100} {C(n)} = 17575\).
Find \(\sum \limits_{n = 1}^{200000} {C(n)}\) .
解决方案
令\(N=200000\)。一次猜测相当于在集合 \({1,2,\dots,n}\) 上选一个关键字 \(k\),并得到三种结果:
- 命中:结束;
- 更小:转入 \([1,k-1]\);
- 更大:转入 \([k+1,n]\)。
因此任何策略都等价于一棵二叉搜索树(BST):节点是被询问的数,左右子树分别对应更小、更大。对某个隐藏数 \(x\),实际支付的总代价是从根走到 \(x\) 的路径上所有节点值之和。
不难直接想到用DP求解。令区间最优最坏代价为\(F(l,r)\),那么不难写出其状态转移方程:
\[F(l,r)= \left \{\begin{aligned} &0 & & \text{if}\quad l\ge r \\ &\min_{k\in[l,r]}\left\{ k+\max(F(l,k-1),\ F(k+1,r))\right\}. & & \text{otherwise}\quad \\ \end{aligned}\right.\]
其中,第一行是边界,第二行则是极小极大递推:哪怕是进入了较差的分支,损失也要最小。
题目所求\(\displaystyle{C(n)=F(1,n), \sum_{n=1}^{N}C(n)}.\)因此,\(F(l,r)\)不足及解决本题。
把根猜测记为 \(g\)。那么最坏代价一定是 \[ C(n)=\min_{1\le g\le n}\left(g+\max(C(g-1),\ F(g+1,n))\right). \] 关键难点在于如何快速计算上侧区间 \(F(g+1,n)\)。
设上侧区间长度\(k=n-g,\)则上侧就是长度为 \(k\) 的后缀区间\([n-k+1, n].\)
定义一个专门函数: \[ H(k,n)=F(n-k+1,\ n). \]
于是
\[ C(n)=\min_{k\in[1,n-1]}\left\{(n-k)+\max(C(n-k-1),\ H(k,n))\right\}. \]
也就是说,\(H(k,n)\)等价于:在键集合 \(\{n-k+1,\dots,n\}\) 上构造一棵“最优”的二叉搜索树,使得“根到某个键的路径和”的最大值最小;在选定的那棵树上,\(H(k,n)\)就是最大路径和。
在后缀区间 \({n-k+1,\dots,n}\) 中,数值整体偏大。路径上每多询问一次就会额外付出一个接近 \(n\) 的代价,因此对后缀子树的优化具有明显倾向:
- 优先最小化高度:在节点值整体很大时,增加一层深度通常比“稍微换一下根的位置”更昂贵。
- 在同高度下,深叶尽量靠左:因为越靠左的键越小,把更深的叶放在更小的键一侧会减小最坏路径和。
把这两点合并,可将后缀子树限制到一种形状:节点按层尽量填满,最后一层从左到右填(即完全二叉树),且更深的部分出现在左边。这样后缀子问题 \(H(k,n)\) 不需要真正建树,而可以沿着“最坏分支”用计数方式直接求路径和。
因此,\(H(k,n)\)确实就变成“在这棵完全二叉搜索树上找最坏分支路径和”。
对完全二叉树,我们不需要不需要建树,只要在计数意义上知道“本层能容纳的满子树大小”,就能决定最坏分支落到左还是右,并把根的数值加进答案。
设后缀规模为 \(k\),右端点为 \(n\)。令\(\texttt{next}=2^h-1\)表示“下一个满树规模阈值”,使得当前 \(k<\texttt{next}\)。令\(b=\left\lfloor\dfrac{\texttt{next}}{2}\right\rfloor=2^{h-1}-1.\)
这里 \(b\) 可视作“上一层满树节点数”。若恰好 \(b=k\),说明实际上高度应更小一档,因此令\(b\leftarrow \left\lfloor\dfrac{b}{2}\right\rfloor.\)
在 complete tree 中,最后一层最多还能额外容纳 \(b+1\) 个节点。根据填充位置,最坏分支会落在右侧或左侧,对应如下判定:$ k>b+.$
- 若条件为真(最坏分支进入右侧更深部分),根值为\(\mathrm{root}=n-k+(b+1),\)并更新\(k\leftarrow k-(b+1), b\leftarrow\left\lfloor\dfrac{b}{2}\right\rfloor,\)且 \(n\) 不变。
- 否则(最坏分支进入左侧),根值为\(\mathrm{root}=n-\left\lfloor\dfrac{b}{2}\right\rfloor,\)并更新\(s=\left\lfloor\dfrac{b}{2}\right\rfloor+1,k\leftarrow k-s,n\leftarrow n-s,b\leftarrow\left\lfloor\dfrac{b}{2}\right\rfloor.\)
把每一步的 \(\mathrm{root}\) 累加,直到 \(k=1\)(只剩一个数无需再问),即可得到 \(H(k,n)\)。
对固定 \(n\),候选值为\((n-k)+\max(C(n-k-1),H(k,n)).\)
实践中(并与题目讨论区常见结论一致),最优后缀规模 \(k\) 随 \(n\) 单调不减,这是因为随着\(n\)的增大,如果\(k\)递减,左右两侧占用的代价会更加失衡;利用这一点,可以在线性推进 \(n\) 的同时维护当前最优 \(k\),并只尝试少量“倍增步长”的候选更新,从而把整体复杂度压到近似 \(O(N\log N)\)。
代码
1 | N = 2 * 10 ** 5 |