Project Euler 502

Project Euler 502

题目

Counting Castles

We define a block to be a rectangle with a height of \(1\) and an integer-valued length. Let a castle be a configuration of stacked blocks.

Given a game grid that is \(w\) units wide and \(h\) units tall, a castle is generated according to the following rules:

  1. Blocks can be placed on top of other blocks as long as nothing sticks out past the edges or hangs out over open space.

  2. All blocks are aligned/snapped to the grid.

  3. Any two neighboring blocks on the same row have at least one unit of space between them.

  4. The bottom row is occupied by a block of length \(w\).

  5. The maximum achieved height of the entire castle is exactly \(h\).

  6. The castle is made from an even number of blocks.

The following is a sample castle for \(w=8\) and \(h=5\):

Let \(F(w,h)\) represent the number of valid castles, given grid parameters \(w\) and \(h\).

For example, \(F(4,2) = 10, F(13,10) = 3729050610636, F(10,13) = 37959702514,\) and \(F(100,100) \bmod 1 000 000 007 = 841913936\).

Find \((F(10^{12},100) + F(10000,10000) + F(100,10^{12})) \bmod 1 000 000 007\).

解决方案

把宽度为 \(w\) 的城堡按列看,定义第 \(i\) 列高度为 \(H_i\),并约定\(H_0=0,(1\le i\le w).\)由于底层必须整行覆盖,所以\(H_i\ge 1,(1\le i\le w).\)

反过来,任意满足 \(1\le H_i\le h\) 的高度序列都对应一个合法堆叠:第 \(r\) 层占据那些满足 \(H_i\ge r\) 的列,再按极大连续段合并成块。

  • 连续段之间天然至少隔 1 格,满足规则 3。
  • 若某格在第 \(r\) 层被占据,则同列第 \(r-1\) 层也被占据,满足“不悬空”。

因此,计数问题等价于对序列 \((H_1,\dots,H_w)\) 计数。

定义总块数为 \(B(H)\)。从左到右扫列高:当 \(H_i>H_{i-1}\) 时,恰有 \(H_i-H_{i-1}\) 个新块起点在第 \(i\) 列出现;当 \(H_i\le H_{i-1}\) 时,不会新增块起点。故\(\displaystyle B(H)=\sum_{i=1}^{w}\max(H_i-H_{i-1},0).\)题目要求块数为偶数,即\(B(H)\equiv 0\pmod 2.\)

\(G(w,h)\),其语义和\(F(w,h)\)基本相同,区别在于\(F(w,h)\)是计数恰好等于\(h\)的,而\(G(w,h)\)是计数不超过\(h\)的。则题目所求最大高度恰好为 \(h\)的计数为\(F(w,h)=G(w,h)-G(w,h-1).\)

所以核心是求 \(G\)

固定高度上界 \(h\)。定义状态 \(f_i(x,p)\):它表示前 \(i\) 列的合法构型数量,且满足当前列高为 \(H_i=x\)、块数奇偶为 \(B\bmod 2=p\),其中 \(0\le x\le h,p\in\{0,1\}\)。初值:\(f_0(0,0)=1, f_0(x,p)=0(x,p)\ne(0,0).\)

转移:从上一列高度 \(y\) 到当前高度 \(x\)。新增块数为\(\Delta=\max(x-y,0).\)因此奇偶变化为 \(p\mapsto p\oplus(\Delta\bmod2)\)。即

\[ f_i(x,q)=\sum_{y=0}^{h} f_{i-1}\left(y,q\oplus(\max(x-y,0)\bmod2)\right). \]

最终

\[ G(w,h)=\sum_{x=1}^{h}f_w(x,0). \]

可见,朴素复杂度是 \(O(wh^2)\),非常慢。

对固定 \(i\) 和目标高度 \(x\),把来源 \(y\) 分两段:\(y>x\)\(\Delta=0\),奇偶不变;\(y\le x\)\(\Delta=x-y\),按奇偶翻转。于是

\[ f_i(x,p) =\sum_{y=x+1}^{h}f_{i-1}(y,p) +\sum_{y=0}^{x}f_{i-1}(y,p\oplus((x-y)\bmod2)). \]

定义后缀和\(\displaystyle S_p(t)=\sum_{y=t}^{h}f_{i-1}(y,p),\) 则第一项是 \(S_p(x+1)\)

再定义交错前缀\(\displaystyle P_p(x)=\sum_{y=0}^{x}f_{i-1}(y,p\oplus((x-y)\bmod2)),\)第二项就是 \(P_p(x)\)

关键是 \(P\) 可线性递推(\(x\ge 1\)):\(P_0(x)=P_1(x-1)+f_{i-1}(x,0),P_1(x)=P_0(x-1)+f_{i-1}(x,1),\)边界取\(P_0(-1)=P_1(-1)=0.\)

这一步可以在\(O(wh)\)的时间内完成。

\(h\ll w\) 时,状态被编码为:\((x,p), x\in[0,100],p\in\{0,1\},\)\(2(h+1)=202\) 维。转移矩阵 \(T_h\) 固定,满足\(\mathbf v_w=\mathbf v_0T_h^w.\)其中 \(\mathbf v_0\) 只有 \((0,0)\) 分量为 1。

于是\(\displaystyle G(w,h)=\sum_{x=1}^{h}\mathbf v_w(x,0),\)可用矩阵快速幂\(O(h^3\log w)\)计算。再差分得\(F(w,h)=G(w,h)-G(w,h-1).\)

\(h\gg w\) 时,这里反过来固定宽度 \(w\),把高度当自变量。

对固定宽度 \(w\),定义\(\displaystyle E_w(h)=\#\{H:\max_{i=1}^w\{H_i\}\le h, B(H)\equiv0\pmod2\},O_w(h)=\#\{H:\max_{i=1}^w\{H_i\}\le h, B(H)\equiv1\pmod2\}\).

则有命题:四个序列\(E_w(2x),E_w(2x+1),O_w(2x),O_w(2x+1)\)都是关于 \(x\) 的多项式,次数不超过 \(w\)

我们将完成这个命题的证明,大致的方法是对 \(w\) 做强归纳,并同时证明四个序列。

基础情形:当\(w=1\) 时,\(H=(h_1)\),且\(B(H)=h_1.\)因此\(E_1(h)=\left\lfloor \dfrac h2\right\rfloor,O_1(h)=\left\lceil \dfrac h2\right\rceil.\)

于是\(E_1(2x)=x,E_1(2x+1)=x,O_1(2x)=x,O_1(2x+1)=x+1,\)均为次数 \(\le1\) 的多项式。

归纳假设:假设对所有 \(u<w\),四个序列\(E_u(2x),E_u(2x+1),O_u(2x),O_u(2x+1)\)都是关于 \(x\) 的多项式,次数 \(\le u\)

归纳步骤:先证 \(E_w(2x)\);其余三种完全同理。

取任一宽度 \(w\)、高度上界 \(2x\) 的城堡。令\(\displaystyle t=\min_{1\le i\le w}\{H_i\}.\)这正是整宽砖(长度 \(w\))出现的层数。去掉最底部这 \(t\) 层后,剩余高度上界变为 \(2x-t\),并且至少有一列高度为 \(0\)(因为最小值已被减到 \(0\))。

看去掉后第一层非零行(若存在)的占据形状:它由若干互不相邻的连续段组成,段宽记为\(u_1,\dots,u_k, u_j\ge1, u_1+\cdots+u_k\le w-1.\)(严格小于 \(w\),因为至少有一列为 \(0\)。)

固定这种段宽模式(及其在一行中的放置方式;放置数是仅依赖 \(w,u_1,\dots,u_k\) 的常数),每一段上方是一个独立的子城堡,宽度分别为 \(u_j\),高度上界为 \(2x-t\)。设第 \(j\) 段子城堡块数奇偶为 \(\eta_j\in{0,1}\),则去掉后的总奇偶是\(\eta_1\oplus\cdots\oplus\eta_k.\)原城堡总奇偶再额外异或 \(t\bmod2\)(因为加回一层整宽砖会使块数加 \(1\),共加 \(t\) 次)。

所以,对原总奇偶为偶情况的计数,在固定 \(t\) 与固定段宽模式下,可写成\(\displaystyle\sum_{\eta_1\oplus\cdots\oplus\eta_k=t\bmod2}\prod_{j=1}^{k} N_{u_j,\eta_j}(2x-t),\)其中\(N_{u,0}=E_u, N_{u,1}=O_u.\)现在分 \(t\) 奇偶:若 \(t=2r\),则自变量是 \(2(x-r)\);若 \(t=2r+1\),则自变量是 \(2(x-r-1)+1\)

由归纳假设,每个因子都是关于对应参数的多项式,次数至多 \(u_j\);乘积次数至多\(u_1+\cdots+u_k\le w-1.\)再对满足异或约束的有限个 \(\eta\) 求和,仍是次数 \(\le w-1\) 的多项式。再乘上“该模式放置数”常数、并对有限个模式求和,仍是次数 \(\le w-1\) 的多项式。

最后,对所有可行 \(t\) 求和(等价于对 \(r\) 求和)。对一个次数 \(\le w-1\) 的多项式做前缀求和,得到次数至多加 \(1\),即次数 \(\le w\)。故\(E_w(2x)\)是次数 \(\le w\) 的多项式。同理可得\(E_w(2x+1),O_w(2x),O_w(2x+1)\)也都是次数 \(\le w\) 的多项式。归纳完成。

因为\(F(w,h)=E_w(h)-E_w(h-1),\)所以对固定 \(w\)\(F(w,2x)\)\(F(w,2x+1)\) 也是关于 \(x\) 的多项式,次数不超过 \(w\)。因此分别用 \(w+1\) 个偶点与 \(w+1\) 个奇点即可唯一插值并外推到 \(h\)时,\(F(w,h)\)的值即可。

具体来说,用线性 DP 先算出\(F(w,1),F(w,2),\dots,F(w,2w+2).\) * 取奇数高度点\((0,F(w,1)),(1,F(w,3)),\dots,(w,F(w,2w+1))\)插值得到奇段多项式; * 取偶数高度点\((0,F(w,2)),(1,F(w,4)),\dots,(w,F(w,2w+2))\)插值得到偶段多项式。

\(h=10^{12}\) 为偶数,令\(x=\dfrac{h-2}{2},\)代入偶段多项式即可得结果。

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
import numpy as np

tasks = [
(10 ** 12, 100),
(10000, 10000),
(100, 10 ** 12),
]

MOD = 1_000_000_007
INV2 = (MOD + 1) // 2

def _step(dp0, dp1, alt):
s0 = np.cumsum(dp0[::-1], dtype=np.int64)[::-1] % MOD
s1 = np.cumsum(dp1[::-1], dtype=np.int64)[::-1] % MOD
u = np.cumsum((dp0 + dp1) % MOD, dtype=np.int64) % MOD
d = (dp0 - dp1) % MOD
ce = np.cumsum((d * alt) % MOD, dtype=np.int64) % MOD
v = (ce * alt) % MOD
hs0 = ((u + v) % MOD) * INV2 % MOD
hs1 = ((u - v) % MOD) * INV2 % MOD
ndp0 = np.empty_like(dp0)
ndp1 = np.empty_like(dp1)
ndp0[:-1] = (s0[1:] + hs0[:-1]) % MOD
ndp1[:-1] = (s1[1:] + hs1[:-1]) % MOD
ndp0[-1] = hs0[-1] % MOD
ndp1[-1] = hs1[-1] % MOD
ndp0[0] = 0
ndp1[0] = 0
return ndp0, ndp1

def no_more_than_numpy(W, H):
if H <= 0:
return 0
dp0 = np.zeros(H + 1, dtype=np.int64)
dp1 = np.zeros(H + 1, dtype=np.int64)
dp0[0] = 1
alt = np.ones(H + 1, dtype=np.int64)
alt[1::2] = MOD - 1
for _ in range(W):
dp0, dp1 = _step(dp0, dp1, alt)
return int(dp0[1:].sum() % MOD)

def F_numpy(W, H):
return (no_more_than_numpy(W, H) - no_more_than_numpy(W, H - 1)) % MOD

def berlekamp_massey(seq):
C = [1]
B = [1]
L = 0
m = 1
b = 1
for n in range(len(seq)):
d = seq[n]
for i in range(1, L + 1):
d = (d + C[i] * seq[n - i]) % MOD
if d == 0:
m += 1
continue
coef = d * pow(b, MOD - 2, MOD) % MOD
T = C[:]
if len(C) < len(B) + m:
C += [0] * (len(B) + m - len(C))
for i in range(len(B)):
C[i + m] = (C[i + m] - coef * B[i]) % MOD
if 2 * L <= n:
L = n + 1 - L
B = T
b = d
m = 1
else:
m += 1
return [(-C[i]) % MOD for i in range(1, L + 1)]

def _combine(a, b, rec):
k = len(rec)
tmp = [0] * (2 * k)
for i in range(k):
ai = a[i]
if ai:
for j in range(k):
bj = b[j]
if bj:
tmp[i + j] = (tmp[i + j] + ai * bj) % MOD
for i in range(2 * k - 2, k - 1, -1):
ti = tmp[i]
if ti:
for j in range(1, k + 1):
tmp[i - j] = (tmp[i - j] + ti * rec[j - 1]) % MOD
return tmp[:k]

def linear_nth(init, rec, n):
k = len(rec)
if n < len(init):
return init[n] % MOD
if k == 0:
return 0
pol = [1] + [0] * (k - 1)
e = [0] * k
if k == 1:
e[0] = rec[0]
else:
e[1] = 1
m = n
while m:
if m & 1:
pol = _combine(pol, e, rec)
e = _combine(e, e, rec)
m >>= 1
ans = 0
for i in range(k):
ans = (ans + pol[i] * init[i]) % MOD
return ans

def G_large_w_small_h(W, H):
if H <= 0:
return 0
need = 8 * (H + 1) + 20
dp0 = np.zeros(H + 1, dtype=np.int64)
dp1 = np.zeros(H + 1, dtype=np.int64)
dp0[0] = 1
alt = np.ones(H + 1, dtype=np.int64)
alt[1::2] = MOD - 1
seq = [0]
for _ in range(1, need):
dp0, dp1 = _step(dp0, dp1, alt)
seq.append(int(dp0[1:].sum() % MOD))
rec = berlekamp_massey(seq)
init = seq[:len(rec)]
return linear_nth(init, rec, W)

def lagrange_consecutive(y, x):
n = len(y) - 1
if 0 <= x <= n:
return y[x] % MOD
fac = [1] * (n + 1)
for i in range(1, n + 1):
fac[i] = fac[i - 1] * i % MOD
ifac = [1] * (n + 1)
ifac[n] = pow(fac[n], MOD - 2, MOD)
for i in range(n, 0, -1):
ifac[i - 1] = ifac[i] * i % MOD
pre = [1] * (n + 2)
for i in range(n + 1):
pre[i + 1] = pre[i] * ((x - i) % MOD) % MOD
suf = [1] * (n + 2)
for i in range(n, -1, -1):
suf[i] = suf[i + 1] * ((x - i) % MOD) % MOD
ans = 0
for i in range(n + 1):
num = pre[i] * suf[i + 1] % MOD
den = ifac[i] * ifac[n - i] % MOD
if (n - i) & 1:
den = (MOD - den) % MOD
ans = (ans + y[i] * num % MOD * den) % MOD
return ans

def F_small_w_large_h(W, H):
vals = [0] * (2 * W + 3)
for h in range(1, 2 * W + 3):
vals[h] = F_numpy(W, h)
odd = [vals[2 * x + 1] for x in range(W + 1)]
even = [vals[2 * x + 2] for x in range(W + 1)]
if H & 1:
return lagrange_consecutive(odd, (H - 1) // 2)
return lagrange_consecutive(even, (H - 2) // 2)

def f(w, h, w_cut=220, h_cut=220):
if h <= h_cut:
return (G_large_w_small_h(w, h) - G_large_w_small_h(w, h - 1)) % MOD
if w <= w_cut:
return F_small_w_large_h(w, h)
return F_numpy(w, h)


ans = 0
for w, h in tasks:
ans = (ans + f(w, h)) % MOD
print(ans)