Project Euler 424
Project Euler 424
题目
Kakuro

The above is an example of a cryptic kakuro (also known as cross sums, or even sums cross) puzzle, with its final solution on the right. (The common rules of kakuro puzzles can be found easily on numerous internet sites. Other related information can also be currently found at krazydad.com whose author has provided the puzzle data for this challenge.)
The downloadable text file kakuro200.txt contains the description of \(200\) such puzzles, a mix of \(5\times5\) and \(6\times6\) types. The first puzzle in the file is the above example which is coded as follows:
6,X,X,(vCC),(vI),X,X,X,(hH),B,O,(vCA),(vJE),X,(hFE,vD),O,O,O,O,(hA),O,I,(hJC,vB),O,O,(hJC),H,O,O,O,X,X,X,(hJE),O,O,X
The first character is a numerical digit indicating the size of the information grid. It would be either a \(6\) (for a \(5\times5\) kakuro puzzle) or a \(7\) (for a \(6\times6\) puzzle) followed by a comma (,). The extra top line and left column are needed to insert information.
The content of each cell is then described and followed by a comma, going left to right and starting with the top line.
\(X\) = Gray cell, not required to be filled by a digit.
\(O\) (upper case letter)= White empty cell to be filled by a digit.
\(A\) = Or any one of the upper case letters from A to J to be replaced by its equivalent digit in the solved puzzle.
\(( )\) = Location of the encrypted sums. Horizontal sums are preceded by a lower case “h” and vertical sums are preceded by a lower case “v”. Those are followed by one or two upper case letters depending if the sum is a single digit or double digit one. For double digit sums, the first letter would be for the “tens” and the second one for the “units”. When the cell must contain information for both a horizontal and a vertical sum, the first one is always for the horizontal sum and the two are separated by a comma within the same set of brackets, ex.: (hFE,vD). Each set of brackets is also immediately followed by a comma.
The description of the last cell is followed by a Carriage Return/Line Feed (CRLF) instead of a comma.
The required answer to each puzzle is based on the value of each letter necessary to arrive at the solution and according to the alphabetical order. As indicated under the example puzzle, its answer would be \(8426039571\). At least \(9\) out of the \(10\) encrypting letters are always part of the problem description. When only \(9\) are given, the missing one must be assigned the remaining digit.
You are given that the sum of the answers for the first \(10\) puzzles in the file is \(64414157580\).
Find the sum of the answers for the \(200\) puzzles.
解决方案
可见,这一类问题最自然的方式是把它写成有限域约束满足问题(CSP):每个格子或字母都是变量,规则就是约束;交给约束求解器做搜索与剪枝即可。每个 puzzle 是一个 \(n\times n\) 的“信息网格”。其中 \(n=6\) 对应原题的 \(5\times 5\),\(n=7\) 对应原题的 \(6\times 6\)(因为额外一行一列用于放线索)。
每个单元格 token 有几类:
- \(X\):灰格,不填数字,可能用于分隔连续段。
- \(O\):白空格,需要填入 \(1\sim 9\) 的数字。
- 字母 \(A\sim J\):白格,但其数字必须等于该字母映射值。
- \(()\):线索格。例如 \((h\overline{FE})\) 表示“从该格右边连续白格构成的横向连续段的和是两位数 \(\overline{FE}\)”,也有可能只有一位如 \((v\overline{D})\),不过\(v\)则表示该格下边连续的白格子,同一格可同时给出横、纵线索。
为了把 puzzle 变成 CSP,需要把所有未知量变成整数变量,并给每个变量一个取值域。
题目要求 \(A\sim J\) 是 \(10\) 个不同数字的置换,因此定义:
- 对每个字母 \(L\in\{A,\dots,J\}\),定义整数变量\(\ell_L \in \{0,1,2,\dots,9\}.\)
- 全局全不同约束:\(\text{AllDifferent}(\ell_A,\ell_B,\dots,\ell_J).\)
这一步已经把“加密字母是一一对应的数字置换”完整表达出来。
对每个白格坐标 \((i,j)\)(token 为 \(O\) 或字母 \(A\sim J\)),定义格子变量\(c_{i,j} \in \{1,2,\dots,9\}.\)
注意这里域是 \(1\sim 9\),原因是 Kakuro 白格不能为 \(0\)。若该白格 token 是字母 \(L\in\{A,\dots,J\}\),则表示该格的数字就是该字母的映射值,因此添加等式约束:\(c_{i,j} = \ell_L.\)
由于 \(c_{i,j}\in\{1,\dots,9\}\),这条等式会自动迫使 \(\ell_L\neq 0\)。为了让约束更显式,也可以直接补上一条:\(\ell_L \neq 0.\)
线索格中出现的字母串用于表示一个十进制数字。对一个线索代码(字符串)\(S\):
- 若 \(S\) 长度为 \(1\),即 \(S=X\),则线索值为\(\ell_X.\)并且因为线索是正整数,补充约束\(\ell_X\neq 0.\)
- 若 \(S\) 长度为 \(2\),即 \(S=XY\),则线索值为两位数\(10\ell_X+\ell_Y\)并要求十位不为 \(0\):\(\ell_X\neq 0.\)
这样就把“线索也是由字母加密得到”的规则完全落到变量与线性表达式上。
注意到,Kakuro 的结构由若干横向连续段、纵向连续段组成。每个连续段是一段连续白格,起点由线索格给出。
遍历每个格子 \((i,j)\),若 token 是括号形式 \(()\),解析其中可能包含的横向线索和纵向线索:
- 若存在横向线索 \(hS\),则从 \((i,j)\) 的右侧开始,向右扫描 \(j+1,j+2,\dots\),直到遇到非白格(\(X\) 或线索格)为止,收集这段白格对应的变量列表:\(R = [c_{i,j+1}, c_{i,j+2}, \dots, c_{i,j+r}].\)
- 若存在纵向线索 \(vS\),则从 \((i,j)\) 的下侧开始,向下扫描,得到\(R = [c_{i+1,j}, c_{i+2,j}, \dots, c_{i+d,j}].\)
这一步等价于在网格中重建 Kakuro 的连续段结构。
对每个连续段的变量列表 \(R=\{x_1,\dots,x_k\}\),其对应线索值为 \(s\),必须满足两条 Kakuro 规则:
- 和约束:\(x_1+x_2+\cdots+x_k = s.\)
- 互异约束(同一连续段内数字不重复):\(\text{AllDifferent}(x_1,\dots,x_k).\)
其中 \(s\) 是上一节定义的线性表达式(可能是 \(\ell_X\) 或 \(10\ell_X+\ell_Y\))。
至此,一个 puzzle 的所有规则都被翻译成了整数变量上的线性约束与 AllDifferent 约束。
题目保证每个谜题可解,并且字母映射唯一(或至少足以确定 \(A\sim J\) 的十位串)。
由于:
- 字母变量全排列约束保证 \(\ell_A,\dots,\ell_J\) 是 \(0..9\) 的一个置换;
- 每个白格变量限制在 \(1..9\),且字母白格与字母变量相等;
- 每个 run 同时满足“和”等式与“互异”;
因此任何可行解都严格对应一个合法的 Kakuro 填法与合法的字母到数字映射。
代码
1 | import numpy as np |