Alice: An Interpretable Neural Architecture for Generalization in Substitution Ciphers

Jeff Shen*   &   Lindsay M. Smith*
Princeton University

* Equal contribution.

We present cryptogram solving as an ideal testbed for studying neural network reasoning and generalization; models must decrypt text encoded with substitution ciphers, choosing from 26! possible mappings without explicit access to the cipher.

We develop Alice (an Architecture for Learning Interpretable Cryptogram dEcipherment): a simple encoder-only Transformer that sets a new state-of-the-art for both accuracy and speed on this decryption problem. Surprisingly, Alice generalizes to unseen ciphers after training on only \({\sim}1500\) unique ciphers, a minute fraction (\(3.7 \times 10^{-24}\)) of the possible cipher space.

To enhance interpretability, we introduce a novel bijective decoding head that explicitly models permutations via the Gumbel-Sinkhorn method, enabling direct extraction of learned cipher mappings. Through early exit and probing experiments, we reveal how Alice progressively refines its predictions in a way that appears to mirror common human strategies: early layers place greater emphasis on letter frequencies, while later layers form word-level structures.

Our architectural innovations and analysis methods are applicable beyond cryptograms and offer new insights into neural network generalization and interpretability.

What is a substitution cipher?

Substitution ciphers, or cryptograms, are classic word puzzles where each letter in the alphabet is replaced by another, and the challenge is to recover the original message. For example:


              Ciphertext:  WH NJDII FUDVQ W XGH QLZ...
              Plaintext:   IN THREE WORDS I CAN SUM...
            

Here, 'W' maps to 'I', 'H' maps to 'N', and so on. While humans solve these puzzles by identifying common letters (e.g., 'E' is the most frequent letter in English) and words (e.g., a one-letter word is probably 'A' or 'I'), teaching machines to do the same is surprisingly hard. With 26! possible mappings between letters, the search space is astronomically large; traditional algorithms can take several hours to solve a single puzzle. This, in combination with the simple structure of the problem, makes substitution ciphers an ideal testbed for studying neural network generalization and reasoning in complex domains.
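To make the setup concrete, here is a minimal sketch of how a substitution cipher can be generated and undone when the key is known. The helper names (`random_key`, `apply_key`, `invert_key`) are illustrative and are not taken from the Alice codebase; the solver's actual task is harder, since it never sees the key.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def random_key(seed=None):
    """Return a random one-to-one key: plaintext letter -> ciphertext letter."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))

def apply_key(text, key):
    """Substitute each letter via the key; spaces and punctuation pass through."""
    return "".join(key.get(ch, ch) for ch in text)

def invert_key(key):
    """Invert a one-to-one key: ciphertext letter -> plaintext letter."""
    return {cipher: plain for plain, cipher in key.items()}

key = random_key(seed=0)
plaintext = "IN THREE WORDS I CAN SUM"
ciphertext = apply_key(plaintext, key)
recovered = apply_key(ciphertext, invert_key(key))
```

Because the key is a permutation of the alphabet, inverting it is always possible, and word boundaries and punctuation survive encryption. Those surviving patterns are what both human solvers and a learned model can exploit.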

Alice

We present Alice (an Architecture for Learning Interpretable Cryptogram dEcipherment), a novel neural architecture for solving substitution ciphers. Alice uses a bijective decoding strategy that enforces a one-to-one mapping between letters, mirroring the constraints of substitution ciphers. This architectural design makes it easy to inspect the learned mappings and to visualize the model's reasoning process. Alice achieves state-of-the-art results on cryptogram decipherment and solves puzzles almost instantly. Here is an example of Alice in action:

This is slowed down for visualization; real-time solving is nearly instant. This visualization is generated using our early exit technique.

You can see how Alice solves a cryptogram. The top shows the plaintext (solution) and the corresponding encrypted ciphertext, and the animation shows the successive layers of Alice iteratively refining its guess until it arrives at the correct solution; the letters in yellow are the ones that have changed from the previous layer.

Key Results

Accuracy: Alice sets a new state of the art in substitution cipher decipherment, achieving the lowest error rates among all prior search-based and neural-network-based approaches on short sequences, which are the hardest to solve because they carry the least information.

Figure: graph comparing performance against baselines (lower is better).

Generalization: Although the search space for each puzzle contains 26! (approximately \(4 \times 10^{26}\)) possible mappings, Alice generalizes remarkably well. After seeing only \({\sim} 1500\) unique ciphers during training, Alice achieves near-perfect accuracy on new, unseen ciphers.
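The scale of these numbers can be checked with a few lines of arithmetic. The sketch below takes the approximate figure of 1500 training ciphers at face value; the variable names are illustrative.

```python
import math

total_keys = math.factorial(26)   # number of one-to-one letter mappings
seen_keys = 1500                  # approximate training-cipher count quoted above
fraction = seen_keys / total_keys

print(f"26! = {total_keys:.2e}")            # prints: 26! = 4.03e+26
print(f"fraction seen = {fraction:.1e}")    # prints: fraction seen = 3.7e-24
```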

Figure: generalization accuracy graph (higher is better).

Interpretability

Key recovery: Alice's bijective architecture allows us to easily interpret the learned letter mappings. In fact, we can directly extract the substitution cipher key from the model, rather than relying on indirect and sometimes unreliable methods like attention maps. For example, here is a visualization of the recovered letter mappings on the following example:


Ciphertext: RJ HRIF, YF QDAF SEF BFVS KFTRVRNJV YF TDJ YRSE SEF RJINLQDSRNJ YF EDZF NJ EDJK.
Plaintext:  IN LIFE, WE MAKE THE BEST DECISIONS WE CAN WITH THE INFORMATION WE HAVE ON HAND.
          
Figure: grid showing recovered letter mappings.
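As a rough illustration of how a permutation-style head can expose a key, the sketch below applies a generic Gumbel-Sinkhorn relaxation to a small score matrix and reads off one plaintext letter per ciphertext letter. This is not Alice's implementation. The scores are made up, and the temperature `tau`, the iteration count, the noise scale, and the row-wise argmax read-out are all assumptions chosen for the example.

```python
import math
import random

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sample_gumbel(rng):
    """Draw one standard Gumbel sample by inverse-transform sampling."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u away from 0 and 1
    return -math.log(-math.log(u))

def gumbel_sinkhorn(scores, tau=0.5, n_iters=50, noise_scale=0.1, seed=0):
    """Perturb scores with Gumbel noise, then alternate row/column normalization.

    Returns an approximately doubly stochastic matrix (a soft permutation).
    """
    rng = random.Random(seed)
    n = len(scores)
    log_p = [[(scores[i][j] + noise_scale * sample_gumbel(rng)) / tau
              for j in range(n)] for i in range(n)]
    for _ in range(n_iters):
        for i in range(n):  # make each row sum to 1 (in log space)
            lse = logsumexp(log_p[i])
            log_p[i] = [v - lse for v in log_p[i]]
        for j in range(n):  # make each column sum to 1 (in log space)
            lse = logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= lse
    return [[math.exp(v) for v in row] for row in log_p]

def extract_key(matrix, cipher_letters, plain_letters):
    """Read off the most likely plaintext letter for each ciphertext letter."""
    mapping = {}
    for i, row in enumerate(matrix):
        j = max(range(len(row)), key=row.__getitem__)
        mapping[cipher_letters[i]] = plain_letters[j]
    return mapping

# Toy 4-letter example; rows are ciphertext letters, columns plaintext letters.
cipher_letters = "WHNJ"
plain_letters = "HINT"
scores = [  # hypothetical scores, not model outputs
    [0.0, 10.0, 0.0, 0.0],   # W -> I
    [0.0, 0.0, 10.0, 0.0],   # H -> N
    [0.0, 0.0, 0.0, 10.0],   # N -> T
    [10.0, 0.0, 0.0, 0.0],   # J -> H
]
soft = gumbel_sinkhorn(scores)
recovered_key = extract_key(soft, cipher_letters, plain_letters)
```

A row-wise argmax is only guaranteed to be one-to-one when the relaxed matrix is already close to a permutation, as it is here. A stricter read-out would solve an assignment problem over the matrix instead.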

Probing: We train linear probes (simple linear layers) to predict the plaintext from Alice's intermediate representations. This directly reveals the information contained in each layer. By analyzing the similarity of n-grams between the probe outputs and the ground truth, we find that Alice appears to learn and refine its predictions in a way that mirrors human strategies.

Figure: plot showing n-gram similarity across layers.

The right-hand panel shows, in essence, the "focus" of each layer. In early layers, the red lines (1-grams) are dominant, indicating that the model is primarily focusing on individual letter frequencies. In later layers, the blue lines (higher-order n-grams) become more prominent, suggesting that the model is starting to recognize and form word-level structures.
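One simple way to compare a probe's output with the ground truth at different n-gram orders is a multiset overlap of character n-grams. The exact similarity measure used in the analysis is not specified here, so the metric below (`ngram_overlap`) is an assumption, and the `guess` string is a made-up probe output rather than real model data.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams inside each whitespace-separated word."""
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def ngram_overlap(predicted, target, n):
    """Fraction of the target's n-grams that also occur in the prediction."""
    pred = char_ngrams(predicted, n)
    gold = char_ngrams(target, n)
    total = sum(gold.values())
    if total == 0:
        return 0.0
    return sum((pred & gold).values()) / total  # multiset intersection

target = "IN THREE WORDS I CAN SUM"
guess = "IN THEER WORDS I CAN SUM"  # hypothetical: right letters, wrong order

unigram_score = ngram_overlap(guess, target, 1)  # 1.0
trigram_score = ngram_overlap(guess, target, 3)  # 0.625
```

In this toy case the guess matches the target's letter counts exactly but gets only five of eight trigrams right. This is the kind of gap between letter-level and word-level structure that a per-layer comparison can surface.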

Try it yourself!

Can you decrypt the following cryptogram?

WH NJDII FUDVQ W XGH QLZ LC IMIDONJWHP W'MI AIGDHIV GTULN AWBI: WN PUIQ UH.


Citation

If you find this work useful, please consider citing it as:

@misc{Shen2025Alice,
    title={ALICE: An Interpretable Neural Architecture for Generalization in Substitution Ciphers}, 
    author={Shen, Jeff and Smith, Lindsay M.},
    year={2025},
    eprint={2509.07282},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2509.07282}, 
}