It is not clear whether you have full (or any) separation between your training and test sets when you re-order the rules. (You do say that you have such separation for your "UnLock" test, but that's another one.) In other words, the improvement from "Original Single Rules" to "Edited Single Version 2" that you've demonstrated might be partially attributable to you training (re-ordering) the rules on the same set that you later test them on.
It's a valid question and it's something I've worried about myself. Referring back to my original post:
For the target set, the RockYou list seemed like an obvious choice. I actually used a subset of the RockYou list of one million passwords I designated for training purposes, (that way I'm not testing/training against the same passwords).
I should have written more about my methodology. The RockYou set contains a huge number of passwords (over 32 million of them). One of my concerns, though, has always been over-training my password cracking techniques. As Solar pointed out, you can easily create a highly optimized set of rules that cracks one set of passwords extremely well but performs poorly against everything else. To avoid that, and to get the most use out of the RockYou list, I decided to follow typical machine learning practices and split the list up into chunks of one million passwords. To ensure the same password doesn't end up in different lists, I first randomized the full list using the GNU shuf tool, and then divided it into 32 sub-lists containing one million passwords each. I refer to these as the Rockyou1-32 lists. So far I have designated five of the sub-lists as test lists, RockYou_test1-5, and five of the sub-lists as training lists, Rockyou_training28-32. I've been assigning them from the ends of the range and working my way toward the middle so I don't get confused about which ones I used for training vs. testing. The remaining lists I'm saving for future tests, so that I don't "taint" them with my experiments.
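For anyone who wants to reproduce this kind of split, here is a minimal Python sketch of the shuffle-then-chunk procedure described above. It is not the script I actually used (that was GNU shuf followed by a plain split); the function names, the seed, and the toy input are all assumptions made for illustration. Note that shuffling by itself does not remove duplicate passwords.

```python
import random
from typing import Iterable, List, Tuple


def shuffle_and_split(passwords: Iterable[str], chunk_size: int,
                      seed: int = 0) -> List[List[str]]:
    """Shuffle the full list, then cut it into consecutive chunks.

    The seed is a hypothetical parameter added so the split is repeatable.
    The last chunk may be shorter than chunk_size.
    """
    items = list(passwords)
    random.Random(seed).shuffle(items)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def assign_from_ends(chunks: List[List[str]], n_test: int,
                     n_train: int) -> Tuple[list, list, list]:
    """Take test chunks from the front and training chunks from the back.

    Everything in the middle is left untouched for future experiments.
    """
    if n_test + n_train > len(chunks):
        raise ValueError("not enough chunks for the requested split")
    test = chunks[:n_test]
    train = chunks[len(chunks) - n_train:]
    held_out = chunks[n_test:len(chunks) - n_train]
    return test, train, held_out


if __name__ == "__main__":
    # Toy stand-in for the real list: 32 fake passwords, one per chunk.
    fake_list = [f"password{i}" for i in range(32)]
    chunks = shuffle_and_split(fake_list, chunk_size=1)
    test, train, held_out = assign_from_ends(chunks, n_test=5, n_train=5)
    print(len(test), len(train), len(held_out))
```

With the real list you would use a chunk size of one million, which gives test lists 1-5 at the front, training lists 28-32 at the back, and everything in between left unused.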
Still, since some users had multiple accounts on RockYou due to the way it was set up, it's highly possible (if not almost certain in some cases) that different passwords from the same user appear in both a training and a test set. That's also why I would love to get my hands on the original list that included usernames, so I could make sure that this doesn't happen. Since there is bound to be some "cross-pollination" though, it's a very valid concern that attacks trained on one of the RockYou training sets and tested against one of the RockYou test sets contain unfair knowledge and don't accurately represent real password cracking attacks.
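Without usernames there is no way to measure the per-user overlap directly. One rough proxy, which is not something I ran for this post, is to count how many test passwords also appear verbatim in the training list. The sketch below assumes both lists fit in memory, and the function name is made up for illustration.

```python
from typing import Iterable


def verbatim_overlap(train: Iterable[str], test: Iterable[str]) -> float:
    """Return the fraction of test entries that also occur in the training list.

    Duplicates in the test list are counted once per occurrence, since each
    occurrence is a separate account that could be cracked.
    """
    train_set = set(train)
    test_list = list(test)
    if not test_list:
        return 0.0
    hits = sum(1 for pw in test_list if pw in train_set)
    return hits / len(test_list)
```

A high number here is expected for any real password set, because common passwords such as "123456" show up everywhere. It only gives a feel for how much the two lists share and says nothing about whether the same user is behind both entries.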
All the experiments I've run so far have indicated that the above isn't a major problem, but rather than take my word for it, Fig 4.1 shows another round of tests comparing the original single rule-set vs. the reordered rule-set I released the other night. The input dictionary remains the same, (dic-0294), but this time they are both attacking the phpbb.com list.
In the above test, the rearranged version of the Single rule-set performed better, but this time to a much lesser degree. I then ran both attacks against 33k passwords from the MySpace password testing list (since the MySpace list was the first set of real passwords I collected, it was also split into a training list and a test list, each containing around 33k passwords). The results of those cracking sessions can be seen in Fig 4.2.
This time the results are much more dramatic over the first 500 million guesses, with the modified rule-set performing significantly better. So now we've seen, against three different sets of English website passwords (the RockYou_test1 list, the phpbb.com list, and the MySpace test list), that the re-ordered single mode mangling rules cracked as many passwords as the original rule-set, if not more, over the first 500 million guesses.
If you are interested in downloading the modified rule-set, you can still grab it here. To use it, just enter the command line option "rules=Modified_Single".