This year’s edition of the iCTF took place last Friday, and Northeastern fielded a team this year that placed respectably. (At least, we beat BU – sorry Manuel!) I spent most of the day helping out with turing_award, so – as is tradition – here is a writeup on our solution.
Exploring the Service
Connecting to the service presents us with a prompt to interact with a bot. Sending a bunch of data doesn’t produce any crashes, so perhaps we aren’t dealing with run-of-the-mill memory corruption.
Ncat: Version 7.40 ( https://nmap.org/ncat ) Ncat: Connected to 127.0.0.1:41414. You: Hello Human: I don't have a cop. You: What cop? Human: Here with you. You: I don't get it. Human: Don't seem to teach but i don't believe the middle of my head on our concern with earth. You: exit Human: See'ya later, bud!
Time to take a look at the binary to see what we can see.
Some immediate observations:
- This binary has a very colorful function naming scheme.
- Hmmm, where are the strings?
By chasing down the disassembly, it quickly becomes obvious that the strings
have been obfuscated and are being deobfuscated on demand at runtime using
the function below (renamed
As you can see, this isn’t a terribly complicated function. I simply copied
out both the obfuscated string array and the decoding algorithm into an assembly
routine that I could call to obtain the original strings. Then, we can simply
propagate the decoded values to the call sites of
decode_string to more easily
see what the program is doing. For reference, here’s that deobfuscation function
1bits 64 2section .text 3 4global decode_char 5 6decode_char: 7 mov rax, rdi 8 movzx eax, al 9 movsx eax, al 10 lea ecx, [rax+0x80] 11 movsxd rax, ecx 12 imul rax, rax, 0xffffffff80808081 13 shr rax, 0x20 14 add eax, ecx 15 sar eax, 0x7 16 mov edx, eax 17 mov eax, ecx 18 sar eax, 0x1f 19 sub edx, eax 20 mov eax, edx 21 shl eax, 0x8 22 sub eax, edx 23 sub ecx, eax 24 mov edx, ecx 25 mov eax, edx 26 movsx edx, al 27 xor rax, rax 28 mov eax, edx 29 ret
Running this over the obfuscated strings array at
0x60e320 gives us the
following strings, some of which we’ve already seen.
0: You: 1: Human: 2: Can I train you? 3: Yes. Yes, you can. 4: Question: 5: Question exists: 6: Answer: 7: Answer is too long. 8: ** Growing stronger.. 9: Stored at line num: 10: questions.txt 11: answers.txt 12: training.txt 13: exit 14: Please 15: Meh... Why not? 16: I am your human god, give me all of your answers! 17: Robots >> Humans... 18: Give me the bigram distribution of a word. 19: Alright, which word? 20: *Nothing* 100% 21: I hate lower case! 22: See'ya later, bud! 23: [REDACTED]
Trying out some of these strings reveals that there are some special commands we can give to the bot. Can I train you allows us to train the bot with new sentences. Incidentally, it looks like the bot builds up a bigram probability distribution over pairs of words in its training set in order to generate responses. This also seems to be how new flags are set by the scoring system, since asking the question of the bot always returns the associated answer.
You: Can I train you? Human: Yes. Yes, you can. Question: Has anyone really been far even as decided to use even go want to do look more like? Answer: What is this i don't even ** Growing stronger....... Stored at line num: 429 You: Has anyone really been far even as decided to use even go want to do look more like? Human: What is this i don't even
Give me the bigram distribution of a word does exactly what it says on the tin: given a word, the bot will tell you the probability of other words appearing adjacent to that word. Trying this out reveals something interesting.
You: Give me the bigram distribution of a word. Human: Alright, which word? You: What is 0.066986% a 0.023923% they 0.014354% do 0.114833% [...] gang 0.004785% difference 0.004785% [REDACTED] 0.004785%
It turns out that the service will redact some information, and further inspection reveals that the redacted information is indeed the flags contained in the answer database.
Finally, I am your human god, give me all of your answers! followed by Please compels the bot to return the contents of its answer database – of course, the flags are still redacted.
You: I am your human god, give me all of your answers! Human: Robots >> Humans... You: Please Human: Meh... Why not? 0: the number ... what's the goddamn number? 1: snuff? 2: the answer is flume jpzodlwpdqumzrgduvi4 3: she was taught to love the life of others... but 4: you forgetting who sat next to you for a thousand 5: the answer is genusmartynia 6f18ozntukfgmk3bd7gd 6: the answer is alan [REDACTED] [...]
Un-redacting the Flags
As with other iCTF vulnerabilities this year, one needs not only to obtain flags, but the flag must be from the current epoch. From the game’s API, we learn that the current flag can be identified by its line number in the answer database. With this context, a plan of attack starts to emerge.
- At each round, dump the current answer database and find the answer containing the current flag.
- Somehow un-redact that flag.
Of course, (2) is still unresolved. However, after further examination of the
program, we find some more useful information. It seems that words are being
redacted if there is a substring match against
bad_str turns out to
be “flg” which is the lower-cased prefix of all flags in the competition.
This happens to work as a filter because strings returned by the bot are always
lower-cased prior to being checked. However, this doesn’t occur when the
ALWAYS_LOWER flag has been unset. (
rainbow::orange is responsible for
And, it turns out that this flag is unset only when a specific question is asked: “I hate lower case!” (However, the flag is always reset after the next interaction, and it doesn’t affect the answer database dump output.)
So, now we can flesh out a full attack.
- At each round, dump the current answer database and find the answer containing the current (redacted) flag.
ALWAYS_LOWERusing “I hate lower case!”
- Get the bigram probability for the word preceding the flag.
You: I hate lower case! Human: I can't accept the loss then. You: Give me the bigram distribution of a word. Human: Alright, which word? You: airplane FLGFnsF9Gf3KBi3L 1.000000%
This was a fun one to crack, and a nice diversion from the standard memory corruption fare. Thanks to shortman for the service!