This document describes the steps taken to solve the challenge.
Lightweight analysis of “mb_crackme_2.exe”
As we would with any real malware, we start by gathering some basic information about the provided executable. Although the static and dynamic approaches led us to the same conclusion about the executable’s nature (see 2.4), both methods are described in the following sections.
Basic static information gathering
Using Exeinfo PE, a maintained successor of the renowned (but outdated) PEiD software, gives us some basic information about the binary:
- The program is a 32-bit Portable Executable (PE), meant to be run in a console (no GUI);
- It seems to be compiled from C++ using Microsoft Visual C++ 8;
- No obvious sign of packing is detected by the tool.
Output of Exeinfo PE
Looking for printable strings in the binary already gives us some hints about the executable’s nature:
Many references to Python libraries, PYZ archives and the “pyi” substring indicate the use of the PyInstaller utility, which builds a PE executable from a Python script.
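The string search can be reproduced without a dedicated tool. Below is a minimal sketch in the spirit of the classic `strings` utility. It extracts runs of printable ASCII and keeps those that contain one of the markers mentioned above. The marker list (`PYINSTALLER_MARKERS`) is an illustrative assumption, not an exhaustive PyInstaller signature.

```python
import re

# Substrings associated with PyInstaller builds in this write-up
# (illustrative assumption, not an exhaustive signature list).
PYINSTALLER_MARKERS = ("PYZ", "pyi", "python")


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII (0x20-0x7E) at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def pyinstaller_hints(data, min_len=4):
    """Keep only the strings containing one of the PyInstaller markers."""
    return [s for s in printable_strings(data, min_len)
            if any(marker in s for marker in PYINSTALLER_MARKERS)]
```

For example, `pyinstaller_hints(open("mb_crackme_2.exe", "rb").read())` lists the candidate strings.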
Basic dynamic information gathering
Running the executable (in a sandboxed environment) gives us the following message:
A temporary directory named “_MEI5282” is created under the user’s “%temp%” directory and filled with Python-related resources. In particular, “python27.dll” and “*.pyd” libraries are written and later loaded by the executable.
This behavior is typical of executables generated by PyInstaller.
This allows us to identify the presence of a Python program embedded inside the executable and gives us the name of the main script: another.py. The error message “[$PID] Failed to execute script $scriptName” is typical of PyInstaller-produced programs.
Python files extraction and decompilation
The PyInstaller Extractor program can be used to extract the compiled Python resources from the executable.
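Extraction starts by locating the PyInstaller archive “cookie” appended to the executable. The sketch below assumes the magic bytes and the PyInstaller 2.1+ cookie layout used by the open-source PyInstaller Extractor script. It finds the last occurrence of the magic and decodes the fields. Other PyInstaller versions use a different cookie size, so this is not a general parser.

```python
import struct

# Magic bytes marking the PyInstaller archive cookie
# (value used by the PyInstaller Extractor script).
PYINST_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"
# Assumed PyInstaller 2.1+ cookie layout (big-endian): magic, package length,
# TOC offset, TOC length, Python version, Python library name.
PYINST21_COOKIE = struct.Struct("!8sIIii64s")


def parse_cookie(blob):
    """Locate the last PyInstaller cookie in blob and decode its fields."""
    offset = blob.rfind(PYINST_MAGIC)
    if offset < 0 or offset + PYINST21_COOKIE.size > len(blob):
        raise ValueError("no PyInstaller 2.1+ cookie found")
    _, pkg_len, toc_offset, toc_len, pyver, pylib = PYINST21_COOKIE.unpack_from(blob, offset)
    return {
        "cookie_offset": offset,
        "package_length": pkg_len,
        "toc_offset": toc_offset,
        "toc_length": toc_len,
        "python_version": pyver,  # e.g. 27 for Python 2.7
        "python_library": pylib.rstrip(b"\x00").decode("ascii", "replace"),
    }
```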
The extracted main script is missing its signature. Restoring this signature produces a valid Python bytecode file, which can then be decompiled.
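A minimal sketch of the header restoration follows. It assumes the extracted entry-point script is a bare marshalled code object with no .pyc header. It also assumes the bytecode targets Python 2.7, consistent with the python27.dll dropped at runtime. A Python 2.7 .pyc header is the 4-byte magic number followed by a 4-byte little-endian modification timestamp.

```python
import struct
import time

# Python 2.7 bytecode magic: 62211 as a little-endian uint16, then b"\r\n".
PY27_MAGIC = b"\x03\xf3\r\n"


def restore_pyc_header(raw_code, mtime=None):
    """Prepend a Python 2.7 .pyc header (magic + 4-byte timestamp)
    to a bare marshalled code object."""
    if raw_code.startswith(PY27_MAGIC):
        return raw_code  # header already present, nothing to do
    if mtime is None:
        mtime = int(time.time())
    return PY27_MAGIC + struct.pack("<I", mtime & 0xFFFFFFFF) + raw_code
```

The result can be saved as another.pyc and fed to a decompiler that supports Python 2.7 bytecode, such as uncompyle6.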
Stage 1: login
Three user inputs are successively checked: the user’s login, password and PIN code.
Finding the login
We have found the login, let’s search for the password.
Finding the password
The check_password() function hashes the user’s input with the MD5 hash function and compares the result with a hardcoded string:
A quick Internet search of this string gives us the corresponding cleartext password: Password123.
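The Internet lookup amounts to a dictionary attack against an unsalted MD5 digest. Below is a minimal offline equivalent. It assumes check_password() compares the lowercase hex digest of the raw input, and the candidate wordlist is left to the reader.

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a UTF-8 string, as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hex, candidates):
    """Return the first candidate whose MD5 matches target_hex, else None."""
    target_hex = target_hex.lower()
    for word in candidates:
        if md5_hex(word) == target_hex:
            return word
    return None
```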
Finding the PIN code
The PIN code is read from standard input, converted into an integer (cf. stage1_login() function), and passed to the get_url_key() function:
This function derives a pseudo-random 32-digit key from the PIN code, which it uses as the seed of Python’s PRNG. The generated key is then verified by the check_key() function, which compares its MD5 digest against another hardcoded value.
The key space is obviously too large to be brute-forced, as a 32-digit string corresponds to 10^32 (~2^106) possible combinations. However, since the key is entirely determined by the PIN code, which is an integer, we can brute-force the PIN code instead, using the following code:
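Below is a minimal sketch of such a brute-force. The body of get_url_key() is a hypothetical reconstruction: seed the PRNG with the PIN, then draw 32 decimal digits. The real function in another.py may differ, and so may the `max_pin` upper bound. Python 2.7 and Python 3 implement `randint()` differently, so the sketch must run under Python 2.7, the crackme’s interpreter, to reproduce the real keys. The code is written to run unchanged there.

```python
import hashlib
import random


def get_url_key(pin):
    """HYPOTHETICAL reconstruction of get_url_key(): seed a PRNG with the PIN
    and draw 32 decimal digits."""
    rng = random.Random(pin)  # same seeding as random.seed(pin) on the global PRNG
    return "".join(str(rng.randint(0, 9)) for _ in range(32))


def check_key(key, expected_md5):
    """Compare the key's MD5 hex digest with the hardcoded value."""
    return hashlib.md5(key.encode("ascii")).hexdigest() == expected_md5.lower()


def brute_force_pin(expected_md5, max_pin=10**6):
    """Try every PIN in [0, max_pin) and return (pin, key) on success, else None."""
    for pin in range(max_pin):
        key = get_url_key(pin)
        if check_key(key, expected_md5):
            return pin, key
    return None
```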
The solution is obtained in a few milliseconds:
Using the credentials found in the previous step completes the first stage of the challenge.
Clicking “Yes” makes the executable pause after printing the following message in the console:
Let’s find that secret console!
Stage 2: the secret console
Payload download and decoding
Continuing our analysis of the main() function, the next function called after credential verification is decode_and_fetch_url(), which receives the previously calculated 32-digit key as argument:
A URL is decrypted with an AES cipher keyed with the 32-digit key. The resource at this URL is then downloaded, and the function returns its content.
To get the decrypted URL, we simply add some logging instructions to the original code of another.py, which can be run independently of mb_crackme_2.exe (given that the required dependencies are present on our machine).
The execution result is the following:
The decrypted URL hosts the PNG image displayed below:
The “malware” then reads the Red, Green and Blue components of each of the image’s pixels, interprets them as bytes and constructs a buffer from their concatenation.
This technique is sometimes used by real malware to download malicious code without raising the suspicion of traffic-analysis tools, by hiding the real nature of the downloaded resource.
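The pixel-to-bytes decoding can be sketched as follows. The sketch assumes pixels are read in row-major order and that only the R, G and B channels carry data. Any alpha channel is ignored, and no trailing padding is stripped.

```python
def pixels_to_bytes(pixels):
    """Concatenate the R, G and B components of each pixel into one buffer.

    `pixels` is an iterable of (R, G, B) or (R, G, B, A) tuples in row-major
    order; any alpha channel is ignored."""
    return bytes(bytearray(channel for pixel in pixels for channel in pixel[:3]))
```

With Pillow installed, the pixel sequence could come from `Image.open("downloaded.png").convert("RGB").getdata()`, where the file name is hypothetical.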
The “Extract data…” function of the Stegsolve tool allows us to quickly preview the data encoded in the image, which appears to be a PE file (more specifically, a DLL):
The function is_valid_payl() is then used to check whether the decoded payload is correct:
The 23117 and 17744 constants represent the “MZ” and “PE” magic bytes present in the headers of a PE.
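Each constant is the corresponding two ASCII bytes read as a little-endian 16-bit integer. For example, b"MZ" gives 0x5A4D = 23117, and b"PE" gives 0x4550 = 17744. Below is a sketch of an equivalent check. It assumes the “PE” signature is looked up at the offset stored in the e_lfanew field (0x3C) of the DOS header, as in a standard PE. The actual is_valid_payl() may read these values differently.

```python
import struct

MZ_MAGIC = 23117  # 0x5A4D, b"MZ" read as a little-endian uint16
PE_MAGIC = 17744  # 0x4550, b"PE" read as a little-endian uint16


def is_valid_payload(data):
    """Check the DOS "MZ" magic and the "PE" signature at e_lfanew."""
    if len(data) < 0x40:
        return False
    if struct.unpack_from("<H", data, 0)[0] != MZ_MAGIC:
        return False
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if e_lfanew + 2 > len(data):
        return False
    return struct.unpack_from("<H", data, e_lfanew)[0] == PE_MAGIC
```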
The decoded file is then passed to the load_level2() function, which is a wrapper around prepare_stage().
This function starts by allocating enough space to store the downloaded code, using the VirtualAlloc API function call. The allocated space is readable, writable and executable, as the provided arguments reveal (12288 being equal to “MEM_COMMIT | MEM_RESERVE”, and 64 to PAGE_EXECUTE_READWRITE).
The downloaded code is then written to the allocated space using the memmove function, and executed as shellcode starting at offset 2.
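The flag arithmetic and the allocation step can be illustrated with ctypes. The constants are the documented Windows values: MEM_COMMIT (0x1000) | MEM_RESERVE (0x2000) = 0x3000 = 12288, and PAGE_EXECUTE_READWRITE = 0x40 = 64. The helper below is a hypothetical, Windows-only sketch that mirrors the allocate-and-copy part of prepare_stage(). It deliberately stops before transferring execution to address + 2.

```python
import ctypes

MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_EXECUTE_READWRITE = 0x40


def alloc_rwx_copy(buf):
    """Windows only: reserve and commit an RWX region, copy buf into it and
    return its address. Execution is NOT transferred to the buffer."""
    kernel32 = ctypes.windll.kernel32
    kernel32.VirtualAlloc.restype = ctypes.c_void_p
    kernel32.VirtualAlloc.argtypes = (ctypes.c_void_p, ctypes.c_size_t,
                                      ctypes.c_uint32, ctypes.c_uint32)
    address = kernel32.VirtualAlloc(None, len(buf), MEM_COMMIT | MEM_RESERVE,
                                    PAGE_EXECUTE_READWRITE)
    if not address:
        raise ctypes.WinError()
    ctypes.memmove(address, buf, len(buf))
    return address
```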
To get a clean dump of the downloaded code (once decrypted), we add a piece of code in the prepare_stage() function, as follows:
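A minimal version of such a dump is shown below. It assumes the helper is called inside prepare_stage() with the decoded buffer, before the allocation. The helper name `dump_payload` and the default output path are hypothetical. Returning the SHA-256 digest is an optional extra that makes the sample easy to reference.

```python
import hashlib


def dump_payload(payload, path="level2_payload.dll"):
    """Write the decoded buffer to disk and return its SHA-256 hex digest."""
    with open(path, "wb") as handle:
        handle.write(payload)
    return hashlib.sha256(payload).hexdigest()
```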
After re-executing the program, we observe that the obtained file is indeed a valid 32-bit Windows DLL:
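The same conclusion can be checked from the PE file header. Below is a minimal sketch that reads the Machine field and the IMAGE_FILE_DLL flag of the Characteristics field. It relies on the documented IMAGE_FILE_HEADER layout: Machine comes right after the 4-byte “PE\0\0” signature, and Characteristics sits 18 bytes further.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C  # "Machine" value for 32-bit x86 images
IMAGE_FILE_DLL = 0x2000           # "Characteristics" flag set for DLLs


def describe_pe(data):
    """Report whether a PE image targets i386 and whether it is a DLL."""
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER starts right after the signature.
    machine = struct.unpack_from("<H", data, e_lfanew + 4)[0]
    characteristics = struct.unpack_from("<H", data, e_lfanew + 4 + 18)[0]
    return {
        "i386": machine == IMAGE_FILE_MACHINE_I386,
        "dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```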
Time to open our favorite disassembler!
Downloaded DLL’s reverse-engineering
Starting at offset 2 of the file, a small shellcode located in the DOS header transfers execution to another piece of code that implements Reflective DLL injection. This technique loads the library from memory, instead of loading the DLL from disk with the usual LoadLibrary API call.
The reflective loader’s code, located at 0x6E0, is documented in Stephen Fewer’s ReflectiveDLLInjection project on GitHub and will not be described in this write-up. Since the library ends up loaded by this mechanism just as it would be after a normal LoadLibrary call, the downloaded file is analyzed like a standard DLL in the rest of this write-up.
The list of exported functions being empty (except for the DllEntryPoint function), we start our analysis at the entry point of the DLL.
Our first goal is to locate the DllMain() function, starting from the entry point. If the reverser is not used to analyzing Windows DLLs, a simple way to start is to open any non-stripped 32-bit DLL which (with a little luck) was compiled with the same compiler (Visual C++ ~7.10 here), and would therefore have a similar CFG structure for its DllEntryPoint function.
An example CFG comparison between the analyzed DLL (left) and another non-stripped 32-bit DLL (right) is presented below:
This technique allows us to quickly find the DllMain function in our DLL, here being located at 0x10001170.
The function starts by checking whether it has been called for the initial load of the DLL into a process, by comparing the value of the fdwReason argument against the DLL_PROCESS_ATTACH constant.
The DllMain() function then registers two exception handlers using the AddVectoredExceptionHandler API call. The handlers are named “Handler_0” and “Handler_1” in the screenshot below:
An exception is then deliberately raised using the “int 3” interrupt instruction, triggering the execution of Handler_0.
Interlude: debugging a DLL in IDA Pro
To make the reverse-engineering of some functions easier, debugging the code to observe function inputs and outputs can be an effective method.
One simple way to debug a DLL inside IDA is to load the file as usual, then go to “Debugger -> Process options…” and modify the following values:
- Application:
  - On a 64-bit version of Windows:
    - “C:\Windows\SysWOW64\rundll32.exe” to debug a 32-bit library
    - “C:\Windows\System32\rundll32.exe” to debug a 64-bit library
  - On a 32-bit version of Windows:
    - “C:\Windows\System32\rundll32.exe” to debug a 32-bit library
    - Obviously, you cannot run (and therefore cannot debug) a 64-bit library on a 32-bit version of Windows
- Parameters:
  - “PATH_OF_YOUR_DLL”,functionToCall [function parameters if any]
Note: The file extension must be “*.dll” for rundll32.exe to accept it.
To test the configuration, just place a breakpoint at the entry point of the DLL:
Run your debugger (F9). If it is configured correctly, the debugger should break at the DLL entry point, allowing you to debug any DLL function.
Looking at Handler_0’s CFG (given below), we see that the function calls two unknown functions (0x100092C0 and 0x1000E61D). To identify them quickly, let’s debug the DLL and look at their inputs and outputs:
The function seems to take 3 arguments:
- A buffer (here named “Value”);
- A value (here 0);
- The size of the buffer (here 0x104).
Let’s look at the buffer’s content before and after the function call:
The function prototype and its side effects correspond to the memset function.
The function seems to take 4 arguments:
- An integer (here the PID of the process);
- A buffer (here named “Value”);
- The size of the buffer (here 0x104);
- A value (here 0xA, or 10).
Looking at the provided buffer’s content after the function call, we see that the base-10 representation of the first integer argument has been written into it.
The function prototype and its side effects correspond to the _itoa_s function.
Handler_0 whole CFG and pseudo-code
Here is the graph of the Handler_0 function:
This corresponds to the following pseudo code:
The function checks whether the python27.dll library (normally loaded by the main program mb_crackme_2.exe) is present in the process address space, and sets the “mb_chall” environment variable accordingly.
This may be seen as an “anti-debug” trick, because running the DLL independently in a debugger makes the execution follow a different path.
The code of this handler is quite self-explanatory, being similar to the previous handler’s code:
Once again, this corresponds to the following pseudo code:
After this handler, execution resumes at the address of the original interrupt instruction (“int 3”) +1 or +6 (as presented in the pseudo-code above), depending on whether the performed checks pass.
We thus continue the analysis at the not_fail function (0x100010D0).
The function only starts a thread and waits for it to terminate.
The created thread executes the MainThread (0x10001110) function, where our analysis continues.
The function loops, calling the EnumWindows API every second, which in turn calls the provided callback function (EnumWindowsCallback) on every top-level window on the desktop.
EnumWindowsCallback function (0x10005750)
The function, called on each window, uses the SendMessageA API with the WM_GETTEXT message to retrieve the window’s title.
After conversion to a C++ std::string, the window’s title is searched for the substrings “Notepad” and “secret_console”.
If both substrings are present, the window’s title is replaced by the hardcoded string “Secret Console is waiting for the commands…”, using the SendMessageA API with the WM_SETTEXT message. The window is then brought to the foreground using the ShowWindow API call.
Modification of the window’s title using SendMessageA()
The PID of the process owning the window is then written to the “malware” console, and the window’s child windows are enumerated using the EnumChildWindows API. The EnumChildWindowsCallback function (0x100034C0) is thus called on every child window.
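The title-matching rule of EnumWindowsCallback can be modeled in a few lines of Python. This is a sketch under two assumptions. First, the DLL searches with std::string::find, which is case-sensitive like Python’s `in`. Second, the replacement title copied from the write-up matches the hardcoded one; its exact ellipsis character is not verified. The function names are hypothetical.

```python
# Replacement title as reported in the write-up (ellipsis character unverified).
SECRET_TITLE = "Secret Console is waiting for the commands\u2026"


def title_matches(title):
    """Both substrings must be present; the search is case-sensitive."""
    return "Notepad" in title and "secret_console" in title


def new_title_for(title):
    """Return the title the DLL would set, or None if the window is ignored."""
    return SECRET_TITLE if title_matches(title) else None
```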
EnumChildWindowsCallback function (0x100034C0)
This function retrieves the content of the child window using the SendMessageA API call:
The substring “dump_the_key” is then searched in the retrieved content:
If this string is found, this function calls a decryption routine decrypt_buffer() (0x100016F0) on a buffer (encrypted_buff), using the string “dump_the_key” as argument.
Then, the “malware” loads the actxprxy.dll library into the process memory space. The first 4096 bytes (i.e. the first memory page) of the library are made writable using the VirtualProtect API call, and the decrypted payload is written at this location.
Since the actxprxy.dll library is not used anywhere in the analyzed DLL after being rewritten, it can be seen as a covert communication channel between the analyzed DLL and the main program mb_crackme_2.exe. The function then cleans up all allocated memory and exits. The created thread (see 4.2.6) therefore exits as well, and the DllEntryPoint call returns, handing control back to the main Python script.
Triggering the secret console
As seen in the DLL analysis, the required conditions are triggered by opening a file named “secret_console – Notepad” in a text editor, so that the window title contains both substrings:
As expected, the title of the window is changed to “Secret Console is waiting for the commands…” by the malware. Writing “dump_the_key” in the window validates the second stage.
Stage 3: the colors
After validating the previous step, a message is printed on the console, asking the user to “guess a color”:
The three components (R, G and B) of a specific color, with values going from 0 to 255, need to be entered to validate this step.
Understanding the code
Looking back at the another.py’s main() function code, it seems that the corresponding operations are performed inside the decode_pasted() function.
According to the decode_pasted() function, the decrypted buffer stored at the start of actxprxy.dll’s address space is read and:
- XOR’ed against the user-provided colors values;
- Executed by the Python exec function.
To start our cryptanalysis, we modify the decode_pasted() function to dump the val_arr buffer before the dexor_data() operation, and rerun another.py, providing all required credentials:
Decrypting the val_arr buffer
Since the decrypted buffer is passed to Python’s “exec” statement, the plaintext should be valid Python source code.
To find the right key, the naïve solution would be to run a brute-force attack on all the possible “(R, G, B)” combinations, and look for printable solutions. This solution would need to perform 256^3 = 16’777’216 dexor_data() calls, which is feasible but inefficient.
Instead, we perform 3 independent brute-force attacks, one for each of the R, G and B components, which requires only 256 x 3 = 768 dexor_data() calls. Each attack targets a different “slice” of the val_arr string, each slice having a stride of 3. We then test every combination of the candidate values found for each component.
For example, if our 3 brute-force attacks indicate that:
- R can take values 2 and 37,
- G can take values 77 and 78,
- and B can only take the value 3,
then we test the combinations (2, 77, 3), (37, 77, 3), (2, 78, 3) and (37, 78, 3).
The following code implements our attack:
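Below is a minimal sketch of this three-slice attack. It assumes dexor_data() XORs byte i of the buffer with the i mod 3 component of the (R, G, B) key, which matches the stride-3 slicing described above. The function here is a hypothetical model, not the original. As an extra filter beyond printability, each combined candidate is kept only if Python’s compile() accepts it as source code.

```python
import itertools
import string

# Byte values accepted in the decrypted Python source.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def dexor_data(data, key):
    """HYPOTHETICAL model of dexor_data(): byte i is XORed with key[i % 3]."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def component_candidates(stride_slice):
    """All byte values that turn this stride-3 slice into printable text."""
    return [k for k in range(256) if all((b ^ k) in PRINTABLE for b in stride_slice)]


def crack(val_arr):
    """Three independent 256-value searches (768 trials), then combine."""
    per_component = [component_candidates(val_arr[i::3]) for i in range(3)]
    solutions = []
    for key in itertools.product(*per_component):
        plain = dexor_data(val_arr, key)
        try:
            compile(plain, "<decoded>", "exec")  # keep only valid Python source
        except (SyntaxError, ValueError):
            continue
        solutions.append((key, plain))
    return solutions
```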
Executing this code gives us the solution instantly:
The final flag appears in the console:
This challenge was very interesting to solve because, besides being an original crackme, it covered various topics that can be encountered during real malware analysis, including:
- DLL-rewriting techniques, here used as a kind of covert communication channel between a DLL and its main process;
- “Non-obvious” anti-debugging tricks, like checking the presence of a known library in the process’ memory space to identify standalone DLL debugging;
- Concealed malware downloading, using “harmless” formats (like PNG) to hide an executable payload from basic traffic analysis;
- PyInstaller-based malware (yes, sometimes malware writers can be lazy).
Thanks to Malwarebytes for this entertaining challenge!