Simplicity can be powerful when operating in the murky world of the digitally complex. I know the sinking feeling of waging war against state-of-the-art endpoint defenses, only to get crushed at your most hopeful. You have probably spent days wrapping your head around concepts like API unhooking, found and edited PoC code to mount an attack, then only to be crushed by the swift response of multi-tiered machine learning. Well, todays story is different! Today, I will tell you of a battle won with clever simplicity, waged against the multi-headed beast Windows Defender ATP and all its machine learning minions. This battle was not won by replacing jump points in EDR occupied memory space, it was won by probing for weakness and making calculated strikes against the ML models.
“So in war, the way is to avoid what is strong, and strike at what is weak.” ― Sun Tzu, The Art of War
Defender ATP (like other EDRs) employs a multi-tier approach to threat protection. It has hooks and sensors within the operating system, vast amounts of compute power and elaborate machine learning models that are continually classifying known good, from known bad. Yet, standing in opposition to this almost unsurmountable defense, there is a chink in the armor that is revealed to the astute observer. Like many complex systems, their complexity is both the root of their power and also their Achilles heel. Defender uses machine learning models in conjunction with static and dynamic analysis methods to classify bad things from good things. But how does it make these decisions? Well, that I’m sure is a very closely guarded secret by Microsoft, but we can make educated assumption’s by utilizing trial and error and interpreting the results. Machine learning models can be trained by using something called features. In this case, a feature could be an API call made in a malicious context, different offensive behaviors exhibited, or really any other indicator Microsoft feels is relevant. After the model is exposed to thousands, or even hundreds of thousands of features, it is ready to make predictions based on the features it has seen. Now, there are models that produce black and white answers (this file is good / bad) and others that provide an estimation. For this reason (to avoid many false positives), a threshold has to be established before Defender will block the file in question. The threshold Microsoft employs is a confidence score of 90% or greater that the thing in question, is malicious. It is in this area of complex decision making that we can manipulate the features being observed in combination and reduce the model’s confidence score to under 90%. If we do that, our malicious activity will not get blocked by endpoint defenses and we can proceed to inflict havoc as we see fit.
“In order to avoid false positives, cloud protection service is configured by default to require at least 90% probability to block the malware ”
“For the fastest classifiers in our layered stack, the features may include static attributes of the file combined with events (for example, API calls or behaviors) seen while the scanning engine emulates the file using dynamic translation. ”
Where the Hooks At?
I decided to revisit some process hollowing code in C# which now is getting consistently busted by Defender ATP. My intentions were to employ a much more complex attack vector which included switching out some of the API calls and utilizing unhooking techniques that we are seeing trend right now. Before jumping into all that, I started to review some of the circumstances surrounding the immediate quarantine of my injector. I knew that the standard API calls I was using for process hollowing would surely be raising some alarms (SuspendThread, NtUnmapViewOfSection, VirtualAllocEx, CreateRemoteThread, WriteProcessMemory). Looking back at the EDR history I saw definite evidence of that – a detection on NtAllocateVirtualMemory API call which is VirtualAllocEx’s lower-level cousin. After reading the article in which I took the above excerpts, I wondered if the combination of certain processes being created in conjunction with certain other API calls (like VirtualAllocEx) were what was pushing the model to a 90% confidence score. I decided to start to play with process selection and leave the current API calls intact – for now.
So Many Process Injections
While rifling through the system32 folder for processes to create in a suspended state, I soon made the discovery that selecting specific processes was enough to reduce the model’s confidence and let me inject with immunity. I actually got ridiculous with this; injecting 30 different processes with the exact same API calls…including VirtualAllocEx! The realization was clear, as long as I chose processes that the model must not have seen features for, I wouldn’t need to edit the C# injector code at all. Defenders ML model must put a high degree of emphasis on certain processes that are created in conjunction with certain API calls. As long as I avoided the processes it squawked on, I would run undetected – simple and clean. I decided to run through almost the entire system32 folder just to document my findings, as about half of the processes I tried did trip defenses. In the end, I had 30 different processes I could safely inject into, all with varying degrees of Opsec from human hunt teams. There were some other considerations I had alongside of Opsec.
1. Could I spawn a process without any arguments needed?
2. Will the process close immediately after executing?
3. Is there a visible window a user can see and(or) close?
4. What background processes could I use?
5. Should this process be connecting to the internet?
Below, a video of my 30 processes running in all their glory as Covenant C2 Grunts.