Have you ever cursed your Anti-Virus Software for being ineffective or felt the anti-virus wasn’t worth the money or the hype for all the protection it supposedly offered? Most of us have. That’s actually one of the reasons that led me to study topics on viruses and malware. I then dreamt of finding a golden solution to the problem (to be honest, still do at times). As I came across the works of Frederick B. Cohen, especially a demonstration of his in 1987, the words, “Identification and Classification of malware is a continuous ongoing battle”, became clearer to me. Frederick B. Cohen simply proved that there exist certain types of malware (specifically viruses) whose identification is a NP-Complete problem unless certain services are hindered. Let’s learn a bit about certain words in the previous statement mean before, putting it all together to understand it.
Malware Class: Virus
A computer virus is defined as a program that, when executed, replicates itself by inserting copies of itself (possibly modified) into other parts of the hard drive, from executables to files, thus “infecting” them. Most people associate a virus with programs that destroy or corrupt files, log keystrokes etc, but it is not so, the defining characteristic of a virus is that they are programs that replicate themselves without the user’s explicit consent. The malware that one usually comes across, do not belong to one particular family of malware, but are a combination of a few types.
Malware are rarely shipped about in their base executable version since they can be easily detected by statically analysing their binary code. Encryption is one of the most common methods to hide code. Hence malware use one of the three following techniques to hide their raw binary format from anti-virus software:
Polymorphic code is code is code whose encryptor and decryptor change every generation, hence producing different code every generation. Oligomorphic Code is code whose encryptor/decryptor consists of a fixed number of parts, which are randomly selected from a set of such blocks. But obviously, the total number of permutations of the encryptor is countable. The last, metamorphic code is the most amazing, and the hardest to detect. Metamorphic code is code that actually morphs itself. It does this by reordering non sequential pieces of code, by introducing JMP statements, adding additional meaningless statements such as NOPs, changing the registers used, substituting instructions with equivalent alternatives etc.
P and NP class of problems
P(polynomial) and NP(non-deterministic polynomial) basically refer to the time it takes to solve a problem. So if there exists an algorithm that solves a given problem in time in the order of some power of the input size deterministically (meaning running the same input again WILL give the exact same result), then such problems are in the P class of problems. NP decision problems are problems that can verify a given solution in polynomial time non-deterministically. There are 2 things to notice in the previous sentence, the first is “verify” which means that we do not have any information relating to the identification of a solution in polynomial time. The second, more important thing is the word “non-deterministically”, which the result we get for a given input is not uniform. NP-hard problems which, in polynomial time can be converted into another NP-Hard problem. It is suspected (though not decisively proved) that NP-Hard problems cannot be solved in polynomial time. As is shown in the above figure, problems that are both NP and NP-hard are called NP-complete. Hence NP-Complete problems are those problems for which a solution can be verified quickly in polynomial time, there is no known efficient way to solve the problem in the first place.
Putting together all that we’ve learnt above, what Fred Cohen proved was that, assuming the services to the user remain unhindered (for example, access to the net or file system is not halted during identification), identification of a file as malicious or not (assuming the characteristics of the malware is consistent with certain conditions), is not possible in reasonable time. And since, no user will accept or use an anti-virus software that hangs the computer for hours or days, the idea of a single solution is still a bit far off. Hence the protection of computing devices from malware, is a constant battle between anti-virus researchers and malware coders. Although it seems like it’s been the malware coders who’ve been setting the pace, it’s pretty easy to understand why, they have the anti-virus tools at their disposal, making it just a function of time, before they test their solutions to come up with a new class of un-detectable malware.