# Introduction

Some days ago, I was pointed to a nice obfuscation tool called *Movfuscator* by Christopher Domas. His work is inspired by a worthwhile article, *Mov is Turing-Complete*, by Stephen Dolan. Although Chris did a magnificent job coding this tool, it looked somewhat strange to me. Generally, designing a Turing machine to solve even basic and trivial problems is a difficult task. Designing a Turing machine which takes a C program and executes it on a set of arbitrary inputs, if not technically impossible, seems tedious and utterly impractical even with the full instruction set of an x86 CPU at hand. So how did he manage to do this using only MOV? Another question along these lines is addressed here and is worth reading.

##### A good question …

*Movfuscator* is claimed to be a C compiler. We know that C is a Turing-complete language; in other words, any problem on earth which can be solved computationally can be implemented in C too. On the other hand, we know that designing Turing machines is very difficult, and Chris Domas probably did not construct one either. So how is this contradiction possible? Well, the short answer is that Movfuscator is not only not a Turing-machine generator (or C to Turing-machine compiler), it is also limited and not related to Turing machines at all.

## Conclusion first

For those not interested in reading theoretical articles: it will be demonstrated that Movfuscator is not what it claims to be. It has a bad design, it is not related to Turing machines, and it is not comprehensive either.

We will also see that combining theories like Turing machines with practical solutions like obfuscation methods, although it may seem an intriguing idea, is not going to result in anything useful.

## Preface

Let’s start with describing prerequisites needed to get the big picture.

#### What is Movfuscator?

Based on the description on its GitHub page, it is:

The single instruction C compiler

So it is expected to be able to compile any C application. However, we are going to see its limitations.

#### Turing Machine

A Turing machine is a mathematical model of computation that defines an abstract machine. We will follow the traditional definition of Turing machines by Hopcroft and Ullman in this article which is slightly different from the one presented in Mov is Turing-Complete.

Following Hopcroft and Ullman (1979, p. 148), a (one-tape) Turing machine can be formally defined as a 7-tuple M = ⟨Q, Γ, b, Σ, q₀, F, δ⟩ where

- Q is a finite, non-empty set of states;

- Γ is a finite, non-empty set of tape alphabet symbols;

- b ∈ Γ is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step during the computation);

- Σ ⊆ Γ ∖ {b} is the set of input symbols, that is, the set of symbols allowed to appear in the initial tape contents;

- q₀ ∈ Q is the initial state;

- F ⊆ Q is the set of final states or accepting states. The initial tape contents are said to be accepted by **M** if it eventually halts in a state from **F**;

- δ : (Q ∖ F) × Γ → Q × Γ × {L, R} is a partial function called the transition function, where L is left shift and R is right shift. (A relatively uncommon variant allows “no shift”, say N, as a third element of the latter set.) If δ is not defined on the current state and the current tape symbol, then the machine halts.

This version of the Turing machine differs from the one presented in Mov is Turing-Complete in two aspects:

- It has final states.
- Its input is written on the same tape that is used for output. So there exists only one tape in this machine, and the input is already written on it when the machine starts to operate.

#### What is stated in “MOV is Turing-Complete” article?

The whole purpose of this article is to show that we can implement all the operations needed in a Turing machine using only the MOV instruction: operations like reading from and writing to the tape, and moving its head to the left or right. However, designing a Turing machine itself is a totally different question and depends on what we expect this machine to do. The article also presents some neat practical tricks, like how to do comparisons using only MOV.
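As a taste of how such comparisons work, here is a minimal C sketch of the store/store/load trick described in the paper (the function name and scratch buffer are my own illustration, not code from the paper or from Movfuscator):

```c
#include <stdint.h>

/* Equality test using only store/load (i.e. mov-like) operations:
 * write 0 to cell x, write 1 to cell y, then read cell x back.
 * If x == y, the second store overwrote the first, so the load
 * returns 1; otherwise it returns 0. No compare, no branch. */
static uint8_t scratch[256];

uint8_t mov_equal(uint8_t x, uint8_t y)
{
    scratch[x] = 0;    /* mov [x], 0 */
    scratch[y] = 1;    /* mov [y], 1 */
    return scratch[x]; /* mov result, [x] */
}
```

The addresses themselves carry the data, which is exactly why memory addressing makes MOV so expressive.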

The fact that all C programs (or, generally speaking, any computer algorithm) can be converted to an equivalent Turing machine is taken for granted, and there are no instructions on how to do that. In fact, these lines are copied exactly from the main article:

Removing all but the mov instruction from future iterations of the x86 architecture would have many advantages … as long as someone else implements the compiler.

Hence, we need someone to design a compiler for us, e.g. a compiler which takes a C program and translates it into an equivalent Turing machine, so that this Turing machine can then be run using only the MOV instruction.

## Movfuscator as a compiler?

Well, we could wrap up this article here by simply saying that Movfuscator isn't a C to Turing machine compiler. Therefore, it's not related to the Mov is Turing-Complete article or to Turing machines by any means. It can be called an obfuscator or protector (of course with some shortcomings), but it's not a C to Turing machine compiler. In fact, I checked its source code; there is nothing in it related to Turing machines or any computational model. End of story…

On the other hand, having a look at its demos and screenshots, we perceive that this tool works, at least under certain conditions. So it's worth investigating a little further to see how this goal is reached without adopting any formal methods or employing any computational theory.

Before examining Movfuscator, let's check how hard it is to create such a compiler. Basically, a program is compiled (or assembled) into machine code, which contains five types of instructions:

- MOV instructions, which do not need to be translated into anything else.
- Unconditional JMP instructions, which can be interpreted as a MOV to EIP.
- Conditional instructions, for which a method to convert them to MOV instructions exists and is explained in the main article.
- Conditional JMP instructions, which can be embedded into the logic of the Turing machine and are not important from an implementation point of view. (Movfuscator deals with them by writing into garbage memory instead of the real variables, but if we design a working Turing machine for a specific problem, they can be virtually viewed as combinations of conditional instructions followed by a jump.)
- Arithmetic instructions.

It’s easy to implement the first four items of the above list using only MOV instructions. The difficult part is implementing the arithmetic operations. First, let’s see how Mr. Domas thinks about it. Here he says that:

arithmetics actually turns out to be surprisingly easy with mov instructions especially given that we’ve already constrained ourselves to one-byte values so we can actually do basic arithmetic with simple lookup tables.

Well, no wonder his applications are compiled into huge binaries and are ridiculously slow and inefficient. To the best of my knowledge, a giant portion of all these computation theories is about the computation itself, and you simply replace it with lookup tables? Can you imagine a CPU without an ALU which utilizes a lookup table instead? Does it make sense at all to shift all the computation from runtime to compile time, in a brute-forcing manner?

Let’s give some examples to show how bad this idea is. Imagine that you want to hire a programmer and, to assess their expertise, you ask them to write a program which prints the first 100 prime numbers. The winner is the one with the fastest algorithm. Now imagine that someone writes an application like this:

```c
printf("2 ");
printf("3 ");
printf("5 ");
printf("7 ");
...
```

Will you accept this solution? Or, as another example, will you hire a programmer who, instead of implementing the MD5 algorithm to hash users’ passwords, prefers to store the MD5 hash of every possible password in an array and select the proper one at runtime? Even if you are OK with these solutions (which are definitely not acceptable in terms of resource management), what do they have to do with the theory of computation or Turing machines? Why do you even bother explaining to people what a Turing machine is?

For a simple addition on 8-bit integers, you need a lookup table like:

```c
uint8_t lookup_add[256][256];
```

which contains 256×256 elements, equal to 64 KB of memory. What if the integers were signed? What happens when you have a carry? What if you wanted to add 8-byte integers? Is it even feasible?
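To make the cost concrete, here is a sketch of such a table-driven adder in plain C (my own illustration of the idea, not Movfuscator’s actual code):

```c
#include <stdint.h>
#include <stddef.h>

/* 64 KB lookup table: add_tab[a][b] == (a + b) & 0xFF.
 * Note that the table itself has to be filled with ordinary
 * arithmetic at build/init time -- the computation is not
 * eliminated, merely moved and materialized as data. */
static uint8_t add_tab[256][256];

void init_add_tab(void)
{
    for (size_t a = 0; a < 256; a++)
        for (size_t b = 0; b < 256; b++)
            add_tab[a][b] = (uint8_t)(a + b); /* wraps mod 256 */
}

/* At "runtime", addition is a single table load (one mov). */
uint8_t add8(uint8_t a, uint8_t b)
{
    return add_tab[a][b];
}
```

A second 64 KB table would be needed just to recover the carry bit, and the scheme is hopeless for wide operands: a direct table for 64-bit addition would need 2¹²⁸ entries, which is exactly the scaling problem raised above.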

## How to do arithmetics correctly?

The aim of this article is not to criticize Movfuscator, but to show how difficult it is to adopt pure theories in practice. Therefore, we will start by designing a Turing machine to do the simplest arithmetic operation: increment.

We will also start with the simplest definition of a Turing machine and add other necessary pieces, like additional tapes and heads, to tackle the various difficulties we face. Another important thing to notice is that it’s not all about the logic behind increment or other arithmetic; the way we represent the data matters too. Each number in our design is considered to be 8 bits, which can easily be expanded to larger ranges without altering the whole logic.

#### Basic setup:

As shown below, we assume that we have a tape for I/O. An 8-bit binary value is written on the tape, and our job is to define the control unit (transition function). Note that the white square on the tape is the blank symbol, and the tape alphabet is {blank, 0, 1}.

The control unit (or transition function) can be defined like this:

First, we need to find the rightmost bit (the least significant bit), so we move the head to the right until a blank symbol is found, then step the head left once to put the least significant bit under it. If it’s zero, we convert it to 1 and the machine halts; if not, we convert it to zero, treat it as a carry, move the head to the left, and repeat the same procedure. We keep going like this until we find a 0 somewhere on the tape. In the case of the value 0xFF there will be no zero, and we simply ignore the carry for the moment. So the incremented value of 0xFF in our machine is 0x00. Note that when the machine halts, the head might point anywhere on the tape, which might cause problems when we restart the machine. For the sake of simplicity, I didn’t add another state to reposition the head.
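To make the control unit concrete, here is a small C simulation of exactly this machine (a sketch of my own; the two states are modeled as the two loops, and the tape is a string over '0', '1' and ' ' for blank):

```c
/* Simulate the increment Turing machine described above.
 * The head starts at the left end of the tape; the number is
 * followed by a blank (' '). */
void tm_increment(char *tape)
{
    int head = 0;

    /* State 1: scan right until the blank after the number,
     * then step left once onto the least significant bit. */
    while (tape[head] != ' ')
        head++;
    head--;

    /* State 2: propagate the carry leftwards -- each 1 becomes
     * 0 and the carry continues; the first 0 becomes 1 and the
     * machine halts. */
    while (head >= 0 && tape[head] == '1') {
        tape[head] = '0';
        head--;
    }
    if (head >= 0)
        tape[head] = '1';
    /* If head ran off the left end, the value was 0xFF: the carry
     * is dropped and the result wraps to all zeros, as above. */
}
```

Note how even this toy needs two distinct scanning phases and careful handling of the wrap-around case, which is the point being made here.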

#### Multiple registers

In the last section, we explained how to increment a value; however, we didn’t take into consideration where it is stored. Values can be stored in memory or in different registers. Besides, some operators, like addition, operate on multiple registers. Hence, we need a way of representing and differentiating among the available registers.

A simple workaround for this problem is to use *Region Separators*.

If extensive expansion of the alphabet symbols is acceptable, we can tag the symbols to be able to differentiate between registers, e.g. we can use 0_reg_i and 1_reg_i to model the bits in reg_i.

Another solution might be using multiple tapes or a multidimensional tape.

We should also note that the suggested control unit for increment was meant to work on only one register/value. It would need to be altered to support multiple registers.

## A working architecture

As we saw, even a simple increment instruction takes a lot of effort to design. We also need to implement other instructions, but even after implementing a variety of them, we are still not done. We cannot execute a random bunch of instructions one after another and hope it leads to anything good. What we need is a code segment and an execution module which fetches the next instruction and executes it using the appropriate instruction module.

The practice of combining Turing machines comes to the rescue here. Combined Turing machines are usually illustrated using block diagrams. Below you can see a schematic architecture.

This idea is just like having a virtual machine (the pink area) with a predefined set of instructions, which gets an input in the form of these instructions and executes them. However, this VM is a Turing machine itself, and we know Turing machines can be built using only MOV instructions.

In this architecture, the fetcher module fetches the next instruction and, based on its opcode, decides what to do next. In the end, the content of EIP should be updated too.

As you can see, we need to differentiate among the code segment, system memory, register contents, and system flags. The easiest approach is to use a different tape for each one.

Here, two questions arise:

- What is the minimal set of instructions for this Turing machine, enough to execute any C application on it?
- How to design a compiler to translate any C code into a set of these VM instructions?

#### Answer to questions:

Thinking about a minimalistic instruction set is like walking in a circle. Isn’t MOV enough? Didn’t we prove that MOV is Turing-complete? Well, it is, but who can program with only the MOV instruction without designing yet another Turing machine to be executed inside our Turing machine?

This is an article which shows how an application can be compiled and executed using only 4 instructions: LOAD, STORE, INCREMENT, and GOTO. Generally, an instruction set with many members leads to a more sophisticated Turing machine, while a reduced instruction set means more effort for the programmer/compiler to generate a program.

*Movfuscator* has a good solution for this: it compiles C programs into another language called BrainF*#$, which has a limited instruction set, and rewrites those instructions using MOVs. The same approach could be used to compile C programs into our VM’s input code, if only we implemented all of BrainF*#$ in our Turing machine.
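For reference, that intermediate language really is tiny: its eight commands can be interpreted in a few dozen lines of C. Below is a minimal sketch of my own (unrelated to Movfuscator’s actual backend) to show how small the instruction set our Turing-VM would have to cover is:

```c
#include <stddef.h>

/* Minimal interpreter for the eight-command language:
 * > < move the data pointer, + - change the current cell,
 * .   appends the current cell to `out`, , is omitted here,
 * [ ] loop while the current cell is non-zero. */
void bf_run(const char *code, unsigned char *cells, char *out)
{
    unsigned char *p = cells;
    size_t olen = 0;

    for (const char *ip = code; *ip; ip++) {
        switch (*ip) {
        case '>': p++; break;
        case '<': p--; break;
        case '+': (*p)++; break;
        case '-': (*p)--; break;
        case '.': out[olen++] = (char)*p; break;
        case '[':
            if (*p == 0) {        /* skip forward to matching ] */
                int depth = 1;
                while (depth) {
                    ip++;
                    if (*ip == '[') depth++;
                    else if (*ip == ']') depth--;
                }
            }
            break;
        case ']':
            if (*p != 0) {        /* jump back to matching [ */
                int depth = 1;
                while (depth) {
                    ip--;
                    if (*ip == ']') depth++;
                    else if (*ip == '[') depth--;
                }
            }
            break;
        }
    }
    out[olen] = '\0';
}
```

Every one of these eight cases maps naturally onto tape operations, which is why this language, rather than C itself, is the sensible input for such a machine.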

## Implementation and Effectiveness

I definitely have no intention of implementing such a Turing-VM. We can easily observe how fast it grows in size; it requires a lot of time and effort, thousands of lines of code, tedious debugging, and so on. But even if somebody implemented it, how hard would it be to reverse? Surprisingly, with the proposed architecture it wouldn’t be that hard: all you need to do is identify the code section and map each opcode to its respective BrainF*#$ instruction, and the problem is reduced to reversing BrainF*#$ or another intermediate code.

Of course, the architecture could be changed in ways that make it more perplexing, and I already have some ideas on how to do that, but I prefer not to reveal them.

## How good is this idea?

Having experienced all the ups and downs of designing a Turing machine for such a simple arithmetic operation, one has to ask: how good is this idea? Is it supposed to be this tedious? To find out, let’s look at how increments are done in practice.

In fact, in the ALU world we have something more capable and surprisingly simpler than an increment unit implemented on a Turing machine: *adders*!

Adders can be built by combining a half-adder with full-adders. I don’t want to dig into digital circuit design details, as that is another subject, but we all know that almost all of these circuits are modeled using Finite-State Automata, FSA (more precisely, Mealy and Moore machines, which are a kind of FSA). Knowing that such power can be built on such a simple logical basis, why should we implement arithmetic using Turing machines? It’s like creating a parser just to count the number of “a”s in a string when a simple iteration over it will do the job.
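To illustrate how little machinery an adder actually needs, here is a C model of a ripple-carry adder built from the standard full-adder equations, using only AND, OR, and XOR per bit (my own sketch of the textbook circuit):

```c
#include <stdint.h>

/* One full adder: sum = a XOR b XOR cin,
 * cout = (a AND b) OR (cin AND (a XOR b)).
 * With cin fixed to 0, this reduces to a half adder. */
static unsigned full_adder(unsigned a, unsigned b, unsigned cin,
                           unsigned *cout)
{
    unsigned axb = a ^ b;
    *cout = (a & b) | (cin & axb);
    return axb ^ cin;
}

/* 8-bit ripple-carry adder: chain eight full adders, feeding
 * each stage's carry-out into the next stage's carry-in. */
uint8_t add8_ripple(uint8_t a, uint8_t b, unsigned *carry_out)
{
    unsigned carry = 0, result = 0;
    for (int i = 0; i < 8; i++) {
        unsigned s = full_adder((a >> i) & 1, (b >> i) & 1,
                                carry, &carry);
        result |= s << i;
    }
    *carry_out = carry;
    return (uint8_t)result;
}
```

Three gates’ worth of logic per bit, and the carry and wider operand widths fall out for free by just chaining more stages; compare that with the tape-scanning machinery the increment machine above needed.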

Even if somebody wants to design a rigid obfuscation method, the idea of One Instruction Set Computers is easier and more sensible to implement. A usage sample of this method can be found here.

## Final words

Seeing how easy it is to implement arithmetic using FSA and digital circuit knowledge, and how hard it is to implement it on Turing machines, one might ask, “So what’s the point of studying Turing machines? What might come out of knowing that MOV is Turing-complete?”

On the other hand, someone might ask more fundamental questions like “How powerful are our computers? A real machine can only have a finite number of configurations; this *real machine* is really nothing but a linear bounded automaton. So why do we study Turing machines at all?”

These questions are common among people with no computer science background, who miss the fact that Turing machines (and automata theory in general) are not intended to model computers, but rather to model computation itself. This is why, in most cases, computation theories do not go down well with practical issues. Not all theories are meant for daily use. From theory, we know that the untyped lambda calculus is Turing-complete. Does that mean we should create a C compiler which compiles any imaginable program on earth into an equivalent lambda-calculus program and create an obfuscator out of it?
