You can also liken reverse engineering to the proofs of mathematical equations. So, how is reverse engineering used?
What Is Reverse Engineering?
Reverse engineering is the process of analyzing a system in order to reproduce or improve it. It serves many different purposes. From a cybersecurity perspective, reverse engineering methods make the following operations possible:
Source analysis of non-open-source software
Vulnerability analysis
Malware analysis
Cracking and patching
You can see reverse engineering used even in computer games nowadays. For example, developers often create software mods using reverse engineering methods.
In the field of reverse engineering, there are two different analysis methods: static and dynamic. You carry out static analysis when you analyze a program without actually running it. A dynamic analysis method, on the other hand, requires that you run the program to observe its behavior and the data that it uses.
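To make static analysis more concrete, here is a minimal Python sketch of one common first step: pulling readable strings out of a program's bytes without ever running it. The function name `extract_strings` and the `sample` bytes are made up for illustration; a real analysis would read the bytes from an actual file.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # 0x20-0x7e covers the printable ASCII range (space through tilde).
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]

# Hypothetical bytes standing in for a fragment of a compiled program.
sample = b"\x7fELF\x02\x01\x00\x00/lib/ld.so\x00\x90\x90Enter password:\x00\xc3"

print(extract_strings(sample))  # ['/lib/ld.so', 'Enter password:']
```

Nothing in `sample` is executed here. The code only reads the bytes, which is what makes this static rather than dynamic analysis.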
But before you carry out analysis for reverse engineering, there are some important terms you need to know about how computer architecture works.
Major Parts of Computer Architecture
Reverse engineering is practically impossible unless you understand computer architecture. You need to study the four main parts:
Input: The set of methods for entering data.
CPU: The central processing unit. It processes incoming data and passes the results on to wherever they are needed.
Memory: The space that temporarily holds data during processing.
Output: The result that the end user sees.
A simple example ties these four parts together: pressing the letter A on your keyboard. The key press is an input event. The CPU then processes the data and uses a small space in memory to store it. Finally, the letter A appears on your screen, ending the process with output.
Dive Into the Depths of the CPU
If you really want to become an expert in reverse engineering and dive deep into this topic, you need to have detailed knowledge of hardware, low-level languages, and especially the CPU. The key topics you’ll need to know about the CPU are:
Control Unit: This is responsible for the processing of data in the CPU and its transfer to the relevant fields. You can think of this unit as a routing control mechanism.
ALU: This stands for Arithmetic Logic Unit. This is where arithmetic and logical operations take place. If you dig deeper into the math, you’ll see that the four basic operations are essentially variations on addition, so the ALU is built around addition. For example, subtracting two from three is the same as adding minus two to three.
Registers: These are the areas inside the CPU that hold the data being processed. There are different types of register, much like there are different types of variable in a programming language, and each type is suited to a particular role.
Signals: If you want the CPU to carry out many different operations at the same time, some method of organizing them is necessary. The elements that do this are called signals. Each operation acts according to signals that ensure it does not interfere with another.
Bus: The path that data uses to move from one unit to another. Note how the name suggests transportation.
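The point about subtraction being a form of addition can be sketched in a few lines of Python. This is a simplified illustration, not a model of any particular CPU: it assumes 32-bit values and uses two's complement, the usual way of representing "minus two" in binary. The function names are invented for this example.

```python
BITS = 32
MASK = (1 << BITS) - 1  # 0xFFFFFFFF, keeps every result within 32 bits

def twos_complement(value: int) -> int:
    """Negate a value the two's complement way: invert the bits, then add one."""
    return (~value + 1) & MASK

def alu_add(a: int, b: int) -> int:
    """Add two values and discard any carry out of the top bit."""
    return (a + b) & MASK

def alu_sub(a: int, b: int) -> int:
    """Subtract by adding the negated operand, so only addition is used."""
    return alu_add(a, twos_complement(b))

def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a signed number."""
    return value - (1 << BITS) if value & (1 << (BITS - 1)) else value

print(hex(twos_complement(2)))    # 0xfffffffe, the 32-bit pattern for -2
print(alu_sub(3, 2))              # 1
print(to_signed(alu_sub(2, 3)))   # -1
```

When you read disassembled code, values such as `0xfffffffe` are worth recognizing for this reason: they are often small negative numbers.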
Concepts You Will Often Hear in Reverse Engineering
Understanding how the CPU processes data and stores it in memory, alongside the concept of registers, can be very useful when reverse engineering. In particular, you can use the diagram below to better understand the concept of memory:
Finally, for reverse engineering analysis, you need to know some basic concepts about registers, which are one of the topics you will focus on the most. Here are concise explanations of the data, pointer, and index registers that will be most useful to you:
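1. EAX: The accumulator register. It usually holds the operands and results of arithmetic operations.
2. EBX: The base register. It plays a role in indirect addressing.
3. EDX: The data register. It assists other registers, for example by holding the upper half of a result that does not fit in EAX alone.
4. EIP: The instruction pointer. It holds the address of the next instruction to run.
5. ESP: The stack pointer. It holds the address of the top of the stack. (The base address of the current stack frame is kept in a separate register, EBP.)
6. ESI: The source index. It holds the source address for string and memory copy operations.
7. EDI: The destination index. It holds the destination address for those same operations.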
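To see how a few of these registers work together, here is a toy interpreter in Python. It is a deliberately simplified sketch, not an x86 emulator: the instruction tuples, the `run` function, and the 16-slot stack are all invented for illustration. It treats ESP as the stack pointer, and it moves ESP by one slot per push, whereas a real 32-bit CPU moves it by four bytes.

```python
def run(program):
    """Execute a list of toy instructions and return the final register values."""
    regs = {"EAX": 0, "EBX": 0, "EDX": 0, "EIP": 0, "ESP": 16}
    stack = [0] * 16  # toy stack memory; it grows toward index 0

    def val(operand):
        # An operand is either a register name or a literal number.
        return regs[operand] if isinstance(operand, str) else operand

    while regs["EIP"] < len(program):
        op, *args = program[regs["EIP"]]
        regs["EIP"] += 1  # EIP now points at the next instruction
        if op == "mov":
            regs[args[0]] = val(args[1])
        elif op == "add":
            regs[args[0]] = (regs[args[0]] + val(args[1])) & 0xFFFFFFFF
        elif op == "push":
            regs["ESP"] -= 1  # make room, then store at the new top
            stack[regs["ESP"]] = val(args[0])
        elif op == "pop":
            regs[args[0]] = stack[regs["ESP"]]
            regs["ESP"] += 1  # release the slot
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs

program = [
    ("mov", "EAX", 3),
    ("push", "EAX"),        # save 3 on the stack
    ("mov", "EAX", 10),
    ("pop", "EBX"),         # restore the saved 3 into EBX
    ("add", "EAX", "EBX"),  # EAX = 10 + 3
]

final = run(program)
print(final["EAX"], final["EBX"], final["EIP"], final["ESP"])  # 13 3 5 16
```

ESP ends where it started because every push was matched by a pop, and EIP ends one past the last instruction. Both are patterns you will see constantly when stepping through real code in a debugger.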
You should research all of these separately to understand their nuances. The names above belong to the 32-bit x86 architecture, but if you grasp the basics and the underlying logic, code analysis for reverse engineering becomes much more approachable on any processor architecture you work with.
Reverse engineering often begins with machine code. You may already understand many of the above terms if you’re familiar with assembly or have a command of 32-bit or 64-bit processor architectures. If not, learning assembly from the ground up will be extremely useful in reverse engineering.
What Will You Do With All This?
If you have a good knowledge of reverse engineering, you can do code analysis no matter what operating system or processor architecture you are working with. For example, it is possible to find cracked versions of many programs or computer games, which are themselves a product of reverse engineering. Cracking software in this way is illegal.
However, if you’re going to be an ethical cybersecurity professional, you’ll need to use reverse engineering to understand how these programs are being cracked. Whether you want to advance in reverse engineering or are just starting out, learning about the relationship between hardware and machine code is a good place to focus.