TL;DR: Don't use MD-5 to identify malware samples. Believe me, it is a bad idea. Use SHA-256 or a stronger hash function.

This post is dedicated to all malware researchers, still using MD-5 to identify malware samples.

Before diving into the details, let me explain my view on this topic. Whenever you want to identify a malware sample, it is only OK to publish its MD-5 hash if you also publish at least its SHA-256 hash. Publishing only the MD-5 hash is unprofessional. If you want to understand why, please continue reading. If you know about the problem, but want to help me spread the word, please link to my site www.stopusingmd5now.com.
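As a minimal sketch of what "publish both" can look like in practice, the following helper computes the MD-5 and SHA-256 digests of a sample in one pass. The function name sample_digests and the chunk size are my own choices for illustration, not part of any particular tool.

```python
import hashlib


def sample_digests(path, chunk_size=1 << 20):
    """Return the MD5 and SHA-256 hex digests of the file at `path`."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

Publishing both values from the returned dictionary lets readers who still search by MD-5 find the post, while the SHA-256 value is the one that actually identifies the sample.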

By writing articles/posts/etc. and publishing only the MD-5 hash, you show people your incompetence with hash functions, but that is the lesser problem: you also teach other people to use MD-5. And it spreads like a disease... Last but not least, if I find a sample mentioned in your blog post, and you use MD-5 only, I can't be sure we have the same sample.

Here are a few bad examples (in Google search rank order):

Kaspersky
Bromium
Fireeye
Webroot
Mcafee
Fox-IT
Cisco
SANS
ESET
Symantec
Watchguard
And unfortunately, even the best books on malware analysis promote the use of MD-5 - see "Practical malware analysis" Chapter 1 Page 10

Introduction to (cryptographic) hash functions

A long time ago (according to some sources since 1970) people started designing hash functions, for an awful lot of different reasons. They can be used for file integrity verification, password verification, pseudo-random generation, etc. But one of the most important properties of a cryptographic hash function is that it can "uniquely" identify a block of data with a small, fixed-length bit string. E.g., malware can be identified by using only the hash itself, so everybody who has the same malware sample will have the same hash; thus they can refer to the malware by the hash itself.

It is easy to conclude that there will always be collisions, where different blocks of data produce the same hash. The domain (blocks of data) is infinite, while the codomain (possible hash values) is finite. The question is how easy it is to find two different blocks of data having the same hash. Mathematicians call this property "collision resistance." Proper cryptographic hash functions are collision-resistant, meaning it is impractical to find two different blocks of data which have the same hash.
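To make the collision argument concrete, here is a small sketch that finds a collision on a deliberately shortened hash. It truncates SHA-256 to a few bytes (the truncation and the function name find_truncated_collision are assumptions for the demonstration, not a real-world scheme) and remembers every prefix it has seen until one repeats.

```python
import hashlib
from itertools import count


def find_truncated_collision(bits=24):
    """Find two different inputs whose SHA-256 digests share the first `bits` bits."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8 in this sketch")
    seen = {}  # truncated digest -> the message that produced it
    for i in count():
        message = str(i).encode()
        prefix = hashlib.sha256(message).digest()[: bits // 8]
        if prefix in seen:
            # A second, different message landed on the same short hash.
            return seen[prefix], message
        seen[prefix] = message
```

Because of the birthday effect, a 24-bit hash typically collides after only a few thousand attempts, far fewer than the 2^24 possible values. By the same reasoning, a generic search against a sound 128-bit hash would need on the order of 2^64 attempts; the practical MD-5 attacks are much faster than that because they exploit weaknesses in the algorithm itself.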

In 1989 Ronald Rivest (the first letter in the abbreviation of the RSA algorithm) designed the MD-2 hashing algorithm. Since 1997, publications have shown that this hashing algorithm is far from perfect.

In 1990 Ronald Rivest designed the MD-4 algorithm, which has been considered broken since at least 1991. But MD-4 is still used in the password protocol (NTLM) of every Windows version from XP to 8. Unfortunately, there are more significant problems with NTLM besides using MD-4, but this can be the topic of a different blog post.

In 1991, you-might-guess-who designed yet another hashing algorithm called MD-5, to replace MD-4 (because of the known weaknesses). But again, since 1993 it has been shown many times that MD-5 is broken as well. According to Wikipedia, "On 18 March 2006, Klima published an algorithm [17] that can find a collision within one minute on a single notebook computer, using a method he calls tunneling". This means that, with the computing power of a single eight-year-old notebook, one can create two different files having the same MD-5 hash. But the algorithms to generate collisions have been improved since, and "a 2013 attack by Xie Tao, Fanbao Liu, and Dengguo Feng breaks MD-5 collision resistance in 2^18 time. This attack runs in less than a second on a regular computer." The key takeaway here is that it is pretty damn hard to design a cryptographic hash function that is fast but still secure. I bet that if I developed a hash function, Ron would be able to hack it in minutes.

Now, dear malware researcher, consider the following scenario. You, as a malware analyst, find a new binary sample. You calculate the MD-5 hash of the malware, and Google for that hash. You see this hash value on another malware researcher's site or on a sandbox/vendor's site. This site concludes that this sample does this or that, and is either malicious or not. Either because the site is also relying solely on MD-5, or because you have only checked the MD-5 and the researcher or sandbox has a good reputation, you move on and forget this binary. But in reality, it is possible that your binary is totally different from the one analyzed by others. The results of this mistake can range from nothing to catastrophic.

If you don't believe me, just check the hello.exe and erase.exe on this site from Peter Selinger. Same MD-5, different binaries; a harmless and a (fake) malicious one... And you can do the same easily at home. No supercomputers, no NSA magic needed.
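If you download such a pair of files, a check along these lines shows the problem directly. This is only a sketch: the function names and the returned labels are made up for illustration, and the file paths are whatever you saved the two binaries as.

```python
import hashlib


def _digest(path, algorithm):
    """Hex digest of a file, read in chunks, for the named hashlib algorithm."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def compare_samples(path_a, path_b):
    """Classify two files by comparing their MD5 and SHA-256 digests."""
    same_md5 = _digest(path_a, "md5") == _digest(path_b, "md5")
    same_sha256 = _digest(path_a, "sha256") == _digest(path_b, "sha256")
    if same_sha256:
        return "same sample"
    if same_md5:
        # Equal MD5 but different SHA-256: the files are not the same.
        return "MD5 collision: different files"
    return "different samples"
```

For a colliding pair, the middle branch is the one that fires: an MD-5-only comparison would have called the two files identical.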

On a side note, it is important to mention that even today it can be hard to find a block of data (in the general case) if only its MD-5 hash is known ("pre-image resistance"). I have heard people argue this when I told them that using MD-5 as a password hash function is a bad idea. The main problem with MD-5 as a password hash is not the weaknesses in MD-5 itself, but the lack of salt, lack of iterations, and lack of memory hardness. But still, I don't see any reason why you should use MD-5 as a building block for anything that has anything to do with security. Would you drive your children to school in a car that has not been maintained in the last 23 years? If your answer is yes, you should have neither children nor a job in IT security.
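For comparison, here is a sketch of password hashing that has all three missing properties: a random salt, a tunable work factor, and memory hardness. It uses scrypt from Python's hashlib, which is available when Python is built against a sufficiently recent OpenSSL. The parameter values and function names are illustrative assumptions, not a recommendation tuned for any particular system.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters; with n=2**14 and r=8 scrypt needs about 16 MiB.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def hash_password(password, salt=None):
    """Return (salt, digest) for `password`, generating a fresh salt if needed."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The salt and digest are stored together; the salt is not secret, it only ensures that two users with the same password get different digests.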

Conclusion

If you are a malware researcher, and used MD-5 only to identify malware samples in the past, I suggest you write it down 1000 times: "I promise I won't use MD-5 to identify malware in the future."

I even made a website dedicated to this problem, www.stopusingmd5now.com . The next time you see a post/article/whatever where malware is identified by the MD-5 hash only, please link to this blog post or website, and the world will be a better and more professional place.

PS: If you are a forensics investigator, or software developer developing software used in forensics, the same applies to you.
PS 2: If you find this post too provocative and harsh, there is a reason for this ...

Update: I have modified two malware samples (Citadel, Atrax) with the help of HashClash, and now they have the same MD-5. Many thanks to Marc Stevens for his research, for publishing his code, and for the help given during the collision finding.

IPsec and Internet Key Exchange (IKE)

IPsec enables cryptographic protection of IP packets. It is commonly used to build VPNs (Virtual Private Networks). For key establishment, the IKE protocol is used. IKE exists in two versions, each with different modes, different phases, several authentication methods, and configuration options. Therefore, IKE is one of the most complex cryptographic protocols in use.

In version 1 of IKE (IKEv1), four authentication methods are available for Phase 1, in which initial authenticated keying material is established: Two public key encryption based methods, one signature based method, and a PSK (Pre-Shared Key) based method.

The relationship between IKEv1 Phase 1, Phase 2, and IPsec ESP. Multiple simultaneous Phase 2 connections can be established from a single Phase 1 connection. Grey parts are encrypted, either with IKE derived keys (light grey) or with IPsec keys (dark grey). The numbers at the curly brackets denote the number of messages to be exchanged in the protocol.

Pre-Shared Key authentication

As shown above, Pre-Shared Key authentication is one of the four Phase 1 authentication methods in IKEv1. The authentication is based on the knowledge of a shared secret string. In reality, this is probably some sort of password.

The IKEv1 handshake for PSK authentication looks like the following (simplified version):

In the first two messages, the session identifier (inside HDR) and the cryptographic algorithms (proposals) are selected by initiator and responder.

In messages 3 and 4, they exchange ephemeral Diffie-Hellman shares and nonces. After that, they compute a key k by using their shared secret (PSK) in a PRF (e.g. HMAC-SHA1) together with the previously exchanged nonces. This key is used to derive additional keys (k_a, k_d, k_e). The key k_d is used to compute MAC_I over the session identifier and the shared Diffie-Hellman secret g^xy. Finally, the key k_e is used to encrypt ID_I (e.g. IPv4 address of the peer) and MAC_I.
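The key derivation described above can be sketched as follows. This is a simplified illustration assuming HMAC-SHA1 as the PRF and following the SKEYID formulas for pre-shared keys in RFC 2409, where k corresponds to SKEYID and k_d, k_a, k_e to SKEYID_d, SKEYID_a, SKEYID_e. The function and parameter names are my own, and the cookies stand in for the session identifier carried in HDR.

```python
import hashlib
import hmac


def prf(key, data):
    """HMAC-SHA1 used as the pseudo-random function."""
    return hmac.new(key, data, hashlib.sha1).digest()


def derive_ikev1_psk_keys(psk, n_i, n_r, g_xy, cookie_i, cookie_r):
    """Derive k, k_d, k_a and k_e for IKEv1 Phase 1 with PSK authentication."""
    # k depends only on the PSK and the two nonces, which are sent in the clear.
    k = prf(psk, n_i + n_r)
    context = g_xy + cookie_i + cookie_r
    # Each derived key chains on the previous one and ends with a one-byte counter.
    k_d = prf(k, context + b"\x00")
    k_a = prf(k, k_d + context + b"\x01")
    k_e = prf(k, k_a + context + b"\x02")
    return {"k": k, "k_d": k_d, "k_a": k_a, "k_e": k_e}
```

Note that everything except the PSK is either public or known to whoever took part in the Diffie-Hellman exchange, so the PSK is the only secret input to these keys.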

Weaknesses of PSK authentication

It is well known that the aggressive mode of authentication in combination with PSK is insecure and vulnerable to off-line dictionary attacks by simply eavesdropping on the packets. For example, in strongSwan it is necessary to set the following configuration flag in order to use it:

charon.i_dont_care_about_security_and_use_aggressive_mode_psk=yes

For the main mode, we found a similar attack that requires some minor additional work. For that, the attacker needs to wait until a peer A (initiator) tries to connect to another peer B (responder). Then, the attacker acts as a man-in-the-middle and behaves like the peer B would, but does not forward the packets to B.

From the picture above it should be clear that an attacker who acts as B can compute g^xy and receives the necessary public values: the session ID, n_I, and n_R. However, the attacker does not know the PSK. To mount a dictionary attack against this value, he uses the nonces to compute a candidate for k for every entry in the dictionary. For every candidate k, he must then derive the corresponding candidate keys k_a, k_d, and k_e from the session identifiers and the shared Diffie-Hellman secret. Then, the attacker uses k_e to decrypt the encrypted part of message 5. Because ID_I is often an IP address plus some additional data of the initiator, the attacker can easily determine whether the correct PSK has been found.
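The loop described above can be sketched as follows. This is not the PoC mentioned later in this post: it is a simplified illustration that assumes HMAC-SHA1 as the PRF and the RFC 2409 key derivation, and it leaves the actual cipher and the plausibility check for ID_I as caller-supplied functions (decrypt and is_plausible_id are hypothetical parameters). In a real session, the cipher key and IV would also have to be derived as the negotiated algorithms require.

```python
import hashlib
import hmac


def prf(key, data):
    """HMAC-SHA1 used as the pseudo-random function."""
    return hmac.new(key, data, hashlib.sha1).digest()


def candidate_k_e(psk, n_i, n_r, g_xy, cookie_i, cookie_r):
    """Derive the encryption key k_e that a given PSK candidate would produce."""
    k = prf(psk, n_i + n_r)
    context = g_xy + cookie_i + cookie_r
    k_d = prf(k, context + b"\x00")
    k_a = prf(k, k_d + context + b"\x01")
    return prf(k, k_a + context + b"\x02")


def dictionary_attack(wordlist, session, ciphertext, decrypt, is_plausible_id):
    """Return the first PSK candidate whose k_e decrypts message 5 plausibly."""
    for psk in wordlist:
        k_e = candidate_k_e(psk, **session)
        # A wrong key yields garbage; a right one yields a recognizable ID_I.
        if is_plausible_id(decrypt(k_e, ciphertext)):
            return psk
    return None
```

The cost per candidate is a handful of HMAC computations and one decryption, which is why a low-entropy PSK offers little protection once message 5 has been captured.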

Who is affected?

This weakness exists in the IKEv1 standard (RFC 2409). All software and hardware compliant with this standard is affected. Therefore, we encourage all vendors, companies, and developers to at least ensure that high-entropy Pre-Shared Keys are used in IKEv1 configurations.

In order to verify the attack, we tested it against strongSwan 5.5.1.

Proof-of-Concept

We have implemented a PoC that runs a dictionary attack against a network capture (pcapng) of an IKEv1 main mode session. As input, it also requires the Diffie-Hellman secret as described above. You can find the source code at GitHub. We only tested the attack against strongSwan 5.5.1. If you want to use the PoC against another implementation or session, you have to adjust the idHex value in main.py.

Responsible Disclosure

We reported our findings to the international CERT on July 6th, 2018. We were informed that they contacted over 250 parties about the weakness. The CVE ID for it is CVE-2018-5389 [cert entry].

Credits

On August 10th, 2018, we learned that this attack against IKEv1 main mode with PSKs was previously described by David McGrew in his blog post Great Cipher, But Where Did You Get That Key?. We would like to point out that apparently neither we nor the USENIX reviewers nor the CERT were aware of this.
On August 14th, 2018, Graham Bartlett (Cisco) emailed us that he had presented the weakness of PSK in IKEv2 in several public presentations and in his book.
On August 15th 2018, we were informed by Tamir Zegman that John Pliam described the attack on his web page in 1999.

FAQs

Do you have a name, logo, any merchandising for the attack?
No.
Have I been attacked?
We mentioned above that such an attack would require an active man-in-the-middle attack. In the logs this could look like a failed connection attempt or a session timed out. But this is a rather weak indication and no evidence for an attack.
What should I do?
If you do not have the option to switch to authentication with digital signatures, choose a Pre-Shared Key that resists dictionary attacks. If you want to achieve e.g. 128 bits of security, configure a PSK with at least 19 random ASCII characters. And do not use something that can be found in public databases.
Am I safe if I use PSKs with IKEv2?
No. Interestingly, the standard itself mentions that IKEv2 does not protect against off-line dictionary attacks.
Where can I learn more?
You can read the paper. [alternative link to the paper]
What else does the paper contain?
The paper contains a lot more details than this blog post. It explains all authentication methods of IKEv1 and it gives message flow diagrams of the protocol. There, we also describe a variant of the attack that uses Bleichenbacher oracles to forge signatures to target IKEv2.
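Coming back to the PSK length advice in the FAQ above, the arithmetic can be sketched as follows. Each uniformly random character contributes log2(alphabet size) bits, so the required length is the target security level divided by that value, rounded up. The function names are illustrative. The figure of 19 characters is consistent with counting 7 bits per ASCII character, which is my assumption about how it was obtained; if the PSK is restricted to the 94 printable, non-space ASCII characters, the same calculation gives 20.

```python
import math
import secrets
import string

# The 94 printable, non-space ASCII characters.
PRINTABLE = string.ascii_letters + string.digits + string.punctuation


def min_psk_length(security_bits, alphabet_size):
    """Smallest number of uniformly random characters reaching `security_bits`."""
    return math.ceil(security_bits / math.log2(alphabet_size))


def random_psk(length, alphabet=PRINTABLE):
    """Generate a PSK using the operating system's secure random source."""
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

For example, min_psk_length(128, 128) evaluates to 19 and min_psk_length(128, len(PRINTABLE)) to 20. This only holds for keys drawn at random; a human-chosen phrase of the same length has far less entropy.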

This blog post is based on two major concepts:

Understand password-cracking techniques
Understand different types of passwords

The simplest way to crack the passwords

The first step in accessing a system is knowing how to crack the password of the target system. Passwords are the key piece of information required to access the system, and users often select passwords that are easy to guess: many people use their pet's name, their room number, and so on, to help them remember it. Because of this human factor, most password guessing is successful if some information is known about the target. Information gathering and reconnaissance can give away information that will help a hacker guess a user's password.

Once a password is guessed or cracked, it can be the launching point for escalating privileges, executing applications, hiding files, and covering tracks. If guessing a password fails, then passwords may be cracked manually or with automated tools such as a dictionary or brute-force method.

Types of Passwords

Only numbers
Only letters
Only special characters
Letters and numbers
Only letters and special characters
Numbers, letters and special characters

A strong password is less susceptible to attack by a hacker. The following rules, proposed by the EC-Council, should be applied when you're creating a password, to protect it against attacks:

Must not contain any part of the user's account name
Must have a minimum of eight characters
Must contain characters from at least three of the following categories:
- Non alphanumeric symbols ($,:"%@!#)
- Numbers
- Uppercase letters
- Lowercase letters
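The rules above can be sketched as a simple checker. The function name and messages are illustrative, and the account-name rule is simplified: this version only rejects passwords containing the whole account name, not arbitrary parts of it.

```python
import string


def check_password(password, account_name):
    """Return a list of rule violations; an empty list means the password passes."""
    problems = []
    # Simplification: only the full account name is checked, case-insensitively.
    if account_name and account_name.lower() in password.lower():
        problems.append("contains the account name")
    if len(password) < 8:
        problems.append("shorter than eight characters")
    categories = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    if sum(categories) < 3:
        problems.append("uses fewer than three character categories")
    return problems
```

A password that satisfies all of these rules can still be weak if it is a predictable variation of a common word, which is what the hybrid attack described later exploits.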

A hacker may use different types of attacks in order to identify a password and gain further access to a system. The types of password attacks are as follows:

Passive Online

Eavesdropping on network password exchanges. Passive online attacks include sniffing, man-in-the-middle, and replay attacks. A passive online attack is also known as sniffing the password on a wired or wireless network. A passive attack is not detectable to the end user. The password is captured during the authentication process and can then be compared against a dictionary file or word list. User account passwords are commonly hashed or encrypted when sent on the network to prevent unauthorized access and use. If the password is protected by encryption or hashing, special tools in the hacker's toolkit can be used to break the algorithm.

Another passive online attack is known as man-in-the-middle (MITM). In a MITM attack, the hacker intercepts the authentication request and forwards it to the server. By inserting a sniffer between the client and the server, the hacker is able to sniff both connections and capture passwords in the process.

A replay attack is also a passive online attack; it occurs when the hacker intercepts the password en route to the authentication server and then captures and resends the authentication packets for later authentication. In this manner, the hacker doesn't have to break the password or learn the password through MITM but rather captures the password and reuses the password-authentication packets later to authenticate as the client.

Active Online

Guessing the Administrator password. Active online attacks include automated password guessing. The easiest way to gain administrator-level access to a system is to guess a simple password, assuming the administrator used one. Password guessing is an active online attack. It relies on the human factor involved in password creation and only works on weak passwords.

Assuming that the NetBIOS TCP 139 port is open, the most effective method of breaking into a Windows NT or Windows 2000 system is password guessing. This is done by attempting to connect to an enumerated share ( IPC$ or C$ ) and trying a username and password combination. The most commonly used Administrator account and password combinations are words like Admin, Administrator, Sysadmin, or Password, or a null password.
A hacker may first try to connect to a default Admin$ , C$ , or C:\Windows share. To connect to the hidden C: drive share, for example, type the following command in the Run field (Start ➪ Run):

\\ip_address\c$

Automated programs can quickly generate dictionary files, word lists, or every possible combination of letters, numbers, and special characters and then attempt to log on using those credentials. Most systems prevent this type of attack by setting a maximum number of login attempts on a system before the account is locked.
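The lockout countermeasure mentioned above can be sketched as a per-account failure counter. The class name, the default threshold, and the in-memory dictionary are assumptions for illustration; a real system would persist the counters and usually unlock accounts after a delay or by administrator action.

```python
class LoginThrottle:
    """Lock an account after too many consecutive failed login attempts."""

    def __init__(self, max_attempts=5):
        self.max_attempts = max_attempts
        self.failures = {}  # username -> consecutive failed attempts

    def is_locked(self, username):
        return self.failures.get(username, 0) >= self.max_attempts

    def record(self, username, success):
        """Record an attempt; return True only if the login may proceed."""
        if self.is_locked(username):
            # Even a correct password is refused once the account is locked.
            return False
        if success:
            self.failures.pop(username, None)
            return True
        self.failures[username] = self.failures.get(username, 0) + 1
        return False
```

With a threshold of five, an automated guesser gets five tries per account instead of millions, which is why such tools are mostly useful against systems where lockout is disabled.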

In the following sections, we'll discuss how hackers can perform automated password guessing more closely, as well as countermeasures to such attacks.

Performing Automated Password Guessing

To speed up the guessing of a password, hackers use automated tools. An easy process for automating password guessing is to use the Windows shell commands based on the standard NET USE syntax. To create a simple automated password-guessing script, perform the following steps:

Create a simple username and password file using Windows Notepad. Automated tools such as the Dictionary Generator are available to create this word list. Save the file on the C: drive as credentials.txt.
Pipe this file using the FOR command: C:\> FOR /F "tokens=1,2*" %i in (credentials.txt)
On the same line, type do net use \\targetIP\IPC$ %i /u:%j to use the credentials.txt file to attempt to log on to the target system's hidden share.

Offline Attacks

Offline attacks are performed from a location other than the actual computer where the passwords reside or were used. Offline attacks usually require physical access to the computer and copying the password file from the system onto removable media. The hacker then takes the file to another computer to perform the cracking. Several types of offline password attacks exist.

Types of Attack	Characteristics	Password Example
Dictionary attack	Attempts to use passwords from a list of dictionary words	Administrator
Hybrid attack	Substitutes numbers or symbols for password characters	Adm1n1strator
Brute-force attack	Tries all possible combinations of letters, numbers, and special characters	Ms!tr245@F5a

A dictionary attack is the simplest and quickest type of attack. It's used to identify a password that is an actual word, which can be found in a dictionary. Most commonly, the attack uses a dictionary file of possible words, which is hashed using the same algorithm used by the authentication process. Then, the hashed dictionary words are compared with hashed passwords as the user logs on, or with passwords stored in a file on the server. The dictionary attack works only if the password is an actual dictionary word; therefore, this type of attack has some limitations. It can't be used against strong passwords containing numbers or other symbols.
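The hash-and-compare step described above can be sketched in a few lines. This is a simplified illustration that assumes an unsalted hash and uses SHA-256 by default; the function name is made up, and real password stores that use salts and slow hash functions make each comparison far more expensive.

```python
import hashlib


def dictionary_lookup(target_hash, words, algorithm="sha256"):
    """Return the word whose hash equals `target_hash`, or None if none matches."""
    for word in words:
        # Hash each candidate with the same algorithm the system uses.
        if hashlib.new(algorithm, word.encode()).hexdigest() == target_hash:
            return word
    return None
```

The function returns None for any password that is not in the word list, which is exactly the limitation described above.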

A hybrid attack is the next level of attack a hacker attempts if the password can't be found using a dictionary attack. The hybrid attack starts with a dictionary file and substitutes numbers and symbols for characters in the password. For example, many users add the number 1 to the end of their password to meet strong password requirements. A hybrid attack is designed to find those types of anomalies in passwords.

The most time-consuming type of attack is a brute-force attack, which tries every possible combination of uppercase and lowercase letters, numbers, and symbols. A brute-force attack is the slowest of the three types of attacks because of the many possible combinations of characters in the password. However, brute force is effective; given enough time and processing power, all passwords can eventually be identified.
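To get a feeling for "enough time and processing power", the size of the search space can be computed directly. The sketch below counts all passwords up to a given length over a given alphabet and divides by an assumed guessing rate; the function names and the rate are illustrative parameters, not measurements.

```python
def keyspace(alphabet_size, max_length):
    """Number of passwords of length 1..max_length over the given alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


def worst_case_seconds(alphabet_size, max_length, guesses_per_second):
    """Time to try every candidate at a fixed, assumed guessing rate."""
    return keyspace(alphabet_size, max_length) / guesses_per_second
```

For scale, there are roughly 2.1 × 10^11 passwords of exactly eight lowercase letters (26^8), but roughly 5.4 × 10^23 passwords of exactly twelve characters drawn from 95 printable ASCII characters (95^12), so each added character and each added character class multiplies the attacker's work.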