When Mirai meets Ranbyus

Published on December 15, 2016 09:30 UTC by GovCERT.ch (permalink)
Last updated on December 15, 2016 10:12 UTC

When we read the blog post about Mirai’s new DGA feature, a recommended read, we decided to try to re-implement its DGA. This was quite a challenge, as we’re not too familiar with the MIPS architecture and could not easily find any x86 sample with DGA.

The main DGA generation loop did not look too difficult to implement. Here is a copy of it, with some register renaming we did to facilitate a Python implementation:

Mirai DGA loop
Mirai DGA loop(click to enlarge)

We were initially surprised by the observation that, while we see 0x61 (a lowercase ‘a’) added at the end, we don’t seem to see any modulo 26 operation, so we would expect the base name to contain non-printable characters. Nevertheless, we tried to mimic the code instruction by instruction in Python; after repeatedly fixing typos, we came up with the following abomination of code:

def calcDGA(day,month,year,seed=0):
    msk32 = 0xffffffff
    yearVal = (year*8) & msk32
    yearValB = 0x3ffff00 # lowest byte is irrelevant
    monValB = 0x51eb851f # “|month” can be ignored
    msk1 = 0xfffffff0
    msk2 = 0xfffffffe
    res = ""
    
    for i in xrange(12):
        daySeed = day ^ seed
        yearVal -= year
        yearVal &= msk32
        monMask = month & msk2
        month2 = (month * 4) & msk32
        yearVal ^=  year
        month2 ^=  month
        dayMask = day & 0x1fff
        month = (monMask << 4) & msk32
        monVal = year & msk1
        monMask = (monMask*2) & msk32
        daySeed = (daySeed*4) & msk32
        month = (month-monMask) & msk32
        monVal = (monVal << 17) & msk32
        yearVal = yearVal >> 11
        dayMask  ^=  daySeed
        month2 >>= 8
        year = monVal ^ yearVal
        dayMask = (dayMask<<4) & msk32
        yearVal = (day >> 15) & msk32
        month ^=  month2
        day = yearVal ^ dayMask        
        monMask = year ^ month ^ day
        dayMask = ((monMask * monValB) >> 32) & msk32
        dayMask >>= 3    
        yearVal = ((seed*8 + day) << 8) & yearValB
        monVal = seed >> 6
        seed = monVal ^ yearVal
        monVal = (dayMask << 3) & msk32
        yearVal = (dayMask << 5) & msk32
        yearVal = (yearVal - monVal + dayMask) & msk32
        monMask = (monMask - yearVal) & msk32
        monMask = (monMask + 0x61) & msk32
        res += chr(monMask & 0xff)
        yearVal = (year * 8) & msk32
    
    suffix = ""
    if day&1 == 0:
        suffix = "online"
    if day%3 == 0:
        suffix = "tech"
    if day%4 == 0:
        suffix = "support"
    return res,suffix

Yes, this re-implementation is ugly – more than ugly. But surprisingly it produces printable output, as you can easily try it out yourself. While playing with it, we noticed that the letter ‘z’ seems to be absent – only 25 letters of the possible 26 appear. This already triggered some memories, but unfortunately we did not follow them yet. So, where hides the mapping to lowercase letters? The solution lies in the variable monValB, which contains 0x51eb851f, approximating 8/25 modulo 32 (0x51eb851f*25 = 0x8’0000’0007); the subsequent division by 8 (right-shifting of 3 bits) results in a division by 25 (stored in dayMask) – the compiler seems to avoid actual division instructions. Just after that, dayMask is again multiplied by 25, which is done by shifting it left 5 bits (multiplication with 32), shifting the same value once more by 3 bits (multiplication with 8), taking the difference (multiplication with 32-8=24), and finally adding dayMask once more (multiplication with 25). So, the compiler optimizes multiplications as well. After a final subtraction, that results in a “modulo 25”, and then 0x61 is added – note that “z” can never be produced this way, as suspected. Mystery solved.

Actually we see a similar trick later on in the code, the multiplication with 0xAAAAAAAB is in fact an approximation of 2/3 and used to calculate the modulo 3 value in the TLD selection code.

Now we tried to beautify the code, and after several copy and paste steps in Python, we reached the following:

def calcDGA(day,month,year,seed=0):
    msk32 = 0xffffffff
    
    monVal = 0x51eb0000 + month
    yearVal = (year*8) & msk32
    yearValB = 0x3ffff00
    
    msk1 = 0xfffffff0
    msk2 = 0xfffffffe
    
    res = ""
    
    for i in xrange(12):
        yearVal = (((yearVal - year) ^ year) & msk32) >> 11
        year = (((year & msk1) << 17) & msk32)  ^ yearVal
        dayMask = (((day & 0x1fff) ^ (((day ^ seed)*4) & msk32))<<4)&msk32
        yearVal = (day >> 15) & msk32
        month =  ((14*(month&msk2))&msk32)^((month^((month*4)&msk32)) >> 8)
        day = yearVal ^ dayMask
        monMask = year ^ month ^ day
        yearVal = ((seed*8 + day) << 8) & yearValB
        monVal = seed >> 6
        seed = monVal ^ yearVal
        res += chr(monMask % 25 + 0x61)
        yearVal = (year * 8) & msk32
    
    suffix = ""    
    if day%2 == 0:
        suffix = "online"
    if day%3 == 0:
        suffix = "tech"
    if day%4 == 0:
        suffix = "support"
    return res,suffix

Which still works fine. But during the modification of this code, it started to look more and more like a DGA we already knew – especially after we saw the multiplication with 14. Actually, it looked very similar to the Ranbyus DGA:

FOR i = 0 TO 13
     day = (day >> 15) ^ 16 * (day & 0x1FFF ^ 4 * (seed ^ day))
     year = ((year & 0xFFFFFFF0) << 17) ^ ((year ^ (7 * year)) >> 11)
     month = 14 * (month & 0xFFFFFFFE) ^ ((month ^ (4 * month)) >> 8)
     seed = (seed >> 6) ^ ((day + 8 * seed) << 8) & 0x3FFFF00
     domain[i] = ((day ^ month ^ year) % 25) + 'a'

We gave it a shot, and tried the C code. The only slight modifications required were to change the length of the base names from 14 down to 12, and the different selection of TLDs. Also, dga.c produces 30 domains for one day, while Mirai only uses the first one of them. Anyway, the code indeed produces the same base names:

Date       | Ranbyus DGA        | Mirai Base   | Mirai DGA

2016-12-04 | vmdefmnsndojtu.tw  | vmdefmnsndoj | vmdefmnsndoj.tech
2016-12-05 | xpknpxmywqsrce.net | xpknpxmywqsr | xpknpxmywqsr.support
2016-12-06 | lvfjcwwobycjik.com | lvfjcwwobycj | -
2016-12-07 | nympompksmfxsf.pw  | nympompksmfx | -
2016-12-08 | kedbuffigfjsfq.in  | kedbuffigfjs | kedbuffigfjs.online
2016-12-09 | pjapobyvmsnesx.me  | pjapobyvmsne | pjapobyvmsne.support
2016-12-10 | gjnhvqeopqkjjf.cc  | gjnhvqeopqkj | -
2016-12-11 | lokwpdhjqhcvtn.su  | lokwpdhjqhcv | -
2016-12-12 | frwhojupdcrcat.tw  | frwhojupdcrc | -
2016-12-13 | hufuhuerwgsigx.net | hufuhuerwgsi | -
2016-12-14 | bwhrdaumwuvnrb.com | bwhrdaumwuvn | bwhrdaumwuvn.support
2016-12-15 | daldkougomttqb.pw  | daldkougomtt | -
2016-12-16 | gjuyvvwwmbwvdn.in  | gjuyvvwwmbwv | -
2016-12-17 | lgigrtmnssejtd.me  | lgigrtmnssej | -
2016-12-18 | vsuuqkqbavvaiu.cc  | vsuuqkqbavva | -
2016-12-19 | bpmsfckfkrprom.su  | bpmsfckfkrpr | bpmsfckfkrpr.support
2016-12-20 | oornsduuwjlifv.tw  | oornsduuwjli | oornsduuwjli.tech
2016-12-21 | qjqubpciajocbs.net | qjqubpciajoc | qjqubpciajoc.tech
2016-12-22 | exvdaajegjuror.com | exvdaajegjur | exvdaajegjur.support
2016-12-23 | gsqvwjkbmwadex.pw  | gsqvwjkbmwad | -
2016-12-24 | poorcetnmjfchv.in  | poorcetnmjfc | poorcetnmjfc.online
2016-12-25 | ulficqrujekplt.me  | ulficqrujekp | ulficqrujekp.support
2016-12-26 | ltwehsdfhkegkh.cc  | ltwehsdfhkeg | -
2016-12-27 | qqnbfqrdjyjkqb.su  | qqnbfqrdjyjk | qqnbfqrdjyjk.support
2016-12-28 | xtldkfvredxhll.tw  | xtldkfvredxh | -
2016-12-29 | aojpocslpwsuik.net | aojpocslpwsu | aojpocslpwsu.support
2016-12-30 | tyxdowioaygktb.com | tyxdowioaygk | -
2016-12-31 | vtrndmhsgadage.pw  | vtrndmhsgada | vtrndmhsgada.online

You can find a longer list of domains here: Mirai DGA botnet domains

As described in the above blog post, the DGA is not always applied. A wrapping code checks if the current date is in November or start of December (1st to 3rd); on these days the DGA is turned off, on all other days turned on. But the programmers forgot to consider the year in their code: it will turn off the DGA again in November 2017 for another month. So this DGA selection code is a bit buggy. Maybe the programmers don’t plan to use the DGA for such a long time – for sure there will be other samples out until then.

Please note that our interpretation of the TLD selection algorithm differs from the one described in the blog post: the code ends in three separate if statements, not in one combined if-elif-elif construct. It actually checks whether the residual value is divisible by 2, 3, or 4. However, there are cases where neither matches, e.g., 0xef2f25d5 representing December 12th, and no valid domain is produced by the algorithm. This bug is not too critical though: at some later day, a valid domain will be generated anyway.

Also interesting is the distribution of the 3 TLDs support, tech, online and the days where no TLD can be generated. Theoretically one would expect 25% “online”, 25% “support”, 16.6% “tech”, and 33.3% unassigned days. But, as pointed out in the comments of the blog post, the relevant residual value can only take one of 31 values (because only the day of the month and initial seed determine it), and so it is not so surprising that the distribution is unusual. Consequently, the TLD pattern is repeating for every month.

Back to top