Abstract— According to AV vendors, the volume of malicious software has grown exponentially in recent years. One of the main reasons for this growth is that malware authors have started using polymorphic and metamorphic techniques to evade detection.
As a result, traditional signature-based approaches to malware detection are insufficient against new malware, and categorizing malware samples has become essential for understanding the basis of malware behavior and for fighting back against cybercriminals. During the last decade, solutions that fight malicious software have begun using machine learning approaches. Unfortunately, few open-source datasets are available to the academic community. One of the biggest datasets available was released last year in a competition hosted on Kaggle, with data provided by Microsoft for the Big Data Innovators Gathering. This paper presents two novel and scalable approaches using LeNet-like Convolutional Neural Networks (CNNs) to assign malware to its corresponding family.

Keywords— polymorphism, convolution, activation, drebin

Introduction

The interconnectivity, accessibility and open nature of the IT industry have proved to be a boon for both developers and users.
But it comes with some threats as well. The most significant one is the spread of malware. Malware, short for malicious software, is any software application that can infiltrate a system and access or damage resources without the owner's consent.
Malware is a generic term that covers viruses, worms, Trojan horses, spyware, etc.
· Adware – Malware that automatically shows advertisements to the user.
· Virus – Software that can harm a computer by automatically generating copies of itself. Viruses can be sent through e-mail.
· Worm – Malware that can be sent over networks. Worms tend to self-replicate and disseminate independently, whereas viruses spread only when the user takes part in the activity.
· Backdoor – Software that bypasses login credentials without being detected by the owner. One or more programs can then be installed on the system for future use.

The potential harm that may result from malware requires anti-malware authors to stay a step ahead of malware authors. This paper describes the use of a LeNet-like convolutional neural network for malware detection.
Malware detection is a technique used to distinguish a malicious application from a benign one. Moreover, because there are many categories of malware, malware classification is also important.

I. Challenges in malware detection

At present, malware is detected by signature-based methods, which antivirus vendors have used for many years. A malware signature is a characteristic pattern or matching rule that helps identify the type of malware. Once the malware is identified, it is easy to identify its family, but attackers use polymorphic and metamorphic engines to stay a step ahead of antivirus programmers. The lack of open-source malware datasets poses a great challenge, since the success of a machine learning algorithm largely depends on the quantity of data used. New malware is introduced into systems with every tick of the clock.
Malware detection suffers from a problem akin to that of virus detection in biological systems: files look different but actually belong to the same family. Malware authors use polymorphism, by virtue of which the same binary file is modified so that its variants look completely different. This makes traditional techniques insufficient. Another challenge is the large number of files that need to be investigated for proper detection, which demands very good computational efficiency.
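
To illustrate why exact-match signatures struggle against polymorphic variants, the following is a minimal sketch and not the detection engine of any particular vendor. It assumes that a signature is simply the SHA-256 digest of the file contents. The function names `signature` and `is_known` and the placeholder byte strings are hypothetical and chosen only for illustration.

```python
import hashlib


def signature(data: bytes) -> str:
    """Return the SHA-256 hex digest of the bytes.

    Assumption: this digest stands in for a byte-level malware signature.
    """
    return hashlib.sha256(data).hexdigest()


def is_known(data: bytes, known_signatures: set) -> bool:
    """Report whether the sample's signature is in the known set."""
    return signature(data) in known_signatures


# Hypothetical, harmless placeholder bytes standing in for a malware sample.
original = b"\x90\x90PAYLOAD-BYTES\xc3"

# A "variant" carrying the same payload with one junk byte appended,
# mimicking the effect of a polymorphic engine in the simplest possible way.
variant = original + b"\x90"

# The signature database contains only the original sample.
known = {signature(original)}

print(is_known(original, known))  # the original sample matches
print(is_known(variant, known))   # the padded variant does not match
```

A single appended byte is enough to change the digest, so the variant goes undetected even though its payload is unchanged. This is the gap that family-level classification, as opposed to exact matching, is meant to close.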