NEERAJ KASTURI (U00812285)
Data processing on an extensive scale is an exceptionally difficult job, with many system concerns such as performance, reliability, and fault tolerance to take into account. Spark and the Message Passing Interface (MPI) are the two most commonly used cluster computing frameworks for big data analytics, and each has its own share of importance depending on the requirements and constraints. We can compare the two methods on the following aspects.
From a computational perspective, the MPI implementation is significantly faster than the Spark-on-Hadoop alternative, owing to its low-level programming language and reduced overhead (e.g., no fault handling such as in Spark). The speedup of MPI over Spark depends on the dataset size relative to the cluster size.
With a bigger dataset, the time differences between the two implementations would be smaller. MPI and Spark are considerably closer in terms of disk access time, although MPI is still faster. In addition, Spark on Hadoop is better suited to exploit the GCP architecture, since it can split the data into multiple chunks so that files can be read in parallel from numerous sectors. All things considered, Spark on Hadoop may be preferred because it offers a distributed file system with fault tolerance and data replication management, and provides a set of tools for data analysis and administration that is easy to use, deploy, and maintain.
Algorithm Design and Programming:
In Spark, data is read from files on HDFS using the standard supported RDD transformations and actions. The core data units in Spark are called Resilient Distributed Datasets (RDDs), and these can be stored in memory, on disk, or in both. Also, Spark operations only need work functions to be passed in order to perform distributed computations over the data. MPI, on the other hand, supports both synchronous and asynchronous programming and gives much better flexibility in design and programming.
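To illustrate the distinction between Spark's lazy transformations and eager actions, here is a minimal plain-Python sketch; `MiniRDD` is a hypothetical stand-in for an RDD, not the Spark API itself:

```python
# Minimal stand-in for an RDD: transformations (map, filter) are lazy and
# only build up a pipeline; an action (collect) triggers the actual work.
class MiniRDD:
    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []          # deferred transformations

    def map(self, f):                 # transformation: lazy
        return MiniRDD(self.data, self.ops + [("map", f)])

    def filter(self, p):              # transformation: lazy
        return MiniRDD(self.data, self.ops + [("filter", p)])

    def collect(self):                # action: executes the whole pipeline
        out = list(self.data)
        for kind, f in self.ops:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

rdd = MiniRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # [0, 4, 16, 36, 64]
```

Nothing is computed until `collect()` is called, which mirrors why Spark can optimize and recover whole lineages of transformations.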
Another contrast between these two frameworks is fault-tolerance support. Spark handles fault tolerance effectively, but with a noticeable impact on speed. MPI, instead, provides a solution mostly oriented toward high-performance computing but vulnerable to faults, particularly when used on commodity hardware.
In Spark Streaming, data streams from input sources are processed in batches and pushed out to databases. To get started with how Spark Streaming works: first, the data streams are divided into small batches, and each batch is treated as an RDD. After RDD operations are performed on them, the results are pushed out in batches as well. The benefits of Spark Streaming are that it is scalable, fault tolerant, and better at load balancing and resource management.
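The micro-batching idea can be sketched in plain Python: incoming records are grouped into small batches, each batch is processed as one unit (playing the role of an RDD), and the per-batch results are pushed out. The function names here are illustrative, not the Spark Streaming API:

```python
def micro_batches(stream, batch_size):
    """Group an incoming record stream into fixed-size batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch               # each batch plays the role of one RDD
            batch = []
    if batch:
        yield batch                   # flush the final partial batch

def process_batch(batch):
    """Stand-in for the RDD operations applied to one batch (here: a sum)."""
    return sum(batch)

# Ten records arrive; they are processed as three micro-batches.
results = [process_batch(b) for b in micro_batches(range(1, 11), batch_size=4)]
print(results)  # [10, 26, 19] -> sums of [1..4], [5..8], [9, 10]
```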
Fault-tolerance in Spark Streaming and Storm:
In Spark Streaming, fault tolerance is accomplished by replicating the data among different Spark executors on worker nodes in the cluster, resulting in two kinds of data to be recovered:
Data received and replicated – the data is also copied to one of the other nodes, so it can be recovered after a failure.
Data received but buffered for replication – the data is not yet replicated, so the only way to recover from a fault is to fetch the data again from the source.
In Storm, fault tolerance is handled in three ways:
When a worker dies, the supervisor restarts it; if that still does not work, the worker is simply shifted to another machine.
When a node dies, once the tasks assigned to it have timed out, Nimbus reassigns those tasks to a different machine.
Nimbus and the Supervisor are designed to be fail-fast and stateless; in other words, they are simply restarted.
Fault tolerance in Spark vs Storm:
For fault-tolerant messaging, Storm needs to track every single record. Storm can be configured to provide at-most-once and exactly-once delivery. The delivery semantics offered by Storm can incur latency costs; if data loss in the stream is acceptable, at-most-once delivery will improve performance.
Since Spark Streaming is simply micro-batching, exactly-once delivery is a trivial outcome for each batch; this is the only delivery semantic available to Spark. The resiliency built into Spark RDDs and the micro-batching yield a straightforward mechanism for providing fault tolerance and message delivery guarantees. However, some failure scenarios of Spark Streaming degrade to at-least-once delivery.
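The relationship between at-least-once delivery and an exactly-once processing effect can be sketched as follows: if a batch may be redelivered after a failure, remembering the IDs of batches already processed makes the processing idempotent. This is a simplified model of the general technique, not Storm's or Spark's internal mechanism:

```python
processed_ids = set()   # in a real system this set must live in durable storage
total = 0

def process_exactly_once(batch_id, batch):
    """At-least-once delivery + dedup by batch ID = exactly-once effect."""
    global total
    if batch_id in processed_ids:
        return                      # batch was redelivered after a failure: skip
    processed_ids.add(batch_id)
    total += sum(batch)             # the actual side effect happens only once

process_exactly_once(1, [1, 2, 3])
process_exactly_once(2, [4, 5])
process_exactly_once(1, [1, 2, 3])  # duplicate delivery is ignored
print(total)  # 15
```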
All in all, a large portion of spam comes from compromised PCs, and these end up being the underlying cause of major spamming worldwide. Currently, about 75 percent of Internet mail is spam: that means for each genuine email message received, three pieces of spam are also received. With the advent of cloud computing resources such as Amazon Web Services, spammers have found a simple and convenient way to attack and spam end users.
To prevent spamming in Amazon EC2, spam filtering is not something you can set up and forget: an antispam system that works well today will gradually lose its strength as spammers figure out how to dodge the filtering techniques you have implemented. Some of the other challenges in preventing spamming in Amazon EC2 are:
Users not being educated enough to avoid spammers.
Verifying the identities of email senders and domains.
Handling the memory required by email attachments at large scale.
Protecting the customers' professional, financial, and personal data.
Automating the protection of filtering the emails sent over the cloud.
The costs of antispam filtering techniques.
Amazon is quick to understand the impact of cloud hosting and has several security measures in place to prevent attacks and to face the challenges of spamming and phishing. Amazon SES comes with the AWS Key Management Service (KMS) to optionally encrypt the mail that it writes for the customer. It uses client-side encryption to encrypt the mail content, which makes it necessary to decrypt the content on the client side after retrieving the mail. Amazon SES has a wide variety of spam-protection measures in place. It uses block lists to keep mail from known spammers from entering the system.
Amazon SES uses in-house content-filtering technologies to scan email content for spam and malware. In exceptional cases, accounts identified as sending spam or other low-quality email may be suspended, or AWS may take such other action as it deems appropriate. When malware is detected, Amazon SES prevents those messages from being sent.
Any email written to the server goes through virus and spam scanning, after which the potentially affected messages are marked as spam, with the option to trust or delete each message. In addition to the spam and virus verdicts, Amazon SES provides the DKIM and SPF check results.
Even though Amazon will terminate instances identified as spammers, the spammer simply creates another one, which may not be the best situation for the prevention of spamming. My ideas for better prevention of spammers would be:
• filtering outbound traffic for abuse and rate-limiting port 25/tcp connections on a per-customer basis, so that an instance run by (or infiltrated by) a spammer cannot deliver enormous quantities of spam before it is detected and stopped;
• providing a way to look up customer IDs from the IP addresses of the EC2 nodes they are using. Even an opaque customer ID string would allow anti-abuse teams to correlate a single customer's activity as they move through EC2 instances.
Cloud-intensive refers to using most of the resources from the cloud: everything from the database to the application code can be hosted in the cloud. As capable as today's mobile phones are, the average high-end cell phone is not a suitable platform for deep-learning vision models. In a cloud-based arrangement, images are sent to a server for analysis using deep-learning inference to recognize faces. We can simply capture the image on the mobile device, and all the remaining steps of the flow can be done on the cloud platform.
Advantages:
Computation and storage resources are paid for in a pay-as-you-go manner.
Computational power and storage resources can be 100 to 1000 times greater than those of a mobile device.
Mobile resources are saved for use by other applications.
Disadvantages:
More exposure to security breaches as more cloud resources are used.
Higher cost as more resources are utilized.
Response time increases as the number of cloud servers used grows.
In this model, the computation on the mobile phone (onloading) performs the face-detection module and the augmented reality used to interact with the user. The other computation, such as the feature-matching algorithms and the retrieval of matching persons from the huge database, runs on the cloud server (offloading).
Initially, face detection on the cell phone uses the local API that detects the face. After a face is detected, the phone crops the picture down to just the face; from there, the face image is offloaded to the cloud server for the face-recognition procedure, where it is processed to determine the identity of the person. Once the cloud server has recognized the face image, it returns the result, including the personal identity of the individual, to the mobile phone.
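The split described above can be sketched as a simple pipeline, where detection and cropping run on the device and recognition is offloaded to the server. All function names and data shapes here are hypothetical stand-ins, not a real mobile or cloud API:

```python
# --- runs on the phone (onloading) ---
def detect_face(image):
    """Stand-in for the local face-detection API: returns a bounding box."""
    return {"x": 10, "y": 20, "w": 64, "h": 64} if image.get("has_face") else None

def crop(image, box):
    """Crop the image down to just the detected face region."""
    return {"pixels": "face-region", "box": box}

# --- runs on the cloud server (offloading) ---
def recognize_face(face_image, database):
    """Stand-in for feature matching against the large identity database."""
    return database.get(face_image["pixels"], "unknown")

def identify(image, database):
    box = detect_face(image)        # on-device detection
    if box is None:
        return None                 # no face: nothing is sent to the cloud
    face = crop(image, box)         # on-device cropping: only the face is sent
    return recognize_face(face, database)   # cloud-side recognition

db = {"face-region": "Alice"}
print(identify({"has_face": True}, db))   # Alice
print(identify({"has_face": False}, db))  # None
```

Because only the cropped face region crosses the network, this split transfers far less data than the cloud-intensive model.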
Advantages:
The detection rate is high when there are more faces to be detected.
Compared to the cloud-intensive model, only some of the data needs to be transferred via the network.
Balanced utilization of computational power and data resources.
Disadvantages:
Latency, as part of the work must still be done in the cloud.
Not the most ideal for real time, because at certain times the transmission times take longer.
A mobile-intensive model uses more mobile computation resources: both the face-detection and face-recognition processes are done in the mobile environment, and cloud computation is rarely used.
Advantages:
Lower cost, as there is no involvement of the cloud.
Disadvantages:
Limited storage in the mobile environment.
Slower response time than when the cloud is used for face recognition.
After all this, I would prefer the cloud-mobile mix model.
Cloud storage has gradually and relentlessly replaced the traditional methods for sharing and synchronizing files. Furthermore, with organizations like Google maintaining the resources, it became prominent in no time; however, one thing that many disregard, or that goes unnoticed, is the security of the service.
Imperva, a cyber-security service organization, published a report that explained a new kind of attack vector that helps digital attackers get access to data and documents uploaded to prominent file-synchronization services such as Dropbox and Google Drive, referred to as Man-in-the-Cloud (MiTC) attacks. MiTC attacks target common cloud storage sites by compromising the synchronization token that enables a single individual to maintain access to the data through various points of contact.
These attacks are by and large not carried out by stealing credentials or compromising the server; rather, they are carried out by compromising the client's system and stealing the synchronization token. The malicious software then captures the token used to authenticate the worker's access to the storage site, and since the attack avoids the username and password entirely, it is difficult to detect.
Imperva also mentioned some defense techniques to counter these types of attacks, such as detecting the compromise of a file-synchronization account and, even more importantly, detecting the abuse of the internal data resource. They believe that an attack will inevitably reveal itself through the attacker attempting to access business data in a way that is not typical for ordinary enterprise users. Hence, attackers are ultimately after the enterprise data rather than the data stored at the endpoints.
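The abnormal-access idea can be sketched as a simple baseline check: record which internal resources each user normally touches, and flag accesses that fall outside that baseline. This is a toy model of the detection approach, not Imperva's actual product:

```python
from collections import defaultdict

class AccessMonitor:
    """Flag accesses to internal resources a user has never touched before."""
    def __init__(self):
        self.baseline = defaultdict(set)   # user -> resources seen in training

    def train(self, user, resource):
        self.baseline[user].add(resource)  # build the "normal" access profile

    def is_suspicious(self, user, resource):
        # e.g. a stolen sync token suddenly reading bulk enterprise data
        return resource not in self.baseline[user]

monitor = AccessMonitor()
for doc in ["reports/q1.xlsx", "notes.txt"]:
    monitor.train("alice", doc)

print(monitor.is_suspicious("alice", "notes.txt"))           # False: normal
print(monitor.is_suspicious("alice", "hr/all-salaries.db"))  # True: abnormal
```

A real system would score deviations rather than use a strict set-membership test, but the principle is the same: the stolen token behaves differently from its legitimate owner.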
Another method of counterattack is the recommended arrangement of having a Cloud Access Security Broker (CASB). CASBs use discovery techniques to identify the cloud applications in use, potentially risky applications, exploitable users, and other unsafe elements. Cloud access brokers provide a number of varied security access controls, which include encryption, device profiling, and credential mapping when an individual account is used from multiple systems. In this manner, CASBs can detect malicious attack vectors to a degree and can provide security measures.
References:
Jorge L. Reyes-Ortiz, Luca Oneto, and Davide Anguita: Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf.
Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman: Real-Time Face Recognition Using a Mobile-Cloudlet-Cloud Acceleration Architecture.
Prasetyawidi Indrawan, Slamet Budiyatno, Nur Muhammad Ridho, and Riri Fitri Sari: Face Recognition for Social Media with Mobile Cloud Computing.