The basic schemes for data integrity in the cloud are Provable Data Possession (PDP) and Proof of Retrievability (PoR). The following sections describe these techniques for data integrity.
3.1 Provable Data Possession (PDP)
Provable Data Possession (PDP) is a technique for assuring data integrity over remote servers. In PDP, a client that has stored data at an untrusted server can verify that the server possesses the original data without retrieving it. The working principle of PDP is:
1. The client generates a pair of matching public and secret keys using a probabilistic key-generation algorithm.
2. The client sends the public key along with the file to the server for storage and deletes the file from its local storage.
3. The client challenges the server for a proof of possession of a subset of the blocks in the file.
4. The client checks the response from the server.
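As a rough illustration of these steps, the following Python sketch uses per-block HMAC tags under a single secret key. This is a deliberate simplification: real PDP uses homomorphic public-key tags so the server can prove possession without returning the blocks themselves. All names and sizes here are illustrative.

```python
import hmac
import hashlib
import secrets

def split_blocks(data: bytes, size: int = 16) -> list[bytes]:
    # Split the file into fixed-size blocks (16 bytes is a toy size).
    return [data[i:i + size] for i in range(0, len(data), size)]

def tag(key: bytes, index: int, block: bytes) -> bytes:
    # Per-block tag bound to the block's position in the file.
    return hmac.new(key, index.to_bytes(4, "big") + block, hashlib.sha256).digest()

# Setup: the client tags every block, outsources file and tags,
# and deletes its local copy (only the key is kept).
key = secrets.token_bytes(32)
blocks = split_blocks(b"outsourced file contents held only by the server")
server = {"blocks": list(blocks), "tags": [tag(key, i, b) for i, b in enumerate(blocks)]}

# Challenge: ask for a random subset of block indices.
challenge = secrets.SystemRandom().sample(range(len(blocks)), k=2)

# Response: the server returns the requested blocks and their tags.
response = [(i, server["blocks"][i], server["tags"][i]) for i in challenge]

# Verify: recompute each tag with the secret key; without the key the
# server cannot forge a valid tag for a block it no longer holds.
ok = all(hmac.compare_digest(tag(key, i, b), t) for i, b, t in response)
print(ok)  # True
```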
Challenges in PDP:
Lack of error-correcting codes to address concerns of data corruption.
Lack of privacy preservation.
No support for dynamic data.
3.2 Basic PDP Scheme based on MAC
The data owner computes a Message Authentication Code (MAC) of the whole file with a set of secret keys and stores the MACs locally before outsourcing the file to the CSP. The owner keeps only the computed MACs in local storage, sends the file to the CSP, and deletes the local copy of the file F. Whenever a verifier needs to check the integrity of file F, he/she reveals a secret key to the cloud server, asks it to recompute the MAC of the whole file, and compares the recomputed MAC with the previously stored value.
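A minimal sketch of this flow, assuming HMAC-SHA-256 as the MAC (the scheme itself does not fix a particular MAC):

```python
import hmac
import hashlib
import secrets

file_f = b"the outsourced file F"

# Owner: compute one MAC per secret key before outsourcing; each key
# supports exactly one later verification.
keys = [secrets.token_bytes(32) for _ in range(3)]
stored_macs = [hmac.new(k, file_f, hashlib.sha256).digest() for k in keys]
# file_f is now sent to the CSP; only keys and MACs stay with the owner.

def csp_recompute(revealed_key: bytes, stored_file: bytes) -> bytes:
    # The CSP recomputes the MAC of the whole file with the revealed key.
    return hmac.new(revealed_key, stored_file, hashlib.sha256).digest()

# One verification consumes one (key, MAC) pair.
k = keys.pop()
expected = stored_macs.pop()
result = hmac.compare_digest(csp_recompute(k, file_f), expected)
print(result, len(keys))  # True 2
```

Note how each verification pops a key: the number of audits is bounded by the number of keys prepared up front.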
Challenges in MAC-based PDP:
The number of verifications allowed is limited by the number of secret keys.
The data owner has to retrieve the entire file F from the server in order to compute new MACs, which is not feasible for large files.
Public auditability is not supported, as the private keys are required for verification.
3.3 Scalable PDP
Scalable PDP uses symmetric encryption, whereas the original PDP uses public-key cryptography, in order to reduce computation overhead. Scalable PDP supports dynamic operations on remote data, but all challenges and answers are pre-computed and only a limited number of updates is allowed. Scalable PDP does not require bulk encryption; it relies on symmetric keys, which are more efficient than public-key encryption, but as a consequence it does not offer public verifiability.
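The pre-computed audit budget can be sketched as follows; the hash-of-nonce-plus-file construction is an illustrative stand-in for Scalable PDP's actual symmetric-key token construction:

```python
import hashlib
import secrets

file_f = b"file stored at the server"

def h(nonce: bytes, data: bytes) -> bytes:
    # Audit answer: hash of nonce || file; the server needs no secret key.
    return hashlib.sha256(nonce + data).digest()

# Setup: the client pre-computes a fixed budget of challenge/answer pairs
# before outsourcing; once the budget is exhausted, no further audits are
# possible without re-running setup.
budget = [(n := secrets.token_bytes(16), h(n, file_f)) for _ in range(4)]

def server_answer(nonce: bytes, stored: bytes) -> bytes:
    return hashlib.sha256(nonce + stored).digest()

# Each audit consumes one pre-computed pair.
nonce, expected = budget.pop()
answer_ok = server_answer(nonce, file_f) == expected
print(answer_ok, len(budget))  # True 3
```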
Challenges in Scalable PDP:
A client can perform only a limited number of updates and challenges.
It does not support block insertions; only append-type insertions are possible.
This scheme is problematic for large files, as each update requires re-creating all the pre-computed challenges and answers.
3.4 Dynamic PDP
Dynamic PDP (DPDP) is a collection of seven polynomial-time algorithms (KeyGen, PrepareUpdate, PerformUpdate, VerifyUpdate, GenChallenge, Prove, and Verify). It supports full dynamic operations such as insertion, update, modification, and deletion. This technique uses rank-based authenticated directories along with a skip list for the insertion and deletion functions. Although DPDP has some computational complexity, it is still efficient: for example, to verify the proof for a 500 MB file, DPDP produces only 208 KB of proof data with 15 ms of computational overhead. Because it supports fully dynamic operations, it incurs relatively higher computational, communication, and storage overhead. All challenges and answers are dynamically generated.
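A toy sketch of part of the DPDP interface (four of the seven algorithms), with plain HMAC tags standing in for the rank-based authenticated skip list; the names follow the algorithm list above, but the construction itself is illustrative only:

```python
import hmac
import hashlib
import secrets

def key_gen() -> bytes:                     # KeyGen
    return secrets.token_bytes(32)

def tag(key: bytes, i: int, block: bytes) -> bytes:
    # HMAC tag bound to the block index (stand-in for skip-list metadata).
    return hmac.new(key, i.to_bytes(4, "big") + block, hashlib.sha256).digest()

key = key_gen()
store = {i: (b, tag(key, i, b)) for i, b in enumerate([b"blk0", b"blk1", b"blk2"])}

def perform_update(i: int, new_block: bytes) -> None:    # PerformUpdate
    # Dynamic operation: modify a block and refresh its tag in place.
    store[i] = (new_block, tag(key, i, new_block))

def gen_challenge(k: int = 2) -> list[int]:              # GenChallenge
    return secrets.SystemRandom().sample(sorted(store), k)

def prove(challenge: list[int]):                         # Prove
    return [(i, *store[i]) for i in challenge]

def verify(proof) -> bool:                               # Verify
    return all(hmac.compare_digest(tag(key, i, b), t) for i, b, t in proof)

perform_update(1, b"blk1-v2")               # the update survives later audits
proof_ok = verify(prove(gen_challenge()))
print(proof_ok)  # True
```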
Challenges in Dynamic PDP:
It incurs relatively high computational, communication, and storage overhead.
It is not suitable for thin clients.
DPDP does not include provisions for robustness.
3.5 Basic Proof of Retrievability (PoR)
The Proof of Retrievability (POR) mechanism tries to obtain and verify a proof that the data stored by a user in the cloud (called a cloud storage archive, or simply archive) has not been modified by the archive, thereby assuring the integrity of the data. The simplest POR scheme can be built using a keyed hash function hk(F). In this scheme the verifier, before archiving the data file F in cloud storage, pre-computes the cryptographic hash of F using hk(F) and stores this hash as well as the secret key K. To check whether the integrity of the file F has been compromised, the verifier releases the secret key K to the cloud archive and asks it to compute and return the value of hk(F).
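A minimal sketch of this scheme, with hk instantiated as HMAC-SHA-256 (an assumption; the scheme only requires some keyed hash). It also shows why a key can be used only once: after K is revealed, the archive can cache hk(F) and discard F.

```python
import hmac
import hashlib
import secrets

file_f = b"data file F archived in the cloud"

def hk(key: bytes, data: bytes) -> bytes:
    # hk(F): keyed hash, instantiated here with HMAC-SHA-256.
    return hmac.new(key, data, hashlib.sha256).digest()

# Verifier, before archiving: pre-compute hk(F); keep the hash and key K.
K = secrets.token_bytes(32)
stored_hash = hk(K, file_f)

# Audit: release K; the archive recomputes hk over what it actually stores.
archive_answer = hk(K, file_f)
ok = hmac.compare_digest(archive_answer, stored_hash)

# Once K is public, a cheating archive can cache the answer and discard F,
# so each key supports only one meaningful challenge.
cached_answer = archive_answer
still_passes_without_f = hmac.compare_digest(cached_answer, stored_hash)
print(ok, still_passes_without_f)  # True True
```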
Challenges in basic POR:
It works only with static data sets.
It allows only a limited number of challenge queries, since it deals with a finite number of check blocks.
A POR does not provide any prevention mechanism for the file stored at the CSP.
The file is placed on a single server in the cloud.
3.6 POR for Large Files Using Sentinels
This scheme checks retrievability for large files using ‘sentinels’. The archive needs to access only a small portion of the file F. Special blocks (called sentinels) are hidden among the other blocks in the data file F. In the setup phase, the verifier randomly embeds these sentinels among the data blocks. During the verification phase, to check the integrity of the data file F, the verifier challenges the prover (cloud archive) by specifying the positions of a collection of sentinels and asking the prover to return the associated sentinel values, as shown in the figure.
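The sentinel mechanism can be sketched as follows (block contents, sentinel sizes, and counts are illustrative; the real scheme first encrypts the file so that sentinels are indistinguishable from data blocks):

```python
import secrets

rng = secrets.SystemRandom()

# Setup: the verifier hides random sentinel blocks among the data blocks
# and remembers only their positions and values.
data_blocks = [f"data-{i}".encode() for i in range(10)]
sentinels = [secrets.token_bytes(8) for _ in range(3)]

stored = list(data_blocks)        # this list is what gets sent to the archive
positions: dict[int, bytes] = {}  # verifier's secret: position -> sentinel
for s in sentinels:
    p = rng.randrange(len(stored) + 1)
    stored.insert(p, s)
    # Earlier sentinels at or after p shifted one slot to the right.
    positions = {q + 1 if q >= p else q: v for q, v in positions.items()}
    positions[p] = s

# Verification: challenge a random subset of sentinel positions and check
# that the archive returns the exact values hidden there.
challenge = rng.sample(list(positions), k=2)
response = [stored[p] for p in challenge]        # honest archive's answer
ok = all(positions[p] == v for p, v in zip(challenge, response))
print(ok, len(stored))  # True 13
```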
Challenges in POR for large files:
The technique puts computational overhead on the client for large files, as encryption has to be performed on the whole file.
It puts storage overhead on the server, partly because of the newly inserted sentinels and partly because of the error-correcting codes that are inserted.
To check the integrity of the file, the user needs to download the whole file, which increases input/output and transmission costs across the network.
It works only with static data.
3.7 POR Based on a Keyed Hash Function hk(F)
A keyed hash function is very simple and easy to implement, and it provides a strong proof of integrity. In this method the user pre-computes the cryptographic hash of F using hk(F) before outsourcing the data file F to cloud storage, and stores the secret key K along with the computed hash. To check the integrity of the file F, the user releases the secret key K to the CSP and asks it to compute and return the value of hk(F). If the user wants to check the integrity of the file F multiple times, he has to store multiple hash values for different keys.
Challenges in POR based on a keyed hash function:
The user needs to store a key for each check it wants to perform, as well as the hash value of the data file F for each key.
It incurs higher resource costs, as hashing has to be performed on the entire file every time.
Computation of the hash value for large data files can be computationally burdensome for the client.
3.8 HAIL
HAIL (High-Availability and Integrity Layer) is a high-availability and integrity layer for cloud storage that allows the user to store data on multiple servers, providing redundancy of the data. The simple principle of this method is to ensure the data integrity of a file via data redundancy. HAIL uses message authentication codes (MACs), pseudorandom functions, and universal hash functions in its integrity process. The proof generated by this method is independent of the size of the data and is compact in size.
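The redundancy idea can be sketched with plain replication plus a MAC per copy; this is a simplification, as real HAIL disperses the file with erasure codes and uses a challenge-response protocol rather than comparing whole copies:

```python
import hmac
import hashlib
import secrets

key = secrets.token_bytes(32)
file_f = b"replicated file"

def mac(data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

# Redundancy: the same file is kept on several servers, with one reference
# MAC held by the user; the proof (one MAC) is independent of the file size.
servers = [file_f for _ in range(3)]
reference = mac(file_f)

servers[1] = b"corrupted copy"                # one server misbehaves

healthy = [i for i, copy in enumerate(servers)
           if hmac.compare_digest(mac(copy), reference)]
print(healthy)  # [0, 2]

servers[1] = servers[healthy[0]]              # repair from a healthy replica
print(all(hmac.compare_digest(mac(c), reference) for c in servers))  # True
```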
Challenges in HAIL:
Mobile adversaries that attack HAIL are the biggest threat, as they may corrupt the file.
The technique is applicable to static data only.
It requires more computation power.
It is not suitable for thin clients.
3.9 Scheme Based on Selecting Random Bits in Data Blocks
This scheme involves encrypting a few bits of data per data block instead of encrypting the whole file F, thus reducing the computational burden on the clients. It stands on the fact that a high probability of security can be achieved by encrypting fewer bits instead of the whole data. The client's storage and computational overhead is also minimized, as the client does not store any data, and bandwidth requirements are reduced. Hence this scheme suits thin clients well. In this technique the user stores only a single cryptographic key and two random sequence functions. The user does not store any data on its local machine. Before storing the file at the CSP, the user preprocesses the file, appends some metadata to it, and stores the file at the CSP. At the time of verification, the verifier uses this metadata to verify the integrity of the data.
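A rough sketch of the bit-selection idea, using a key-seeded PRG in place of the scheme's two random sequence functions and a MAC over the selected bits as the appended metadata (all function names and parameters here are illustrative assumptions):

```python
import hashlib
import hmac
import random
import secrets

key = secrets.token_bytes(32)
BITS_PER_BLOCK = 4   # illustrative; the scheme does not fix this number

def positions(index: int, block_len_bits: int) -> list[int]:
    # Key-seeded PRG choosing which bit positions of a block get checked;
    # stands in for the scheme's two random sequence functions.
    seed = hmac.new(key, index.to_bytes(4, "big"), hashlib.sha256).digest()
    return random.Random(seed).sample(range(block_len_bits), BITS_PER_BLOCK)

def selected_bits(block: bytes, pos: list[int]) -> tuple[int, ...]:
    return tuple((block[p // 8] >> (p % 8)) & 1 for p in pos)

def metadata(index: int, block: bytes) -> bytes:
    # Per-block metadata: MAC over just the selected bits, not the block.
    bits = selected_bits(block, positions(index, len(block) * 8))
    return hmac.new(key, bytes(bits) + index.to_bytes(4, "big"),
                    hashlib.sha256).digest()

blocks = [secrets.token_bytes(16) for _ in range(4)]
meta = [metadata(i, b) for i, b in enumerate(blocks)]  # appended at the CSP

# Verification touches only a few bits per block instead of the whole file.
ok = all(hmac.compare_digest(metadata(i, b), m)
         for (i, b), m in zip(enumerate(blocks), meta))
print(ok)  # True
```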
Challenges in the scheme based on selecting random bits:
The technique is used only for static data.
A prevention mechanism is implemented in this technique.
It requires a TPA (Third Party Auditor).
4. DATA INTEGRITY CHALLENGES
A comparative study of all the data integrity techniques is given below: