Basic schemes for data integrity in the cloud are Provable Data Possession (PDP) and Proof of Retrievability (PoR). The following section describes the privacy techniques for data integrity.

3.1 Provable Data Possession (PDP)

Provable Data Possession (PDP) is a technique for assuring data integrity over remote servers. In PDP, a client that has stored data at an untrusted server can verify that the server possesses the original data without retrieving it. The working principle of PDP is:

Fig. 1: Principle of PDP

- The client generates a pair of matching public and secret keys using a probabilistic key-generation algorithm.
- The client sends the public key along with the file to the server for storage and deletes the file from its local storage.
- The client challenges the server for a proof of possession of a subset of the blocks in the file.
- The client checks the response from the server.

Challenges in PDP:
- Lack of error-correcting codes to address concerns of corruption.
- Lack of privacy preservation.
- No dynamic support.

3.2 Basic PDP Scheme based on MAC

The data owner computes a Message Authentication Code (MAC) of the whole file with a set of secret keys before outsourcing it to the CSP. The owner keeps only the computed MACs in local storage, sends the file to the CSP, and deletes the local copy of the file F. Whenever a verifier needs to check the integrity of file F, he/she reveals a secret key to the cloud server, asks it to recompute the MAC of the whole file, and compares the recomputed MAC with the previously stored value.
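The MAC-based check above can be sketched as follows. This is a minimal illustration, not the scheme's reference implementation; HMAC-SHA256 stands in for the generic MAC, and all function names are illustrative.

```python
import hmac, hashlib, os

def compute_tag(key: bytes, data: bytes) -> bytes:
    """Owner computes an HMAC-SHA256 tag of the whole file before outsourcing."""
    return hmac.new(key, data, hashlib.sha256).digest()

def audit(key: bytes, stored_tag: bytes, data_from_csp: bytes) -> bool:
    """Verifier reveals the key, has the MAC recomputed over the file held by
    the CSP, and compares it with the locally stored tag."""
    recomputed = hmac.new(key, data_from_csp, hashlib.sha256).digest()
    return hmac.compare_digest(recomputed, stored_tag)

# Owner side: tag is kept locally, file_f is sent to the CSP
key = os.urandom(32)
file_f = b"outsourced file contents"
tag = compute_tag(key, file_f)

# An intact copy passes the audit; a corrupted copy fails it
assert audit(key, tag, file_f)
assert not audit(key, tag, b"tampered contents")
```

Note how the whole file must be hashed on every audit, which is exactly the source of the retrieval and limited-verification drawbacks listed below.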
Challenges in PDP based on MAC:
- The number of verifications allowed is limited by the number of secret keys.
- The data owner has to retrieve the entire file F from the server in order to compute new MACs, which is not feasible for large files.
- Public auditability is not supported, as the private keys are required for verification.

3.3 Scalable PDP

Scalable PDP uses symmetric encryption, whereas the original PDP uses public-key cryptography, in order to reduce computation overhead. Scalable PDP allows dynamic operations on remote data.
In Scalable PDP, all challenges and answers are pre-computed, and only a limited number of updates is possible. Scalable PDP does not require bulk encryption. It relies on symmetric-key cryptography, which is more efficient than public-key encryption, but as a consequence it does not offer public verifiability.

Challenges in Scalable PDP:
- A client can perform only a limited number of updates and challenges.
- It does not support block insertions; only append-type insertions are possible.
- The scheme is problematic for large files, as each update requires re-creating all the remaining challenges.

3.4 Dynamic PDP

Dynamic PDP (DPDP) is a collection of seven polynomial-time algorithms (KeyGen, PrepareUpdate, PerformUpdate, VerifyUpdate, GenChallenge, Prove, and Verify). It supports fully dynamic operations such as insert, update, modify, and delete. This technique uses rank-based authenticated directories along with a skip list for the insert and delete functions.
Although DPDP has some computational complexity, it is still efficient: for example, when verifying the proof for a 500 MB file, DPDP produces only 208 KB of proof data and 15 ms of computational overhead. This technique offers fully dynamic operations such as modification, deletion, and insertion; because it supports fully dynamic operations, it has relatively higher computational, communication, and storage overhead. All challenges and answers are generated dynamically.

Challenges in Dynamic PDP:
- It has some computational complexity.
- Not suitable for thin clients.
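The idea of verifying dynamic data against a small authenticated digest can be sketched as below. Note the sketch deliberately swaps DPDP's rank-based authenticated skip list for a plain Merkle tree (power-of-two block count, modify/prove/verify only, no insert or delete), so it is an analogy for the mechanism rather than DPDP itself; all names are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Server builds the tree; returns all levels, leaves first."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, i):
    """Server: sibling path for block i (the Prove step, roughly)."""
    path = []
    for level in levels[:-1]:
        sib = i ^ 1
        path.append((level[sib], sib < i))  # (sibling hash, sibling-on-left?)
        i //= 2
    return path

def verify(root, block, path):
    """Client keeps only the root digest and checks the server's proof."""
    digest = h(block)
    for sib, left in path:
        digest = h(sib + digest) if left else h(digest + sib)
    return digest == root

blocks = [b"b0", b"b1", b"b2", b"b3"]
levels = build_tree(blocks)
assert verify(levels[-1][0], blocks[2], prove(levels, 2))
assert not verify(levels[-1][0], b"tampered", prove(levels, 2))

# An update (modify block 1) yields a new root, verifiable the same way
blocks[1] = b"b1-new"
levels = build_tree(blocks)
assert verify(levels[-1][0], blocks[1], prove(levels, 1))
```

The proof size grows logarithmically with the number of blocks, which is why DPDP-style proofs stay small (e.g. the 208 KB figure above) even for large files.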
- DPDP does not include provisions for robustness.

3.5 Basic Proof of Retrievability (PoR)

A Proof of Retrievability (PoR) mechanism tries to obtain and verify a proof that the data stored by a user in the cloud (called cloud storage archives, or simply archives) has not been modified by the archive, and thereby that the integrity of the data is assured. The simplest PoR scheme can be built using a keyed hash function hk(F). In this scheme the verifier, before archiving the data file F in cloud storage, pre-computes the cryptographic hash of F using hk(F) and stores this hash as well as the secret key K.
To check whether the integrity of the file F has been lost, the verifier releases the secret key K to the cloud archive and asks it to compute and return the value of hk(F).

Challenges in basic PoR:
- It only works with static data sets.
- It supports only a limited number of queries as challenges, since it deals with a finite number of check blocks.
- A PoR does not provide prevention for the file stored on the CSP.

3.5.1 Data Placed on a Single Server in the Cloud

This is a proof of retrievability for large files using 'sentinels'. The archive needs to access only a small portion of the file F. Special blocks (called sentinels) are hidden among the other blocks in the data file F.
In the setup phase, the verifier randomly embeds these sentinels among the data blocks. During the verification phase, to check the integrity of the data file F, the verifier challenges the prover (the cloud archive) by specifying the positions of a collection of sentinels and asking the prover to return the associated sentinel values, as shown in Fig. 2.

Challenges in PoR for large files:
- This technique imposes computational overhead for large files, as encryption has to be performed on the whole file.
- This method imposes storage overhead on the server, partly because of the newly inserted sentinels and partly because of the error-correcting codes that are inserted.
- To check the integrity of the file, the user needs to download the whole file, which increases input/output and transmission costs across the network.
- This method works only with static data.

3.5.2 PoR Based on a Keyed Hash Function hk(F)

A keyed hash function is very simple and easily implementable, and it provides a strong proof of integrity. In this method the user pre-computes the cryptographic hash of F using hk(F) before outsourcing the data file F to cloud storage, and stores the secret key K along with the computed hash. To check the integrity of the file F, the user releases the secret key K to the CSP and asks it to compute and return the value of hk(F). If the user wants to check the integrity of the file F multiple times, he has to store multiple hash values computed with different keys.
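Assuming HMAC-SHA256 as the keyed hash hk, the multiple-audit variant can be sketched as below; the point is that one (key, digest) pair must be pre-computed per future check, and each check consumes one pair. Names are illustrative.

```python
import hmac, hashlib, os

def precompute_audits(file_f: bytes, n_audits: int):
    """User pre-computes n (key, hk(F)) pairs before outsourcing F."""
    audits = []
    for _ in range(n_audits):
        key = os.urandom(32)
        audits.append((key, hmac.new(key, file_f, hashlib.sha256).digest()))
    return audits

def run_audit(audits, csp_file: bytes) -> bool:
    """Each check reveals one fresh key to the CSP and consumes it."""
    key, expected = audits.pop()
    recomputed = hmac.new(key, csp_file, hashlib.sha256).digest()
    return hmac.compare_digest(recomputed, expected)

f = b"file contents"
audits = precompute_audits(f, n_audits=3)
assert run_audit(audits, f)            # passes while the CSP stores F intact
assert not run_audit(audits, f + b"!") # any modification is detected
assert len(audits) == 1                # only one pre-computed check remains
```

Once the list is exhausted, no further audits are possible without re-downloading F and computing fresh pairs, which is the limitation recorded in the challenges below.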
Challenges:
- The verifier needs to store a key for each check it wants to perform, as well as the hash value of the data file F for each hash key.
- It incurs higher resource costs, as the hash has to be computed over the entire file every time.
- Computation of the hash value for large data files can be computationally burdensome for thin clients.

3.5.3 HAIL

HAIL (High-Availability and Integrity Layer) for cloud storage allows the user to store data on multiple servers, providing redundancy of the data. The simple principle of this method is to ensure the integrity of a file via data redundancy. HAIL uses message authentication codes (MACs), a pseudorandom function, and a universal hash function in its integrity process. The proof generated by this method is independent of the size of the data and is compact in size.

Challenges:
- Mobile adversaries, which may corrupt the file F, are the biggest threat to HAIL.
- This technique is applicable only to static data.
- It requires more computation power.
- Not suitable for thin clients.
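The redundancy principle behind HAIL can be sketched as below. This is a heavy simplification under stated assumptions: plain replication with one MAC per replica stands in for HAIL's erasure coding and aggregated challenge-response, and all names are illustrative.

```python
import hmac, hashlib, os

def disperse(file_f: bytes, key: bytes, n_servers: int):
    """Store one replica of F per server, plus a MAC to check replicas against."""
    servers = [file_f for _ in range(n_servers)]
    tag = hmac.new(key, file_f, hashlib.sha256).digest()
    return servers, tag

def audit_and_repair(servers, key, tag):
    """Check every replica; rewrite corrupted replicas from a good copy."""
    good = [s for s in servers
            if hmac.compare_digest(hmac.new(key, s, hashlib.sha256).digest(), tag)]
    if not good:
        return None          # every replica corrupted: unrecoverable
    return [good[0]] * len(servers)

key = os.urandom(32)
servers, tag = disperse(b"payload", key, n_servers=3)
servers[1] = b"corrupted"                 # one server's copy is damaged
repaired = audit_and_repair(servers, key, tag)
assert repaired == [b"payload"] * 3       # redundancy lets the file be restored
```

The sketch also shows why a mobile adversary is the threat model's hard case: an adversary that corrupts a different server in each epoch can eventually damage every replica between audits, leaving nothing to repair from.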
3.5.4 PoR Based on Selecting Random Bits in Data Blocks

This technique involves encrypting a few bits of data per data block instead of encrypting the whole file F, thus reducing the computational burden on the clients. It stands on the fact that a high probability of security can be achieved by encrypting fewer bits instead of the whole data. The client's storage and computational overhead is also minimized, since the client does not store any data, and bandwidth requirements are reduced. Hence this scheme suits thin clients well. In this technique the user needs to store only a single cryptographic key and two random sequence functions.
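Under the assumption that "a few bits per block" can be approximated by a few seed-selected bytes per block, the idea can be sketched as follows; all names and parameters (e.g. BYTES_PER_BLOCK) are illustrative, not from the scheme itself.

```python
import hmac, hashlib, os, random

BYTES_PER_BLOCK = 4  # how many bytes per block are tagged (illustrative)

def positions_for(seed: bytes, block_index: int, block_len: int):
    """Pseudorandom byte positions for one block, derived from a small seed."""
    rng = random.Random(seed + block_index.to_bytes(4, "big"))
    return rng.sample(range(block_len), BYTES_PER_BLOCK)

def make_metadata(key, seed, blocks):
    """Per-block tag over the selected bytes only; stored with F at the CSP."""
    tags = []
    for i, blk in enumerate(blocks):
        picked = bytes(blk[p] for p in positions_for(seed, i, len(blk)))
        tags.append(hmac.new(key, picked, hashlib.sha256).digest())
    return tags

def verify_block(key, seed, i, blk, tag):
    """Verifier re-derives the positions from the seed and checks the tag."""
    picked = bytes(blk[p] for p in positions_for(seed, i, len(blk)))
    return hmac.compare_digest(hmac.new(key, picked, hashlib.sha256).digest(), tag)

key, seed = os.urandom(32), os.urandom(16)
blocks = [os.urandom(64) for _ in range(8)]
tags = make_metadata(key, seed, blocks)
assert verify_block(key, seed, 3, blocks[3], tags[3])

# flipping a selected byte in block 3 is detected
pos = positions_for(seed, 3, 64)[0]
tampered = bytearray(blocks[3]); tampered[pos] ^= 0xFF
assert not verify_block(key, seed, 3, bytes(tampered), tags[3])
```

The client keeps only the key and the seed (the "single cryptographic key and two random sequence functions" above); since the server cannot tell which positions are tagged, it must keep every block intact to pass audits with high probability.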
The user does not store any data on its local machine. Before storing the file at the CSP, the user preprocesses the file, appends some metadata to it, and stores it at the CSP. At verification time, the verifier uses this metadata to verify the integrity of the data.

Challenges:
- This technique is used only for static data.
- No data-prevention mechanism is implemented in this technique.

4. DATA INTEGRITY CHALLENGES

A comparative study of all the data-integrity techniques is as follows:

| Methodology | Single Server | Multi Server | Require a TPA (Third Party Auditor) | PoR (Proof of Retrievability) | Encrypted | Thin Users | Entire data |
|---|---|---|---|---|---|---|---|
| Simplest PoR | Yes | No | No | Yes | Yes | No | Yes |
| PoR using sentinels | Yes | No | No | Yes | Yes | No | Yes |
| PDP | Yes | No | No | No | NS | No | NS |
| SDP | Yes | No | No | Yes | NS | No | NS |
| Kumar & Saxina proposed model | Yes | No | No | Yes | Yes (partial) | Yes | Yes |
| Shacham | Yes | No | No | Yes | NS | Yes | NS |
| Kennadi's HAIL protocol | Yes | Yes | No (optional) | Yes | Yes | No | Yes |
| MR-PDP | Yes | Yes | No (optional) | Yes | NS | No | NS |
| Shah | Yes | Yes | Yes | Yes | Yes | Maybe | Yes |
| Wang | Yes | Yes | No (optional) | Yes | No | No | No |
| Sobol Sequence | Yes | Yes | No (optional) | Yes | No | No | Yes |