“I am Spartacus”: privacy enhancing technologies, collaborative obfuscation and privacy as a public good

The paper introduces an approach to privacy enhancing technologies that sees privacy not merely as an individual right, but as a public good. This idea finds its correspondence in our approach to privacy protection through obfuscation, where everybody in a group takes a small privacy risk to protect the anonymity of fellow group members. We show how these ideas can be computationally realised in an Investigative Data Acquisition Platform (IDAP). IDAP is an efficient symmetric Private Information Retrieval protocol optimised for the specific purpose of facilitating public authorities’ enquiries for evidence.


Introduction
This paper discusses the technology for an obfuscation-based method for privacy enhancing tools, which serves the dual function of protecting the reputation and privacy of data subjects while also protecting legitimate police interests in the confidentiality of an investigation. Most other approaches to PET are grounded in an understanding of privacy law as an individual right. By contrast, our approach asks what PET could look like in a society that considers privacy a common good and the protection of privacy a communal task, an understanding of privacy law that has recently gained much ground in the academic debate. In the first part of the paper, we therefore describe the motivation for this approach in the form of an extended use case. This prepares the ground for the legal-jurisprudential analysis needed for the normative underpinning of the technology. In the second part, we introduce a formal apparatus that can support this type of communal privacy protection. In the third part, we provide a short evaluation of the results, from both a technological and a legal and ethical perspective, and indicate a number of questions for further research.
1.1 Setting the scene: obfuscation and privacy protection
Let us consider the following example of a traditional, brick-and-mortar police investigation. The police want to check the alibi of a suspect, John Doe. They drive in a marked police car to his place of residence, park it in full view on the street next to his house, and then send pairs of uniformed police officers from neighbour to neighbour, asking whether they saw Mr Doe during a certain time interval.
This sort of scenario carries two obvious risks. One is a reputation risk for Mr Doe. His neighbours now know, at the very least, that he is suspected of some wrongdoing. They might also be able to infer from the questions some of the information the police hold about Mr Doe. If, for instance, they are asked about a specific time interval, and it is well known that a robbery happened nearby during that time, they could readily infer that Mr Doe is a suspect in a robbery. If the question is “Have you ever seen very young girls visiting your neighbour late at night?”, another inference would immediately be drawn.
At the same time, this approach also carries risks for the police: the neighbours may, inadvertently or intentionally, alert Mr Doe that he is the subject of a police investigation. One way to protect both Mr Doe's interest in preventing the disclosure that he is the subject of a police inquiry and the police's interest in not alerting him is to phrase the questions much more broadly. This could mean, for example, asking every person on that street, including Mr Doe, to list everybody whom they saw in the neighbourhood at the relevant time. This way, no finger of suspicion points at one specific person. But this strategy carries obvious costs too. It creates much more information than necessary, most of it noise, which the police then have to process. It also creates a privacy risk for a much larger number of people: the police now know the whereabouts of a large number of citizens in whom they have no legitimate interest. Nonetheless, creating an excess of information seems, paradoxically, to be one way of protecting both privacy and the integrity of the investigation.
We can now transfer this scenario to the internet, for instance to a request to an ISP for data that establishes when a suspect was online, or a request to a bank about online transactions carried out by a client. We assume here and in the following that the data is legitimately held by that company. We also assume that the police warrant is legitimate and necessary. At this point, we face the same dilemma as described above: the formal request for information discloses to the data controller that the police have a legitimate interest in one of its clients. This in turn might give the data controller an incentive to act. A bank, for instance, may decide to dissociate itself from a client who has frequently been the subject of data disclosure requests, on the assumption that he is a reputation risk should he become the subject of a high-profile trial. This in turn may alert the client to investigative activities against him. Different jurisdictions have different rules on the disclosure of police requests to the citizen investigated, but even those that prohibit disclosure cannot prevent a suspect from drawing the right conclusions if he notices that the behaviour of others towards him has suddenly changed for the worse. However, the “obfuscation” method described above, asking much wider, less focussed questions, cannot easily be transferred to an online environment. The formal procedure required to gain data access demands that the query be sufficiently precise and focussed, to prevent fishing expeditions and unnecessary intrusions into the privacy of innocent citizens. In Europe, the Data Protection Directive allows national police forces access to data only “in specific cases.” As Bignami (2007) noted, this provision is explicitly designed to prohibit high-tech fishing expeditions, whether by the police or by market actors. Again in Bignami's words: “The police cannot make blanket requests for calling information. Rather, they must compile detailed requests for information on specific telephone numbers. The requirement of specificity is a means of guaranteeing that the police have at least some grounds for suspecting those telephone numbers of being involved in a criminal conspiracy.”
Paradoxically, therefore, a rule designed to protect citizens from the misuse of their data prohibits certain privacy enhancing methods.
Nonetheless, obfuscation is in principle an attractive privacy enhancing tool. In the online scenario, it is the rights of third parties that prevent us from hiding the identity of Mr Doe behind a veil of “excess data”. This, however, would change if a sufficient number of other clients of the company in question waived their rights and, on the assumption of mutuality and reciprocity, volunteered the “fog” of data that shields the identity of the subject of a data query from the data controller, though not from the police. In this model, the bank or ISP will only know that the subject of the query is amongst the arbitrarily large number of records they are asked to hand over to the police. The police, in turn, must be able to decipher, from all the data handed over to them, only the data of the person they are interested in. We will see below how a trusted third party approach combined with encryption methods can provide just such a set-up.
A particularly intuitive example of such solidarity-based protection of identity against a data query comes from the film “Spartacus”. In one of the most climactic scenes of the film, a Roman general demands that the captured remnants of the former slave army turn Spartacus over to him. To protect his friends, Spartacus stands up. However, so great is the solidarity of his soldiers that several of them come forward, shouting “I am Spartacus!”, until the shouts dissolve into a cacophony of voices, each claiming “I am Spartacus!”. This makes it impossible for the general to identify and arrest Spartacus.
This story also points to one of the main issues that technology alone cannot tackle: the legal and social environment necessary for such an approach to work. As the Spartacus example shows, people are sometimes willing to take personal risks for a communal good. This requires us, however, to reconsider the normative foundations of privacy law as an essentially liberal, individual-centric notion. Using obfuscation as a method to protect privacy is by no means new, and other writers have evoked Spartacus before (e.g. Howe and Nissenbaum 2009). However, as Brunton and Nissenbaum note, most of these approaches place the burden of producing the excess data on the individual who wants to protect herself. The few examples of truly collective obfuscation that they identify are typically “low tech” (e.g. the swapping of loyalty cards), do not involve any risk for the collaborators and are directed against illegitimate privacy intrusions by private companies. Our problem, and hence our solution, differs in all these aspects. First, existing methods of collaborative obfuscation are low-technology approaches by grassroots activists trying to undermine corporate data mining in the long run. Our scenario, by contrast, is an online investigation of a single, specific event. Furthermore, in our scenario there is a legitimate police investigation, and whatever method we choose to protect Doe's privacy interests must not interfere with the legitimate exercise of police functions. Indeed, as we indicated above, protecting Doe's privacy is in the interest of both him and the police, an approach which we hope will help to revise the often overly simplistic concept of privacy as an example of irreconcilable conflict between state and individuals. Finally, it is worth remembering the outcome of Spartacus, the movie. Unable to identify Spartacus, General Crassus crucifies all of the slaves. In our approach too, and in marked difference to previous approaches to collective obfuscation, people will be asked to expose themselves to a (very limited) risk. Because of this demand we have to make of other users, we need to spend a bit more time on the philosophical and jurisprudential underpinnings of our approach, and on the nature of privacy generally, to justify this demand and put it into context.

Privacy as a public good and a public responsibility
Privacy has traditionally been framed in law as a paradigmatic case of an individual right that pitches the self-interest of individuals against the communal interest of the state. This is a feature it shares with the traditional understanding of much of human rights law, as granting individual rights that protect only against state action.
Only recently has an alternative discourse emerged in human rights scholarship, which understands privacy as a social or public value on which other important public goods, in particular democracy and public participation, rest. Privacy enables individuals to criticise and resist measures or acts of government that are undemocratic or even totalitarian in nature. Bloustein (1964, p. 1003) argues that “[t]he man who is compelled to live every minute of his life among others and whose every need, thought, desire, fancy or gratification is subject to public scrutiny, has been deprived of his individuality and human dignity. Such an individual merges with the mass. His opinions, being public, tend never to be different; his aspirations, being known, tend always to be conventionally accepted ones; his feelings, being openly exhibited, tend to lose their quality of unique personal warmth and to become the feelings of every man.”
Indeed, the experience in many totalitarian regimes has shown that an absence of privacy has the potential to create a “society of followers” (see also Simitis (1987, p. 399)).
This interdependency between the protection of privacy and the protection of other essential features of a democratic society is also highlighted by Raab (2012), who argues that values like personal autonomy and self-determination are important not primarily because individuals may wish to live in isolation (for they do not, mostly), but so that they can participate in social and political relationships at various levels of scale, and so that they can undertake projects and pursue their own goals.
While this shift towards recognising privacy as a public good is welcome, for our purpose it has the drawback that much of the reassessment also called into question the role of consent and privacy waivers. As long as privacy was seen merely as an individual right, governments found it easy to convince individuals of the legitimacy of a privacy-security trade-off. Similarly, free social media services such as Facebook essentially offer a “trade-in”: users give up privacy in exchange for free use of services that are paid for by advertising revenue. This turned privacy into a tradable object under the control of the rights holder, and marginalised the concept of “privacy risk”. Theories that emphasise the value of privacy as a common good therefore also became sceptical of the notion of free alienation of privacy in the marketplace, and with it of the role of individual consent. As Regan (1994, p. 233) argues, there is a risk that “[i]f one individual or a group of individuals waives privacy rights, the level of privacy for all individuals decreases because the value of privacy [in the collective view of society] decreases.”
Or, put differently, in a society where “Big Brother” is daytime television and everybody shares their feelings on Twitter, refusing to participate in the sharing of data is at best mildly odd and at worst suspicious in itself. In a law enforcement context, this means that an already existing “information imbalance” between citizens and the state is shifted further in the state's favour. We therefore have to be careful that our own approach, and its use of “consent”, does not inadvertently undermine the very notion of privacy as an inalienable civil right on which it depends.
These preliminary jurisprudential reflections provide us with an abstract normative framework for a technological solution that protects the privacy of people who, for whatever reason, are caught on the police radar during an investigation. It assumes that the protection of privacy is not just a task for the individual, but a communal concern. The aim is a solution in which, through solidarity within a community, the identity of a suspect is protected without interfering with legitimate police interests. This requires reassurances (technological, institutional and legal) for those people who are willing to assist in protecting each other's privacy. In the next section, we introduce our proposal for an “Investigative Data Acquisition Platform” (IDAP), focussing mainly on the first, technological aspect: how the necessary trust can be created that allows actions of solidarity.

Background and related work
The retrieval of information from a third party in a private manner is a generic problem that has been researched for use in a variety of scenarios, such as cooperative scientific computation (Du and Atallah 2001) and online auctions (Cachin 1999). What people search for can disclose a lot about them. This is the central part of Google's business model: online behavioural profiling based on search queries allows advertising to be targeted with a high degree of accuracy (Tene 2008). Increasingly, the analysis of suspects' search queries also plays a role in criminal investigations, establishing motives, methods and state of mind (Lawless 2007). Initially, Private Information Retrieval (PIR) protocols were designed with the basic requirement of acquiring a record of interest from a data holder (the sender) in such a way that the data holder is unable to tell which record is of interest to the requestor (the chooser). These protocols were not concerned with the secrecy of the records; thus, in its least optimised form, PIR could be achieved by transferring the whole database from the sender to the chooser, as this would allow the chooser to retrieve a record in a private manner. To use a simple analogy, if an individual wants to browse the offerings of an online retailer of medical self-help books, but does not want to leave a trail that tells the retailer which illness he suffers from, downloading the catalogue as a PDF and searching it in the privacy of his own home has advantages over browsing online. There are no privacy concerns on the side of the retailer, as all the information in the catalogue is public anyway. Consequently, the main motivation behind PIR schemes is minimising communication and computational complexity (Ostrovsky and Skeith 2007).
But what if some of the information in the catalogue is private, and must not be sent to a customer who is just browsing? For this we need the stronger 1-out-of-n Oblivious Transfer (OT) primitive, which allows the retrieval of a randomly selected record from the dataset of n elements held by the sender in such a way that the sender cannot learn which record has been transferred, and the chooser cannot learn anything about the other records in the dataset (Schneier 1995). 1-out-of-n OT protocols that allow the chooser to actively select the record to be retrieved, and that have linear or sub-linear complexity, can be referred to as symmetric PIR (SPIR) protocols, since they protect the records of both parties during the information retrieval. These privacy-preserving data retrieval protocols can be employed in a variety of systems: electronic watch-lists of suspects (Frikken and Atallah 2003); cooperative scientific computation (Du and Atallah 2001; Goldwasser and Lindell 2002); and online auctions (Cachin 1999).
Frikken and Atallah's approach deserves further comment, as it shares some technological solutions with our proposal but, owing to a very different legal-ethical approach to privacy, advocates an implementation that is unsuitable from our jurisprudential perspective. A typical application for their solution is the following: the police have received information that some known suspects are planning a bomb attack, possibly using fertiliser. They want to query the database of a fertiliser retailer, ideally without revealing the identity of their suspects to the retailer. There can be several reasons for this, including suspicions against the retailer himself. As indicated above, one obvious solution would be simply to request the entire database, or data about everybody who bought fertiliser, and analyse it on a police server. However, this would mean that the police also obtain data about a large number of innocent citizens. Frikken and Atallah's solution is to provide the police not with the entire database of the retailer, but with a segment of it that is sufficiently large to hide their interest in a specific person from the retailer. To protect the wider public, though, the data is selected according to an objective criterion, such as a list of people with previous criminal records, possibly for related offences. This minimises the privacy risk for innocent citizens. It does, however, increase the privacy risk for people on the lists from which the selection is made. The retailer could in this case learn that a number of his customers have previous records, or have come to the attention of the police in some other way. We can now see the different jurisprudential assumptions behind their approach and ours. For Frikken and Atallah, privacy is a conditional right that can be lost through misbehaviour. This does not just apply to the suspect in an investigation, who must reasonably suffer restrictions on his privacy to further the aims of the criminal justice system. Rather, once convicted of a criminal offence, the offender suffers reduced privacy rights in perpetuity, even in cases that have nothing to do with him personally. Whereas our model is based on voluntary solidarity between all citizens (whether or not they have a previous record or are on a police watch list), in their model a subset of the citizenry, those who for one reason or another have already attracted police interest, are forced to provide the cover for the investigators.
With the protocols described above, a chooser could privately retrieve a record from the sender's database by secretly referring to its index in that database. In SPIR, such an index is expected to be publicly available in an electronic directory (Aiello et al. 2001; Bao and Deng 2001). However, ISPs and other data holders with large databases of private data cannot be expected to maintain such freely available indexes. Also, an investigator would normally be expected to refer to a suspect by name, ID, phone number, etc. For this reason, before the data can be retrieved using SPIR, the chooser needs to search the records in the sender's database. Such a private search operation requires a protocol that allows two parties to compare the values of their data in a private manner. Protocols that are optimised for comparisons of equality are referred to as Private Equality Test (PEqT) protocols. PEqT protocols are often based on commutative (Frikken and Atallah 2003; Kwecka et al. 2008) or homomorphic cryptosystems (Bao and Deng 2001).
A record of interest can be located in a database using a 1-out-of-n PEqT protocol and then retrieved with the help of SPIR. However, each of these protocols often has its own computationally expensive preparation phase, which makes such a combination suboptimal for IDAP. The exception to this rule is a range of protocols including private intersection, private intersection size and Private Equijoin (PE), as defined by Agrawal et al. (2003). These protocols are based on commutative encryption and, by exploiting different properties of the underlying commutative algorithms, allow for both private matching and private data retrieval.

Building blocks
This section describes the PE protocol that is the basis for our privacy-preserving investigative platform, IDAP. A more detailed description of the technical aspects of IDAP can be found in the 2011 PhD thesis of one of the authors, Kwecka, Cryptographic privacy-preserving enhancement method for investigative data acquisition.¹ The PE protocol relies on commutative cryptography, so some background on this is provided first.

Commutative cryptosystems
Many cryptographic applications employ sequential encryption and decryption operations. The reasons to sequence (cascade) different cryptographic schemes include strengthening the resulting ciphertext and achieving additional functionality that is impossible under any given encryption scheme on its own (Shannon 1949; Weis 2006). A basic cascadable cryptosystem can consist of a number of encryption stages, where the output from one stage is treated as the input to another. In such a basic cascadable cryptosystem, it is necessary to decrypt in the reverse order of the encryption operations. However, a special class of sequential cryptosystems, commutative cryptosystems, allows for the decryption of a ciphertext in an arbitrary order. Thus, a plaintext locked under several keys can be unlocked with those keys in any order. One early example is the SRA cryptosystem discussed by Shamir (1980), as used in his, Rivest's and Adleman's classic game of mental poker, which employs the Three-Pass (3Pass) secret exchange protocol. We note, though, that SRA is not secure against known-plaintext attacks.²
The most commonly used commutative cryptosystem is based on the Pohlig-Hellman (PH) scheme (1978), a symmetric-key exponentiation cipher (Menezes et al. 2010, p. 642). While PH influenced the design of the Rivest-Shamir-Adleman (RSA) public-key scheme (1978), the main strength of PH is that it is commutative for keys based on the same prime number and that it allows encrypted ciphertexts to be compared. Consequently, under PH the two ciphertexts $c_{ba} = e_b(e_a(m))$ and $c_{ab} = e_a(e_b(m))$ hiding the same plaintext $m$ are equal,
$$e_b(e_a(m)) = e_a(e_b(m)), \qquad (1)$$
whereas ordinary encryption schemes in general satisfy
$$e_b(e_a(m)) \neq e_a(e_b(m)). \qquad (2)$$
Thanks to these properties, PH can be used in the 3Pass primitive, which allows two parties to exchange data without exchanging keys, and to perform PEqT, which permits private matching of data records.
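A minimal Python sketch of the Pohlig-Hellman cipher may make equation (1) concrete. It assumes a toy modulus, the Mersenne prime 2^127 - 1, instead of the 1,024-bit keys used in the measurements reported below, and the helper names `ph_keygen` and `ph` are our own. It illustrates the commutativity property only and is not a hardened implementation.

```python
import math
import secrets

# Toy modulus (assumption): the Mersenne prime 2**127 - 1. The paper's
# measurements used 1,024-bit keys; a small prime just keeps the sketch fast.
P = 2**127 - 1


def ph_keygen(p: int = P) -> tuple[int, int]:
    """Return (e, d) with e * d = 1 (mod p - 1), so that (m**e)**d = m (mod p)."""
    while True:
        e = secrets.randbelow(p - 3) + 2  # candidate exponent in [2, p - 2]
        if math.gcd(e, p - 1) == 1:       # e must be invertible modulo p - 1
            return e, pow(e, -1, p - 1)


def ph(key: int, m: int, p: int = P) -> int:
    """PH encryption and decryption are the same operation: m**key mod p."""
    return pow(m, key, p)


e_a, d_a = ph_keygen()
e_b, d_b = ph_keygen()
m = 123456789  # the plaintext must lie in [1, p - 1]

c_ba = ph(e_b, ph(e_a, m))  # e_b(e_a(m))
c_ab = ph(e_a, ph(e_b, m))  # e_a(e_b(m))
print(c_ba == c_ab)                 # True: equation (1) for keys over the same prime
print(ph(d_b, ph(d_a, c_ba)) == m)  # True: the keys can be removed in either order
```

Both checks follow from $(m^a)^b = (m^b)^a = m^{ab} \bmod p$; an ordinary block cipher offers no such algebraic identity.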

Three pass protocol (3Pass)
The 3Pass protocol, shown in Fig. 1, was intended to allow two parties to share a secret without exchanging any private or public key.
The operation of the protocol can be described using the following physical analogy:
1. Alice places a secret message $m$ in a box and locks it with a padlock $E_A$.
2. The box is sent to Bob, who adds his padlock $E_B$ to the latch and sends the box back.
3. Alice removes her padlock and passes the box back to Bob.
4. Bob removes his padlock, which enables him to read the message inside the box.
There can be more parties, or encryption stages, involved in a 3Pass-like protocol, and this property makes it ideal for locking a plaintext multiple times and then unlocking it in an arbitrary order, as long as the parties cooperate until the execution of the protocol is complete. Such functionality is required by IDAP, as described later in this paper.
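The padlock analogy translates directly into code once the padlocks are PH keys over a shared prime. The following Python sketch simulates all four passes in one process; the toy prime, the helper names and the sample message are assumptions for illustration. A real deployment would additionally need authenticated channels, since the bare 3Pass protocol does not authenticate the parties.

```python
import math
import secrets

P = 2**127 - 1  # toy prime shared by both parties (assumption)


def ph_keygen(p: int = P) -> tuple[int, int]:
    """Return a PH key pair (e, d) with e * d = 1 (mod p - 1)."""
    while True:
        e = secrets.randbelow(p - 3) + 2
        if math.gcd(e, p - 1) == 1:
            return e, pow(e, -1, p - 1)


def ph(key: int, m: int, p: int = P) -> int:
    """Apply a PH key (lock or unlock): m**key mod p."""
    return pow(m, key, p)


def three_pass(secret: bytes) -> bytes:
    """Simulate the padlock exchange; return the message as Bob reads it."""
    m = int.from_bytes(secret, "big")
    if not 0 < m < P:
        raise ValueError("message must encode to an integer in [1, P - 1]")
    a_lock, a_key = ph_keygen()  # Alice's padlock E_A and the key that opens it
    b_lock, b_key = ph_keygen()  # Bob's padlock E_B and the key that opens it
    box1 = ph(a_lock, m)         # 1. Alice locks the box and sends it to Bob
    box2 = ph(b_lock, box1)      # 2. Bob adds his padlock and sends it back
    box3 = ph(a_key, box2)       # 3. Alice removes her padlock, returns the box
    m_bob = ph(b_key, box3)      # 4. Bob removes his padlock and reads m
    return m_bob.to_bytes(len(secret), "big")


print(three_pass(b"meet at noon"))  # b'meet at noon'
```

Step 3 only works because PH commutes: Alice can remove her padlock even though Bob's was applied on top of it.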

Private equality test (PEqT)
PEqT protocols can be used to privately verify whether two secret inputs to the protocol are equal. Agrawal et al. (2003) proposed one of the most scalable and flexible PEqT protocols for operations on datasets. The scheme is illustrated in Fig. 2 and can be described in the following steps:
1. Alice encrypts her input and sends it to Bob.
2. Bob encrypts the ciphertext received from Alice and sends it back.
3. Bob encrypts his secret input and sends it to Alice.
4. Alice encrypts the ciphertext containing Bob's input.
5. Alice compares the two resulting ciphertexts; if they are equal, then her input and Bob's are equal.
6. Alice may inform Bob about the result.
Fig. 1 Three-pass secret exchange protocol. The protocol was aimed at providing an alternative to public-key encryption and DH-like key negotiation protocols.
The following section describes a scheme that extends both the PEqT and 3Pass primitives to form the PE protocol that is the blueprint for our IDAP.
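The six PEqT steps can be sketched in Python as a single-process simulation. Under our assumptions, inputs are first hashed into the group with SHA-256, PH over a toy 127-bit prime stands in for whatever commutative cipher a deployment would use, and the function names are hypothetical. A careful implementation would map hashes into a prime-order subgroup (for example the quadratic residues modulo a safe prime) so that standard hardness assumptions apply; the sketch omits this.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy prime (assumption)


def ph_keygen(p: int = P) -> int:
    """Return a PH encryption exponent that is invertible modulo p - 1."""
    while True:
        e = secrets.randbelow(p - 3) + 2
        if math.gcd(e, p - 1) == 1:
            return e


def ph(key: int, m: int, p: int = P) -> int:
    return pow(m, key, p)


def hash_to_group(value: str, p: int = P) -> int:
    """Map an input to a non-zero residue modulo p via SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (p - 1)


def peqt(alice_input: str, bob_input: str) -> bool:
    a = ph_keygen()                           # Alice's secret key
    b = ph_keygen()                           # Bob's secret key
    msg1 = ph(a, hash_to_group(alice_input))  # 1. Alice -> Bob: e_a(x_A)
    msg2 = ph(b, msg1)                        # 2. Bob -> Alice: e_b(e_a(x_A))
    msg3 = ph(b, hash_to_group(bob_input))    # 3. Bob -> Alice: e_b(x_B)
    own = ph(a, msg3)                         # 4. Alice computes e_a(e_b(x_B))
    return msg2 == own                        # 5. equal exactly when x_A == x_B


print(peqt("07700 900123", "07700 900123"))  # True
print(peqt("07700 900123", "07700 900124"))  # False
```

Because exponentiation by an invertible exponent is a permutation of the non-zero residues, different hashed inputs can never produce equal double encryptions, while equal inputs always do.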

Private equijoin protocol
A PE protocol enables two parties, the chooser and the sender, to privately compare their sets of unique values $V_C$ and $V_S$, and allows the chooser to retrieve some extra information $ext(v)$ about the records in $V_S$ that match records in $V_C$ on a given parameter. Thus, sensitive data in $V_C$ and $V_S$ describing the data subjects in two datasets, such as date of birth, address or credit-card number, can be compared in encrypted form using the PEqT primitive in order to find the equijoin between the two datasets. The equijoin shows where the items on the list requested match items in the dataset, and nothing else. The PE then uses the 3Pass primitive to reveal the information that the sender wants to make available to the chooser, $ext(v)$, for the items in the equijoin only. However, the sender is “blind” at this stage, as they do not know which records are in the equijoin. Consequently, the investigators could encrypt their list of suspects $V_C$ and receive data $ext(v)$ on the individuals matching the criteria in the encrypted set $V_S$. Note that lowercase $v$ stands for a single record (data subject) in dataset $V_C$ or $V_S$, whereas uppercase letters refer to sets. The PE protocol involves the following steps:
1. Both parties apply a hash function $h$ to the elements in their sets, so that $X_C = h(V_C)$ and $X_S = h(V_S)$. The chooser picks a secret PH key $E_C$ at random, and the sender picks two PH keys $E_S$ and $E'_S$, all from the same group $\mathbb{Z}_p^*$.
2. The chooser encrypts the entries in the set:
The above protocol can perform the basic functions required for investigative data acquisition. Its use in investigative scenarios is described in the following section.
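The following Python sketch simulates a complete PE run, following our reading of the equijoin construction of Agrawal et al. (2003): the chooser's hashed values travel under E_C, the sender re-encrypts them under E_S and E'_S, the sender tags each of its own records with E_S(h(v)) and encrypts ext(v) under a key derived from E'_S(h(v)), and the chooser strips E_C to find the matches. The toy prime, the SHA-256 keystream used in place of AES, the invented phone numbers and all function names are assumptions, and every message is passed within one process.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy prime; the paper's measurements used 1,024-bit PH keys


def ph_keygen(p: int = P) -> tuple[int, int]:
    while True:
        e = secrets.randbelow(p - 3) + 2
        if math.gcd(e, p - 1) == 1:
            return e, pow(e, -1, p - 1)


def ph(key: int, m: int, p: int = P) -> int:
    return pow(m, key, p)


def h(value: str, p: int = P) -> int:
    """Hash a value (e.g. a phone number) to a non-zero residue modulo p."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (p - 1)


def xor_stream(kappa: int, data: bytes) -> bytes:
    """Stand-in for AES (assumption): XOR with a SHA-256 counter-mode keystream.
    Applying it twice with the same kappa restores the data."""
    key = kappa.to_bytes(16, "big")
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + start.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[start:start + 32], pad))
    return bytes(out)


def private_equijoin(v_c: list[str], sender: dict[str, bytes]) -> dict[str, bytes]:
    # Step 1: chooser key E_C; sender keys E_S and E'_S over the same prime.
    e_c, d_c = ph_keygen()
    e_s, _ = ph_keygen()
    e_s2, _ = ph_keygen()

    # Chooser -> sender: Y_C = E_C(X_C); the chooser remembers which y is which v.
    mine = {ph(e_c, h(v)): v for v in v_c}
    y_c = list(mine)
    secrets.SystemRandom().shuffle(y_c)

    # Sender -> chooser: each y re-encrypted under E_S and under E'_S.
    triples = [(y, ph(e_s, y), ph(e_s2, y)) for y in y_c]

    # Sender -> chooser: tag E_S(h(v)) plus ext(v) encrypted under kappa = E'_S(h(v)).
    # The sender never learns which of these records will match.
    tagged = {ph(e_s, h(v)): xor_stream(ph(e_s2, h(v)), ext) for v, ext in sender.items()}

    # Chooser: strip E_C to get E_S(h(v)) and kappa, then decrypt matching records only.
    found = {}
    for y, y_s, y_s2 in triples:
        tag, kappa = ph(d_c, y_s), ph(d_c, y_s2)
        if tag in tagged:
            found[mine[y]] = xor_stream(kappa, tagged[tag])
    return found


records = {  # hypothetical sender data: phone number -> ext(v)
    "07700900001": b"cell 17, 21:40",
    "07700900002": b"cell 03, 22:15",
    "07700900003": b"cell 09, 23:05",
}
print(private_equijoin(["07700900002", "07700900999"], records))
# {'07700900002': b'cell 03, 22:15'}
```

The chooser can decrypt only records whose tags match its own values, because for non-matching records it never obtains the corresponding kappa.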

IDAP versus Private Equijoin
This section evaluates our proposed use of the PE protocol as the basis for IDAP. The operations required during investigative data acquisition from a third party generally consist of:
1. Identification of the type of information that is required. These could be h parameters that contain answers to the investigator's questions, referred to as return parameters rp_1, ..., rp_h, e.g. date of birth (DOB), address, or numbers called by a given subscriber. In a formal, legally prescribed environment, it ought to be demonstrable later that these criteria matched those on the warrant application, adding a further level of legal scrutiny and accountability.
2. Specification of any circumstantial request constraints, i.e. l different input parameters ip_1, ..., ip_l with values ip_val_1, ..., ip_val_l, e.g. the time frame of the transactions being requested.
3. Specification of the relevant data subject, e.g. by identifying the individual whose data is to be retrieved, or by providing the mobile phone number of the suspect. This parameter is referred to as the record of interest, ri, with value ri_val.
4. Retrieval of the relevant records.
If we refer to the dataset as source, the request for investigative data can then be mapped onto the following SQL query:
    SELECT rp_1, rp_2, ..., rp_h FROM source
    WHERE ri = ri_val AND ip_1 = ip_val_1 AND ip_2 = ip_val_2 AND ... AND ip_l = ip_val_l    (2)
In most cases, the names of the return parameters, the names of the input parameters and the values of these input parameters can be openly communicated. But the value of the record of interest, ri_val, uniquely identifies the suspect and must be hidden. This can be achieved by running a database query for the return parameters of all the records that satisfy the conditions defined by the input parameters, and then collecting the record of interest from the sender using a PE protocol. Consequently, the query that is actually run on the sender's database can be rewritten as:
    SELECT ri, rp_1, rp_2, ..., rp_h FROM source
    WHERE ip_1 = ip_val_1 AND ip_2 = ip_val_2 AND ... AND ip_l = ip_val_l    (3)
The results of query (3) would be the input to a PE that enables the chooser to privately select only the record of interest that matches the given ri_val.
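The difference between queries (2) and (3) can be tried out with Python's built-in sqlite3 module. The table layout, the column names (msisdn as ri, call_date as the single input parameter ip_1, cell_id and called as return parameters) and the rows are invented for illustration; the point is only that query (3) never mentions ri_val and returns a candidate set within which the record of interest would then be selected privately with PE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (msisdn TEXT, call_date TEXT, cell_id TEXT, called TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?, ?)",
    [
        ("07700900001", "2014-03-01", "C17", "07700900555"),
        ("07700900002", "2014-03-01", "C03", "07700900666"),
        ("07700900003", "2014-03-01", "C09", "07700900777"),
        ("07700900002", "2014-03-02", "C04", "07700900888"),
    ],
)

# Query (2): the data holder sees ri_val and therefore learns who the suspect is.
targeted = conn.execute(
    "SELECT cell_id, called FROM source WHERE msisdn = ? AND call_date = ?",
    ("07700900002", "2014-03-01"),
).fetchall()

# Query (3): only the input parameter is disclosed; ri is returned as the PE join key.
candidates = conn.execute(
    "SELECT msisdn, cell_id, called FROM source WHERE call_date = ? ORDER BY msisdn",
    ("2014-03-01",),
).fetchall()

print(targeted)         # [('C03', '07700900666')]
print(len(candidates))  # 3 rows form the "fog" that is handed to PE
```

In IDAP the rows returned by query (3) would play the role of the sender's set, with the msisdn column hashed and matched against the investigators' encrypted ri_val.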

PE's performance
Earlier sections discussed different types of protocols that could enable the chooser to download a record from the sender's database while keeping the selected record secret. We noted that most available protocols could not achieve IDAP on their own, and that a combination of several protocols is required. Such a combination typically results in high computational and communication complexity, because each protocol usually requires its own preparation phase. The PE protocol described above is capable of both private matching and SPIR, and has a low overhead. Table 1 summarises the computational complexity of the protocol. For research purposes, the PE protocol was implemented on a desktop computer running the Microsoft Windows XP Professional operating system with an AMD Turion 64 X2 Mobile 1.58 GHz CPU and 3 GB of RAM. The implementation was based on the Bouncy Castle cryptographic API. MS SQL GUIDs acted as input to the hash functions, while the resulting hashes were used as input to the asymmetric algorithms (as in the OT and PE protocols). AES-128 was tested using a 1 kB input (approximately 150 words of ASCII text); this is expected to be larger than necessary to simulate the records returned by the data holder (similar amounts of data are used in Iliev and Smith (2005) and Cristofaro et al. (2009)). Key sizes were as follows. Symmetric encryption: AES, 128 bits; the cost of an operation is the time taken to encrypt or decrypt 1 kB of data. Asymmetric operations: PH/SRA, 1,024 bits; the cost of an operation is the average time taken to encrypt or decrypt 128 bits of data. Using the implementation, the research team confirmed that some of the experiments can be simulated based on the computational complexity and the cost, measured in milliseconds per operation. The cost values presented in Table 1 are based on the average execution time of 1 million cryptographic operations of the given type. The same method is used to evaluate the performance of a modification that we suggest below. However, since this modification is only partly motivated by efficiency and also responds to legal requirements, this paper does not discuss the two performances side by side. Readers who would find it easier to see the results side by side are referred to the online resource (footnote 3), especially Fig. 6.1, which also provides further technical details about the implementation that will be of less interest to readers with a legal background.
In practice, this particular solution, based on the PH cipher and implemented in C#.NET, can process about a thousand records per minute on average. The following section discusses the performance in the context of investigations, and the issues that could limit the usability of our solution.

Advantages of PE in data acquisition process
Following our general philosophy outlined in the first part, the PE protocol allows more than one interesting record to be acquired at a time, and adding more records to the enquiry increases the processing time only by a negligible amount (about 151 ms) per extra interesting record. The use of PE also satisfies the condition that the data holder remains in full control of the data and decides what can be disclosed. This addresses several current legal concerns about whether the police should be given direct access to traffic data in particular, or whether, as in the present system, the data controller should remain in control and be able, if necessary, to refuse the request and challenge its legitimacy in court. The costs of building and deploying a PE-based IDAP are anticipated to be low, since it is a software system and its architecture is based on a protocol that is in the public domain.

Limitations of PE in the data acquisition process
The processing time required for the protocol to run is the main drawback of the PE protocol. If there are a thousand records in the database, a complete run of the protocol takes only about 1 min; however, the processing time is linear in the number of records in the dataset, and data acquisition from a database with five million records would take three and a half days on an ordinary PC. During an urgent enquiry, especially where there is a clear danger to life, the police can currently obtain relevant location data from a mobile network operator in less than half an hour. Such a result could not be expected of PE if the database has more than thirty thousand records. Additionally, even if the data requested is relatively small in size, e.g. 100 kB per record, the results from a database of five million records would amount to about 500 GB (5,000,000 × 100 kB) that would need to be transferred over the Internet. Clearly, PE needs to run on a subset of the sender's database rather than on the whole database, or another solution would need to be chosen. As we will see, the need for efficiency aligns neatly with certain data protection requirements.

Table 1 The complexity of each of the steps in the proposed initial solution (cost in ms/operation: symmetric crypto operation 0.33; asymmetric key generation 7; asymmetric crypto operation 30). n is the number of data rows in the source, and m is the number of interesting records. Cost is the measured average time in ms to perform a given cryptographic operation from managed C#.NET code.
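The estimates in this paragraph follow from a simple linear cost model. The sketch below is our reconstruction, not the authors' simulator: it assumes two asymmetric 1,024-bit operations per record at about 30 ms each (consistent with the one-minute figure for a thousand records) and ignores the much cheaper symmetric operations. Both constants and the function names are assumptions.

```python
# Assumed cost model for a full PE run (not the paper's simulator):
# two asymmetric PH operations per record at roughly 30 ms each;
# symmetric work (about 0.33 ms per record) is ignored as negligible.
ASYMMETRIC_MS = 30.0
OPS_PER_RECORD = 2


def pe_runtime_seconds(n_records: int) -> float:
    """Estimated wall-clock time of one PE run over n_records rows."""
    return n_records * OPS_PER_RECORD * ASYMMETRIC_MS / 1000


def transfer_gigabytes(n_records: int, record_kb: float) -> float:
    """Data returned when every record is sent (decimal units, 1 GB = 10^6 kB)."""
    return n_records * record_kb / 1_000_000


print(pe_runtime_seconds(1_000) / 60)          # minutes for a thousand records
print(pe_runtime_seconds(5_000_000) / 86_400)  # days for five million records
print(pe_runtime_seconds(30_000) / 60)         # minutes for thirty thousand records
print(transfer_gigabytes(5_000_000, 100))      # GB at 100 kB per record
```

Under these assumptions the model reproduces the figures quoted above: about one minute, about three and a half days, thirty minutes, and 500 GB.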

Proposed modifications
The previous section has listed some drawbacks of using PE in the pursuit of IDAP.
Here these drawbacks are addressed.

Lowering processing time
Above we recommended minimising the processing time for each run of the protocol in large databases, such as those belonging to ISPs and mobile telephone providers. Theoretically, in order to maintain the privacy of the suspect, the chooser needs to ask the sender to process all the records in the database; only in this way is no information about the interesting records revealed. The correctness of this scheme can be proven under the requirements of multiparty computation (Asonov and Freytag 2003). In its current form, the system would not be capable of processing any urgent requests because of the processing time required. This would be a major drawback, which could be mitigated by limiting the number of records that need to be processed and then sent by the sender per enquiry. The privacy of the alleged suspect should be protected, but if the probability of the sender guessing the ID of the interesting record is, for example, 1:1000 rather than 1:n, and the data holder has no other information that could help infer the identity of the suspect, then arguably the privacy of the suspect and of the investigation is maintained. As we discussed above, diffusion is also used during traditional face-to-face investigations. As we noted, this is a widely accepted technique, which in a digitalised environment would, however, fall foul of the prohibition of fishing expeditions. From a legal perspective, we are therefore required to balance various conflicting, and sometimes converging, interests. First, the interest of the police in a speedy investigation converges with the interest of other data subjects in the police receiving only the minimal amount of data necessary; this points to a solution that limits the number of “camouflage records” the police receive. Secondly, from the perspective of the suspect, it matters how detrimental an inference would be drawn from the mere fact of being the suspect of a criminal investigation. Thirdly, the nature of the data is also relevant. In an investigation against a suspected paedophile, for instance, even otherwise innocent behaviour like browsing catalogues of children's wear can be indirect evidence for the police case. In this situation, the consequences for an innocent suspect would be particularly severe were it to become public knowledge that the police suspected him of paedophilia. At the same time, the mere fact that someone was looking at clothing catalogues is not particularly sensitive data outside the context of such an investigation. Therefore, the customers of the online retailer who are asked to provide “camouflage” for our suspect do not risk anything personally, even if the data were compromised, as the fact that they too looked at clothing catalogues is in itself uninteresting. In this scenario, it seems reasonable to increase the number of foils, as the risk for each is negligible but the privacy gain for the suspect considerable. However, if the data is sensitive or possibly embarrassing regardless of whether it is analysed in the context of an investigation, for instance information about buying Viagra, then the number of foils should be reduced to minimise the risk for them as third parties. Our approach allows “scaling” the protection of both the suspect and the other customers, taking this type of legally required balancing as a starting point.
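The trade-off between protecting the suspect and exposing foils can be made concrete. The helpers below are hypothetical (not part of the paper): for m interesting records, a diluting factor o and a dataset of n records, they compute how many records are shortlisted, how many foils are exposed, and the sender's chance of hitting a suspect when guessing uniformly within the shortlist. The cap at n mirrors the rule that the whole dataset is shortlisted when m × o exceeds n.

```python
def shortlist_size(m: int, o: int, n: int) -> int:
    """Records processed per enquiry: m * o, capped at the dataset size n."""
    return min(m * o, n)


def foils_exposed(m: int, o: int, n: int) -> int:
    """Shortlisted records that do not belong to a suspect."""
    return shortlist_size(m, o, n) - m


def guess_probability(m: int, o: int, n: int) -> float:
    """Chance that one uniformly random pick from the shortlist is a suspect."""
    return m / shortlist_size(m, o, n)


# Low-sensitivity data (clothing catalogues): a generous diluting factor.
print(foils_exposed(1, 1000, 5_000_000), guess_probability(1, 1000, 5_000_000))
# Sensitive data: fewer foils, at the cost of a higher guessing probability.
print(foils_exposed(1, 50, 5_000_000), guess_probability(1, 50, 5_000_000))
```

Raising o lowers the sender's guessing probability towards 1:o but exposes proportionally more foils, which is the balance the legal analysis above asks to be struck case by case.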
The problem is to choose a technique for narrowing down the scope in a way that ensures the record of interest is among the results returned. If the list of record identifiers is public, such as the list of Internet Protocol (IP) addresses or telephone numbers served by a given network operator, then the chooser could simply select the records to be processed at random from such a directory. However, if such a list is not publicly available, it would be possible to split the PE protocol back into its separate parts, PEqT and OT, and to add an off-line preparation phase. In this way the initial off-line phase could be run against the whole database, while the information retrieval would be performed against a smaller set of records. If the number of records requested per interesting record is defined as the diluting factor o, the IDAP protocol can be defined as follows:
Phase A: Preparation
1. Sender applies hash function h to the elements in the input set $V_S$, so that $X_S = h(V_S)$.
2. Sender picks a PH encryption key $E_S$ at random from the group $\mathbb{Z}_p^*$, where p is a strong prime.
3. Sender encrypts each $h(v) \in X_S$ with the key $E_S$; the result is a list of encrypted identities $Y_S = E_S(X_S)$.
If more records need to be added to the set, these can be processed using Steps 1 and 3, and then added to the list.
Phase B: PEqT
1. Following a request for data, sender provides chooser with the complete list of encrypted identities prepared during Phase A, reordered lexicographically.
2. Chooser applies hash function h to the elements of the set $V_C$ containing the identities of the interesting records, so that $X_C = h(V_C)$.
3. Chooser picks a commutative cryptography key pair, encryption key $E_C$ and decryption key $D_C$, at random from the same group $\mathbb{Z}_p^*$ that was used by sender in Phase A.
4. Chooser encrypts the entries in the set $X_C$, so that $Y_C = E_C(X_C)$.
[…] Step C7, obtaining tuples $\langle a, b\rangle$ such that $\langle a, b\rangle = \langle h(v), E_S'(h(v))\rangle$. Thus, a is the hashed value $h(v)$ of $v \in V_C$, and b is the hashed value encrypted using $E_S'$.
9. Chooser sets aside all pairs received in Step C5 whose first entry equals one of the matching entries identified in Step B9. It then uses the b value associated with a given interesting record as a symmetric key to decrypt the extra information contained in the second entry of the corresponding pair received in Step C5. This is performed for all the matching entries.
In this improved protocol the initial processing depends on the size of the dataset, n, but it needs to be performed only once in a given period of time, e.g. once per month or once per year. There is no need for the camouflage data to be up to date, since the police is ex hypothesi investigating a past event, and so might well be interested in a former client, or a client whose circumstances have changed. The remaining operations are less processing-heavy, as illustrated in Table 2. The IDAP protocol has been implemented in the same fashion as the PE protocol described in Sect. 3.1. The results from the empirical evaluation matched those simulated using the computational complexity and cost presented in Table 2.
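The three phases of IDAP can be traced end to end in a single process. The sketch below is a demo-scale illustration under stated assumptions, not the authors' C#.NET implementation: p is the Mersenne prime `2**127 - 1` rather than a strong 1,024-bit prime, the symmetric function K is a toy SHA-256 keystream rather than AES-128, and the names `ph_keys` and `idap` are hypothetical. Messages between chooser and sender are modelled as plain dictionaries.

```python
import hashlib
import math
import secrets

# Demo assumptions: the Mersenne prime 2**127 - 1 stands in for the strong
# 1,024-bit prime p, and a SHA-256 keystream stands in for AES-128 (K).
P = 2**127 - 1


def h(identity: str) -> int:
    """Hash an identity into Z_p^* (an integer in 1..p-1)."""
    digest = hashlib.sha256(identity.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def ph_keys() -> tuple[int, int]:
    """Random commutative (PH/SRA) key pair e, d with e*d = 1 mod p-1."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)


def K(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream (self-inverse)."""
    seed = key.to_bytes(16, "big")
    blocks = len(data) // 32 + 1
    stream = b"".join(hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
                      for i in range(blocks))
    return bytes(a ^ b for a, b in zip(data, stream))


def idap(database: dict[str, bytes], wanted: list[str], o: int) -> dict[str, bytes]:
    # Phase A (sender, off-line): catalogue Y_S = E_S(h(V_S)).
    # The dict keys are what the chooser sees; the values stay with the sender.
    e_s, _ = ph_keys()
    x_s = {v: h(v) for v in database}
    catalogue = {pow(x, e_s, P): v for v, x in x_s.items()}

    # Phase B (PEqT): the chooser learns E_S(h(v)) for its interesting IDs.
    e_c, d_c = ph_keys()
    y_c = {pow(h(v), e_c, P): v for v in wanted}            # B2-B4
    b7 = {y: pow(y, e_s, P) for y in sorted(y_c)}           # B5-B7
    found = {}
    for y, z in b7.items():
        tag = pow(z, d_c, P)                                # B8: strip E_C
        if tag in catalogue:                                # B9: match Y_S
            found[y] = tag

    # C1-C2: shortlist the matches plus o - 1 random diluting records each.
    matched = set(found.values())
    others = [t for t in catalogue if t not in matched]
    extra = min(len(others), len(matched) * (o - 1))
    shortlist = sorted(matched | set(secrets.SystemRandom().sample(others, extra)))

    # C3-C5 (sender): lock ext(v) under j(v) = E_S'(h(v)).
    e_s2, _ = ph_keys()
    c5 = {t: K(pow(x_s[catalogue[t]], e_s2, P), database[catalogue[t]])
          for t in shortlist}
    c7 = {y: pow(y, e_s2, P) for y in sorted(y_c)}          # C6-C7

    # C8-C9 (chooser): strip E_C to obtain j(v), then open the matching records.
    return {y_c[y]: K(pow(c7[y], d_c, P), c5[tag]) for y, tag in found.items()}
```

For example, `idap({"acct-007": b"record 7", ...}, ["acct-007", "acct-999"], o=10)` returns only the record for `acct-007`. In a deployment the Phase A catalogue would be computed once per period and reused across enquiries, so only the shortlist of m × o records is processed on line.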
Figure 3 illustrates the processes involved in this improved version of the acquisition protocol. It is worth noting that five communication rounds are required in this protocol. This is two rounds more than in the original PE protocol; still, most efficient SPIR protocols require considerably more rounds. This method significantly reduces the processing time required per enquiry if the total number of records in the sender's database is greater than $o \times m$, i.e. greater than the number of interesting records m multiplied by the diluting factor o. This is illustrated in Fig. 4. Furthermore, the true strength of this version of the protocol is seen when multiple enquiries are run against the same database using a single encrypted catalogue of the records, compiled by the sender in Phase A (shown in Fig. 5).
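Why IDAP pays off when n exceeds o × m, and why reusing the Phase A catalogue helps further, can be seen from a crude count of records processed. This is a hypothetical model, not the one behind Figs. 4 and 5: it ignores the different per-record costs of the individual steps (Tables 1 and 2) and simply counts records touched.

```python
def online_records_pe(n: int) -> int:
    """PE processes the whole database during every enquiry."""
    return n


def online_records_idap(n: int, m: int, o: int) -> int:
    """IDAP processes only the shortlist of m * o records (capped at n) per
    enquiry; Phase A is assumed to have been run off-line beforehand."""
    return min(m * o, n)


def total_records(n: int, m: int, o: int, enquiries: int) -> tuple[int, int]:
    """Records processed over several enquiries against one catalogue:
    (PE total, IDAP total including a single Phase A pass over n records)."""
    return n * enquiries, n + online_records_idap(n, m, o) * enquiries


n, m, o = 5_000_000, 1, 1000
print(online_records_pe(n), online_records_idap(n, m, o))
print(total_records(n, m, o, enquiries=10))
```

For a single enquiry the benefit lies in the on-line (urgent) part shrinking from n to m × o records; across repeated enquiries the one-off Phase A cost is also amortised.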

Correctness and security
IDAP is a modification of the PE protocol, whose correctness and security proofs can be found in Agrawal et al. (2003). The goals and logic of IDAP and PE are similar; however, IDAP is streamlined to provide better performance than PE in the specific scenario of investigative data acquisition. We assume that there is a method of authenticating the other parties and of securing the communication channel. In order to evaluate the correctness and security of IDAP, the inputs and outputs need to be clearly stated (Cristofaro et al. 2009). Normally both parties learn the sizes $|V_S|$ and $|V_C|$: by default, all the encrypted identities in $V_S$ are sent to the chooser, while the chooser, in order to find the interesting records among these encrypted identities and to decrypt ext(m) for these records, provides the sender with encrypted elements of the set $V_C$.
IDAP is based on Shamir's commutative protocols, a variant of the PH protocol in which the prime p is public and shared by the communicating parties. An adversary with knowledge of the ciphertext C and the prime p would need to solve the following hard problem to break the commutative PH protocol (Schneier 1995): $M \equiv \sqrt[e]{C} \pmod{p}$, that is, recovering the plaintext M as the e-th root of C modulo p without knowing the secret exponent e. Just like RSA, a ciphertext created using the PH algorithm may leak some information about the input plaintext. Therefore, this algorithm is suitable for uses where the input is formed from random data. This is the case in PE and IDAP, as the commutative PH cipher is used to encrypt hashed IDs of the records. While it is normally recommended to use padding schemes in any implementation of RSA (Kaliski 2003), and thus in PH implementations as well, PE and IDAP relax this requirement by using fixed-size hashes as the input.
In order to narrow down the scope of the enquiry, IDAP splits the PE protocol into three parts. However, the only way in which the operations of the protocol are affected is that under IDAP the chooser requests extra information for only $m \times o$ records, rather than for the whole dataset of n records. The main consequence of this approach for the security of the protocol is that the sender knows that there are m interesting suspects in the set of identities of size $m \times o$. On the other hand, for small organisations with fewer than 100,000 IDs, there is no need to narrow down the results. Consequently, in IDAP the privacy of the suspect is affected by the diluting factor o, and the sender's probability of guessing the IDs of the interesting records is 1:o and not 1:n. As long as o is reasonably large, and the sender has no other sources of information about the suspects, the privacy of the suspects should be safe.
Fig. 4 Processing time per enquiry depending on the number of interesting records. The proposed modification of the protocol significantly improves the processing time required for the protocol to run when the product of the number of interesting records m and the diluting factor o is smaller than the number of records in the database n
Fig. 5 Processing time depending on the number of enquiries. The proposed modification significantly improves the processing time required for the protocol to run when more than one enquiry is run against the same database
IDAP allows selection on multiple criteria by hashing the different criteria together and using the result within the PE protocol as the ID of a record. This does not affect the security of the PE protocol. On the other hand, adding a semi-trusted third party, the proxy, which we discuss in the next section as a way to restore the balance between the privacy of the innocent and that of the suspects, would modify the security of the protocol. The proxy filters out the records not classified as interesting from the sender's response. Assuming that the semi-trusted party behaves as expected, the security of ext(m), the data records contained in the sender's database, is information-theoretic from the chooser's perspective. On the other hand, if the proxy and the sender cooperate, they can easily work out the identities of the interesting records. The main aim of IDAP is to hide those identities from the sender. However, under current practice, the identities of the suspects are provided in every data acquisition notice. Consequently, if the semi-trusted party were to cooperate with the sender, this would only reveal information that is currently openly communicated to the data holders anyway, making the worst-case scenario no worse than current best practice.
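One way to realise “hashing together different criteria” is sketched below. The helper name `combined_id` and the length-prefixed encoding are our assumptions; any unambiguous encoding would do, provided chooser and sender hash the same fields in the same order. The resulting digest then plays the role of the record ID that enters the PE protocol.

```python
import hashlib


def combined_id(*criteria: str) -> bytes:
    """Hash several selection criteria into one record ID.

    Each field is length-prefixed so that, for example, ("ab", "c") and
    ("a", "bc") produce different IDs.
    """
    hasher = hashlib.sha256()
    for field in criteria:
        data = field.encode("utf-8")
        hasher.update(len(data).to_bytes(4, "big"))
        hasher.update(data)
    return hasher.digest()


# Hypothetical criteria: a subscriber identifier and a date.
record_id = combined_id("subscriber-42", "2013-05-01")
print(record_id.hex())
```

Without the length prefix, naive concatenation would let different criteria combinations collide on the same ID.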

Assessing privacy risks and data protection compliance
In this final section we return to our discussion from the beginning, evaluate the wider legal and societal issues that this proposal raises, and assess the privacy risks involved. In response, we recommend two institutional measures to complement the technological solutions described above.
Let us recap the main features of the system that we have described so far. The police is interested in our target, John Doe. They make a request for data about Doe to the online provider X. Since X does not need to know the identity of the suspect, and may draw adverse inferences about Doe if it knew that he was the target of an investigation, the police requests data about a larger set of people (the foils), chosen randomly. Since the retailer knows only that one of the people whose information it hands over is the suspect, it can no longer draw an adverse inference against any individual; the community hides the identity of the suspect from the retailer behind a wall built by them all, just as in the Spartacus example. At the same time, the data of all the customers is encrypted in such a way that the police can only make sense of the data that belongs to the suspect: a key has been created prior to making the data request that opens only the data of the specific subject under investigation. The encryption renders the other records unusable to the authorities, in the sense that they are secure against attacks in polynomial time. This prevents “fishing expeditions”, and ensures that the data of the innocent customers cannot be used by the police for other purposes.
However, this still involves providing government agencies with records of individuals who are “innocent bystanders”, which raises legal issues as well as issues of public acceptance. There are some additional measures that may reassure the public that the data is safe. First, if the technique for minimising the processing time is employed, the chances that investigators will retrieve the encrypted records of a particular individual who is not a suspect are small in large datasets. This also means that the investigators would first need to break the encryption key used by the sender to hide identities (Phase A) before they could attempt to obtain the data about a specific individual who is not a suspect. Additionally, if the identity of a data subject is never encrypted under the same key as the data records, then investigators would need to successfully brute-force two separate keys in order to make use of the retrieved encrypted records; otherwise the information remains unintelligible.
Furthermore, even if the police could access this data, it would in all likelihood be of no interest and carry no potential for privacy harm, as it was selected at random. In addition to the relevant information that Doe bought large quantities of fertiliser (relevant given the investigative hypothesis that he is a bomb maker), the police would learn nothing more significant than, for example, that a Mr Smith bought a shovel and a Mrs Jones a wheelbarrow from the same farm equipment company. At the same time, the police would themselves be exposed to a significant risk, since looking at this data would violate their legal obligation to destroy it unseen. The random character of the camouflage information therefore prevents the police from collecting it strategically. However, some of the data could expose the data subjects to risks other than privacy risks. For instance, the data from the “foils” might include credit card details. If the police were to lose this data before destruction, people may fear that it could fall into the hands of criminals. That the data is strongly encrypted may be insufficient to alleviate this fear. Acceptability therefore also depends on public trust in the data handling and security procedures used by the police institutionally, not just on the technology provided by our approach. Most security professionals trust a security process more than they trust encryption. One possible solution is to involve a semi-trusted third party, resulting in a modified process:
1. All communication between chooser and sender goes through the proxy.
2. Chooser provides the proxy with the identifiers of the interesting records encrypted by sender, $E_S(h(v))$. This is done over a secure channel, or with the use of a three-pass protocol once the parties are authenticated.
3. At the stage where data is transferred from sender in Step C5, the proxy filters the response and discards the records that were not specified by chooser's request, i.e. the records other than the ones identified in Step 2.
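The proxy's filtering step amounts to a set-membership test on the first entry of each pair sent by the sender. A minimal sketch, with a hypothetical function name and small integers standing in for the encrypted identifiers $E_S(h(v))$; note that the proxy never needs any decryption key to do this.

```python
def proxy_filter(requested_tags, sender_pairs):
    """Forward only the pairs <E_S(h(v)), c(v)> whose first entry is one of the
    encrypted identifiers E_S(h(v)) supplied by the chooser in step 2.

    requested_tags: iterable of encrypted identifiers (ints in this sketch)
    sender_pairs:   list of (tag, ciphertext) pairs from the sender
    """
    wanted = set(requested_tags)
    return [(tag, ct) for tag, ct in sender_pairs if tag in wanted]


pairs = [(101, b"foil-a"), (202, b"suspect"), (303, b"foil-b")]
print(proxy_filter([202], pairs))  # [(202, b'suspect')]
```

The diluting records never leave the proxy, so the police receive only the ciphertexts they can actually open.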
The semi-trusted party should have no interest in finding out the object of the investigation or the content of the data records returned by the data holder. The party chosen must not cooperate with the sender, or the protocol will be broken.
A key requirement is therefore that the proxy has no incentive to find out the details of the investigation; thus it will not invest in expensive decryption efforts, nor will it cooperate with the sender in order to establish the identity of the suspect. On the other hand, if the need arises to verify the chooser's requests in a court of law, the proxy and the sender can work together to establish the identities of the records requested by the chooser, or to verify that the data request by the police conformed to the warrant that was granted. This introduces an additional “price” for the police: in return for more secrecy vis-à-vis the data controller (the online retailer or bank) and a more efficient search, they are also subject to more scrutiny and transparency. Data requests are now lodged with a third party, which can also check whether the formulation of the search query was law-compliant.
Nonetheless, from an (EU) legal perspective, even encrypted data is still personal data under the Data Protection Directive. This means that one of the six legally valid grounds for processing the data must hold. The most obvious is the consent of the data subjects, which we will discuss below. Another basis could be a legal duty created through statute.³ Currently, no duty for citizens to participate in schemes like the one proposed exists. However, the arguments that we developed in the first part would at least permit legislators to create such a duty. Even though it would impose a (minimal) privacy risk on the “foils”, since this is required to reduce the much greater privacy risk for the suspect, such a prima facie infringement would arguably be proportionate, efficient and necessary. Secondly, again using the idea that privacy is a common good that is fundamental for a free, democratic order, it may even be possible to permit such an approach in the absence of new legal duties. Art 7(e) of the directive creates a blanket exception if the processing of the data is “necessary for the performance of a task carried out in the public interest”. This in turn might make it unnecessary to require consent from those customers whose data is used merely to hide the identity of the suspect. Just as our privacy can be violated as part of a criminal investigation to further the public good of efficient law enforcement, so one could argue that we are also required to shoulder a purely theoretical privacy risk to maintain the foundations of a free society. Similar arguments have been made in the past regarding medical research data and “benefit sharing”: as long as I benefit in the long run from medical research, solidarity requires that I take a marginal privacy risk in making some of my data available for research in an anonymised, encrypted format (Wicks et al. 2010; Laurie and Sethi 2013). We have a similar benefit sharing here: everybody can become the subject of a police investigation, so in the long run I share the benefits of a system that pools all our records and randomly selects a few of them each time the modern equivalent of a Roman general asks: Which one of you is Spartacus?
Art 7(c) and 7(e) would result in slightly different legal regimes, and therefore also in slightly different implementations of the approach. In neither case is the consent of the “foils” necessary. However, if governments were to decide to impose a new duty under Art 7(c), the approach proposed in this paper, or a functionally equivalent solution, would become legally mandatory and would therefore be used by all online organisations that store customer data. Art 7(e), by contrast, simply permits online retailers to implement this solution. In this case we can therefore expect much less widespread uptake, with market forces ultimately deciding on its acceptance.
However, the absence of case law makes it difficult to assess whether this argument, which relies on the jurisprudential analysis of privacy in the first part of this paper, would withstand scrutiny by the courts. A legally safer option is therefore to ask customers for a generic consent: “are you willing to put your data in a pool if and when there are police inquiries in the future?” This anticipatory consent prevents time delays during investigations. Whether a sufficient number of customers would be willing to subscribe to such a scheme requires further empirical research, which should also address the question of how adequate incentives could be designed. A conceptual problem, however, arises for our approach. We noted above the possible conflict between a conception of privacy as a public good and the notion of consent as the ultimate “trump” that allows individuals to opt out of their otherwise guaranteed protection. Our reliance on consent in turn seems to undermine the very basis of our approach, not least because of the increasing recognition of the cognitive limits of consent, which turn it into barely more than a legal fiction (Solove 2013). This is particularly true if solidarity alone is not sufficient to incentivise customers to participate in the scheme and other incentives need to be found. Nor is it certain that reciprocity could be required, so that only those who “donate” their data would be protected under the scheme if they themselves should come under the spotlight. If, in this case, the police makes an inquiry regarding someone who is not a participant in the scheme, his data would be treated with less concern for privacy than is possible in principle. This could put the data controller, that is the company, in violation of data protection law. Otherwise privacy would again be treated as alienable property, to be assigned away provided consent is given. Despite all this, consent seems at present the legally least problematic approach.

Conclusion and further work
Our investigation started with a common privacy problem in online investigations: in order to obtain data about a suspect, the police must disclose to the data controller (a bank, an ISP, etc.) the identity of the “person of interest”. This poses a privacy and reputation risk to the suspect: people often assume that “where there is smoke, there is fire”, and even being the subject of a police investigation carries substantial reputation risks; holders of public office, for example, frequently resign even at such an early stage of a criminal investigation. It also poses a risk to the police investigation and its integrity, as it can warn off suspects and increase their flight risk. A combination of technical and legal factors prevents the use, in the online environment, of the risk-minimising strategies used offline. As long as we pit state interests (here, the police) against those of the citizen, the problem is difficult to address. In our setting, though, these interests converge. By looking at new and emerging conceptions of privacy that understand it less as an individual right than as a communal good that enables important social institutions in a democratic society, we were able to overcome this gridlock and suggest a combination of technical, attitudinal and legal measures.
Because this conception of privacy differs from the traditional jurisprudential conceptualisation, it raises several questions about the legal evaluation of our proposal. We discussed possible legal foundations that allow the necessary data transfer, concluding that a significant degree of public acceptance will be crucial. The success of our proposal will therefore ultimately depend on empirical, social factors regarding risk assessment, solidarity and communal loyalty. Further research should in particular look at social attitudes to “privacy risk sharing”, and how, if at all, they differ between online communities. We should expect uptake to be highest in environments where mutual solidarity and a feeling of belonging are strongest, for instance voluntary internet-based associations such as the community of Wikipedia editors, and lowest where the “community” is one of mere convenience, such as the “community of Amazon customers”.
As the initial problem was caused by a combination of traditional legal concepts and their lack of “fit” with modern online environments, our solution too employed a combination of legal and technological approaches. Further research is therefore needed on legal, technological and organisational aspects alike. From a technological perspective, we will further explore our idea that, for specific queries, different ratios between “camouflage” and “real” data are better than a “one size fits all” approach. This involves further study of the balance between the number of foils, the sensitivity of the data and the resulting risks. The challenge here is also to balance protection from risk against communication complexity in ways that are both legally and technologically sound. Exploring different ways to balance communication complexity, key sizes and the ratio between interesting and extra data sent to the investigators should result in a number of typical risk profiles, which can lead to a partly automated choice of protocols.
A different task will be to extend our approach beyond the simple model of a one-off query of the type typically encountered in police investigations. Were the police to make several queries about the same suspect to the same data controller in a short period of time, the controller might be able to triangulate the identity of the suspect after all. This would still require much more effort than controllers have to invest at present, but would at least be theoretically possible. Multiple queries of this type are rare, for police operational reasons (and also because of legal constraints); much more common, however, are requests for the long-term surveillance of an account in situations where the goal is the prevention of future crimes rather than the investigation of a past crime. A natural extension of our idea would therefore be the study of long-term, real-time surveillance operations, which would inevitably demand much more from the “foils”. Either way, our approach of thinking of PETs as communal tasks should make a valuable contribution to the range of PET tools available. In the past, these tools reflected the libertarian, individualistic concept of privacy law, equipping individuals with protective tools that “build walls around them” within which they can keep their data safe. By contrast, our approach is a tool for the emerging understanding of privacy as a public good, where the protection of anonymity becomes a communal task, where we are strong only when united.
…, a ciphertext $c = e_b(e_a(m))$ (c ciphertext, m plaintext, e encryption operation under keys a and b) could be decrypted either as $m = d_b(d_a(c))$ or as $m = d_a(d_b(c))$. The advantages of such cryptosystems were widely promoted by …
$$e_a(e_b(m)) = e_b(e_a(m)) \tag{1}$$
$$e_a(e_b(m)) \neq e_b(e_a(m)) \tag{2}$$
Alice's input: secret message m; encryption key $A_E$; decryption key $A_D$. Bob's input: encryption key $B_E$; decryption key $B_D$.
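Property (1) is what allows Alice and Bob, with the inputs listed above, to run Shamir's three-pass exchange: each party can remove its own lock regardless of the order in which the locks were applied. A minimal sketch with PH/SRA exponentiation ciphers, assuming the demo prime `2**127 - 1` instead of a 1,024-bit strong prime; `key_pair` and `three_pass` are hypothetical names.

```python
import math
import secrets

# Public prime shared by Alice and Bob (demo-sized assumption).
P = 2**127 - 1


def key_pair() -> tuple[int, int]:
    """Encryption exponent e and decryption exponent d with e*d = 1 mod p-1."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)


def three_pass(m: int) -> int:
    """Deliver m (1 <= m < p) from Alice to Bob without sharing any key."""
    a_e, a_d = key_pair()          # Alice's keys A_E, A_D
    b_e, b_d = key_pair()          # Bob's keys B_E, B_D
    c1 = pow(m, a_e, P)            # pass 1, Alice -> Bob: A_E(m)
    c2 = pow(c1, b_e, P)           # pass 2, Bob -> Alice: B_E(A_E(m))
    c3 = pow(c2, a_d, P)           # pass 3, Alice -> Bob: B_E(m), by property (1)
    return pow(c3, b_d, P)         # Bob applies B_D and recovers m
```

A cipher satisfying only (2), such as one where keys do not commute, would leave Alice unable to strip her lock in the third pass.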

Fig. 2 Private equality test.This protocol allows two parties to compare their secret inputs

5. Chooser sends to sender the set $Y_C$, reordered lexicographically.
6. Sender encrypts with key $E_S$ each entry $y \in Y_C$ received from chooser.
7. Sender returns the set of pairs $\langle y, E_S(y)\rangle$ to chooser.
8. Chooser decrypts each entry in $E_S(Y_C)$, obtaining $E_S(X_C) = D_C(E_S(E_C(X_C))) = D_C(E_S(Y_C))$.
9. Chooser compares each entry in $E_S(X_C)$ to the entries of $Y_S$ constructed in Step A3 (Step 3 of Phase A) and received by chooser in Step B1. In this way the interesting records can be identified.
Phase C: OT
1. After identifying the interesting records in $Y_S$, chooser selects at random $o - 1$ other unique records from $Y_S$ for each interesting record in $V_C$. These are the diluting records, which together with the records of interest form a shortlist for the enquiry. If the number of interesting records multiplied by o is greater than n, the size of the dataset $V_S$, then the complete $Y_S$ is shortlisted.
2. Chooser sends the shortlist to sender.
3. Sender picks a PH encryption key $E_S'$ at random from the group $\mathbb{Z}_p^*$.
4. Sender identifies the entries $h(v)$ from $X_S$ that have been shortlisted and processes each shortlisted record as follows:
(a) Encrypts $h(v)$ with $E_S'$ to form the key used to lock the extra information about v, i.e. ext(v): $j(v) = E_S'(h(v))$.
(b) Encrypts the extra information using a symmetric encryption function K and the key $j(v)$ crafted in the previous step: $c(v) = K(j(v), \mathrm{ext}(v))$.
(c) Forms a pair $\langle E_S(h(v)), c(v)\rangle$.
5. The pairs formed in Step C4(c), each containing a private match element and the encrypted extra information about record v, are then transferred to chooser.
6. Sender encrypts each entry $y \in Y_C$, received from chooser in Step B5, with key $E_S'$ to form the set of pairs $\langle y, E_S'(y)\rangle$.
7. The pairs $\langle y, E_S'(y)\rangle$ are then transferred to chooser.
8. Chooser removes the encryption $E_C$ from all entries in the 2-tuples received in Step C7.

Chooser's input: set $V_C$ containing the IDs of the interesting records. Sender's input: set $V_S$ containing the IDs of the records in the dataset, together with extra information about these records, ext(m). Output: chooser learns $|V_S|$ (the size of the set $V_S$), $V_C \cap V_S$ and ext(m) for $m \in V_C \cap V_S$, while sender learns $|V_C|$. Proxy learns only the sizes of the sets.

Fig. 3 IDAP process flow. Graphical representation of the improved IDAP

Chooser sends to sender the set $Y_C$, reordered lexicographically. 4. Sender encrypts each entry $y \in Y_C$, received from chooser, with both $E_S$ and $E_S'$, and for each returns the 3-tuple $\langle y, E_S(y), E_S'(y)\rangle$. 5. For each $h(v) \in X_S$, sender does the following: (a) Encrypts $h(v)$ with $E_S$ for use in the equality test.

Table 1
Computational complexity of the PE protocol

Table 2
Computational complexity of improvement 1