The Case for Book Privacy Parity: Google Books and the Shift from Offline to Online Reading

Cindy Cohn and Kathryn Hashimoto*

On February 18, 2010, Judge Denny Chin of the U.S. District Court for the Southern District of New York[1] presided over the Google Books settlement fairness hearing.[2] Under consideration was the proposed class action settlement agreement between defendant Google and the plaintiff authors and publishers who had challenged the company’s decision to scan millions of books from leading research libraries.[3] At stake was Google’s ambitious—some say audacious[4]—plan to create the world’s largest combined digital bookstore and library.[5] Judge Chin heard testimony on a wide range of issues from more than two dozen speakers, both for and against the settlement agreement.

Along with other speakers, privacy advocates and librarians called attention to the absence of privacy provisions in the agreement, a noteworthy omission in light of the unprecedented amount of reader information that Google will amass under its Book Search services.[6] One of the privacy advocates at the fairness hearing was co-author Cindy Cohn of the Electronic Frontier Foundation (EFF), representing a group of prominent authors and publishers including Michael Chabon, Jonathan Lethem, and Cory Doctorow.[7] The group also included authors and publishers of books discussing sensitive or controversial subjects, such as illegal activity, drugs, and sexual behavior, who noted that their readers were especially sensitive to being tracked.

The Google Books services will track readers extensively. They will collect and store the following information as a reader searches and browses:[8]

  • terms the reader uses to search for books;
  • titles and descriptions of books the reader searches for but never reads;
  • titles and descriptions of books the reader finds and browses;
  • pages the reader reviewed and the time spent on each page;
  • titles of books for which access is purchased.[9]

All of that information will be tied to a variety of other information collected while the reader is signed in to a Google account.

Even more significantly, Google Books’ granular tracking of readers will continue long after purchase. In fact, for as long as a purchaser continues to have access to the book, Google Books will continue to track which specific pages are read and reread, how long the reader spends on each page, and any annotations the reader makes. As EFF observed at the fairness hearing, no library or bookstore could gather as much information about readers as Google Books will, short of hiring someone to follow readers through the stacks and then into their homes.[10]

Google Books will not merely collect and store the information, however. As described in the discussion below, Google’s central business model is to use information about user behavior to target advertising. For users signed in to Google accounts, the information collected through Google Books will be used by Google to target advertising.[11] As the privacy advocates and librarians pointed out both in their briefs and at the hearing, the proposed settlement agreement did not specify any limitations on collection and use of reader information or on the retention, modification, deletion, or disclosure of reader information to third parties or the government.[12]

Google has done little to address concerns about its collection, storage, and use of reader information. Prior to the fairness hearing, the settling parties revised the agreement in response to various objections, but they did not incorporate any reader privacy provisions.[13] Just before the deadline for written objections to the settlement, Google released a privacy policy for Google Books wholly separate from the settlement agreement.[14] The policy points to key provisions in Google’s main privacy policy, then provides specific privacy practices pertaining to current services offered by Google Books and the prospective services described in the settlement agreement.[15] While the Google Books privacy policy addresses some of the privacy concerns raised,[16] the policy does not fully articulate strict standards for usage, retention, and disclosure of reader information. In addition, because the privacy policy is not a part of the settlement agreement, it is not enforceable by a supervising court and can be changed at any time, creating further uncertainty and apprehension.[17]

Approval of the class action settlement requires the court to determine both whether the agreement is fair, reasonable, and adequate to all class members and whether the settlement is in the public interest.[18] The privacy advocates uniformly urged the court to consider the privacy interests of the reading public as part of the public interest evaluation.[19] The objecting privacy authors and publishers also noted that their own expressive and financial interests were adversely affected by the lack of privacy protections in the settlement, as the tracking and potential disclosures would create a chilling effect on readers, especially those consuming books that address sensitive or controversial topics.[20] Overall, the privacy advocates argued that the court should require Google to create strong and enforceable privacy provisions as a condition of approval of the settlement, as such provisions were in the interest of authors, readers, libraries, and the book industry.[21]

At the time of this writing, it is not known whether the proposed settlement will be approved by the district court or how privacy issues will be affected by the outcome. On the brink of the court’s ruling, potentially a major development in internet access to information, this is an apt time to examine online reader privacy in the context of Google Books. Such an examination may also assist in deriving general privacy principles to apply in the context of later online book reading technologies. While many non-book sources of information have already been moved online with little formal discussion of, or protection for, reader privacy, books are one area where strong reader privacy protections have traditionally been embraced and enforced. Thus, the shift to books online raises the question: Should we maintain the strong privacy protections that books have enjoyed offline, or should we downgrade the privacy of online books to the relatively low level provided for most other online activities?

This Article argues for book reading privacy parity at a minimum. The norms and laws for protecting book reader privacy offline can and should carry over into the online world. As public discourse and the marketplace of ideas move further online, public discussion continues to require private space for research, thought, and expression. We should not allow a technological shift to undermine or eliminate that private space.

The Article begins by providing a brief survey of traditional reader privacy protections and the key rationales for them. It then describes the new privacy challenges presented by books online and considers whether there is any basis for reducing the privacy protections for books as they move to an online format. Finally, it outlines specific privacy demands made by EFF and the Center for Democracy & Technology in the Google Books case. Those demands can serve as a helpful starting place for consideration of privacy in other online book offerings, as well as raise questions about whether we should similarly reconsider the relatively low level of privacy protection provided for non-book reading online.

I. The Tradition of Protecting Readers

Historically, government and social institutions have established safeguards that protect an individual’s right to select and peruse printed material free of surveillance and prolonged recordkeeping. These safety provisions have proved necessary against persistent threats to privacy. Reading habits of individuals provide valuable information that law enforcement and the government have actively sought. During the McCarthy hearings, for example, people were questioned about whether they had read Marx and Lenin and whether their spouses or associates had such books on their shelves.[22] Indeed, “[i]n the 1950s, people with leftist books sometimes shelved them spine to the wall, out of fear that visitors would see and report them.”[23] Such threats are not merely a remnant of the McCarthy era. In 2007, Amazon moved to quash a federal government subpoena initially seeking the identities of 24,000 book purchasers.[24] Between 2001 and 2005, libraries were contacted by law enforcement seeking information on patrons at least 200 times.[25]

The right to read free from fear of being watched, reported on, or retaliated against is a necessary element of the robust public “marketplace of ideas” envisioned by the First Amendment. If sufficient safeguards to protect people’s reading and browsing habits are not in place, then the result is a society where such activities are subdued and altered. Such chilling effects severely inhibit expression and public debate. As Justice Douglas observed, “Once the government can demand of a publisher the names of the purchasers of his publications . . . [f]ear of criticism goes with every person into the bookstall . . . [and] inquiry will be discouraged.”[26]

Recognizing the existence and harms of the chilling effect on society, the Supreme Court and lower courts have protected reading privacy from government disclosure. In United States v. Rumely, the Court held that a bookseller could not be convicted for refusing to provide a list of purchasers of a political book.[27] In Lamont v. Postmaster General of United States, the Court struck down a federal statute that required individuals wishing to receive materials the government had labeled as “communist political propaganda” to return to the Post Office a signed notice stating they wanted to receive such materials.[28] The Court noted that “[p]ublic officials like schoolteachers who have no tenure might think they would invite disaster if they read what the Federal Government says contains the seeds of treason.”[29]

Reading also implicates the rights of association and expressive conduct.[30] The Supreme Court has expressly acknowledged that “[t]he right of freedom of speech and press includes not only the right to utter or to print, but the right to distribute, the right to receive, the right to read.”[31] As a result, U.S. courts have continually shielded readers from government intrusion and scrutiny since information about their reading habits could reveal their associations.

The right to engage in private reading is especially fundamental where, as with the Google Books services, the reading will often occur inside the home. As the Supreme Court observed in Stanley v. Georgia, justifications for criminalizing even unprotected material like obscenity do not “reach into the privacy of one’s own home.”[32] It continued, “[I]f the First Amendment means anything, it means that a State has no business telling a man, sitting alone in his own house, what books he may read or what films he may watch.”[33] This reasoning suggests an additional Fourth Amendment interest against unreasonable searches and seizures based on a reasonable expectation of reader privacy in the home.[34]

A related basis for protecting reader privacy is the issue of autonomy and control over data collected about oneself. This is part of a personal interest in not being harmed or exploited by the unintended use of data. The disclosure of a person’s reading habits can lead to snap, and possibly misleading, judgments about a person’s character. In The Unwanted Gaze, for instance, Jeffrey Rosen describes a now infamous episode of Kenneth Starr’s investigation into Monica Lewinsky’s life.[35] Starr attempted to subpoena Washington bookstore Kramerbooks for records of Lewinsky’s book purchases, including her purchase of Nicholson Baker’s Vox, a book discussing phone sex, to give to President Clinton. This revelation, especially if taken out of context, obviously created a risk of judgment about the activities of both Ms. Lewinsky and the President. Privacy includes the ability to prevent actions from being judged and from being judged out of context, and so implicates control and autonomy as much as it implicates intellectual curiosity.

Recognizing the speech and privacy interests involved in the disclosure of a person’s reading activities, federal and state courts have insisted that the government meet high standards when it attempts to obtain reader records.[36] Forty-eight states protect public library reading records by statute, and the attorneys general of the other two states have issued opinions in support of protecting readers’ library records.[37] Thus, the courts and legislatures are united in recognizing the importance of a citizenry that can freely avail itself of books and the information they contain without fear of monitoring.

Alongside legal protections are the principles of reader privacy that have been fostered by institutions such as libraries and bookstores.[38] Libraries in particular have developed policies and ethical codes specifically protecting reader privacy and autonomy. The American Library Association, as part of its core mission, has taken on a leadership role in the vigilant protection of reader privacy.[39] Its Library Code of Ethics, first adopted in 1938, consists of eight statements, including: “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”[40] Libraries and brick-and-mortar bookstores have also demanded and fought for a high legal standard, such as a warrant or court order, before turning over reader information to law enforcement or to private parties in litigation.[41] For instance, the Tattered Cover bookstore in Denver refused to turn over book purchase information about a patron accused of drug dealing based on a subpoena, leading to a decision by the Colorado Supreme Court requiring a more protective “warrant plus” standard for such requests in the future.[42] In this way, institutions have played an important leading role in both advancing and enforcing social norms protecting intellectual exploration, including reading.[43]

II. Books Online Present New Privacy Challenges

Against this backdrop of reader protections offline, it is clear that online reader privacy protections are at least as important, if not more so. Both the scope of collection and the actual and possible uses of reader information are much greater online than offline. As described above, online book providers can collect and keep records of reader activities that include much more information than offline providers can gather. They can also marshal that information with a level of granularity far beyond the capabilities of physical libraries or bookstores.[44] A physical bookstore or library will likely only ever have a record of the books a reader checks out or purchases, so laws and policies have only needed to address the retention period for this information and whether and on what legal showing it is made available to others.

In contrast, Google Books is a hosted internet service that logs all user activities. Its granular tracking begins long before purchase, capturing the process of searching and browsing and the intellectual inquiry that leads to selecting a book, even when no book is selected. Google’s tracking also continues long after book selection and purchase of access.

In addition, the actual and potential uses made of the collected information are significantly greater and potentially more intrusive online than offline. For instance, Google offers many services besides Google Books, and its business model aggregates information about users gathered from each service into an overall profile that is then used to contour advertising.[45] This includes collection not only across Google’s own products, such as Gmail, Google Chat, and Search, but also, using cookies and other technologies, the gathering of information about a specific user across all of the websites that use Google’s DoubleClick/AdSense advertising network.[46] Given the size of this network of websites, it is no exaggeration to say that many internet users can spend their whole day online and never leave a site where a Google product is tracking them.[47] Currently, Google Books includes information gathered about logged-in users as part of this large engine for placing behavior-based advertising on the websites viewed by readers.[48]

In theory, the more limited amount of information gathered by offline bookstores and libraries could be used as part of a data aggregation scheme. However, there is no evidence that this is actually occurring and, as discussed above, such a practice would be contrary to the ethical standards promulgated by the American Library Association and other organizations.[49]

III. Book Readers Online Deserve the Same Privacy As Book Readers Offline

As demonstrated above, offline readers have enjoyed strong legal protections and institutional norms, supported by the inability of bookstores and libraries to do much tracking. The underlying justifications for privacy protections remain even as books move online and new technologies make tracking much easier. The marketplace of ideas is still a central animating principle for freedom of expression. Private space for inquiry and discovery is still a precursor to free ideas and beliefs that populate the marketplace, and the chilling effect from having those inquiries monitored and tracked is just as powerful when reading is done online.

This need for private space and control over information about reading activities is actually augmented by internet technologies. As Justice Stevens noted, the internet is “a vast platform from which to address and hear from a worldwide audience of millions of readers, viewers, researchers, and buyers.”[50] The ready availability of so much information online creates a bigger marketplace for ideas, but it should also buttress the argument for carving out private space for inquiry and discovery. For instance, the internet has been a tremendous way for individuals to learn about the medical conditions that affect them and their loved ones, or to inquire into religions or philosophies that they may never have known about, much less been able to study. Yet without sufficient privacy protections, those same inquiries that have newly been enabled by the internet may be stunted due to fear of tracking.

In a case involving Amazon, the judiciary acknowledged the chilling effect that might occur if the government were to seek reading records from online booksellers. Rejecting a government subpoena for the reading records of 120 customers, a court noted that:

“[I]f word were to spread over the Net—and it would—that the FBI and the IRS had demanded and received Amazon’s list of customers and their personal purchases, the chilling effect on expressive e-commerce would frost keyboards across America . . . well-founded or not, rumors of an Orwellian federal criminal investigation into the reading habits of Amazon’s customers could frighten countless potential customers into canceling planned online book purchases, now and perhaps forever.”[51]

Research confirms that this same concern is raised for purely online reading. A 2007 survey found that 8.4% of Muslim-Americans changed their internet usage because they believed their habits were being tracked by the government.[52] Indeed, it is clear that many individuals will not access certain types of information online if they cannot do so anonymously. As a district court observed in considering a statute requiring an age verification process for some online material: “[m]any people wish to browse and access material privately and anonymously, especially if it is sexually explicit,” and “[a]s a result of this desire to remain anonymous, many users who are not willing to access information non-anonymously will be deterred from accessing the desired information” due to the age verification requirement.[53] Thus, the need to protect readers from unwanted monitoring remains vital even as reading materials migrate from offline to online environments.

Some may argue that, despite the equal strength of rationales for privacy online, current online behavior in areas other than books indicates that people simply do not want to continue the traditional offline privacy protections for reading online. Put in economic terms, the argument is that users as rational online actors have chosen to trade their privacy for the convenience and often free services offered by various online entities, and that they will continue to do so when reading books.[54]

The argument relies on the observation that much online activity involves reading, even if not reading from books. The same granular information on user activity to be gathered by Google Books is already collected regularly by search engines, host sites, browsers, and others. Sometimes this information is gathered simply as a side effect of the operation of the underlying technologies. Often, however, information about non-book readers online is put to use as part of the business model of the provider. The uses include not only what Fair Information Practice Principles[55] call primary uses (i.e., uses that are necessary to complete the transaction) but also secondary uses that benefit the provider, such as sales of customer data to third parties and behavior-based advertising models. Thought of in this light, the era of digital books presents us with a major choice: to treat books online like books offline, or to treat books online like other, less protected online activities that also involve reading.[56]

The argument that privacy protections for books online should be low because people tolerate low privacy norms for online non-book reading has three key problems. First, the central constitutional reason for protecting book privacy against government intrusion is the First Amendment. The First Amendment’s concerns about protecting the marketplace of ideas and preventing chilling effects exist, however, regardless of the beliefs or privacy preferences of the majority or whether there is a “reasonable expectation of privacy” as described in Fourth Amendment doctrine. As a result, arguments based upon reader expectations or majority views are simply irrelevant to the First Amendment implications of government intrusions into online reading behaviors.[57]

Second, even outside of government intrusions, the proposition that people care less about their privacy online than offline appears to be untrue. Recent survey research conducted by researchers at the Berkeley Center for Law and Technology into the expectations and beliefs of online users with regard to targeted marketing, the key business model for Google Books, indicates that online users do care about privacy, and quite strongly.[58] The research found that 66% of Americans do not want marketers to tailor advertisements to their interests and 63% believe that advertisers should be required by law to immediately delete information about their internet activity.[59] The researchers conclude: “[i]t is hard to escape the conclusion that our survey is tapping into a deep concern by Americans that marketers’ tailoring of ads for them and various forms of tracking that informs those personalizations are wrong.”[60]

The survey indicates that what may appear to be a lack of concern for privacy is instead a lack of understanding of company privacy policies combined with faulty assumptions about the strength of privacy laws and guarantees. These have led to an incorrect presumption by many people that their privacy online is more protected than it actually is.[61] This misunderstanding is more prevalent among younger users, which may help explain the oft-repeated assertions that young people are indifferent to their privacy.[62] Regardless, the assertion that individuals online are making informed or rational decisions to reduce the amount of privacy they have in their online reading appears to be faulty.[63]

Third, while some research is beginning to show shifts in views on privacy by informed internet users, those shifts do not undermine the case for reader privacy protections for books online. Instead, they point to a desire for more user control over information about their online activities.[64] For instance, in the context of social networking sites like Facebook, leading researcher danah boyd has characterized younger internet users as viewing privacy issues as a right to control privacy settings in order to share information in certain contexts.[65] Ms. boyd found that even young users who are deciding to share more information about themselves in certain online contexts seek to retain control of that sharing, contradicting the claim that users do not care when vendors unilaterally reduce online privacy protections. This observation is supported by the sharp user outcry after first Facebook, then later Google for its Buzz service, attempted unilateral reductions in privacy protections users had come to expect.[66] Accordingly, the actual data about reader expectations and knowledge provides no support for the argument that online readers do not care about privacy.

IV. Minimum Protections for Online Book Reader Privacy

Given that the rationales for offline reader privacy continue in the online world, and that online products and services are actually more privacy-invasive than their offline counterparts, what specific privacy protections should we expect for books online? In the context of Google Books, EFF and the Center for Democracy & Technology (CDT)[67] frame the issue around four principles:[68]

  • Limited Tracking of User Information;
  • Adequate Protection Against Disclosure;
  • User Control over Personal Information;
  • Sufficient Transparency in Data Use and Enforceability of Commitments.

For each principle, the goal is to try to match, as closely as possible, the elements that create and enforce privacy in the offline world of libraries and bookstores.

These are not the only things one could do to protect book reader privacy online, however. The list represents a floor, not a ceiling, and is limited to the specific Google Books services and business models. It also represents a calculation about what the federal judge is likely to feel comfortable requiring in the context of the Google Books settlement process. Nonetheless, in fleshing out a specific set of demands for a specific technology, the list provides a guide for the sorts of questions that should be asked of any online book reading technology, as well as of other technologies that provide readers access to information online.

Before diving into the specifics, it is important to note that one of the challenges of the current online environment is that this environment is generally structured to track users in the first instance. For example, a basic web server will collect the IP address of all visitors to a web page in the course of delivering content to the visitor.[69] While alternatives exist, including technologies such as the Tor network that allow a visitor to hide his actual IP address from the website host, those alternatives currently require both more work on the part of the visitor and the participation of the website or service visited.[70] As a result, and again in the specific context of the Google Books settlement, the focus of the recommendations is mitigating the potential for misuse of the information—ensuring that gathered information is retained for a minimum period of time, that there are strict limits on disclosure to third parties including the government, and that readers receive real notice and control.

Those issues currently addressed in Google’s recently published privacy policy or that Google promises to address in the future are marked below with an asterisk (*). However, even where Google has addressed an issue, the promise is not enforceable by readers and may unilaterally be changed by Google at any time. Thus, even these issues require judicial assistance or further action by Google.

1. Limited Tracking of User Information

Just as readers may anonymously browse books in a library or bookstore, readers should be able to search, browse, and preview books on Google Books without being forced to identify themselves. Thus, EFF and CDT demanded that Google:

  • Ensure that searching and browsing of books does not require user registration or the affirmative disclosure of any personal information*;
  • Require specific, opt-in consent before connecting any information collected from an individual reader with any other information the digital provider may know about the same individual from other sources. This is especially important for book providers, such as Google, that have multiple services collecting information about users*;
  • Purge all logging or other information related to individual uses as soon as practicable, which in most instances should be at least once every 30 days, ensuring that this information cannot be used to connect particular books viewed to particular computers or users except where that information is necessary to continue to provide user access to the book;
  • Where possible, permit the use of anonymity providers, such as Tor, proxy servers, and anonymous VPN providers, when interacting with the service.

Similarly, in order to protect the privacy of users of institutional subscriptions to Google Books, at a minimum, Google must:

  • Collect no information about the browsers or computers of Google Books institutional users other than encrypted or anonymous session identifying information from the institution*;
  • Ensure that information about an individual’s use of an institutional subscription is not connected to the same individual’s use information from other Google services.

2. Adequate Protection Against Disclosure

Readers should be able to read and purchase books on Google Books without worrying that the government or a third party may be reading over their shoulder. To ensure that any information linking Google Books users to the books they view or purchase is not freely disclosed to the government or third parties, EFF and CDT demanded that Google:

  • Commit that it will not disclose information about the reader to government entities or others absent a warrant or court order unless required to do so by law;
  • Notify the reader prior to complying with any government or third-party request for information (unless forbidden to do so by law or court order), and provide the reader with sufficient time to seek court review of the request;
  • Guarantee that it will not tell any third party, including the Book Rights Registry or any entity assisting with billing or another portion of the transaction, which books were purchased.

3. User Control over Personal Information

Readers should have complete control over information Google stores about their book previews and purchases. Google should:

  • Allow the reader to delete their books and ensure that this deletion removes any record of the purchase;
  • Allow the reader the digital equivalent of hiding a book under their bed. Specifically, allow a reader to control what other local or remote computer users can see about their reading, possibly through the use of separate password-protected “bookshelves” or other technical means;
  • Allow the reader to hide their reading of a book after purchase, by breaking the financial trail connecting them to a particular book. One way to do this is to establish a method to allow private reading of purchased books and private giving of books, such as allowing the reader to anonymously transfer or “gift” purchases to someone else (including transfer to other accounts controlled by the reader), with no record of the fact of the original purchase.

4. Sufficient Transparency in Data Use and Enforceability of Commitments

Readers should be informed of what happens to any data collected about them or any marking technology used by Google to track their usage of books. They should also be able to ensure that the commitments made both in privacy policies and by law will be kept. Google should:

  • Provide a robust, easy-to-read notice of privacy provisions and policies;
  • Ensure that any commitments it makes to protect reader privacy are legally enforceable by readers;
  • Store all reader information exclusively in countries that have strong privacy protecting laws, especially as against demands for disclosure by law enforcement and private third parties;
  • Ensure that any watermarks or other marking technologies used do not contain identifying information about readers in a format that third parties can read or decipher. Any watermarks with personally identifying information about readers should be disclosed to readers to alert them to the existence of such marks and the type of information they include;
  • Annually, publish online, in a conspicuous and easily accessible area of its website, the type and number of information requests it receives from government entities or third parties.


With an appeal of the district court decision about the Google Books settlement all but certain, the lawsuit likely has a long way to go before it is completely resolved.[71] But the future of books and online reading is already taking shape, and it is already delivering exciting opportunities for expanded public access to books of all kinds. It is at this threshold moment, when technologies are being designed and built, that privacy protections must be considered. That is in part why EFF, the other privacy organizations, and librarians have insisted on raising book privacy at this stage of the Google Books development.

This threshold moment, in which so many books are being brought online, also presents us with a crucial point in online reading privacy. Are we going to insist on parity in reading privacy for books online and offline, or are we going to allow a downgrade in the privacy of books to match the generally low level of privacy protections that exist on the rest of the internet? Since the reasons for protecting reading privacy have not changed even as the initial online book technologies have presented heightened privacy concerns, we believe privacy parity should be a minimum first step. And once we have made that decision, we must confront a more difficult one—isn’t it time to increase the level of privacy protection for all reading online to the level that readers offline have long enjoyed?

* Cindy Cohn is the Legal Director of the Electronic Frontier Foundation (EFF) and represented the EFF and the Privacy Authors and Publishers at the Google Books fairness hearing. Kathryn Hashimoto is an intern at EFF and a J.D. candidate, University of San Francisco School of Law, 2010. Portions of this Article previously appeared in various places, including the Privacy Authors and Publishers’ Objection to Proposed Settlement; Digital Books and Your Rights: A Checklist for Readers; and in EFF Deeplinks blog posts, and are cited accordingly throughout this paper.

