
Columbia Law School

Public Law & Legal Theory Working Paper Group

Paper Number 08-176

WHEN THE CACHED LINK IS THE WEAKEST LINK: SEARCH ENGINE CACHES UNDER THE DIGITAL MILLENNIUM COPYRIGHT ACT

BY: MIQUEL PEGUERA, VISITING SCHOLAR, COLUMBIA LAW SCHOOL

PROFESSOR OF COMMERCIAL LAW AND INTERNET LAW, UNIVERSITAT OBERTA DE CATALUNYA

This paper can be downloaded without charge from the Social Science Research Network electronic library at:

http://ssrn.com/abstract=1135274


Forthcoming (2009) JOURNAL OF THE COPYRIGHT SOCIETY OF THE U.S.A.

WHEN THE CACHED LINK IS THE WEAKEST LINK: SEARCH ENGINE CACHES UNDER THE DIGITAL MILLENNIUM COPYRIGHT ACT

Miquel Peguera∗

Abstract

When crawling the net, search engines’ robots make a copy of each web page they visit. These copies are stored in the search engine's cache. In their search results, along with the link to the actual web page and a brief snippet from it, the main search engines provide a link to the cached copy as well. In Field v. Google the court held that the operation of Google's cache falls under the caching safe harbor of the Digital Millennium Copyright Act. Examining both the plain language of the statutory text and its legislative history, this Article shows why search engine caches are not covered by the DMCA caching safe harbor. Taking into account the Ninth Circuit analysis in Perfect 10 v. Amazon, this Article further suggests that the unavailability of a safe harbor does matter, since other defenses may fall short or involve higher litigation costs. In addition, it discusses whether an amendment of the DMCA safe harbor regime would be advisable.

I. INTRODUCTION
II. THE SEARCH ENGINES’ “CACHED” LINKS FEATURE
   A. Cache copies and “Cached” links
   B. Opting out of the “Cached” links feature
III. THE DMCA SYSTEM CACHING SAFE HARBOR
   A. The DMCA safe harbor regime
   B. The safe harbor for caching
IV. CAN A SEARCH ENGINE’S CACHE FIND SHELTER UNDER THE DMCA CACHING SAFE HARBOR?
   A. Does a search engine meet the threshold eligibility criteria for the safe harbors?
   B. Does a search engine’s cache meet the definition of system caching?
   C. Does a search engine’s cache meet the requirements set forth in § 512(b)(2)?
   D. A harbor too shallow for a search engine’s cache to anchor in.
V. DOES IT MATTER?
VI. AMENDING THE SAFE HARBOR REGIME?
VII. CONCLUSION

∗ Visiting Scholar, Columbia Law School (Fall 07–Spring 08). Professor of Commercial Law and Internet Law, Universitat Oberta de Catalunya (Barcelona, Spain). Ph.D. in Law, University of Barcelona (Spain) (2006). Special thanks to Professor Jane Ginsburg for her detailed and insightful comments on earlier drafts. Thanks also to Professors Jonathan Band, Eric Goldman, James Grimmelmann, R. Anthony Reese, and Raquel Xalabarder. Thanks also to Laura Quilter, Danny Sullivan, John Waiss, and to participants in the Visiting Scholars’ Forum at Columbia Law School.

I. INTRODUCTION

Many Google users are familiar with its “Cached” links feature. In its search results, Google—as do other search engines—normally shows a so-called “Cached” link along with the title of a particular web page and a brief snippet from it. When a “Cached” link is clicked, the user is led not to the actual web page, but to the copy or “snapshot” of that page that Google took when crawling the web, which is stored by the search engine until the next time its robot visits the page and takes a new “snapshot.”1

Blake A. Field, an attorney and member of the State Bar of Nevada, knew this feature well; and since he considered it to be a copyright infringement, he decided to test its propriety through a lawsuit.2 Thus, in January 2004 he wrote fifty-one short stories over a three-day period. He registered a copyright for each of them separately. Then he opened a website and uploaded these works, making them freely accessible to all on the Internet.3

Field was aware of the easy steps a website owner can take to prevent his or her website from being indexed by Google, and he was aware of how to specifically instruct Google not to show the so-called “Cached” links. Having knowledge of all this, Field set out to have his website and its pages automatically included in Google’s search results, so that Google would provide a “Cached” link to those pages as well. As Field expected, Google automatically indexed his website and provided “Cached” links to copies of his pages stored by Google, in addition to the main link to each of the actual pages.4

1 See Google Help Center, Google Web Search Features: “Cached” links, http://www.google.com/help/features.html#cached (last visited Mar. 5, 2008). The “Cached” links feature is also present in other main search engines, such as Yahoo!, Ask, or MSN Search.
2 See Field v. Google Inc., 412 F.Supp.2d 1106 (D.Nev. 2006); see also Plaintiff’s Answers To Requests For Admission Of Defendant Google, Inc. at 3:19-26, 4:1-2, Field, 412 F.Supp.2d 1106 (No. 57-2) (admitting that the lawsuit had been filed “in part” in an attempt to test the legal propriety of Google’s cache).
3 See Field, 412 F.Supp.2d at 1113-14. The website can be found on the Internet Archive, http://web.archive.org/web/*/http://www.blakeswritings.com (last visited Mar. 5, 2008).

Once that happened, Field filed a complaint against Google for copyright infringement on account of those “Cached” links.5 Not surprisingly, he sought not actual but statutory damages. He requested $50,000 in statutory damages for each work. Altogether, he sought $2,550,000 for the fifty-one short pieces he had written over a three-day period.6

The court immediately determined that the plaintiff was acting in bad faith:

Field’s own conduct stands in marked contrast to Google’s good faith. Field took a variety of affirmative steps to get his works included in Google’s search results, where he knew they would be displayed with “Cached” links to Google’s archival copy and he deliberately ignored the protocols that would have instructed Google not to present “Cached” links.7

Stating that “Field decided to manufacture a claim for copyright infringement against Google in the hopes of making money from Google’s standard practice,”8 the court was obviously eager to decide against the plaintiff. Indeed, the facts in Field v. Google were ideal for Google to obtain a decision asserting the legality of its “Cached” links feature—and it certainly obtained one, to such an extent that the ruling reproduced, entirely and verbatim, the text defendant Google filed as “[Proposed] Findings of Fact and Conclusions of Law & [proposed] Order”.9 Accepting all the grounds of Google’s motion for summary judgment, the court found that Google did not directly infringe the copyrighted works; that Google “held an implied license to reproduce and distribute copies of the copyrighted works at issue”;10 that the plaintiff was “estopped from asserting a copyright infringement claim against Google with respect to the works at issue”;11 and that Google’s use of the works was fair use.12 Furthermore, the court granted a cross-motion for partial summary judgment, orally made by Google at the hearing on the parties’ cross-motions, based upon the “system caching” safe harbor of the Digital Millennium Copyright Act.13 Granting this motion, the court declared “that Field’s claim for damages is precluded by operation of the ‘system cache’ safe harbor of Section 512(b) of the Digital Millennium Copyright Act (‘DMCA’).”14

Google has been providing “Cached” links for many years—and so have the other main search engines. While, at least from a theoretical point of view, this feature had always raised some copyright concerns,15 before Field v. Google it had never been challenged in court. This ruling is the first to analyze the legality of the feature under copyright law.16 Interestingly, Field v. Google is also the first decision to apply the DMCA system caching safe harbor—paradoxically, as discussed below, to cover an activity different from the one for which the safe harbor was intended.

4 For all these facts see Field, 412 F.Supp.2d at 1114.
5 He filed a first complaint on April 6, 2004, claiming copyright infringement in one of the works. On May 25, 2004, he filed an amended complaint claiming copyright infringement in the remaining fifty works. See id. at 1110.
6 Id.
7 See Field, 412 F.Supp.2d at 1123.
8 See id. at 1113.
9 See [Proposed] Findings of Fact and Conclusions of Law & [proposed] Order, Field, 412 F.Supp.2d 1106 (No. 63). At the hearing held on December 19, 2005 on the cross-motions for summary judgment, the Judge stated he was granting Google’s motion for summary judgment on the four grounds set forth in the motion, granting also Google’s oral motion for partial summary judgment based on the Digital Millennium Copyright Act, and denying Plaintiff’s motion on the same issues. He requested from Google a “complete proposed findings of fact and conclusions of law”. Google filed that proposal, stating in a cover letter to the Judge that it “attempts to capture [the Judge’s] comments at the hearing and otherwise tracks the arguments and evidence that Google submitted in connection with the motions, albeit in a somewhat streamlined fashion.” See id. The text of the actual ruling added only the date—January 12, 2006—and the Judge’s signature. See Findings of Fact and Conclusions of Law & Order, Field, 412 F.Supp.2d 1106 (No. 64).
10 See id. at 1109.
11 Id.
12 Id.
13 See id. at 1109-10.
14 See id. at 1109.
15 See Stefanie Olsen, Google cache raises copyright concerns, CNET News.com (July 9, 2003), http://www.news.com/2100-1038_3-1024234.html (last visited April 4, 2008); see also Eric Goldman, Misguided CNET Article on Canadian Copyright Law and Caching/Archiving, Technology & Marketing Law Blog (July 19, 2005), http://blog.ericgoldman.org/archives/2005/07/misguided_cnet.htm (last visited April 4, 2008).
16 Other courts—albeit not dealing with a claim of copyright infringement on account of the “Cached” links feature—have already quoted and accepted some of the holdings of Field v. Google. See Parker v. Google, Inc., 422 F.Supp.2d 492, 498 (E.D.Pa. 2006) (“[T]he District Court for the District of Nevada recently held that Google is entitled to the [DMCA]’s safe harbor provisions for its system caching activities. Field, at 1122-25 (granting Google’s motion for summary judgment that it qualifies for § 512(b) safe harbor for system caching).”), aff’d, 2007 WL 1989660 (3d Cir. 2007). See also Perfect 10 v. Google, Inc., 416 F.Supp.2d 828, 852 n.17 (C.D.Cal. 2006) (“That local browser caching is fair use is supported by a recent decision holding that Google’s own cache constitutes fair use. Field v. Google, Inc., 412 F.Supp.2d 1106 (D.Nev.2006)”).


This Article examines whether a search engine’s cache falls under the DMCA caching safe harbor. Part II describes the “Cached” links feature and its value, and provides an overview of the means available to website owners to opt out of it. Part III briefly presents the DMCA safe harbor regime and analyzes in particular the system caching safe harbor. Considering both the plain language of the statutory provision and the legislative history, it discusses which specific technical activity this safe harbor is intended to cover, and concludes that it is the function currently known as proxy caching, which aims to save bandwidth and to improve network efficiency. This function is described in Part III.B.

Part IV outlines the differences between proxy caching and a search engine’s cache, and moves on to analyze whether, despite these differences, a search engine’s cache may still be deemed to meet the plain language of the safe harbor provision. First, Part IV.A focuses on compliance with the threshold criteria for eligibility, and concludes that they are not an obstacle for a search engine to qualify for the safe harbor. Then Part IV.B discusses whether a search engine’s cache meets the basic definition of the activity covered by the safe harbor as described by § 512(b)(1), and concludes that it does not. Rather, it finds that in the operation of a search engine’s cache the initial transmission to a first user required by § 512(b)(1)(B) does not take place. It also finds that cached copies are not made available to subsequent users who request access to the material from the originating site, as required by § 512(b)(1)(C). Part IV.C moves on to examine whether the specific requirements of § 512(b)(2) are satisfied, and concludes that while this may be the case in some circumstances, some of these requirements are generally not likely to be met.

Part V then discusses whether the inapplicability of the caching safe harbor to a search engine’s cache matters at all—since the cache might be deemed non-infringing in the first place, or already covered by other defenses such as implied license, estoppel and fair use. Taking into account the Ninth Circuit’s analysis in Perfect 10 v. Amazon,17 it concludes that the unavailability of a safe harbor does matter, since in some cases the implied license and estoppel defenses are not likely to apply, and the fair use defense remains uncertain and is likely to involve higher litigation costs. Finally, Part VI discusses whether an amendment to the DMCA safe harbor regime to cover the activity of a search engine’s cache would be advisable; while it does not endorse a specific policy decision on this point, it suggests that if an amendment were to be made, the best way to address the issue would be by adapting the information location tools safe harbor rather than the existing caching safe harbor.

17 Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146 (9th. Cir. 2007).


II. THE SEARCH ENGINES’ “CACHED” LINKS FEATURE

A. Cache copies and “Cached” links

The “Cached” links feature can be briefly described as follows. For the purposes of building its index, a search engine’s robots constantly crawl the web and make a copy or “snapshot” of every web page they find—unless instructed otherwise, as discussed below, or unless they cannot access the content because it is protected by a password or otherwise (e.g., encrypted). Each of these copies is then stored on the search engine’s servers and kept there until it is replaced by another snapshot taken by the robot the next time it visits the same web page. Those copies are called “cached” copies, and this repository is referred to as the search engine’s “cache”.18 The way these copies are made can be represented as follows:

Fig. 1. A search engine proactively crawls the web and keeps a copy of every web page it visits. This will be the so-called “cached” copy.

Having all this storage of copies, the specific function of providing “cached” copies essentially consists of making those copies or snapshots available to users. This is done through a link labeled with the word “Cached”, which appears in most of the search results, along with the title of the relevant web page and a short snippet from it. As noted before, this link, when clicked by a user, shows the copy of the original web page made by the search engine’s robot the last time it visited that page.19 This copy is not necessarily identical to the current page, for the latter may have changed since the robot took the snapshot stored in the cache. The cached copies provided by search engines clearly reveal their character through a prominent disclaimer at the top of the page, which stresses that it is just the snapshot of the page taken when crawling the web, and that the web page may have changed since then.20

By making the cached copies available to users, the search engine provides them with additional or alternative information about the relevant web page. Indeed, a user may find it useful to access the cached copy for several reasons. First, the original web page may be unavailable at that particular moment, whether temporarily or permanently. In that case, the cached copy will provide information that, though not always current, may be useful for the user’s purposes. Second, since the cached copy highlights in color the terms used to perform the query, it makes it easier to identify why a particular web page is relevant to that search query. To be sure, this could also be found out simply by looking at the current page and performing a word search within it. However, since the search results are based not on the content of the current pages but on an index that stems from the copies stored by the search engine, it may be the case that the current page has changed and no longer includes the term used in the search query. In that case, when looking at the current page, the user will find it difficult to know why the page has been included in the search results, while she will find it out by looking at the older version of the page accessible through the “Cached” link, where the search terms still appear. Third, in some cases the user may find it interesting to compare the current web page with an older version of it, such as the cached copy. The cached copies are indeed archival in nature.21 They are meant to show how that web page appeared at a particular time in the recent past. This function, however, is better accomplished by an archive of web pages that keeps not just temporary but permanent copies, such as the Internet Archive.22 In contrast, a cached copy stored by a search engine is deleted when it is replaced by the new snapshot of the web page taken by the crawler.23

18 See Google Webmaster Help Center, Prevent or remove cached pages, http://www.google.com/support/webmasters/bin/answer.py?answer=35306 (last visited March 5, 2008).
19 To be sure, a cached copy stored by a search engine’s cache consists only of the HTML code of the web page. These HTML instructions may in-line link to images or other elements. Though these elements will be displayed when accessing the cached copy, they are not stored on the search engine’s cache. See Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146, 1156 (9th Cir. 2007). It is important to distinguish this search engine’s “cache”—the repository of cached web pages which are made available through “Cached” links—from a different repository frequently labeled also as a “cache”: the repository of thumbnail images used by the search engine in its image-search functionality.
20 Google’s disclaimer reads as follows:

This is Google’s cache of http://[url of the original page] as retrieved on [date and GMT hour]. Google’s cache is the snapshot that we took of the page as we crawled the web. The page may have changed since that time. Click here for the current page without highlighting. This cached page may reference images which are no longer available. Click here for the cached text only. To link to or bookmark this page, use the following url: [url of the cached copy stored by Google]. These search terms have been highlighted: [terms entered in the search query].

21 See Field v. Google Inc., 412 F.Supp.2d 1106, 1111 (D.Nev. 2006) (“When clicked, the ‘Cached’ link directs an Internet user to the archival copy of a Web page stored in Google’s system cache, rather than to the original Web site for that page.”) (emphasis added).

B. Opting out of the “Cached” links feature

The robots’ activity is an automatic one. As already pointed out, unless instructed otherwise or encountering protected content, robots will make a copy of each web page they visit, and will make this copy available through “Cached” links. Nonetheless, webmasters may opt out by specifying directions so that robots do not index or archive their web pages. To be effective, these directions must conform to the existing industry standards. There are two main types of standards widely accepted by the industry. One consists of the inclusion of meta-tags in the HTML code of the web page; the other consists of the placement of a “robots.txt” file in the server root.24

Meta-tags may be directed either generally to all robots, or to specific crawlers, in which case the name of the robot is indicated in the meta-tag.25 Through meta-tags, webmasters may direct robots, for instance, not to index the page and not to follow its links.26 It is also possible to allow robots to index the page but not to follow the links it contains.27 Through a “noarchive” meta-tag, webmasters can also instruct robots to index the page but not to make a cached copy of it available to users.28 Following this “noarchive” meta-tag, a search engine will include the web page in its search results, but will provide no “Cached” link.

22 See Internet Archive, http://www.archive.org (last visited Mar. 26, 2008).
23 In Field v. Google it was found that “the copy of Web pages that Google stores in its cache is present for approximately 14 to 20 days.” See Field, 412 F.Supp.2d at 1124. However, some critics note that in some cases this period of time could be substantially longer. See Nicole Bashor, Comment, The Cache Cow: Can Caching and Copyright Co-exist?, 6 J. MARSHALL REV. INTELL. PROP. L. 101 (2006) (“[W]eb site content that has been removed may remain in the search engine cache for months, years, or indefinitely.”) (footnotes omitted).
24 For a description of both standards, see, e.g., The Web Robots Pages, http://www.robotstxt.org (last visited Mar. 5, 2008).
25 See The Web Robots Pages, About /robots.txt, http://www.robotstxt.org/meta.html (last visited Mar. 5, 2008). For Google’s robot, the pertinent meta-tag would be: <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">. See Google Webmaster Help Center, Preventing content from appearing in Google search results, http://www.google.com/support/webmasters/bin/topic.py?topic=8459 (go to Block or remove pages using meta tags) (last visited Mar. 5, 2008).
26 An adequate meta-tag would be <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">. See The Web Robots Pages, About /robots.txt, http://www.robotstxt.org/meta.html (last visited Mar. 5, 2008).
27 The meta-tag would be <META NAME="ROBOTS" CONTENT="NOFOLLOW">. See id.
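To make the mechanics concrete, the following sketch (illustrative only, not any search engine’s actual code) shows how a crawler might parse a page’s meta-tags and honor a “noarchive” directive before offering a “Cached” link. It relies on Python’s standard html.parser module; the sample page and the robot name are hypothetical.

# An illustrative sketch (not any search engine's actual code) of how a
# crawler might read robots meta-tags before caching a page. Uses only the
# Python standard library; the sample page and robot name are hypothetical.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""
    def __init__(self, robot_name="robots"):
        super().__init__()
        self.robot_name = robot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        # A directive may target all robots ("robots") or one specific crawler.
        if name in ("robots", self.robot_name):
            content = (attrs.get("content") or "").lower()
            self.directives.update(d.strip() for d in content.split(","))

page = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
parser = RobotsMetaParser(robot_name="googlebot")
parser.feed(page)

if "noindex" in parser.directives:
    print("Do not include the page in the search results at all.")
elif "noarchive" in parser.directives:
    print("Index the page, but provide no 'Cached' link.")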

The inclusion of a “robots.txt” file in the server root is another widely known and accepted industry practice.29 The “robots.txt” file is a very simple text file that, again, can be directed either to all robots or to one or more specific robots.30 Moreover, the directions can refer to the whole website or just to specific parts of it. The expression “disallow: /” directs robots not to access the files or directories indicated after the slash.31 If no file or directory is designated after the slash, the expression “disallow: /” means that robots must not access any part of the website.32 On the contrary, if the file says only “disallow: ”, without the slash, it indicates that not a single file or directory is forbidden, and thus that robots may access the entirety of the website.33
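The difference between “Disallow: /” and an empty “Disallow:” can be verified with Python’s standard urllib.robotparser module. The following minimal demonstration uses hypothetical robots.txt contents and a hypothetical URL.

# A minimal demonstration, using Python's standard urllib.robotparser, of the
# "Disallow: /" versus empty "Disallow:" semantics described above. The file
# contents and the URL are hypothetical.
import urllib.robotparser

def allowed(robots_txt, agent, url):
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

block_all = "User-agent: *\nDisallow: /"   # no robot may access any part of the site
allow_all = "User-agent: *\nDisallow:"     # nothing is forbidden: full access

print(allowed(block_all, "Googlebot", "http://example.com/page.html"))  # False
print(allowed(allow_all, "Googlebot", "http://example.com/page.html"))  # True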

In addition to the meta-tags and robots.txt standards, website owners can also opt out through other means. First, as already mentioned, if the website is protected, by a password or otherwise, the robot will be unable to access it, and thus to make a copy of it.34 Second, a website owner may request that the search engine not display “Cached” links to particular web pages, either by contacting the search engine directly or through a removal procedure established by the search engine.35 None of these ways to opt out was used by plaintiff Field, who on the contrary placed a robots.txt file allowing all robots full access to his pages.36

28 The meta-tag would be <META NAME="ROBOTS" CONTENT="NOARCHIVE">. See id.
29 See The Web Robots Pages, About /robots.txt, http://www.robotstxt.org/robotstxt.html (last visited March 5, 2008). The robots exclusion standard is not an official one, but most search engines accept it. See Eric J. Feigin, Architectures of Consent: Internet Protocols and Their Legal Implications, 56 STAN. L. REV. 901, 934 (2004) (“[U]nlike IP, TCP, or HTTP, robot exclusion headers are not an officially recognized standard”). See also James Grimmelmann, The Structure of Search Engine Law, 93 IOWA L. REV. 1, 28 (2007) (“Major search engines generally honor requests not to cache, but they have forced providers to use standard technical measures to make those requests”).
30 To instruct all robots, the file must indicate “User-agent: *”. To direct only specific robots, the robot’s name must be included on the User-agent line. For example, to instruct Google’s robot, the file must say “User-agent: Googlebot”. See id. See also Google Webmaster Help Center, Preventing content from appearing in Google search results, http://www.google.com/support/webmasters/bin/topic.py?topic=8459 (go to How do I block or allow Googlebot?) (last visited Mar. 5, 2008).
31 See The Web Robots Pages, About /robots.txt, http://www.robotstxt.org/robotstxt.html (last visited March 5, 2008).
32 See id.
33 See id.
34 See Expert Report of Dr. John R. Levine at 9, Field v. Google Inc., 412 F.Supp.2d 1106 (D.Nev. 2006) (No. 54) (“[I]f a particular web page is only available through registration or use of a password, then a robot—like other Internet users—will not be able to access the web page (without first registering or obtaining a password).”).

III. THE DMCA SYSTEM CACHING SAFE HARBOR

Part II has just described how a search engine’s cache works and how website owners may opt out. If a search engine’s cache were deemed a copyright infringement, it would be important to determine whether it could be shielded by a DMCA safe harbor. This Part will briefly consider the DMCA safe harbor regime and describe in particular the system caching safe harbor set forth in § 512(b) and the technical function this provision intends to cover. Part IV will then address whether a search engine’s cache falls under the DMCA caching safe harbor.

35 Field, 412 F.Supp.2d at 1113 n.5. See also Google Webmaster Help Center, Preventing content from appearing in Google search results, http://www.google.com/support/webmasters/bin/topic.py?topic=8459 (go to How can I remove my content from the Google index?) (last visited Mar. 5, 2008).
36 The court stated the following as undisputed facts:

25. Field admits he knew that any Web site owner could instruct Google not to provide a “Cached” link to a given Web page by using the “no-archive” meta-tag (as discussed above). Field also knew that Google provided a process to allow Web site owners to remove pages from Google’s system cache. With this knowledge, Field set out to get his copyrighted works included in Google’s index, and to have Google provide “Cached” links to Web pages containing those works. . . .
28. Field created a robots.txt file for his site and set the permissions within this file to allow all robots to visit and index all of the pages on the site. Field created the robots.txt file because he wanted search engines to visit his site and include the site within their search results.
29. Field knew that if he used the “noarchive” meta-tag on the pages of his site, Google would not provide “Cached” links for the pages containing his works. Field consciously chose not to use the ‘no-archive’ meta-tag on his Web site.
30. As Field expected, the Googlebot visited his site and indexed its pages, making the pages available in Google search results. When the pages containing Field’s copyrighted works were displayed in Google’s search results, they were automatically displayed with “Cached” links, as Field intended they would be.

Field, 412 F.Supp.2d at 1113-14 (internal citations omitted) (emphasis in original). Plaintiff Field placed a robots.txt file at the root of the server, with the line “User-agent: *” and the line “Disallow: ”, thereby expressly allowing all robots to access the entirety of his website. Field’s robots.txt file can be retrieved at the Internet Archive. See http://web.archive.org/web/*/http://www.blakeswritings.com/robots.txt (last visited Mar. 5, 2008).

A. The DMCA safe harbor regime

After several years of intense debates, a compromise was eventually reached among the different stakeholders on the issue of the liability of internet intermediaries for online copyright infringements.37 This compromise was reflected in the Digital Millennium Copyright Act (DMCA), enacted in October 1998. Title II of the DMCA amended Chapter 5 of Title 17, U.S.C., by adding a new section 512, titled “Limitations on liability relating to material online”.38 This section does not modify the general principles of liability; instead, it creates a series of “safe harbors” for certain common activities carried out by service providers.39 These activities are described in subsections (a) “Transitory Digital Network Communications”; (b) “System Caching”; (c) “Information Residing on Systems or Networks At Direction of Users”; and (d) “Information Location Tools”.40 Service providers qualifying for each of these safe harbors are shielded from liability for all monetary relief for direct, vicarious and contributory infringement,41 by reason of carrying out the activity considered in each safe harbor. The safe harbors also limit injunctive relief against qualifying service providers, but only to the extent determined by subsection (j).42

37 See, e.g., Jennifer M. Urban & Laura Quilter, Efficient Process Or “Chilling Effects”? Takedown Notices Under Section 512 Of The Digital Millennium Copyright Act, 22 SANTA CLARA COMPUTER & HIGH TECH. L.J. 621, 631-35 (2006). See also Mike Scott, Safe Harbors Under The Digital Millennium Copyright Act, 9 N.Y.U. J. LEGIS. & PUB. POL’Y 99, 115-19 (2005-06).
38 17 U.S.C. § 512 (2000).
39 See SEN. REP. NO. 105-190 at 19 (1998).
40 See 17 U.S.C. § 512(a)-(d) (2000). Section 512 also establishes a “Limitation on Liability of Nonprofit Educational Institutions” in § 512(e). It also sets forth, in subsection (g), liability limitations for taking down material claimed to be infringing and for replacing the removed or disabled material.
41 See H.R. REP. NO. 105-796, at 73 (1998) (Conf. Rep.) (“The limitations in subsections (a) through (d) protect qualifying service providers from liability for all monetary relief for direct, vicarious and contributory infringement.”). The Ninth Circuit confirms this scope: “We have held that the limitations on liability contained in 17 U.S.C. § 512 protect secondary infringers as well as direct infringers. Napster, 239 F.3d at 1025”. Perfect 10, Inc. v. Amazon.com, Inc., 487 F.3d 701, 732 (9th Cir. 2007) (citing A&M Records, Inc. v. Napster, Inc., 239 F.3d 1004, 1025 (9th Cir. 2001)).
42 17 U.S.C. § 512(j) (2000). See H.R. REP. NO. 105-796, at 73 (1998) (Conf. Rep.).

A service provider seeking to qualify for any of the safe harbors must meet the definition of “service provider” in subsection (k), certain general conditions for eligibility set forth in subsection (i), and the specific requirements established for the particular safe harbor at issue. As indicated by subsection (n):

Subsections (a), (b), (c), and (d) describe separate and distinct functions for purposes of applying this section. Whether a service provider qualifies for the limitation on liability in any one of those subsections shall be based solely on the criteria in that subsection, and shall not affect a determination of whether that service provider qualifies for the limitations on liability under any other such subsection.43

According to the general conditions of eligibility set forth in § 512(i), the safe harbors shall apply only if the service provider:

(A) has adopted and reasonably implemented, and informs subscribers and account holders of the service provider’s system or network of, a policy that provides for the termination in appropriate circumstances of subscribers and account holders of the service provider’s system or network who are repeat infringers; and

(B) accommodates and does not interfere with standard technical measures [used by copyright owners to identify or protect copyrighted works].44

Several cases have dealt with the condition of adopting and implementing a policy for the termination of subscribers and account holders who are repeat infringers.45 This issue was discussed, for instance, in Corbis v. Amazon,46 where the court deemed that Amazon had actually complied with that condition. The opposite conclusion, however, was reached by the court in the Aimster case, where it was accordingly held that Aimster could not benefit from the DMCA’s safe harbors.47

43 17 U.S.C. § 512(n) (2000).
44 See 17 U.S.C. § 512(i)(1),(2) (2000). These threshold criteria for eligibility appear to reflect the understanding that in order to benefit from the limitations of liability, service providers must get involved to some extent in the protection of copyright. As the H.R. Report put it—underscoring the kind of compromise reached in the statute—Title II of the DMCA

[P]reserves strong incentives for service providers and copyright owners to cooperate to detect and deal with copyright infringements that take place in the digital networked environment. At the same time, it provides greater certainty to service providers concerning their legal exposure for infringements that may occur in the course of their activities.

H.R. REP. NO. 105-551, pt. 2, at 49-51 (1998).
45 See generally David Nimmer, Repeat Infringers, 52 J. COPYRIGHT SOC’Y 167 (2005).
46 See Corbis Corporation v. Amazon.com, Inc., 351 F.Supp.2d 1090, 1100-06 (W.D. Wash. 2004).

Apart from meeting the threshold criteria for eligibility, a service provider seeking to benefit from a particular safe harbor must carry out an activity that meets the description provided by the pertinent safe harbor, and must comply with the specific conditions it establishes. Accordingly, I turn now to examine how the DMCA describes the system caching activity and what specific conditions the service provider must meet to benefit from this limitation of liability.

B. The safe harbor for caching

The activity of “system caching” is defined in § 512(b) in a very narrow way. A definition so narrowly tailored lacks the desirable technological neutrality, and thus raises the problem of its inadequacy to cover, in the future, similar activities carried out in slightly different technological ways.48 However, the choice made by Congress in 1998 was to select only certain activities, and to grant a limitation of liability only to those selected activities.49

47 See In re Aimster Copyright Litigation, 334 F.3d 643, 655 (7th Cir. 2003). Judge Posner denied the application of the safe harbor, stating that the DMCA

[P]rovides a series of safe harbors for Internet service providers and related entities, but none in which Aimster can moor. The Act does not abolish contributory infringement. The common element of its safe harbors is that the service provider must do what it can reasonably be asked to do to prevent the use of its service by “repeat infringers.” 17 U.S.C. § 512(i)(1)(A). Far from doing anything to discourage repeat infringers of the plaintiffs’ copyrights, Aimster invited them to do so, showed them how they could do so with ease using its system, and by teaching its users how to encrypt their unlawful distribution of copyrighted materials disabled itself from doing anything to prevent infringement.

Id.
48 See Scott, supra note 37, at 137 (“Statutes tailored too precisely to the problems raised by the technology of the time can easily fall short when applied to the technologies of the present or future. This process may already be underway with the safe harbors.”). This appears to be particularly clear in the case of peer-to-peer file-sharing technologies. See, e.g., Niva Elkin-Koren, Making Technology Visible: Liability Of Internet Service Providers For Peer-To-Peer Traffic, 9 N.Y.U. J. LEGIS. & PUB. POL’Y 15, 17 (2006) (“The courts [in Recording Indus. Ass’n of Am., Inc. v. Verizon Internet Servs., Inc., 351 F.3d 1229 (D.C. Cir. 2003) and In re Charter Commc’ns, Inc., 393 F.3d 771 (8th Cir. 2005)] held that the DMCA safe harbor regime was tailored to address a different technological infrastructure and did not apply to peer-to-peer technology.”).
49 See SEN. REP. NO. 105-190 at 19 (1998). Nonetheless, the House Report refers to the safe harbors as covering “general categories of activity”, which could suggest that the safe harbors were intended to cover also slightly different functions not clearly reflected in the language of the statute. See H.R. REP. NO. 105-551, pt. 2, at 50 (1998).


The activity the system caching safe harbor intends to cover is a well-known technical function carried out by ISPs and other entities to enhance network efficiency.50 This activity is usually known as “proxy caching”. We will now briefly present the essential characteristics of this function, which will be more closely examined in the next Part.51

To understand what proxy caching is and how it works, it is useful first to describe generally the concept of a cache memory. In general terms, storing data in a cache memory consists of temporarily keeping a copy of certain data that are likely to be used again in a place where they can be accessed and retrieved more easily and quickly than by fetching them again from their original source. A cache can thus be defined as “a temporary storage area where frequently accessed data can be stored for rapid access”.52 There are many different kinds of caches. Microprocessors in personal computers, for example, have their own cache memory to store data that will be faster to access there than in the main memory of the PC. Similarly, web browsers have a built-in cache, located on the hard drive, where copies of recently visited web pages are temporarily stored. This allows the user to go back—normally by means of the back button—and view those pages again more quickly, since the browser will normally show the copies stored in the cache instead of retrieving all the information again from the origin server.53 Both the microprocessor cache and the browser cache are examples of local caches. A similar operation, though, can be performed remotely by a proxy serving the requests of a large number of users, which brings in the concept of proxy caching—the one the DMCA’s safe harbor contemplates.
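The general idea can be reduced to a few lines of code. The following toy sketch (hypothetical code, not any particular cache implementation) keeps a copy of data that is slow to fetch and serves later requests from that copy:

# A toy illustration of the cache idea described above: keep a copy of data
# that is costly to fetch, and serve later requests from the copy. The fetch
# function and its one-second delay are hypothetical stand-ins.
import time

cache = {}  # the temporary storage area for frequently accessed data

def fetch_from_origin(url):
    time.sleep(1)  # stands in for a slow round-trip to the original source
    return "<html>content of %s</html>" % url

def get(url):
    if url in cache:
        return cache[url]            # fast path: rapid access to the stored copy
    data = fetch_from_origin(url)    # slow path: retrieve from the original source
    cache[url] = data                # keep a copy for the next request
    return data

get("http://example.com/")  # slow: fetched from the origin
get("http://example.com/")  # fast: served from the cache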

When a user connected to the Internet wants to access a web page, she sends a “request” to the computer that hosts that web page—the origin server. A request is made typically by typing the URL of the desired web page—for instance, “http://www.columbia.edu”—in the browser’s address bar.54 The browser then converts this into a short line of code that includes the IP address of the user’s computer, which identifies this computer in the communication. The request is sent through the network of the user’s Internet Service Provider—at least initially, since afterwards it may go through different networks—to the origin server that hosts the requested web page. The origin server will generate a response to that request, consisting of the content of the requested page, which will be sent back to the user’s computer—again through her ISP’s network. This is certainly a very simple way to describe the process of requesting and getting a web page, and it leaves aside many nuances, but it will suffice for the purposes of this explanation.

50 See H.R. REP. NO. 105-551, pt. 2, at 51-52 (1998).
51 For an in-depth technical analysis of the caching function see, generally, DUANE WESSELS, WEB CACHING (2001).
52 See Wikipedia: cache, http://en.wikipedia.org/wiki/Cache (last visited Mar. 24, 2008).
53 See WESSELS, supra note 51, at 15. See also Mark Nottingham, Caching Tutorial for Web Authors and Webmasters, v. 1.81—March 16, 2007, http://www.mnot.net/cache_docs/ (last visited Jan. 7, 2008).
54 This would be the URL (Uniform Resource Locator) for the home page of the website of Columbia University in the City of New York.
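For readers who want to see the request/response exchange described above in code, the following sketch issues a single HTTP request using Python’s standard http.client module. It illustrates only the bare request/response pattern, not the behavior of any particular browser:

# A bare-bones sketch of the request/response exchange described above, using
# Python's standard http.client. The URL is the one from the text; the code
# illustrates the protocol only, not any particular browser's behavior.
import http.client

conn = http.client.HTTPConnection("www.columbia.edu")
conn.request("GET", "/")         # the typed URL becomes an HTTP GET request
response = conn.getresponse()    # the origin server's response travels back
print(response.status, response.reason)
body = response.read()           # the content of the requested page
conn.close()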

In some instances, however, the user’s computer does not communicate directly with the origin site—although this may go unnoticed by the user. I am referring to the cases where the requests of the user’s computer are intermediated by a “proxy”. A proxy can be described as a computer that intercepts the user’s request and then makes that request again to the origin site on behalf of the user.55 The new request will bear the IP address of the proxy, and thus the response from the origin site will be sent to the proxy. Finally, the proxy will send the response back to the user that requested it.

A proxy server may sit, for example, on the premises of a company where the employees’ computers have IP addresses valid only for the internal network, not for establishing communications on the open Internet. Thus, any individual computer will access the Internet through the company’s proxy. An Internet Service Provider, considered here as an entity that operates a telecommunications network and provides access to the Internet, may also install a proxy server to deal with the requests sent by its users.

A proxy may serve many different purposes, for example filtering traffic. However, the key purpose I want to consider now is that of saving bandwidth, and thus improving the efficiency of the network. An ISP can save bandwidth by means of a proxy by keeping a copy of the responses generated by previous requests, and using those copies to serve subsequent requests made by the same user or by a different one. This function is called “caching”, or more precisely “proxy caching”.

A proxy that implements a caching function is known as a “caching proxy”, or a “proxy cache”. Caching proxies have been widely used by Internet Service Providers (ISPs) to store copies of web pages frequently requested by their users, so that the cached copy can be shown to users who subsequently request the same web page.56 The cached copy is made by the ISP’s proxy cache on the occasion of a user’s initial request for a page—as discussed below, this will be a key point in the statutory definition of caching. Since the request is intermediated by the proxy in the way already described, the proxy will receive the response from the origin website, and this response—through the system or network operated by the ISP, which is another key point of the legal provision—will then be forwarded to the user. In this process, the proxy is able to make and store a copy of that response.57 This process is illustrated in the figure below:

Fig. 2. This illustrates the initial transmission, which is initiated by a first user’s request. The origin server’s response to that request will be transmitted by the ISP to that user, while the ISP keeps a copy of it in its cache. This transmission is made through the system or network operated by or for the service provider.

55 See Fielding et al., HTTP/1.1, Request for Comments (RFC) 2616, available at http://www.w3.org/Protocols/rfc2616/rfc2616.html (last visited Mar. 26, 2008).
56 See WESSELS, supra note 51, at 15. Web caching is not unknown to courts. A fairly good understanding of it can be seen, for example, in Akamai Techs., Inc. v. Cable & Wireless Internet Servs., Inc., 344 F.3d 1186, 1189 (Fed. Cir. 2003):

There have been numerous attempts to alleviate Internet congestion, including methods commonly referred to as “caching,” “mirroring,” and “redirection.” “Caching” is a solution that stores web pages at various computers other than the origin server. When a request is made from a web browser, the cache computers intercept the request, facilitate retrieval of the web page from the origin server, and simultaneously save a copy of the web page on the cache computer. The next time a similar request is made, the cache computer, as opposed to the origin computer, can provide the web page to the user.

57 To be sure, not all web pages are cached. Storing copies in the proxy cache only makes sense if the original information will remain unchanged for a certain amount of time, and if it is likely to be requested again in the near future by a sufficient number of users. Also, not all objects are cacheable. For example, caching proxies do not cache or decrypt secure pages, such as the ones served under the HTTPS protocol. Moreover, a website owner can establish limits and conditions on caching by means of HTTP headers. For a closer look at these issues, see WESSELS, supra note 51, at 24-34, 111-31. See also Nottingham, supra note 53.


The response stored in the proxy cache will be used to serve the subsequent requests of users who later on—albeit within certain temporal limits—request the same web page again. Indeed, when an ISP that has installed a proxy cache receives a request to access a certain web page, it will normally first check whether a copy of that page is available in its cache. If it is, and certain technical conditions are met, the ISP will send this copy to the user.58 This is normally faster than fetching the original page again, and since there is no need to obtain all the information again from the originating source, it saves bandwidth and reduces network congestion.59

As seen in the figure below, subsequent users will receive the response directly from the proxy cache:

Fig. 3. The ISP will serve the copy stored in the proxy cache to the subsequent users requesting the same web page. This saves bandwidth and resources, since there is no need to fetch the material again from the origin source.

The copies made and stored by a proxy cache are not meant to be archival copies, but perfect substitutes for the original page. The ISP’s proxy will serve the cached copy as an adequate response to the user who is requesting to access the page from the originating site. This is why the cache must be properly updated, following the directions set by the website’s owner. As will be underscored below, this is a key difference from the so-called “cached” copies that search engines make available through “Cached” links—which are really archival copies, meant to reflect the page as it appeared sometime in the recent past.

58 If the ISP does not have the requested object in its cache, it may still get it from another cache. Different caches are organized in cache hierarchies. See WESSELS, supra note 51, at 132-43.
59 See WESSELS, supra note 51, at 10-12. It also reduces the load imposed upon originating servers, and thus the originating server’s responses to the requests that actually reach it will be faster. See id. at 13.
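The logic of Figs. 2 and 3 can be summarized in a short programmatic sketch. The class below is schematic and hypothetical; real proxy caches apply freshness, validation and cacheability rules far more carefully (see note 57). It captures, however, the two paths just described: fetch and store on the first request, serve from the cache on subsequent ones.

# A schematic sketch of the proxy-caching logic of Figs. 2 and 3. The fetch
# callable, the TTL value, and the URLs are hypothetical stand-ins.
import time

class CachingProxy:
    def __init__(self, fetch_from_origin, ttl=3600):
        self.fetch = fetch_from_origin   # stands in for the request to the origin server
        self.ttl = ttl                   # crude freshness window, in seconds
        self.store = {}                  # url -> (response, time the copy was stored)

    def request(self, url):
        entry = self.store.get(url)
        if entry is not None:
            response, stored_at = entry
            if time.time() - stored_at < self.ttl:
                return response          # Fig. 3: a subsequent user is served from the cache
        response = self.fetch(url)       # Fig. 2: the initial request reaches the origin
        self.store[url] = (response, time.time())
        return response                  # forwarded to the user, with a copy retained

proxy = CachingProxy(lambda url: "<html>response for %s</html>" % url)
proxy.request("http://example.com/")   # first user: fetched from the origin and stored
proxy.request("http://example.com/")   # subsequent user: served from the proxy cache

On the first request for a URL the proxy fetches from the origin and retains a copy (Fig. 2); a second request within the freshness window is answered from the store without contacting the origin (Fig. 3).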

This type of caching was well-known at the time of the drafting of the statute,60 and is obviously the “system caching” contemplated by § 512(b), which describes it as:

[T]he intermediate and temporary storage of material on a system or network controlled or operated by or for the service provider in a case in which—

(A) the material is made available online by a person other than the service provider;

(B) the material is transmitted from the person described in subparagraph (A) through the system or network to a person other than the person described in subparagraph (A) at the direction of that other person; and

(C) the storage is carried out through an automatic technical process for the purpose of making the material available to users of the system or network who, after the material is transmitted as described in subparagraph (B), request access to the material from the person described in subparagraph (A), if the conditions set forth in paragraph (2) are met. 61

The H.R. Report further confirms that this safe harbor was meant precisely to cover the activity described above:

In terminology describing current technology, this storage is a form of “caching,” which is used on some networks to increase network performance and to reduce network congestion generally, as well as to reduce congestion and delays to popular sites. This storage is intermediate in the sense that the service provider serves as an intermediary between the originating site and ultimate user. The material in question is stored on the service provider’s system or network for some period of time to facilitate access by users subsequent to the one who previously sought access to it. For subsection (b) to apply, the material must be made available on an originating site, transmitted at the direction of another person through the system or network operated by or for the service provider to a different person, and stored through an automatic technical process so that users of the system or network who subsequently request access to the material from the originating site may obtain access to the material from the system or network.62

60 For a discussion of the use of caching before the enactment of the DMCA see Wayne R. Dunham, The Determination Of Antitrust Liability In United States v. Microsoft: The Empirical Evidence The Department Of Justice Used To Prove Its Case, 2 J. COMPETITION L. & ECON. 549, 578-84 (2006) (discussing the effects of caching when gathering empirical evidence for the 1998 U.S. v. Microsoft antitrust case).
61 See 17 U.S.C. § 512(b)(1) (2000).

This is the operation illustrated above in Fig. 2 and Fig. 3. In addition to the definition provided in § 512(b)(1)—already quoted—subsection 512(b)(2) establishes a series of conditions for a service provider to benefit from this particular safe harbor. The first one refers to the integrity of the material, which has to be transmitted to subsequent users without modification to its content from the manner in which it was transmitted to the first user.63

This first condition serves two different goals. First, it protects the owner of the originating site, forbidding the service provider from sending to a user content different from the content the website owner intended to show. Second, it ensures that the function the service provider performs has the neutral, automatic and passive character that the caching activity is supposed to have.64 This character may explain why this safe harbor, unlike the hosting and linking safe harbors,65 is granted without requiring lack of knowledge that the cached material is infringing. The same can be said of the transmission safe harbor, set forth in subsection 512(a), where—as in the caching safe harbor—the activity has to be carried out “through an automatic technical process”,66 the material must be “transmitted through the system or network without modification of its content”,67 and the service provider need not lack knowledge to benefit from the safe harbor.

62 See H.R. REP. NO. 105-551, pt. 2, at 52 (1998) (emphasis added).
63 See 17 U.S.C. § 512(b)(2)(A) (2000). The House Report states that “this restriction apply, for example, so that a service provider who caches material from another site does not change the advertising associated with the cached material on the originating site without authorization from the originating site”.
64 See Scott, supra note 37, at 152 (“The underlying presumption [in § 512(b)] seems to have been that a certain amount of ‘passive’ copying was an inevitable byproduct of communications on a digital network.”).
65 See 17 U.S.C. § 512(c) and (d) (2000).
66 17 U.S.C. § 512(a)(2) (2000).
67 17 U.S.C. § 512(a)(5) (2000).


The next three conditions, § 512(b)(2)(B) through (D), are aimed at addressing possible negative consequences of caching. Indeed, while caching brings important advantages—saving bandwidth, reducing delays to popular sites, and generally improving the performance of the net—it may also entail some drawbacks. One of them is the risk of obsolescence of the cached pages, which could adversely affect both the users and the owner of the website. Another possible negative effect is that the website owner may not be able to know the exact number of visits or “hits” to her pages, since some of those visits will be made to the proxy cache and not to the actual website.68 Finally, cached copies might constitute a way for users to access the website while eluding the conditions established by the owner—such as the use of a password or the payment of a fee. Acting to prevent or minimize each of these risks is a condition for the service provider to benefit from the safe harbor.

The risk of obsolescence is tackled by the condition set forth in § 512(b)(2)(B), which requires that:

(B) the service provider . . . complies with rules concerning the refreshing, reloading, or other updating of the material when specified by the person making the material available online in accordance with a generally accepted industry standard data communications protocol for the system or network through which that person makes the material available, except that this subparagraph applies only if those rules are not used by the person [who made the material available online] to prevent or unreasonably impair the intermediate storage to which this subsection applies; 69

This condition requires that service providers respect the instructions established by the website owner (“the person making the material available online”), provided that those instructions are established in accordance with a generally accepted industry standard. As far as web caching is concerned, the standard for those instructions can be found in the Hypertext Transfer Protocol (HTTP)70—the instructions taking the form of different kinds of HTTP headers. By means of HTTP headers, webmasters may, for instance, instruct caching proxies not to cache a particular element at all. They may also set specific instructions regarding the freshness of the cached copies, e.g., requiring caches to contact the originating server for validation each time before serving a cached copy, or specifying the maximum amount of time during which a certain object will be considered fresh. These directions indicate to the proxy cache whether the cached objects are fresh enough to be shown to a user without first checking with the originating server, or whether their freshness must be validated before they are served to the user.71

68 For a discussion of the difficulty of measuring the number of “hits” when system caching is in place, see, for example, Dunham, supra note 60, at 578-84. See also WESSELS, supra note 51, at 130-31. See also Jeffrey Goldberg, Why web usage statistics are (worse than) meaningless, http://goldmark.org/netrants/webstats/ (last visited Mar. 24, 2008). 69 17 U.S.C. § 512(b)(2)(B) (2000). 70 See WESSELS, supra note 51, at 60-61. HTTP v. 1.1 is defined in the Request for Comments (RFC) 2616, supra note 55. Norms dealing with caching are established mainly in its Chapter 13.
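By way of illustration, the following Cache-Control response headers—a minimal, hypothetical sample, with explanatory annotations that are not themselves part of the protocol—correspond to the kinds of rules just described:

    Cache-Control: no-store          # do not store this object in any cache
    Cache-Control: no-cache          # revalidate with the origin server before serving a stored copy
    Cache-Control: max-age=3600      # treat a stored copy as fresh for 3,600 seconds

A compliant proxy cache that receives the third header, for instance, may serve its stored copy to subsequent users for an hour without contacting the originating server again.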

The next requirement for the applicability of the caching safe harbor considers the possibility that the web owner might use a technology to gather information about the visits made to cached copies of her material. Here, § 512(b)(2)(C) requires that the service provider “does not interfere with the ability of technology associated with the material to return to [the person making the material available online] the information that would have been available to that person if the material had been obtained by the subsequent users . . . directly from that person”.72

Another potential harm for the owner of the originating site—as already mentioned—stems from the possibility that, through the cache, users might be able to access some materials while eluding the conditions the web owner has in place as a prerequisite to granting access to those materials, such as the payment of a fee or a password identification. Subsection 512(b)(2)(D), addressing this issue, establishes that the service provider must permit “access to the stored material in significant part only to users of its system or network that have met those conditions and only in accordance with those conditions”.73

The next condition for the applicability of the caching safe harbor contemplates the case in which the material was made available at the originating site without the authorization of the copyright owner. In such a case, the service provider must respond “expeditiously to remove, or disable access to, the material that is claimed to be infringing upon notification of claimed infringement as described in subsection (c)(3)”.74 However, the obligation to remove or disable access to the allegedly infringing material only applies if “the material has previously been removed from the originating site or access to it has been disabled, or a court has ordered that the material be removed from the originating site or that access to the material on the originating site be disabled”.75 According to the House Report, this last provision was added to this subsection because the caching storage “occurs automatically and unless infringing material has been removed from the originating site, the infringing material would ordinarily simply be re-cached.”76

71 See WESSELS, supra note 51, at 111-31. 72 17 U.S.C. § 512(b)(2)(C) (2000). However, the service provider only has this obligation of not interfering if the technology used by the web owner:

(i) does not significantly interfere with the performance of the provider’s system or network or with the intermediate storage of the material; (ii) is consistent with generally accepted industry standard communications protocols; and (iii) does not extract information from the provider’s system or network other than the information that would have been available to [the person that made the material available online] if the subsequent users had gained access to the material directly from that person;

17 U.S.C. § 512(b)(2)(C)(i)-(iii) (2000). 73 17 U.S.C. § 512(b)(2)(D) (2000).

IV. CAN A SEARCH ENGINE’S CACHE FIND SHELTER UNDER THE DMCA CACHING SAFE HARBOR?

After describing in Part II the operation of search engine caches and examining in Part III the system caching safe harbor and the particular function it intends to cover, this Part will address the issue of whether this safe harbor is wide enough to make it possible for a search engine’s cache to anchor in it. The analysis will conclude that—despite Field v. Google—a search engine’s cache falls outside the boundaries of the caching safe harbor. This is so, essentially, because the statute contemplates a very specific activity, and the one performed by a search engine’s cache happens to be a different one altogether—even though we may also refer to it using the word “cache”, or even choose to call it a “system cache”.

As discussed above, the legislative history of the DMCA shows the particular function the caching safe harbor intends to cover—the one already described in Part III as proxy caching. Subsection 512(b)(1) defines system caching in somewhat neutral terms, but also in a very narrow and technological way.77 This certainly makes it problematic to fit services other than the one the definition actually contemplates into its language. This is probably true even for the very similar, though not identical, kind of caching currently used by some ISPs to manage peer-to-peer file-sharing traffic efficiently while minimizing bandwidth consumption.78 Hence, even greater difficulties arise when trying to fit a very different function—such as a search engine’s cache—into the definition provided in section 512(b)(1).

74 17 U.S.C. § 512(b)(2)(E) (2000). 75 17 U.S.C. § 512(b)(2)(E)(i), (ii) (2000). 76 H.R. REP. NO. 105-551, pt. 2, at 52-53 (1998). A similar argument could certainly be made in the case of information location tools that link to infringing material: even if the search engine removes the link, unless the infringing material is removed from the originating site the search engine will index it again the next time it crawls the site. However, § 512(d) requires that a search engine act expeditiously to remove the link regardless of whether the infringing material remains on the originating website. This inconsistency perhaps reflects a stronger negotiating position on the part of the telecommunications companies that lobbied for the caching safe harbor than on the part of other service providers, such as search engines, which were not so mighty at the time the DMCA was negotiated. Moreover, the argument made by the H.R. Report fails to explain why—as established in 17 U.S.C. § 512(b)(2)(E)(i)—the service provider must take down the cached material when a court has ordered the removal of the material from the originating site but this removal hasn’t yet taken place.

The Field court, accepting the interpretation proposed by Google, concluded that Google’s cache matches the plain language of § 512(b)(1). However, as this Part will discuss, that conclusion seems flawed. Moreover, the ruling doesn’t examine whether the conditions established in § 512(b)(2) were met by the Google cache, stating that “[t]here is no dispute between the parties with respect to any of the other requirements of Section 512(b)”.79 These conditions may certainly be deemed to have been fulfilled by Google in that particular case—in relation to Field’s web pages. However, the reason why some of these conditions—the ones set forth in § 512(b)(2)(B), (C) and (D)—may be considered satisfied is not that Google’s cache, in general, works in a way that conforms to these requirements. It is simply that the circumstances of Field’s web pages made it unnecessary for Google to comply with them. Those conditions consist of respecting certain determinations adopted by the website owner, such as establishing directions regarding the refreshing, reloading, or other updating of the material; using certain technology to return to the website owner information about the use of the material; or establishing conditions on access to the original website. Therefore, if the website owner has not adopted any of those determinations—as was the case with Field’s website—the requirements set forth in § 512(b)(2)(B), (C) and (D) simply do not apply. The relevant question then is whether a search engine’s cache complies with those requirements in relation to websites where such conditions or technology have actually been established or used by the web owner. As discussed below, at least as to the requirement set forth in § 512(b)(2)(B)—the updating of the material—it seems safe to assume that where website owners have actually established such rules, a search engine’s cache doesn’t comply with them, and thus fails to satisfy this statutory requirement.

77 17 U.S.C. § 512(b)(1) (2000). 78 See Guy Pessach, An International-Comparative Perspective on Peer-To-Peer File-Sharing and Third Party Liability In Copyright Law: Framing the Past, Present, and Next Generations’ Questions, 40 VAND. J. TRANSNAT’L L. 87, 122-32 (2007). The author discusses the use of active caching by ISPs to reduce the huge level of bandwidth consumption that stems from p2p file-sharing networks. Through so-called active caching, ISPs cache files exchanged through p2p protocols. The author expresses some doubts as to whether this kind of caching falls under the DMCA caching safe harbor, but nonetheless considers that the literal text of the safe harbor is such that it might be applicable in the context of p2p caching, and proposes that courts “liberally and broadly interpret and apply [the] current caching [exemption], including in the context of active caching applications for managing peer-to-peer traffic” on account of policy considerations, to avoid “driv[ing] peer-to-peer platforms into being a second-rate speech resource”. See id. 79 Field v. Google Inc., 412 F.Supp.2d 1106, 1124-25 (D.Nev. 2006).

The next subparts will discuss whether the operation of a search engine’s cache complies with the definition of the caching activity and with the specific requirements set forth in the caching safe harbor provision. But before doing so, we must consider whether a search engine satisfies certain initial conditions for the applicability of any safe harbor, namely, whether it matches the definition of “service provider” established in § 512(k) and whether it satisfies the eligibility threshold criteria set forth in § 512(i)—issues the Field court didn’t address.

A. Does a search engine meet the threshold eligibility criteria for the safe harbors?

When considering eligibility for the DMCA safe harbors, the first two issues to take into account are the definition of the term “service provider”, found in § 512(k), and compliance with the threshold criteria set forth in § 512(i).

Subsection 512(k) provides two separate definitions of the term “service provider”. The first describes the meaning of the term as used in § 512(a)—the transitory digital communications safe harbor—where “the term ‘service provider’ means an entity offering the transmission, routing, or providing of connections for digital online communications, between or among points specified by a user, of material of the user’s choosing, without modification to the content of the material as sent or received.”80 The second definition is broader, and describes the meaning of the term as used in any other subsection, thus including the system caching safe harbor: “[a]s used in this section, other than subsection (a), the term ‘service provider’ means a provider of online services or network access, or the operator of facilities therefor, and includes an entity described in [the former definition]”.81 The latter definition—the relevant one for the caching safe harbor—seems broad enough that a search engine can safely be assumed to match it.

80 17 U.S.C. § 512(k)(1)(A) (2000). 81 17 U.S.C. § 512(k)(1)(B) (2000).


On the other hand, satisfaction of the threshold eligibility criteria established in § 512(i) may prove problematic for a search engine, particularly the condition set forth in § 512(i)(1), which requires the service provider to have “adopted and reasonably implemented, and inform[ed] subscribers and account holders of the service provider’s system or network of, a policy that provides for the termination in appropriate circumstances of subscribers and account holders of the service provider’s system or network who are repeat infringers;”82

The problem here is that search engines typically don’t have subscribers or account holders but just users when it comes to the search services they offer.83 Indeed, to perform search queries with a search engine like Google, users are not required to hold an account, to register or to log in.84 Likewise, although some website owners may decide to pay a fee in order to have their website prominently shown in the search results as “sponsored links”, website owners don’t need to be subscribers or account holders to have their websites indexed and shown within the normal search results. Certainly, apart from its main search function, a search engine like Google provides many other services, such as e-mail or different kinds of hosting services, and in some of these services users do need to become subscribers or account holders.85

It seems clear that if a search engine doesn’t have subscribers or account holders, it cannot adopt and implement a policy providing for the termination of subscribers and account holders who are repeat infringers. However, this should not be considered a failure to meet this eligibility requirement. Indeed, the most plausible conclusion seems to be that the condition of implementing a policy providing for the termination of subscribers and account holders who are repeat infringers can only be required of service providers that actually have subscribers or account holders—those to whom the policy is meant to be applied.86

82 See 17 U.S.C. § 512(i)(1) (2000). 83 See Craig W. Walker, Application of the DMCA Safe Harbor Provisions to Search Engines, 9 Va. J.L. & Tech. 2, ¶ 40 (2004), http://www.vjolt.net/vol9/issue1/v9i1_a02-Walker.pdf. 84 Subscribers, nonetheless, may run their search queries while logged in, and thus enjoy additional features, such as having Google keep a record of the searches performed. See Google Accounts, https://www.google.com/accounts (last visited Mar. 26, 2008). 85 Regarding the services directed to subscribers or account holders, Google states on its website that it has a policy providing for the termination of repeat infringers:

Account Termination.– Many Google Services do not have account holders or subscribers. For Services that do, Google will, in appropriate circumstances, terminate repeat infringers. If you believe that an account holder or subscriber is a repeat infringer, please follow the instructions above to contact Google and provide information sufficient for us to verify that the account holder or subscriber is a repeat infringer.

See Google: Digital Millennium Copyright Act, http://www.google.com/dmca.html (last visited Mar. 26, 2008).

Certainly, it could be argued that in order to benefit from the safe harbors service providers must necessarily provide their services to subscribers or account holders—and thus implement the policy. However, this would exclude currently existing search engines from the safe harbors, which doesn’t appear to be an acceptable conclusion—not only because it would be undesirable for search engines to be barred from the liability limitations, but also because it would be inconsistent with other subsections of § 512. If a service provider lacking subscribers or account holders had to be deemed to fail that threshold condition—and thus to be ineligible for any safe harbor—then that condition would be inconsistent with the information location tools safe harbor set forth in § 512(d), since the normal situation for a search engine is the absence of subscribers or account holders—indeed, the terms “subscribers” and “account holders” do not appear in § 512(d). The existence of a safe harbor meant to cover the normal services of a search engine suggests that this threshold requirement can’t be construed in a way that an ordinary search engine provider would find impossible to satisfy, because that would render the information location tools safe harbor inapplicable. Therefore, for the purposes of this analysis, we will assume that a search engine’s lack of subscribers or account holders with respect to its search services—including a “Cached” links feature—doesn’t prevent it from benefiting from the safe harbors.

The second threshold eligibility condition requires that the service provider “accommodates and does not interfere with standard technical measures” used by copyright owners to identify or protect copyrighted works.87 However, the service provider only needs to satisfy this condition when the technical measures meet the conditions set forth in § 512(i)(2).88 While there is room for discussion as to whether standards meeting these conditions have actually been reached within the industry,89 it is not clear to what extent a search engine’s cache would take copyright notices into account.90

86 See Walker, supra note 83, ¶ 41 (“[A] more reasonable interpretation is that search engines and other service providers that lack subscribers and account holders simply do not have to establish such policies in order to be eligible for the safe harbors.”). 87 See 17 U.S.C. § 512(i)(1)(B) (2000). 88 Subsection 512(i)(2) requires that these technical measures

(A) have been developed pursuant to a broad consensus of copyright owners and service providers in an open, fair, voluntary, multi-industry standards process; (B) are available to any person on reasonable and nondiscriminatory terms; and (C) do not impose substantial costs on service providers or substantial burdens on their systems or networks.

In Field v. Google the plaintiff did not use any such technology. See the transcription of the deposition of Blake A. Field at 98:10-17, document attached to Declaration of William O’Callaghan in Support of Google’s Motion for Summary Judgment, Field v. Google Inc., 412 F.Supp.2d 1106 (D.Nev. 2006) (No. 57) [hereinafter, Field deposition].

B. Does a search engine’s cache meet the definition of system caching?

Let’s now recall the main steps of the caching function, already described above. First, a user of a service provider seeks access to an originating site. This means the user makes a request to access that site—not only by typing the URL in her browser, but also, for instance, by clicking on a hyperlink. In doing so, the user is directing her service provider to fetch and transmit to her—through its network—the material hosted on the originating site. The service provider’s proxy cache intercepts this request and forwards it to the originating site. The request then travels through the service provider’s network until it reaches the originating site—if the originating site is also a client of the same service provider—or until it leaves that network to enter one or more other networks, if that is necessary to reach the remote site.91 Finally the originating site receives the request—as coming from the proxy.92 The origin server then answers that request, and the information requested travels back to the proxy—either directly through the network of the service provider that runs the proxy cache, or going first through the network of the originating site’s service provider, if it is a different one. In any case, perhaps after crossing several different networks along different routes, the packets into which the information is divided finally arrive at the network of the first service provider and reach the proxy server. The proxy then makes a copy of the information and stores it, and the information continues on its way, through the network of the service provider, until it arrives at the user’s computer.93 This scheme is illustrated in Fig. 2 above. After this, when a subsequent user makes the same request—i.e., types the same URL in her browser, thus asking the service provider to transmit the same request to the originating site and to transmit back to her the originating site’s response—the service provider will avoid fetching the material again from the originating site by transmitting to this subsequent user the copy stored in the cache. This process is presented in Fig. 3 above. The copy stored by the proxy cache is meant to be a perfect substitute for the original web page, and therefore it can be used by the service provider as an adequate response to the requests made by subsequent users.

89 See Field deposition, supra note 88, at 98. See also Jane C. Ginsburg, Separating the Sony Sheep from the Grokster Goats: Reckoning the Future Business Plans of Copyright-Dependent Technology Entrepreneurs, 50 ARIZ. L. R. 577, 591 n.57 (2008) (stating that arguably one of these measures might be filtering, but noting that, given the statutory definition of standard technical measures, the present state of filtering technologies may not be enough, due to the lack of an inter-industry consensus). 90 In Field v. Google, plaintiff Field contended in his amended complaint that: “20. Third-party web page content added to the Google cache is done so without regard to any copyright notices that may be affixed to that content.” See First Amended Complaint for Copyright Infringement at 3:18-19, Field 412 F.Supp.2d 1106 (No. 5-1568). In its answer to the complaint, Google stated: “20. Google admits that its system cache process does not treat a web page containing a copyright notice differently from a web page that does not contain a copyright notice, but otherwise denies the allegations in Paragraph 20.”. See Google’s Inc.’s Answer to First Amended Complaint With Counterclaims at 3:14-16, Field 412 F.Supp.2d 1106 (No. 7-6482). 91 For an explanation of the basic steps of an Internet communication through one or more ISPs’ networks, see Jonathan Zittrain, Internet Points of Control, 44 B.C. L. REV. 653, 656-58 (2003). 92 See WESSELS, supra note 51, at 15-16, 70.
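The decision logic just described can be compressed into a short sketch. The following Python fragment is purely illustrative—it assumes a single fixed freshness lifetime instead of the per-object HTTP rules discussed in Part III, and it is not any actual provider’s implementation—but it captures the sequence: serve the stored copy while it is fresh; otherwise fetch the page from the originating site on the user’s behalf and store the response for subsequent users.

    import time
    import urllib.request

    class ProxyCache:
        """Illustrative proxy-cache logic, not any particular vendor's implementation."""

        def __init__(self, freshness_lifetime=300):
            self.store = {}  # URL -> (time fetched, response body)
            self.freshness_lifetime = freshness_lifetime  # assumed lifetime, in seconds

        def request(self, url):
            entry = self.store.get(url)
            if entry is not None:
                fetched_at, body = entry
                if time.time() - fetched_at < self.freshness_lifetime:
                    # Cache hit: the stored copy is served as a perfect substitute,
                    # and the request never reaches the originating site (Fig. 3).
                    return body
            # Cache miss, or a stale copy: forward the request to the originating
            # site, store the answer, and pass it on to the user (Fig. 2).
            with urllib.request.urlopen(url) as response:
                body = response.read()
            self.store[url] = (time.time(), body)
            return body

The first user’s request populates the cache; any user who asks for the same URL within the freshness lifetime receives the stored copy without a new round trip to the originating site.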

If one compares this function—the one contemplated by the caching safe harbor—with the one carried out by a search engine and its “Cached” links (see Fig. 1), it becomes obvious that they are two completely different functions. In the case of a search engine’s cache, indeed, the whole operation that takes place in system caching is missing altogether.

The search engine doesn’t perform any of the steps just described. It doesn’t receive a request from a user asking it to fetch the content of a remote site; it doesn’t route and transmit this request through its network to the remote site; it doesn’t transmit back through its system or network the content of the originating site to the user who made the request; it doesn’t receive requests from subsequent users to access, through its network, the same originating site; it doesn’t serve these subsequent users a copy meant to be a perfect substitute of the requested page in order to avoid having to fetch again the content from the originating site. It is not that a search engine is performing the same function as a proxy cache but happens to skip some of the normal steps. It is, on the contrary, that a search engine’s cache carries out a different function altogether, and for this reason, as is only natural, none of the described steps takes place.

It could be argued that even if the DMCA had the proxy caching function in mind, the plain language of the statute is abstract or broad enough to cover other activities—including a search engine’s cache. However, the analysis of § 512(b)(1) shows that the statutory definition is sufficiently precise to exclude a search engine’s cache from the safe harbor.

93 For clarity of exposition, we have assumed in this scheme that the service provider operating the proxy cache is the one closest to the user. Obviously, the proxy could also be located with an upstream service provider. In that case the scheme would be essentially the same, except that the client of the proxy would be the downstream service provider, which acts under the direction of one of its clients, which in turn could be another downstream service provider, and so forth. Also, as pointed out supra in note 58, a hierarchy of caches may be in place.


The first element of the statutory definition that seems problematic for a search engine’s cache is that, as provided in the first paragraph of § 512(b)(1), the caching storage has to be “intermediate”.94 The H.R. Report states that “[t]his storage is intermediate in the sense that the service provider serves as an intermediary between the originating site and the ultimate user.”95 It is clear that a search engine is not an intermediary in the same sense as a proxy cache that intercepts the request and serves the cached copy instead of the page requested—a search engine’s cache is instead a repository of archival copies. However, both the language of the statute and the explanation provided by the H.R. Report are certainly vague as to the meaning of the term “intermediate”, and this could allow a broad construction, under which a search engine’s cache would be “intermediate” because it serves as an intermediary between the original page and the user who requests the cached copy.96 Nonetheless, it must be noted that the H.R. Report’s explanation of the term “intermediate” is given in the context of a clear description of the caching function, and thus it seems safe to assume that the intermediate character refers to the kind of intermediation that occurs through the use of a caching proxy.

In addition, the storage has to be “temporary”.97 This requirement doesn’t appear to be a problem for a search engine’s cache, since its copies are not meant to be permanent, but are replaced periodically with the new “snapshot” taken by the crawler.98

94 See 17 U.S.C. § 512(b)(1) (2000). 95 See H.R. REP. NO. 105-551, pt. 2, at 52 (1998). 96 The Field court relied on Ellison v. Robertson, 357 F.3d 1072 (9th Cir. 2004), to conclude that Google’s cache is an intermediate storage:

Citing the DMCA’s legislative history, the Ninth Circuit found that AOL’s storage of the materials was both ‘‘intermediate’’ and ‘‘transient’’ as required by Section 512(a). Like AOL’s repository of Usenet postings in Ellison which operated between the individuals posting information and the users requesting it, Google’s cache is a repository of material that operates between the individual posting the information, and the end-user requesting it.

Field, 412 F.Supp.2d at 1124 (internal citation omitted). However, unlike in a repository of Usenet postings, where individuals post information, the information stored by Google’s cache is not posted by any individual, but is retrieved by Google from the origin websites by means of a robot or crawler. Moreover, the meaning of “intermediate” in § 512(a) and (b) might not be the same. See H.R. REP. NO. 105-551, pt. 2, at 51 (1998) (“New Section 512(b) applies to a different form of intermediate and temporary storage than is addressed in subsection (a).”). Finally, it has been argued by some commentators that the Ellison court erred when it held that AOL’s storage was “intermediate and transient” in the sense of § 512(a). See, e.g., Alicia L. Wright, Newsgroups Float Into Safe Harbor, And Copyright Holders Are Sunk, 2006 DUKE L. & TECH. REV. 19, ¶¶ 15-33. 97 See 17 U.S.C. § 512(b)(1) (2000).

Apart from the intermediate and temporary character of the storage, a key point in system caching—and one that doesn’t occur in the case of a search engine’s cache—is that the copies are made and kept by the service provider as a consequence of a first user requesting access to that material and the material being transmitted to that user through the service provider’s system or network. This key element is required by the plain language of § 512(b)(1)(B), which establishes that the material has to be transmitted “from [the originating site] through the system or network [controlled or operated by or for the service provider] to a person other than [the one that made the material available online] at the direction of that other person”.99

This “other person” is the service provider’s user who requested the material—the one labeled as “First User” in Fig. 2 above. Indeed, the plain language of § 512(b)(1)(B) contemplates three different subjects. One is the person who made the material available online on the originating site. Another is the person to whom the material is transmitted. And the third is the service provider, which carries out this transmission through its system or network, acting at the direction of the person who requested the material and to whom it transmits the material. Although not directly mentioned in this clause, the service provider is the key subject in the whole subsection, since it is the one that will enjoy the liability limitation. The service provider can be neither the person who made the material available online nor the “other person” to whom the material has to be transmitted precisely by the service provider.100 When § 512(b) refers to the service provider—that is, to the one that will enjoy the liability limitation—it doesn’t call it “other person”, but “service provider”.101

A search engine’s cache doesn’t fit in this scheme. As represented above in Fig. 1, a search engine retrieves the web page from the origin server on its own initiative, by means of a crawler, and not acting at the direction of someone else. Thus, a search engine cannot be the “service provider” contemplated in this subsection.

The surprising reading proposed by the defendant in Field v. Google—and accepted by the court—to overcome this obstacle consists of considering that this “other person”, to whom and at whose direction the service provider transmits the material, can be the service provider itself:

Field next claims that Google’s cache does not satisfy the requirements of Section 512(b)(1)(B). Section 512(b)(1)(B) requires that the material in question be transmitted from the person who makes it available online, here Field, to a person other than himself, at the direction of the other person. Field transmitted the material in question, the pages of his Web site, to Google’s Googlebot at Google’s request. Google is a person other than Field. Thus, Google’s cache meets the requirement of Section 512(b)(1)(B).102

This interpretation, however, fails to take into account the key factor implied by the statute: that the one who carries out the transmission, through its system or network, is the service provider—the one that seeks the liability limitation. The Field court’s interpretation considers instead that in this provision the one who carries out the transmission is the website owner, an interpretation that seems to be barred by the plain language of subsection 512(b)(1)(B), which states “through the system or network”,103 which, as the first paragraph of § 512(b)(1) makes clear, is the “system or network controlled or operated by or for the service provider” that seeks to qualify for the safe harbor.104 As a consequence, the Field court’s interpretation contemplates only two of the three persons involved: the web owner and someone who gets the material from the originating website, labeled in the section a “person other than the person described in subparagraph (A)”—that is, other than the person who put the material online. Finally, the Field court’s interpretation identifies this other person with the one that caches the material and that will gain the liability limitation, that is, with the entity described in the subsection as the “service provider”. Stated briefly, it seems that the interpretation held by the Field court could only be correct if we could change the language of § 512(b)(1)(B) into this alternative one: “(B) the material is transmitted from the person described in subparagraph (A) to the service provider at the direction of the service provider”. Again, this alternative reading is possible only if we disregard the language “through the system or network”, which clearly implies that the service provider carries out the transmission at the direction of a person different from itself.

98 Nothing in subsection 512(b) suggests that the term “temporary” has the same meaning as the term “transient” in § 512(a). Thus there is probably no need to examine whether the storage of the cached copies could be considered “transient” in the sense of § 512(a). However, see Field, 412 F.Supp.2d at 1124 (“The Court finds that Google’s cache for approximately 14 to 20 days—like the 14 days deemed ‘‘transient storage’’ in Ellison—is ‘‘temporary’’ under Section 512(b) of the DMCA.”). 99 17 U.S.C. § 512(b)(1)(B) (emphasis added). The fact that the “system or network” mentioned in § 512(b)(1)(B) is the one “controlled or operated by or for the service provider” is obvious from § 512(b)(1). 100 Furthermore, the House Report doesn’t seem to leave room for confusion: “The material in question is stored on the service provider’s system or network for some period of time to facilitate access by users subsequent to the one who previously sought access to it.” H.R. REP. NO. 105-551, pt. 2, at 52 (1998) (emphasis added). In this sentence it seems clear that “the one” is a user of the service provider—and not the service provider itself. The next sentence of the House Report leads to the same conclusion:

For subsection (b) to apply, the material must be made available on an originating site, transmitted at the direction of another person through the system or network operated by or for the service provider to a different person, and stored through an automatic technical process so that users of the system or network who subsequently request access to the material from the originating site may obtain access to the material from the system or network.

Id. (emphasis added). In this sentence, it is not clear whether the “another person” who directs the material to be transmitted is the same “different person” to whom the material is transmitted, but in any event it seems clear that the service provider transmits the material to a person different from itself. Since the “service provider” in this language is the one that will enjoy the liability limitation, an entity seeking the liability limitation conferred by this safe harbor—such as Google in Field v. Google—should be in the position of the service provider as described in the statutory language and the legislative history. 101 See, for example, 17 U.S.C. § 512(b)(1)(A) (2000). 102 Field, 412 F.Supp.2d at 1124. 103 Though in a sense it may be said that a website “transmits” its content when it receives a request—meaning that it serves the information requested and thus initiates its transmission—this doesn’t seem to be a possible meaning of the phrase “being transmitted” as used in this subsection, because the language “through the system or network” clearly implies the transmission is carried out by the service provider. The difference between carrying out a transmission and simply initiating it can be seen in § 512(a)(1). 104 See 17 U.S.C. § 512(b)(1) (2000). This is confirmed again by the House Report: “the material must be made available on an originating site, transmitted at the direction of another person through the system or network operated by or for the service provider to a different person”. H.R. REP. NO. 105-551, pt. 2, at 52 (1998) (emphasis added). A condition that the transmission be performed “through a system or network controlled or operated by or for the service provider” is also required by the transitory digital communications safe harbor provided in § 512(a). The key importance of this language became apparent in the Napster case. When the record companies sued Napster for contributory and vicarious copyright infringement, Napster filed a motion for summary adjudication of the applicability of the DMCA § 512(a) safe harbor to its activities. The court acknowledged that “[t]he language of subsection 512(a) makes the safe harbor applicable, as a threshold matter, to service providers ‘transmitting, routing or providing connections for, material through a system or network controlled or operated by or for the service provider… .’ 17 U.S.C. § 512(a)”. A & M Records, Inc. v. Napster, Inc., No. C 99-05183 MHP, 2000 WL 573136, at *6 (N.D.Cal. May 12, 2000) (emphasis in original). After a thorough discussion of this point, the court concluded that “[b]ecause Napster does not transmit, route, or provide connections through its system, it has failed to demonstrate that it qualifies for the 512(a) safe harbor. The court thus declines to grant summary adjudication in its favor.” Id. at *8 (emphasis added).

Even assuming that such a broad interpretation could be an acceptable reading of this clause, this is not the only obstacle a search engine’s cache faces in meeting the statutory definition of caching. It also has difficulty meeting subsection 512(b)(1)(C), which requires that “(C) the storage is carried out through an automatic technical process for the purpose of making the material available to users of the system or network who, after the material is transmitted as described in subparagraph (B), request access to the material from the person described in subparagraph (A), . . . ”.105

This paragraph presents two main elements, and both are problematic. First, the purpose of the storage has to be to make the material available to users who request access to it “after the material is transmitted as described in subparagraph (B)”,106 that is, after the initial transmission represented in Fig. 2 above. This leads to the problem already discussed, namely, that in the operation of a search engine’s cache there is no previous transmission to a first user of the service provider.107 Second, § 512(b)(1)(C) refers to subsequent users who request access to the material from the originating site. This is what happens in the system caching contemplated by the safe harbor, where a subsequent user makes a request to access the originating site and in response gets a perfect substitute for that material, fetched from a closer location, i.e., from the proxy cache.108 In the case of a search engine’s cache, however, the cached copies are not made available to users who request access to the material from the originating site, but to users who click on the “Cached” link.109

105 17 U.S.C. § 512(b)(1)(C) (2000) (emphasis added). 106 Id. 107 When the Field court examined whether Google satisfied § 512(b)(1)(C), it simply ignored the fact that the material had never been “transmitted as described in subparagraph (B)”. Indeed, the relevant text of the subsection that raises this point—the reference to the previous transmission—is omitted in the ruling’s recitation of the statute:

Finally, Field contends that Google’s cache does not fully satisfy the requirements of Section 512(b)(1)(C). Section 512(b)(1)(C) requires that Google’s storage of Web pages be carried out through “an automat[ed] technical process’ and be ‘for the purpose of making the material available to users . . . who . . . request access to the material from [the originating site].”

Field, 412 F.Supp.2d at 1124 (omissions in original). 108 In the caching function contemplated by the safe harbor, a subsequent user actually requests access to the material from the originating site, as required by § 512(b)(1)(C)—i.e., types the URL of the originating site in her browser, or clicks on a link that goes to the originating site—albeit this request never reaches the originating site, because the service provider’s proxy cache, standing between the user and the remote site, resolves the request by sending the cached information to her. See, generally, WESSELS, supra note 51. See also Scott, supra note 37, at 147 (“The caching process is transparent to the requester, who performs precisely the same steps to access the work from cache as to access it all the way from the original host, and in most cases will not know where a particular copy originated.”) (emphasis added). 109 Plaintiff Field contended that subsection 512(b)(1)(C) establishes that the cached materials can only be made available to those who request access to the originating site:

The plain language of the statute requires that, in order to fall within the safe harbor, the cached material may be distributed ONLY to those requesting the originating site—not seeking the site’s contents from elsewhere—this is the “request access to the material from the person described in subparagraph (A) [originating site] language of the statute.

Plaintiff’s Reply to Google’s Opposition to Plaintiff’s Motion for Summary Judgment at 10-11, Field 412 F.Supp.2d 1106 (No. 50-2817918) (emphasis, alteration and lack of quotation marks in original).

The Field court used the following reasoning to hold that this element was fulfilled by Google’s cache:

There is likewise no dispute that one of Google’s principal purposes in including Web pages in its cache is to enable subsequent users to access those pages if they are unsuccessful in requesting the materials from the originating site for whatever reason. Google’s cache thus meets the requirements of Section 512(b)(1)(C).110

Certainly, a user who clicks on a “Cached” link may (or may not) have previously requested access to the material from the originating site—most probably by clicking on the main link of the search result—and, having been (or not) unsuccessful, decides to request access to an archival copy of that material from the search engine’s cache by clicking on the “Cached” link. However, this doesn’t change the fact that the cached copy is made available to a user who requests access to the archival copy from the search engine’s cache, and not to the original material from the originating site.111

110 Field, 412 F.Supp.2d at 1124 (internal citations omitted). It is remarkable, again, how here the court uses the term “subsequent” devoid of any meaning, apparently just to describe Google’s operation in a fashion that more closely resembles the statutory language. 111 Interestingly enough, when trying to show compliance with the requirement that the storage be carried out to make the material “available to users . . . who . . . request access to the material from the person described in subparagraph (A)” (see 17 U.S.C. § 512(b)(1)(C)) (emphasis added), Google pointed out that “[i]t is unclear whether the word ‘from’ in this sentence modifies the phrase ‘request access’ or merely the word ‘material.’”. See Google Inc.’s Opposition to Plaintiff Blake A. Field’s Motion for Summary Judgment at 12, n.14, Field 412 F.Supp.2d 1106 (No. 45-2813413). Google claims that in any event its cache satisfies either interpretation. See id. However, it seems to be more comfortable with the latter, since it makes the remark that:

While the Court need not decide which of these interpretations is correct, Google notes that, later in the statute, when Congress speaks of a user accessing an originating site, it twice uses the word “directly” to modify the word “from.” See 17 U.S.C. § 512(b)(2)(C) (Ex. A attached) (“if the material had been obtained by the subsequent users described in paragraph (1)(C) directly from that person.”). The absence of the word “directly” in Section 512(b)(1)(C) suggests that storage need only be for the purpose of making available material from an originating site to those who request access to it.

Id. This is an interesting grammatical point, since it is true that the plain language of the provision may lead to the said confusion. However, even from a grammatical standpoint, the more likely meaning is that the word “from” modifies the phrase “request access”. The text of the subparagraph concerned is as follows:

(C) the storage is carried out through an automatic technical process for the purpose of making the material available to users of the system or network who, after the material is transmitted as described in subparagraph (B), request access to the material from the person described in subparagraph (A), if the conditions set forth in paragraph (2) are met.

17 U.S.C. § 512(b)(1)(C) (2000) (emphasis added). The phrase “the material” appears twice. The first instance makes it clear that the material at issue is an already identified one—there is no need to specify which material it is, since it is already clear that it is the material that was posted on the originating site. That is why the text is able to refer to it using the article “the”, without any additional specification. Obviously, when the phrase “the material” appears for the second time, no explanation of which material this is is needed either. Thus, the phrase “from the person described in subparagraph (A)” cannot have the purpose of indicating which material the statute is referring to, but rather signals where the user is requesting access from. Besides, this is the reading consistent with the caching function the safe harbor intended to cover. Moreover, it is easy to trace the wording of the previous drafts and acknowledge that the enacted version doesn’t intend to mean that the user is requesting access—anywhere—to the material that once was on the originating site, but rather that she is requesting access from the originating site. 112 See § 512(b)(1) (2000): “A service provider shall not be liable . . . if the conditions set forth in paragraph (2) are met.”

To sum up, a search engine’s cache carries out a function sufficiently different from the one contemplated by the DMCA caching safe harbor and described by its plain language—particularly in §§ 512(b)(1)(B) and (C)—to lead one to conclude that it falls outside the subject matter of the statutory provision.

C. Does a search engine’s cache meet the requirements set forth in § 512(b)(2)?

Even if a search engine’s cache could be deemed to satisfy the definition of caching set forth in § 512(b)(1), matching this definition wouldn’t be enough to enjoy the liability limitation, since the conditions set forth in § 512(b)(2) must also be met.112

The Field court didn’t analyze whether Google’s cache met those conditions, for it seems that this was not disputed between the parties. Thus the court only examined Google’s compliance with § 512(b)(1):

Because Google has established the presence of the disputed elements of Section 512(b) as a matter of law, Field’s motion for summary judgment that Google is ineligible for the Section 512(b) safe harbor is denied. There is no dispute between the parties with respect to any of the other requirements


of Section 512(b). Accordingly, Google’s motion for partial summary judgment that it qualifies for the Section 512(b) safe harbor is granted.113

For the purposes of our commentary, however, it will be useful to consider briefly to what extent a search engine’s cache might be deemed to comply with § 512(b)(2)’s requirements.

The purposes of a search engine’s “Cached” links feature are different from those of the function contemplated by § 512(b). While the system caching function that § 512(b) provides for intends to supply the user with a perfect substitute for the requested page—to the point that this function is normally transparent to the user, i.e., the user doesn’t know that she is in fact visiting not the actual page but its cached copy—a search engine’s “cached” copy has to be purposely accessed as such, by clicking on the “Cached” link, and it clearly warns the user that it is just “the snapshot that we took of the page as we crawled the web. The page may have changed since that time. . . . This cached page may reference images which are no longer available. . . . These search terms have been highlighted: . . .”. Because this “cached” copy is not intended to serve as a perfect substitute for the original page, some of the requirements set forth in § 512(b)(2) simply don’t make sense for Google’s cache operation. Perhaps the clearest example of this is the condition set forth in § 512(b)(2)(B), which requires that the service provider “complies with rules concerning the refreshing, reloading, or other updating of the material when specified by the person making the material available online in accordance with a generally accepted industry standard data communications protocol for the system or network through which that person makes the material available . . . ”114

113 Field, 412 F.Supp.2d at 1124-25 (emphasis added). One must conclude therefore that at some point the plaintiff came to accept that the conditions established in § 512(b)(2) were in fact satisfied by Google with respect to his particular website. This was not, however, plaintiff’s initial position, as can be seen, for instance, in his answers to Google’s interrogatories:

Interrogatory No. 22. If YOU contend that the Google system cache does not qualify for the safe harbor for service providers provided by 17 U.S.C. § 512(b) DESCRIBE all bases for that contention. Answer To Interrogatory No. 22. Google does not meet the definition of service provider. The storage is not intermediate and temporary. Google does not operate a system or network. Material is not cached as a result of an internet user requesting access to the originating site. Cached materials are not distributed to internet users as a result of that user being subsequent to the original system user who requested the originating page. Web page content distributed from the cache is modified from its original content. None of the conditions of Paragraph 2 of 17 U.S.C. § 512(b) are met by Google.

See Plaintiff’s Answers to Interrogatories of Defendant Google, Inc., at 12, document attached to the Declaration of William O’Callaghan in Support of Google’s Motion for Summary Judgment, Field v. Google Inc., 412 F.Supp.2d 1106 (D.Nev. 2006) (No. 57-2).

As discussed above, the standard most likely to be used by a website owner to specify “rules concerning the refreshing, reloading, or other updating of the material” is the Hypertext Transfer Protocol (HTTP).115 The kinds of rules the website owner may specify under this protocol—by means of HTTP headers—include directions regarding the freshness of the cached material, such as requiring the service provider to contact the originating site for validation before serving any cached information, or specifying the maximum amount of time during which a cached object may be considered fresh and thus served without first checking with the originating server—together with the need to refresh it once that time has expired.
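Schematically, the validation exchange just mentioned looks like the following—a hypothetical page is used here for illustration. The proxy sends a conditional request, and the origin server either confirms that the stored copy is still good or returns an updated object:

    GET /page.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Fri, 21 Mar 2008 10:00:00 GMT

    HTTP/1.1 304 Not Modified

A “304 Not Modified” answer tells the cache that its copy may still be served; a “200 OK” answer would instead carry a fresh copy of the object.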

These directions are observed by proxy caches. However, a search engine cache will not comply with them. This makes perfect sense, since those directions are intended for the system caching function, and again, the function performed by a search engine—even if labeled a "system cache"—is a completely different one. Therefore, a web page whose owner has specified through HTTP headers, for instance, that it is not to be cached will nonetheless be copied and indexed when visited by a search engine's robot, and the copy or "snapshot" of it will be made available by the search engine through a "Cached" link. If those specifications established a time limit for expiration, the search engine will nonetheless continue to show the cached copy even after the time limit has passed. Likewise, if there is an instruction that the service provider has to check the originating server for the freshness of the cached copy before serving it, the copy will still be shown to a user when she clicks on the "Cached" link—without any prior check with the originating server. None of this is a fault of the search engine. It is just that all those instructions—the ones contemplated by § 512(b)(2)(B)—are intended to regulate the system caching function, not the activity of storing and displaying old copies under the name of "cached". The standards that apply to the latter function are those discussed above in the form of "robots.txt" files and HTML meta-tags. These standards don't deal with the "refreshing, reloading, or other updating of the material", but simply with allowing or denying robots to copy the page for indexing purposes and to make it available to users through "Cached" links.
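By contrast, the conventions that do govern search engine robots can be sketched as follows (a minimal, hypothetical Python example; the "ExampleBot" user agent and the URLs are invented, and the meta-tag check is deliberately simplified):

    import urllib.robotparser

    # The Robots Exclusion Standard: may this crawler fetch the page at all?
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()
    may_crawl = rp.can_fetch("ExampleBot", "http://www.example.com/page.html")

    # The separate per-page convention: may a "Cached" link be offered?
    # A page opts out of archiving with a meta tag such as
    #   <meta name="robots" content="noarchive">
    html = '<html><head><meta name="robots" content="noarchive"></head></html>'
    may_show_cached_link = 'content="noarchive"' not in html.lower()

    print(may_crawl, may_show_cached_link)

The point of the sketch is that neither check involves the "refreshing, reloading, or other updating" of anything; both are simple permissions to copy and to display.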

As a consequence, since a search engine's robot will not comply with the specifications the web owner may have established on the refreshing, reloading, or other updating of the cached material, it will fail to comply with the requirement set forth in § 512(b)(2)(B).

114 17 U.S.C. § 512(b)(2)(B) (2000). 115 See supra note 70.


Arguably, however, if there simply are no directions to comply with—because the web owner has not established any—the search engine couldn't be considered as failing to meet the requirement. This was the situation in Field v. Google,116 and that's why we can conclude that in this case Google indeed met the requirement in § 512(b)(2)(B). By the same token, we may conclude that the requirement set forth in § 512(b)(2)(C) was also met by Google on the specific facts of Field v. Google. This subsection requires that "the service provider does not interfere with the ability of technology associated with the material to return to the [web owner] the information that would have been available to that person if the material had been obtained . . . directly from that person".117 Since there was no such technology in place, Google obviously couldn't interfere with it, and couldn't be deemed to fail the statutory requirement. As for the condition established in § 512(b)(2)(D), the same conclusion may be reached, and again on the same grounds. This requirement provides that if the website owner

[H]as in effect a condition that a person must meet prior to having access to the material, such as a condition based on payment of a fee or provision of a password or other information, the service provider permits access to the stored material in significant part only to users of its system or network that have met those conditions and only in accordance with those conditions; . . . 118

No such conditions were established by Field, and therefore Google also met this statutory requirement. Furthermore, it seems safe to assume that this requirement will generally be met by a search engine's cache, since the robot will not access information that is protected by a password or otherwise—unlike in true proxy caching, where the ISP, acting as a proxy, intercepts all the information accessed by the user and is able to cache it.

On the other hand, § 512(b)(2)(A) requires that the service provider not modify the content of the material. However, the copies that are displayed to users through "Cached" links have been slightly modified—the terms used in the search query are shown highlighted in different colors. It is an open question whether such a modification is enough to consider that the requirement in § 512(b)(2)(A) has not been met. Probably, under a de minimis rationale, this alteration doesn't amount to a failure to comply with that requirement.

116 Field, 412 F.Supp.2d. 117 17 U.S.C. § 512(b)(2)(C) (2000). 118 17 U.S.C. § 512(b)(2)(D) (2000).

The last condition, set forth in § 512(b)(2)(E), applies to the situation in which the website owner “makes that material available online without the authorization of the copyright owner”.119 This obviously was not the case in Field v. Google, since Field owned the copyrights in his writings.

D. A harbor too shallow for a search engine’s cache to anchor in.

The preceding analysis leads to the conclusion that the system caching safe harbor established in § 512(b) is not wide enough to cover a search engine's cache.120 Being a different activity, it simply doesn't fit the statutory description. The plain language of this subsection is so narrowly tailored to a particular function that it cannot be construed as encompassing the one performed by search engines.

This raises two questions. First, whether it really matters after all. Second, if it does, whether the safe harbor statutory regime should be amended to deal with this issue. Both are addressed in the parts that follow.

V. DOES IT MATTER?

The fact that a service provider's activity doesn't meet the requirements for a particular DMCA safe harbor doesn't mean that the service provider is necessarily liable, either directly or secondarily, since the activity might be non-infringing or already protected by other defenses.121 Therefore, from a substantive point of view, the question of whether a search engine's cache falls under the caching safe harbor would only be relevant if, as a result of the operation of its cache, the search engine could face liability in the first place and no other defenses were available.122

119 17 U.S.C. § 512(b)(2)(E) (2000). 120 From a comparative point of view, it is worth noting that a Belgian court in Copiepresse v. Google (Tribunal de Première Instance de Bruxelles, Feb. 13, 2007, No. 06/10.928/C) has held that Google’s cache doesn’t fall under the caching safe harbor set forth in article 13 of the European Directive on Electronic Commerce (Council Directive 2000/31, 2000, O.J. (L 178)), which parallels the DMCA caching safe harbor. It should be noted, however, that there are some differences between both statutes and that the language of the European Directive when it comes to the definition of “caching” makes it more straightforward that a search engine’s cache falls outside its scope. 121 This was clearly stated by the H.R. Report:

[N]ew section 512 does not create any new liabilities for service providers or affect any defense available to a service provider. Enactment of new Section 512 does not bear upon whether a service provider is or is not an infringer when its conduct falls within the scope of new Section 512. Even if a service provider's activities fall outside the limitations on liability specified in the bill, the service provider is not necessarily an infringer; liability in these circumstances would be adjudicated based on the doctrines of direct, vicarious or contributory liability for infringement as they are articulated in the Copyright Act and in the court decisions interpreting and applying that statute, which are unchanged by new Section 512. In the event that a service provider does not qualify for the limitation on liability, it still may claim all of the defenses available to it under current law. New section 512 simply defines the circumstances under which a service provider, as defined in this new Section, may enjoy a limitation on liability for copyright infringement.

H.R. REP. NO. 105-551, pt. 2, at 64 (1998).


The Field court, before analyzing the applicability of the caching safe harbor, concluded that Google's conduct didn't constitute direct infringement, on the grounds that it was automated and non-volitional.123 It also held that even assuming that Google engaged in direct infringement, it had successfully established the defenses of implied license, estoppel and fair use.124 It may thus seem that, from a substantive point of view, the availability of the caching safe harbor will hardly be relevant for a search engine's cache, since, according to Field v. Google, it appears to be either non-infringing or already protected by other defenses.

However, Field v. Google might not provide reliable guidance for future claims on the issue of the underlying liability of a search engine's cache. First, as for direct liability, the approach taken by the Field court appears to be in contrast with more recent judicial decisions, such as the Ninth Circuit's opinion in Perfect 10 v. Amazon.125 Second, the issue of secondary liability was not analyzed at all in Field v. Google, since plaintiff Field didn't claim that Google was either contributorily or vicariously liable.126 The analysis of the defenses asserted in Field v. Google might also provide limited guidance. While those defenses seem appropriate in relation to the particular facts of the case, different conclusions might be reached on different facts. This seems to be particularly clear in relation to fair use, due to its inherent unpredictability.127

122 In addition, even if there were other defenses available, a safe harbor provision would still be relevant from a procedural point of view, since it is likely to be less burdensome for a defendant to assert the applicability of the safe harbor than other defenses. 123 See Field, 412 F.Supp.2d at 1115. 124 See Field, 412 F.Supp.2d at 1114-15. 125 Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146 (9th Cir. 2007). 126 See Field, 412 F.Supp.2d at 1114, n.8. 127 On the problem of fair use uncertainty, particularly when it comes to uses of new technologies, see, e.g., Michael W. Carroll, Fixing Fair Use, 85 N.C. L. REV. 1087 (2007).


But the outcome of the other defenses could also turn out to be different, for example in a context where the materials posted on the original website were infringing in the first place. Let us briefly consider these issues.

As for direct liability, the plaintiff in Field v. Google exclusively claimed that Google engaged in direct infringement when it showed the cached copies to a user who clicked on the "Cached" links.128 He didn't contend that the initial copies made by Google's robot and stored on Google's servers were infringing, and thus this point was not considered by the court.129 Plaintiff Field contended that Google's response to a user who clicks on a "Cached" link entails both an unauthorized reproduction and an unauthorized distribution of the work.130 The court dismissed this claim, stating that

[W]hen a user requests a Web page contained in the Google cache by clicking on a “Cached” link, it is the user, not Google, who creates and downloads a copy of the cached Web page. Google is passive in this process. Google’s computers respond automatically to the user’s request. Without the user’s request, the copy would not be created and sent to the user, and the alleged infringement at issue in this case would not occur. The automated, non-volitional conduct by Google in response to a user’s request does not constitute direct infringement under the Copyright Act. See, e.g., Religious Tech. Ctr., 907 F.Supp. at 1369—70 (direct infringement requires a volitional act by defendant; automated copying by machines occasioned by others not sufficient); CoStar Group, 373 F.3d at 555; Sega Enters. LTD v. MAPHIA, 948 F.Supp. 923, 931—32 (N.D.Cal.1996). Summary judgment of non-infringement in Google’s favor is thus appropriate.131

See also, generally, Barton Beebe, An Empirical Study Of U.S. Copyright Fair Use Opinions, 1978-2005, 156 U. PA. L. REV. 549 (2008). 128 See id. At least, this is the only claim considered by the Field court. Actually, however, the First Amended Complaint listed also as a defendant’s acts of infringement the mere fact of making available the cached copies through “Cached” links. See First Amended Complaint for Copyright Infringement at 4:6-7, Field 412 F.Supp.2d 1106 (No. 5-1568). 129 See id. at 1115. 130 See id. Plaintiff Field claimed that some user actually clicked on the “Cached” links and retrieved copies of his works. Ironically enough, this user turned out to be Field himself—only to make it even more difficult for the court to accept his claim of statutory damages. See Field deposition, supra note 88 at 111. 131 See Field, 412 F.Supp.2d at 1115. Criticizing this ruling, see Rebecca Bolin, Locking Down The Library: How Copyright, Contract, And Cybertrespass Block Internet Archiving, 29 HASTINGS COMM. & ENT L.J. 1, 24 (2006) (“This volition exception is even more painful than Netcom’s, and extended even slightly beyond the facts at issue leads to the absurd result that your robot or other code can be your non-volitional agent.”).


In Perfect 10 v. Amazon, however, the Ninth Circuit held that Google's image search engine, when showing thumbnail images to a user who runs an image search query, publicly displays those images, and that there is thus a prima facie direct infringement of this exclusive right:

The computer owner shows a copy “by means of a . . . device or process” when the owner uses the computer to fill the computer screen with the photographic image stored on that computer, or by communicating the stored image electronically to another person’s computer. 17 U.S.C. § 101. In sum, based on the plain language of the statute, a person displays a photographic image by using a computer to fill a computer screen with a copy of the photographic image fixed in the computer’s memory. There is no dispute that Google’s computers store thumbnail versions of Perfect 10’s copyrighted images and communicate copies of those thumbnails to Google’s users. Therefore, Perfect 10 has made a prima facie case that Google’s communication of its stored thumbnail images directly infringes Perfect 10’s display right. 132

Moreover, since those thumbnail images are stored by Google on its own initiative, the Ninth Circuit distinguished this situation from one where an entity like a bulletin board passively stores and communicates the content uploaded by users:

Because Google initiates and controls the storage and communication of these thumbnail images, we do not address whether an entity that merely passively owns and manages an Internet bulletin board or similar system violates a copyright owner’s display and distribution rights when the users of the bulletin board or similar system post infringing works. Cf. CoStar Group, Inc. v. LoopNet, Inc., 373 F.3d 544 (4th Cir.2004).133

This holding strongly implies that Google's active role in storing the thumbnail images precludes the conclusion that its conduct lacked the volition necessary for a finding of direct infringement—a volition that was missing in CoStar v. LoopNet. Arguably, under this approach, neither could a lack of volition be found in Google's storage and communication of its cached copies,134 and thus the approach adopted by the Ninth Circuit seems to be directly opposed to the one followed by the Field court.135

132 Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146, 1160 (9th Cir. 2007) (footnote omitted). The Ninth Circuit opinion also implies that the search engine violates the distribution right when it shows the thumbnail images to the users. See id. at 1162-63. 133 Id. n.6.


Even if a plaintiff shows a prima facie case of direct infringement, the defendant may establish that its use of cached copies is a fair use and thus not infringing.136 While a fair use defense might prevail in many cases, the result is difficult to predict. A fair use defense must be analyzed on a case-by-case basis, and the outcome is uncertain. Moreover, in a preliminary injunction motion, "once the moving party has carried its burden of showing a likelihood of success on the merits, the burden shifts to the non-moving party to show a likelihood that its affirmative defense [such as fair use] will succeed."137

In Field v. Google the content stored and made available through "Cached" links was copied from the site where it had been legally posted by its copyright owner, who allowed free access to that material and didn't exploit it commercially. Let us envision, however, a different situation. Imagine that the copyright owner of a work of authorship is exploiting it commercially and does not provide free access to it. Imagine as well that someone copies the copyrighted material and publishes it on a website without the authorization of the copyright owner. Consider further that a search engine makes a copy of the material posted on the infringing website, stores it in its cache, and makes it available through a "Cached" link. Imagine finally that the copyright owner—and not the publisher of the infringing website from which the material has been cached—brings a lawsuit against the search engine.

134 See Matthew D. Lawless, Comment, Against Internet Volition, 18 ALB. L.J. SCI. & TECH. (forthcoming, 2008) (on file with the author). The author explains how in Parker v. Google, Inc., 422 F. Supp. 2d 492, 497 (D. Pa. 2006), in order to find that a search engine's activity was non-volitional, the court considered only the first part of the Netcom test—that the copying is automatic—and omitted the second part of it—that it has to be caused by a third party request. He acknowledges that "[i]f the court had applied that portion of the volition test, it could hardly have failed to recognize the affirmative steps involved in Google's directing its bots to troll and cache the Web." See Lawless, supra, at __. However, since—following the Fourth Circuit approach in CoStar Group, Inc. v. LoopNet, Inc., 373 F.3d 544, 550 (4th Cir. 2004)—volition essentially means "a nexus sufficiently close and causal to the illegal copying", and this would not be the case of a search engine's activity, he suggests that the understanding of the "third party request" prong could be broadened so that there is no need to skip this part of the test to find a lack of volition. Lawless thus suggests that it could be understood that a website owner who doesn't opt out of the search engine crawling and copying is in a way requesting the search engine to index and cache her site, and thus the search engine would be just automatically responding to a third party request. Id. at __. 135 As we see below, the finding of a prima facie direct infringement by a search engine's cache would refer only to the text contained in the cached copies, and not to the images, because the images are not stored but merely in-line linked. 136 See 17 U.S.C. § 107 (2000) (". . . the fair use of a copyrighted work, . . . is not an infringement of copyright."). 137 See Perfect 10, 508 F.3d at 1158 (citing Gonzales v. O Centro Espirita Beneficente Uniao do Vegetal, 546 U.S. 418, 429 (2006)). In Perfect 10 v. Amazon the Ninth Circuit established that in a preliminary injunction motion, the defendant bears the burden of demonstrating a likelihood that its fair use defense will succeed, and thus reversed the district court ruling that plaintiff Perfect 10 also had the burden of demonstrating a likelihood of overcoming defendant Google's fair use defense. See Perfect 10, 508 F.3d at 1158.


A situation like this seems likely to affect the fair use analysis. Certainly, the mere fact that the cached copy has been taken from an infringing site does not in itself prevent a finding of fair use.138 However, the circumstances in this hypothetical situation could be relevant to the first statutory factor,139 when assessing whether the use of the cached copies is superseding rather than transformative.140 Second, and most importantly, they could also affect the analysis of the fourth statutory factor, which refers to "the effect of the use upon the potential market for or value of the copyrighted work".141 The Field court held that "[t]he fourth fair use factor cuts strongly in favor of fair use in the absence of any evidence of an impact on a potential market for Field's copyrighted works."142 It seems then that a different conclusion might be reached if the copyright owner is commercially exploiting the works unlawfully posted on the original website and then cached by the search engine.143 In addition, the Field court's analysis of an additional factor dealing with defendant's good faith suggests that this additional factor would be neutral in a case in which the copyright owner didn't take any affirmative steps to have her works included in the search engine's cache, but the search engine's robot copied the works from an infringing website.144 To be sure, these elements will not necessarily preclude a fair use defense, but they may make the outcome even more uncertain.

138 In Perfect 10 v. Amazon, the Ninth Circuit rejected Perfect 10’s argument that the use of thumbnails stemming from infringing websites is inherently not fair use, and stated that

Unlike the alleged infringers in Video Pipeline and Atari Games, who intentionally misappropriated the copyright owners’ works for the purpose of commercial exploitation, Google is operating a comprehensive search engine that only incidentally indexes infringing websites. This incidental impact does not amount to an abuse of the good faith and fair dealing underpinnings of the fair use doctrine. Accordingly, we conclude that Google’s inclusion of thumbnail images derived from infringing websites in its Internet-wide search engine activities does not preclude Google from raising a fair use defense.

Id. at 1164, n.8. 139 17 U.S.C. § 107(1) (2000). 140 See Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146, 1165-66 (9th Cir. 2007). 141 17 U.S.C. § 107(4) (2000). On the practical importance of this factor, see Beebe, supra note 127, at 582 ("[M]ost courts and commentators assume that, in practice, the outcome of the section 107 test relies primarily on the outcome of the fourth factor"). In his empirical analysis of fair use opinions, he found that "the outcome of factor four coincided with the outcome of the overall test in 83.8% of the 297 dispositive opinions" examined, and compared to the influence of other factors, he notes that "[t]hough hardly conclusive, this breakdown is consistent with the conventional view that factor four exerts the stronger influence on the outcome of the test." See id. at 584-85. 142 See Field, 412 F.Supp.2d at 1123. 143 In Perfect 10 v. Google, Inc., 416 F.Supp.2d 828 (C.D. Cal. 2006), the district court held that

Google’s use of thumbnails likely does harm the potential market for the downloading of P10’s reduced-size images onto cell phones.


Moreover, a situation like the one just described would also affect the availability of other defenses. Indeed, if the claim against the search engine is brought by a copyright owner who didn't make the material freely available online—the search engine having taken the snapshot from an infringing website—it seems clear that neither the implied license nor the estoppel defense could be sustained.145

In addition, the risk of secondary liability—though not claimed in Field v. Google—should also be considered. Again, the Ninth Circuit opinion in Perfect 10 v. Amazon might be relevant in this respect, as it addresses the issue of secondary liability for in-line links. Upholding the district court decision, the Ninth Circuit applied what the lower court referred to as the "server test".

144 The Field court stated that

Field’s own conduct stands in marked contrast to Google’s good faith. Field took a variety of affirmative steps to get his works included in Google’s search results, where he knew they would be displayed with ‘‘Cached’’ links to Google’s archival copy and he deliberately ignored the protocols that would have instructed Google not to present ‘‘Cached’’ links.

Comparing Field’s conduct with Google’s provides further weight to the scales in favor of a finding of fair use.

Field, 412 F.Supp.2d at 1123. 145 The implied license defense, moreover, might be applied more narrowly by some courts. See e.g. Metro-Goldwyn-Mayer Studios, Inc. v. Grokster, Ltd., 518 F. Supp.2d 1197, 1226 (C.D. Cal. 2007) (“[T]hough not seemingly acknowledged by the district court in Field, the Ninth Circuit has explained that the implied license doctrine in copyright cases is to be very narrowly construed”).


Under that test, the court held that since the thumbnail images shown in Google's image search results were actually stored on Google's servers, those thumbnails constituted a prima facie infringement of the display and distribution rights—although the use was ultimately deemed fair. Conversely, Google was not directly infringing those rights when it in-line linked to full-sized images, because those were not stored on its servers.146 The Ninth Circuit expressly applied this test to Google's cache. As seen above,147 a cached copy, such as the ones made available in search results through a "Cached" link, consists only of the HTML code of the cached web page. The images shown in a cached copy are retrieved from the originating source by means of an in-line link, following the HTML instructions included in that page. Thus the Ninth Circuit stated that:

Because Google’s cache merely stores the text of web pages, our analysis of whether Google’s search engine program potentially infringes Perfect 10’s display and distribution rights is equally applicable to Google’s cache. Perfect 10 is not likely to succeed in showing that a cached web page that in-line links to full-size infringing images violates such rights. For purposes of this analysis, it is irrelevant whether cache copies direct a user’s browser to third-party images that are no longer available on the third party’s website, because it is the website publisher’s computer, rather than Google’s computer, that stores and displays the infringing image.148

While the in-line linked images that appear on the cached copies were not deemed to be direct infringement, they do raise the issue of contributory liability. Thus, even though the district court in Perfect 10 v. Google had ruled that Perfect 10 was not likely to succeed on the merits of its secondary liability claims with respect to the in-line linking to infringing full-size images, the Ninth Circuit reversed this holding with respect to the contributory infringement claim, stating that the district court “failed to consider whether Google and Amazon.com knew of infringing activities yet failed to take reasonable and feasible steps to refrain from providing access to infringing images.”149

146 See Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146, 1159-63 (9th Cir. 2007). 147 See supra note 19. 148 See Perfect 10, 508 F.3d at 1162. 149 Id. at 1177. The test for contributory infringement used by the Ninth Circuit was the following:

[A] computer system operator can be held contributorily liable if it “has actual knowledge that specific infringing material is available using its system,” Napster, 239 F.3d at 1022, and can “take simple measures to prevent further damage” to copyrighted works, Netcom, 907 F.Supp. at 1375, yet continues to provide access to infringing works.


Interestingly enough, the Ninth Circuit referred to the DMCA safe harbors in this way:

Due to this error, the district court did not consider whether Google and Amazon.com are entitled to the limitations on liability set forth in title II of the DMCA. The question whether Google and Amazon.com are secondarily liable, and whether they can limit that liability pursuant to title II of the DMCA, raise fact-intensive inquiries, potentially requiring further fact finding, and thus can best be resolved by the district court on remand. We therefore remand this matter to the district court for further proceedings consistent with this decision.

To sum up, it appears that at least in some cases the operation of a search engine's cache might be deemed to constitute a prima facie direct infringement of copyright,150 that the defenses against this finding—namely the implied license, estoppel and fair use defenses—might fall short in certain situations, and that there is also a risk of secondary liability for the in-line links contained in the cached copies. As a consequence, the availability of a safe harbor does matter—not only procedurally but also substantively. Therefore, the fact that—as discussed above—a search engine's cache falls outside the caching safe harbor raises the question of whether an amendment of the DMCA safe harbors might be advisable.

Id. at 1172 (emphasis in original). Applying this test to the case before the court, the Ninth Circuit reasoned as follows:

Here, the district court held that even assuming Google had actual knowledge of infringing material available on its system, Google did not materially contribute to infringing conduct because it did not undertake any substantial promotional or advertising efforts to encourage visits to infringing websites, nor provide a significant revenue stream to the infringing websites. Perfect 10, 416 F.Supp.2d at 854-56. This analysis is erroneous. There is no dispute that Google substantially assists websites to distribute their infringing copies to a worldwide market and assists a worldwide audience of users to access infringing materials. We cannot discount the effect of such a service on copyright owners, even though Google’s assistance is available to all websites, not just infringing ones. Applying our test, Google could be held contributorily liable if it had knowledge that infringing Perfect 10 images were available using its search engine, could take simple measures to prevent further damage to Perfect 10’s copyrighted works, and failed to take such steps.

The district court did not resolve the factual disputes over the adequacy of Perfect 10’s notices to Google and Google’s responses to these notices. Moreover, there are factual disputes over whether there are reasonable and feasible means for Google to refrain from providing access to infringing images. Therefore, we must remand this claim to the district court for further consideration whether Perfect 10 would likely succeed in establishing that Google was contributorily liable for in-line linking to full-size infringing images under the test enunciated today.

Id. at 1172-73 (footnote omitted). 150 From a comparative point of view it is interesting to refer again to the Belgian case Copiepresse v. Google (Tribunal de Première Instance de Bruxelles, Feb. 13, 2007, No. 06/10.928/C). Copiepresse SCRL, the collective rights management society representing Belgian publishers of daily newspapers in French and German, filed a lawsuit against Google—joined later on by other collective rights societies—seeking an injunction to remove from Google News and from Google’s Cache any articles and images owned by the


publishers represented by Copiepresse. A preliminary injunction against Google was issued on September 5, 2006. For the most part, this injunction was upheld by the final ruling, rendered on February 13, 2007—currently under appeal. The Court of First Instance of Brussels held that the reproduction of articles' headlines and excerpts of articles by Google News, and also the storage of articles and documents making them available to the public through "Cached" links, constitute violations of the Belgian Act on Authors' Rights and Neighboring Rights (Loi relative au droit d'auteur et aux droits voisins) of June 30, 1994. It also held that Google cannot rely on any exception provided by the statute. Google was ordered to remove all these materials from Google News and also to remove the "Cached" links from its search results. As for the Google cache, the court noted that Google's robots make a copy of each web page they visit, and that this copy is stored in Google's memory and made available to Internet users through a "Cached" link—"en cache"—and held that this is a reproduction of the work and a communication to the public in the sense of Article 1 of the Belgian Act on Authors' Rights and Neighboring Rights. This provision grants the author the exclusive right to reproduce or authorize the reproduction of the work in any way and under any form whatsoever. As amended in 2005, this article also grants the author the exclusive right to communicate the work to the public by any means, including the making available to the public of the work in such a way that members of the public may access the work from a place and at a time individually chosen by them.

As in the Field case, Google argued that it is the user, and not Google, who creates a copy of the work, and that the user is thus the author of the eventual reproduction and communication to the public, while Google limits itself to providing the facilities that allow Internet users to make such a communication to the public. Unlike in Field v. Google, the Belgian court rejected this argument. It underscored that Google is the author of the first reproduction—the copy made by the robot and stored in Google's memory. It also pointed out that it is Google that makes this copy available on its own site through the "en cache" link—as opposed to the links that send the user to the originating site—and that therefore Google's role is not limited to the mere provision of facilities. The court thus held that Google engages in a reproduction and in making available to the public the copy stored in its memory. Google contended that by failing to adopt the technical measures to opt out, website publishers had granted an implied license to index the pages and to make them available through "Cached" links, but—again unlike in Field v. Google—the court didn't accept this argument, stating that the authorization needs to be obtained with certainty and before engaging in the activity, which it didn't consider to be the case here.

By contrast, a German court decided a case similar to Kelly v. Arriba Soft and Perfect 10 v. Google, dealing with the thumbnail images stored by Google and displayed in its image search functionality, and found the use had been authorized by means of an implied license. In A Painter v. Google, 3 O 1108/05 Landgericht Erfurt, March 15, 2007, a German painter voluntarily made her artwork available on her own website, and Google's robot stored thumbnails of the images and showed them to users through the image search function. The German court held that while the display of thumbnails was prima facie infringing, the plaintiff could have easily avoided this display by placing a robots.txt file directing Google's robot not to do so. By failing to take this step, the website owner granted an implied license for Google to store and display the thumbnail images. See Ben Allgrove, The Search Engine's Dilemma: Implied License or Crawl and Cache?, 2 JOURNAL OF INTELLECTUAL PROPERTY LAW & PRACTICE 437 (2007) (UK).


VI. AMENDING THE SAFE HARBOR REGIME?

The DMCA safe harbors didn’t foresee how technology would evolve in the future, and thus some current technological activities related to the transmission, storage and location of third party materials are not able to successfully invoke the safe harbor regime. It is not an easy task to discern whether some of these new functions—and particularly the “Cached” links feature—would have been included in the scope of the safe harbors if they had been already in place at the time the DMCA was negotiated and drafted.

Unlike proxy caching—covered by § 512(b)—the "Cached" links feature is neither a key function for enhancing network efficiency nor a crucial service such as the basic one provided by a search engine when presenting a list of search results and links to the relevant web pages—a function covered by the information location tools safe harbor in § 512(d).151 Rather, it is just an additional service, one that turns out to be useful for users—and thus also economically advantageous for the search engine.

On the other hand, search engines' "Cached" links don't appear to be significantly harmful to copyright owners when it comes to cached copies of their own web pages. The ease of opting out, the implied license arguably granted by those who put materials online, and the likelihood of a finding of fair use all seem to weigh in favor of the innocuousness of this practice. Certainly, in some instances a website owner may want to make some material freely available online only for a very short period of time, such as a digital newspaper that offers for free the current day's news and requires a subscription to see older editions. Such a model might be harmed by the "Cached" links feature, since it would allow users to see for free an old page—the cached copy—which the website owner no longer provides free of charge or without registration on its own website. However, a publisher that runs such a business model is likely to be able to easily opt out of the "Cached" links feature through any of the means already described.152

151 On the importance of search engines and other “categorizers” to overcome the problem of information overload see Frank Pasquale, Copyright in an Era of Information Overload: Toward the Privileging of Categorizers, 60 VAND. L. REV. 135 (2007) (conceptualizing information overload as an externality caused by copyrighted works and calling for a more favorable copyright treatment of “categorizers”).


Certainly, however, some less sophisticated publishers may be unable to use some of the means to opt out. For example, one who publishes a blog using the services of certain blog platforms will be able neither to place a robots.txt file on the server root nor to introduce meta-tags in the HTML code of the web page. Yet, in the arguably unlikely case that such a blogger wanted to be left out of the search engine's cache, she would normally be able to opt out either by contacting the search engine directly or by resorting to a removal procedure.153 More generally, content providers have sometimes fought against specialized search engines and content aggregators to avoid being crawled,154 but there has been very little litigation challenging the provision of "Cached" links to non-infringing web pages by general-purpose search engines.155

152 An example of this can be found on the New York Times website, which requires registration to see some old articles. Search engines—at least as of April 30, 2008—do not provide "Cached" links for web pages from www.nytimes.com, because their HTML code—even the home page's—contains the no-archive meta-tag <meta name="robots" content="noarchive"/> which, as discussed supra in Part II.B, directs search engines not to display "Cached" links. Some parts of the New York Times website are closed to robots as well through a robots.txt file. See http://www.nytimes.com/robots.txt (last visited April 30, 2008). 153 See supra note 35. But still, see Bolin, supra note 131, at 28 ("Unsophisticated authors lack the capacity to make a conscious decision to waive their rights, and the Googlebot has no way of knowing whether a user is unsophisticated or consciously omitted the exclusion notice."). 154 See Grimmelmann, supra note 29, at 24-30 (discussing the content providers' interests in avoiding the costs of unnecessary indexing burdens and in preventing unfair competition when their contents are delivered directly to users by search engines and content aggregators). See also Niva Elkin-Koren, Let The Crawlers Crawl: On Virtual Gatekeepers And The Right To Exclude Indexing, 26 U. DAYTON L. REV. 179 (2001) (discussing eBay, Inc. v. Bidder's Edge, Inc., 100 F. Supp. 2d 1058 (N.D. Cal. 2000), where the auction site eBay tried to prevent the use of its database by data aggregators that allow users to conduct searches across different auction sites simultaneously). 155 Field v. Google Inc., 412 F.Supp.2d 1106 (D.Nev. 2006), is probably the only U.S. case so far in which a copyright owner made his material available on the Internet and then brought a copyright infringement claim on account of the cached copies of his web pages stored and made available by a search engine's cache. In Parker v. Google, Inc., 422 F.Supp.2d 492, 498 (E.D.Pa. 2006), aff'd, 2007 WL 1989660 (3d Cir. 2007), the plaintiff was not bringing an action against Google on account of cached copies of his own web pages. The ruling made a reference to Field v. Google, holding that Google's "system caching activities" are covered by the DMCA caching safe harbor, but the Parker court was not considering the "Cached" links feature; rather, it was considering the initial copying made for indexing purposes: "while it is not entirely clear from Plaintiff's rambling Complaint, should Parker be claiming direct copyright infringement based on Google's automatic caching of web pages as a means of indexing websites and producing results to search queries, this activity does not constitute direct infringement either." Id. at 497-98. Parker, a pro se plaintiff, had posted on the USENET one of his copyrighted works, which was stored and made available by Google as part of its USENET service. He asserted—among other claims—that Google's archiving of this posting was a direct infringement of his copyright. He also claimed that Google engaged in direct copyright infringement when, in response to a user's search query, it displayed a list of search results and excerpted his website in that list. Citing Religious Tech. Ctr. v. Netcom On-Line Commc'n Servs., Inc., 907 F.Supp. 1361 (N.D.Cal.1995), the court concluded that "Google's automatic archiving of USENET postings and excerpting of websites in its results to users' search queries do not include the necessary volitional element to constitute direct copyright infringement." See Parker, 422 F.Supp.2d at 497. Parker did assert some claims against Google's cache, but they were not claims of copyright infringement; rather, they were claims of defamation, invasion of privacy and negligence, stemming from some defamatory statements located on a website cached by Google. Those claims were rejected on the basis of § 230 of the Communications Decency Act (47 U.S.C. § 230). Id. at 500-01.


Furthermore, while there have been a number of cease-and-desist letters sent to search engines on account of “Cached” links, almost all of them refer to cached copies of third party infringing web pages.156

While cached copies of legitimate websites might not be particularly harmful to copyright owners who make their content available online, a different situation arises when it comes to cached copies of third-party infringing websites. Certainly, it appears that there has been very little litigation here, either.157 However, a number of cease-and-desist letters sent to Google referring in some way to the Google cache have been reported on the Chilling Effects website.158 I have found 66 notices of this kind,159 which can be classified as follows. Firstly, some notices expressly ask Google to remove a list of infringing web pages both from the search results and from the cache. I have found 28 of these notices. Some of them identify as the infringing work the URL (Uniform Resource Locator) of the allegedly infringing web pages, while others also include the URL of the cached copy stored on Google's server—or attach a printout of the cached copy. Interestingly, one of the latter specifically warns Google that the cached copy would constitute direct copyright infringement, while the link to the actual infringing page in the search results would be contributory infringement.160

156 I have been able to identify only one cease-and-desist letter among those reported on the Chilling Effects website where the sender seems to ask Google to remove cached copies of some of its own web pages. See Chilling Effects Clearinghouse, CNET Complains of Win Pro in Google Cache (Aug. 28, 2003), http://www.chillingeffects.org/dmca512/notice.cgi?NoticeID=845 (last visited April 28, 2008). 157 If we exclude other cases dealing with the provision of thumbnail images—which as noted supra in note 19 is a function different from the "Cached" links feature—it seems that the only case so far discussing copyright infringement for links to cached copies taken from third party infringing web pages has been the Perfect 10 case, and even there the discussion concerned just the cached copies' in-line links to images stored elsewhere. See Perfect 10, 508 F.3d at 1162. 158 See Chilling Effects Clearinghouse, http://www.chillingeffects.org/. 159 The total number of notices sent to Google and reported on the Chilling Effects website is 811, which includes not only notices for links but also notices for material hosted on different Google services such as Blogspot or Google Groups. See Chilling Effects Clearinghouse, http://www.chillingeffects.org/dmca512/notice.cgi (last visited April 28, 2008).


Secondly, on 21 other occasions, the cease-and-desist letters do not explicitly request removal of the cached copy, but they certainly do so implicitly, either by ticking the URL of Google's cached copies as the identified infringing material or by simply mentioning that the infringing web pages have been indexed and cached by Google. I include here nine notices sent to Google by Perfect 10 complaining that certain infringing websites have "thousands of Perfect 10 infringements available by clicking on the Google cache link".161 There is a third group of seven notices which seem to ask only for the removal of the cached copies—and not for the removal of the links to the actual infringing web pages. Some of these notices seem to emphasize the fact that the cached copy is hosted by Google. These include a notice sent by Microsoft in 2003 asking Google to remove the cached copy of a website that was offering "cracks" or product keys to circumvent technological protection measures of Microsoft's software.162 There are also four cease-and-desist letters stating that the infringing contents had already been removed from the third-party web pages, but that they lingered on in Google's cached copies, and thus asking for the removal of those copies from Google's cache. In the case of another five notices, the allegedly infringing web pages were themselves hosted by Google—in services such as Blogspot, Google Groups and others—and the cease-and-desist letters required the removal of both the infringing materials and the cached copies available from Google's cache. Finally, there is also one reported cease-and-desist letter sent not to Google but to a website publisher, in which the sender requires that the recipient remove all the infringing content from his website, "as well as from any site that mirrors or caches your site".163

It must be noted that even though it is understandable that a copyright owner of material unlawfully posted on a third-party website will ask the search engine to remove the "Cached" link, the § 512 notice-and-take-down regime doesn't appear to provide a legal basis for those cease-and-desist letters. The subject matter of the notifications contemplated by § 512 is limited to the elements that are actually covered by the safe harbors in subsections 512(b), (c) and (d), and this doesn't include a search engine's "Cached" links. Put another way, by means of a § 512 cease-and-desist letter one may ask to remove, or to disable access to, materials—including a reference or link—that reside on the service provider's system or network and are the object of the limitation of liability provided by the safe harbor. The removal of these materials following a notification precisely constitutes a condition for the safe harbor to apply.

160 See Chilling Effects Clearinghouse, Notice to Google re "Vivian's Vow" (March 15, 2002), http://www.chillingeffects.org/dmca512/notice.cgi?NoticeID=273 (last visited April 28, 2008). 161 See Chilling Effects Clearinghouse, http://www.chillingeffects.org/dmca512/notice.cgi (last visited April 28, 2008). 162 See Chilling Effects Clearinghouse, Microsoft Complains of Product Key in Google Cache (Mar. 12, 2003), http://www.chillingeffects.org/dmca512/notice.cgi?NoticeID=586 (last visited April 28, 2008). 163 See Chilling Effects Clearinghouse, Associated Press challenges posting of articles (Oct. 13, 2003), http://www.chillingeffects.org/fairuse/notice.cgi?NoticeID=918 (last visited April 28, 2008).


Yet this is not the case with "Cached" links, because they are not the object of a liability limitation. We have already seen that they are not covered by § 512(b). Nor are they covered by the safe harbors in sections 512(c) and (d). Briefly stated, a search engine's cache doesn't fall under the safe harbor provision for hosting, set forth in § 512(c), because the storage of cached copies is not—as required by § 512(c)(1)—a "storage at the direction of a user".164 And search engine caches fall outside the information location tools safe harbor provided by § 512(d) as well, because this provision only applies to "the provider referring or linking users to an online location containing infringing material or infringing activity, by using information location tools, including a directory, index, reference, pointer, or hypertext link,"165 and it seems clear that this must be interpreted as linking or referring to a third-party location, not to material hosted by the search engine itself—as in the case of a "Cached" link.

A closer look at these issues leads us to a more precise question: the difference between a case in which a service provider stores infringing material but doesn't allow access to it, and a case in which a service provider stores infringing material and allows access to it. The DMCA hosting safe harbor—§ 512(c)—grants a liability limitation to the service provider even if it continues to store the infringing material after receiving a notification of claimed infringement, as long as it disables access to the stored material.166 Arguably, thus, the mere storage of infringing material, provided that access to it has been disabled, doesn't prevent the application of the hosting safe harbor. The storage of a cached copy of an infringing web page by a search engine, however, even if the search engine doesn't allow access to that copy—i.e., doesn't provide a "Cached" link—will not be covered by the hosting safe harbor, because, as already pointed out, it would not be a "storage at the direction of a user".167 I submit, however, that this mere storage of a cached copy, without providing access to it, is actually already covered by the information location tools safe harbor. The argument for this is a technical one: in the current state of technology, search engines need to make cached copies of all the web pages they access in order to build an index with which to perform search queries and provide search results.168

164 See 17 U.S.C. § 512(c)(1) (2000). 165 See 17 U.S.C. § 512(d) (2000). 166 See 17 U.S.C. § 512(c)(1)(C) (2000). 167 See 17 U.S.C. § 512(c)(1) (2000).


Arguably, to be consistent with the liability limitation granted to the provision of search results, the information location tools safe harbor needs to cover also the prior technical steps the search engine must take in order to provide the protected service.169 What will still remain unprotected by any of the current DMCA safe harbors is the provision of the "Cached" link, which indeed appears to be the weakest link.
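The technical argument can be illustrated with a minimal sketch (in Python; the URLs and page texts are invented) of how an inverted index is built from copies a crawler has already fetched; without the stored text of each page there would be nothing to tokenize and index:

    from collections import defaultdict

    # Stand-ins for page copies a crawler has already fetched and stored.
    pages = {
        "http://a.example/": "search engines cache web pages",
        "http://b.example/": "cached pages may reference images",
    }

    # Build an inverted index: term -> set of URLs whose stored copy contains it.
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)

    print(sorted(index["pages"]))  # both stored copies contain the term "pages"

The intermediate copy is thus a precondition of the indexing function itself, quite apart from whether a "Cached" link is ever offered to users.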

As previously noted, since the "Cached" links are not covered by any safe harbor, in strict terms they cannot be the object of a cease-and-desist letter based on the § 512 notice-and-take-down procedure. However, the provision of "Cached" links—at least under today's practice—only takes place along with the provision of the search results—i.e., the "Cached" link appears along with the main link to the relevant page and the little snippet from it. As a consequence, if the search engine removes a particular web page from its search results, this will also cause the removal of the "Cached" link, for it doesn't stand alone. As a practical matter, thus, even though a copyright owner is not able to opt out of the "Cached" links feature when it comes to cached copies of third-party infringing sites—because she doesn't have control over the third-party web page—she can still attain this result with a § 512(d) cease-and-desist letter asking for the removal of the main link to that third-party web page from the search results. Probably, the only scenario in which a copyright owner would not be able to attain that result would be the case in which the infringing content has already been removed from the third-party web page—so that there is no room for a § 512(d) cease-and-desist letter—but the cached copy still shows the unauthorized material.

In any event, the fact that a copyright owner can achieve in practice the effective removal of the “Cached” links doesn’t solve the problem of the search engine’s liability for those links before they are removed, and thus it raises again the question of whether the DMCA safe harbor regime should be amended to cover the “Cached” links feature.

The preceding analysis has shown how a “Cached” link to a copy of an infringing web page may certainly harm the copyright owner. Arguably, however, this harm doesn’t go significantly beyond the harm that may be caused by the main link to the actual infringing page provided in the search results—an activity Congress deemed worthy of a limitation of liability, set forth in § 512(d).170

168 See Grimmelmann, supra note 29, at 27 (“A search engine’s spidering processes require making at least one initial copy of any content the engine wishes to index”).
169 But cf. Jonathan Band, Google and Fair Use, 3 J. BUS. & TECH. L. 1, 4 (2008) (“[The] list of activities [in § 512(d)] does not explicitly include copying expression into a search database.”).

This suggests that an amendment of the DMCA safe harbor regime to cover the “Cached” links feature might be appropriate. This Article, however, doesn’t intend to make or endorse a normative proposal on this point. The definitive assessment of how “Cached” links affect the interests of copyright owners, users, and search engine operators belongs largely to those groups and ultimately to Congress.

Yet, if a limitation of liability for the “Cached” links feature were ultimately deemed desirable, my suggestion is that it could be better accomplished through a reform of the information location tools safe harbor in § 512(d) than by expanding the scope of the current caching safe harbor in § 512(b). In order to grant a liability limitation to search engine caches, some commentators have indeed proposed amending the language of § 512(b) to make sure that it also covers search engine caches.171 However, as discussed above, the rationale behind § 512(b)—granting protection for a technical function deemed necessary for the proper functioning of the network—doesn’t seem apposite to a search engine’s cache. Moreover, some of the specific conditions set forth in § 512(b)—such as the updating requirement172 or the take-down requirement, which applies only where the infringing material has already been removed, or ordered to be removed, from the originating site173—would also be inadequate to deal with a search engine’s cache.

Furthermore, the “Cached” links feature is closely related to the core of the search engine operation. The copy made by the crawler is a necessary step in building the index, and the removal of the main links, as already indicated, will bring about the removal of the “Cached” links as well. This suggests that the safe harbor meant to cover the operation of search engines, i.e., § 512(d), might be the most appropriate one to expand in order to cover the “Cached” links functionality.174

170 It is interesting to note that, in addition to the prominent disclaimer indicating that the “Cached” page is just an old copy of the original web page—and in addition to the links to the actual page included in that disclaimer—the “Cached” copy is not a mirror of the original website, but just a snapshot of a single web page. Thus, if the user clicks on any clickable part of the cached copy, she will not be led to further “cached” material hosted by the search engine, but to the place the hyperlink established by the web page owner was pointing to—normally the original website or some other third-party location.
171 See Urban & Quilter, supra note 37, at 691 (“In light of Blake A. Field v. Google, we recommend clarifying the plain language of § 512(b) to make it clear that any automatic caching of content subordinate to indexing processes or network management would be covered by a straightforward safe harbor.”). See also David Ray, Note, The Copyright Implications of Web Archiving and Caching, 14 SYRACUSE SCI. & TECH. L. REP. 1, 32 (2006), http://sstlr.syr.edu/?p=113 (“[T]he caching provisions under section 512 should be expanded to include caching as practiced by Google”).
172 See 17 U.S.C. § 512(b)(2)(B) (2000).
173 See 17 U.S.C. § 512(b)(2)(E) (2000).
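The snapshot behavior described in footnote 170 can be illustrated with a short sketch. This is only one plausible way to serve a cached copy so that its hyperlinks resolve against the original site rather than the cache; the function name and the banner markup are hypothetical, not a description of how any actual search engine does it.

```python
# Hypothetical sketch of serving a cached snapshot: a <base> element makes
# every relative link in the snapshot resolve against the ORIGINAL site, so
# clicking leads the user away from the cache, as footnote 170 describes.

def serve_cached_snapshot(cached_html: str, original_url: str) -> str:
    disclaimer = (
        f'<div class="cache-banner">This is a snapshot of '
        f'<a href="{original_url}">{original_url}</a> and may be out of date.'
        f'</div>'
    )
    base_tag = f'<base href="{original_url}">'
    # Insert the base tag into <head> and the disclaimer at the top of <body>.
    html = cached_html.replace("<head>", "<head>" + base_tag, 1)
    html = html.replace("<body>", "<body>" + disclaimer, 1)
    return html
```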

VII. CONCLUSION

Although in Field v. Google the court held that the operation of Google’s cache falls under the DMCA caching safe harbor, a close reading of the statutory text and its legislative history shows that this is not so. A search engine’s cache falls outside the scope of that provision because the activity it carries out is not the one contemplated by the safe harbor and because, as a consequence, it is unable to meet some of the key requirements set forth in the statute.

The unavailability of the liability limitation provided by the safe harbor turns out to be relevant for a search engine not only procedurally but also substantively. Its “Cached” links feature is likely to be deemed a prima facie copyright infringement. The defenses of implied license and estoppel don’t seem to be available when the cached content has been taken from infringing websites. Courts are certainly likely to find that a search engine’s cache constitutes a fair use, but that outcome is never certain. Moreover, in some circumstances—particularly where the copyright owner is commercially exploiting the works of authorship and the search engine has cached them from infringing sites—a finding of fair use seems even less predictable. Furthermore, even if the cached copies stored on the search engine’s servers are deemed to be fair use, and thus noninfringing, the fact that those copies in-line link to infringing images stored on the originating server may raise concerns of secondary liability, as the Ninth Circuit’s opinion in Perfect 10 v. Amazon clearly suggests.

It is not easy to envision a reform of the DMCA safe harbor regime in the near future—particularly bearing in mind how the statute was negotiated, resulting in a complex compromise among the stakeholders involved. Nonetheless, if a reform were to be made, granting safe harbor protection to search engine caches seems advisable. Since the “Cached” links feature is directly linked to the operation of search engines, this protection could be better achieved through an amendment of the current information location tools safe harbor than by expanding the scope of the current system caching safe harbor.

174 The new wording would need to deal with the specificities of the operation of the cache. Thus, for example, a requirement to abide by the directions website owners specify through generally accepted practices, such as the robots.txt standard or HTML meta-tags, might be appropriate, and so might a requirement to show a prominent disclaimer indicating the nature of the copy, as search engines already do.
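As a rough illustration of what compliance with such requirements might look like, the following sketch checks the two conventions footnote 174 mentions before a “Cached” link is offered. It assumes only the well-known robots.txt standard and the “noarchive” robots meta-tag; the function name and the decision logic are hypothetical, drawn neither from any statute nor from any engine.

```python
# Minimal sketch, assuming the conventions named in footnote 174: the
# robots.txt standard and the "noarchive" meta-tag. The simplified meta-tag
# regex below is a heuristic, not a full parser.

import re
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def may_offer_cached_link(url: str, html: str) -> bool:
    # First, honor robots.txt: if crawling the page is disallowed,
    # no cached copy should be made or displayed at all.
    robots = RobotFileParser()
    robots.set_url(urljoin(url, "/robots.txt"))
    robots.read()                      # fetches the site's robots.txt
    if not robots.can_fetch("*", url):
        return False
    # Second, honor <meta name="robots" content="noarchive">, the
    # page-level request that engines not expose a cached copy.
    return not re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]*noarchive', html, re.I)
```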
