Google Copyright Transparency Report
Google timed a nice Friday evening release for an update to their policy on copyright infringement.
Starting next week, we will begin taking into account a new signal in our rankings: the number of valid copyright removal notices we receive for any given site. Sites with high numbers of removal notices may appear lower in our results.
Wow. Sounds like trouble. Surely that means that YouTube’s rankings are about to get torched.
Oh, nope. One quick exemption for the video king:
This data presents information specified in requests we received from copyright owners through our web form to remove search results that link to allegedly infringing content. It is a partial historical record that includes more than 95% of the volume of copyright removal requests that we have received for Search since July 2011. It does not include:
- requests submitted by means other than our web form, such as fax or written letter
- requests for products other than Google Search (e.g., requests directed at YouTube or Blogger)
- requests sent to Google Search for content appearing in other Google products (e.g., requests for Search, but specifying YouTube or Blogger URLs).
Only copyright holders know if something is authorized, and only courts can decide if a copyright has been infringed; Google cannot determine whether a particular webpage does or does not violate copyright law. So while this new signal will influence the ranking of some search results, we won’t be removing any pages from search results unless we receive a valid copyright removal notice from the rights owner. And we’ll continue to provide “counter-notice” tools so that those who believe their content has been wrongly removed can get it reinstated. We’ll also continue to be transparent about copyright removals.
YouTube vs Sites Cleaner Than YouTube
And yet, a UK student faces up to 10 years in jail in the US for founding a crowdsourced site which links to sites that allow you to watch TV online.
Kim DotCom suffered a militant raid on his house & had his assets frozen for running MegaUpload, which was a tiny speck of dirt compared to the size of YouTube.
On the copyright front YouTube was rotten from the start:
- “In a July 19, 2005 e-mail to YouTube co-founders Chad Hurley and Jawed Karim, YouTube co-founder Steve Chen wrote: ‘jawed, please stop putting stolen videos on the site. We’re going to have a tough time defending the fact that we’re not liable for the copyrighted material on the site because we didn’t put it up when one of the co-founders is blatantly stealing content from other sites and trying to get everyone to see it.’”
- “Chen twice wrote that 80 percent of user traffic depended on pirated videos. He opposed removing infringing videos on the ground that ‘if you remove the potential copyright infringements… site traffic and virality will drop to maybe 20 percent of what it is.’ Karim proposed they ‘just remove the obviously copyright infringing stuff.’ But Chen again insisted that even if they removed only such obviously infringing clips, site traffic would drop at least 80 percent. (‘if [we] remove all that content[,] we go from 100,000 views a day down to about 20,000 views or maybe even lower’).”
- “In response to YouTube co-founder Chad Hurley’s August 9, 2005 e-mail, YouTube co-founder Steve Chen stated: ‘but we should just keep that stuff on the site. I really don’t see what will happen. what? someone from cnn sees it? he happens to be someone with power? he happens to want to take it down right away. he get in touch with cnn legal. 2 weeks later, we get a cease & desist letter. we take the video down.’”
- “A true smoking gun is a memorandum personally distributed by founder Karim to YouTube’s entire board of directors at a March 22, 2006 board meeting. Its words are pointed, powerful, and unambiguous. Karim told the YouTube board point-blank:
‘As of today episodes and clips of the following well-known shows can still be found: Family Guy, South Park, MTV Cribs, Daily Show, Reno 911, Dave Chapelle. This content is an easy target for critics who claim that copyrighted content is entirely responsible for YouTube’s popularity. Although YouTube is not legally required to monitor content (as we have explained in the press) and complies with DMCA takedown requests, we would benefit from preemptively removing content that is blatantly illegal and likely to attract criticism.’”
- “A month later, [YouTube manager Maryrose] Dunton told another senior YouTube employee in an instant message that ‘the truth of the matter is probably 75-80 percent of our views come from copyrighted material.’ She agreed with the other employee that YouTube has some ‘good original content’ but ‘it’s just such a small percentage.’”
- “In a September 1, 2005 email to YouTube co-founder Steve Chen and all YouTube employees, YouTube co-founder Jawed Karim stated, ‘well, we SHOULD take down any: 1) movies 2) TV shows. we should KEEP: 1) news clips 2) comedy clips (Conan, Leno, etc) 3) music videos. In the future, I’d also reject these last three but not yet.’”
Broader Copyright Questions
There still are a lot of murky questions in Google’s “transparency.”
- If a person embeds an image from Imgur, ImageShack, TinyPic, PhotoBucket or elsewhere & the page that has the hotlink gets a DMCA, how does that count?
- If a brand is large enough does it take many DMCAs to get hit?
- Is there any analysis of the underlying business model of the site? What happens to document storage sites like DocStoc & Scribd, or even image sites like Pinterest?
- What happens to sites that link to penalized sites too frequently?
- What happens to ad networks that frequently fund such copyright violations?
HUGE Impact on the Web
In terms of impact on the web for publishers, this change is every bit as big as Florida, Panda & Penguin. It may not seem so at first (as it will take time for market participants to consider the uses) but this is a huge deal. Consider some of the following scenarios…
- You try to create something like YouTube for another form of content (Pinterest?) and it gets hit as spam for following Google’s lead.
- You offer a free blogging platform that competes with Blogspot, but it gets hit as spam for following Google’s lead.
- You decide to create a project like Google’s book scanning project & you get hit as spam for following Google’s lead.
- You run an ad network & start growing quickly. As you grow some sketchier publishers enter your ad network. Like Google AdSense, a large portion of your ad network is filled with sites that have copyright violations on them. Suddenly working with your ad network gets people hit as spam because your business model is too similar to Google’s.
- You create a new social network & are struggling to compete with Google’s preferential ranking & hard coded placements of their own network. You make your network more open to encourage growth & you get hit as spam.
- If you are Amazon or eBay, you can afford premium featured content to pull up your other listings. But if you can’t afford their cost structure, you hire freelance writers or work with outsourced workers to create some of your content, and they may use some copyrighted work without you knowing. And does Amazon now have to vigilantly review their reviews for plagiarism?
- A competitor licenses some of their content as Creative Commons for years & doesn’t mind wide use of it. Then you use it & one day they see you as a competitive threat and remove their Creative Commons license & bulk DMCA you. Or you have a lifetime syndication deal with a company, they later change the policy & claim that your documents are forged.
- Getty Images presumes you didn’t license an image that you did & files a DMCA. At some point there is no purpose in targeting the webmaster or host… just go direct to Google, knowing that you can create the equivalent of a “patent trolling” styled business model in which it is cheaper for people to pay to have the issue resolved the quick way before a formal complaint is lodged. Some organizations might even have a subscription service set up where you pre-pay for immunity.
- A former employee who wrote content for you claims you used it without permission. Or that same former employee used pirated images & longish quotes from other sources that they didn’t disclose to you that they now highlight via DMCA.
- You license data from a source & they do a mid-contract change leveraging the small print & have a bot lined up to send 40,000 DMCAs against you if you do not agree to the higher pricepoint.
- Google is considering making an investment in your site & you want too much money. As an edge case near the threshold of this copyright limit you know you have immunity if you join the borg, but lack it if you don’t work with them.
- Big media players that play in the gray area will be fine, but smaller sites that try a similar model will be sunk by DMCAs and/or legal fees.
- Your leading competitor realizes that your blog publishes comments by default without editorial review (and that even later review is lax) and then they file DMCA reports against you. Or they could just grab chunks of content from Google’s leaderboard of complainers and post them into your web forum, knowing that those companies will file a DMCA report against you.
- A site has some content public & some behind a paywall. With a page partially indexed, how does Google respond to DMCA requests when the alleged infraction is behind a registration wall or paywall?
- A competitor (inspired by Google no doubt) hires offshore “contractors” to copy your site & then file DMCA reports against you in bulk. How long until people start uploading their own content to file their own DMCAs against certain sites with user generated content?
- Even if your site is 100% legal, a combination of ignorance & crowd-driven vigilante justice can still take you down.
- Any site that offers interactive features & has user generated content is at risk of being labeled as spam unless they have tight editorial control over user generated content. And at the same time, Google can enter vertical after vertical with scrape & displace garbage knowing that they don’t have those editorial costs due to their self-granted blanket immunity.
- If you do not register your sites with Google & counter claims (even bogus ones) then you are seen as being a spammer. And if you register with Google then when they don’t like something one site does they can hit other sites all at the same time. No point going to the host or registrar, go direct to Google & start building up negative karma.
Why did Google feel the need to grant themselves blanket immunity from the policy?
That question was largely missing among the fanboi blogs & journalists who were encouraged by Google’s “transparency.”
24 Karat Pyrite On Sale for Only $100 an Ounce
If YouTube is going to win big, then that’s a great place to invest, right?
Some venture capitalists are investing in YouTube channels, but that is a fool’s game.
- Google is also investing in select channels (like Machinima). It is quite hard to outperform Google in returns while investing into a platform that they control & thus have better data on than you ever could.
- As YouTube’s dominance increases (and it will now that competing platforms with a similar business model will be smeared as spam), you can count on them offering premium partners crappier revenue share deals in years to come. They will offer nice deals to Warner Bros. & such, but the independent smaller players will get cut out of the ecosystem in much the same way as they did in Google’s organic search results.
- Google, prince of transparency (for everyone but Google), requires that premium publishers *not* disclose the terms of their deals: “The Partner Program forbids participants to reveal specifics about their ad-share revenue. Rates can vary depending on the size and demographics of the partner’s audience and an array of other metrics.”
Note that I don’t claim YouTube is a bad host for your own content, but that I am skeptical of applying the VC model to it with a belief that you can out-invest Google on their own site; particularly when they own the dominant platform, control the non-public revenue share rates, invest in competing channels & can offer free promotion + higher rates to anyone they invest into in order to dominate the category.
And the issue isn’t just video either. The same dynamic can apply to just about any other infrastructural layer. For instance, Google could buy out a torrent site (say like uTorrent) and have that site immediately gain immunity for being part of the borg, while other sites that compete now absorb both greater editorial filtering costs & greater risks that destroy their ROI.
As Google continues to lock down search, you can expect more smart publishers to hedge investments in search and YouTube with investments in proprietary non-search applications that Google can’t take away.
The Devil is in the Details
“We are optimistic that Google’s actions will help steer consumers to the myriad legitimate ways for them to access movies and TV shows online, and away from the rogue cyberlockers, peer-to-peer sites, and other outlaw enterprises that steal the hard work of creators across the globe. We will be watching this development closely — the devil is always in the details — and look forward to Google taking further steps to ensure that its services favor legitimate businesses and creators, not thieves.” – Michael O’Leary, Senior Executive Vice President for Global Policy and External Affairs of the Motion Picture Association of America, Inc.
The concern with Google pitching themselves as the preeminent authority on copyright is that they have consistently played both sides of the fence.
- their image search offers thumbnails
- they offer a “quick view” version of PDFs in the SERPs
- their search results offer cached versions of pages
- they have long opposed piracy-related legislation & created a viral campaign against it
- they run a social network (which often outranks the original content source in the search result)
- they run a blog host (which has a long history of being one of the spammiest blog hosts with sketchy content)
- they run YouTube, the video site with the most piracy on it
- they run the ad network which funded the most piracy-driven content (at one point their ads were even on the biggest warez sites)
- they monetized scraper sites like Mahalo for years (still do!)
- they willingly ran AdWords ads for over 50 thousand advertisers selling counterfeit goods
- for years they included piracy-related searches in Google Instant (along with the above mentioned ads) to try to force brands to need to buy their own brand on broad match to try to mask over the problem
- they scanned millions of ebooks without permission (a class-action lawsuit over this is ongoing)
- when they penalize sites with algorithms like Panda or Penguin, they often allow (AdSense funded) scraper sites to outrank the original content source
- they scraped content from Yelp & TripAdvisor to populate their places page while giving those businesses the ultimatum that if they didn’t like it, they should block Googlebot (and it took government scrutiny to curtail the practice)
- they continue to scrape in more information from publishers to power the expansion of their knowledge graph
When Google was competing against YouTube, this was how they viewed copyright internally.
Business Objectives Drive “Relevancy” Signals
Now that Google wants to sell premium content they (sort of) respect copyright (& are willing to hold the rest of the web to a higher standard than themselves to create this impression).
I have long believed that relevancy signals were often politically driven & that internal business development goals often lead or create various signals. Certainly that was obvious when Google+ was hardcoded in the search results. It was equally true when Knol outranked the original content sources. Google frequently pretends to be (belligerently) unaware of externalities, but when the issues impact their own business they gain an elevated sense of importance.
- We saw that when Google mentioned that they created QDF because Google Finance didn’t rank: “Mr. Singhal started to worry that Google’s balance was off. When the company introduced its new stock quotation service, a search for ‘Google Finance’ couldn’t find it.”
- We saw it again when universal search was rolled out only *after* Google bought YouTube.
- We saw it once more when their hard stance on (against?) property rights for a decade quickly flipped like a switch after Android Marketplace & Google Play stunk up the joint from not making enough sales to be significant. Google needs a complete media catalog to become a default purchase hub & they can’t get that until they display that they respect property rights.
And these business objectives not only influence the relevancy algorithms, but also the editorial guidelines.
- Renting a link to try to rank higher for a relevant keyword is sinful, but offering up undisclosed payola to influencers who manipulate the entire media ecosystem is totally reasonable.
- Info harvesters are not allowed to buy AdWords ads & some forms of lead generation are marketed as being spam, but lead generation forms in the search results & overriding user privacy settings in their browser are totally cool.
And even while Google is rolling out this “copyright violators are spammers” algorithm (which they are exempt from) they still chug on with their ebook offering:
“They posted several of my 41 books up as free downloads (some were missing a few pages, at most a single chapter). It took several e-mails from me pointing out that they were infringing copyright before they took them down. During the time my books were free on Google my sales of e-books fell dramatically.” – K C Watkins
When Google started scanning books an internal document stated: “[w]e want web searchers interested in book content to come to Google not Amazon” ... or, as put another way, in that same document, “[e]verything else is secondary … but make money.”