A Google Employee on the "duplicate content penalty"

13 replies
I didn't believe in it, to be honest, but I'm starting to think there may be something to it now, with the Panda update.

Check out this thread on the Google Webmasters' Forum, where a Googler specifically states that duplicate content is penalized: Google Blocked My MISTAKE - Webmaster Central Help
#duplicate content penalty #employee #google
  • Profile picture of the author drew3806
    I have run into this as well, and use <h3> tags on my sites. I don't understand how use of the <h3> tag could get you deindexed.
  • Profile picture of the author Peetr
    Hi,

    It’s refreshing to hear these things from the horse’s mouth. It’s difficult to disambiguate rumour from fact most of the time.

    Thanks!

    Peetr
  • Profile picture of the author dburk
    Originally Posted by Capitalist_Pig View Post

    I didn't believe in it, to be honest, but I'm starting to think there may be something to it now, with the Panda update.

    Check out this thread on the Google Webmasters' Forum, where a Googler specifically states that duplicate content is penalized: Google Blocked My MISTAKE - Webmaster Central Help
    Hi Capitalist_Pig,

    I think you may have misunderstood the cause of his de-indexing; it was for copyright violation, not duplicate content per se. Though related, they are in fact two separate and distinct issues.

    The Panda update was a tweak in the ranking algorithm that affects where a page might rank in the SERP. The problem this chap was facing was a de-indexing, likely due to copyright violation, and had nothing to do with ranking or the Panda update.
    • Profile picture of the author Mike Anthony
      Originally Posted by dburk View Post

      Hi Capitalist_Pig,

      I think you may have misunderstood the cause of his de-indexing; it was for copyright violation, not duplicate content per se.
      Dburk, read further down.

      "One thing that is very important to our algorithms is unique and compelling content. In general, it makes little sense for us to crawl and index content that is copied, rewritten, automatically translated or otherwise automatically processed -- we assume that our users would much rather want to find the original source. With that in mind, I would strongly recommend removing all such content -- should you have any -- and making sure that all of your content is of the highest quality possible, and really unique and compelling, in short: something that users will want to refer friends to. "

      and then

      "Looking through a sample of your posts, I don't see a problem with other sites copying your content, but I do see posts on your site that are just rewritten from other sources. While I understand that you can't make up news, it's really important to us that your site provides unique and compelling content of its own, even if you are referring to other sources. One way to do this could be to properly quote and link to the original source, while making sure that your commentary makes up the majority of the content on those pages. Rewriting other sources without attribution or unique additional value is not something that our algorithms appreciate. "


      So it's a fairly wide directive. It covers more than just blatant copyright violation, and it directly affects those who unfortunately are using spinning and scraping plugins, etc.
    • Profile picture of the author donhx
      Originally Posted by dburk View Post

      Hi Capitalist_Pig,

      I think you may have misunderstood the cause of his de-indexing; it was for copyright violation, not duplicate content per se. Though related, they are in fact two separate and distinct issues.

      The Panda update was a tweak in the ranking algorithm that affects where a page might rank in the SERP. The problem this chap was facing was a de-indexing, likely due to copyright violation, and had nothing to do with ranking or the Panda update.

      No, Google is not the copyright police... they don't care about that aspect. They care about the quality of material, how it is being diluted by article spinning and minor changes.

      Notice that copyright is not mentioned at all.

      However, this should be the death of article spinning.


      From JohnMu, Google employee (3/24/11):


      One thing that is very important to our algorithms is unique and compelling content. In general, it makes little sense for us to crawl and index content that is copied, rewritten, automatically translated or otherwise automatically processed -- we assume that our users would much rather want to find the original source. With that in mind, I would strongly recommend removing all such content -- should you have any -- and making sure that all of your content is of the highest quality possible, and really unique and compelling, in short: something that users will want to refer friends to.


      Should you find and remove content like that, I would recommend submitting a reconsideration request after cleaning everything up, detailing the changes that you have made.
      Signature
      Quality content to beat the competition. Personalized Author Services
      • Profile picture of the author Mike Anthony
        Originally Posted by donhx View Post

        However, this should be the death of article spinning.

        I don't know if Google is yet at the place where it can run synonyms for every page it crawls. Some things are possible but technologically impractical. Of course, what can happen with Google's new browser plugin is that if people read content that reads like it's spun (and frankly, even the so-called good spinning just doesn't read right to a human being), Google will know the users didn't find value in the content.
  • Profile picture of the author mattlaclear
    There is so much misinformation flowing from Google that you would do very well to ignore it. I bet you'd grab more page one rankings if you did.
    Signature

    Free Training for SEO Providers in the United States - https://happyseoclients.com/happy-seo-clients-training/

    • Profile picture of the author Mike Anthony
      Originally Posted by mattlaclear View Post

      There is so much misinformation flowing from Google that you would do very well to ignore it. I bet you'd grab more page one rankings if you did.
      Well, yes and no. The OP needs to make a distinction between the algo and a manual penalty. It looks like the guy in the link got manually audited and deindexed, so it's not misinformation from Google, because they will do that, especially if the site is reported. But it really doesn't apply completely to the algo.
  • Profile picture of the author pdrs
    Wow, Google wants to give its users unique, relevant content. What a surprise!

    C'mon guys, this is getting really old. Think about it this way: if you're scraping/posting dupe content, then you obviously don't care all that much about what you're putting out there for your readers, so why worry about what Google thinks? Just keep doing it; if you lose a few sites, just build some more. Don't try to legitimize the whole dupe content thing to yourself. It's spam and you know it.
    Signature
    RemoteControlHelicopterReviews.(com/net) - Up for sale! No reasonable offer refused. Great branding for a super hot niche!
    • Profile picture of the author dburk
      Originally Posted by Mike Anthony View Post

      Dburk, read further down.

      "One thing that is very important to our algorithms is unique and compelling content. In general, it makes little sense for us to crawl and index content that is copied, rewritten, automatically translated or otherwise automatically processed -- we assume that our users would much rather want to find the original source. With that in mind, I would strongly recommend removing all such content -- should you have any -- and making sure that all of your content is of the highest quality possible, and really unique and compelling, in short: something that users will want to refer friends to. "

      and then

      "Looking through a sample of your posts, I don't see a problem with other sites copying your content, but I do see posts on your site that are just rewritten from other sources. While I understand that you can't make up news, it's really important to us that your site provides unique and compelling content of its own, even if you are referring to other sources. One way to do this could be to properly quote and link to the original source, while making sure that your commentary makes up the majority of the content on those pages. Rewriting other sources without attribution or unique additional value is not something that our algorithms appreciate. "


      So it's a fairly wide directive. It covers more than just blatant copyright violation, and it directly affects those who unfortunately are using spinning and scraping plugins, etc.
      While I do not disagree with your sentiment, if you look at the very last sentence you quoted, it underscores the point I was making.

      Google themselves are massive scrapers; their Web Search and News products consist entirely of scraped content. As long as it is useful, adds value, and provides proper attribution, it can rank very well within Google's index.

      Originally Posted by donhx View Post

      No, Google is not the copyright police... they don't care about that aspect. They care about the quality of material, how it is being diluted by article spinning and minor changes.
      I also agree with your sentiment. Google is not the "Internet police", yet as an upstanding "Internet Citizen" they must comply with the laws that directly apply to them.

      Originally Posted by donhx View Post

      Notice that copyright is not mentioned at all.
      It would be very much out of character for Google to accuse a user of their help forum of copyright violation, so it should be no surprise that they didn't go on record and expose themselves to litigation.

      Google generally leaves it to the copyright holder to make such claims and then they dutifully follow the DMCA requirements and remove the content from their index.

      In accordance with the Digital Millennium Copyright Act (DMCA), we only accept copyright complaints from content owners or someone officially authorized to act on their behalf. If you have legal questions about the DMCA (the text of which can be found at the U.S. Copyright Office Web Site, U.S. Copyright Office), please consult your own legal counsel.
      Source: Google Help


      Again, the point I was trying to make is that this poor fellow wasn't having a ranking issue; his entire website had been de-indexed, which is a good indicator that the website had serious violations of Google's Webmaster Guidelines.
  • Profile picture of the author iAmNameLess
    The guy wasn't deindexed!!!!!!!!!!!

    This stuff is so annoying. Whatever though, my autoblogs are doing great. Sites where I use h3 are still ranking... sites where I have duplicate content are still ranking.

    Mattlaclear said it best, and I rarely even check out posts there... too much paranoia.
    • Profile picture of the author dburk
      Originally Posted by iAmNameLess View Post

      The guy wasn't deindexed!!!!!!!!!!!

      This stuff is so annoying. Whatever though, my autoblogs are doing great. Sites where I use h3 are still ranking... sites where I have duplicate content are still ranking.

      Mattlaclear said it best, and I rarely even check out posts there... too much paranoia.
      Hi iAmNameLess,

      Actually his site was and still is de-indexed.

      Your search - site:newsatweb.com - did not match any documents.
      site:newsatweb.com - Google Search

      I do agree that this has nothing to do with either duplicate content or the Panda update.
  • Profile picture of the author simonbuzz
    I don't think the h3 tag is the issue... there may be something else that gets you banned. Recently one of my top sites was banned, and I don't even know why. That site alone was generating $3,000 to $4,000 per month, and it was 3 years old, but now it's gone.