“Avoid scraped content”
– from the Google webmaster guidelines 1
What is scraped content?
Scraping content is taking content from other places on the web and publishing it on your own site.
There are many websites that only contain pieces of other websites or stolen content. Many sites will take articles from other websites and publish them as if they were their own, or will copy entire websites.
This behavior is against the Google guidelines and against copyright laws in the United States and other countries.
Most people who scrape content know they are doing it. If you do not think you are scraping content, you probably are not.
Use caution when you display information from other websites.
Sometimes webmasters have things like “latest news” feeds or Twitter feeds in their sidebar.
Is that bad?
In most cases it is fine to display such things in your sidebar, but sometimes people display too much information from too many sources and are in danger of breaking this guideline or the auto-generated content guideline.
How much is too much?
Google doesn’t state an exact answer to this question anywhere, but as a rule of thumb I would say that content from other sources should not exceed 10 percent of your webpage. An example would be a typical blog that has a news feed in its sidebar…
If you write a very short blog post, the news feeds could easily contain more content than the post itself…
It is worth considering how a search engine would see that page…
It would see that most of the page is unoriginal content, or repeated content such as your logo and footer. As a result, the webpage may not be considered a great resource for the subject of the post.
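To make the 10 percent rule of thumb concrete, here is a minimal sketch that estimates how much of a page's text comes from other sources by counting words. The function name `third_party_share`, the sample page, and the word-count approach are all assumptions made for illustration. The 10 percent threshold is this article's suggestion, not a figure published by Google, and nothing here reflects how a search engine actually measures a page.

```python
def third_party_share(original_text: str, third_party_blocks: list[str]) -> float:
    """Return the fraction of a page's words that come from other sources.

    Hypothetical helper: it compares simple word counts only.
    """
    original_words = len(original_text.split())
    borrowed_words = sum(len(block.split()) for block in third_party_blocks)
    total_words = original_words + borrowed_words
    if total_words == 0:
        return 0.0
    return borrowed_words / total_words


# Hypothetical page: a short 150-word post with two sidebar feeds.
post = "word " * 150
feeds = ["headline " * 40, "tweet " * 60]

share = third_party_share(post, feeds)
print(f"Third-party share: {share:.0%}")  # 100 of 250 words -> 40%

# 0.10 is the article's rule of thumb, not a Google-published limit.
if share > 0.10:
    print("Over the 10 percent rule of thumb")
else:
    print("Within the 10 percent rule of thumb")
```

In this made-up case the feeds account for 100 of the page's 250 words, so a short post is easily outweighed by its own sidebar.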
Examples of scraping
Google provides four examples of what it considers scraping:
- Sites that copy and republish content from other sites without adding any original content or value
- Sites that copy content from other sites, modify it slightly (for example, by substituting synonyms or using automated techniques), and republish it
- Sites that reproduce content feeds from other sites without providing some type of unique organization or benefit to the user
- Sites dedicated to embedding content such as video, images, or other media from other sites without substantial added value to the user
Using content from other websites in these ways is likely to hurt your Google ranking rather than help it.