Copyscape is Not Enough
Copyscape has been a great tool for blog owners who hire writers. Blog owners can check the written content against other writing available on the web, quickly and efficiently. However, the black-hat community has found a way around the Copyscape check.
Just today, we purchased 15 blog articles, over 400 words each, on the topic of “Real Estate”. The sales copy for these articles claimed the following:
- Written by native English writers
We asked for a sample before paying. After checking the sample with Google search, we were satisfied and paid for the articles. The seller sent us the fifteen real estate articles, and we happily fed them to our automated text processor. Our happiness did not last long, because the text processor could not process these articles.
After much frustration, we learned that these articles had been encoded in Unicode UTF-8 format. Yet no UTF-8 to ASCII conversion would work. Strange indeed.
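To illustrate why an ASCII conversion can fail on text like this, here is a minimal Python sketch. It assumes the cloaking was done by swapping ordinary letters for look-alike characters from another alphabet, which the post does not confirm. The sample string and the helper name to_ascii_strict are made up for illustration.

```python
import unicodedata

# Hypothetical sample: both "a" characters are CYRILLIC SMALL LETTER A (U+0430),
# which looks like the Latin "a" on screen. This is an assumed cloaking method.
cloaked = "re\u0430l est\u0430te"


def to_ascii_strict(text):
    """Return the text as ASCII bytes, or None if it cannot be encoded as ASCII."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


# A strict conversion fails outright.
strict = to_ascii_strict(cloaked)

# A lossy conversion succeeds but silently drops the look-alike letters.
lossy = cloaked.encode("ascii", errors="ignore")

# Unicode normalization does not help: the Cyrillic letter has no
# decomposition into a Latin one, so the text comes back unchanged.
normalized = unicodedata.normalize("NFKD", cloaked)

print(strict)
print(lossy)
print(normalized == cloaked)
```

Under this assumption, every route to ASCII either fails or damages the words, which would match the behavior we saw.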
We took a sentence from one of the articles and manually typed it into Google search (in between double quotes). A whole list of web pages popped up with the exact English sentence. The articles were copied verbatim from the web. The seller had cloaked them from Copyscape and Google search detection by encoding characters throughout each article as non-ASCII Unicode (UTF-8) characters.
These articles not only infringe on the original owners' copyrights, they also produce no SEO results. Once the words are encoded in UTF-8 in this fashion, they lose their meaning as English text to a machine. Sure, a human can still read them by eye. But computers and search engine bots cannot make sense of them, which means zero keywords for SEO.
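The effect on keyword matching can be sketched in a few lines of Python. As an assumption for illustration, the cloaked text below uses a Cyrillic look-alike for the letter "a"; the strings and the contains_keyword helper are hypothetical and only stand in for what a search engine or plagiarism checker might do.

```python
original = "real estate"
# Hypothetical cloaked copy: the "a" characters are Cyrillic U+0430.
cloaked = "re\u0430l est\u0430te"


def contains_keyword(text, keyword):
    """Case-insensitive substring match, a crude stand-in for keyword indexing."""
    return keyword.lower() in text.lower()


# The two strings look alike on screen but are different sequences of characters.
print(original == cloaked)

# The underlying bytes differ, so an exact-match comparison finds nothing.
print(original.encode("utf-8"))
print(cloaked.encode("utf-8"))

# The keyword is found in the original but not in the cloaked copy.
print(contains_keyword(original, "estate"))
print(contains_keyword(cloaked, "estate"))
```

A reader sees "real estate" in both cases, but a program comparing characters sees two unrelated strings.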
We are currently demanding a refund for this illegal sale. In the meantime, we need an additional layer of protection on top of Copyscape and Google search. Based on this experience, we have developed a Text Encoding Detector. We will ask article providers for a sample to run through it. If the Text Encoding Detector finds non-ASCII Unicode characters, it reports a warning. When we see this warning, we manually type sentences from the sample into Google search.
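The check described above can be approximated with a short Python sketch. This is not the code behind our Text Encoding Detector; the function names find_non_ascii and report are made up for this example, and the rule "anything above code point 127 is suspicious" is a simplifying assumption.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, character, Unicode name) for every non-ASCII character."""
    findings = []
    for index, char in enumerate(text):
        if ord(char) > 127:
            findings.append((index, char, unicodedata.name(char, "UNKNOWN")))
    return findings


def report(text):
    """Return a one-line verdict for a sample of article text."""
    findings = find_non_ascii(text)
    if not findings:
        return "OK: text is plain ASCII"
    return f"WARNING: {len(findings)} non-ASCII character(s) found"


# Hypothetical sample with one Cyrillic look-alike letter in "Estate".
sample = "Real Est\u0430te"
print(report(sample))
for index, char, name in find_non_ascii(sample):
    print(index, repr(char), name)
```

A warning is only a prompt for the manual search step, not proof of plagiarism: honest articles can contain curly quotes, accented names, or currency symbols, all of which are non-ASCII as well.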
Using this technique, we can tell almost 100% of the time whether an article is plagiarized or original. We are also providing the Text Encoding Detector as a free online tool. You can access it yourself at this clickable link: Text Encoding Detector.
Can I just be clear on your findings?
Are you saying that your writers supplied you with articles that, to the human eye, read (and looked) perfectly OK, but these articles had in fact been cloaked with Unicode UTF-8 format?
Then when you ran those cloaked versions of your articles through Copyscape, it indicated they were OK (but they weren't really unique; Copyscape was just reading the Unicode UTF-8 format element cloaked in the article).
Then you took the real (uncloaked) words, put those into Google, and found hundreds of copies.
Have I got that whole scenario correct?
Yes, that would be the correct interpretation.
I got two sites entirely de-indexed and reconsideration requests denied after using articles with encoded text (which I had purchased without realising they were encoded).
Even after I removed the content, Google refused to reindex my sites.
So it's very apparent now that Google can in fact detect encoded articles and heavily penalize websites if it catches you out.
The same problem just happened to me, and it made me very angry. I hired a writer on Fiverr, and she always delivered high-quality work very quickly. While searching the internet, I accidentally found that one of the articles was an exact copy from another site. It passed Copyscape and all the other free tools I found online. I just ran her latest article through your tool and yes, it's encoded. I am so upset now and will demand a refund.
Messages, files, and images copyright by respective owners.
Copyright © 1996 - 2017. All Rights Reserved.