How search engines identify non-original articles

When we search for an article on a popular search engine such as Google, we often find many copies of the same article, because most websites reproduce articles from one another. However, as search engine technology continues to develop, search engines are gradually becoming able to recognize non-original articles. Below we go through some of the ways search engines identify them.
First, search engines filter out the punctuation commonly used in Chinese text, such as , . ! ' " ( ) { } [ ], and they also filter out filler words such as "to", "ah", and "like", which repeat very often and are useless for ranking.
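As a rough sketch of this filtering step, the Python below strips punctuation and drops filler words. The names `PUNCTUATION`, `STOP_WORDS`, and `clean_tokens` are made up for illustration, and the word list is an assumption: real search engines do not publish theirs.

```python
import string

# Hypothetical lists for illustration only; the real ones are not public.
PUNCTUATION = set(string.punctuation) | set(",。!?;:()")
STOP_WORDS = {"to", "ah", "like"}

def clean_tokens(text):
    """Replace punctuation with spaces, lowercase, and drop filler words."""
    stripped = "".join(" " if ch in PUNCTUATION else ch for ch in text)
    return [w for w in stripped.lower().split() if w not in STOP_WORDS]

tokens = clean_tokens("Ah, I like to read articles!")
```

Only the words that carry meaning survive, so two copies of an article that differ only in punctuation or filler words end up with the same token list.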
Second, search engines screen by keywords. The keywords of a piece of content basically do not change from site to site, so keyword analysis is the method used to tell original from pseudo-original: if the keywords of two articles appear in similar positions, the search engine concludes that one of the two articles is pseudo-original.
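One way to picture "keywords in similar positions" is to record where each keyword first appears, as a fraction of the article length, and count how many keywords land close together in both articles. This is only a sketch under assumptions: the function names and the `tolerance` of 0.1 are invented here, not taken from any search engine.

```python
def keyword_positions(tokens, keywords):
    """Map each keyword to the relative position (0..1) of its first occurrence."""
    positions = {}
    for i, tok in enumerate(tokens):
        if tok in keywords and tok not in positions:
            positions[tok] = i / len(tokens)
    return positions

def position_overlap(tokens_a, tokens_b, keywords, tolerance=0.1):
    """Fraction of shared keywords whose relative positions differ by <= tolerance."""
    pos_a = keyword_positions(tokens_a, keywords)
    pos_b = keyword_positions(tokens_b, keywords)
    shared = set(pos_a) & set(pos_b)
    if not shared:
        return 0.0
    close = sum(1 for k in shared if abs(pos_a[k] - pos_b[k]) <= tolerance)
    return close / len(shared)

a = "search engines rank original content higher".split()
b = "search engines rate original articles higher".split()
score = position_overlap(a, b, {"search", "original", "higher"})
```

Here `b` swaps two words of `a` for synonyms, yet every keyword stays in the same place, so the score is as high as it can be.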
This may not be entirely clear yet, so let me walk through a concrete example. The computer first strips the two articles as described above, and then the program begins its analysis.
1: First set a ratio. For example, define a coefficient of 0.5 and call it M.
2: Article A is divided into three sections by its words, and article B is likewise divided into three sections. An algorithm then converts the text into symbols the computer can recognize. Here we tentatively write those symbols as ADSDFAGFAG, although the real symbols would of course be expressed in binary code.
3: Once articles A and B have been converted into symbols, the computer compares them and produces a similarity ratio. If that ratio exceeds the 0.5 set in the first step, the two articles are considered similar. When the search engine finds such a match, it naturally looks at other parameters to decide which article is the original and which is pseudo-original.
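The three steps above can be sketched in Python as follows. This is an illustration under assumptions, not the actual algorithm of any search engine: a SHA-256 hash stands in for the "symbols", sections are compared position by position, and the helper names are hypothetical.

```python
import hashlib

M = 0.5  # step 1: the similarity threshold from the example

def split_sections(words, parts=3):
    """Step 2: divide a word list into `parts` roughly equal sections."""
    size = max(1, -(-len(words) // parts))  # ceiling division, at least 1
    return [words[i:i + size] for i in range(0, len(words), size)]

def to_symbol(section):
    """Step 2: turn a section into a symbol (here, a hash fingerprint)."""
    return hashlib.sha256(" ".join(section).encode("utf-8")).hexdigest()

def similarity(text_a, text_b, parts=3):
    """Step 3: share of sections whose symbols match at the same position."""
    sym_a = [to_symbol(s) for s in split_sections(text_a.split(), parts)]
    sym_b = [to_symbol(s) for s in split_sections(text_b.split(), parts)]
    if not sym_a or not sym_b:
        return 0.0
    matches = sum(1 for x, y in zip(sym_a, sym_b) if x == y)
    return matches / max(len(sym_a), len(sym_b))

article_a = "one two three four five six"
article_b = "one two three four five seven"
looks_similar = similarity(article_a, article_b) > M
```

Two of the three sections match, so the ratio is above M and the pair would be passed on to whatever other signals decide which copy is the original.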
Third, search engines recognize pseudo-original articles. Pseudo-original techniques include changing the title of the original article, replacing words with synonyms, rewriting some statements or passages, and reordering paragraphs, all with the aim of making the article look different from the original. Even after these modifications, the search engine can still tell whether the article is original.
In general, once an article has been published on a site and later indexed, the search engine stores it in its database. There it cuts two pages with similar content, X and Y, into many independent blocks (A) and compares the blocks. If the number of identical blocks exceeds a threshold Z set by the search engine, it concludes that one of X and Y is a reproduction. Cutting the content into A blocks relies on the search engine's word segmentation technology. Judging whether the duplicate blocks exceed the threshold Z relies on its indexing technology.
Of course, the values of A and Z are set by the search engine's algorithm and differ from one search engine to another, so we have no way of knowing them. We can still learn a lot of useful things from the model above.
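To make the block model concrete, here is a small Python sketch. It reads A as the block size in words and Z as a count of identical blocks, which is one possible interpretation of the description above; the function names and the sample values are assumptions chosen for illustration.

```python
def cut_blocks(page, a=3):
    """Cut a page into independent, non-overlapping blocks of `a` words."""
    words = page.split()
    return {" ".join(words[i:i + a]) for i in range(0, len(words), a)}

def shared_blocks(page_x, page_y, a=3):
    """Count the blocks that appear in both pages."""
    return len(cut_blocks(page_x, a) & cut_blocks(page_y, a))

def is_reproduced(page_x, page_y, a=3, z=1):
    """Flag the pair when the number of identical blocks exceeds z."""
    return shared_blocks(page_x, page_y, a) > z

x = "a b c d e f g h i"
y = "a b c d e f x y z"
flagged = is_reproduced(x, y, a=3, z=1)
```

With three-word blocks, X and Y share two blocks, which exceeds a threshold of 1, so the pair is flagged. Raising `z` to 2 would let the same pair through.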
First, the values of A and Z determine how well the search engine can detect reproduced content. The larger the value of Z and the smaller the value of A, the finer the search engine's ability to distinguish reproduced content; the reverse lowers it. These two values are chosen as a trade-off between the resources the algorithm consumes and other factors, so search engines will not blindly pursue high resolution.
Second, the model above shows that pseudo-original techniques are not very effective against search engines. Search engines judge duplication by blocks, not by the order of the content, so reordering paragraphs does not work. The effectiveness of the other pseudo-original techniques, including adding or deleting content, rewriting content, and replacing synonyms, is determined to some extent by the values of A and Z. Search engine algorithms are by now quite mature and very effective at detecting duplicate content, so adding, deleting, or replacing part of the content will not make a search engine treat a pseudo-original article as original.
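A toy Python comparison shows why reordering fails while synonym swaps depend on block size. It assumes blocks are pooled into a set and compared by the share of blocks two pages have in common; the names and the sample sentences are invented for this sketch.

```python
def block_set(paragraphs, a=2):
    """Cut each paragraph into blocks of `a` words and pool them in one set."""
    out = set()
    for p in paragraphs:
        words = p.split()
        out.update(" ".join(words[i:i + a]) for i in range(0, len(words), a))
    return out

def shared_ratio(paras_x, paras_y, a=2):
    """Share of all blocks that the two pages have in common."""
    bx, by = block_set(paras_x, a), block_set(paras_y, a)
    union = bx | by
    return len(bx & by) / len(union) if union else 0.0

original = ["search engines compare blocks", "order does not matter"]
reordered = list(reversed(original))                       # paragraphs swapped
synonyms = ["search engines compare chunks", "order does not matter"]

same_after_reorder = shared_ratio(original, reordered)     # 1.0
small_blocks = shared_ratio(original, synonyms, a=2)       # 0.6
large_blocks = shared_ratio(original, synonyms, a=4)       # about 0.33
```

Swapping the paragraphs changes nothing, because a set of blocks has no order. One synonym spoils a single two-word block but a whole four-word block, so how much a rewrite hides depends on the block size and on where the threshold is set.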
With the methods above, a search engine can basically determine whether about 90% of articles are original, and search engines have further ways of identifying original articles beyond these.