- Comparer results are not good enough - what can I do?
- How many crawlers can run at the same time?
- Image matching sometimes leads to faulty assignments. Why?
- How long do I have to wait for results?
- Why does the count of crawled URLs differ when comparing a domain with itself?
- What does the warning that the results may be incomplete mean?
1. Comparer results are not good enough - what can I do?
If the results of your comparison are not satisfactory, the usual reason is that the actual content of the pages cannot be detected correctly. The RelaunchApp tries to find the actual content section when crawling, but this often fails because the pages are too different.
To improve the comparisons considerably, it is generally worthwhile to specify the HTML elements that contain the actual content. You can specify these start and stop sequences per domain. If you enter meaningful data here, your next crawl should produce much better results.
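To illustrate why start and stop sequences help, here is a minimal Python sketch of cutting a page down to the part between two markers. It is not the RelaunchApp's actual implementation; the function name `extract_content`, the sample page, and the fallback behaviour when a marker is missing are assumptions made for this example.

```python
def extract_content(html: str, start: str, stop: str) -> str:
    """Return the text between the first `start` marker and the next `stop` marker.

    Hypothetical helper for illustration only.
    """
    begin = html.find(start)
    if begin == -1:
        # Assumption: if the start sequence is missing, keep the whole page.
        return html
    begin += len(start)
    end = html.find(stop, begin)
    if end == -1:
        # Assumption: if the stop sequence is missing, keep everything after start.
        return html[begin:]
    return html[begin:end]


# Sample page: navigation and footer are identical on every page,
# only the part inside the content container differs.
page = (
    '<nav>Menu</nav>'
    '<div id="content"><p>Hello</p></div><!-- /content -->'
    '<footer>Imprint</footer>'
)

content = extract_content(page, '<div id="content">', '</div><!-- /content -->')
print(content)
```

With navigation and footer stripped, only the text that actually differs between pages takes part in the comparison, which is why well-chosen sequences tend to give better matches.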
It is also possible that pages return empty content or error messages when too many crawlers are used at the same time. The content of all pages then looks the same, because the same error message is shown everywhere. Check for server overload; if this is the cause, reduce the number of crawlers. The crawl will take longer, but the results will be more satisfactory.
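One way to spot this symptom in your own spot checks is to measure how many fetched pages share exactly the same body. The following Python sketch is an assumption-laden illustration, not a RelaunchApp feature: the function name `dominant_content_share` and the sample data are hypothetical.

```python
import hashlib
from collections import Counter


def dominant_content_share(pages: dict[str, str]) -> float:
    """Return the fraction of pages that share the single most common body.

    `pages` maps a URL to its fetched body. Hypothetical helper for illustration.
    """
    if not pages:
        return 0.0
    digests = Counter(
        hashlib.sha256(body.strip().encode("utf-8")).hexdigest()
        for body in pages.values()
    )
    most_common_count = digests.most_common(1)[0][1]
    return most_common_count / len(pages)


# Sample data: three of four pages returned the same error message.
fetched = {
    "/a": "503 Service Unavailable",
    "/b": "503 Service Unavailable",
    "/c": "503 Service Unavailable",
    "/d": "<p>Real content</p>",
}

share = dominant_content_share(fetched)
print(f"{share:.0%} of pages have identical content")
```

A high share suggests the server answered most requests with the same error or an empty page, which points to overload rather than to a problem with the comparison itself.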
2. How many crawlers can run at the same time?
That depends on the type of account you are using and on whether you have validated your site. By default, 10 crawlers can run in parallel; more simultaneous page requests than that could harm your site. To run more, you therefore have to validate your site first, to confirm that you are its actual owner. Even then, it is best to test this very carefully until you are sure that your server will not be overloaded.
3. Image matching sometimes leads to faulty assignments. Why?
Faulty assignments usually happen when images are too small or have very little content. In that case, the compared files are tiny, which can lead to large jumps in the assignment and surprising results. This generally affects about 1 or 2 images in a thousand.
4. How long do I have to wait for results?
This depends entirely on how powerful your server is and how much content your site has. Generally speaking, image matching takes longer than text matching, so results take longer for sites that rely heavily on images. If your site has more than 100,000 subpages and images, expect it to take more than a day for all results to be generated.
5. Why does the count of crawled URLs differ when comparing a domain with itself?
If you compare a domain with itself, for example on the Free tariff, the number of crawled URLs may differ between old and new. This happens because the URLs are accessed in a partly random order. The results are identical only when the domain is crawled completely. If the domain cannot be crawled completely, the comparison results are not identical either. This can produce a list of lost URLs that would not appear with a full crawl. In this case, you should see a warning in the project dashboard that the results are incomplete.
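The effect can be reproduced with a small simulation. The Python sketch below is a simplification under stated assumptions: the crawl order is modelled as fully random (the real order is only partly random), and the function `crawl`, the URL limit of 30, and the 100-page site are hypothetical values chosen for the example.

```python
import random


def crawl(urls: list[str], limit: int, seed: int) -> set[str]:
    """Simulate a crawl that visits URLs in random order and stops at `limit`."""
    rng = random.Random(seed)
    order = list(urls)
    rng.shuffle(order)
    return set(order[:limit])


# The same site is crawled twice, once as "old" and once as "new".
site = [f"/page-{i}" for i in range(100)]

# Partial crawls: each run stops after 30 URLs, in a different order.
old_partial = crawl(site, limit=30, seed=1)
new_partial = crawl(site, limit=30, seed=2)
lost_partial = old_partial - new_partial  # reported as "lost", although nothing changed

# Full crawls: the order no longer matters, both runs see every URL.
old_full = crawl(site, limit=len(site), seed=1)
new_full = crawl(site, limit=len(site), seed=2)
lost_full = old_full - new_full

print(len(lost_partial), "URLs look lost after partial crawls")
print(len(lost_full), "URLs look lost after full crawls")
```

Only the full crawls are guaranteed to cover the same set of URLs, so only they are guaranteed to produce an empty list of lost URLs when a domain is compared with itself.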
6. What does the warning that the results may be incomplete mean?
Depending on your tariff, there is a limit on how many URLs may be crawled in total. Once this limit is reached, crawling stops and only the pages crawled up to that point are compared. Depending on which URLs were crawled by then, pages may be wrongly reported as "lost" or matched to redirects that do not fit perfectly, because the destination pages had not yet been scanned by the crawler. Conclusion: to get valid results, the domain must be crawled completely. If in doubt, you may need a larger tariff.