Author: Judd Lyon

New Site Graph Visualization, Start URL Option & 4X Crawler Speed Boost

Austin in August means excruciating heat. So we’ve been keeping cool inside our air-conditioned office, working hard on making SiteCondor even more useful. A special thanks to all the customers and prospects who’ve been providing us with feedback. You rock!

Site Graph

Our deepest look yet into your site’s linking structure

This new visualization makes it easy to navigate your entire site and understand how your pages are interlinked. While the picture below looks cool, the Site Graph is a highly interactive visualization that you should try for yourself to fully appreciate. We’ll be posting a Site Graph video in the next couple of days, so keep an eye out for it.

Start URL

This new Job setting lets you set the crawl starting point


This can be used in many different ways:

  • Combine it with the URL filter to limit your crawl to a specific area of your site
  • Create a Scheduled Job that checks on one particular page
  • Start your crawl from pages that are otherwise not discoverable (e.g., landing, private, or unpublished pages)
  • Visualize site architecture starting from a specific URL (useful on big sites)
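To make the first use case concrete, here’s a minimal sketch of how a crawl-scope filter can work. This is our own illustration in Python — the function and parameter names are made up and are not SiteCondor’s actual API:

```python
import re

def url_allowed(url, contains=None, regex=None):
    """Decide whether a discovered URL is in scope for the crawl.

    `contains` keeps URLs containing a substring; `regex` keeps URLs
    matching a regular expression. Illustrative only.
    """
    if contains is not None and contains not in url:
        return False
    if regex is not None and not re.search(regex, url):
        return False
    return True

# Start the crawl at /blog/ and keep it limited to the blog area:
print(url_allowed("http://example.com/blog/post-1", contains="/blog/"))  # True
print(url_allowed("http://example.com/shop/item", contains="/blog/"))    # False
```

Every URL the crawler discovers gets checked against the filter; anything out of scope is simply never fetched.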

Sign In or Sign Up to try the new setting.

Up to 4X crawl speed increase

SiteCondor can now crawl resource-dense sites up to 4X faster than before


Additional Improvements

  • Added a default option to disregard jsessionid in URLs
  • Improved error reporting – removed duplicate 404/Redirect errors that were showing up in the Other Errors section
  • And several other tweaks to make SiteCondor more robust and easier to use
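As an illustration of the jsessionid option, here’s a rough sketch (ours, not SiteCondor’s actual code) of how a session identifier can be stripped from URLs before crawling, so the same page isn’t counted once per session:

```python
import re

def strip_jsessionid(url):
    """Remove a jsessionid path or query parameter from a URL.

    A sketch of this kind of URL normalization; illustrative only.
    """
    # jsessionid usually appears as a path parameter: /page;jsessionid=ABC123
    url = re.sub(r";jsessionid=[^?#]*", "", url, flags=re.IGNORECASE)
    # ...or occasionally as a query parameter: ?jsessionid=ABC123
    url = re.sub(r"([?&])jsessionid=[^&#]*&?", r"\1", url, flags=re.IGNORECASE)
    # Clean up a dangling '?' or '&' left over after the removal
    return url.rstrip("?&")

print(strip_jsessionid("http://example.com/cart;jsessionid=1A2B3C?item=5"))
# http://example.com/cart?item=5
```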

Go ahead and try it out!

Welcome to the SiteCondor Blog

SiteCondor is a website analysis tool for digital marketing experts. You should check it out.

Welcome to our blog, where we share ideas about website optimization, web dev, product updates, and more.


Watch content, visual, and performance changes on any page

From time to time, we like to take short breaks from working on the main SiteCondor product and explore complementary ideas. Hence Page Watch was born.

Page Watch is a free tool that lets you track and understand page content, visual, and performance changes over time. You can use it to monitor your homepage, landing pages, competitor landing pages, product pages, terms and conditions or policies, and more.

Click here to try out Page Watch

To create a job, simply enter your email address and the URLs for the pages you’d like to watch (no signup needed):


Each Page Watch job can check up to 3 pages for changes over a 5-day span. Feel free to create as many jobs as you’d like. For each requested page, it takes a daily snapshot of content, visuals, and performance and compares it to the previous day’s. Page Watch will then send you an email letting you know there are new results.

Here’s a quick screenshot of what results look like:


And here’s a screenshot on how content changes are displayed:


We are really excited to release this free tool and sincerely hope you’ll find it useful. If you do, please help us spread the word and share it with your friends and colleagues. If you have a minute, we would love for you to check it out and let us know what you think!

Find & Fix Your Low-Quality External Links

As you may know, in late January Matt Cutts published a much-talked-about post entitled The decay and fall of guest blogging for SEO. In it he directly discourages guest blogging for links, and cites a July 2013 Search Engine Land article in which his Google colleague John Mueller explicitly encourages nofollowing links in guest posts.

(If you’re not clear on follow/nofollow, WordStream has an excellent post on the subject.)

For the last couple of months there has been a sharp uptick in the number of people reporting manual link penalties in Google Webmaster Tools.

At PubCon New Orleans it became public that industry stalwart Ann Smarty’s MyBlogGuest had been hit by a penalty for passing link value in guest posts. Google was sending a clear message: clean up your guest blogging links.

[Edit: As per Ann’s comment below, the reason for penalty expressed above is just our assumption, there was no clear reason given by Google]

Savvy internet marketing veterans like Eric Enge of Stone Temple Consulting have urged taking a proactive approach to cleaning up your links (see recommendation #6 in Is link-building dead?). The question is: how?

Using SiteCondor to Analyze Links to External Sites

The latest SiteCondor release includes an improved External Links section containing a breakdown of your follow/nofollow links.

Let’s take a look at a practical example, shall we?

I’ve run a crawl job on a made-up sample blog post. (Note: you can click the images to enlarge them.)


You’ll notice that there are several spammy links, a legit link, and a legit link with overly optimized anchor text. There is one nofollowed link.
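If you’d like to see how a follow/nofollow breakdown can be computed, here’s a small standard-library sketch in Python. The HTML snippet, URLs, and class name are made up for illustration; SiteCondor’s crawler is, of course, far more thorough:

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Collect follow vs. nofollow links from an HTML snippet (illustrative)."""

    def __init__(self):
        super().__init__()
        self.follow, self.nofollow = [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener"
        rels = (attrs.get("rel") or "").lower().split()
        (self.nofollow if "nofollow" in rels else self.follow).append(href)

html = """
<p>Check out <a href="http://spam.example/casino">best online casino</a>
and this <a rel="nofollow" href="http://ads.example/offer">sponsored offer</a>.</p>
"""
audit = LinkAudit()
audit.feed(html)
print("follow:", audit.follow)      # links that pass link equity
print("nofollow:", audit.nofollow)  # links marked rel="nofollow"
```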

Pro tip: For sites with hundreds of links you can use the Search function to find particular links.



Given this scenario, you could clean things up by:

1. nofollowing the spammy links (if not getting rid of them altogether)
2. adjusting the anchor text for the real estate link

Here’s the cleaned-up result:


If you’d like to see the follow/nofollow links on any site, check out the External Links tab in your account or sign up for a free trial. Best of luck!

New features and improvements potpourri

Hello again! Lately we’ve been busy reviewing SiteCondor’s feature set and the very valuable feedback from our beloved users (thank you!). As a result, we’ve decided to improve some key features and add some frequently requested ones. Below you’ll find a summary of the major updates. Feel free to try out SiteCondor and experience them first-hand; we’d love to know what you think.

Improved Search

If you’ve used SiteCondor in the past, you are likely familiar with the Explore menu and the different sections underneath it (Resources, Titles, Images, Meta Descriptions, Headings, Internal Links, External Links, URLs, Structured Data, Others, and XML Sitemap). Underneath these sections are tabs presenting different aggregate views of those elements, enabling you to slice and dice the data without exporting to CSV and opening a spreadsheet application (though that’s also available).

This update replaces the old, simple search within each tab with a much more powerful one: you can continue to use the previous “contains”-style searches, and you can now also run regular expression searches. We also included an easier way to back out of search results, along with a message clearly showing the search that produced the current results. Here’s a quick example showing how to search for all Titles containing either “Austin” or “Work”:
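If you want to test a pattern before typing it into the search box, the same alternation works in any regex engine. For example, in Python (the titles below are made up):

```python
import re

# Made-up titles standing in for a site's crawled Titles
titles = [
    "Work Hard, Play Hard",
    "Keeping Cool in Austin",
    "Site Graph Deep Dive",
]

# The same alternation you'd type into the regex search box
pattern = re.compile(r"Austin|Work")
matches = [t for t in titles if pattern.search(t)]
print(matches)  # ['Work Hard, Play Hard', 'Keeping Cool in Austin']
```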


Improved 404 error reporting

Our previous 404 section did not explicitly or conveniently display where the errors originated (i.e., where we found the broken links, broken images, etc.). You could work around this by running searches in the other sections, but that wasn’t a great user experience. So we added a “found at” expandable section that lets you see, right on the same screen, the URLs where those broken resources were found (and in the exported CSV files as well):


New timeout error reporting

Most of the time, errors from web servers come back to our speedy crawler in nicely packaged, standards-compliant ways (i.e., with appropriate HTTP status codes). But let’s face it, s^&#*t happens. Network connections go down, somebody kicks a cable, web servers melt down, and zombies may attack at any time. When errors don’t come back to us in a timely manner, our crawler eventually gives up on that particular resource. Previously we quietly ignored these situations. We realized this was a problem, so we now report these errors in the Resources/Other Errors tab with a status code of 599. (599 is not defined in any HTTP RFC; it’s a status code conventionally used to indicate network, client, or proxy timeouts.) If you see lots of 599 or 403 errors in your crawl results, try running the job again with less aggressive settings: generally speaking, dial down Concurrency, increase Throttling, and perhaps increase Timeout as well.
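The timeout-mapping idea can be sketched in a few lines of Python. The function names are ours and purely illustrative — `fetch` stands in for whatever HTTP client a crawler uses:

```python
import socket

HTTP_TIMEOUT = 599  # informal status code for network/proxy timeouts

def fetch_status(fetch, url, timeout=10):
    """Run a fetch callable and map timeouts to the informal 599 code.

    Illustrative sketch of the error-mapping idea, not SiteCondor's code.
    """
    try:
        return fetch(url, timeout=timeout)
    except (socket.timeout, TimeoutError):
        # The server never answered in time: report 599 instead of
        # silently dropping the resource.
        return HTTP_TIMEOUT

def flaky_fetch(url, timeout):
    """A stand-in fetcher that always times out, for demonstration."""
    raise TimeoutError(f"no response from {url} after {timeout}s")

print(fetch_status(flaky_fetch, "http://slow.example/"))  # 599
```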

New Job options: Max Resources, Disregard URL Query Strings, improved URL filter 

Upon request, our Job settings got a facelift too. We’ve moved the protocol option up and added a Max Resources option, which limits the number of resources fetched for the job and causes the crawler to stop early if the site is larger than the specified limit. We’ve also added an improved URL filter supporting both “contains” and “regular expression” filtering, and a new Disregard Query Strings option. When enabled, Disregard Query Strings removes the query string from each URL before requesting the resource. This is useful when crawling sites that make extensive use of query strings without necessarily returning different or interesting content.
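For the curious, dropping a query string before a request is a one-liner with standard URL tools. A sketch in Python (our illustration, not SiteCondor’s implementation):

```python
from urllib.parse import urlsplit, urlunsplit

def disregard_query_string(url):
    """Drop the query string before requesting a resource.

    The fragment is dropped too, since it is never sent to the server anyway.
    """
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))

print(disregard_query_string("http://example.com/products?sort=price&page=2"))
# http://example.com/products
```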


Improved Job Summary display

Our Job Summary page now includes the new job options and presents the information in a clearer, easier-to-read fashion (time units where appropriate, Yes/No instead of true/false, etc.).

Bug Fixes and improved crawler

As more users create jobs to crawl different domains around the web, we keep finding situations where our crawler could better process certain sites (particularly sites with very poor markup, misbehaving servers, or sites that generally aren’t crawl-friendly). While making these updates, we also took care to make the crawler more accurate and faster, and to handle edge cases in better ways.

We hope you’ll enjoy this new release. As usual, please stay in touch and let us know what you think. Best,