Screaming Frog has always been my preferred tool for performing SEO audits and getting a quick overview of a website. Yet the tool is not perfect: it has limits depending on the machine it runs on. Luckily there are alternatives in the desktop SEO spider niche; one of the most prominent competitors – which I appreciate more every day – is Visual SEO Studio, developed by Federico Sasso.
In some respects Visual SEO (despite being in Beta at the time I’m writing) is more efficient than Screaming Frog. In this article you’ll find the transcript of the video interview with the developer, where we discussed the pros and cons of both tools and their main design differences.
G: Today I’d like to ask Federico Sasso – the developer behind Visual SEO – a few questions to understand the main differences between the two most widespread desktop web crawlers on the market, Visual SEO and Screaming Frog. First of all: hi Federico, thank you for your collaboration; would you like to introduce yourself?
F: Hi Giovanni, thank you for your invitation, much appreciated. I’m a developer, that’s my background. I first approached SEO when I was the in-house SEO at the company I worked for, and I quickly became passionate about it. Today I’m mainly known as the author of this crawler, an SEO spider running on desktop PCs. I recently founded a company, so I’m now an entrepreneur.
G: Lovely, and we Italians are proud of you for having distributed – free of charge so far – this tool many of us are using. When did you start developing Visual SEO? How long has this software existed?
F: Far too many years! I mean: I think I wrote the first lines of code in 2010, but it was far from what the product is today; it was more the prototype of a crawler for a search engine. Then I slowly realized that if I made it a desktop crawler, I could build and distribute a product many people would use. Not that I worked on it uninterruptedly since 2010: I worked hard on it from 2011, and certainly from 2012 onwards I put all my spare time into it: weekends, nights, holidays.
G: There’s a lot of work behind it! The purpose of this interview is to understand the main differences between the crawler we can call the most widespread today, which is Screaming Frog, and yours, which we hope will gain a good market share. I’m not a developer so I may make some mistakes but… I know Screaming Frog is developed in Java; what language and platform is Visual SEO developed with?
F: Screaming Frog is developed in Java, a language created by Sun Microsystems; it’s a high-level language that doesn’t run directly on the physical processor of the host machine, but on top of a so-called Java virtual machine. This approach brought many advantages to modern software architectures.
Visual SEO instead is developed in C#/.NET, an environment very similar to Java, clearly inspired by it: here the virtual machine is called the .NET runtime, and it’s like having a virtual processor. Java and C# are two extremely similar languages, and not by chance. Microsoft originally distributed its own version of Java; then there was a legal quarrel, and since it could no longer distribute as “Java” a language to which it had made far too many changes, Microsoft said: “OK, then I’ll make my own language” and, truth be told, it did a really good job. Whatever MS detractors say – often with good reason – it did a very good job. It was easier for Microsoft to create a better product because it could draw on the experience of Java.
G: I use both yours and Screaming Frog intensively and, as our audience probably knows well, since they run on client machines, their ability to crawl a large number of URLs mainly depends on the RAM installed on the user’s PC. What’s the difference between the two languages in how they use RAM and the machine’s processor, or are they equivalent in terms of resource usage?
F: There are no big differences in terms of platform. What matters more for memory consumption are architectural aspects. One is the machine architecture, the so-called bitness of the machine the software is installed on – I mean, whether it is 32 or 64 bit – because on a 32-bit machine (I’m talking about Windows, but it should be the same for Mac) Windows cannot allocate more than 2 GB of memory to the process the software runs in. No matter if you have 20, 24, 48 GB installed: if the OS is 32-bit, it cannot give more than 2 GB to your program. And if 2 GB is the total physical RAM, there are other programs competing for memory resources, so it’s clear I can crawl less: as the crawl size increases, the product consumes more memory.
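(Editor’s note: a quick way to see the bitness of a running process – a Python sketch for illustration, unrelated to either tool’s actual codebase – is to look at the pointer size:)

```python
import struct
import platform

# Pointer size in bytes reveals the bitness of the *running process*:
# 4 bytes -> 32-bit process (limited to ~2 GB of user address space on
# 32-bit Windows), 8 bytes -> 64-bit process.
process_bits = struct.calcsize("P") * 8
print(f"Process: {process_bits}-bit")
print(f"Machine: {platform.machine()}")
```

A 32-bit program on a 64-bit OS still reports 32 here: the process, not the hardware, sets the ceiling.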
G: Indeed, that’s why I set up a Windows 10 x64 machine with 32 GB of RAM, precisely to be able to run this kind of software!
F: There are also other aspects to consider. It’s not only a matter of processor architecture (and the OS support for it); it also depends a lot on the internal architecture of the software product itself. I know nothing about how Screaming Frog is made internally, I cannot know it. I do know – if I recall correctly – that Screaming Frog keeps the entire internal link graph in memory at crawl time. This is a cost for them in terms of memory footprint, and its growth is not linear: it can grow exponentially as the crawl gets larger, depending a lot on how the site and its internal linking are built. Visual SEO Studio’s memory consumption today grows linearly, but these are architectural choices. For example, until 7-8 months ago Visual SEO had a huge memory consumption because, as the crawl went on, it kept in memory the full HTML content of the crawled pages, not only the metadata. Granted, it kept it compressed, and there were tons of techniques used to reduce consumption; nevertheless many users experienced crashes because they exhausted the available memory. We have a crash server: we receive a crash report saying a certain user – it’s just a serial code, we know nothing about their real identity unless they tell us – had a crash because they ran out of memory. If the user provided an e-mail address, I was lucky enough to at least be able to reply and explain the issue; otherwise there was nothing I could do. Then we dismantled and rebuilt the underlying architecture: since Visual SEO Studio continually saves to disk while crawling, we didn’t really need to keep everything in memory. We did it to be faster when producing reports, but we came up with a smart solution that avoids any performance penalty, and for about 8 months we had no more such crashes. It only happened once, a couple of weeks ago, to a user from Turkey whom I was able to contact because he left me his e-mail address.
He told me he had a 32-bit machine, only 2 GB of total RAM, and 10 programs running at the same time… yet he had been able to crawl around 145,000 URLs.
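(Editor’s note: the architectural point can be sketched as follows – hypothetical Python, not Visual SEO Studio’s actual code, which is .NET. A crawler that persists each page’s HTML to a local SQLite database as soon as it is fetched only needs to keep lightweight metadata in RAM, so memory grows roughly linearly with the crawl.)

```python
import sqlite3
import zlib

class StreamingCrawlStore:
    """Persist crawled HTML to disk as we go; keep only metadata in RAM."""

    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html BLOB)"
        )
        self.metadata = {}  # small per-URL record kept in memory

    def store(self, url, html, title):
        # Compress and write the heavy payload to disk immediately...
        self.db.execute(
            "INSERT OR REPLACE INTO pages VALUES (?, ?)",
            (url, zlib.compress(html.encode("utf-8"))),
        )
        self.db.commit()
        # ...and keep only a tiny record in RAM for live views.
        self.metadata[url] = {"title": title, "size": len(html)}

    def load_html(self, url):
        row = self.db.execute(
            "SELECT html FROM pages WHERE url = ?", (url,)
        ).fetchone()
        return zlib.decompress(row[0]).decode("utf-8") if row else None

store = StreamingCrawlStore(":memory:")  # a real crawler would use a file
store.store("https://example.com/", "<html><title>Home</title></html>", "Home")
print(store.metadata["https://example.com/"]["title"])  # Home
```

Reports that need the full HTML re-read it from disk on demand instead of holding every page in RAM for the whole session.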
G: which is not little at all!
F: I know of people who have been able to crawl more than a million URLs with Screaming Frog using virtual machines in the cloud. After changing the architecture we too tested in the cloud, and on the very first attempt we were able to crawl 400,000 URLs with no issues; we only stopped because we had visited the entire test site. But then other bottlenecks appeared, because representing 10-100 thousand URLs in the user interface is one thing, 400 thousand quite another. There were parts, some views in our product, which were weaker. Take the directory tree view, for example: some sites have a rather flat structure, but others have lots of nested directories, so much that in some cases you had hundreds and hundreds of nodes, and loading them in the user interface took too long, so we decided to set a limit for the moment. Today we only allow you to crawl up to 150,000 pages, but under the hood the engine would permit much more, were it not for the UI. We will eventually lift the limit, though.
G: How much RAM does it take to reach that figure?
F: 150,000? Not much really, even 2 GB is enough.
G: Then it is very efficient, because with Screaming Frog it’s very hard to reach 100,000 URLs unless you have at least 16 GB.
F: Keep in mind the Turkish guy reached 145,000 with only 2 GB of total memory, so…
G: So regarding memory footprint Visual SEO seems to be more efficient than Screaming Frog, also because, if I understood you correctly, Screaming Frog holds everything in RAM during a site scan, while yours writes to a database, so it can free RAM, hold less, and keep working.
F: Yes, because Screaming Frog only lets you save your work at the end of the crawl, I believe. I’m not a Screaming Frog user, I don’t have the paid version. I don’t even think I have it installed.
G: I did ask Screaming Frog about this; as far as I know it doesn’t use a database when running. It’s not as if it saved to a DB in real time, with an inbound and outbound flow of data in RAM; it’s more as if the RAM were a bucket filling drop by drop, until the moment you have to save. Often the only solution with Screaming Frog is segmenting the scan of big sites: you save the first section, then go on with the second, and so on, while yours seems to be more efficient. I might try to test it on a big site and see what happens.
F: Let me know! Actually, I’m not sure why Screaming Frog needs so much memory, because I don’t think they hold the whole HTML; I think just the metadata, H1s at most, not much.
G: I’ll run a test and see! Concerning features instead: which features do you think make Visual SEO different, newer, better… which are the distinctive features you like most?
F: I’m not going to speak badly of Screaming Frog, because it’s a very good product. I didn’t use to like it until a year ago, but since version 4 they have done excellent things, I have to say – things we too started working on but never released; they are frozen in development branches for lack of time. For example, they have the integration with Google Analytics, the integration with Search Console search data, and they have “custom extraction” – something we were asked for at least two years ago and still haven’t delivered because at the moment we have limited resources. We were a spare-time project until recently; now we are one full-time developer – me – and a collaborator, so it’s not easy, but we are getting organized. I have always tried to differentiate as much as possible from Screaming Frog. To begin with, we were born different, because each of us started working on the product when the other one didn’t exist for the public, so there are two completely different visions. Screaming Frog has always been pitched as the Swiss army knife: a big table view, export data to Excel, import them back, rinse and repeat with subsequent processing… even if I’m under the impression they are trying to change that. Visual SEO has always tried to keep users inside the product, to give them answers, data and views, and spare them a round trip through external software like Excel. Not everyone has an Excel license – of course there are open-source alternatives, but you get the point. I also happen to have a visual mindset, so I gave priority to the tree views. A strength much appreciated by our users is the “Crawl view”: it clearly shows at first glance the link structure and the crawl paths a search engine spider would take.
G: Do you mean the view showing site sections – folders, their size, their sub-folders, and the files they contain?
F: That’s the “Index view”, but there is also the one you normally see during a site exploration, visually depicting the “crawl paths”; it’s much appreciated. In my opinion Visual SEO is also much more powerful in terms of data querying. A big difference – an opposite design choice – between Visual SEO and Screaming Frog is that Screaming Frog decided to process everything at “crawl time”: if users want to set filters, they can only do it before crawling, right? You’re the advanced Screaming Frog user.
G: Yes, you set your filters and rules first, then launch the scan.
F: And if you realize you forgot something, or made a mistake, darn, what do you do?
G: You do it all over again!
F: For small sites it’s not a big issue, the crawler is quite efficient; for very large sites, well, after three days spent crawling it could be a big problem. We made the opposite choice: we download the complete HTML content of each page, save it to a local database, and then can inspect it as much as we want. You are not forced to crawl the very same content again. The Screaming Frog advantage many appreciate is that you can “see” something already during the exploration; we too show you some things, like titles and descriptions… things you can already see, but not things like percentage distributions or overall link depth – the things Screaming Frog has in its right pane. Those we don’t show you straight away, only afterwards. This is a difference you perceive very quickly; there are pros and cons.
G: Screaming Frog likely uses this approach precisely because it is more dependent on memory, while yours, being more memory-efficient, can work differently. By selecting the proper filters in Screaming Frog you can actually reduce much of the workload – the number of pages it scans; that obviously depends on the type of filter, but in general you can limit the work and make better use of the memory. Since your tool isn’t affected by this kind of problem, it can work differently and be more efficient.
F: I suspect that was a forced choice for them, because their free version doesn’t let you save your work; if they had real-time persistence they’d have lost that limitation of the free version, I suspect.
G: And this observation of yours leads me to the last question. Your tool has been free of charge up to now; you are very generous. When will the paid version come, and with what kind of fee – an annual fee, maybe? And what about the price?
F: I prefer not to state the price right now, because it’s not carved in stone, but it will be roughly comparable to its competitors’, that I can say.
G: When will the product end the Beta phase? When will version 1.0 come?
F: I can tell you the roadmap: we plan to release version 1.0 at the end of January 2016, and 1.0 will have two editions. A free one – likely called “Community Edition” – will have some limitations. The edition without limitations – named “Professional Beta” – will be free of charge as well for a few months; it will only ask for a registration to unlock the features, and will be an improved version of today’s product. The paid version is scheduled for the end of March.
G: Very well, so 2016 will see the arrival of Visual SEO as a commercial product, and it was high time, because you deserve it after all the work you have dedicated to it!
F: I think it’s kind of a record, more than three years of Beta.
G: For someone working alone, what you have done is really a lot, congratulations!
Thank you so much Federico, I have no more questions. You explained the concepts behind your tool very well, and I wish you the best for the commercial launch of the product. I will surely purchase a license!
Extra considerations by Federico Sasso
More on memory consumption and control
With Screaming Frog it’s up to you to set a memory limit and hope you get it right (I think you can even set a limit higher than the actual physical RAM); I hope I’m not wrong, but I believe that when Screaming Frog realizes you are running out of the configured quota, it interrupts the crawl to save the data. Then you are forced to find a configuration file and change it manually (this is my understanding from what I read on their website; the Screaming Frog expert is you, Giovanni), then launch the software again and reload the data to continue crawling (a positive note here is that they enable you – if I understand correctly – to resume a crawl process across launches; at the moment we don’t permit that, as we don’t save the queue of URLs yet to be explored).
In Visual SEO Studio everything is simpler for the user: the software doesn’t impose a-priori limits on RAM usage. During the crawl process there are “memory check-points” that periodically monitor and estimate memory consumption, and before the available RAM runs out the software stops the crawl (and finishes saving data). Continuing wouldn’t make any sense: the memory consumed is all the Operating System could allocate, and it’s not very likely the user would stop, open the PC, and add physical RAM!
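(Editor’s note: the check-point policy can be sketched roughly like this – illustrative Python with an injected memory probe; all names and thresholds are made up, and the real implementation is .NET.)

```python
def crawl_with_memory_checkpoints(urls, fetch, available_mb,
                                  check_every=100, min_free_mb=200):
    """Crawl until done, or stop gracefully when free memory gets low.

    `available_mb` is a callable returning the estimated free memory in MB;
    in production it would query the OS, here it is injected for clarity.
    """
    crawled = []
    for i, url in enumerate(urls):
        if i % check_every == 0 and available_mb() < min_free_mb:
            # Stop and let the caller finish saving data, instead of
            # crashing later with an out-of-memory error.
            return crawled, "stopped: low memory"
        crawled.append(fetch(url))
    return crawled, "completed"

# Simulated run: free memory shrinks as the crawl accumulates pages.
free = [1000]
def fake_available():
    free[0] -= 60
    return free[0]

pages, status = crawl_with_memory_checkpoints(
    [f"/p{i}" for i in range(2000)], fetch=lambda u: u,
    available_mb=fake_available, check_every=100)
print(status)  # stopped: low memory
```

The key property is that stopping is a deliberate, clean exit taken before exhaustion, not a crash after it.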
Cases like the Turkish user’s crash from memory exhaustion are very unlikely: the memory check-points do work; that user crashed only because other programs were drinking from the same bucket without Visual SEO Studio being able to know.
The solution adopted for Visual SEO Studio will also permit scheduled crawls with no further changes: there’s no risk of the user being shown a dialog window in the middle of a night crawl asking to increase RAM.
Screaming Frog users also have to take care of installing the version of the Java runtime with the correct bitness (32 or 64 bit). Programs written in .NET don’t have this problem, because the .NET runtime is installed along with the Operating System; at most you have to worry about the correct release. The correct .NET release hasn’t been much of an issue for Visual SEO Studio for some time: it asks for .NET 4.x, which nowadays is the minimum installed even on older machines (and can be installed even on Windows XP SP3); when the installed runtime is more recent, Visual SEO will run on the most recent runtime it finds.
About the need to crawl so many URLs
As I said, we want to lift the 150K URL limit in the near future. It’s an interesting problem in several respects, both technical and commercial.
Today the limit is in place because, were it not there, some users would crawl, say, half a million URLs and then complain about the software being too slow for some operations (some custom filters, and the soon-to-come “Performance suggestions”, make a very intensive inspection, and on large sites they take some time to complete). So for now we prefer to be perceived as “fewer URLs, but usable” (not that 150K is little – quite the contrary).
The human mind doesn’t perceive the difference between large numbers well. For our brain, the difference between 10,000 and 15,000 is the same as between 1M and 1.5M, yet machine performance in the first case is almost indistinguishable, while in the second it can differ dramatically.
In order to remove the limit without losing reputation we would have to rebuild some user interface components: tree views first, but any component can show its limits after a while; even the table view becomes very slow when you load 1M URLs into it. There’s a solution to everything, but you have to rebuild parts that now work fine with a very respectable 150K, and changing them would cost development time, which is very scarce for us today (the paid Pro deadline set for the end of March / beginning of April is very tight, and we have to put in place a payment and invoicing system; all this is a big priority for our economic survival).
There are other aspects, as I told you:
150K is not little, but what percentage of sites have more URLs?
I don’t really know the answer – I wish I did – but 150K is by far a more than acceptable limit even for most Pro users (and in Visual SEO Studio too you can segment the crawl by limiting exploration to one or more subfolders, even if in doing so there is the theoretical risk – as with Screaming Frog – of missing some URLs, depending on how internal links are distributed).
True, e-commerce sites can easily outgrow that limit, and many Visual SEO users do manage e-commerce sites, even if usually not huge ones (I normally know nothing about users unless they contact me, but some of them do).
The fact is, more than 90% of on-site issues are usually in the layout: fix them there, and you fix them everywhere. And to find problems in the layout you don’t really need to crawl millions of URLs.
Often SEOs want to crawl a site fully to prepare a quote; e-commerce managers want to crawl fully because they often lack in-depth SEO skills and have lots of duplication due to canonicalization issues and crawl-budget problems, usually caused by “faceted navigation” – but even there it takes little to bring the total number of URLs down by an order of magnitude.
Does it make commercial/strategic sense to permit crawling much more? I once thought so, but today I suspect we already cover the needs of the largest part of our potential users. We could do it, of course, but 80legs and DeepCrawl are the tools perceived as proper for big crawls; we’d risk pursuing the wrong priority at the wrong time. From a business standpoint it would probably make more sense to build a web-based service (under another brand) competing with such services, and have users pay 10 times the fee of a desktop software license.
That said, we certainly want to raise the bar above today’s 150K limit in the future; maybe not by much, but I believe 500K-1M to be achievable (the lower value much more easily).
About crawl speed
More differences between Screaming Frog and Visual SEO Studio
To Screaming Frog, crawl speed is a priority, never mind the rest. Yes, it’s not as fast as the old Xenu, but the software architecture is multi-threaded, meaning it can perform several concurrent HTTP requests. You could even mount a DoS attack with software like Screaming Frog (not the best tool for the task, of course).
Visual SEO Studio makes the opposite choice: its spider is adaptive, and never pushes the web server beyond what it believes to be its limit. It does so by using a single-threaded queue: it makes only one request at a time, and never starts the next before the previous one has completed. It also does so to guarantee the exploration order and reconstruct the crawl paths according to a “breadth-first” algorithm, approximating in a repeatable way the behaviour of a search bot (it is an approximation: an exploration by Googlebot approximates breadth-first only in the absence of external signals; then the priority is weighted by PageRank, and the order is not exact because the pipeline is asynchronous). Screaming Frog doesn’t attempt to emulate a search bot’s behaviour – it doesn’t care much about exploration order and crawl paths, and profits handsomely from that. Visual SEO Studio does it by design choice: reconstructing repeatable crawl paths and adapting to the web server’s capacity.
Actually, Visual SEO Studio could push a little harder, because web servers are made to “scale” and handle concurrent calls; in particular, were the crawler the only user, the web server wouldn’t suffer if Visual SEO Studio pushed a little more. We could change our engine to parallelize some requests (some particular groups) while still preserving the ability to keep the exploration order and reconstruct crawl paths; at the cost of a little more internal complexity we’d gain some performance (I can’t estimate how much right now). It would still be slower on average, but not by much. We’ll do it in the future; today users are not complaining about speed (once they did, but then we made some changes and dramatically improved crawl efficiency).
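(Editor’s note: the breadth-first, one-request-at-a-time exploration described above can be sketched like this – illustrative Python over an in-memory link graph instead of real HTTP, which also makes the deterministic crawl paths visible.)

```python
from collections import deque

def breadth_first_crawl(start, get_links):
    """Visit URLs in breadth-first order, one request at a time.

    Because requests are strictly sequential, the exploration order is
    deterministic and the crawl path to each page can be reconstructed.
    """
    order, parent = [], {start: None}
    queue = deque([start])
    while queue:
        url = queue.popleft()          # next URL; no concurrent requests
        order.append(url)
        for link in get_links(url):    # "fetch" the page, extract links
            if link not in parent:     # first discovery wins -> crawl path
                parent[link] = url
                queue.append(link)
    return order, parent

# A tiny simulated site:
site = {
    "/": ["/a", "/b"],
    "/a": ["/a1"],
    "/b": ["/a1", "/b1"],
    "/a1": [],
    "/b1": [],
}
order, parent = breadth_first_crawl("/", lambda u: site.get(u, []))
print(order)          # ['/', '/a', '/b', '/a1', '/b1']
print(parent["/a1"])  # /a  (the page that first discovered it)
```

A multi-threaded crawler gives up exactly this property: with concurrent in-flight requests, discovery order depends on response timing and is no longer repeatable.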
Here too, we see different choices:
- to Screaming Frog, the priority is crawling fast, never mind the web server’s health
- to Visual SEO Studio, the priority is not to impair the web server’s efficiency.
(Editor’s note: Screaming Frog does let you set the crawl speed: by default it performs 5 concurrent calls, but the user can raise or lower the limit via the “Speed” option in the spider configuration menu.)
Once a user contacted me, saying: “Look, with Screaming Frog I can crawl this site at x URLs per second, but after y the server usually crashes and I have to restart it. With your crawler it never crashes, but it takes longer; how can I speed it up?”
Of course my answer was: “It doesn’t crash because the spider doesn’t try to push more than the web server can handle!” HTTP requests cannot hit harder, only more frequently!
Think about a customer’s e-commerce site: even if I don’t crash it, with a spider crawling at full speed for two days everything slows down, the conversion rate decreases – and it’s the bipeds paying for the transactions, not the bots.
In my opinion this Visual SEO trait – adapting to the server’s rhythm without pushing further – backfires, as potential users see it at first as a weakness.
True, on average it is slower, but not by that much. On a local test site I once exceeded 200 pages per second! OK – a small, local, very light site – but when the web server stays responsive, the crawler performs well. The worst case for Visual SEO Studio is when a single page times out and the average speed drops; a little parallelization would prevent that. We’ll do it one day.
Excessive load on the server is not only caused by the crawl frequency per se – web servers are built well and scale – but by the increased memory allocation on the server caused by higher-frequency HTTP calls. The problem is that bots normally do not accept session cookies, so each of their HTTP requests implies creating a new server-side session.
A Unix system administrator once explained to me that every new session on Apache costs on average 7 MB. Suppose it were true (the figure is a little contested): think of Screaming Frog causing a new session on the server for each HTTP call. Even at just 5 requests per second, with a server session timeout of 20 minutes = 1,200 seconds, after 20 minutes of crawling we’d have at any instant 1,200 × 5 × 7 MB = 42,000 MB, roughly 41 GB allocated. That would mean more than 40 GB of RAM allocated on the web server just because a crawler is spidering your site – and maybe you didn’t even authorize it.
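(Editor’s note: the back-of-the-envelope math, spelled out, taking the contested 7 MB per Apache session figure at face value.)

```python
requests_per_second = 5
session_timeout_s = 20 * 60   # 20-minute server-side session timeout
mb_per_session = 7            # contested average cost of one Apache session

# After the first 20 minutes, old sessions expire as fast as new ones are
# created, so at any instant roughly this many sessions are alive:
live_sessions = session_timeout_s * requests_per_second
allocated_mb = live_sessions * mb_per_session
print(live_sessions, "sessions,", round(allocated_mb / 1024, 1), "GiB")
```

6,000 simultaneous sessions at 7 MB each come to 42,000 MB, about 41 GiB – the steady-state cost of crawling without accepting session cookies.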
Note: for this reason the Visual SEO Studio crawler accepts session cookies by default, even if it permits disabling them, as in some cases servers behave differently.
7 MB is the minimum amount allocated by Apache (I believe Nginx to be more efficient), but if whoever built the program running on the web server (be it a CMS, web app, store, whatever) had the bright idea of allocating more for each new session (I swear I’ve seen it), the problem gets even worse!
Other differences: spoofing and REP
Another distinguishing trait of Visual SEO Studio is a stricter respect of the Robots Exclusion Protocol.
For example, Visual SEO Studio respects crawl-delay up to a maximum of 2 seconds; yes, it permits you to ignore it, but only for sites the user has demonstrated to administer (the “Administered Sites” list).
Changing the user-agent is likewise permitted for administered sites only. Ditto for ignoring the robots.txt file: it’s allowed for administered sites only.
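(Editor’s note: a capped crawl-delay policy like the one described can be sketched with Python’s standard-library robots.txt parser; the function name and the administered-site flag are illustrative, not Visual SEO Studio’s actual API.)

```python
import urllib.robotparser

MAX_CRAWL_DELAY = 2.0  # cap honoured for non-administered sites

def effective_delay(robots_txt, user_agent, site_is_administered=False):
    """Honour robots.txt Crawl-delay, capped unless the user owns the site."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent) or 0.0
    if site_is_administered:
        return float(delay)  # the owner may wait the full delay (or skip it)
    return min(float(delay), MAX_CRAWL_DELAY)

robots = """User-agent: *
Crawl-delay: 10
"""
print(effective_delay(robots, "Pigafetta"))        # 2.0
print(effective_delay(robots, "Pigafetta", True))  # 10.0
```

The cap keeps a hostile or mistaken robots.txt from slowing a legitimate audit to a standstill, while still respecting the server’s request within reason.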
There are several reasons behind these choices:
Ethical: we are consuming someone else’s resources without giving anything back (in the case of a competitor’s or a potential customer’s site).
Screaming Frog doesn’t consider the issue; it just says “your responsibility” and lets you do anything. To us ethics matter – and besides, if someone gets upset and sends in the lawyers, try explaining it to a judge; we want to stay far from legal issues… see the next point.
Legal: there are already court rulings around the world – not many, indeed – related to crawlers. My opinion is they will become more frequent. We cannot risk our future on costly and long court battles. Permitting anything to whoever demonstrates they own/administer the web server is the safest solution; unfortunately many users don’t clearly grasp the advantage of listing their sites in the “Administered Sites” list, so if they don’t, and a crawl-delay is set, they perceive the tool to be very slow even though they could crawl faster.
Business: having chosen not to permit spoofing the user-agent (Editor’s note: changing the user-agent) for sites we do not control, not adhering to the other aspects of the REP (e.g. bypassing robots.txt directives for sites we don’t administer), besides making all our other choices useless, would after a while create a bad reputation among sysadmins. They would try to block “Pigafetta” (that’s the name of the Visual SEO Studio user-agent), damaging our future sales.
Services like Majestic, Ahrefs, etc. all respect the REP (Moz actually played it sly in the early days, when it was SEOmoz); if they didn’t, they’d have their IPs blocked more often (Majestic has distributed crawlers, not calling from a fixed IP, yet they seem to be the most rigorous of the bunch). Screaming Frog doesn’t care about the problem because its users can always spoof the user-agent, never mind the rest. Those are choices (see the previous points).
Today in Visual SEO Studio the crawl-delay for non-administered sites is honoured up to 2 seconds, but it once used to be up to 10 seconds, and there was also a bug causing the cap not to be applied in some cases. I once received a message from a user, sent through the software itself (“Help” menu, “Send the author a message” option); when this happens I receive an e-mail, and if the user enters an e-mail address I can understand who they are and answer. The user was a fairly well-known SEO from overseas; we had exchanged a few polite messages in the past, but he didn’t know I was the one behind Visual SEO. This time, though, he sent a message full of hate because the crawler was too slow. He never replied to my request for details, so I have no idea whether he stumbled on the bug or hit a 10-second crawl-delay; the only thing I discovered is that in real life he is an ar….le. Even if it were the worst software ever, he hadn’t paid a dime for it. He never answered me.
About versions for other operating systems
As you well know, today Visual SEO is only available for Windows, while Screaming Frog has versions for Windows, Mac, and Linux. I wish I had a euro for every time someone asked me whether there is a Mac version. It’s not only a matter of the smaller market share reached… For example, A.S. (Editor’s note: a well-known international SEO), when I met her, would gladly have given it a try – but she’s a Mac user! An opportunity missed!
Programs coded in Java are much more portable: Java runtimes have been distributed for all platforms for years. The look’n’feel might be awful, but porting is more straightforward and has much lower development costs.
.NET was born copying Java as much as possible, and it has a runtime – a virtual processor – exactly like Java does, but for years Microsoft never invested in porting it to other operating systems. Some have tried; the most successful attempt is called Mono, a .NET version running on Linux and Mac. For about a year now, Microsoft itself has also been investing in porting the platform to other architectures (in part, I believe, in partnership with Mono).
This is to say that porting Visual SEO to the Mac is feasible, with some sacrifice.
We have always been careful to choose components that wouldn’t obstruct a porting (the most notable exception is the screenshot feature, which relies on an embedded version of IE, but we want to change it anyway). The Mono port is still at a very early stage: we timidly tried some parts, then had to postpone it for other priorities. In the coming months we will try again, helped by an external resource.
I have no idea how much a Mac version would expand our reach to potential customers. Some insist that in the SEO / web marketing world Mac has more than a 50% share; I’ve never found a credible statistic by niche. According to this source, among desktop systems Windows dominates with the lion’s share, and Mac has less than 10%. Knowing the statistics per niche would help us better estimate how much to budget for the porting (I might ask Andrea Pernici to check the Analytics of Forum GT). Porting to Mono would give us both a Mac and a Linux version; honestly, at the moment I’m more interested in a Mac version – a Linux version, at least at the beginning, would be more of a maintenance burden for only a few more users.