Saturday, December 12, 2015

Comparison of import.io and OutWit

When import.io was released, I was excited. However, the excitement disappeared. The reasons follow:
  1. Whenever you define a crawler, you always have to provide at least 5 examples, even when you know that 2 examples would be enough in this case.
  2. The interface is sluggish even in the offline version of import.io.
  3. The crawling is approximately 10 times slower than in OutWit.
  4. The export is unsatisfactory. If you tell import.io to export the data to CSV, it strips all commas from the scraped text. If you need to preserve the commas, you can still export the data to XLS or JSON. But Excel has a limit on the length of text in a cell, and when you exceed that limit, you cannot open the file. JSON is not a workable solution either, because the characters in the text are not always correctly escaped, which makes the JSON invalid. Hence, after several hours of web scraping with import.io, you find yourself unable to get the scraped data out of import.io.
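
For comparison with point 4, here is a minimal sketch of an export that keeps commas in CSV and produces valid JSON. It uses only Python's standard csv and json modules. The sample row and the helper names (to_csv, from_csv, to_json) are made up for illustration, and the sketch says nothing about how import.io or OutWit implement their exports.

```python
import csv
import io
import json

# Hypothetical scraped row: the text contains commas, double quotes and a newline.
rows = [
    {"title": "Hello, world", "body": 'She said "hi", then left.\nSecond line.'},
]
FIELDS = ["title", "body"]


def to_csv(records):
    """Write records as CSV; fields with commas, quotes or newlines get quoted."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of plain dicts."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [dict(record) for record in reader]


def to_json(records):
    """Serialize records as JSON; json.dumps escapes quotes and control characters."""
    return json.dumps(records, ensure_ascii=False)


csv_text = to_csv(rows)
json_text = to_json(rows)

# Both formats round-trip without losing the commas or breaking on the quotes.
assert from_csv(csv_text) == rows
assert json.loads(json_text) == rows
```

The point of the sketch is that commas do not have to be stripped: a CSV writer can wrap the field in double quotes instead, and a JSON serializer can escape the problematic characters.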
While OutWit irritates me with its deep context menus, at least it does its job.
