API Evangelist Partners

These are my partners who invest in API Evangelist each month, helping underwrite my research, and making sure I'm able to keep monitoring the API space as I do.


3scale makes it easy to open, secure, distribute, control, and monetize APIs, on a platform built with performance, customer control, and excellent time-to-value in mind.


Efficiently turn APIs into real-time experiences, using a proxy-as-a-service that turns any request-answer API into real-time event-driven data feeds without a line of server-side code.

API Scraping News

These are the news items I've curated in my monitoring of the API space that have some relevance to the scraping conversation and that I wanted to include in my research. I'm using all of these links to better understand how the space approaches scraping, going beyond just monitoring to understand the details of each approach.

Title (Date) Source
I’m harvesting credit card numbers and passwords from your site. Here’s how. (2018-01-06) medium.com
Web Scraping reveals top tech trends and company’s media mentions in 2017 (2017-12-18) www.promptcloud.com
CFPB principles for data aggregation services could have broad implications (2017-12-07) www.internationallawoffice.com
Screen Scraping is Dead, Long Live Screen Scraping (2017-11-30) www.finextra.com
Applications of web scraping in financial services industry (2017-11-29) www.promptcloud.com
'Screen scraping' fintechs hit back at CBA (2017-11-27) www.afr.com
Scraping Hacker News (2017-11-26) medium.com
Open Banking & Screen Scraping – Dave Tonge – Medium (2017-11-22) medium.com
Tips to Cut Costs Associated with Web Data Extraction (2017-11-20) www.promptcloud.com
How to Scrape Instagram Profiles (2017-11-13) dev.to
A Faster, Updated Scrapinghub (2017-11-05) blog.scrapinghub.com
PromptCloud launches a Web Scraping Forum (2017-10-27) www.promptcloud.com
Leveraging Web Scraping for Cryptocurrency Trading (2017-10-18) www.promptcloud.com
Why Has Ecommerce Companies Opting For Web Scraping Increased? (2017-10-01) www.promptcloud.com
How to gather information about companies from the news (2017-09-15) medium.com
Mastering Python Web Scraping: Get Your Data Back (2017-09-12) medium.com
FIDO Alliance Addresses PSD2 Screen Scraping Debate in Letter to European Commission and European Parliament (2017-09-08) fidoalliance.org
European Commission is Right To Reject a Screen (2017-08-30) www.datainnovation.org
The future of your data could rest in the outcome of LinkedIn vs HiQ case (2017-08-24) thenextweb.com
Rapidly Extract Information from Public Websites (2017-08-23) blog.algorithmia.com
Web Scraping by Leveraging Ruby (2017-08-23) www.promptcloud.com
How to build a Scalable Crawler on the cloud, that can mine thousands of data points, costing less… (2017-08-22) medium.com
Legality of Extracting Publicly Available User (2017-08-21) www.promptcloud.com
Scraping with paging (2017-08-19) medium.com
Web Scraping: Challenges and Roadblocks (2017-08-18) www.promptcloud.com
How to Extract Data from a Website using CrawlBoard (2017-08-16) www.promptcloud.com
Is LinkedIn trying to protect your data — or hoard it? (2017-08-15) www.washingtonpost.com
Judge orders LinkedIn to stop blocking data (2017-08-15) hosted.ap.org
LinkedIn can’t block scrapers from monitoring user activity (2017-08-15) www.engadget.com
Microsoft ordered to let third parties scrape LinkedIn data (2017-08-15) www.theverge.com
LinkedIn loses legal right to protect user data from AI scraping (2017-08-14) thenextweb.com
Should Data Scientists Learn Web Scraping? (2017-08-14) www.promptcloud.com
API disobedience — provide API to your data or someone else will (2017-08-11) medium.com
Best Programming Languages for Web Scraping (2017-08-09) www.promptcloud.com
5 Reasons Why You Should Scrape Competitor Prices (2017-07-26) www.promptcloud.com
Make a Web Scraper with AWS Lambda and the Serverless Framework (2017-07-23) medium.com
Google Sheets vs Web Scraping Services (2017-07-19) www.promptcloud.com
Screen scraping 101: Who, What, Where, When? (2017-07-19) medium.com
Do You have the Right Web Scraping Team? (2017-07-17) www.promptcloud.com
Scraping the Steam Game Store with Scrapy (2017-07-07) blog.scrapinghub.com
Why Enterprises Outsource Web Scraping to PromptCloud (2017-06-23) www.promptcloud.com
ABBYY’s new version of TextGrabber is a super useful OCR and translation app (2017-06-22) techcrunch.com
Why Do FinTechs Want To Save Screen Scraping? (2017-06-22) nordicapis.com
Why Customization is the Key Aspect of a Web Scraping Solution (2017-06-12) www.promptcloud.com
Scraping Dynamic Websites: How We Tackle the Problem (2017-06-09) www.promptcloud.com
How We Optimized Our Web Crawling Pipeline for Faster and Efficient Data Extraction (2017-06-07) www.promptcloud.com
May 2017 Crawl Archive Now Available (2017-06-05) commoncrawl.org
Web Scraping (2017-05-29) medium.com
Outsourcing your Web Scraping Project: Things to Know (2017-05-22) www.promptcloud.com
Sample Data is Great! But it is only Half the Story (2017-05-15) www.promptcloud.com
Fintechs fight plan to bar screen scraping and protect European banks (2017-05-08) www.cnbc.com
40,000 Tinder Pics Scraped Into Big Data Service (2017-05-02) www.itsecurityguru.org
Massive Tinder Photo Grab Is Latest Scary Warning To Be Careful What You Post (2017-04-30) www.huffingtonpost.com
Someone scraped 40,000 Tinder selfies to make a facial dataset for AI experiments (2017-04-28) techcrunch.com
OG-Miner : Data Crawling on Steroids. (2017-04-04) umbrella.cisco.com
Banks, consumer groups agree: Screen scraping needs better regs (2017-03-08) www.americanbanker.com
FutureTDM: The Future of Text and Data Mining (2017-03-03) blog.okfn.org
Block Web Crawlers With Rails (2017-02-11) dzone.com
Facebook User Data Scraping From Posts (Social Networking) (2017-01-31) codecanyon.net
Netflix/sketchy: A task based API for taking screenshots and scraping text from websites. (2017-01-05) github.com
Building a Sentiment Analysis Pipeline for Web Scraping (2016-12-30) blog.algorithmia.com
October 2016 Crawl Archive Now Available (2016-11-07) commoncrawl.org
A Fast Way to Scrape Image URLs from Webpages (2016-11-03) blog.algorithmia.com
Introducing Our Data Collection Endpoint (2016-10-21) blog.scaleapi.com
Why Promoting Open Data Increases Economic Opportunities (2016-10-19) blog.scrapinghub.com
September 2016 Crawl Archive Now Available (2016-10-07) commoncrawl.org
Tweet: The architecture diagram was greatly improved in Scrapy 1.2: https://t.co/978svMNg3p Many thanks to @lorenaelise and long live #opensource! (2016-10-04) twitter.com
Tweet: Scrapy 1.2 is out, with new features, bug fixes and better docs! Check the release notes and upgrade: https://t.co/Ph9IKLJN66 (2016-10-03) twitter.com
Domain whois, web scraping and more with Node.js MongoDB (2016-09-30) www.freelancer.com
How to Run Python Scripts in Scrapy Cloud (2016-09-28) blog.scrapinghub.com
Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd (2016-09-26) www.nature.com
Extract Structured Data From Web Sites Using Analyze URL (2016-09-21) blog.algorithmia.com
How to Fix Crawl Errors in Google Search Console (2016-09-21) moz.com
Data Sets Containing Robots.txt Files and Non-200 Responses (2016-09-16) commoncrawl.org
August 2016 Crawl Archive Now Available (2016-09-16) commoncrawl.org
Tweet: Nice shout out in the @guardian for @importio https://t.co/HfdlZIzeEh (2016-09-15) twitter.com
Screen Scraper Mitigation and Dogfooding Included in Riot Games' Textbook API Justifications (2016-09-15) www.programmableweb.com
Tweet: Learn how to handle #Javascript in #Scrapy with Splash: https://t.co/jM2GwEpOiF #opensource #data #python https://t.co/D690EKjm7y (2016-09-13) twitter.com
Extracting text from an image with OCR (Optical Character Recognition) using Node.js (2016-09-08) community.havenondemand.com
Improved Frontera: Web Crawling at Scale with Python 3 Support (2016-09-01) blog.scrapinghub.com
How to Crawl the Web Politely with Scrapy (2016-08-25) blog.scrapinghub.com
I Don't Need No Stinking API - Web Scraping in 2016 and Beyond (2016-08-24) franciskim.co
Facebook Twitter Scraper For Business (Search) (2016-08-22) codecanyon.net
Introducing Scrapy Cloud with Python 3 Support (2016-08-17) blog.scrapinghub.com
LinkedIn Goes After Anonymous Data Scrapers (2016-08-16) www.pcmag.com
LinkedIn sues anonymous data scrapers (2016-08-15) techcrunch.com
July 2016 Crawl Archive Now Available (2016-08-09) commoncrawl.org
Practical tips for scraping data (2016-08-04) flowingdata.com
This Month in Open Source at Scrapinghub August 2016 (2016-08-04) blog.scrapinghub.com
Local Google Results Scraper (2016-07-26) codecanyon.net
Scrapy Tips from the Pros: July 2016 (2016-07-20) blog.scrapinghub.com
ParseHub vs. Scrapy Comparison - which alternative is better for web scraping? (2016-07-15) blog.parsehub.com
Capybara and Selenium for Testing and Scraping (2016-07-15) dzone.com
Scraping the Web for Water Levels using PowerShell (2016-07-14) learn-powershell.net
QuickCode is the new name for ScraperWiki (the product) (2016-07-14) blog.scraperwiki.com
ParseHub vs. Import.io - which alternative is better for web scraping? (2016-07-12) blog.parsehub.com
Scrapely: The Brains Behind Portia Spiders (2016-07-07) blog.scrapinghub.com
Business Data Spider (2016-07-02) codecanyon.net
Introducing Portia2Code: Portia Projects into Scrapy Spiders (2016-06-29) blog.scrapinghub.com
Web Grabber - WordPress HTML Scraping Plugin (2016-06-13) codecanyon.net
Introducing the Datasets Catalog (2016-06-09) blog.scrapinghub.com
Jam API - Parse web pages using CSS query selectors (2016-06-09) github.com
Wells Fargo's Bid to Vanquish Screen Scraping (2016-06-06) www.americanbanker.com
Parsey McParseface - The world's most accurate open source parser (2016-05-13) github.com
Google just open sourced something called Parsey McParseface, and it could change AI forever (2016-05-12) thenextweb.com
Scrapy + MonkeyLearn: Textual Analysis of Web Data (2016-05-11) blog.scrapinghub.com
Introducing Scrapy Cloud 2.0 (2016-05-04) blog.scrapinghub.com
Harvesting Searched for Tweets Using Python (2016-05-02) blog.ouseful.info
How To Export Website Content To Excel (2016-04-29) www.magnolia-cms.com
List of most active web crawlers and spiders (2016-04-28) deviceatlas.com
Scrapy Tips from the Pros: April 2016 Edition (2016-04-20) blog.scrapinghub.com
Grok Your Data with the New MonkeyLearn Addon (2016-04-14) blog.scrapinghub.com
Data Journalism Tools Part 1: Extracting and Scraping Data (2016-04-13) blog.silk.co
Crawling a website with the SearchBlox API (2016-04-12) www.searchblox.com
Webscraping with C# - point and scrape! (2016-04-07) www.codeproject.com
Instapaper launches Instaparser API (2016-04-06) blog.instapaper.com
Mapping Corruption in the Panama Papers with Open Data (2016-04-06) blog.scrapinghub.com
WrapAPI - APIs for the whole web (2016-04-05) wrapapi.com
Trawling the Companies House API to Generate Co-Director Networks (2016-04-04) blog.ouseful.info
PHP Web Scraper - Easily Grab HTML From Websites (2016-04-04) codecanyon.net
Web Scraping to Create Open Data (2016-03-30) blog.scrapinghub.com
Scrapy Tips from the Pros: March 2016 Edition (2016-03-23) blog.scrapinghub.com
Scraping Images and Files Using Casper.JS (2016-03-23) dzone.com
This Month in Open Source at Scrapinghub March 2016 (2016-03-16) blog.scrapinghub.com
Power Your Sports Stats with Web Scraping (2016-03-16) blog.parsehub.com
Website Scraping Using Selenium, Docker, and Chrome With Extensions (2016-03-16) dzone.com
Data mining the Votes of Members of the Polish Parliament (2016-03-14) dzone.com
How Web Scraping is Revealing Lobbying and Corruption in Peru (2016-03-09) blog.scrapinghub.com
Screen-scraper 7.0 Released (2016-03-02) blog.screen-scraper.com
Splash 2.0 Is Here with Qt 5 and Python 3 (2016-02-29) blog.scrapinghub.com
Migrate your Kimono Projects to Portia (2016-02-25) blog.scrapinghub.com
Kimono Alternative for Web Scraping - ParseHub (2016-02-20) blog.parsehub.com
Portia: The Open Source Alternative to Kimono Labs (2016-02-17) blog.scrapinghub.com
Python 3 is Coming to Scrapy (2016-02-04) blog.scrapinghub.com
Simple Way to Convert HTML Table Data into PHP Array (2016-01-25) www.codeproject.com
Scrapy Tips from the Pros: Part 1 (2016-01-19) blog.scrapinghub.com
Kimono : Turn websites into structured APIs from your browser in seconds (2016-01-05) www.kimonolabs.com
Create Your Own Web Scraper Using node.js and Get Data in JSON Format (2015-12-13) www.codeproject.com
Finding Common Phrases or Sentences Across Different Documents (2015-12-13) blog.ouseful.info
Acquiring at Digital Scale: Harvesting the StoryCorps.me Collection (2015-12-08) blogs.loc.gov
How Indiana's Legislative Site Foiled Attempts to Scrape It (2015-08-30) www.programmableweb.com
Number of prescriptions by location (2015-08-28) blog.scraperwiki.com
Fragments - Scraping Tabular Data from PDFs (2015-08-11) blog.ouseful.info
The four kinds of data PDF (2015-08-11) blog.scraperwiki.com
PDFTables: All the tables in one page, CSV (2015-06-30) blog.scraperwiki.com
Learn to scrape and build a Reddit API in Flask (2015-06-25) www.airpair.com
End User Programming at the Office for National Statistics (2015-06-10) blog.scraperwiki.com
Building knowledge graphs for new technologies with kimono (2015-06-08) blog.kimonolabs.com
How Contactive builds complete user profiles with kimono (2015-06-02) blog.kimonolabs.com
The screen-scraping vs. direct API integration debate: what's the best strategy for your mobile commerce site? (2015-06-01) www.information-age.com
Why Developers Should Avoid Screen Scraping (2015-05-21) openlegacy.com
Announcing PDFTables.com (2015-05-18) blog.scraperwiki.com
Generate high quality potential candidate leads (2015-05-03) blog.kimonolabs.com
The SEC API by Kimono (2015-05-02) kimonolabs.com
Frontera: The Brain Behind the Crawls (2015-04-22) blog.scrapinghub.com
Introducing crawl history error reports (2015-04-21) blog.kimonolabs.com
Are you getting the whole story? Investigating biases in mainstream news (2015-04-17) blog.kimonolabs.com
Scraping Web Pages With R (2015-04-15) blog.ouseful.info
Scrape Data Visually with Portia and Scrapy Cloud (2015-04-07) blog.scrapinghub.com
6 PHP Libraries For HTTP And Scraping Websites (2015-04-07) www.developersfeed.com
Could Kimonolabs' March Madness API Have Saved Your Bracket? (2015-03-24) www.programmableweb.com
Open March Madness API (2015-03-18) blog.kimonolabs.com
The History of Scrapinghub (2015-03-16) blog.scrapinghub.com
Scraping, Enriching, and Visualizing Data: A CrowdFlower Meet Up (2015-03-13) www.crowdflower.com
Tomato or Tomahto? How Next Caller uses kimono to add pronunciation to caller ID (2015-03-09) blog.kimonolabs.com
WPAS - Protect Your Data And Prevent web Scraping (Utilities) (2015-03-08) codecanyon.net
Skinfer - a tool for inferring JSON Schemas (2015-03-04) blog.scrapinghub.com
How Google Pulls Structured Snippets from Websites' Tables (2015-03-03) moz.com
rvest: R package to scrape web data (2015-03-02) flowingdata.com
Handling JavaScript in Scrapy with Splash (2015-03-02) blog.scrapinghub.com
What's News Where? An Analysis of what made the front page of news sources across the globe with MonkeyLearn Entity Extraction (2015-02-24) kimonolabs.wpengine.com
Scrapinghub crawls the Deep Web (2015-02-24) blog.scrapinghub.com
New Changes to Our Scrapy Cloud Platform (2015-01-22) blog.scrapinghub.com
Introducing ScrapyRT: An API for Scrapy spiders (2015-01-22) blog.scrapinghub.com
Why Startups Need an API (2012-04-21) tune.com

If you think there is a link I should have listed here, feel free to tweet it at me or submit it as a GitHub issue. Even though I do this full time, I'm still a one-person show, so I miss quite a bit and depend on my network to help me know what is going on.

API Scraping Organizations

These are the organizations I come across in my research who are doing interesting things in the API space. They could be companies, institutions, government agencies, or any other type of organizational entity. My goal is to aggregate so I can stay in tune with what they are up to and how it impacts the API space.


Embedly provides a platform and suite of tools to make embedding and previewing links simple. Embedly helps publishers and consumers manage embed codes from more than 100 websites and APIs, including YouTube, Flickr, Ustream, Picasa, Hulu, Twitpic, Quantcast, and CrunchBase. It automatically converts links from these sources into embedded media on the fly.


Saplo uses innovative semantic technologies to analyze text in a way that mimics how humans read and evaluate text. Saplo helps organisations extract and refine valuable information hidden in large text collections. Saplo has five different services: Entity Tagging, Topic Tags, Related & Similar Articles, Contextual Recognition, and Sentiment Analysis.


The service provides analysis of selected text passages to identify named entities and statements of fact with disambiguation to distinguish similar text strings. It applies machine learning algorithms and natural language processing to connect a text sample with a knowledge base and identify known elements and their relationships. API methods support submission of a text sample to be parsed. 


In late 2014, we needed a web scraper for one of our consulting projects, but couldn't find anything suitable. So we decided to build a better scraper, and it turned out people really liked it. A few months later, the project was selected, along with 32 others out of 6,500 applications, for the inaugural Y Combinator Fellowship programme in August 2015. Apifier launched publicly in October 2015.


ScraperWiki is a web-based platform for collaboratively building programs to extract and analyze public (online) data, in a wiki-like fashion. "Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets. The main use of the website is providing a place for programmers and journalists to collaborate on analyzing public data.


Common Crawl is a non-profit foundation dedicated to providing an open repository of web crawl data that can be accessed and analyzed by everyone. Common Crawl Foundation is a California 501(c)(3) registered non-profit founded by Gil Elbaz with the goal of democratizing access to web information by producing and maintaining an open repository of web crawl data that is universally accessible and analyzable.


Scrapinghub is a company that provides web crawling solutions, including a platform for running crawlers, a tool for building scrapers visually, data feed providers (DaaS) and a consulting team to help startups and enterprises build and maintain their web crawling infrastructures.


PromptCloud operates on a “Data as a Service” (DaaS) model and deals with large-scale data crawl and extraction, using cutting-edge technologies and cloud computing solutions (Nutch, Hadoop, Lucene, Cassandra, etc.). Its proprietary software employs machine learning techniques to extract meaningful information from the web in the desired format. This data could come from reviews, blogs, product catalogs, social sites, or travel data, basically anything and everything on the web. It’s a customized solution rather than simply a mass-data crawler, so you only get the data you wish to see. The solution provides both deep crawl and refresh crawl of the web pages in a structured format.


Convextra allows you to collect valuable data from the internet and presents it in an easy-to-use CSV format for further use.

Screen Scraper

Copying text from a web page. Clicking links. Entering data into forms and submitting. Iterating through search results pages. Downloading files (PDF, MS Word, images, etc.).


Web data extraction and mashups are easy with Mozenda. We're industry leaders in screen scraping and data integration.

HPE Haven OnDemand

HPE Haven OnDemand is a platform for building cognitive computing solutions using text analysis, speech recognition, image analysis, indexing and search APIs. Simply put, developers and businesses use APIs to add advanced capabilities such as natural language processing, machine learning, and predictive analytics to their applications.


AYLIEN Text Analysis API is a package of Natural Language Processing, Information Retrieval and Machine Learning tools for extracting meaning and insight from textual and visual content with ease. At AYLIEN, we’re harnessing the potential of your data. Whether you're a news organization, a developer, a savvy marketer or an academic, you'll soon see what a dose of AYLIEN intelligence can do for you. Our text API allows you to monitor the sentiment of your brand, analyze documents or summarize and classify large amounts of text.


An open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way. Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.


Turn dynamic websites into APIs. You can extract data from anywhere. ParseHub works with single-page apps, multi-page apps and just about any other modern web technology. ParseHub can handle JavaScript, AJAX, cookies, sessions and redirects. You can easily fill in forms, loop through dropdowns, log in to websites, click on interactive maps and even deal with infinite scrolling.


Build an API on top of any website. Turn any website...into a parameterized API. Build, share, and use APIs made from webpages. Use WrapAPI to scrape sites, build better UIs, and automate online tasks.

Apache Nutch

Nutch is a well matured, production ready Web crawler. Nutch 1.x enables fine grained configuration, relying on Apache Hadoop™ data structures, which are great for batch processing.

Dandelion API

Context Intelligence: from text to actionable data. Extract meaning from unstructured text and put it in context with a simple API. Thanks to its revolutionary technology, Dandelion API works well even on short and malformed texts in English, French, German, Italian and Portuguese.


Moz is a software as a service (SaaS) company based in Seattle, Washington, U.S.A., that sells inbound marketing and marketing analytics software subscriptions. It was founded by Rand Fishkin and Gillian Muessig in 2004 as a consulting firm and shifted to software development in 2008. The company hosts a website which includes an online community of more than one million globally based digital marketers and marketing related tools.


ScrapeLogo was conceived and developed by Maintop Businesses, originally only for internal purposes. It was coded as an independent service for several of Maintop’s B2B projects. When requests from other companies multiplied, a private beta version was launched too. We are now looking for the first beta testers, who would like to show company logos on their websites and help us improve the quality and precision of our algorithm.


Import.io turns the web into a database, releasing the vast potential of data trapped in websites. It allows you to identify a website, select the data, and treat it as a table in your database, in effect transforming the data into a row-and-column format. You can then add more websites to your data set, the same as adding more rows, and query in real time to access the data.


Diffbot provides a set of APIs that enable developers to easily use web data in their own applications. Diffbot analyzes documents much like a human would, using the visual properties to determine how the parts of the page fit together. The algorithm uses statistical techniques to automatically and reliably determine the structural organization of a page, independent of layout and the language of the text.


The product of over 50 person years of engineering effort, AlchemyAPI is a text mining platform providing the most comprehensive set of semantic analysis capabilities in the natural language processing field. Used over 3 billion times every month, AlchemyAPI enables customers to perform large-scale social media monitoring, target advertisements more effectively, track influencers and sentiment within the media, automate content aggregation and recommendation, make more accurate stock trading decisions, enhance business and government intelligence systems, and create smarter applications and services.


Bitext delivers the most precise and granular text analytics solution on the market, with an accuracy rate above 90%. We are computational linguists first. Our technology really understands sentence structure and its different layers of meaning, so it always produces the richest results.
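
Several of the providers above (Import.io, ParseHub, WrapAPI) describe turning a web page into row-and-column data. As a rough illustration of that idea, and not a description of how any of these services actually work, here is a minimal Python sketch that uses only the standard library to pull the rows out of an HTML table. The TableExtractor class, the extract_rows helper, and the sample HTML are all hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect <td>/<th> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page, standing in for a real website.
SAMPLE_HTML = """
<table>
  <tr><th>Provider</th><th>Type</th></tr>
  <tr><td>Scrapinghub</td><td>Crawling platform</td></tr>
  <tr><td>Common Crawl</td><td>Open crawl archive</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

A real scraper would also have to fetch the page, cope with malformed markup and nested tables, and respect the site's terms and robots.txt, which is a big part of what the hosted services above take on for you.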

If you think there is an organization I should have listed here, feel free to tweet it at me or submit it as a GitHub issue. Even though I do this full time, I'm still a one-person show, so I miss quite a bit and depend on my network to help me know what is going on.

API Scraping Tooling

As I study each API, and API-related service, I'm always looking for open source tooling that has been developed around each area of the API life cycle. This is a collection of tooling I've come across and aggregated as part of my API scraping research.




Apache Tika

The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.




PowerShell Module to interact with AYLIEN Text Analysis API - a package consisting of eight differen

Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link

Objective C

A Python library to detect and extract listing data from an HTML page.

Portia is a tool for visually scraping web sites without any programming knowledge. Just annotate web pages with a point-and-click editor to indicate what data you want to extract, and Portia will learn how to scrape similar pages from the site. Portia has a web-based UI served by a Twisted server, so you can install it on almost any modern platform.

If there is a tool that you think should be listed here, let me know by submitting a GitHub issue or tweeting a link at me. I'm always looking for new types of tools, and for ways to get better at organizing them here and making sense of them.