API Evangelist Partners

These are my partners who invest in API Evangelist each month, helping underwrite my research, and making sure I'm able to keep monitoring the API space as I do.

Streamdata.io

Streamdata is a software vendor making real-time data accessible to all by operating a proxy turning request / response APIs into feeds of real-time events.

Uptrends

Uptrends is the ultimate monitoring tool to stay in control of the uptime, performance, and functionality of your websites, APIs, and servers.

3scale

3scale makes it easy to open, secure, distribute, control, and monetize APIs. It is built with performance, customer control, and excellent time-to-value in mind.

API Scraping News

These are the news items I've curated in my monitoring of the API space that have some relevance to the API scraping conversation and that I wanted to include in my research. I'm using all of these links to better understand how the space is testing their APIs, going beyond just monitoring to understand the details of each request and response.

Title (Date) Source
How to Scrape Amazon Product Reviews using Python (2018-08-27) www.promptcloud.com
Scrapy and Scrapyrt: how to create your own API from (almost) any website (2018-08-25) medium.com
6 Emerging Applications of Web Scraping Technology (2018-08-22) www.promptcloud.com
9 Best Applications of Text Data Mining and Analysis (2018-08-06) www.promptcloud.com
What is Web Scraping? (2018-07-17) www.promptcloud.com
Web Data Acquisition Framework – Go (2018-07-12) www.promptcloud.com
How Data Compliance Companies Are Turning To Web Crawlers To Take Advantage of the GDPR Business Opportunity (2018-05-30) blog.scrapinghub.com
How To Parse Google Local Pack Results (2018-04-13) medium.com
Raschietto: a simple library for web scraping (2018-02-25) medium.com
Serverless Architecture and Web Scraping? (2018-02-22) medium.com
Getting Data From the Web (2018-02-22) dzone.com
Extract Facebook and Twitter data from any page (2018-02-08) medium.com
Is Web Scraping Driving Nectar’s 20% MoM Ecomm Growth? (2018-02-02) www.mozenda.com
I’m harvesting credit card numbers and passwords from your site. Here’s how. (2018-01-06) medium.com
Web Scraping reveals top tech trends and company’s media mentions in 2017 (2017-12-18) www.promptcloud.com
CFPB principles for data aggregation services could have broad implications (2017-12-07) www.internationallawoffice.com
Screen Scraping is Dead, Long Live Screen Scraping (2017-11-30) www.finextra.com
Applications of web scraping in financial services industry (2017-11-29) www.promptcloud.com
'Screen scraping' fintechs hit back at CBA (2017-11-27) www.afr.com
Scraping Hacker News (2017-11-26) medium.com
Open Banking & Screen Scraping – Dave Tonge – Medium (2017-11-22) medium.com
Tips to Cut Costs Associated with Web Data Extraction (2017-11-20) www.promptcloud.com
How to Scrape Instagram Profiles (2017-11-13) dev.to
New Features Release: Scraping Web Data Up To 500% Faster Just Got Easier (2017-11-06) www.mozenda.com
A Faster, Updated Scrapinghub (2017-11-05) blog.scrapinghub.com
PromptCloud launches a Web Scraping Forum (2017-10-27) www.promptcloud.com
Leveraging Web Scraping for Cryptocurrency Trading (2017-10-18) www.promptcloud.com
Why Has Ecommerce Companies Opting For Web Scraping Increased? (2017-10-01) www.promptcloud.com
How to gather information about companies from the news (2017-09-15) medium.com
Mastering Python Web Scraping: Get Your Data Back (2017-09-12) medium.com
FIDO Alliance Addresses PSD2 Screen Scraping Debate in Letter to European Commission and European Parliament (2017-09-08) fidoalliance.org
European Commission is Right To Reject a Screen (2017-08-30) www.datainnovation.org
The future of your data could rest in the outcome of LinkedIn vs HiQ case (2017-08-24) thenextweb.com
Rapidly Extract Information from Public Websites (2017-08-23) blog.algorithmia.com
Web Scraping by Leveraging Ruby (2017-08-23) www.promptcloud.com
How to build a Scalable Crawler on the cloud, that can mine thousands of data points, costing less… (2017-08-22) medium.com
Legality of Extracting Publicly Available User (2017-08-21) www.promptcloud.com
Scraping with paging (2017-08-19) medium.com
Web Scraping: Challenges and Roadblocks (2017-08-18) www.promptcloud.com
How to Extract Data from a Website using CrawlBoard (2017-08-16) www.promptcloud.com
Is LinkedIn trying to protect your data — or hoard it? (2017-08-15) www.washingtonpost.com
Judge orders LinkedIn to stop blocking data (2017-08-15) hosted.ap.org
LinkedIn can’t block scrapers from monitoring user activity (2017-08-15) www.engadget.com
Microsoft ordered to let third parties scrape LinkedIn data (2017-08-15) www.theverge.com
LinkedIn loses legal right to protect user data from AI scraping (2017-08-14) thenextweb.com
Should Data Scientists Learn Web Scraping? (2017-08-14) www.promptcloud.com
API disobedience — provide API to your data or someone else will (2017-08-11) medium.com
Best Programming Languages for Web Scraping (2017-08-09) www.promptcloud.com
5 Reasons Why You Should Scrape Competitor Prices (2017-07-26) www.promptcloud.com
Make a Web Scraper with AWS Lambda and the Serverless Framework (2017-07-23) medium.com
Google Sheets vs Web Scraping Services (2017-07-19) www.promptcloud.com
Screen scraping 101: Who, What, Where, When? (2017-07-19) medium.com
Do You have the Right Web Scraping Team? (2017-07-17) www.promptcloud.com
Scraping the Steam Game Store with Scrapy (2017-07-07) blog.scrapinghub.com
Why Enterprises Outsource Web Scraping to PromptCloud (2017-06-23) www.promptcloud.com
ABBYY’s new version of TextGrabber is a super useful OCR and translation app (2017-06-22) techcrunch.com
Why Do FinTechs Want To Save Screen Scraping? (2017-06-22) nordicapis.com
Why Customization is the Key Aspect of a Web Scraping Solution (2017-06-12) www.promptcloud.com
Scraping Dynamic Websites: How We Tackle the Problem (2017-06-09) www.promptcloud.com
How We Optimized Our Web Crawling Pipeline for Faster and Efficient Data Extraction (2017-06-07) www.promptcloud.com
May 2017 Crawl Archive Now Available (2017-06-05) commoncrawl.org
Web Scraping (2017-05-29) medium.com
Outsourcing your Web Scraping Project: Things to Know (2017-05-22) www.promptcloud.com
Sample Data is Great! But it is only Half the Story (2017-05-15) www.promptcloud.com
Fintechs fight plan to bar screen scraping and protect European banks (2017-05-08) www.cnbc.com
40,000 Tinder Pics Scraped Into Big Data Service (2017-05-02) www.itsecurityguru.org
Massive Tinder Photo Grab Is Latest Scary Warning To Be Careful What You Post (2017-04-30) www.huffingtonpost.com
Someone scraped 40,000 Tinder selfies to make a facial dataset for AI experiments (2017-04-28) techcrunch.com
OG-Miner : Data Crawling on Steroids. (2017-04-04) umbrella.cisco.com
Banks, consumer groups agree: Screen scraping needs better regs (2017-03-08) www.americanbanker.com
FutureTDM: The Future of Text and Data Mining (2017-03-03) blog.okfn.org
Block Web Crawlers With Rails (2017-02-11) dzone.com
Facebook User Data Scraping From Posts (Social Networking) (2017-01-31) codecanyon.net
Netflix/sketchy: A task based API for taking screenshots and scraping text from websites. (2017-01-05) github.com
Building a Sentiment Analysis Pipeline for Web Scraping (2016-12-30) blog.algorithmia.com
October 2016 Crawl Archive Now Available (2016-11-07) commoncrawl.org
A Fast Way to Scrape Image URLs from Webpages (2016-11-03) blog.algorithmia.com
Introducing Our Data Collection Endpoint (2016-10-21) blog.scaleapi.com
Why Promoting Open Data Increases Economic Opportunities (2016-10-19) blog.scrapinghub.com
September 2016 Crawl Archive Now Available (2016-10-07) commoncrawl.org
Tweet: The architecture diagram was greatly improved in Scrapy 1.2: https://t.co/978svMNg3p Many thanks to @lorenaelise and long live #opensource! (2016-10-04) twitter.com
Tweet: Scrapy 1.2 is out, with new features, bug fixes and better docs! Check the release notes and upgrade: https://t.co/Ph9IKLJN66 (2016-10-03) twitter.com
Domain whois, web scraping and more with Node.js MongoDB (2016-09-30) www.freelancer.com
How to Run Python Scripts in Scrapy Cloud (2016-09-28) blog.scrapinghub.com
Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd (2016-09-26) www.nature.com
How to Fix Crawl Errors in Google Search Console (2016-09-21) moz.com
Extract Structured Data From Web Sites Using Analyze URL (2016-09-21) blog.algorithmia.com
August 2016 Crawl Archive Now Available (2016-09-16) commoncrawl.org
Data Sets Containing Robots.txt Files and Non-200 Responses (2016-09-16) commoncrawl.org
Tweet: Nice shout out in the @guardian for @importio https://t.co/HfdlZIzeEh (2016-09-15) twitter.com
Screen Scraper Mitigation and Dogfooding Included in Riot Games' Textbook API Justifications (2016-09-15) www.programmableweb.com
Tweet: Learn how to handle #Javascript in #Scrapy with Splash: https://t.co/jM2GwEpOiF #opensource #data #python https://t.co/D690EKjm7y (2016-09-13) twitter.com
Extracting text from an image with OCR (Optical Character Recognition) using Node.js (2016-09-08) community.havenondemand.com
Improved Frontera: Web Crawling at Scale with Python 3 Support (2016-09-01) blog.scrapinghub.com
How to Crawl the Web Politely with Scrapy (2016-08-25) blog.scrapinghub.com
I Don't Need No Stinking API - Web Scraping in 2016 and Beyond (2016-08-24) franciskim.co
Facebook Twitter Scraper For Business (Search) (2016-08-22) codecanyon.net
Introducing Scrapy Cloud with Python 3 Support (2016-08-17) blog.scrapinghub.com
LinkedIn Goes After Anonymous Data Scrapers (2016-08-16) www.pcmag.com
LinkedIn sues anonymous data scrapers (2016-08-15) techcrunch.com
July 2016 Crawl Archive Now Available (2016-08-09) commoncrawl.org
This Month in Open Source at Scrapinghub August 2016 (2016-08-04) blog.scrapinghub.com
Practical tips for scraping data (2016-08-04) flowingdata.com
Local Google Results Scraper (2016-07-26) codecanyon.net
Scrapy Tips from the Pros: July 2016 (2016-07-20) blog.scrapinghub.com
ParseHub vs. Scrapy Comparison – which alternative is better for web scraping? (2016-07-15) blog.parsehub.com
Capybara and Selenium for Testing and Scraping (2016-07-15) dzone.com
Scraping the Web for Water Levels using PowerShell (2016-07-14) learn-powershell.net
QuickCode is the new name for ScraperWiki (the product) (2016-07-14) blog.scraperwiki.com
ParseHub vs. Import.io – which alternative is better for web scraping? (2016-07-12) blog.parsehub.com
Scrapely: The Brains Behind Portia Spiders (2016-07-07) blog.scrapinghub.com
Business Data Spider (2016-07-02) codecanyon.net
Introducing Portia2Code: Portia Projects into Scrapy Spiders (2016-06-29) blog.scrapinghub.com
Web Grabber - WordPress HTML Scraping Plugin (2016-06-13) codecanyon.net
Jam API – Parse web pages using CSS query selectors (2016-06-09) github.com
Introducing the Datasets Catalog (2016-06-09) blog.scrapinghub.com
Wells Fargo's Bid to Vanquish Screen Scraping (2016-06-06) www.americanbanker.com
Parsey McParseface – The world's most accurate open source parser (2016-05-13) github.com
Google just open sourced something called Parsey McParseface, and it could change AI forever (2016-05-12) thenextweb.com
Scrapy + MonkeyLearn: Textual Analysis of Web Data (2016-05-11) blog.scrapinghub.com
Introducing Scrapy Cloud 2.0 (2016-05-04) blog.scrapinghub.com
Harvesting Searched for Tweets Using Python (2016-05-02) blog.ouseful.info
How To Export Website Content To Excel (2016-04-29) www.magnolia-cms.com
List of most active web crawlers and spiders (2016-04-28) deviceatlas.com
Scrapy Tips from the Pros: April 2016 Edition (2016-04-20) blog.scrapinghub.com
Grok Your Data with the New MonkeyLearn Addon (2016-04-14) blog.scrapinghub.com
Data Journalism Tools Part 1: Extracting and Scraping Data (2016-04-13) blog.silk.co
Crawling a website with the SearchBlox API (2016-04-12) www.searchblox.com
Webscraping with C# - point and scrape! (2016-04-07) www.codeproject.com
Mapping Corruption in the Panama Papers with Open Data (2016-04-06) blog.scrapinghub.com
Instapaper launches Instaparser API (2016-04-06) blog.instapaper.com
WrapAPI – APIs for the whole web (2016-04-05) wrapapi.com
PHP Web Scraper - Easily Grab HTML From Websites (2016-04-04) codecanyon.net
Trawling the Companies House API to Generate Co-Director Networks (2016-04-04) blog.ouseful.info
Web Scraping to Create Open Data (2016-03-30) blog.scrapinghub.com
Scraping Images and Files Using Casper.JS (2016-03-23) dzone.com
Scrapy Tips from the Pros: March 2016 Edition (2016-03-23) blog.scrapinghub.com
This Month in Open Source at Scrapinghub March 2016 (2016-03-16) blog.scrapinghub.com
Website Scraping Using Selenium, Docker, and Chrome With Extensions (2016-03-16) dzone.com
Power Your Sports Stats with Web Scraping (2016-03-16) blog.parsehub.com
Data mining the Votes of Members of the Polish Parliament (2016-03-14) dzone.com
How Web Scraping is Revealing Lobbying and Corruption in Peru (2016-03-09) blog.scrapinghub.com
Screen-scraper 7.0 Released (2016-03-02) blog.screen-scraper.com
Splash 2.0 Is Here with Qt 5 and Python 3 (2016-02-29) blog.scrapinghub.com
Migrate your Kimono Projects to Portia (2016-02-25) blog.scrapinghub.com
Kimono Alternative for Web Scraping - ParseHub (2016-02-20) blog.parsehub.com
Portia: The Open Source Alternative to Kimono Labs (2016-02-17) blog.scrapinghub.com
Python 3 is Coming to Scrapy (2016-02-04) blog.scrapinghub.com
Simple Way to Convert HTML Table Data into PHP Array (2016-01-25) www.codeproject.com
Scrapy Tips from the Pros: Part 1 (2016-01-19) blog.scrapinghub.com
Kimono : Turn websites into structured APIs from your browser in seconds (2016-01-05) www.kimonolabs.com
Finding Common Phrases or Sentences Across Different Documents (2015-12-13) blog.ouseful.info
Create Your Own Web Scraper Using node.js and Get Data in JSON Format (2015-12-13) www.codeproject.com
Acquiring at Digital Scale: Harvesting the StoryCorps.me Collection (2015-12-08) blogs.loc.gov
How Indiana's Legislative Site Foiled Attempts to Scrape It (2015-08-30) www.programmableweb.com
Number of prescriptions by location (2015-08-28) blog.scraperwiki.com
The four kinds of data PDF (2015-08-11) blog.scraperwiki.com
Fragments – Scraping Tabular Data from PDFs (2015-08-11) blog.ouseful.info
PDFTables: All the tables in one page, CSV (2015-06-30) blog.scraperwiki.com
Learn to scrape and build a Reddit API in Flask (2015-06-25) www.airpair.com
End User Programming at the Office for National Statistics (2015-06-10) blog.scraperwiki.com
Building knowledge graphs for new technologies with kimono (2015-06-08) blog.kimonolabs.com
How Contactive builds complete user profiles with kimono (2015-06-02) blog.kimonolabs.com
The screen-scraping vs. direct API integration debate: what's the best strategy for your mobile commerce site? (2015-06-01) www.information-age.com
Why Developers Should Avoid Screen Scraping (2015-05-21) openlegacy.com
Announcing PDFTables.com (2015-05-18) blog.scraperwiki.com
Generate high quality potential candidate leads (2015-05-03) blog.kimonolabs.com
The SEC API by Kimono (2015-05-02) kimonolabs.com
Frontera: The Brain Behind the Crawls (2015-04-22) blog.scrapinghub.com
Introducing crawl history error reports (2015-04-21) blog.kimonolabs.com
Are you getting the whole story? Investigating biases in mainstream news (2015-04-17) blog.kimonolabs.com
Scraping Web Pages With R (2015-04-15) blog.ouseful.info
6 PHP Libraries For HTTP And Scraping Websites (2015-04-07) www.developersfeed.com
Scrape Data Visually with Portia and Scrapy Cloud (2015-04-07) blog.scrapinghub.com
Could Kimonolabs' March Madness API Have Saved Your Bracket? (2015-03-24) www.programmableweb.com
Open March Madness API (2015-03-18) blog.kimonolabs.com
The History of Scrapinghub (2015-03-16) blog.scrapinghub.com
Scraping, Enriching, and Visualizing Data: A CrowdFlower Meet Up (2015-03-13) www.crowdflower.com
Tomato or Tomahto? How Next Caller uses kimono to add pronunciation to caller ID (2015-03-09) blog.kimonolabs.com
WPAS - Protect Your Data And Prevent web Scraping (Utilities) (2015-03-08) codecanyon.net
Skinfer – a tool for inferring JSON Schemas (2015-03-04) blog.scrapinghub.com
How Google Pulls Structured Snippets from Websites Tables (2015-03-03) moz.com
Handling JavaScript in Scrapy with Splash (2015-03-02) blog.scrapinghub.com
rvest: R package to scrape web data (2015-03-02) flowingdata.com
Scrapinghub crawls the Deep Web (2015-02-24) blog.scrapinghub.com
What's News Where? An Analysis of what made the front page of news sources across the globe with MonkeyLearn Entity Extraction (2015-02-24) kimonolabs.wpengine.com
Introducing ScrapyRT: An API for Scrapy spiders (2015-01-22) blog.scrapinghub.com
New Changes to Our Scrapy Cloud Platform (2015-01-22) blog.scrapinghub.com
Why Startups Need an API (2012-04-21) tune.com

If you think there is a link I should have listed here, feel free to tweet it at me or submit it as a GitHub issue. Even though I do this full time, I'm still a one-person show, so I miss quite a bit and depend on my network to help me know what is going on.

API Scraping Organizations

These are the organizations I come across in my research who are doing interesting things in the API space. They could be companies, institutions, government agencies, or any other type of organizational entity. My goal is to aggregate so I can stay in tune with what they are up to and how it impacts the API space.

AlchemyAPI

The product of over 50 person years of engineering effort, AlchemyAPI is a text mining platform providing the most comprehensive set of semantic analysis capabilities in the natural language processing field. Used over 3 billion times every month, AlchemyAPI enables customers to perform large-scale social media monitoring, target advertisements more effectively, track influencers and sentiment within the media, automate content aggregation and recommendation, make more accurate stock trading decisions, enhance business and government intelligence systems, and create smarter applications and services.

Apache Nutch

Nutch is a well matured, production ready Web crawler. Nutch 1.x enables fine grained configuration, relying on Apache Hadoop™ data structures, which are great for batch processing.

Apifier

Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!

ConvExtra

Convextra allows you to collect valuable data from the internet and presents it in an easy-to-use CSV format for further use.

Dandelion API

Semantic Text Analytics API: From text to actionable data: extract meaning from unstructured text and put it in context with a simple API.

Diffbot

Never write another web scraper. Diffbot automates web data extraction from any website using AI, computer vision, & machine learning.

Diggernaut

Web scraping just became easy. Extract any website content and turn it into data sets. No programming skills required. We also help larger accounts where we can do custom programming to collect the data to perform business intelligence, competitive pricing, market sentiment, and other forms of analysis.

Embedly

Extract allows you to mine important features within articles, so you can use written content how you want to. Control colors, text, keywords, and entities in any article on your site. Remove extraneous information. As you automate the way you use articles, you'll gain insight into your users' preferences, helping you serve them better.

Import.io

Import.io turns the web into a database, releasing the vast potential of data trapped in websites. It allows you to identify a website, select the data, and treat it as a table in your database, in effect transforming the data into a row-and-column format. You can then add more websites to your data set, the same as adding more rows, and query in real time to access the data.

link.fish

Automatically get the information of the websites you are interested in and work with it together with others in real time. Depending on the page you bookmark, link.fish automatically extracts the data you actually care about, whether that is the bedrooms of an apartment, the rating of a movie, or the cook time of your favorite recipe. Invite other people to your collection to work with them, or make it public for everybody to see. All changes made will immediately be visible to everyone.

Moz

Backed by the largest community of SEOs on the planet, Moz builds tools that make SEO, inbound marketing, link building, and content marketing easy. Start your free 30-day trial today!

Mozenda

Over 7 billion web pages harvested since 2007. Trusted by thousands of customers worldwide. Stellar account management and customer support.

ParseHub

ParseHub is a free web scraping tool. With our advanced web scraper, extracting data is as easy as clicking the data you need.

PromptCloud

Our web scraping service helps companies get the data they want, the way they need it. We use web crawling, web scraping and data extraction technologies to deliver clean and ready-to-use data.

Saplo

Saplo uses innovative semantic technologies to analyze text in a way that mimics how humans read and evaluate text. Saplo helps organisations extract and refine valuable information hidden in large text collections. Saplo has five different services: Entity Tagging, Topic Tags, Related & Similar Articles, Contextual Recognition, and Sentiment Analysis.

ScrapeLogo

ScrapeLogo was developed by Maintop Businesses, originally only for internal purposes. It was coded as an independent service for several of Maintop's B2B projects. When requests from other companies multiplied, a private beta version was launched too. We are now looking for the first beta testers who would like to show company logos on their websites and help us improve the quality and precision of our algorithm.

ScraperWiki

ScraperWiki the company is now called @sensiblecodeio. ScraperWiki the product is https://t.co/k0MbgXFINE. Also try our other product https://t.co/gp7nY2c3w5.

TextRazor

The service provides analysis of selected text passages to identify named entities and statements of fact with disambiguation to distinguish similar text strings. It applies machine learning algorithms and natural language processing to connect a text sample with a knowledge base and identify known elements and their relationships. API methods support submission of a text sample to be parsed.

WrapAPI

Build an API on top of any website. Turn any website...into a parameterized API. Build, share, and use APIs made from webpages. Use WrapAPI to scrape sites, build better UIs, and automate online tasks.

If you think there is an organization I should have listed here, feel free to tweet it at me or submit it as a GitHub issue. Even though I do this full time, I'm still a one-person show, so I miss quite a bit and depend on my network to help me know what is going on.

API Scraping Tooling

As I study each API and API-related service, I'm always looking for open source tooling that has been developed around each area of the API life cycle. This is the tooling I've come across and aggregated as part of my API scraping research.

ASP.Net

Angular

Apache

Apache Tika

The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.

Bash

C

PowerShellAylien

PowerShell Module to interact with AYLIEN Text Analysis API - a package consisting of eight differen

CSharp

Closure

CoffeeScript

Crawler

DNS

Dart

Delphi

Erlang

Forms

Frontera

aduana

Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link

Go

Groovy

HTML

Haskell

JavaScript

Links

Lists

Lua

Matlab

News

Node.js

Objective C

Octave

PHP

PowerShell

Prolog

Python

mdr

A Python library to detect and extract listing data from an HTML page.

RapidMiner

Ruby

Rust

SalesForce

Scala

Security

Spreadsheets

Swift

Visual

Portia

Portia is a tool for visually scraping websites without any programming knowledge. Just annotate web pages with a point-and-click editor to indicate what data you want to extract, and Portia will learn how to scrape similar pages from the site. Portia has a web-based UI served by a Twisted server, so you can install it on almost any modern platform.

If there is a tool that you think should be listed here, let me know by submitting a GitHub issue or tweeting a link at me. I'm always looking for new types of tools, and for better ways to organize and make sense of them here.