API Scraping Building Blocks

As I study the API space and profile the companies, services, and tooling I come across, I'm always looking for the common building blocks in use across API operations. These are derived from the features and valuable elements of API operations, and from the companies who are servicing this particular area of the API space.

  • Content Harvesting & Extraction
    • Concept Extraction
    • Summarization
    • Entity Extraction
    • Taxonomy & Classification
    • Relation Extraction
    • Article Extraction
    • Discussion Extraction
    • Date Extraction
    • Author Extraction
    • Product Extraction
    • Related Phrases
    • Pagination Extraction
    • Dictionaries
  • Media Acquisition
    • Image Extraction
    • Video Extraction
    • Image Tagging
    • Image Color Extraction
    • Face Detection
    • Barcode Recognition
    • License Plate Recognition
  • Document Processing
    • Feed Detection
    • PDF Extraction
    • Word Documents
  • Structured Data Extraction
    • HTML Table Extraction
    • Spreadsheet Extraction
    • CSV Files
    • JSON Files
    • Microformats Parsing
    • XML Extraction
  • Crawling
    • Seed URLs
    • Pseudo-URLs
    • Scripting
    • Conditional Expressions
    • XPath
    • RegEx
    • Injection
    • Timeout
  • Machine Learning
    • Semantic Text Analysis
    • Semantic Similarity
    • Sentiment Analysis
    • Emotion Analysis
  • Content Access
    • Content Latest Index
    • Historical Index
    • Storage
    • Search
  • DNS
    • Domain Lists
    • Domain Metadata
  • Automation & Orchestration
    • API
    • Webhooks
  • Analytics
    • Reporting
    • URL Metrics
    • Spam Score
    • Rankings
  • International
    • Language Detection
    • Geo IP Address
  • Utilities
    • Proxies
    • Cookies
    • Headers
    • User Agents
    • IP Address
    • Logging
    • Batch Calls
    • Scheduler
    • Low Latency
  • Integrations
    • Dropbox
    • Amazon S3
    • Google Sheets
    • Plot.ly
    • Silk
    • Tableau
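
To make one of these building blocks concrete, here is a minimal sketch of HTML Table Extraction from the Structured Data Extraction group. It is illustrative only and does not reflect how any particular provider implements the feature. It uses Python's standard `html.parser` module, and the names `TableExtractor` and `extract_tables` are hypothetical, introduced just for this example. It assumes reasonably well-formed markup and does not handle nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in a page as lists of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables, each a list of rows
        self._rows = None   # rows of the table currently being read
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None and self._rows is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None
            self._row = None


def extract_tables(html):
    """Return a list of tables; each table is a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    sample = (
        "<table>"
        "<tr><th>Name</th><th>Price</th></tr>"
        "<tr><td>Widget</td><td> 9.99 </td></tr>"
        "</table>"
    )
    for table in extract_tables(sample):
        for row in table:
            print(row)
```

A real scraping service would typically layer other blocks from the list around a step like this, such as proxies, custom headers, scheduling, and storage.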

These building blocks are constantly being added to and reorganized. If there is something you think should be here, feel free to let me know. Remember that this represents my living research, and will evolve, expand, and actually seed new research areas as I find the time to pay attention to API scraping.