Web scraping of data and content is a huge problem for businesses across many different industries. Financial data aggregators do it in the banking industry. Human resources groups do it on LinkedIn and other social networks. If there is value in data that is publicly available on the web, or lightly hidden behind a public user account, someone will be looking to scrape it. It is a situation that causes headaches for website operators and poses a security and privacy risk for end users. Even though web scraping is widespread, there is a way to help manage the problem: making web APIs easier to access and put to use.
Given a legitimate and easier way of getting at data and content, likely 80% or more of web scrapers will choose the web API over the more difficult and unpredictable work of scraping. Not all web scrapers are bad actors, and many are just looking to get at data and content for valid business purposes that bring value to end users. However, in the absence of APIs, or of usable and reliable APIs, many resort to web scraping to get the data and content they need. The presence of simple, easy to use, and reliable web APIs will bring most web scrapers out of the shadows, and potentially turn them into valid customers who can conduct business out in the open.
If the data and content are publicly available, or available behind user authentication, scrapers will find a way to get at them, no matter how hard platforms work to deter them. This sets the stage for an adversarial relationship with developers and business operators who could very easily become valuable customers, paying for premium access and providing value-add applications to platform users. This is where the business of APIs comes in: opening up web APIs that provide access to the data and content users are looking for, while requiring consumers to authenticate and provide an API key along with each API call. That allows all API consumption to be measured much like web traffic is, but in a way that also makes rate limiting easier and lets users pay for premium levels of access.
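To make the idea of keys, metering, and rate limits a little more concrete, here is a minimal sketch in Python of what a gateway does on each call. It is an illustration only, not how any particular API management product works: the `Gateway` and `ApiKey` classes, the plan names, and the per-window limits are all assumptions made up for this example, and it uses a simple fixed-window counter where real gateways may use other strategies.

```python
import time
from dataclasses import dataclass, field

# Hypothetical access tiers: calls allowed per window for each plan.
PLAN_LIMITS = {"free": 3, "premium": 10}


@dataclass
class ApiKey:
    owner: str
    plan: str = "free"


@dataclass
class Gateway:
    keys: dict                       # API key string -> ApiKey record
    window_seconds: float = 60.0     # length of one rate-limit window
    _windows: dict = field(default_factory=dict)  # key -> (window start, count)
    usage: dict = field(default_factory=dict)     # key -> accepted calls so far

    def handle(self, api_key, now=None):
        """Return an HTTP-style status code for a single API call."""
        now = time.monotonic() if now is None else now
        record = self.keys.get(api_key)
        if record is None:
            return 401  # missing or unknown key: caller must authenticate

        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired, start fresh

        if count >= PLAN_LIMITS[record.plan]:
            self._windows[api_key] = (start, count)
            return 429  # over the plan's limit for this window

        # Accept the call and meter it against the key.
        self._windows[api_key] = (start, count + 1)
        self.usage[api_key] = self.usage.get(api_key, 0) + 1
        return 200
```

Every accepted call is counted against a known key, so the platform knows who is consuming what, and moving a consumer from the free limit to the premium limit is a change to one field rather than an arms race over IP addresses and user agents.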
API management allows for the monitoring of access to data and content, and the establishment of a feedback loop with consumers in a way that doesn't exist when they are forced to scrape. Sure, not all web scrapers will choose to adopt the API way of doing things, but the legitimate ones tend to choose the path of least resistance, and will be open to authentication and welcome the logging and analysis of their usage that API management brings to the table. This opens the door to an entirely new type of relationship between platform owners and the consumers of the valuable data and content they generate, a relationship that could begin to define entirely new sources of revenue, new data and content products, and applications that bring value to the platform.
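As a rough sketch of the kind of usage analysis this makes possible, the Python below groups a handful of access-log records by API key. The record fields (`key`, `endpoint`, `status`), the sample entries, and the `summarize` helper are hypothetical, invented for this example rather than taken from any real gateway's log format.

```python
from collections import Counter, defaultdict

# Hypothetical access-log records, one per API call.
LOG = [
    {"key": "k-free", "endpoint": "/accounts", "status": 200},
    {"key": "k-free", "endpoint": "/accounts", "status": 200},
    {"key": "k-free", "endpoint": "/transactions", "status": 429},
    {"key": "k-pro", "endpoint": "/transactions", "status": 200},
]


def summarize(log):
    """Per API key: total calls, rate-limited calls, and calls per endpoint."""
    report = defaultdict(
        lambda: {"calls": 0, "rejected": 0, "endpoints": Counter()}
    )
    for entry in log:
        row = report[entry["key"]]
        row["calls"] += 1
        if entry["status"] == 429:
            row["rejected"] += 1  # consumer hit their rate limit
        row["endpoints"][entry["endpoint"]] += 1
    return dict(report)
```

A consumer who keeps running into their limit on one endpoint is a natural candidate for a conversation about premium access, which is exactly the feedback loop that scraping never provides.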
If data and content are available via websites and web and mobile applications, they should also be available via APIs. The only way we will begin to do away with web scraping is by making web APIs easier to use and more reliable than scraping. If more businesses choose the API path, eventually web scraping will be seen as riskier, more difficult, and something only bad actors do. That shifts the balance in how business is done on the web, bringing more platform activity out of the shadows and allowing for more compliance with a GDPR way of doing business. The result is more successful platforms, with a better class of developers and data aggregators, as well as happier and more secure end users: a world where everyone gets what they want.