Web scraping or an official API? How to actually decide
Most teams reach for scraping when an API would have saved them weeks, or pay for an API that doesn't hold the data they need. Here's the decision we walk every client through.
Almost every data project we take on opens with the same fork in the road, whether the client says it out loud or not: pull the data through an official API, or scrape it straight off the website. Most people have already picked a side before the first call. That instinct is usually what costs them a few weeks.
Here is the decision we actually walk through, minus the religion.
First question: does a usable API really exist?
Not "does this company have an API." Most of them do, and there is probably a nice landing page for it. The real question is whether that API gives you the exact fields you need, at the volume you need, for a price that makes sense. Those are three different walls, and projects tend to hit them in that order.
The fields wall
A real estate API might hand you price, bedrooms and bathrooms, then quietly stop short of the agent's phone number, the full photo set, or the price history. That missing slice is often the entire reason you wanted the data in the first place. Before you trust an API, pull one real response and read it field by field. Marketing pages lie by omission. Payloads don't.
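Reading a payload field by field can be partly automated. Here is a minimal sketch of that check, assuming the response is JSON. The field list, the dotted-path convention, and the sample payload are all made up for illustration and do not come from any real provider.

```python
import json

# Hypothetical list of fields the project needs; replace with your own.
REQUIRED_FIELDS = [
    "price", "bedrooms", "bathrooms",
    "agent.phone", "photos", "price_history",
]

def get_path(record, dotted_path):
    """Follow a dotted path like 'agent.phone' through nested dicts.
    Returns None if any step along the way is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def missing_fields(record, required):
    """Return the required fields that are absent, null, or empty."""
    missing = []
    for path in required:
        value = get_path(record, path)
        if value is None or value == "" or value == [] or value == {}:
            missing.append(path)
    return missing

# Invented sample standing in for one real response from the API under test.
sample_response = json.loads("""
{
  "price": 425000,
  "bedrooms": 3,
  "bathrooms": 2,
  "agent": {"name": "J. Doe"},
  "photos": []
}
""")

gaps = missing_fields(sample_response, REQUIRED_FIELDS)
print(gaps)  # ['agent.phone', 'photos', 'price_history']
```

One response is a small sample, so it is worth running the same check over a handful of records before drawing conclusions. A field that is empty on one listing may be populated on another.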
The volume wall
Plenty of APIs feel generous right up until you use them in anger. A limit of 100 requests a minute sounds fine until you do the math: at one record per request, 300,000 records take about 50 hours of nonstop calls, and a daily cap can stop you by lunchtime. Read the rate limits before you fall for the feature list.
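The rate-limit math is quick to script. The sketch below is a rough estimate only: it assumes a steady request rate with no retries or backoff, and the function name, the 300,000-record job, and the 10,000-request daily cap are illustrative numbers rather than any provider's real limits.

```python
import math

def estimate_job(records, records_per_request, requests_per_minute, daily_cap=None):
    """Rough time estimate for pulling `records` through a rate-limited API."""
    requests_needed = math.ceil(records / records_per_request)
    # Time needed if you could call nonstop at the per-minute limit.
    hours_at_rate_limit = requests_needed / requests_per_minute / 60
    # Requests you can actually make per day once a daily cap applies.
    requests_per_day = requests_per_minute * 60 * 24
    if daily_cap is not None:
        requests_per_day = min(requests_per_day, daily_cap)
    return {
        "requests": requests_needed,
        "hours_at_rate_limit": hours_at_rate_limit,
        "calendar_days": requests_needed / requests_per_day,
    }

# 300,000 records, one record per request, 100 requests per minute.
uncapped = estimate_job(300_000, 1, 100)
# Same job with a hypothetical cap of 10,000 requests per day.
capped = estimate_job(300_000, 1, 100, daily_cap=10_000)

print(f"{uncapped['hours_at_rate_limit']:.0f} hours without a cap")  # 50 hours
print(f"{capped['calendar_days']:.0f} days with the daily cap")      # 30 days
```

If the API returns pages of records, raising `records_per_request` changes the picture fast, which is why pagination size belongs in the same reading pass as the rate limits.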
The price wall
The free tier is bait. Real pricing usually kicks in at exactly the scale you need, and for some providers the per-call cost at volume works out higher than running your own collection. Sometimes paying is still the right move. Just price it for your real numbers, not the demo.
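Pricing it for your real numbers can be a few lines of arithmetic. The sketch below assumes the simplest possible models: an API with a free allowance and a flat per-call price, and a scraper with a fixed monthly cost plus a small per-request cost. Every figure is invented for illustration, and real tiered pricing will need a more detailed model.

```python
import math

def api_monthly_cost(calls, free_calls, price_per_call):
    """Flat per-call pricing after a free allowance (an assumed model)."""
    return max(0, calls - free_calls) * price_per_call

def scraping_monthly_cost(requests, fixed_cost, cost_per_request):
    """Fixed upkeep (servers, maintenance) plus per-request cost (e.g. proxies)."""
    return fixed_cost + requests * cost_per_request

def break_even_calls(free_calls, price_per_call, fixed_cost, cost_per_request):
    """Monthly volume above which scraping is cheaper, or None if it never is.
    Solves (n - free_calls) * price_per_call = fixed_cost + n * cost_per_request."""
    if price_per_call <= cost_per_request:
        return None
    return math.ceil(
        (fixed_cost + free_calls * price_per_call)
        / (price_per_call - cost_per_request)
    )

# All numbers below are hypothetical.
FREE_CALLS = 1_000
PRICE_PER_CALL = 0.004      # dollars per call after the free tier
FIXED_COST = 600            # dollars per month to keep a scraper alive
COST_PER_REQUEST = 0.0005   # dollars per scraped request

for volume in (10_000, 300_000):
    api = api_monthly_cost(volume, FREE_CALLS, PRICE_PER_CALL)
    own = scraping_monthly_cost(volume, FIXED_COST, COST_PER_REQUEST)
    print(f"{volume:>7} calls: API ${api:,.2f} vs scraping ${own:,.2f}")

threshold = break_even_calls(FREE_CALLS, PRICE_PER_CALL, FIXED_COST, COST_PER_REQUEST)
print(f"Scraping becomes cheaper above roughly {threshold:,} calls a month")
```

With these made-up figures the API is far cheaper at 10,000 calls a month and more expensive at 300,000, which is the pattern described above: the answer depends on the volume you will actually run.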
When an API is the obvious winner
If the data you need fits inside what the API returns, the volume is reasonable, and the cost works, take the API every time. It is more stable, it does not break when the site gets a redesign, and it keeps you on the right side of the terms you agreed to. Good cases:
- You need a clean, well-defined slice of data (prices, exchange rates, weather, payments).
- The provider actively maintains the API and keeps it current.
- You want something you can hand off and not think about for months.
When scraping is the honest answer
Scraping earns its place when the website knows things the API will not tell you. That happens more often than people expect:
- The data simply is not in any API (most marketplaces, listings, directories, reviews).
- The API exists but hides the fields you actually came for.
- You need many sources in one shape, and integrating ten different APIs is slower than scraping ten sites into one schema.
- The API pricing at your volume is genuinely worse than collecting it yourself.
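To make "many sources in one shape" concrete, here is a minimal sketch of the idea: one target schema and one small normalizer per source. The `Listing` fields, the two source shapes, and the example URLs are hypothetical, and the price parser assumes US-style number formatting.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Listing:
    """The single shape every source gets normalized into."""
    source: str
    title: str
    price_usd: int
    url: str

def parse_price(text):
    """Turn a string like '$1,250' into 1250. Assumes US-style formatting."""
    cleaned = re.sub(r"[^\d.]", "", text)
    return round(float(cleaned))

def from_site_a(raw):
    # Hypothetical site A exposes the price as display text.
    return Listing("site_a", raw["name"].strip(), parse_price(raw["cost"]), raw["link"])

def from_site_b(raw):
    # Hypothetical site B nests the price and already uses a number.
    return Listing("site_b", raw["headline"].strip(), round(raw["price"]["amount"]), raw["href"])

raw_a = {"name": " Two-bed flat ", "cost": "$1,250", "link": "https://example.com/a/1"}
raw_b = {"headline": "Studio", "price": {"amount": 1300.0, "currency": "USD"}, "href": "https://example.com/b/9"}

listings = [from_site_a(raw_a), from_site_b(raw_b)]
for listing in listings:
    print(listing)
```

Each new source costs one more normalizer function, while everything downstream keeps working against `Listing` alone.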
Scraping does cost more to keep alive. Sites change, and your scraper has to change with them, so it is a commitment rather than a shortcut. Done properly, it is worth it. We run scrapers that have quietly delivered millions of clean records a week for years.
The part nobody mentions: it is often both
The best pipelines we build rarely pick a side. A typical setup scrapes the messy stuff the API ignores, calls an API for the clean reference data (say, geocoding an address or pulling a currency rate), then merges both into one tidy dataset. Treat "API or scraping" as "which tool for which field," not a loyalty test.
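A hybrid pipeline of this kind can be sketched in a few lines. In the example below, the scraped rows carry the fields an API would not give us, and a stubbed `fetch_rates_to_usd` stands in for a real currency API call. The function names, the rows, and the exchange rates are all invented, and no network request is made.

```python
def fetch_rates_to_usd():
    """Stand-in for a real currency API call. The rates are invented."""
    return {"USD": 1.0, "EUR": 1.10, "GBP": 1.25}

# Rows as a scraper might deliver them: local-currency prices plus the
# fields the API ignores (here, a hypothetical agent phone number).
scraped_rows = [
    {"id": "a1", "title": "Canal-side flat", "price": 200000,
     "currency": "EUR", "agent_phone": "+31 20 555 0101"},
    {"id": "b7", "title": "Terraced house", "price": 320000,
     "currency": "GBP", "agent_phone": None},
]

def merge(rows, rates):
    """Attach an API-derived USD price to each scraped row."""
    merged = []
    for row in rows:
        rate = rates.get(row["currency"])
        # Leave price_usd empty rather than guess when a rate is unknown.
        price_usd = round(row["price"] * rate, 2) if rate is not None else None
        merged.append({**row, "price_usd": price_usd})
    return merged

dataset = merge(scraped_rows, fetch_rates_to_usd())
for record in dataset:
    print(record["id"], record["price_usd"])
```

The split follows the "which tool for which field" rule: the scraper owns title, price, and phone, and the API owns the exchange rate. Either side can be swapped out without touching the other.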
A thirty-second gut check
Open the website. Find the exact data you want on the page. Now read the API docs and check whether every one of those fields is there, at your volume, at a price you can live with. If yes, use the API. If anything is missing, you are scraping, at least for those fields, and the only real question left is whether you build it to survive at scale.
That last part, surviving at scale, is its own craft, and it is the subject of the next post. If you are weighing up a data project and are not sure which way to go, that is exactly the kind of thing we are happy to think through with you.