Is Web Scraping Illegal? Depends on What the Meaning of the …
Depending on who you ask, web scraping can be loved or hated.
Web scraping has existed for a long time and, in its good form, it’s a key underpinning of the internet. “Good bots” enable, for example, search engines to index web content, price comparison services to save consumers money, and market researchers to gauge sentiment on social media.
“Bad bots,” however, fetch content from a website with the intent of using it for purposes outside the site owner’s control. Bad bots make up 20 percent of all web traffic and are used to conduct a variety of harmful activities, such as denial of service attacks, competitive data mining, online fraud, account hijacking, data theft, theft of intellectual property, unauthorized vulnerability scans, spam and digital ad fraud.
So, is it Illegal to Scrape a Website?
So is it legal or illegal? Web scraping and crawling aren’t illegal by themselves. After all, you could scrape or crawl your own website, without a hitch.
Startups love it because it’s a cheap and powerful way to gather data without the need for partnerships. Big companies use web scrapers for their own gain but also don’t want others to use bots against them.
Public opinion on the matter hardly seems relevant anymore, because over the past 12 months it has become very clear that the federal court system is cracking down more than ever.
Let’s take a look back. Web scraping started in a legal grey area where the use of bots to scrape a website was simply a nuisance. Not much could be done about the practice until 2000, when eBay sought a preliminary injunction against Bidder’s Edge. eBay claimed that the use of bots on its site, against the company’s will, amounted to trespass to chattels.
The court granted the injunction because users had to opt in and agree to the site’s terms of service, and because a large number of bots could be disruptive to eBay’s computer systems. The lawsuit was settled out of court, so the case never came to a head, but the legal precedent was set.
In 2001, however, a travel agency sued a competitor that had “scraped” its prices from its website to help set its own prices. The judge ruled that the fact that this scraping was not welcomed by the site’s owner was not sufficient to make it “unauthorized access” for the purpose of federal hacking laws.
Two years later, the legal basis of eBay v. Bidder’s Edge was implicitly overruled in Intel v. Hamidi, a case interpreting California’s common law trespass to chattels. It was the wild west once again. Over the next several years, the courts ruled time and time again that simply putting “do not scrape us” in your website’s terms of service was not enough to create a legally binding agreement. To enforce that term, a user must explicitly agree or consent to the terms. This left the field wide open for scrapers to do as they wished.
Fast forward a few years and you start seeing a shift in opinion. In 2009 Facebook won one of the first copyright suits against a web scraper. This laid the groundwork for numerous lawsuits that tie web scraping to direct copyright violations and very clear monetary damages. The most recent is AP v. Meltwater, in which the court stripped away what is referred to as fair use on the internet.
Previously, people scraping for academic, personal, or information-aggregation purposes could rely on fair use. The court has now gutted the fair use defense that companies had used to justify web scraping, determining that even small percentages of the content, sometimes as little as 4.5%, are significant enough to fall outside fair use. The only caveat the court made was based on the simple fact that this data was available for purchase; had it not been, it is unclear how the court would have ruled. Then, a few months back, the gauntlet was thrown down.
Andrew Auernheimer was convicted of hacking based on the act of web scraping. Although the data was unprotected and publicly available via AT&T’s website, the fact that he wrote web scrapers to harvest that data en masse amounted to a “brute force attack”. He did not have to consent to any terms of service to deploy his bots and conduct the web scraping. The data was not available for purchase. It wasn’t behind a login. He did not even gain financially from the aggregation of the data. Most importantly, it was buggy programming by AT&T that exposed this information in the first place. Yet Andrew was found at fault. This isn’t just a civil suit anymore. This charge is a felony violation on par with hacking or denial of service attacks and carries up to a 15-year sentence for each charge.
In 2016, Congress passed its first legislation specifically to target bad bots — the Better Online Ticket Sales (BOTS) Act, which bans the use of software that circumvents security measures on ticket seller websites. Automated ticket scalping bots use several techniques to do their dirty work including web scraping that incorporates advanced business logic to identify scalping opportunities, input purchase details into shopping carts, and even resell inventory on secondary markets.
To counteract this type of activity, the BOTS Act:
Prohibits the circumvention of a security measure used to enforce ticket purchasing limits for an event with an attendance capacity of greater than 200 persons.
Prohibits the sale of an event ticket obtained through such a circumvention violation if the seller participated in, had the ability to control, or should have known about it.
Treats violations as unfair or deceptive acts under the Federal Trade Commission Act. The bill provides authority to the FTC and states to enforce against such violations.
In other words, if you’re a venue, organization or ticketing software platform, it is still on you to defend against this fraudulent activity during your major onsales.
The UK seems to have followed the US with its Digital Economy Act 2017, which received Royal Assent in April. The Act seeks to protect consumers in a number of ways in an increasingly digital society, including by “cracking down on ticket touts by making it a criminal offence for those that misuse bot technology to sweep up tickets and sell them at inflated prices in the secondary market.”
In the summer of 2017, hiQ Labs, a San Francisco-based startup, sued LinkedIn after LinkedIn sent it a cease-and-desist letter. hiQ was scraping publicly available LinkedIn profiles to offer clients, according to its website, “a crystal ball that helps you determine skills gaps or turnover risks months ahead of time.”
You might find it unsettling to think that your public LinkedIn profile could be used against you by your employer.
Yet a judge on Aug. 14, 2017 decided this is okay. Judge Edward Chen of the U.S. District Court in San Francisco agreed with hiQ’s claim in a lawsuit that Microsoft-owned LinkedIn violated antitrust laws when it blocked the startup from accessing such data. He ordered LinkedIn to remove the barriers within 24 hours. LinkedIn has filed to appeal.
The ruling contradicts previous decisions clamping down on web scraping. And it opens a Pandora’s box of questions about social media user privacy and the right of businesses to protect themselves from data hijacking.
There’s also the matter of fairness. LinkedIn spent years creating something of real value. Why should it have to hand it over to the likes of hiQ — paying for the servers and bandwidth to host all that bot traffic on top of their own human users, just so hiQ can ride LinkedIn’s coattails?
I am in the business of blocking bots. Chen’s ruling has sent a chill through those of us in the cybersecurity industry devoted to fighting web-scraping bots.
I think there is a legitimate need for some companies to be able to prevent unwanted web scrapers from accessing their site.
In October 2017, as reported by Bloomberg, Ticketmaster sued Prestige Entertainment, claiming it used computer programs to illegally buy as many as 40 percent of the available seats for performances of “Hamilton” in New York and the majority of the tickets Ticketmaster had available for the Mayweather v. Pacquiao fight in Las Vegas two years ago.
Prestige continued to use the illegal bots even after it paid $3.35 million to settle New York Attorney General Eric Schneiderman’s probe into the ticket resale industry.
Under that deal, Prestige promised to abstain from using bots, Ticketmaster said in the complaint. Ticketmaster asked for unspecified compensatory and punitive damages and a court order to stop Prestige from using bots.
Are the existing laws too antiquated to deal with the problem? Should new legislation be introduced to provide more clarity? Most sites don’t have any web scraping protections in place. Do the companies have some burden to prevent web scraping?
As the courts try to further decide the legality of scraping, companies are still having their data stolen and the business logic of their websites abused. Instead of looking to the law to eventually solve this technology problem, it’s time to start solving it with anti-bot and anti-scraping technology today.
Get the latest from imperva
Is Web Scraping Legal? – WebHarvy
Web Scraping is the technique of automatically extracting data from websites using software/script. Our software, WebHarvy, can be used to easily extract data from any website without any coding/scripting knowledge.
Is it legal to scrape data from websites using software? The answer to this question is not a simple yes or no.
The real question here should be how you plan to use the data you have extracted from a website (either manually or using software). The data displayed by most websites is for public consumption, and it is totally legal to copy this information to a file on your computer. It is how you plan to use this data that you should be careful about. If the data is downloaded for your personal use and analysis, then it is absolutely ethical. But if you are planning to use it as your own, on your website, in a way that is completely against the interest of the original owner of the data, without attributing the original owner, then it is unethical and illegal.
Also, since web scrapers can read and extract data from web pages more quickly than humans, care should be taken that the scraping process does not affect the performance or bandwidth of the web server in any way. If it does, most web servers will automatically block your IP address, preventing further access to their pages.
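To make that advice concrete, here is a minimal sketch of how a scraper might pace itself and back off when a server signals overload. It is not how WebHarvy works internally; the `next_delay` helper, the choice of HTTP 429 and 503 as "slow down" signals, the 10-second base pause and the 300-second cap are all assumptions chosen for illustration.

```python
import random
from typing import Optional

# Status codes that commonly mean "slow down": 429 Too Many Requests and
# 503 Service Unavailable. Treating only these as retryable is an assumption.
SLOW_DOWN = {429, 503}


def next_delay(status: int, attempt: int, retry_after: Optional[str] = None,
               base: float = 10.0, cap: float = 300.0,
               rng: Optional[random.Random] = None) -> Optional[float]:
    """Seconds to wait before the next request, or None to stop crawling.

    - 2xx: still wait `base` seconds so requests are paced politely.
    - 429/503: honour Retry-After when it is given in whole seconds;
      otherwise back off exponentially (base * 2**attempt), capped at `cap`,
      and scale by a random factor in [0.5, 1.0) so retries don't line up.
    - anything else: stop, rather than keep hitting a failing server.
    """
    rng = rng or random.Random()
    if 200 <= status < 300:
        return base
    if status in SLOW_DOWN:
        if retry_after is not None and retry_after.strip().isdigit():
            return min(float(retry_after.strip()), cap)
        return min(base * (2 ** attempt), cap) * rng.uniform(0.5, 1.0)
    return None
```

A crawl loop would pass the returned value to `time.sleep()` and give up when it gets `None`. A `Retry-After` value sent as an HTTP date is not parsed here; the sketch simply falls back to exponential backoff in that case.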
How to anonymously scrape data from websites?
Update: US federal court rules that web scraping does not violate hacking laws
Scrape Data Anonymously
WebHarvy is an easy-to-use visual web scraper which lets you scrape data anonymously from websites, thereby protecting your privacy. Proxy servers or VPNs can be easily used along with WebHarvy so that you are not connected directly to the web server during data extraction. Also, to minimize the load on web servers, and to avoid detection, there are options to automatically insert pauses & emulate a human user during the web scraping process.
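WebHarvy's pause options are configured in its own interface, so the following is only a generic sketch of the underlying idea: inserting randomized pauses between requests so they are spread out rather than fired back-to-back. The `with_pauses` helper and its 2 to 6 second default range are hypothetical.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def with_pauses(items: Iterable[T], min_s: float = 2.0, max_s: float = 6.0,
                sleep: Callable[[float], None] = time.sleep,
                rng: Optional[random.Random] = None) -> Iterator[T]:
    """Yield items one at a time, sleeping a random interval between them.

    Each pause is drawn uniformly from [min_s, max_s]. There is no pause
    before the first item. `sleep` is injectable so the helper can be tested
    without actually waiting. Because this is a generator, the argument check
    runs when iteration starts.
    """
    if min_s < 0 or max_s < min_s:
        raise ValueError("need 0 <= min_s <= max_s")
    rng = rng or random.Random()
    for i, item in enumerate(items):
        if i > 0:
            sleep(rng.uniform(min_s, max_s))
        yield item
```

Typical use would be `for url in with_pauses(urls): ...`, with the actual fetching done inside the loop by whatever client you already use.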
Is Web Scraping Legal? 6 Misunderstandings About Web …
Hey guys, in my experience as a web scraping developer, I have come across so many misconceptions about web scraping. Because the reputation of web scraping has continued to get worse over the years, let’s shed light on some of the biggest misunderstandings about web scraping. Read the article or watch the video then let me know what else you would add to the list!
As web scraping becomes more and more popular, I think we need to get things straight. After a little research on the internet, and considering the questions I often get asked, I’ve found that these six misconceptions are the most common. If you are totally new to web scraping or are considering using it, the following should be helpful.
Web scraping is illegal
Starting with the biggest BS around web scraping: is web scraping legal? Yes, unless you use it unethically. Web scraping is just like any tool in the world. You can use it for good stuff and you can use it for bad stuff. Web scraping itself is not illegal. As a matter of fact, web scraping, or web crawling, was historically associated with well-known search engines like Google or Bing. These search engines crawl sites and index the web. Because they built trust and brought traffic and visibility back to the sites they crawled, their bots created a favorable view of web scraping. It is all about how you scrape and what you do with the data you acquire.
A great example of when web scraping can be illegal is when you try to scrape nonpublic data. Nonpublic data is data that is not accessible to everyone on the web; maybe you have to log in to see it. In this case web scraping is probably unethical, depending on the context. It also matters how considerate you are, technically, when scraping a website. To learn more, I urge you to check out the most frequent legal issues associated with web scraping!
You need to code
Some people think that you need to be an expert programmer to scrape web data. However, there are software solutions out there like that make it so you don’t have to write any code. Keep in mind, though, that while scraping a website without coding is great, it isn’t applicable in many cases. If you have to further process the data (cleaning, deduplication, etc.), web scraping software can’t really help you.
Web scraping projects traditionally are known to be labor intensive, leaving you with data that’s incomplete, inaccurate, unreliable, and out of date—while introducing high costs and business risk. ’s Web Data Integration removes this complexity and unifies fragmented data from across the internet into something you can trust.
Web scraping is cheap
Most people and businesses don’t want to deal with web scraping themselves, so they frequently hire a company that provides web scraping solutions, or a freelancer. Now, just to get this straight: in most cases web scraping is cheap relative to the ROI it provides. At the same time, you should know that hiring a full-fledged web scraping service is gonna cost you money. If you do some quick research into how much different vendors and freelancers charge for web scraping services, you will find huge differences. That’s because some companies and freelancers with higher rates do provide better services.
Also, you should figure out how complex your project is. For large, long-term projects I suggest hiring a vendor, because they usually guarantee you’ll get your data every time, on time. Some web scraping companies also provide additional useful services, like further processing the data to fit into your system. Once you figure out what your web data needs are, see how ’s Managed Data Service can help you solve your most complex, high-scale, high-quality needs for web data.
The web scraper works forever
When building a scraper, we want it to work seamlessly forever and just deliver the data we need. Unfortunately, it’s not that easy. The biggest challenge in web scraping is that websites are constantly changing; that’s simply the nature of the internet today. To keep up, we have to keep adjusting our scraper so we can trust that it delivers reliable, up-to-date data. Now, if you just set up your scraper with a freelancer dude, it’s gonna be a headache when the scraper breaks (and, unfortunately, it will sooner or later), because you’ll need to find another freelancer to make it work again, unless you’re lucky and the one who built the scraper is available at the moment.
You’re in a good position if you’re using a web scraping service, because the vendor will take care of all the problems; you won’t even notice anything, and the data keeps flowing as usual. So just keep in mind that if you need data flowing into your system continuously, you’ll need to watch your scraper and adjust it when it breaks.
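Because a site redesign usually shows up as fields that suddenly come back empty, one cheap way to notice a broken scraper early is to check each run's output before trusting it. A minimal sketch, assuming scraped records arrive as dictionaries; the `breakage_report` name and the 20% threshold are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Sequence


def breakage_report(records: Sequence[Mapping[str, object]],
                    required: Iterable[str],
                    max_missing_ratio: float = 0.2) -> Dict[str, object]:
    """Flag a scrape run as suspicious when required fields go missing.

    For each required field, compute the share of records where the value is
    absent, None or an empty string. If any share exceeds max_missing_ratio
    (or nothing was scraped at all), the run is marked not ok.
    """
    fields = list(required)
    total = len(records)
    if total == 0:
        return {"ok": False, "broken": fields, "missing": {}}
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, "")) / total
        for f in fields
    }
    broken = [f for f, ratio in missing.items() if ratio > max_missing_ratio]
    return {"ok": not broken, "broken": broken, "missing": missing}
```

A scheduled job could run this after each scrape and alert someone (or pause the data feed) whenever `ok` is false, instead of silently passing empty fields downstream.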
Web scraping is all about selecting data from the HTML
This one is a myth often told by programmers who have never built a real-world web scraper. I’ve heard it so many times, like “It’s no big deal bro, just write a regex, fetch the data from the HTML and you’re done.” Sure, web scraping is associated with fetching data from a website, but what really matters is how you can use that data to drive your business. Web scraping is much more than getting raw data out of a website.
Web scraping, when done correctly, involves cleaning messy data (because 99% of the time raw data from the web is plain unusable), deduplication, all sorts of filtering, integration with your current system, and maybe analytics and visualization. It’s complex. Now you might say, hey, at the end of the day you just want to see the raw data and don’t need any of that stuff. That’s cool. But there’s a chance you’re leaving a massive amount of value on the table by not processing the data further.
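As a small illustration of the cleaning, filtering and deduplication steps described above, here is a sketch that assumes scraped rows are dictionaries with hypothetical `name` and `price` fields and US-style price strings such as `$1,299.00`. Real pipelines will need rules specific to their own data.

```python
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

# For str patterns in Python 3, \s also matches non-breaking spaces.
_WHITESPACE = re.compile(r"\s+")


def clean_text(value: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return _WHITESPACE.sub(" ", value).strip()


def parse_price(raw: str) -> Optional[float]:
    """Parse US-style prices such as '$1,299.00'; None if unparseable."""
    digits = re.sub(r"[^\d.]", "", raw)  # drop currency symbols and commas
    try:
        return float(digits)
    except ValueError:
        return None


def clean_and_dedupe(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Clean each row, drop unusable ones, and keep the first of any duplicates."""
    seen: Set[Tuple[str, float]] = set()
    out: List[Dict[str, object]] = []
    for row in rows:
        name = clean_text(row.get("name") or "")
        price = parse_price(row.get("price") or "")
        if not name or price is None:
            continue  # filtering: skip rows missing a name or a usable price
        key = (name.casefold(), price)  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        out.append({"name": name, "price": price})
    return out
```

For example, `"  Blue\n Widget "` at `$1,299.00` and `"blue widget"` at `1299` collapse into one record, while a row priced `n/a` is filtered out.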
Any website can be scraped
Website owners can make it really hard for bots to scrape data, and there are plenty of ways to make a website scraping-resistant. In reality, though, there’s no technical shield that can stop a full-fledged scraper from fetching data.
That being said, if a website has lots of scraper traps, captchas and other layers of defense against bots, then web scraping is clearly not welcome there, and you should think twice before scraping it. Technically it’s possible to get past all types of bot defenses, but do you really want to? If a website proactively steps up against scrapers, it’s not a good idea to scrape it anyway.
Web data scraping and crawling aren’t illegal by themselves, but it is important to be ethical while doing it. Don’t tread onto other people’s sites without being considerate. Respect the rules of their site. Consider reading over their Terms of Service, read the file. If you suspect a site is preventing you from crawling, consider contacting the webmaster and asking permission to crawl their site. Don’t burn out their bandwidth–try using a slower crawl rate (like 1 request per 10-15 seconds). Don’t publish any content you find that was not intended to be published.
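One widely used way for sites to publish crawl rules is a robots.txt file (the Robots Exclusion Protocol), which Python's standard library can parse with `urllib.robotparser`. Below is a minimal sketch, assuming the site publishes such a file; the `crawl_policy` helper, the `ExampleBot` agent name and the 12-second default delay are hypothetical. `Crawl-delay` is not part of the RFC 9309 standard, but some sites use it and `robotparser` reads it when it is a whole number of seconds.

```python
from typing import Iterable, Tuple
from urllib.robotparser import RobotFileParser

# Assumption: 12 s sits inside the 1-request-per-10-15-seconds range above.
DEFAULT_DELAY_S = 12.0


def crawl_policy(robots_lines: Iterable[str], user_agent: str, url: str,
                 default_delay: float = DEFAULT_DELAY_S) -> Tuple[bool, float]:
    """Return (allowed, delay_seconds) for fetching `url` as `user_agent`.

    `robots_lines` are the lines of the site's robots.txt. The delay is the
    site's Crawl-delay if that is slower than our default, else our default,
    so we never crawl faster than the polite rate we chose ourselves.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    allowed = parser.can_fetch(user_agent, url)
    site_delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    delay = default_delay if site_delay is None else max(float(site_delay), default_delay)
    return allowed, delay
```

Against a live site you would instead call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()`; parsing a list of lines, as here, keeps the sketch testable offline.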
Web scraping has helped us make the best use of the web with services like Google and Bing search engines. It is a powerful tool that helps businesses leverage the data of the internet, but should be done respectfully.
Of course, there are more things I could mention, but today I just wanted to tell you about the ones I get asked about most and that I feel are the most crucial when it comes to leveraging web scraping. Comment below, I would be glad to hear your thoughts!
Frequently Asked Questions about web scraping laws
What is web scraping and is it legal?
Web Scraping is the technique of automatically extracting data from websites using software/script. … The data displayed by most websites is for public consumption, and it is totally legal to copy this information to a file on your computer.
Is web scraping public data legal?
Web scraping is just like any tool in the world. You can use it for good stuff and you can use it for bad stuff. Web scraping itself is not illegal. As a matter of fact, web scraping, or web crawling, was historically associated with well-known search engines like Google or Bing. (Nov 17, 2017)
Is web scraping a crime?
From all the above discussion, it can be concluded that web scraping is actually not illegal on its own, but one should be ethical while doing it. If done in a good way, web scraping can help us make the best use of the web, the biggest example of which is the Google search engine. (Feb 21, 2020)