Is there a legal protection against web scraping?

Original article was published by Syed M. Hamza Tahir on Artificial Intelligence on Medium



https://hackernoon.com/web-scraping-and-the-fight-for-the-open-internet-ly1o2t8i

Web scraping is a process in which bots extract large amounts of information from public websites. So, is extracting data from a public site without permission theft, or is it simply making use of available resources? The European Court of Justice recently drew attention to the problem of intellectual property theft through scraping, and also considered whether site owners can use terms and conditions to combat it. Web scraping is not a new concept, but it has become more widespread with the growth of large-scale data analysis and the popularity of price comparison sites. Indeed, in 2013 it accounted for 18% of website visitors and 23% of all Internet traffic. One example from Russia is the service xmldatafeed.com, which focuses less on scraping sites than on the subsequent analysis of online store prices.

Scraping is not inherently harmful: it has legitimate uses, stimulates innovation, and gives companies with limited resources access to large amounts of data. Unsurprisingly, many website operators dislike it, as they want to protect their property rights. Repeated scraping also consumes bandwidth and degrades a website's performance, in some cases causing outages.

In the US, website operators have brought various lawsuits against web scrapers. These have included copyright claims, claims of trespass to chattels (interference with movable property), and contract claims alleging that the scrapers violated the site's terms of use. In the European Union, operators tried to bring claims for infringement of intellectual property rights, but there was no established case law on such claims, and the existing legal basis was not sufficient to obtain a remedy.

In January 2015, the European Court of Justice ruled, as expected, that when an operator is unable to establish intellectual property rights in its own database, it can rely on its terms and conditions to protect against web scraping. The decision may affect a large number of companies whose business models depend on extracting data from websites and social networks without permission. On the other hand, it is good news for large companies looking to protect and/or monetize their data.

The proceedings between Ryanair and PR Aviation:

The case before the European Court concerned PR Aviation. The company operates a website on which people can compare the ticket prices of various low-cost airlines. Users can also book flights through the site, for which PR Aviation receives a commission. The site relies on information obtained by scraping public data from the websites of low-cost airlines, including Ryanair.

Relying on the Database Directive (96/9/EC) and on its website terms, Ryanair sued PR Aviation for infringing its database rights and breaching those terms. Ryanair asked the court to prohibit PR Aviation from committing such violations in the future and to award compensation for the damage caused. PR Aviation was ordered to pay damages.

What are database rights?

Database rights are a form of unregistered intellectual property right. In 1996 they were set out in the Database Directive, which was then implemented in national legislation throughout the European Union. The Directive helped harmonize the rules on copyright protection of databases in the EU, protect the investment of database makers, and safeguard the legitimate interests of users. In effect, it created the legal framework for the use of databases in the information age. The Directive protects by copyright those databases that constitute original expression, and it introduces a new, sui generis form of protection for databases that are not “original” in the sense of being the author’s own intellectual creation.

Thus, the Directive provides for two forms of protection. Article 3(1) establishes the first: databases that, by reason of the selection or arrangement of their contents, constitute the author’s own intellectual creation are protected by copyright. The second, described in Article 7, is the sui generis right, which applies where there has been a substantial investment, assessed qualitatively and/or quantitatively, in obtaining, verifying or presenting the contents of the database.

It is important to note that the Directive also limits these exclusive rights. Article 6 allows a lawful user of a copyright-protected database to perform, without the author's consent, the acts needed to access its contents and make normal use of them. In addition, Article 8 allows a lawful user of a publicly available database to extract and/or reuse insubstantial parts of its contents, provided that this does not conflict with normal exploitation of the database or unreasonably prejudice the legitimate interests of its maker.

Consideration of the case by the Supreme Court of the Netherlands:

Ryanair’s dispute eventually reached the Supreme Court of the Netherlands. PR Aviation had successfully shown that Ryanair’s database was not sufficiently original to attract copyright protection and that Ryanair had not made the substantial investment in compiling the database that the sui generis right requires. However, the court still had to decide whether Ryanair could claim that PR Aviation had breached its terms by scraping and reusing material from its website. Notably, the Ryanair website contains the following provision: “The use of automated systems or software to retrieve data from this website or www.bookryanair.com for commercial purposes (“Web scraping”) is prohibited, unless a third party has contracted with Ryanair a written license agreement that allows access to pricing information, […]”

Ryanair sought to enforce this provision. PR Aviation argued that the prohibition on web scraping was unenforceable: under Article 15 of the Database Directive, any contractual clause contrary to Articles 6 and 8 is void. The Supreme Court of the Netherlands was not sure whether Article 15 applies to a database that is protected by neither copyright nor the sui generis right, and it therefore requested a preliminary ruling from the European Court of Justice.

Decision of the European Court:

The European Court of Justice held that the limitations on rights provided for by the Directive do not apply to databases that the Directive does not protect. Accordingly, Articles 6, 8 and 15 do not prevent a website operator from setting contractual limits on the use of such a database, without prejudice to applicable national law. The case was sent back to the Dutch court, which will decide whether the terms of the Ryanair website can be enforced.

It is not enough for a site owner simply to prohibit scraping in its terms. The operator must also make sure that those terms and conditions can actually be enforced. This brings back a familiar problem in Europe: whether terms published online will be recognized as fair and reasonable.

Ideally, the operator would require every user of its site to accept the terms before accessing it. Most sites are reluctant to do this because it inconveniences users. A more common option is to display a link to the terms on the site. The disadvantage of this method is that there is no active agreement to the terms (for example, by ticking a checkbox). As a result, there is a risk that the website owner will be unable to show that a contract exists with the user. That question is governed by national law, as in this case.
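The difference between a passive link and active agreement can be illustrated with a minimal sketch of a site that grants access only after a user has ticked a checkbox, and that keeps a record of when and to which version of the terms the user agreed. The names `TermsGate`, `Acceptance` and the version label are hypothetical and chosen only for illustration; whether such a record is enough to prove a contract remains a matter for national law.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class Acceptance:
    """Evidence that a user actively agreed to a given version of the terms."""
    user_id: str
    terms_version: str
    accepted_at: datetime


class TermsGate:
    """Hypothetical gate: access is granted only after active acceptance."""

    def __init__(self, current_version: str) -> None:
        self.current_version = current_version
        self._records: Dict[str, Acceptance] = {}

    def accept(self, user_id: str, checkbox_ticked: bool) -> Optional[Acceptance]:
        # A link to the terms alone is not acceptance: without the ticked
        # checkbox nothing is recorded.
        if not checkbox_ticked:
            return None
        record = Acceptance(user_id, self.current_version,
                            datetime.now(timezone.utc))
        self._records[user_id] = record
        return record

    def may_access(self, user_id: str) -> bool:
        # Acceptance of an older version of the terms does not count.
        record = self._records.get(user_id)
        return record is not None and record.terms_version == self.current_version


gate = TermsGate(current_version="2015-01")   # assumed version label
gate.accept("visitor-1", checkbox_ticked=False)
print(gate.may_access("visitor-1"))           # no active agreement recorded
gate.accept("visitor-1", checkbox_ticked=True)
print(gate.may_access("visitor-1"))           # agreement recorded for current terms
```

The stored timestamp and terms version are what the operator could later point to when it needs to show that a particular user agreed to a particular set of conditions.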

There is no binding case law in the UK on web scraping. Although the issue was raised in the recent high-profile litigation between the Newspaper Licensing Agency and Meltwater (case number [2011] EWCA Civ 890), the Court of Appeal did not consider whether the end user was bound by the site’s terms of use because, given the nature of the case, it saw no need to “enter into the discussion.” The Ryanair ruling also leaves open whether a scraper can rely on the lawful-use exceptions in Articles 6 or 8 of the Directive where the database owner is able to establish copyright or sui generis protection in the database.

Other claims:

In addition to claims for infringement of intellectual property rights and breach of contract, site owners may have other legal arguments against scraping. For example, in the US and UK, an operator can sue for interference with property (trespass to chattels) under common law. In the UK it may also rely on the Computer Misuse Act 1990, which prohibits unauthorized access to or modification of computer material. As with database rights, none of these arguments has yet been tested in the UK courts in a scraping case. (Similar legislation has formed the basis of claims in other countries. For example, in the USA: “Data Extraction: Using the Computer Fraud and Abuse Act to Combat Parsing.”) Privacy and security also need attention when working with scraped data. Scrapers and users of this data must act very carefully to avoid falling foul of privacy laws.

Conclusion:

The European Court’s decision in the Ryanair case produces a rather paradoxical result: in certain circumstances, the owner of a database may have broader, albeit contractual, rights against scraping precisely because it has no intellectual property rights in the database. In light of this decision, website owners in the European Union may want to amend the terms on their sites to prevent web scraping and protect valuable data. It also remains to be seen how the decision will affect companies that scrape data from websites and social media platforms. Any business that uses scraped data should consider where that data comes from and determine whether it is bound by contractual restrictions. Only then can it make an informed decision.