Programmatic Patent Searches Using Google’s BigQuery & Public Patent Data
James H. Moeller
Aug 31, 2018
(Approximate Read-Time: 20 minutes, Word Count: 2224.)

Update note (Sept 20, 2018): Google’s patents-public-data.patents.publications table was updated on Sept 18, 2018.

Google’s BigQuery data warehouse is one of the more interesting capabilities within its cloud offering, and when it’s combined with Google’s public datasets, it can be a powerful platform for very efficient patent research. While BigQuery was introduced back in 2010, the public patents datasets were not added until October 2017, so they are a relatively new patent data resource. Here is the link to the blog post introducing the public patents datasets: https://cloud.google.com/blog/big-data/2017/10/google-patents-public-datasets-connecting-public-paid-and-private-patent-data.

Essentially, the combination of BigQuery and the public patent information provides a platform with ready-made datasets that can be queried with SQL (Structured Query Language: https://cloud.google.com/bigquery/docs/reference/standard-sql/). Because these datasets are publicly available, they can save significant time and expense compared with building your own patent database from USPTO data or subscribing to commercial services. BigQuery queries can be created and managed through the Google Cloud Console (web UI), a command-line tool, or REST APIs and client libraries for Java, .NET, or Python (https://cloud.google.com/bigquery/what-is-bigquery). In addition, users can add their own private datasets to BigQuery (https://cloud.google.com/bigquery/docs/datasets) and/or access commercial datasets offered through BigQuery, combining that information with the patent data in their query projects to augment patent searches (Data Enrichments: https://console.cloud.google.com/marketplace/details/google_patents_public_datasets/ifi-claims-patent-data-enrichments).

BigQuery charges are based on the amount of data processed by each query. The first 1 TB per user per month is free; beyond that, queries are billed at $5.00 per terabyte. A user’s own datasets are billed at Google’s storage rate of $0.02 per GB per month, with the first 10 GB free each month, and access to other commercial datasets depends on the rates set by each data provider. See Google BigQuery Pricing: https://cloud.google.com/bigquery/pricing#queries.
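The pricing arithmetic described here can be sketched in a few lines of Python. This is a minimal illustration, assuming the 2018 rates quoted in this article and assuming that a billed "TB" and "GB" are 2^40 and 2^30 bytes respectively; the function and constant names are hypothetical, and current prices should be taken from the pricing page rather than from this code.

```python
# Hedged sketch: rates are the 2018 figures quoted in this article, not a live
# price list. Check the current BigQuery pricing page before relying on them.
BYTES_PER_TB = 2 ** 40  # assumption: a billed "TB" is 2^40 bytes
BYTES_PER_GB = 2 ** 30  # assumption: a billed "GB" is 2^30 bytes

QUERY_FREE_TB_PER_MONTH = 1.0     # first 1 TB of query processing free each month
QUERY_USD_PER_TB = 5.00           # on-demand rate beyond the free tier
STORAGE_FREE_GB_PER_MONTH = 10.0  # first 10 GB of user storage free each month
STORAGE_USD_PER_GB_MONTH = 0.02   # storage rate beyond the free tier


def monthly_query_cost(bytes_processed: int) -> float:
    """On-demand query charge (USD) for one month's total bytes processed."""
    tb = bytes_processed / BYTES_PER_TB
    billable_tb = max(0.0, tb - QUERY_FREE_TB_PER_MONTH)
    return round(billable_tb * QUERY_USD_PER_TB, 2)


def monthly_storage_cost(bytes_stored: int) -> float:
    """Storage charge (USD) for a user's own datasets over one month."""
    gb = bytes_stored / BYTES_PER_GB
    billable_gb = max(0.0, gb - STORAGE_FREE_GB_PER_MONTH)
    return round(billable_gb * STORAGE_USD_PER_GB_MONTH, 2)


if __name__ == "__main__":
    # 3 TB of queries in a month: 1 TB free, 2 TB billed at $5.00 each.
    print(monthly_query_cost(3 * BYTES_PER_TB))  # prints 10.0
    # 60 GB of private data stored: 10 GB free, 50 GB billed at $0.02 each.
    print(monthly_storage_cost(60 * BYTES_PER_GB))  # prints 1.0
```

Under these assumptions, a month with 3 TB of queries costs $10.00, while anything up to the 1 TB free tier costs nothing, which is why checking a query's size before running it is worthwhile.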

The first objective of this report is to experiment with BigQuery and Google’s public patent data through queries that characterize the datasets and build an understanding of how to write queries and interpret their results. The second objective is to demonstrate a simple keyword phrase query as a basis for more sophisticated patent searches.
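As a preview of the second objective, the following is a hedged sketch of a keyword phrase query issued through the Python client library (the google-cloud-bigquery package). It assumes configured Google Cloud credentials and a billing project, and it assumes the publications table has `publication_number`, `country_code`, and a repeated `abstract_localized` field with `text` and `language` subfields; verify these against the table schema before running it. The function names, the English-abstract filter, and the default country are illustrative choices, not the query developed later in this report. By default it performs a dry run, which reports the bytes the query would process without actually running it, so the cost can be checked first.

```python
# Hedged sketch: a simple keyword phrase search against Google's public patent data.
# Assumes google-cloud-bigquery is installed and credentials/billing are configured.
# Column names (abstract_localized, text, language, country_code) are assumptions
# to verify against the table schema. Function names are hypothetical.

TABLE = "patents-public-data.patents.publications"


def build_phrase_query(limit: int = 100) -> str:
    """Return a Standard SQL query matching @phrase in English abstracts."""
    return f"""
SELECT
  p.publication_number,
  p.country_code,
  a.text AS abstract
FROM `{TABLE}` AS p,
  UNNEST(p.abstract_localized) AS a
WHERE a.language = 'en'
  AND p.country_code = @country
  AND LOWER(a.text) LIKE CONCAT('%', LOWER(@phrase), '%')
LIMIT {int(limit)}
"""


def run_phrase_search(phrase: str, country: str = "US", dry_run: bool = True):
    """Dry run: return estimated bytes processed. Otherwise: return result rows."""
    # Imported here so the module can be loaded without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        # Query parameters keep the search phrase out of the SQL text itself.
        query_parameters=[
            bigquery.ScalarQueryParameter("phrase", "STRING", phrase),
            bigquery.ScalarQueryParameter("country", "STRING", country),
        ],
        dry_run=dry_run,
        use_query_cache=False,
    )
    job = client.query(build_phrase_query(), job_config=job_config)
    if dry_run:
        # A dry run reports how much data the query would scan without running it.
        return job.total_bytes_processed
    return list(job.result())
```

For instance, `run_phrase_search("solar cell")` would return the estimated bytes scanned, and `run_phrase_search("solar cell", dry_run=False)` the matching rows. Note that `LIMIT` only caps the rows returned; BigQuery still bills for the columns it scans, so the dry-run figure is the better guide to cost. Also, a `%` or `_` inside the phrase would act as a `LIKE` wildcard.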