Forensic Keyword Crawl:

USPTO Weekly Official Gazette Notices Keyword Crawl: As another example of a simple keyword crawl, we’ll set up a crawl of the U.S. Patent and Trademark Office (USPTO) weekly Official Gazette Notices. The USPTO publishes the Official Gazette every Tuesday, in digital form only. It contains bibliographic text and a representative drawing from each patent issued that week, as well as the Official Gazette Notices, which is what we’ll crawl. For people using a regular Web browser, the USPTO provides a user-friendly menu structure for clicking through to the individual sections of the Gazette and searching for specific information. We’ll use the keyword crawler instead to search the Gazette Notices programmatically. Each issue of the Gazette Notices is available on a single page, with a table of contents at the top, at a URL similar to this one: http://patentsgazette.uspto.gov/week49/OG/TOC.htm
 
The “week49” portion of the URL represents the week of the year. The most recent 52 issues are available online, so to access a different week, you just change “week49” to the week of interest. The other detail to note about this Official Gazette URL is that it’s set up on a sub-domain of the main “uspto.gov” domain, namely “patentsgazette.uspto.gov”. We’ll use that as our Allowed Domain Names setting to restrict the crawl to the Official Gazette sub-domain. We’ll also search for the keywords “wireless” and “fees”, just as examples. We won’t deny any domains or URL strings, so we’ll leave the Deny Domain Names and URL(s) Strings field blank, which is its default. The Crawl Depth field will default to 1 and the Pages Crawled Limit field will default to 10 (i.e., the 10-page free crawl).
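The week substitution and the sub-domain restriction described above can be sketched in a few lines of Python. This is only an illustration, not the crawler's own code: the function names `gazette_url` and `is_allowed` are hypothetical, and the two-digit zero padding for single-digit weeks is an assumption that the text does not confirm.

```python
from urllib.parse import urlparse

# URL pattern taken from the sample "week49" address in the text.
# Assumption: weeks below 10 are zero-padded to two digits.
GAZETTE_URL = "http://patentsgazette.uspto.gov/week{week:02d}/OG/TOC.htm"
ALLOWED_DOMAINS = ("patentsgazette.uspto.gov",)


def gazette_url(week: int) -> str:
    """Build the Gazette Notices URL for a given week of the year."""
    if week < 1:
        raise ValueError("week must be a positive week number")
    return GAZETTE_URL.format(week=week)


def is_allowed(url: str, allowed_domains=ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is an allowed domain or a sub-domain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


if __name__ == "__main__":
    url = gazette_url(49)
    print(url, is_allowed(url))
```

Matching on the parsed host name, rather than searching the whole URL string, keeps a look-alike address that merely contains “patentsgazette.uspto.gov” somewhere in its path or host from passing the check.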
 
To execute the crawl, enter the following settings:
  • Crawl Name: USPTO-week49-gazette-notices
  • Starting URL: http://patentsgazette.uspto.gov/week49/OG/TOC.htm
  • Allowed Domain Names: patentsgazette.uspto.gov
  • Deny Domain Names and URL(s) Strings: (leave blank)
  • Keywords: wireless, fees
  • Crawl Depth: 1
  • Pages Crawled Limit: 10
The output results will show all instances where the keyword “wireless” or “fees” matched any text on the Official Gazette page.
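To make the idea of a keyword match concrete, here is a minimal Python sketch of scanning a page's text for the two keywords. It is not the crawler's actual matching logic: the function name `find_keyword_hits` is hypothetical, and the case-insensitive substring matching and the 30-character context window are assumptions chosen for illustration.

```python
import re


def find_keyword_hits(text, keywords, context=30):
    """Return (keyword, offset, snippet) tuples for each keyword occurrence.

    Assumptions: matching is case-insensitive and matches substrings,
    so "fees" would also match inside a longer word.
    """
    hits = []
    for keyword in keywords:
        pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = match.end() + context
            hits.append((keyword, match.start(), text[start:end]))
    # Report hits in the order they appear on the page.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "Notice on wireless filings. Maintenance fees are due. Fees may change."
    for keyword, offset, snippet in find_keyword_hits(sample, ["wireless", "fees"]):
        print(f"{keyword!r} at offset {offset}: ...{snippet}...")
```

In a real crawl, the text passed in would be the body of each fetched Gazette page; here a short sample string stands in for it so the sketch runs without network access.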