This past weekend, I completed the most recent version, 0.3.0, of NewsBot, a series of time-saving automations that contribute to my work reporting for The Times-Independent. At this point, NewsBot is a collection of web scrapers — some of which are private — that regularly capture updates to specific webpages, saving me from keeping a series of bookmarks that I have to check every hour or so.
In the future, NewsBot will likely expand past mere web scraping into other automations, like providing an online dashboard of recent and popular tweets, news articles, Facebook posts, and other items shared about Moab. That dashboard would likely be hosted at the currently nonexistent website newsbot.carterpape.com.
For now, I will explain the idea behind the current iteration of NewsBot with an example.
The Utah Division of Public Utilities is investigating Frontier Communications of Utah following formal complaints of poor quality of service in certain remote areas around Moab, where the company holds a state certificate granting it a monopoly for landline services.
To keep up with the investigation, I would typically have to visit the docket webpage, where filings in the investigation are hosted, on a regular basis to check for new documents. It would be easy to miss a development in the investigation while we are focused on other stories happening on the ground in Moab.
Instead, NewsBot regularly checks the docket and sends an email whenever new documents appear on it. The bot is currently programmed to load the page every 5 to 15 minutes, so when a new filing appears, we get it very soon after it is available.
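The core of that check is simple: fetch the page, extract the filing links, and compare them against the links seen last time. A minimal sketch of that diffing step in Python, assuming filings are linked as PDFs (the actual markup of the docket page, and NewsBot's real implementation, may differ):

```python
import re


def extract_filing_links(html: str) -> set[str]:
    """Pull out hrefs that look like filing documents.

    Matching on a .pdf extension is an assumption for this sketch;
    a real docket page might link filings differently.
    """
    return set(re.findall(r'href="([^"]+\.pdf)"', html))


def find_new_filings(html: str, seen: set[str]) -> set[str]:
    """Return links on the page that were not in the previously seen set."""
    return extract_filing_links(html) - seen


# Hypothetical example: one filing already seen, one newly posted.
seen = {"/docket/filing-001.pdf"}
page = '<a href="/docket/filing-001.pdf">1</a><a href="/docket/filing-002.pdf">2</a>'
new = find_new_filings(page, seen)  # contains only filing-002
```

A scheduler would call this every few minutes with a freshly fetched page, add any new links to the seen set, and hand them off to an email step (for instance via the standard library's `smtplib`).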
Lots of news is disseminated as updates to a website, online database, or other web location. Certainly not most of it, but plenty. Getting the earliest possible notice of a development in a story like the Frontier investigation sometimes means closely watching a specific web page.
Another prime example of what makes NewsBot helpful is the tracking of proposals for oil and gas leases on federal lands.
Anyone is allowed to submit an expression of interest to the Bureau of Land Management indicating that they would like to lease a public parcel of land for oil or gas extraction. These nominations get listed on the National Fluids Lease Sale System. There is a method to how nominations are listed and marked in the system, and parsing it is a task better suited to a computer than a human.
As far as I know, there is no listserv for new nominations, and even if there were, I would want to filter notifications to only get those that concern a select few parcels. Instead, NewsBot can watch for new listings on that site.
NewsBot, as of writing, does not track this list of oil and gas lease nominations, but it is high on the priority list to do so. This functionality will allow us to get extremely timely updates on what lands in and around Moab are being nominated for oil and gas sales.
The list of websites worth tracking for the most up-to-date information goes on, and the point of this project is, over time, to track more of them, synthesize the information more cohesively, and save time and mental energy for the humans who staff the newspaper.
Do this first: Report the truth.
That is my personal mantra when it comes to reporting. It is good to be first, but the raison d’être of the newspaper — the thing that justifies its existence — is that it reports only that which is true and correct. Truth is first priority. Being first is secondary.
NewsBot collects information in a timely manner, but more importantly, it gets the information straight from the source. In some cases, I can skip a step and go directly to the humans who populate the websites I am watching. Obviously, it is important to connect with and cite human sources in any reporting, but information systems like court dockets, online databases, logs, lists, and forums all exist for a reason.
Information systems are tools for humans to use to standardize and make reliable the flow and availability of information. NewsBot is a tool for hooking into these systems and, using a predetermined set of rules, extracting and relaying the parts that are most important.