One of the projects that I rolled out this year is a web app that spiders and collects news articles from across the Internet. It saves the spidered data to a Firestore database and uses Natural Language Processing (NLP) AI to determine what each article is about.

It then classifies each article, determines its bias, creates tags, and publishes the highest-rated links on the HEADLIN3S website every day.
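The daily publishing step can be sketched in Python, the language of the project's backend. This is a minimal sketch under stated assumptions, not the HEADLIN3S code: the `Article` record, its `rating` field, and the `daily_headlines` helper are hypothetical, and it assumes the NLP stage has already attached a numeric rating, a bias label, and tags to each article.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Article:
    """Hypothetical record for one spidered article; field names are illustrative."""
    url: str
    title: str
    published: date
    rating: float          # hypothetical score assigned by the NLP stage
    bias: str = "Center"   # one of the bias buckets, e.g. "Leans Left"
    tags: list[str] = field(default_factory=list)


def daily_headlines(articles: list[Article], day: date, limit: int = 10) -> list[Article]:
    """Return up to `limit` of the highest-rated articles published on `day`."""
    todays = [a for a in articles if a.published == day]
    # Highest rating first; sorted() is stable, so ties keep their input order.
    return sorted(todays, key=lambda a: a.rating, reverse=True)[:limit]


articles = [
    Article("https://example.com/a", "A", date(2020, 12, 1), 0.9),
    Article("https://example.com/b", "B", date(2020, 12, 1), 0.4),
    Article("https://example.com/c", "C", date(2020, 12, 2), 0.99),
]
print([a.title for a in daily_headlines(articles, date(2020, 12, 1), limit=1)])  # ['A']
```

Filtering by day before sorting keeps the ranking independent of how many older articles sit in the database.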

I used Vue and JavaScript on the front end; specifically, I built the user interface with Vuetify, a beautiful Material Design component framework. The spiders, NLP, and AI are all Python-based, relying primarily on spaCy, Scrapy, and requests.

The project is hosted on the Google Firebase platform.

If you are looking for a quick view of what is happening each day, you could do a lot worse than HEADLIN3S. I built it as a Progressive Web App, so you can install it on your phone straight from the website and use it like any other app.

At last count, the system had inserted over 250,000 unique URLs into the database. You can see the most popular links for 2020 here if you are interested. Trump was, without a doubt, the most frequent word (excluding stopwords) in this year's headlines, followed by covid. These counts are based on the lemmatized form of every indexed word.
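Counting the most frequent words while skipping stopwords and folding each word to its lemma comes down to a frequency count. The sketch below uses only the standard library so it runs anywhere; the tiny `STOPWORDS` set and the `LEMMAS` lookup table are hypothetical stand-ins for a real lemmatizer such as spaCy, where each token exposes `token.lemma_` and `token.is_stop`. It is an illustration, not the HEADLIN3S indexing code.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical); spaCy flags stopwords via token.is_stop.
STOPWORDS = {"a", "an", "and", "are", "as", "the", "of", "to", "in", "on", "for", "is", "with"}

# Hypothetical lemma table standing in for a real lemmatizer (spaCy's token.lemma_).
LEMMAS = {"cases": "case", "vaccines": "vaccine", "says": "say", "rising": "rise", "rises": "rise"}


def lemmatize(word: str) -> str:
    """Map a lowercase word to its lemma, falling back to the word itself."""
    return LEMMAS.get(word, word)


def term_counts(headlines: list[str]) -> Counter:
    """Count lemmatized, non-stopword terms across all headlines."""
    counts: Counter = Counter()
    for headline in headlines:
        for word in re.findall(r"[a-z0-9']+", headline.lower()):
            if word not in STOPWORDS:
                counts[lemmatize(word)] += 1
    return counts


headlines = [
    "Trump says COVID cases are rising",
    "COVID vaccines arrive as cases rise",
    "Trump signs the relief bill",
]
print(term_counts(headlines).most_common(3))
```

Lemmatizing before counting is what lets "cases" and "case" (or "rising" and "rise") add up to a single entry.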

I’ve learned a lot about news headlines by crunching this data. For example:

  • There are 12,668 unique words in the headlines I have collected, and the vast majority of them (98%) are used at least twice.
  • Unbiased and less-biased articles get clicked at about the same rate as politically biased ones: Far Left: 3.13%, Leans Left: 3.34%, Center: 3.25%, Leans Right: 2.97%, Far Right: 3.39%.
  • Weather.com (2.5) has a stronger left bias than USA Today (2.02).
  • @thehill has a bias score below 0.1 in our rankings, which is even lower than Reuters, even though The Hill's coverage is mostly political.
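Statistics like these are simple aggregations. The sketch below assumes that a click rate is clicks divided by impressions within each bias bucket, and that the 98% figure refers to distinct words; the `(bias_bucket, was_clicked)` event format and both function names are hypothetical, not taken from HEADLIN3S.

```python
from collections import Counter


def click_rate_by_bias(events: list[tuple[str, bool]]) -> dict[str, float]:
    """Percentage of impressions that were clicked, per bias bucket.

    Each event is a hypothetical (bias_bucket, was_clicked) pair for one link shown.
    """
    shown: Counter = Counter()
    clicked: Counter = Counter()
    for bucket, was_clicked in events:
        shown[bucket] += 1
        if was_clicked:
            clicked[bucket] += 1
    return {bucket: 100.0 * clicked[bucket] / shown[bucket] for bucket in shown}


def share_used_at_least_twice(words: list[str]) -> float:
    """Fraction of distinct words that occur two or more times."""
    counts = Counter(words)
    if not counts:
        return 0.0
    repeated = sum(1 for n in counts.values() if n >= 2)
    return repeated / len(counts)


events = [("Center", True)] + [("Center", False)] * 3 + [("Far Left", False)] * 2
print(click_rate_by_bias(events))  # {'Center': 25.0, 'Far Left': 0.0}
print(share_used_at_least_twice(["trump", "covid", "trump", "vote"]))  # 0.3333333333333333
```

Normalizing clicks by impressions per bucket is what makes the buckets comparable even when one bias category is shown far more often than another.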

Lately I have been feeding all this data to a new AI (NLP + GAN) that I created to try to craft the perfect headline. My hope is that this system can write headlines for any given article that are descriptive and honest but perform just as well as clickbait headlines.

You can follow the progress of HEADLIN3S on Twitter.
