BBC NEWS... WITHOUT THE CRAP

2024-03-09

Did I mention recently that I love RSS? That it brings me great joy? That I
start and finish almost every day in my feed reader? Probably.

I used to have a single minor niggle with the BBC News RSS feed: that it
included sports news, which I didn't care about. So I wrote a script that
downloaded it, stripped sports news, and re-exported the feed for me to
subscribe to. Magic.

[Image: RSS reader showing duplicate copies of the news story "Barbie 2?
'We'd love to,' says Warner Bros boss", and an entry from BBC Sounds.
gopher://danq.me/I/2024/03/bbc-news-rss-annoyances.png]

But lately - presumably as a result of technical changes at the Beeb's side -
this feed has found two fresh ways to annoy me:

* The feed now re-publishes a story if it gets re-promoted to the front
  page... but with a different <guid> (it appears to get a #0 after it when
  first published, a #1 the second time, and so on). In a typical day the feed
  reader might scoop up new stories about once an hour, and by the time I get
  to reading them the same exact story might appear in my reader multiple
  times. Ugh.
* They've started adding iPlayer and BBC Sounds content to the BBC News feed.
  I don't follow BBC News in my feed reader because I want to watch or listen
  to things. If you do, that's fine, but I don't, and I'd rather filter this
  content out.

Luckily, I already have a recipe for improving this feed, thanks to my prior
work.
Let's look at my newly-revised script (also available on GitHub):

#!/usr/bin/env ruby
require 'bundler/inline'

# Sample crontab:
# Every 20 minutes, run the script and log the results
# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1

# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'

# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website, also any
# iPlayer/Sounds links
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//

# Load and filter the original RSS
# (URI.open rather than a bare open: Ruby 3.0+ no longer lets Kernel#open fetch URLs)
rss = Nokogiri::XML(URI.open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

# Strip the anchors off the <guid>s: BBC News "republishes" stories by using
# guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each{|g|g.content=g.content.gsub(/#.*$/,'')}

File.open( '/www/bbc-news-no-sport.xml', 'w' ){ |f| f.puts(rss.to_s) }

It's amazing what you can do with Nokogiri and a half dozen lines of Ruby.
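
To see what the two rules do to individual GUIDs without fetching the live
feed, here is a minimal sketch that applies them to plain strings. The sample
GUIDs and the clean_guids helper are hypothetical, made up for illustration.
The final uniq step is not part of the script above; it shows one way the
duplicates could be collapsed instead of leaving them to the feed reader.

```ruby
# Same pattern as the script: GUIDs under /sport/, /iplayer/ or /sounds/ are rejected.
REJECT_GUIDS_MATCHING = %r{^https://www\.bbc\.co\.uk/(sport|iplayer|sounds)/}

# Hypothetical GUIDs for illustration only; real ones come from the feed.
SAMPLE_GUIDS = [
  'https://www.bbc.co.uk/news/example-story-1#0',
  'https://www.bbc.co.uk/sport/example-match#0',
  'https://www.bbc.co.uk/news/example-story-1#1',
  'https://www.bbc.co.uk/sounds/play/example#0',
  'https://www.bbc.co.uk/news/example-story-2#0'
].freeze

# Hypothetical helper: filter, strip anchors, then drop repeats.
def clean_guids(guids)
  guids
    .reject { |guid| guid =~ REJECT_GUIDS_MATCHING } # drop sport/iPlayer/Sounds
    .map { |guid| guid.sub(/#.*$/, '') }             # strip the "#0", "#1" anchor
    .uniq                                            # assumption: collapse republished copies
end

puts clean_guids(SAMPLE_GUIDS)
```

Both copies of example-story-1 end up with the same GUID once the anchor is
gone, which is exactly why a feed reader then treats them as one story.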

That revised script removes from the feed anything whose <guid> suggests it's
sports news or from BBC Sounds or iPlayer, and also strips any "anchor" part
of the <guid> before re-exporting the feed. Much better. (Strictly speaking,
this can result in a technically-invalid feed by introducing duplicate
<guid>s, but your feed reader oughta be smart enough to compensate for and
ignore that: mine certainly is!)

You're free to take and adapt the script to your own needs, or - if you don't
mind being tied to my opinions about what should be in BBC News' RSS feed -
just subscribe to my copy at: https://fox.q-t-a.uk/bbc-news-no-sport.xml

LINKS

* My very recent blog post about how RSS is better than ActivityPub:
  gopher://danq.me/1/posts/rss-is-better-than-activitypub
* My blog post about using RSS for joy, and not pursuing "RSS Zero":
  gopher://danq.me/1/posts/rss-zero
* My 2021 blog note about starting and ending my days in FreshRSS:
  https://danq.me/2021/09/29/freshrss-addiction/
* My blog post about scripting-out sport from BBC News' RSS feed:
  https://danq.me/2019/05/14/bbc-news-without-the-sport/
* My Ruby script for filtering out the kinds of BBC News content I don't want
  to see right out of their RSS feed:
  https://gist.github.com/Dan-Q/65bb0a9470236520cbe255ff44924ce3
* FreshRSS, my favourite RSS reader:
  https://freshrss.org/
* https://fox.q-t-a.uk/bbc-news-no-sport.xml