sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

	$ make
	# make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

	$ make SFEED_CURSES=""
	# make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME. See the themes/
directory for the theme names.

	$ make SFEED_THEME="templeos"
	# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

	$EDITOR "$HOME/.sfeed/sfeedrc"
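
For reference, a minimal sfeedrc defines a feeds() function which calls feed
with a name and a URL per feed, as in sfeedrc.example (the feeds below are
just placeholders):

	# list of feeds to fetch:
	feeds() {
		# feed <name> <feedurl> [basesiteurl] [encoding]
		feed "codemadness" "https://codemadness.org/atom.xml"
		feed "some example feed" "https://www.example.org/rss.xml"
	}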

Or you can import existing OPML subscriptions using sfeed_opml_import(1):

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called newsboat and import
it for sfeed_update:

	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called rss2email (3.x+) and
import it for sfeed_update:

	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds; this script merges the new items. See sfeed_update(1) for more
information about what it can do:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	cp style.css "$HOME/.sfeed/style.css"
	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cp style.css "$HOME/.sfeed/frames/style.css"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like, you can make a wrapper script and add it as a cronjob; see the example
below.
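
A minimal sketch of such a wrapper script and crontab entry (the script name
and paths are just examples):

	#!/bin/sh
	# update feeds and regenerate the formatted output.
	sfeed_update
	sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
	sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

A crontab entry to run it every hour:

	0 * * * *	/home/user/bin/sfeed_cron.sh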

Most protocols are supported, because curl(1) is used by default; proxy
settings from the environment (such as the $http_proxy environment variable)
are used as well.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.
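
For example, to fetch a feed over HTTPS and parse it to the TSV format (the
URL is just an example):

	curl -s 'https://codemadness.org/atom.xml' | sfeed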

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
  sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- xargs with support for the -P and -0 options,
  used by sfeed_update(1).
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.90+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD.
- DragonFlyBSD.
- GNU/Hurd.
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS.
- SerenityOS.
- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_json        - Format feed data (TSV) to JSON Feed.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs of RSS/Atom feeds on a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order (see the sketch below).

See also the sfeedrc(5) man page documentation for more details.
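
As an illustration, a minimal order() override in the sfeedrc file which sorts
items by the timestamp in the first field, newest first. This is a sketch: it
assumes order() receives the same name and URL arguments as the filter()
example further below, and the default behaviour is already similar:

	# order(name, url)
	order() {
		sort -t "$(printf '\t')" -k1rn,1
	}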

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the number of concurrent jobs (8 by default).
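
For example, in the sfeedrc file:

	# process up to 16 feeds concurrently.
	maxjobs=16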


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname - TAB-separated format containing all items per feed. The
           sfeed_update(1) script merges new items with this file.
           The format is documented in sfeed(5).


File format
-----------

	man 5 sfeed
	man 5 sfeedrc
	man 1 sfeed
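
As a quick reference: each line in a feed file is one item and the
TAB-separated fields are, in order: UNIX timestamp, title, link, content,
content-type, id, author, enclosure and category; see sfeed(5) for the
authoritative description. For example, to print the title and link of each
item:

	awk -F '\t' '{ print $2 ": " $3 }' ~/.sfeed/feeds/*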


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

	https://codemadness.org/atom.xml	application/atom+xml
	https://codemadness.org/atom_content.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see the sfeedrc.example file. To
update your feeds (the configfile argument is optional):

	sfeed_update "configfile"

Format the feeds files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	# HTML view with the menu as frames, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View formatted output in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View formatted output in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View formatted output in a curses interface. The interface has a look inspired
by the mutt mail client. It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.

Just like the other format programs included in sfeed you can run it like this:

	sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

	sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. This limit
can be overridden by setting the environment variable $SFEED_NEW_AGE to the
desired maximum age in seconds. To manage read/unread items in a different way,
a plain-text file with a list of the read URLs can be used. To enable this
behaviour, set the environment variable $SFEED_URL_FILE to the path of this URL
file:

	export SFEED_URL_FILE="$HOME/.sfeed/urls"
	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
	sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.
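
For example, to treat items of up to one week old as new:

	export SFEED_NEW_AGE=604800
	sfeed_curses ~/.sfeed/feeds/*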

- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
	test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

	# filter fields.
	# filter(name, url)
	filter() {
		case "$1" in
		"tweakers")
			awk -F '\t' 'BEGIN { OFS = "\t"; }
			# skip ads.
			$2 ~ /^ADV:/ {
				next;
			}
			# shorten link.
			{
				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
					$3 = substr($3, RSTART, RLENGTH);
				}
				print $0;
			}';;
		"yt BSDNow")
			# filter only BSD Now from channel.
			awk -F '\t' '$2 ~ / \| BSD Now/';;
		*)
			cat;;
		esac | \
		# replace youtube links with embed links.
		sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

		awk -F '\t' 'BEGIN { OFS = "\t"; }
		function filterlink(s) {
			# protocol must start with http, https or gopher.
			if (match(s, /^(http|https|gopher):\/\//) == 0) {
				return "";
			}

			# shorten feedburner links.
			if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
				s = substr(s, RSTART, RLENGTH);
			}

			# strip tracking parameters
			# urchin, facebook, piwik, webtrekk and generic.
			gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
			gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

			gsub(/\?&/, "?", s);
			gsub(/[\?&]+$/, "", s);

			return s
		}
		{
			$3 = filterlink($3); # link
			$8 = filterlink($8); # enclosure

			# try to remove tracking pixels: <img/> tags with 1px width or height.
			gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

			print $0;
		}'
	}

- - -

Aggregate feeds. This filters new entries (maximum one day old) and sorts them
by newest first. It prefixes the feed name to the title. Convert the TSV output
data to an Atom XML feed (again):

	#!/bin/sh
	cd ~/.sfeed/feeds/ || exit 1

	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | \
	sort -k1,1rn | \
	sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream that filters new unique feed items and
shows them as plain-text, one per line, similar to sfeed_plain(1):

Create a FIFO:

	fifo="/tmp/sfeed_fifo"
	mkfifo "$fifo"

On the reading side:

	# This keeps track of unique lines so it might consume a lot of memory.
	# It tries to reopen the $fifo after 1 second if it fails.
	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

	feedsdir="$HOME/.sfeed/feeds/"
	cd "$feedsdir" || exit 1
	test -p "$fifo" || exit 1

	# 1 day is old news, don't write older items.
	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feeds, the following code can be used to filter the latest
enclosure URL (probably some audio file):

	awk -F '\t' 'BEGIN { latest = 0; }
	length($8) {
		ts = int($1);
		if (ts > latest) {
			url = $8;
			latest = ts;
		}
	}
	END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

	awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can archive a feed and
keep only the items of (roughly) the last week, for example:

	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
	mv feed feed.bak
	mv feed.new feed

This could also be run weekly in a crontab to archive the feeds, like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small; an example crontab entry is shown below.
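
A sketch of such a weekly crontab entry, assuming the commands above are saved
as a script (the script name and path are just examples):

	# archive the feeds every Sunday at 03:00.
	0 3 * * 0	/home/user/bin/sfeed_archive.sh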

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/fdm.cache"
		cache "${cachepath}"
		$maildir = "%[home]/feeds/"

		# Check if message is in the cache by Message-ID.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue

		# If it is in the cache, stop.
		match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# Not in the cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
			action {
				# Store to local maildir.
				maildir "${maildir}%1"

				add-to-cache "${cachepath}" key "%[msgid]"
				keep
			}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Read from mbox and filter duplicate messages using the fdm program and deliver
them to an SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/fdm.cache"
		cache "${cachepath}"

		# Check if message is in the cache by Message-ID.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue

		# If it is in the cache, stop.
		match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# Not in the cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
			action {
				# Connect to a SMTP server and attempt to deliver the
				# mail to it.
				# Of course change the server and e-mail below.
				smtp server "codemadness.org" to "hiltjo@codemadness.org"

				add-to-cache "${cachepath}" key "%[msgid]"
				keep
			}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
	done

Procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) for sfeed_update with any other client to fetch
the RSS/Atom data, or changing the default curl options:

	# fetch a feed via HTTP/HTTPS etc.
	# fetch(name, url, feedfile)
	fetch() {
		hurl -m 1048576 -t 15 "$2" 2>/dev/null
	}

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it, some incremental updates and bandwidth-saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags per feed:

	mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
a HTTP "If-Modified-Since" request header. The server can then respond whether
the data is modified or not, or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

	# fetch(name, url, feedfile)
	fetch() {
		etag="$HOME/.sfeed/etags/$(basename "$3")"
		curl \
			-L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
			--compressed \
			--etag-save "${etag}" --etag-compare "${etag}" \
			-z "${etag}" \
			"$2" 2>/dev/null
	}

These options can come at a cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare or websites like Reddit.com don't like this
and will block such HTTP requests.

A custom User-Agent can be set by using the curl -H option, like so:

	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.
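
To apply this in sfeed_update, the header can be added to an overridden
fetch() function in the sfeedrc file. A sketch, based on the default curl
options shown in the ETag example above:

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 0 -f -s -m 15 \
			-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
			"$2" 2>/dev/null
	}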

- - -

Page redirects

For security and efficiency reasons, redirects are not allowed by default and
are treated as an error.

This prevents, for example, hijacking of an unencrypted http:// to https://
redirect, and avoids the added latency of an unnecessary page redirect on each
request. It is encouraged to use the final redirected URL in the sfeedrc config
file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0", as sketched
below.
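
A sketch of such an override; the redirect limit of 3 is just an example:

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
			"$2" 2>/dev/null
	}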

- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.

This can be used to download and process URLs for downloading podcasts,
webcomics, download and convert webpages, mirror videos, etc. It uses a
plain-text cache file for remembering processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script and can be
modified to handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

	#!/bin/sh
	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
	# Dependencies: awk, curl, flock, xargs (-P), yt-dlp.

	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
	jobs="${SFEED_JOBS:-4}"
	lockfile="${HOME}/.sfeed/sfeed_download.lock"

	# log(feedname, s, status)
	log() {
		if [ "$1" != "-" ]; then
			s="[$1] $2"
		else
			s="$2"
		fi
		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
	}

	# fetch(url, feedname)
	fetch() {
		case "$1" in
		*youtube.com*)
			yt-dlp "$1";;
		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
		esac
	}

	# downloader(url, title, feedname)
	downloader() {
		url="$1"
		title="$2"
		feedname="${3##*/}"

		msg="${title}: ${url}"

		# download directory.
		if [ "${feedname}" != "-" ]; then
			mkdir -p "${feedname}"
			if ! cd "${feedname}"; then
				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
				return 1
			fi
		fi

		log "${feedname}" "${msg}" "START"
		if fetch "${url}" "${feedname}"; then
			log "${feedname}" "${msg}" "OK"

			# append it safely in parallel to the cachefile on a
			# successful download.
			(flock 9 || exit 1
			printf '%s\n' "${url}" >> "${cachefile}"
			) 9>"${lockfile}"
		else
			log "${feedname}" "${msg}" "FAIL" >&2
			return 1
		fi
		return 0
	}

	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
		# Downloader helper for parallel downloading.
		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
		# It should write the URI to the cachefile if it is successful.
		downloader "$1" "$2" "$3"
		exit $?
	fi

	# ...else parent mode:

	tmp="$(mktemp)" || exit 1
	trap "rm -f ${tmp}" EXIT

	[ -f "${cachefile}" ] || touch "${cachefile}"
	cat "${cachefile}" > "${tmp}"
	echo >> "${tmp}" # force it to have one line for awk.

	LC_ALL=C awk -F '\t' '
	# fast prefilter what to download or not.
	function filter(url, field, feedname) {
		u = tolower(url);
		return (match(u, "youtube\\.com") ||
			match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
	}
	function download(url, field, title, filename) {
		if (!length(url) || urls[url] || !filter(url, field, filename))
			return;
		# NUL-separated for xargs -0.
		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
		urls[url] = 1; # print once
	}
	{
		FILENR += (FNR == 1);
	}
	# lookup table from cachefile which contains downloaded URLs.
	FILENR == 1 {
		urls[$0] = 1;
	}
	# feed file(s).
	FILENR != 1 {
		download($3, 3, $2, FILENAME); # link
		download($8, 8, $2, FILENAME); # enclosure
	}
	' "${tmp}" "${@:--}" | \
	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"

- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

	#!/bin/sh
	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
	# The data is split per file per feed with the name of the newsboat title/url.
	# It writes the URLs of the read items line by line to a "urls" file.
	#
	# Dependencies: sqlite3, awk.
	#
	# Usage: create some directory to store the feeds then run this script.

	# newsboat cache.db file.
	cachefile="$HOME/.newsboat/cache.db"
	test -n "$1" && cachefile="$1"

	# dump data.
	# .mode ascii: Columns/rows delimited by 0x1F and 0x1E
	# get the first fields in the order of the sfeed(5) format.
	sqlite3 "$cachefile" <<!EOF |
	.headers off
	.mode ascii
	.output
	SELECT
		i.pubDate, i.title, i.url, i.content, i.content_mime_type,
		i.guid, i.author, i.enclosure_url,
		f.rssurl AS rssurl, f.title AS feedtitle, i.unread
		-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
	FROM rss_feed f
	INNER JOIN rss_item i ON i.feedurl = f.rssurl
	ORDER BY
		i.feedurl ASC, i.pubDate DESC;
	.quit
	!EOF
	# convert to sfeed(5) TSV format.
	LC_ALL=C awk '
	BEGIN {
		FS = "\x1f";
		RS = "\x1e";
	}
	# normal non-content fields.
	function field(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		gsub("[[:space:]]", " ", s);
		gsub("[[:cntrl:]]", "", s);
		return s;
	}
	# content field.
	function content(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		# escape chars in content field.
		gsub("\\\\", "\\\\", s);
		gsub("\n", "\\n", s);
		gsub("\t", "\\t", s);
		return s;
	}
	function feedname(feedurl, feedtitle) {
		if (feedtitle == "") {
			gsub("/", "_", feedurl);
			return feedurl;
		}
		gsub("/", "_", feedtitle);
		return feedtitle;
	}
	{
		fname = feedname($9, $10);
		if (!feed[fname]++) {
			print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
		}

		contenttype = field($5);
		if (contenttype == "")
			contenttype = "html";
		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
			contenttype = "html";
		else
			contenttype = "plain";

		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
			> fname;

		# write URLs of the read items to a file line by line.
		if ($11 == "0") {
			print $3 > "urls";
		}
	}'

- - -

Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that counts the current progress. It writes the total progress to stderr.
Alternative: pv -l -s totallines

	#!/bin/sh
	# Progress indicator script.

	# Pass lines as input to stdin and write progress status to stderr.
	# progress(totallines)
	progress() {
		total="$(($1 + 0))" # must be a number, no divide by zero.
		test "${total}" -le 0 -o "$1" != "${total}" && return
		LC_ALL=C awk -v "total=${total}" '
		{
			counter++;
			percent = (counter * 100) / total;
			printf("\033[K") > "/dev/stderr"; # clear EOL
			print $0;
			printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
			fflush(); # flush all buffers per line.
		}
		END {
			printf("\033[K") > "/dev/stderr";
		}'
	}

	# Counts the feeds from the sfeedrc config.
	countfeeds() {
		count=0
		. "$1"
		feed() {
			count=$((count + 1))
		}
		feeds
		echo "${count}"
	}

	config="${1:-$HOME/.sfeed/sfeedrc}"
	total=$(countfeeds "${config}")
	sfeed_update "${config}" 2>&1 | progress "${total}"

- - -

Counting unread and total items
-------------------------------

It can be useful to show the count of unread and total items, for example in a
window manager or status bar.

The below example script counts the items of the last day in the same way the
formatting tools do:

	#!/bin/sh
	# Count the new items of the last day.
	LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	{
		total++;
	}
	int($1) >= old {
		totalnew++;
	}
	END {
		print "New: " totalnew;
		print "Total: " total;
	}' ~/.sfeed/feeds/*

The below example script counts the unread items using the sfeed_curses URL
file:

	#!/bin/sh
	# Count the unread and total items from feeds using the URL file.
	LC_ALL=C awk -F '\t' '
	# URL file: amount of fields is 1.
	NF == 1 {
		u[$0] = 1; # lookup table of URLs.
		next;
	}
	# feed file: check by URL or id.
	{
		total++;
		if (length($3)) {
			if (u[$3])
				read++;
		} else if (length($6)) {
			if (u[$6])
				read++;
		}
	}
	END {
		print "Unread: " (total - read);
		print "Total: " total;
	}' ~/.sfeed/urls ~/.sfeed/feeds/*

- - -

sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------

sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number. This TagId is then mapped to the output field index.

Steps to modify the code:

* Add a new TagId enum for the tag.

* (optional) Add a new FeedField* enum for the new output field or you can map
  it to an existing field.

* Add the new XML tag name to the array variable of parsed RSS or Atom
  tags: rsstags[] or atomtags[].

  These must be defined in alphabetical order, because a binary search is used
  which uses the strcasecmp() function.

* Add the parsed TagId to the output field in the array variable fieldmap[].

  When another tag is also mapped to the same output field then the tag with
  the highest TagId number value overrides the mapped field: the order is from
  least important to most important.

* If this defined tag is just using the inner data of the XML tag, then this
  definition is enough. If it for example has to parse a certain attribute you
  have to add a check for the TagId to the xmlattr() callback function.

* (optional) Print the new field in the printfields() function.

Below is a patch example to add the MRSS "media:content" tag as a new field:

	diff --git a/sfeed.c b/sfeed.c
	--- a/sfeed.c
	+++ b/sfeed.c
	@@ -50,7 +50,7 @@ enum TagId {
	 	RSSTagGuidPermalinkTrue,
	 	/* must be defined after GUID, because it can be a link (isPermaLink) */
	 	RSSTagLink,
	-	RSSTagEnclosure,
	+	RSSTagMediaContent, RSSTagEnclosure,
	 	RSSTagAuthor, RSSTagDccreator,
	 	RSSTagCategory,
	 	/* Atom */
	@@ -81,7 +81,7 @@ typedef struct field {
	 enum {
	 	FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
	 	FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
	-	FeedFieldLast
	+	FeedFieldMediaContent, FeedFieldLast
	 };
	 
	 typedef struct feedcontext {
	@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
	 	{ STRP("enclosure"), RSSTagEnclosure },
	 	{ STRP("guid"), RSSTagGuid },
	 	{ STRP("link"), RSSTagLink },
	+	{ STRP("media:content"), RSSTagMediaContent },
	 	{ STRP("media:description"), RSSTagMediaDescription },
	 	{ STRP("pubdate"), RSSTagPubdate },
	 	{ STRP("title"), RSSTagTitle }
	@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
	 	[RSSTagGuidPermalinkFalse] = FeedFieldId,
	 	[RSSTagGuidPermalinkTrue] = FeedFieldId, /* special-case: both a link and an id */
	 	[RSSTagLink] = FeedFieldLink,
	+	[RSSTagMediaContent] = FeedFieldMediaContent,
	 	[RSSTagEnclosure] = FeedFieldEnclosure,
	 	[RSSTagAuthor] = FeedFieldAuthor,
	 	[RSSTagDccreator] = FeedFieldAuthor,
	@@ -677,6 +679,8 @@ printfields(void)
	 	string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
	 	putchar(FieldSeparator);
	 	string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
	+	putchar(FieldSeparator);
	+	string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
	 	putchar('\n');
	 
	 	if (ferror(stdout)) /* check for errors but do not flush */
	@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
	 	}
	 
	 	if (ctx.feedtype == FeedTypeRSS) {
	-		if (ctx.tag.id == RSSTagEnclosure &&
	+		if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
	 		    isattr(n, nl, STRP("url"))) {
	 			string_append(&tmpstr, v, vl);
	 		} else if (ctx.tag.id == RSSTagGuid &&

- - -

Running custom commands inside the sfeed_curses program
-------------------------------------------------------

Running commands inside the sfeed_curses program can be useful, for example to
sync items or mark all items across all feeds as read. It can be convenient to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

	case 'M':
		forkexec((char *[]) { "markallread.sh", NULL }, 0);
		break;

or

	case 'S':
		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
		break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

	#!/bin/sh
	# mark all items/URLs as read.
	tmp="$(mktemp)" || exit 1
	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
	awk '!x[$0]++' > "$tmp" &&
	mv "$tmp" ~/.sfeed/urls &&
	pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

	#!/bin/sh
	sfeed_update
	pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses. When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available, the following wrapper
command can be used to run the program in a new session, for a plumb program:

	setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().


Open a URL directly in the same terminal
----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

	SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Below are some bugs or missing features in terminals that were found while
testing sfeed_curses. Some of them might already be fixed upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle and right buttons is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- The mouse button encoding for extended buttons (like side-buttons) is
  unsupported in some terminals or maps to the same button: for example
  side-buttons 7 and 8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>
|