Filter, Rewrite, and Scraper Rules
Feed Filtering Rules
Miniflux has a regex-based filtering system that allows you to ignore or keep articles. For more advanced filtering, you can use the Entry Filtering Rules feature.
Regex-Based Blocking Filters
Block rules ignore articles with a title, an entry URL, a tag, or an author that matches the regex (RE2 syntax).
For example, the regex (?i)miniflux will ignore all articles with a title that contains the word Miniflux (case insensitive).
Ignored articles won’t be saved into the database.
Regex-Based Keep Filters
Keep rules retain only articles that match the regex (RE2 syntax).
For example, the regex (?i)miniflux will keep only the articles with a title that contains the word Miniflux (case insensitive).
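The difference between the two filter types can be sketched with Go's regexp package, which implements RE2 syntax. This is a minimal illustration, not Miniflux's code: the shouldSave helper is hypothetical, and it only looks at the title, whereas block rules also consider the entry URL, tags, and author.

```go
package main

import (
	"fmt"
	"regexp"
)

// shouldSave reports whether a title survives an optional block pattern and
// an optional keep pattern. An empty pattern means "no rule of that kind".
// Hypothetical helper: MustCompile panics on an invalid pattern.
func shouldSave(title, blockPattern, keepPattern string) bool {
	if blockPattern != "" && regexp.MustCompile(blockPattern).MatchString(title) {
		return false // a block match means the article is ignored
	}
	if keepPattern != "" && !regexp.MustCompile(keepPattern).MatchString(title) {
		return false // with a keep rule, only matching articles are retained
	}
	return true
}

func main() {
	const pattern = `(?i)miniflux` // (?i) makes the match case insensitive
	for _, title := range []string{"MINIFLUX 2.2.10 released", "Weekly news digest"} {
		fmt.Printf("%q: saved with block rule=%v, saved with keep rule=%v\n",
			title, shouldSave(title, pattern, ""), shouldSave(title, "", pattern))
	}
}
```

The same pattern gives opposite results depending on where it is used: as a block rule it drops the matching title, as a keep rule it drops everything else.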
Entry Filtering Rules
Since Miniflux 2.2.10, filtering rules can be defined for each feed and globally on the Settings page.
There are two types of rules:
- Block Rules: Ignore articles that match the regex.
- Keep Rules: Retain only articles that match the regex.
Rules Format and Syntax
Example:
FieldName=RegEx
FieldName=RegEx
FieldName=RegEx
- Each rule must be on a separate line.
- Duplicate rules are allowed. For example, having multiple EntryTitle rules is possible.
- The provided regex should use the RE2 syntax.
- The order of the rules matters as the processor stops on the first match for both Block and Keep rules.
- Invalid rules are ignored.
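The format rules above can be sketched as a small parser. This is an illustration under stated assumptions rather than Miniflux's implementation: the rule type and the parseRules and firstMatch helpers are hypothetical, "invalid" is taken to mean a line without "=", an empty field name, or a regex that does not compile, and EntryDate patterns are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule is a hypothetical in-memory form of one "FieldName=RegEx" line.
type rule struct {
	field   string
	pattern *regexp.Regexp
}

// parseRules reads one rule per line and silently skips invalid ones.
// What counts as invalid here is an assumption of this sketch.
func parseRules(text string) []rule {
	var rules []rule
	for _, line := range strings.Split(text, "\n") {
		// Split at the first "=" only, so the regex itself may contain "=".
		field, expr, found := strings.Cut(strings.TrimSpace(line), "=")
		if !found || field == "" {
			continue
		}
		re, err := regexp.Compile(expr)
		if err != nil {
			continue // regex does not compile: the rule is ignored
		}
		rules = append(rules, rule{field: field, pattern: re})
	}
	return rules
}

// firstMatch returns the index of the first rule matching the entry, or -1.
// Evaluation stops at the first match, which is why rule order matters.
func firstMatch(rules []rule, entry map[string]string) int {
	for i, r := range rules {
		if value, ok := entry[r.field]; ok && r.pattern.MatchString(value) {
			return i
		}
	}
	return -1
}

func main() {
	rules := parseRules("EntryTitle=(?i)miniflux\nEntryTitle=(unclosed\nEntryAuthor=^Bot$\nnot a rule")
	entry := map[string]string{"EntryTitle": "Hello", "EntryAuthor": "Bot"}
	fmt.Println(len(rules), firstMatch(rules, entry)) // prints "2 1"
}
```

Two of the four input lines are dropped (an unclosed group and a line without "="), and the entry is matched by the second surviving rule.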
Available Fields:
EntryTitle
EntryURL
EntryCommentsURL
EntryContent
EntryAuthor
EntryTag
EntryDate
Date Patterns
The EntryDate field supports the following date patterns:
- future - Match entries with future publication dates.
- before:YYYY-MM-DD - Match entries published before a specific date.
- after:YYYY-MM-DD - Match entries published after a specific date.
- between:YYYY-MM-DD,YYYY-MM-DD - Match entries published between two dates.
- max-age:duration - Match entries that are older than a specific duration (e.g., max-age:7d for 7 days). Valid time units are “ns”, “us” (or “µs”), “ms”, “s”, “m”, “h”, “d”.
Dates must use the YYYY-MM-DD format, for example: 2024-01-01.
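As a rough sketch of how these patterns could be evaluated, the Go snippet below checks an entry date against each pattern. All function names are hypothetical, and several details are assumptions: the bounds of between are treated as exclusive, max-age follows the block-rule example (it matches entries older than the duration), and the "d" unit is handled separately because Go's time.ParseDuration does not accept it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

const dateLayout = "2006-01-02" // Go reference layout for YYYY-MM-DD

// mustDate parses a YYYY-MM-DD date (UTC) and panics on malformed input.
func mustDate(s string) time.Time {
	t, err := time.Parse(dateLayout, s)
	if err != nil {
		panic(err)
	}
	return t
}

// parseAge parses a duration and adds a "d" (24h) unit.
// Simplification: "d" is only accepted alone, e.g. "7d" but not "1d12h".
func parseAge(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		n, err := strconv.ParseFloat(strings.TrimSuffix(s, "d"), 64)
		if err != nil {
			return 0, err
		}
		return time.Duration(n * 24 * float64(time.Hour)), nil
	}
	return time.ParseDuration(s)
}

// matchDatePattern reports whether entryDate matches pattern relative to now.
// Malformed patterns never match.
func matchDatePattern(pattern string, entryDate, now time.Time) bool {
	if pattern == "future" {
		return entryDate.After(now)
	}
	kind, arg, found := strings.Cut(pattern, ":")
	if !found {
		return false
	}
	switch kind {
	case "before", "after":
		d, err := time.Parse(dateLayout, arg)
		if err != nil {
			return false
		}
		if kind == "before" {
			return entryDate.Before(d)
		}
		return entryDate.After(d)
	case "between":
		from, to, ok := strings.Cut(arg, ",")
		if !ok {
			return false
		}
		start, err1 := time.Parse(dateLayout, from)
		end, err2 := time.Parse(dateLayout, to)
		if err1 != nil || err2 != nil {
			return false
		}
		return entryDate.After(start) && entryDate.Before(end) // exclusive bounds (assumption)
	case "max-age":
		age, err := parseAge(arg)
		if err != nil {
			return false
		}
		return entryDate.Before(now.Add(-age)) // older than the cutoff (assumption)
	}
	return false
}

func main() {
	now, entry := mustDate("2024-06-15"), mustDate("2024-03-10")
	for _, p := range []string{"future", "before:2024-01-01", "after:2024-03-01",
		"between:2024-01-01,2024-12-31", "max-age:30d"} {
		fmt.Printf("%-32s %v\n", p, matchDatePattern(p, entry, now))
	}
}
```

Passing the current time as a parameter keeps the sketch deterministic; a real implementation would more likely read the clock itself.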
Block rules examples:
EntryDate=future # Ignore articles with future publication dates
EntryDate=before:2024-01-01 # Ignore articles published before January 1st, 2024
EntryDate=max-age:30d # Ignore articles older than 30 days
EntryTitle=(?i)miniflux # Ignore articles with "Miniflux" in the title
EntryTitle=(?i)\b(save|take|get)\s+\$\d{2,5}\b # Ignore articles with "Save $50", "Get $100…" in the title
EntryTitle=(?i)\$\d{2,5}\s+(off|discount)\b # Ignore articles with "$50 off"
EntryTitle=(?i)\bbest\s+.*\bdeals?\b # Ignore articles with "Best Foobar Deals…"
EntryTitle=(?i)\bgift\s+(guide|ideas|list)\b # Ignore articles that look like listicles
Keep rules examples:
EntryDate=between:2024-01-01,2024-12-31 # Keep only articles published in 2024
EntryDate=after:2024-03-01 # Keep only articles published after March 1st, 2024
Global Rules & Feed Rules Ordering
Rules are processed in this order:
- Global Block Rules
- Feed Block Rules
- Global Keep Rules
- Feed Keep Rules
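The ordering above can be sketched as follows. This is a simplified model, not Miniflux's code: the ruleSet type and its accept method are hypothetical, rules are reduced to title patterns, and the sketch assumes that an entry is kept when no keep rule is defined at all.

```go
package main

import (
	"fmt"
	"regexp"
)

// anyMatch reports whether any pattern in any list matches the title.
// Lists are checked in the order given; invalid patterns are skipped.
func anyMatch(title string, lists ...[]string) bool {
	for _, list := range lists {
		for _, p := range list {
			if re, err := regexp.Compile(p); err == nil && re.MatchString(title) {
				return true
			}
		}
	}
	return false
}

// ruleSet is a hypothetical container for the four rule lists.
type ruleSet struct {
	globalBlock, feedBlock, globalKeep, feedKeep []string
}

// accept applies block rules (global, then feed) before keep rules
// (global, then feed).
func (rs ruleSet) accept(title string) bool {
	if anyMatch(title, rs.globalBlock, rs.feedBlock) {
		return false
	}
	// Assumption: with no keep rule at all, every unblocked entry is kept.
	if len(rs.globalKeep)+len(rs.feedKeep) == 0 {
		return true
	}
	return anyMatch(title, rs.globalKeep, rs.feedKeep)
}

func main() {
	rs := ruleSet{
		globalBlock: []string{`(?i)\bgift\s+(guide|ideas|list)\b`},
		feedKeep:    []string{`(?i)miniflux`},
	}
	for _, t := range []string{"Miniflux gift guide", "Miniflux 2.2.10 released", "Unrelated news"} {
		fmt.Printf("%q accepted=%v\n", t, rs.accept(t))
	}
}
```

Because block rules run first, the first title is rejected even though it would satisfy the keep rule.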
Content Rewrite Rules
To improve the reading experience, it’s possible to alter the content of feed items.
For example, if you are reading a popular comic website like XKCD, it’s nice to have the image title (the title attribute) added under the image, especially on mobile devices where there is no hover event.
add_dynamic_image
- Tries to add the highest quality images from sites that use JavaScript to load images (e.g., either lazily when scrolling or based on screen size).
add_dynamic_iframe
- Tries to add embedded videos from sites that use JavaScript to load iframes (e.g., either lazily when scrolling or after the rest of the page is loaded).
add_image_title
- Adds each image's title as a caption under the image.
add_youtube_video
- Inserts a YouTube video into the article (automatic for Youtube.com).
add_youtube_video_from_id
- Inserts a YouTube video into the article based on the video ID.
add_invidious_video
- Inserts an Invidious player into the article (automatic for https://invidio.us).
add_youtube_video_using_invidious_player
- Inserts an Invidious player into the article for YouTube feeds.
add_castopod_episode
- Inserts a Castopod episode player.
add_mailto_subject
- Inserts the subject of mailto links into the article.
base64_decode
- Decodes base64 content. It can be used with a selector, base64_decode(".base64"), or without arguments, base64_decode. Without a selector, it tries to decode all text nodes and falls back to the original text when decoding fails.
nl2br
- Converts new lines \n to <br> (useful for non-HTML content).
convert_text_links
- Converts text links to HTML links (useful for non-HTML content).
fix_medium_images
- Attempts to fix Medium's images rendered in JavaScript.
use_noscript_figure_images
- Uses <noscript> content for images rendered with JavaScript.
replace("search term"|"replace term")
- Searches and replaces text.
remove(".selector, #another_selector")
- Removes DOM elements.
parse_markdown
- Converts Markdown to HTML. This rule was removed in version 2.2.4.
remove_tables
- Removes any tables while keeping the content inside (useful for email newsletters).
remove_clickbait
- Removes clickbait titles (converts uppercase titles).
replace_title("search-term"|"replace-term")
- Adjusts entry titles.
add_hn_links_using_hack
- Opens HN comments with Hack.
add_hn_links_using_opener
- Opens HN comments with Opener.
fix_ghost_cards
- Converts Ghost link cards to regular links.
Miniflux includes a set of predefined rules for some websites, but you can define your own rules.
On the feed edit page, enter your custom rules in the field “Rewrite Rules” like this:
rule1,rule2
Separate each rule with a comma.
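Some rules take quoted arguments that themselves contain commas, such as remove(".selector, #another_selector"). The sketch below shows one way a rule string could be split while leaving those commas alone. It assumes that commas inside double quotes are not separators, it does not handle escaped quotes, and the splitRules helper is hypothetical rather than Miniflux's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRules splits a rewrite-rule string on commas, except for commas that
// appear inside double-quoted arguments. Escaped quotes are not handled.
func splitRules(s string) []string {
	var rules []string
	var current strings.Builder
	inQuotes := false

	// flush stores the rule accumulated so far, if it is not empty.
	flush := func() {
		if r := strings.TrimSpace(current.String()); r != "" {
			rules = append(rules, r)
		}
		current.Reset()
	}

	for _, c := range s {
		switch {
		case c == '"':
			inQuotes = !inQuotes
			current.WriteRune(c)
		case c == ',' && !inQuotes:
			flush() // a top-level comma ends the current rule
		default:
			current.WriteRune(c)
		}
	}
	flush()
	return rules
}

func main() {
	input := `add_image_title,remove(".selector, #another_selector"),replace("search term"|"replace term")`
	for i, r := range splitRules(input) {
		fmt.Printf("%d: %s\n", i+1, r)
	}
}
```

With this input the sketch yields three rules, and the remove rule keeps both of its selectors.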
URL Rewrite Rules
Sometimes it might be required to rewrite a URL in a feed to fetch better-suited content.
For example, for some users, the URL https://www.npr.org/sections/money/2021/05/18/997501946/the-case-for-universal-pre-k-just-got-stronger displays a cookie consent dialog instead of the actual content, and it would be preferred to fetch the URL https://text.npr.org/997501946 instead.
The following rule does this:
rewrite("^https:\/\/www\.npr\.org\/(?:sections\/[^\/]+\/)?\d{4}\/\d{2}\/\d{2}\/(\d+)\/.*$"|"https://text.npr.org/$1")
This will rewrite all URLs from the original feed to URLs pointing to text.npr.org when the article content is fetched. You may also need to add your own scraper rule because the default rule will try to fetch #storytext.
Another example is the German page https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html, which splits the article into multiple pages. The full text can be read on https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html?seite=all.
The URL rewrite rule for that would be:
rewrite("(.*?\.html)"|"$1?seite=all")
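To see what this pattern and replacement do, the pair can be tried with Go's regexp package, where $1 in the replacement refers to the first capture group. The rewriteURL helper is hypothetical and only illustrates the substitution; leaving the URL untouched on an invalid pattern is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// rewriteURL applies one pattern/replacement pair to a URL.
// On an invalid pattern the URL is returned unchanged (assumption).
func rewriteURL(url, pattern, replacement string) string {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return url
	}
	return re.ReplaceAllString(url, replacement)
}

func main() {
	url := "https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html"
	// The lazy group captures everything up to the first ".html".
	fmt.Println(rewriteURL(url, `(.*?\.html)`, "$1?seite=all"))
}
```

The program prints the original URL with ?seite=all appended, which is the single-page version of the article.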
Scraper Rules
When an article contains only an extract of the content, you can fetch the original web page and apply a set of rules to get relevant content.
Miniflux uses CSS selectors for custom rules. These custom rules can be saved in the feed properties (select a feed and click on edit).
CSS Selector | Description |
---|---|
div#articleBody | Fetch a div element with the ID articleBody. |
div.content | Fetch all div elements with the class content. |
article, div.article | Use a comma to define multiple rules. |
Miniflux includes a list of predefined rules for popular websites. You can contribute to the project to keep them up to date.
Under the hood, Miniflux uses the library Goquery.