RSS feed generator from Markdown files

11 November 2024 · 9 minute read

The WarpBuild documentation site is built with Docusaurus and hosted on Vercel. The documentation is a collection of markdown files stored in a GitHub repository.

Here's a simple script that generates RSS feeds for documentation pages. I used it to generate the RSS feed for the changelog page so users can subscribe to the changelog via RSS, especially to keep track of breaking changes. I built it with heavy help from Claude 3.5 Sonnet (v2) and Cursor.

Docusaurus is a static site generator with content in markdown and extensive customization options. It is maintained by Meta Open Source and is used by many well-known organizations, including Meta, The Linux Foundation, and Red Hat.

While Docusaurus has a great RSS feed generator for blog posts, it does not generate RSS feeds for regular documentation pages.

Hope you find this useful!

RSS Feed Generator Usage

The changelog-to-rss.sh script generates the changelog.xml file, which is the RSS feed for the changelog.

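  1. Keep the slug in the frontmatter of each changelog file the same as the filename. The script builds links from the filename, so the two must match for the links to resolve.
  2. The slug is used to generate the permalink for each changelog entry.
  3. The updatedAt field in the frontmatter is read by the script, but each RSS item's pubDate comes from the date in its ### heading.
  4. The permalink points to the matching dated section of the monthly changelog page via a # anchor (for example, #october-29-2024).
  5. Each section starting with a ### heading in the form "Month DD, YYYY" (for example, ### October 29, 2024) becomes one RSS item, and that date is used in the item's title.
  6. All the markdown files are in the docs/changelog directory, one file per month. The naming convention is YYYY-Monthname.mdx, for example 2024-October.mdx. A changelog.mdx file in the same directory is skipped.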
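A hypothetical pre-flight check can catch files that break these conventions before the feed is generated. The sketch below is not part of the original script; the function names are illustrative, and it assumes each slug sits on a single slug: line in the frontmatter.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of the original script) for the
# changelog conventions. Assumes the slug sits on a single "slug:" line.

MONTHS="January|February|March|April|May|June|July|August|September|October|November|December"

# Succeeds if the file name follows YYYY-Monthname.mdx, e.g. 2024-October.mdx
valid_filename() {
    [[ $(basename "$1") =~ ^[0-9]{4}-($MONTHS)\.mdx$ ]]
}

# Succeeds if the frontmatter slug equals the file name without .mdx
slug_matches_filename() {
    local slug
    slug=$(awk '$1 == "slug:" {print $2; exit}' "$1" | tr -d '"')
    [ "$slug" = "$(basename "$1" .mdx)" ]
}

# Prints any ### heading the generator's date regex would skip (e.g. "### Highlights")
nonmatching_headings() {
    grep -E '^###[[:space:]]' "$1" | grep -vE '^###[[:space:]]+.*,[[:space:]]+[0-9]{4}$'
}

# Checks every .mdx file in a directory; returns 1 if a name or slug is wrong.
# Skipped headings are reported as warnings only.
check_changelog_dir() {
    local dir="${1:-docs/changelog}" file status=0
    for file in "$dir"/*.mdx; do
        [ -e "$file" ] || continue                   # glob matched nothing
        [[ $file == *"changelog.mdx" ]] && continue  # skipped by the generator too
        if ! valid_filename "$file"; then
            echo "unexpected file name: $file"
            status=1
        elif ! slug_matches_filename "$file"; then
            echo "slug differs from file name: $file"
            status=1
        fi
        if [ -n "$(nonmatching_headings "$file")" ]; then
            echo "headings not picked up in $file:"
            nonmatching_headings "$file"
        fi
    done
    return "$status"
}
```

Running check_changelog_dir docs/changelog from the repository root before changelog-to-rss.sh would print each file whose name or slug breaks the rules, plus any ### heading the date regex would ignore.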

The Script

The code for the script is available in the warpbuilds/docs-rss-feed repository.

```bash
#!/bin/bash

# Configuration
FEED_TITLE="WarpBuild Changelog"
FEED_DESC="WarpBuild platform updates, improvements, and bug fixes"
FEED_LINK="https://docs.warpbuild.com/changelog"
DOCS_BASE_URL="https://docs.warpbuild.com"
OUTPUT_FILE="static/changelog.xml"
CHANGELOG_DIR="docs/changelog"

# Parse and print month/day names in English regardless of the system locale
export LC_ALL=C

# Create RSS header. Docusaurus serves static/ from the site root, so the
# feed's own URL is $DOCS_BASE_URL/changelog.xml.
cat > "$OUTPUT_FILE" << EOF
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
    <title>$FEED_TITLE</title>
    <description>$FEED_DESC</description>
    <link>$FEED_LINK</link>
    <atom:link href="$DOCS_BASE_URL/changelog.xml" rel="self" type="application/rss+xml" />
    <lastBuildDate>$(date -R)</lastBuildDate>
EOF

# Convert "Month DD, YYYY" to an RFC 822 date at local midnight,
# e.g. "Tue, 29 Oct 2024 00:00:00 +0000" (RSS pubDate requires the time).
# Tries BSD date (macOS) first, then GNU date (Linux).
convert_date() {
    local input_date="$1"
    if [ -z "$input_date" ]; then
        return 1
    fi
    date -j -f "%B %d, %Y %H:%M:%S" "$input_date 00:00:00" "+%a, %d %b %Y %H:%M:%S %z" 2>/dev/null \
        || date -d "$input_date" "+%a, %d %b %Y %H:%M:%S %z" 2>/dev/null
}

# Turn a date heading into its page anchor, e.g. "October 29, 2024" -> "october-29-2024"
create_anchor() {
    local input="$1"
    if [ -z "$input" ]; then
        return 1
    fi
    echo "$input" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -d ','
}

# Extract a frontmatter value; surrounding quotes are optional
get_frontmatter_value() {
    local file="$1"
    local key="$2"
    awk -v key="$key:" '$1 == key {print substr($0, length(key) + 2); exit}' "$file" | tr -d '"'
}

# Convert markdown links to HTML and the literal "\n" line separators to <br/>.
# Site-relative links (e.g. /byoc/gcp) get DOCS_BASE_URL so they work in feed readers.
process_markdown() {
    local content="$1"
    printf '%s\n' "$content" \
        | BASE_URL="$DOCS_BASE_URL" perl -pe 's|\[([^\]]*)\]\((/[^)]*)\)|<a href="$ENV{BASE_URL}$2">$1</a>|g; s|\[([^\]]*)\]\(([^)]*)\)|<a href="$2">$1</a>|g' \
        | sed 's/\\n/<br\/>/g'
}

# Append one <item>: $1 = date heading, $2 = markdown body, $3 = page slug
emit_item() {
    local entry_date="$1" content="$2" slug="$3"
    local rfc_date anchor body
    anchor=$(create_anchor "$entry_date")
    [ -n "$anchor" ] || return 0
    rfc_date=$(convert_date "$entry_date")
    body=$(process_markdown "$content")
    cat >> "$OUTPUT_FILE" << EOF
    <item>
        <title>WarpBuild Updates - $entry_date</title>
        <link>$FEED_LINK/$slug#$anchor</link>
        <guid isPermaLink="false">$FEED_LINK/$slug#$anchor</guid>
        <pubDate>$rfc_date</pubDate>
        <description><![CDATA[$body]]></description>
    </item>
EOF
}

# Process each monthly file. Reverse alphabetical order is not chronological for
# month names (September sorts before October); most feed readers order items by pubDate.
while IFS= read -r file; do
    # Skip changelog.mdx
    if [[ $file == *"changelog.mdx" ]]; then
        continue
    fi

    # Frontmatter values (read, but not currently used in the feed items)
    updated_at=$(get_frontmatter_value "$file" "updatedAt")
    title=$(get_frontmatter_value "$file" "title")

    # Slug = filename without path and extension, e.g. 2024-October
    SLUG=$(basename "$file" .mdx)

    CONTENT=""
    CURRENT_DATE=""

    while IFS= read -r line || [ -n "$line" ]; do
        # A "### Month DD, YYYY" heading starts a new entry
        if [[ $line =~ ^###[[:space:]]+(.*,[[:space:]]+[0-9]{4})$ ]]; then
            NEXT_DATE="${BASH_REMATCH[1]}"
            # Emit the entry accumulated so far
            if [ -n "$CURRENT_DATE" ] && [ -n "$CONTENT" ]; then
                emit_item "$CURRENT_DATE" "$CONTENT" "$SLUG"
            fi
            CURRENT_DATE="$NEXT_DATE"
            CONTENT=""
        elif [[ -n $line && ! $line =~ ^--- ]]; then
            # Join body lines with a literal "\n", turned into <br/> later
            CONTENT+="$line\n"
        fi
    done < "$file"

    # Emit the last entry in the file
    if [ -n "$CURRENT_DATE" ] && [ -n "$CONTENT" ]; then
        emit_item "$CURRENT_DATE" "$CONTENT" "$SLUG"
    fi
done < <(printf '%s\n' "$CHANGELOG_DIR"/*.mdx | sort -r)

# Close RSS feed
cat >> "$OUTPUT_FILE" << EOF
</channel>
</rss>
EOF

echo "RSS feed generated at $OUTPUT_FILE"
```
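To make the heading-to-item mapping concrete, here is a small sketch that traces a single ### heading through the same regex, anchor rule and date conversion as the generator. heading_to_item is a hypothetical name and its plain-text output format is only for illustration; it writes no XML.

```bash
#!/bin/bash
# Sketch: trace how one "### Month DD, YYYY" heading becomes an item's title,
# link and pubDate. heading_to_item is a hypothetical helper that mirrors the
# generator's regex, anchor rule and date conversion; it writes no files.
export LC_ALL=C

heading_to_item() {
    local heading="$1" slug="$2"
    local feed_link="https://docs.warpbuild.com/changelog"
    local entry_date anchor pub_date
    # Same pattern the generator uses to detect a new entry
    [[ $heading =~ ^###[[:space:]]+(.*,[[:space:]]+[0-9]{4})$ ]] || return 1
    entry_date="${BASH_REMATCH[1]}"
    # "October 29, 2024" -> "october-29-2024"
    anchor=$(echo "$entry_date" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -d ',')
    # BSD date (macOS) first, GNU date (Linux) as a fallback; midnight local time
    pub_date=$(date -j -f "%B %d, %Y %H:%M:%S" "$entry_date 00:00:00" "+%a, %d %b %Y %H:%M:%S %z" 2>/dev/null \
        || date -d "$entry_date" "+%a, %d %b %Y %H:%M:%S %z" 2>/dev/null)
    printf 'title: WarpBuild Updates - %s\n' "$entry_date"
    printf 'link: %s/%s#%s\n' "$feed_link" "$slug" "$anchor"
    printf 'pubDate: %s\n' "$pub_date"
}

heading_to_item "### October 29, 2024" "2024-October"
```

For ### October 29, 2024 in the 2024-October file it prints the title WarpBuild Updates - October 29, 2024, the link https://docs.warpbuild.com/changelog/2024-October#october-29-2024, and a pubDate of Tue, 29 Oct 2024 00:00:00 followed by the machine's UTC offset.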

Example Markdown File

Here's a snippet of the markdown file for the changelog:

```md
---
title: "October 2024"
slug: "2024-October"
description: "List of updates in 2024-October"
sidebar_position: -9
createdAt: "2024-10-04"
updatedAt: "2024-10-29"
---

### October 29, 2024

- `Feature`: Custom VM images are now supported for GCP BYOC runners.

### October 21, 2024

- `Feature`: Ubuntu 24.04 arm64 runners are now supported natively as cloud
  runners as well as with AWS and GCP custom runners. These runners are
  compatible with GitHub's Ubuntu 24.04 arm64. Refer to [cloud runner labels](/cloud-runners#linux-arm64)
  for the full list of available labels. Refer to
  [this link](https://github.com/actions/partner-runner-images/blob/main/images/arm-ubuntu-24-image.md) for the details on the packaged tools.

### October 17, 2024

- `Enhancement`: The image for `macos-14` (https://github.com/actions/runner-images/releases/tag/macos-14-arm64%2F20241007.259) has been updated. This fixes the issue with iOS 18 SDK and simulator not being available.

### October 15, 2024

- `Feature`: Docker Layer Caching is now available for GCP BYOC runners.
- `Enhancement`: The images for `ubuntu-2204` for the [x86-64](https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20241006.1) and `arm64` architectures have been updated.
- `Enhancement`: [ubuntu-2404 for x86-64](https://github.com/actions/runner-images/releases/tag/ubuntu24%2F20241006.1) image has been updated.

### October 14, 2024

- `Enhancement`: BYOC features do not require a payment method to be added, by default. Credits can be used for BYOC runners.

### October 11, 2024

- `Pricing`: Cost for cache operations has been **reduced** from $0.001 to $0.0001 per operation.

### October 09, 2024

- `Feature`: GCP BYOC is now generally available. Read more here: [BYOC on GCP](/byoc/gcp).

### October 08, 2024

- `Enhancement`: The runner start times are now much faster, with a 90%ile of the start times being under 20 seconds. This is a significant improvement over the previous 90%ile of 45 seconds.

---
```
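To preview how a file is split into items, the following sketch walks the file with the same heading regex and line filter as the generator and prints each entry date with the number of non-empty body lines it collects. list_entries is a hypothetical helper, not part of the original script.

```bash
#!/bin/bash
# Sketch: list the entries the generator would emit for one changelog file and
# how many non-empty body lines each collects. list_entries is a hypothetical
# helper that reuses the generator's heading regex and line filter.
list_entries() {
    local file="$1" line entry_date="" count=0
    while IFS= read -r line || [ -n "$line" ]; do
        if [[ $line =~ ^###[[:space:]]+(.*,[[:space:]]+[0-9]{4})$ ]]; then
            # A new date heading: report the previous entry, if it has a body
            if [ -n "$entry_date" ] && [ "$count" -gt 0 ]; then
                printf '%s\t%d\n' "$entry_date" "$count"
            fi
            entry_date="${BASH_REMATCH[1]}"
            count=0
        elif [[ -n $line && ! $line =~ ^--- ]]; then
            count=$((count + 1))
        fi
    done < "$file"
    # The last entry has no following heading, so report it here
    if [ -n "$entry_date" ] && [ "$count" -gt 0 ]; then
        printf '%s\t%d\n' "$entry_date" "$count"
    fi
}
```

Run on the October 2024 file above, it should report eight entries, from October 29, 2024 down to October 08, 2024; the October 21, 2024 entry collects 5 lines because its wrapped bullet spans several source lines.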

Next steps

It would be fantastic to have this as a Docusaurus plugin so it can be reused for other markdown pages. If you are interested in this, please let me know!

The full script is available as a GitHub Gist.

An example markdown file is available here.


Tip

Use WarpBuild for blazing-fast GitHub Actions runners with superior job start times, caching backed by object storage, unlimited concurrency, and easy-to-use dashboards. Save 50-90% on your GitHub Actions costs while getting 10x the performance. Book a call or get started today!
