How GitHub Trending Works

A daily pipeline that collects, scores, and summarises rising GitHub repositories — fully automated and open-source.

Data Collection

Each day the pipeline calls the GitHub Search API to discover repositories that have been actively pushed within the last 30 days and have accumulated at least 50 stars. Up to 200 repositories are collected per run.
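As a rough illustration of the discovery query, the sketch below builds the request URLs for the GitHub repository search endpoint. The `stars:` and `pushed:` search qualifiers and the 100-results-per-page limit are standard GitHub Search API behaviour. The function name `build_search_urls`, its parameters, and the choice to sort by stars are assumptions made for this example, not details taken from the pipeline itself.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_urls(today, min_stars=50, pushed_within_days=30,
                      max_repos=200, per_page=100):
    """Return one search URL per result page (hypothetical helper)."""
    since = today - timedelta(days=pushed_within_days)
    # Search qualifiers: star floor and a "pushed on or after" date.
    query = f"stars:>={min_stars} pushed:>={since.isoformat()}"
    pages = -(-max_repos // per_page)  # ceiling division
    urls = []
    for page in range(1, pages + 1):
        params = {
            "q": query,
            "sort": "stars",   # assumption: the real sort order is not documented
            "order": "desc",
            "per_page": per_page,
            "page": page,
        }
        urls.append(SEARCH_URL + "?" + urlencode(params))
    return urls
```

With the defaults, 200 repositories at 100 per page means two requests per run.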

Results are paginated and deduplicated by GitHub repository ID before being passed to the ranker. If the API returns the same repository on multiple pages (which can happen at page boundaries), only the entry with the higher star count is kept.
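The deduplication rule can be sketched in a few lines. The field names `id` and `stargazers_count` match the GitHub REST API's repository objects; the function name `dedupe_by_id` is hypothetical, and the real pipeline may structure this step differently.

```python
def dedupe_by_id(repos):
    """Keep one entry per repository ID, preferring the higher star count."""
    best = {}
    for repo in repos:
        current = best.get(repo["id"])
        if current is None or repo["stargazers_count"] > current["stargazers_count"]:
            best[repo["id"]] = repo
    # Dict order follows the first time each ID was seen.
    return list(best.values())
```

For example, if repository 1 appears on page one with 100 stars and again on page two with 105 stars, only the 105-star entry is passed to the ranker.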

The Trending Score Formula

Repositories are ranked by a composite trending score that blends five signals. The goal is to surface repos that are genuinely gaining momentum right now — not just the all-time most-starred projects.

Signal | Weight | What it measures
Star Velocity | 50% | Stars divided by the repo's age in days (age capped at 365). A repo accumulating stars quickly relative to its age scores higher; this is the dominant signal.
Fork Ratio | 15% | Forks ÷ stars. Reflects how often people actually build on the code, not just bookmark it with a star.
Watcher Ratio | 10% | Watchers ÷ stars. Sustained community interest: people who opt in to receive notifications show deeper engagement than a passive star.
Recency Bonus | 20% | Linear decay from 1.0 to 0.0 over the 30 days since the repo's last push. Rewards actively maintained projects and penalises abandoned ones.
Issue Health | 5% | Calculated as 1 / (1 + (open_issues / stars) × 10). Penalises repos drowning in unresolved issues relative to their community size.
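The five signals and their weights can be sketched as follows. The formulas for each signal follow the descriptions above. How the signals are brought onto a common scale before blending is not described here, so this sketch makes two loudly labelled assumptions: star velocity is divided by the highest velocity in the current batch, and the fork and watcher ratios are clamped to 1.0. The function names and the guards against division by zero are also hypothetical.

```python
WEIGHTS = {
    "velocity": 0.50,
    "fork_ratio": 0.15,
    "watcher_ratio": 0.10,
    "recency": 0.20,
    "issue_health": 0.05,
}


def signals(stars, forks, watchers, open_issues, age_days, days_since_push):
    """Compute the five raw signals for one repository."""
    stars = max(stars, 1)  # assumption: guard against division by zero
    return {
        # Age is capped at 365 days; the floor of 1 day is an assumption.
        "velocity": stars / min(max(age_days, 1), 365),
        "fork_ratio": min(forks / stars, 1.0),        # clamp is an assumption
        "watcher_ratio": min(watchers / stars, 1.0),  # clamp is an assumption
        # Linear decay from 1.0 (pushed today) to 0.0 (30+ days ago).
        "recency": max(0.0, 1.0 - days_since_push / 30),
        "issue_health": 1.0 / (1.0 + (open_issues / stars) * 10),
    }


def trending_score(sig, max_velocity):
    """Blend the signals; velocity is scaled by the batch maximum (assumption)."""
    scaled = dict(sig)
    scaled["velocity"] = sig["velocity"] / max_velocity if max_velocity else 0.0
    return sum(WEIGHTS[name] * scaled[name] for name in WEIGHTS)
```

As a worked example, a 100-day-old repo with 1,000 stars, 100 forks, 50 watchers, 100 open issues, and a last push 15 days ago has a velocity of 10 stars per day, a fork ratio of 0.1, a watcher ratio of 0.05, a recency bonus of 0.5, and an issue health of 0.5. If it has the highest velocity in the batch, its score under these assumptions is 0.5 + 0.015 + 0.005 + 0.1 + 0.025 = 0.645.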

AI-Generated Summaries

For each repository the pipeline fetches the README, the top-level file tree, key config files (e.g. package.json, pyproject.toml, Cargo.toml), and the primary entry-point source file. This context is sent to Claude (Haiku model by default) to generate a summary and topic tags for the repository.

Summaries are cached in the database. A repository is summarised only if it is not already present from a previous run, which keeps API costs low and the daily run fast.
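The cache check might look like the sketch below, which asks the database which repository IDs still lack a stored summary. The table name `summaries`, its `repo_id` column, and the function name are hypothetical; the actual schema is not described here.

```python
import sqlite3


def repos_needing_summary(conn, repo_ids):
    """Return the IDs from repo_ids that have no cached summary yet."""
    if not repo_ids:
        return []
    placeholders = ",".join("?" * len(repo_ids))
    rows = conn.execute(
        f"SELECT repo_id FROM summaries WHERE repo_id IN ({placeholders})",
        list(repo_ids),
    ).fetchall()
    cached = {row[0] for row in rows}
    # Preserve the caller's ordering (for example, rank order).
    return [repo_id for repo_id in repo_ids if repo_id not in cached]
```

Only the IDs returned by this check would be sent to the model, so a run in which every repository is already known makes no summarisation calls at all.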

Update Schedule

The pipeline runs daily at 8:00 AM via Windows Task Scheduler. Each run executes the following steps in order:

1. Fetch: Query the GitHub Search API and collect up to 200 recently pushed repositories with ≥ 50 stars.
2. Rank: Compute the composite trending score for each repo and sort descending.
3. Summarise: Generate AI summaries and topic tags for any new repos not already in the database.
4. Build: Regenerate the static docs/index.html site from the latest database snapshot.
5. Deploy: Push the updated site to GitHub, which triggers an automatic Cloudflare Pages deployment.

Data Storage

All results are stored in a SQLite database at data/trending.db. It accumulates every daily snapshot, so historical data is preserved across runs. The database was seeded with 16 days of historical snapshots (over 1.5 million rows), which gives the period filters meaningful baseline data from day one.

This enables the pipeline to compute star-growth deltas over multiple time windows by diffing successive snapshots — the data behind the 24h, 7-day, and 30-day growth stats shown on each repository card and used by the period filter thresholds.
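Diffing snapshots to get a star-growth delta could be sketched like this. The `snapshots` table with `repo_id`, `snapshot_date` (ISO date text), and `stars` columns is a hypothetical schema chosen for the example, as is the function name `star_growth`.

```python
import sqlite3
from datetime import date, timedelta


def star_growth(conn, repo_id, today, window_days):
    """Stars gained over the window, or None if a snapshot is missing."""

    def stars_on_or_before(day):
        # Latest snapshot at or before the given day.
        row = conn.execute(
            "SELECT stars FROM snapshots "
            "WHERE repo_id = ? AND snapshot_date <= ? "
            "ORDER BY snapshot_date DESC LIMIT 1",
            (repo_id, day.isoformat()),
        ).fetchone()
        return row[0] if row else None

    now = stars_on_or_before(today)
    then = stars_on_or_before(today - timedelta(days=window_days))
    if now is None or then is None:
        return None
    return now - then
```

Calling it with `window_days` set to 1, 7, and 30 would give the three deltas. Returning `None` when no baseline snapshot exists is a design choice of this sketch, made so that missing history is not mistaken for zero growth.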

Filters Explained

The main page provides several ways to narrow the list of trending repositories:

Period | Threshold | What it means
24 hours | ≥ 30% | Gained at least 30% of its total star count in a single day: a genuine viral spike.
7 days | ≥ 60% | Gained at least 60% of its total star count over the past week: sustained rapid growth.
30 days | ≥ 120% | Gained more stars in a single month than its entire prior history: explosive breakout growth.
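A threshold check for the period filters might look like the sketch below. It assumes the growth percentage is measured against the star count at the start of the window, which is consistent with the 30-day description (gaining more than the entire prior history), though the exact baseline is not stated here. The names `THRESHOLDS` and `passes_period_filter` are hypothetical.

```python
# Minimum growth, as a fraction of the baseline star count, for each period.
THRESHOLDS = {"24h": 0.30, "7d": 0.60, "30d": 1.20}


def passes_period_filter(period, stars_gained, baseline_stars):
    """True if growth over the period meets that period's threshold.

    Assumption: baseline_stars is the star count at the start of the window.
    """
    if baseline_stars <= 0:
        return False  # no meaningful baseline to compare against
    return stars_gained / baseline_stars >= THRESHOLDS[period]
```

Under this assumption, a repo that went from 100 to 131 stars in a day passes the 24-hour filter, while a repo that went from 100 to 219 stars in a month falls just short of the 30-day filter.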