About — GitHub Trending

Data Collection

Each day the pipeline queries the GitHub Search API for repositories that have been pushed to within the last 30 days and have at least 50 stars. Up to 200 repositories are collected per run.
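As a rough illustration, this query maps onto GitHub's search qualifiers `stars:>=` and `pushed:>=`. The sketch below is not the pipeline's actual request. The helper name `build_search_urls` is hypothetical, and the sort order and authentication the pipeline uses are not documented here. Because the Search API returns at most 100 results per page, collecting 200 repositories takes two pages.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"

def build_search_urls(today: date, min_stars: int = 50, days: int = 30,
                      max_repos: int = 200, per_page: int = 100) -> list[str]:
    """Return one Search API URL per page needed to collect up to max_repos repos."""
    since = (today - timedelta(days=days)).isoformat()
    # Qualifiers: at least `min_stars` stars, last pushed on or after `since`.
    query = f"stars:>={min_stars} pushed:>={since}"
    pages = -(-max_repos // per_page)  # ceiling division: 200 / 100 -> 2 pages
    return [
        f"{SEARCH_URL}?{urlencode({'q': query, 'per_page': per_page, 'page': page})}"
        for page in range(1, pages + 1)
    ]
```

The Search API also caps any single query at 1,000 results, so a 200-repository limit stays well within range.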

Results are paginated and deduplicated by GitHub repository ID before being passed to the ranker. If the API returns the same repository on multiple pages (which can happen at page boundaries), only the entry with the higher star count is kept.
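The deduplication rule can be sketched as follows. This assumes each result is a dict carrying the Search API's `id` and `stargazers_count` fields; the function name `dedupe_by_id` is hypothetical.

```python
from typing import Iterable

def dedupe_by_id(pages: Iterable[Iterable[dict]]) -> list[dict]:
    """Merge paginated results, keeping the highest-star entry per repository ID."""
    best: dict[int, dict] = {}
    for page in pages:
        for repo in page:
            current = best.get(repo["id"])
            # Keep the new entry only if it is unseen or has more stars.
            if current is None or repo["stargazers_count"] > current["stargazers_count"]:
                best[repo["id"]] = repo
    return list(best.values())
```

A repository that appears at the end of page 1 and again at the start of page 2 collapses to a single entry with the larger star count.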

The Trending Score Formula

Repositories are ranked by a composite trending score that blends five signals. The goal is to surface repos that are genuinely gaining momentum right now — not just the all-time most-starred projects.

score = star_velocity × 0.50 + fork_ratio × 0.15 + watcher_ratio × 0.10 + recency_bonus × 0.20 + issue_health × 0.05

Signal	Weight	What it measures
Star Velocity	50%	Stars divided by the repo's age in days, with age capped at 365 days. A repo accumulating stars quickly relative to its age scores higher; this is the dominant signal.
Fork Ratio	15%	Forks ÷ stars. Reflects how often people actually build on the code, not just bookmark it with a star.
Watcher Ratio	10%	Watchers ÷ stars. Measures sustained community interest: people who opt in to receive notifications show deeper engagement than a passive star.
Recency Bonus	20%	Decays linearly from 1.0 at the repo's last push to 0.0 thirty days later. Rewards actively maintained projects and penalises abandoned ones.
Issue Health	5%	Calculated as `1 / (1 + (open_issues / stars) × 10)`. Penalises repos drowning in unresolved issues relative to their community size.
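The sketch below computes each signal as the table describes and combines them with the published weights. Names such as `RepoStats`, `signals`, and `trending_score` are hypothetical, as is the one-day floor on repo age (added to avoid dividing by zero for brand-new repos). Two further assumptions are worth flagging. First, the signals are summed as raw values with no normalisation, because none is described here. Second, `watchers` is taken to be the true watcher count: in the GitHub REST API, `watchers_count` mirrors `stargazers_count` and the real number of watchers is `subscribers_count`, so a ratio built from `watchers_count` would always equal 1.0.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Weights from the trending score formula.
WEIGHTS = {
    "star_velocity": 0.50,
    "fork_ratio": 0.15,
    "watcher_ratio": 0.10,
    "recency_bonus": 0.20,
    "issue_health": 0.05,
}

@dataclass
class RepoStats:  # hypothetical container for the fields the formula needs
    stars: int
    forks: int
    watchers: int  # assumed to be the true watcher count (subscribers_count)
    open_issues: int
    created_at: datetime
    pushed_at: datetime

def signals(repo: RepoStats, now: datetime) -> dict[str, float]:
    """Compute the five raw (unnormalised) signals for one repo."""
    stars = max(repo.stars, 1)  # guard the ratios against division by zero
    # Age in whole days, floored at 1 (assumption) and capped at 365.
    age_days = min(max((now - repo.created_at).days, 1), 365)
    days_since_push = (now - repo.pushed_at).total_seconds() / 86400
    return {
        "star_velocity": repo.stars / age_days,
        "fork_ratio": repo.forks / stars,
        "watcher_ratio": repo.watchers / stars,
        # Linear decay: 1.0 at the last push, 0.0 once 30 days have passed.
        "recency_bonus": min(1.0, max(0.0, 1.0 - days_since_push / 30)),
        "issue_health": 1 / (1 + (repo.open_issues / stars) * 10),
    }

def trending_score(repo: RepoStats, now: Optional[datetime] = None) -> float:
    """Weighted sum of the five signals; higher means more 'trending'."""
    now = now or datetime.now(timezone.utc)
    return sum(WEIGHTS[name] * value for name, value in signals(repo, now).items())
```

For a repo created 100 days ago with 1,000 stars, 200 forks, 50 watchers, 100 open issues and a push 15 days ago, the signals are 10, 0.2, 0.05, 0.5 and 0.5, giving a score of 5.16. Because star velocity is unbounded while the other signals are ratios that usually stay below 1, it dominates the raw score even more than its 50% weight suggests. Ranking is then a descending sort, e.g. `sorted(repos, key=lambda r: trending_score(r, now), reverse=True)`.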

AI-Generated Summaries

For each repository the pipeline fetches the README, the top-level file tree, key config files (e.g. package.json, pyproject.toml, Cargo.toml), and the primary entry-point source file. This context is sent to Claude (the Haiku model by default) to generate a summary and topic tags for the repository.

Summaries are cached in the database. A repository is summarised only if it is not already in the database from a previous run; existing summaries are reused, which keeps API costs low and the daily run fast.
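A minimal sketch of two parts of this step, assuming a hypothetical `summaries` table keyed by `repo_id` (the real schema is not documented here): picking which config files to send from a repo's top-level file tree, and filtering the day's repos down to those without a cached summary. `CONFIG_FILES` covers only the examples named above; the pipeline's real list is likely longer.

```python
import sqlite3
from typing import Iterable

# Hypothetical: only the config files named as examples in the text.
CONFIG_FILES = ("package.json", "pyproject.toml", "Cargo.toml")

def pick_config_files(top_level_files: Iterable[str]) -> list[str]:
    """Return the known config files present in a repo's top-level file tree."""
    present = set(top_level_files)
    return [name for name in CONFIG_FILES if name in present]

def repos_needing_summary(conn: sqlite3.Connection, repo_ids: Iterable[int]) -> list[int]:
    """Return only the repo IDs that have no cached summary yet."""
    cached = {row[0] for row in conn.execute("SELECT repo_id FROM summaries")}
    return [rid for rid in repo_ids if rid not in cached]
```

Only the IDs returned by `repos_needing_summary` would be sent to the model, so a repo that stays on the list for weeks is summarised once.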

Update Schedule

The pipeline runs daily at 8:00 AM via Windows Task Scheduler. Each run executes the following steps in order:

1. Fetch: Query the GitHub Search API and collect up to 200 repositories pushed within the last 30 days with at least 50 stars.

2. Rank: Compute the composite trending score for each repo and sort in descending order.

3. Summarise: Generate AI summaries and topic tags for any new repos not already in the database.

4. Build: Regenerate the static docs/index.html site from the latest database snapshot.

5. Deploy: Push the updated site to GitHub, which triggers an automatic Cloudflare Pages deployment.
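The fixed step order can be sketched as a simple sequential runner. Each step is a zero-argument callable; the function name `run_pipeline` and the stop-on-first-failure behaviour are assumptions for illustration, chosen so that a failure in an earlier step prevents Deploy from publishing a partial result.

```python
from typing import Callable

Step = tuple[str, Callable[[], None]]

def run_pipeline(steps: list[Step]) -> list[str]:
    """Run steps in order and return their names; abort at the first failure."""
    completed: list[str] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Later steps (e.g. Deploy) are skipped entirely.
            raise RuntimeError(f"pipeline stopped at step {name!r}") from exc
        completed.append(name)
    return completed
```

A scheduled task would then call something like `run_pipeline([("Fetch", fetch), ("Rank", rank), ...])`, where `fetch` and `rank` stand in for the pipeline's real step functions.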

Data Storage

All results are stored in a SQLite database at data/trending.db. The database accumulates every daily snapshot, so historical data is preserved across runs. It is seeded with 16 days of historical snapshots (over 1.5 million rows), giving the period filters meaningful baseline data from day one.

Because every snapshot is kept, the pipeline can compute star-growth deltas over multiple time windows by diffing each repository's current star count against earlier snapshots. These deltas are the data behind the 24h, 7-day, and 30-day growth stats shown on each repository card, and they also drive the period filter thresholds.
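Assuming a hypothetical `snapshots(repo_id, snapshot_date, stars)` table with one row per repo per day (the actual schema of data/trending.db is not described here), a window delta is a self-join between the current snapshot and the one taken `window_days` earlier. SQLite's built-in `date()` function shifts the date with a modifier such as `'-7 days'`; the function name `star_deltas` is hypothetical.

```python
import sqlite3

# Join each repo's snapshot on `on_date` with its snapshot N days earlier.
DELTA_SQL = """
SELECT cur.repo_id, cur.stars - prev.stars AS delta
FROM snapshots AS cur
JOIN snapshots AS prev
  ON prev.repo_id = cur.repo_id
 AND prev.snapshot_date = date(cur.snapshot_date, ?)
WHERE cur.snapshot_date = ?
ORDER BY delta DESC
"""

def star_deltas(conn: sqlite3.Connection, on_date: str, window_days: int) -> dict[int, int]:
    """Map repo_id -> stars gained over the window ending on `on_date` (YYYY-MM-DD)."""
    modifier = f"-{window_days} days"  # e.g. '-1 days' for 24h, '-30 days' for 30-day
    rows = conn.execute(DELTA_SQL, (modifier, on_date))
    return {repo_id: delta for repo_id, delta in rows}
```

Repos with no snapshot on the earlier date drop out of the inner join, so a repo that started being tracked recently has no 7-day delta until it has a snapshot from a week earlier.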

Filters Explained

The main page provides several ways to narrow the list of trending repositories: