Reproducible · Source-Available
How OdyseeWatchdog Collects Data
Every claim on this site is tied to a public Odysee URL or a published third-party investigation. The scanner code, blocklist, and raw data are all public. Anything you read here, you can reproduce.
Source code on GitHub
Scanner, supplementary scripts, blocklist, and the Next.js app. MIT-licensed.
Raw data: /data/*.json
flagged.json (the dataset), stake_concentration.json, channel_velocity.json, search_ranking.json, youtube_migration.json, takedown_tests.json.
The Primary Scanner — scan_odysee.py
The primary scanner queries Odysee's public Lighthouse search API (the same API the site's own search box uses) with a curated keyword list grouped by violation category: terrorism, hate speech / neo-nazi, harassment / doxxing / threats, CSAM / child safety, incitement to violence, drug trafficking, illegal weapons, financial fraud, cybercrime, human trafficking, animal cruelty, self-harm promotion, conspiracy-driven violence.
For every result returned, the scanner records: id, title, channel, url, tipsLBC (effective stake — sum of creator stake plus audience tips on that claim), flags (categories the title matched), and lastSeen (date the scanner last confirmed the URL was live). The output is written atomically to /data/flagged.json.
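The mapping from a raw search hit to one flagged.json record can be sketched roughly like this. The output keys mirror the published schema; the input keys (claimId, name, effective_amount) are our assumptions about the Lighthouse payload, not confirmed API field names:

```python
from datetime import date


def to_record(hit: dict, matched_flags: list[str]) -> dict:
    # Output keys follow the flagged.json schema described above;
    # input keys are assumed names for the Lighthouse search payload.
    channel = hit.get("channel", "")
    return {
        "id": hit["claimId"],
        "title": hit["title"],
        "channel": channel,
        "url": f"https://odysee.com/{channel}/{hit['name']}",
        "tipsLBC": float(hit.get("effective_amount", 0)),  # effective stake
        "flags": matched_flags,                            # matched categories
        "lastSeen": date.today().isoformat(),              # last confirmed live
    }
```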
ToS-compliant operation
- 8-second polite delay between requests (well above any documented Odysee API rate limit).
- Public metadata only. Titles, channel handles, effective stake. The scanner does not view, download, or mirror video content. It does not access private channels or bypass any access control.
- Atomic writes. JSON output is written to a temp file and renamed; no partial-write states reach the deployed dataset.
- Transparent diff. Every scanner run that changes the dataset is committed to the public git repository under a bot identity. The diff is inspectable; nothing is silently revised.
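The atomic-write step above follows a standard temp-file-then-rename pattern. A minimal sketch, with an illustrative helper name rather than the scanner's actual code:

```python
import json
import os
import tempfile


def write_atomic(path: str, payload) -> None:
    """Write JSON to a temp file in the target directory, then rename.

    os.replace is atomic on POSIX filesystems, so a reader never
    observes a partially written flagged.json.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)
    except Exception:
        os.unlink(tmp)  # clean up the temp file on failure
        raise
```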
Blocklist & Classification — False-Positive Floor
A keyword scanner pulls in false positives by design. We maintain a public blocklist of channels that match the keyword traps but produce clearly off-topic content (educational science channels, consumer-rights advocates, gaming creators, mainstream science explainers). Any item from a blocklisted channel is dropped from the dataset before publication.
We also maintain a title-pattern blocklist for benign substrings that incidentally match a violating keyword (e.g., "keyboard" drops hardware-review videos that match the "board" substring of certain hate-symbol queries). Both blocklists are public.
The remaining items are then classified into category flags by re-running the title against the same keyword groups (a stricter pass than the initial search query). Items whose titles don't match any category cleanly are tagged "Under Review" and surfaced separately on the Stats page — they are reachable via a violating search query, but the title alone isn't enough to assert a category. We hold them under review rather than over-claim. Every category count we publish is therefore a conservative floor, not a ceiling.
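The filter-then-classify pass can be sketched as below. The keyword group and blocklist entries here are illustrative placeholders; the real lists are maintained in the repository:

```python
def classify(title: str, channel: str,
             keyword_groups: dict[str, list[str]],
             channel_blocklist: set[str],
             title_blocklist: list[str]) -> tuple[bool, list[str]]:
    """Return (keep, flags) for one search result.

    Blocklisted channels and benign title patterns are dropped before
    publication; kept items whose titles match no category cleanly are
    tagged 'Under Review' rather than assigned a category.
    """
    if channel in channel_blocklist:
        return False, []
    t = title.lower()
    if any(pattern in t for pattern in title_blocklist):
        return False, []
    flags = [cat for cat, keywords in keyword_groups.items()
             if any(kw in t for kw in keywords)]
    return True, flags or ["Under Review"]
```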
Supplementary Scanners
Five additional scanners run on demand and produce specialised datasets. All read-only, all public-metadata-only:
scan_stake_concentration.py → /data/stake_concentration.json
For every channel with 3+ flagged videos: compute its top-1 share (what percentage of the channel's total LBC sits on its single most-staked video) and Gini coefficient. A channel above 70% top-1 share with ≥5,000 LBC total is flagged as likelySelfStake. The signal: extreme concentration on one video is the textbook fingerprint of a creator paying themselves to game search-rank, not of organic audience tipping. The full math is in the source.
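The two concentration measures reduce to a few lines; a sketch, using the 70% top-1 share and 5,000 LBC thresholds stated above:

```python
def top1_share(stakes: list[float]) -> float:
    """Fraction of a channel's total LBC sitting on its most-staked video."""
    total = sum(stakes)
    return max(stakes) / total if total else 0.0


def gini(stakes: list[float]) -> float:
    """Gini coefficient via the sorted-rank formula (0 = equal, ->1 = concentrated)."""
    xs = sorted(stakes)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * total) - (n + 1) / n


def likely_self_stake(stakes: list[float]) -> bool:
    """Flag: >70% of total LBC on one video, with at least 5,000 LBC overall."""
    return sum(stakes) >= 5000 and top1_share(stakes) > 0.70
```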
scan_search_ranking.py → /data/search_ranking.json
Queries Odysee's own search box for 20 benign public-interest terms ("vaccine", "climate change", "trans healthcare", "black lives matter", "january 6", etc.) and captures the top results. If the top result for "vaccine" is denial content, that's a direct indictment of the platform's ranking — no additional interpretation required. We diff snapshots over time to surface ranking drift.
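The snapshot diff can be sketched as below. The snapshot shape (query mapped to an ordered list of result URLs) is an assumption for illustration:

```python
def ranking_drift(old: dict[str, list[str]],
                  new: dict[str, list[str]]) -> dict[str, dict[str, str]]:
    """Report queries whose top result changed between two snapshots."""
    drift = {}
    for query in old.keys() & new.keys():
        if old[query] and new[query] and old[query][0] != new[query][0]:
            drift[query] = {"was": old[query][0], "now": new[query][0]}
    return drift
```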
scan_channel_age.py → /data/channel_velocity.json
For every channel in the flagged dataset with 10+ items, fetch the channel claim's creation_timestamp and compute age in days. Flag any channel under 180 days old with 20+ flagged items as a fresh propaganda node — purpose-built recent accounts, as distinct from legacy content sitting around.
scan_youtube_migration.py → /data/youtube_migration.json
Tracks 20 named creators removed from YouTube for hate speech, harassment, incitement, or repeated medical misinformation. For each, probe known channel handles on Odysee via claim_search and record their current presence + video count + stake. The source list is hand-maintained at scripts/deplatformed_creators.json.
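Aggregating the per-creator probes into a summary might look like this. The input keys (present, videoCount, stakeLBC) are assumptions about the probe output, not the script's published schema:

```python
def migration_summary(probes: list[dict]) -> dict:
    """Roll per-creator probe results up into headline migration figures."""
    present = [p for p in probes if p["present"]]
    return {
        "tracked": len(probes),
        "presentOnOdysee": len(present),
        "totalVideos": sum(p["videoCount"] for p in present),
        "totalStakeLBC": sum(p["stakeLBC"] for p in present),
    }
```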
scan_comments.py → /data/comment_flags.json
Samples the comment threads on flagged items and pattern-matches for slurs, threats, and explicit targeted harassment. The titles are often sanitised; the comment section is where the explicit behaviour lives. Note: the upstream comment API has been intermittently auth-walled in 2026; this dataset is sample-based and conservative.
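The pattern-matching pass has roughly this shape. The patterns here are mild illustrative stand-ins; the real slur and threat term lists live with the scanner:

```python
import re

# Illustrative stand-in patterns only, not the production term lists.
THREAT_PATTERNS = [
    re.compile(r"\bi know where you live\b", re.IGNORECASE),
    re.compile(r"\byou (will|are going to) regret\b", re.IGNORECASE),
]


def flag_comments(comments: list[str]) -> list[int]:
    """Return indices of sampled comments matching any threat pattern."""
    return [i for i, c in enumerate(comments)
            if any(p.search(c) for p in THREAT_PATTERNS)]
```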
Takedown Register — takedown_tests.json
Counting flagged items is one thing; counting responses is another. The takedown register is a hand-maintained list of specific URLs we have referred to Odysee (with statute citations) and to the appropriate regulator (Coimisiún na Meán, ARCOM, BNetzA, IWF, NCMEC, FBI IC3, depending on jurisdiction). For each case we record: URL, channel, reason, date reported, who it was reported to, and current status.
The register is the empirical answer to the question "Does Odysee act when you specifically ask?" To date, status of every entry: still_live. Not most. Every. See /takedown-tests for the live list.
Known Limitations
Public methodology disclosure is the price of credibility. Here's what our data does not capture, so you can size our claims accurately:
- Title-only classification. Items whose violating content is in the video itself but not in the title are not flagged. Our counts are a floor.
- English-keyword bias. Most of our keyword groups are English. Non-English content that uses local-language hate speech, conspiracy framings, or extremist code is undercounted. Where we do flag French/ German/Spanish content (e.g., French Covid-Denial Network), it's because the titles use phrases that landed in our blocklist via investigation rather than via the English-keyword sweep.
- No view counts. Odysee's public API stopped returning per-claim view counts in 2026. Our flagged.json schema retains a views field for backwards compatibility, but it reads 0 for every item. We do not publish view-based statistics derived from the current dataset; older articles citing view counts are labelled with their original publication date.
- Stake ≠ tips. The tipsLBC field is the "effective amount" on a claim, which is a blend of creator self-stake and audience tips. The self-stake fraud analyser exists precisely to call out cases where the "tips" appear to be the creator paying themselves. Any single LBC figure on a specific item should be read with that caveat.
- Blocklist asymmetry. We add channels to the blocklist when we identify them as false positives. We have no comparable mechanism for false negatives: channels with violating content that don't hit any of our keyword traps. Our coverage of extant Odysee extremist content is partial by construction.
Reproducing Our Findings
You can verify anything in our datasets yourself:
- Pull the relevant JSON from /data/<name>.json.
- Pick any claim — say, @neohumaneve's 495,000 LBC self-stake on "One Lie to Rule Them All".
- Open the matching Odysee URL listed in the JSON. Inspect the channel page yourself.
- Confirm the LBC figure shown on Odysee matches the figure in our JSON (within a small drift if Odysee's page has had new tip activity since our last scan).
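The "small drift" check in the last step can be made concrete; the 5% relative tolerance below is our illustration, not a published threshold:

```python
def matches_within_drift(ours: float, live: float,
                         tolerance: float = 0.05) -> bool:
    """True when the live Odysee LBC figure is within a small relative
    drift of the figure in our JSON (5% here is an assumed tolerance)."""
    if ours == 0:
        return live == 0
    return abs(live - ours) / ours <= tolerance
```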
For the methodology code itself, clone the repo and read scripts/scan_odysee.py and scripts/scan_stake_concentration.py. Both are well under 1,000 lines and have no third-party ML dependencies.
Corrections & Disputes
If you believe we've misclassified content, misquoted data, or named a channel that is a false positive and belongs on our blocklist: email press@odyseewatchdog.com or submit via the form. We will review and either correct, expand, or document why the call stands. Either way the response is logged in the public commit history.
Legal Disclaimer
This site only highlights publicly available content that violates Odysee's own Community Guidelines and/or applicable laws. We do not host, embed, or redistribute any Odysee content. All referenced material is linked in its original, publicly accessible location for accountability and reporting purposes only.