User:SuperHamster/CiteUnseen

Cite Unseen
Presentation introducing Cite Unseen as delivered at CredCon Austin 2018 (Cite Unseen.pdf)
Description: User script that adds iconic indicators to Wikipedia citations
Authors: SuperHamster, Sky Harbor
Updated: 2020-12-24
Changelog: User:SuperHamster/CiteUnseen/Changelog
Script location: User:SuperHamster/CiteUnseen.js
Version control: https://github.com/KevinPayravi/CiteUnseen
Issue tracking: Phabricator

Cite Unseen is a user script that adds categorical icons to Wikipedia citations, giving readers and editors a quick initial evaluation of citations at a glance. The icons indicate the nature and reliability of sources and help identify sources that may be problematic or should be used with caution (the key word is "may"; see the usage guide below).

Cite Unseen's categorization dataset currently holds over 3,400 domains in 18 categories. These categories include:

  • Perennial sources list statuses (generally reliable; marginally reliable; generally unreliable; deprecated; blacklisted)
  • Advocacy groups; books; blogs; user-generated news; editable sites; state media; news; opinion pieces; press releases; social media sites; tabloids; and TV and radio programs
  • Predatory journals listed on the predatory source list

Cite Unseen was first developed at CredCon in November 2018 by Kevin Payravi (SuperHamster) and Josh Lim (Sky Harbor), with support from the Credibility Coalition and the Knowledge Graph Working Group. Development continued at Wikimedia Hackathon 2019.

Installation

Cite Unseen after running on President of the United States:
  • The first source has been marked as an advocacy group (the Heritage Foundation is an American conservative think tank).
  • The second source has been marked as a government-controlled source.
  • The third source has been marked as an advocacy group (while the document itself is from the Congressional Research Service, it is hosted on the Federation of American Scientists website).
  • The fourth source has been marked as a news article from The Washington Post.
  • The fifth source has been marked as an opinion piece.

The script is located at User:SuperHamster/CiteUnseen.js. You can add the script to your Wikipedia browsing experience by editing your common.js file and adding the following line:

{{subst:iusc|User:SuperHamster/CiteUnseen.js}}
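
When the page is saved, the substituted template is expanded into an ordinary script-loading statement. For readers who prefer to see what such a statement looks like, here is a minimal sketch of a manual loader. The helper name `rawScriptUrl` is hypothetical, the host is assumed to be English Wikipedia, and the sketch relies on MediaWiki's `mw.loader.load` function being available on the page.

```javascript
// Hypothetical helper: build the raw-JavaScript URL for an on-wiki script page.
// The default host is an assumption (English Wikipedia).
function rawScriptUrl(title, host = 'en.wikipedia.org') {
  const params = new URLSearchParams({
    title: title,
    action: 'raw',
    ctype: 'text/javascript',
  });
  return 'https://' + host + '/w/index.php?' + params.toString();
}

const citeUnseenUrl = rawScriptUrl('User:SuperHamster/CiteUnseen.js');

// mw.loader.load only exists inside a MediaWiki page, so guard the call.
if (typeof mw !== 'undefined' && mw.loader) {
  mw.loader.load(citeUnseenUrl);
}
```

The template line above remains the recommended way to install the script; this sketch only illustrates the mechanism.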

Cite Unseen will automatically run whenever you open a Wikipedia page.

Before using, please read the usage guidelines below. It's particularly important to keep in mind that while Cite Unseen is here to guide you, it does not evaluate context and should not by itself justify editing decisions.

Configuration

By default, all icon types are shown except the icon for sources considered generally reliable per the perennial sources list, which is hidden to reduce clutter.

You can toggle icons on or off by pasting the following in your CiteUnseen-Rules.js page, and adjusting the true/false values however you wish.

cite_unseen_ruleset = {
  "advocacy": true,
  "blogs": true,
  "books": true,
  "community": true,
  "editable": true,
  "government": true,
  "news": true,
  "opinions": true,
  "predatory": true,
  "press": true,
  "rspDeprecated": true,
  "rspBlacklisted": true,
  "rspGenerallyUnreliable": true,
  "rspMarginallyReliable": true,
  "rspGenerallyReliable": false,
  "rspMulti": true,
  "social": true,
  "tabloids": true
}
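
To illustrate how a ruleset like this could be combined with the defaults, here is a minimal sketch. It is not the script's actual code: `DEFAULT_RULESET`, `resolveRuleset`, and `isIconEnabled` are hypothetical names, and only a subset of the category keys is listed.

```javascript
// Assumed defaults, mirroring a subset of the ruleset shown above.
const DEFAULT_RULESET = {
  advocacy: true,
  news: true,
  rspGenerallyReliable: false, // hidden by default to reduce clutter
  rspGenerallyUnreliable: true,
};

// Hypothetical helper: user values override defaults.
// Unknown keys and non-boolean values are ignored.
function resolveRuleset(userRules = {}) {
  const resolved = { ...DEFAULT_RULESET };
  for (const [key, value] of Object.entries(userRules)) {
    const known = Object.prototype.hasOwnProperty.call(DEFAULT_RULESET, key);
    if (known && typeof value === 'boolean') {
      resolved[key] = value;
    }
  }
  return resolved;
}

// Hypothetical helper: an icon is shown only when its key is exactly true.
function isIconEnabled(ruleset, category) {
  return ruleset[category] === true;
}

// Example: show the "generally reliable" icon and hide the news icon.
const rules = resolveRuleset({ rspGenerallyReliable: true, news: false });
```

With these overrides, `isIconEnabled(rules, 'rspGenerallyReliable')` is true and `isIconEnabled(rules, 'news')` is false, while categories that were not overridden keep their default.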

Usage

Screenshot: Cite Unseen after running on Yemeni Civil War (2014–present).

Once installed, Cite Unseen will automatically analyze and annotate references you come across. When it finds a match in its categorization dataset, it will add a categorical icon (refer to the chart below). You can hover over an icon to get more details about the categorization.

Important points to keep in mind while using Cite Unseen:

  • Context matters. Sources that are considered generally unreliable can still have valid use. For example, while we typically avoid citing social media, social media posts may still be used for uncontroversial self-descriptions. And while we typically try to avoid self-published blogs and other user-generated content, they may still be acceptable when authored by established subject-matter experts (see WP:SPS for more).
  • Evaluate. The point of Cite Unseen is to highlight the nature of sources, and to prompt you to think about potential concerns with a source. Just because a source has a concerning mark does not automatically mean it is being used inappropriately. You should never justify removing or adding a source solely because of information that Cite Unseen provides; you need to do your own homework as well.
  • It does not cover everything. There is an endless trove of resources out there, and we can't categorize all of them. You'll find many citations that Cite Unseen doesn't mark up; this means only that the source either (a) does not fit an existing category or, more commonly, (b) simply hasn't been categorized yet.
  • It is not always right. Cite Unseen looks at citation types and does string-matching against URLs. While generally successful, it's possible for Cite Unseen to misidentify a source.
    • Sometimes reliable sources are hosted on an unreliable site. For example, editors citing a book may link to its listing on Amazon.com, which is classified as generally unreliable. This will cause the citation to be marked as generally unreliable even if the book itself is fine. Situations like these are something to keep in mind while investigating the usage of a source.
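
The Amazon.com case can be reproduced with a small sketch. This is an illustration under stated assumptions rather than the script's real matching code: `DOMAIN_CATEGORIES` is a hypothetical one-entry dataset, `categoryForUrl` is a hypothetical helper, and a hostname comparison stands in for the script's string matching.

```javascript
// Hypothetical one-entry dataset; the real dataset holds thousands of domains.
const DOMAIN_CATEGORIES = {
  'amazon.com': 'rspGenerallyUnreliable',
};

// Hypothetical helper: return the category code for a URL, or null if the
// URL is malformed or its host is not in the dataset.
function categoryForUrl(url) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return null; // not a parseable URL
  }
  for (const [domain, category] of Object.entries(DOMAIN_CATEGORIES)) {
    // Match the domain itself and any subdomain such as "www.".
    if (host === domain || host.endsWith('.' + domain)) {
      return category;
    }
  }
  return null;
}

// A citation to a book that merely links to its Amazon listing.
// The title and product path are placeholders.
const bookCitation = {
  title: 'An example book',
  url: 'https://www.amazon.com/dp/0000000000',
};

const bookCategory = categoryForUrl(bookCitation.url);
```

Here `bookCategory` ends up as `rspGenerallyUnreliable` because only the hosting domain is inspected; nothing about the book itself enters the decision, which is why such marks need a human check.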

Classifications

Cite Unseen classifies sources into eighteen categories.

Icon | Description | Code
Font Awesome 5 solid bullhorn.svg | Advocacy: An organization that is engaged in advocacy (anything from political to civil rights to lobbying). Note that an advocacy group can very well be a reliable source; this indicator serves to note when a source's primary purpose is to advocate for certain positions or policies. The websites in this category predominantly come from articles in Category:Advocacy groups. | advocacy
Education - The Noun Project.svg | Books: Books and other similar printed matter. Not an indicator of reliability by itself. | books
Feed Noun project 104.svg | Blog post: Note that a blog post may be considered reliable as a source on the author themselves, or when produced by an established subject-matter expert whose work in the relevant field has previously been published by reliable, independent publications. See WP:ABOUTSELF and WP:SPS for more information. | blogs
Community Noun project 2280.svg | User-generated news: News sites that accept articles from the community, such as Examiner.com or Global Voices. | community
Hand-33988.svg | Editable: Sites that are editable by the public, such as wikis (Wikipedia, Fandom) or some databases (IMDb, Discogs). | editable
Maki2-town-hall-18 black.svg | State media and other government sources. This categorization takes into account the direct editorial control the government has on the source. Some public broadcasters and other outlets in which the state does not exercise tight editorial control (such as PBS in the United States) will not have this icon. | government
Historical Newspaper - The Noun Project.svg | News: News published in reputable news sources that are generally considered reliable on Wikipedia. | news
FAQ icon (Noun like).svg | Opinion piece: Opinion pieces and op-eds. | opinion
Book X red.svg | Predatory journals: Predatory journals and publishers; these sites charge publication fees to authors without checking articles for quality and legitimacy. This list is derived from Template:Predatory open access source list. | predatory
Noun project 401.svg | Press releases | press
Social Media - The Noun Project.svg | Social media: Usually a post from a user on a social media platform. Note that a social media post may be considered reliable as a source on the author themselves, or when produced by an established subject-matter expert whose work in the relevant field has previously been published by reliable, independent publications. See WP:ABOUTSELF and WP:SPS for more information. | social
Talking (49969) - The Noun Project.svg | Tabloids: Sites that publish celebrity gossip and tabloid journalism (as in the style of largely sensationalist journalism; publications that publish in tabloid format but are otherwise generally reliable and non-sensationalist are not categorized as tabloids). | tabloids
Television (2315) with screen.svg | TV / radio programs: TV and radio programs, which may or may not qualify as news and/or reliable depending on the individual program. | tvPrograms
Yes Check Circle.svg | [RSP] Generally reliable in its areas of expertise: Per RSP, editors show consensus that the source is reliable in most cases on subject matters in its areas of expertise. The source has a reputation for fact-checking, accuracy, and error-correction, often in the form of a strong editorial team. | rspGenerallyReliable
Achtung-orange.svg | [RSP] Marginally reliable: Per RSP, the source is marginally reliable (i.e. neither generally reliable nor unreliable), and may be usable depending on context. Editors may not have been able to agree on whether the source is appropriate, or may have agreed that it is only reliable in certain circumstances. It may be necessary to evaluate each use of the source on a case-by-case basis while accounting for specific factors unique to the source in question. See Wikipedia's perennial sources list for more details. | rspMarginallyReliable
Argentina - NO symbol.svg | [RSP] Generally unreliable: Per RSP, there is community consensus that the source is questionable in most cases. The source may lack an editorial team, have a poor reputation for fact-checking, fail to correct errors, be self-published, or present user-generated content. Outside exceptional circumstances, the source should normally not be used, and it should never be used for information about a living person. Even in cases where the source may be valid, it is usually better to find a more reliable source instead. If no such source exists, that may suggest that the information is inaccurate. The source may still be used for uncontroversial self-descriptions, and self-published or user-generated content authored by established subject-matter experts is also acceptable. | rspGenerallyUnreliable
Stop hand.svg | [RSP] Deprecated: Per RSP, there is community consensus to deprecate the source. The source is considered generally unreliable, and use of the source is generally prohibited. Despite this, the source may be used for uncontroversial self-descriptions. | rspDeprecated
X-circle.svg | [RSP] Blacklisted: Per RSP, due to persistent abuse, usually in the form of external link spamming, the source is on the spam blacklist or the Wikimedia global spam blacklist. | rspBlacklisted
Question Circle.svg | [RSP] Varied consensus: Per RSP, the community's consensus on the reliability of this site depends on one or more factors (for example, Forbes articles by staff are considered generally reliable, while articles by contributors are considered generally unreliable). See Wikipedia's perennial sources list for more details. | rspMulti

Contributing

We're always looking to expand and tune our categorizations. Please place any questions or ideas on the talk page.

If you're interested in touching the code itself:

Next steps

Some of the next big goals for the project:

Technical implementation

Cite Unseen performs string matching on URLs, as well as checks for different types of citation templates, in order to identify the kind of work and any potential ideological leanings.
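
The citation-template check can be pictured with a short sketch. It assumes that a citation template renders a `<cite>` element whose class list names the template type (for example `citation book`); the mapping `TEMPLATE_CLASS_CATEGORIES` and the helper `categoryFromCitationClass` are hypothetical and not taken from the script.

```javascript
// Assumed mapping from citation-template class tokens to category codes.
const TEMPLATE_CLASS_CATEGORIES = {
  book: 'books',
  pressrelease: 'press',
};

// Hypothetical helper: inspect a class attribute such as "citation book cs1"
// and return a category code, or null when nothing applies.
function categoryFromCitationClass(className) {
  const tokens = className.split(/\s+/).filter(Boolean);
  if (!tokens.includes('citation')) {
    return null; // not a citation-template element
  }
  for (const token of tokens) {
    if (Object.prototype.hasOwnProperty.call(TEMPLATE_CLASS_CATEGORIES, token)) {
      return TEMPLATE_CLASS_CATEGORIES[token];
    }
  }
  return null;
}

const bookClassCategory = categoryFromCitationClass('citation book cs1');
const webClassCategory = categoryFromCitationClass('citation web cs1');
```

Under these assumptions a book citation is categorized from its template alone, while a generic web citation yields null and would fall through to URL matching.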

Cite Unseen is implemented in JavaScript. When Cite Unseen is run, it does the following:

  • Iterates through every citation in a given Wikipedia article and pulls URLs.
  • Checks each URL against a pre-defined list of domains and strings that are categorized by nature (biased, press, news, opinion piece, etc.).
  • Injects icons next to citations accordingly.
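
The three steps above can be sketched end to end as follows. This is a simplified illustration, not the script's source: the dataset `CATEGORY_STRINGS` holds a few assumed entries, the function names are hypothetical, the `ol.references li` and `a.external` selectors are assumptions about the page markup, and a text marker stands in for the icons.

```javascript
// Hypothetical miniature dataset: category code -> URL substrings (assumed entries).
const CATEGORY_STRINGS = {
  news: ['washingtonpost.com'],
  advocacy: ['heritage.org', 'fas.org'],
};

// Step 2: return every category whose strings occur in the URL.
function categorize(url) {
  const lower = url.toLowerCase();
  return Object.keys(CATEGORY_STRINGS).filter((category) =>
    CATEGORY_STRINGS[category].some((needle) => lower.includes(needle))
  );
}

// Steps 1 and 3 on plain data: each citation is { id, urls }.
function annotate(citations) {
  return citations.map((citation) => ({
    id: citation.id,
    categories: [...new Set(citation.urls.flatMap((url) => categorize(url)))],
  }));
}

// The same steps against a page: collect external links per reference,
// categorize them, and prepend a marker where a real script would add icons.
function annotatePage(doc) {
  doc.querySelectorAll('ol.references li').forEach((item) => {
    const urls = [...item.querySelectorAll('a.external')].map((a) => a.href);
    const categories = [...new Set(urls.flatMap((url) => categorize(url)))];
    if (categories.length > 0) {
      const marker = doc.createElement('span');
      marker.textContent = '[' + categories.join(', ') + '] ';
      item.prepend(marker);
    }
  });
}

// Only touch the DOM when running in a browser.
if (typeof document !== 'undefined') {
  annotatePage(document);
}
```

Keeping `categorize` and `annotate` free of DOM access makes the matching logic easy to test outside a browser; only `annotatePage` depends on the page structure.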

See also