Wikipedia talk:WikiProject Image Monitoring Group

Shortcut:
WT:IMG

Photo/Image Categorization

I've been working recently on categorizing photo requests. Category:Wikipedia requested photographs and Category:Wikipedia requested photographs of people each have thousands of undifferentiated requests, making the categories unwieldy. I have created a lot of subcategories under Category:Wikipedia requested photographs by subject for better categorization, and have written a bot (PhotoCatBot, currently in trial) to do as much of the work automatically as possible.

I'm looking into starting a WikiProject to promote the classification of image requests. Would this fit into the Image Monitoring Group's charter? Most of the group's work seems to be focused on orphaned and inappropriately licensed images. Tim Pierce 14:37, 16 August 2007 (UTC)

Dynamic image pages

Some websites generate dynamic image pages, so while it is possible to link directly to the image, or to search for it by number on a search page, it is not possible to link directly to a page with the image on it. An example is Image:NRCSNM02052.jpg. Go to their website and try to figure out how to link to a page with the image on it. If that can't be done, what should we do? And what does this mean for your image tagging criteria? Do we need a template specifically for "direct image URLs for websites that dynamically generate their front-end content"? Carcharoth (talk) 12:12, 3 April 2008 (UTC)

Hang on. Maybe this will work? Carcharoth (talk) 12:13, 3 April 2008 (UTC)
Nope. Any ideas? Carcharoth (talk) 12:17, 3 April 2008 (UTC)

OK. Let's move somewhere else, but first, can you answer my question above, please? It is fairly common to have these dynamically generated pages, and I don't see how people can provide a source link that will satisfy you, even if the image is totally free and you can be taken through a step-by-step process to verify its source (i.e. not a single link, but a verification method that could be put on the image page). Carcharoth (talk) 13:55, 3 April 2008 (UTC)

Sure, I was getting to it. There are many military images in the same situation (I forget which site, perhaps the DoD grouped military image site), and my thought is that as long as they provide the ID# or other "search" criteria that you can use to find the image, then it's valid. The criterion I uphold is that the sourcing information makes it "possible" to find the source of the image. It doesn't matter if it's easy (a simple URL link, 1 click and you're there), or directions/information on where it hangs in a museum. Much like "replaceability", it's possible to get to the source, no matter how hard. If you can't directly link to the webpage for whatever reason, then tell me how I could get there. "Go to this search page (link), enter ID fn98ac98ync4" would be just fine. For this image, I'd put "Go to http://photogallery.nrcs.usda.gov/Search.asp and enter 'NRCSNM02052' under 'filename contains'". That's the best sourcing information we're going to get with this type of image.
But again, we just gave the uploader a fish instead of teaching them to fish. Did we solve the problem? We solved the image sourcing problem, but I'd say we didn't solve the real problem: educating Bill708. Though they seemed to just make a few edits, upload this image and then leave. Which then points to the problem again: users can upload without being properly educated. Bill did a pretty good job, but it's not the best. I don't expect all uploaders to be awesome by just reading a bunch of stuff, but certainly Bill could have done better if he had read something that said "We need to be able to verify the source. A direct link to an image is not the best. If you need help ask." (In my experience they then don't ask for help, at least 99.9999% of the time.) MECUtalk 14:13, 3 April 2008 (UTC)
I googled the image name. Found this at about.com [1] But it appears it was slightly edited... Jimmyjones22 (talk) —Preceding comment was added at 14:20, 3 April 2008 (UTC)
It's a learning process. I'll let MECU explain why care is needed with googling images (that is a good first step, but great care needs to be taken). Ultimately, you are limited by how helpful the incomplete information provided by the uploader was. In general, the more information the better. Even webpage links go dead eventually, or change. Carcharoth (talk) 14:24, 3 April 2008 (UTC)
True. Still, it was a fun exercise for me. I made sure the website was the actual source of the photograph before posting the link I found. The way I see it, and looking at Mecu's edit history, most of these images he flags probably get deleted. Call me a pack rat, but if they are being used in articles, then they are worth something, even if they have fallen by the wayside. I have a lot of homework, but don't mind searching for sources in my spare time. But where do you guys find all the unsourced images? Is there a lost and found bin that I don't know about? Jimmyjones22 (talk) 14:32, 3 April 2008 (UTC)

More images stuff

Compare Image:P200084A.jpg and Image:P200084B.jpg. One appears to be a scan from a New York City Fire Department publication. The other has a US government exhibit evidence tag on it. Both link to the same generic source. What would you do here? Carcharoth (talk) 01:36, 3 April 2008 (UTC)

Here is a hard link to it. [2], hope that helps. Let me know if you want to find any more. Jimmyjones22 (talk) 05:31, 3 April 2008 (UTC)
Thanks. I've added those. Carcharoth (talk) 12:06, 3 April 2008 (UTC)
The images aren't the same. That is, the firefighters in the images aren't the same on both. Still need to find a source (as Jimmyjones22 did) and put it on there. Then, move the image here to Commons and delete the one here. In fact, it was already on Commons (and the source there even pointed to here, ugh). I marked it as such. May I ask how long either of you spent trying to find the source for these images? MECUtalk 14:00, 3 April 2008 (UTC)
I spent about 2-3 minutes. I found a few sites that had the picture, but in the end I decided to use that one, since it was probably the one the poster wanted. Jimmyjones22 (talk) 14:17, 3 April 2008 (UTC)
I just confused myself by thinking the pictures were the same. Mea culpa. I also often fail to check Commons. Something I must do far more often. Carcharoth (talk) 14:30, 3 April 2008 (UTC)
This gives me an idea for a bot: check every image on Wikipedia to see if the image already exists on Commons, and if so, mark it with the standard CSD I8 tag. It would just search by image name and could compare MD5 hashes to help make sure it was the exact same image. If it's not the same image, it could put it in another location so that a human could check and see if it's just resized or slightly cropped or something else a bot couldn't verify easily. Not sure how many it would find, but it would be very low-intensity on Wikipedia usage (use a database dump for starters). MECUtalk 13:18, 7 April 2008 (UTC)
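The name-then-hash check proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the three status labels and the in-memory dictionaries standing in for the two sites' files are all hypothetical, and a real bot would read names and checksums from a database dump or the site APIs instead.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def classify_images(local: dict[str, bytes], commons: dict[str, bytes]) -> dict[str, str]:
    """Sort each local image into one of three hypothetical buckets.

    - "exact_duplicate": same name and same MD5 on Commons (candidate for CSD I8)
    - "needs_human_review": same name but different bytes (resized, cropped, unrelated?)
    - "not_on_commons": no file with that name on Commons
    """
    report = {}
    for name, data in local.items():
        if name not in commons:
            report[name] = "not_on_commons"
        elif md5_hex(data) == md5_hex(commons[name]):
            report[name] = "exact_duplicate"
        else:
            report[name] = "needs_human_review"
    return report


# Made-up names and byte strings standing in for real image files.
local_files = {
    "A.jpg": b"identical-bytes",
    "B.png": b"local-version",
    "C.jpg": b"only-on-wikipedia",
}
commons_files = {
    "A.jpg": b"identical-bytes",
    "B.png": b"resized-version",
}

report = classify_images(local_files, commons_files)
for name, status in sorted(report.items()):
    print(name, status)
```

Matching by name first keeps the check cheap, but it would miss the same image uploaded under a different name; comparing hashes across all files would catch those at the cost of more work.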

Why not gather people together to add sources?

I've commented at Wikipedia talk:Criteria for speedy deletion#Expand/Change CSD I4 to include "incomplete source". In general, have you thought about gathering people together to add or improve sources (where possible)? I know it is not possible in all cases, and I know the uploaders should do this when they upload, but that could be part of the project as well: leaving notes for the uploaders showing them how we have added sources and what they should do in future. I see the same situation developing here as developed with Betacommand, and I'm going to quote what BrownHairedGirl said here:

"The fair use image problem had to be dealt with, and it was inevitably going to cause a lot of very bitter objections. However, so far as I can see, there was very little effective planning on how to handle all those inevitable protests, and minimise the damage caused by the process.

I have read little of the history, but from what I can see of how things are being handled now, the fundamental problem was that the fair use issue seems to have been approached primarily as a technical problem (how to identify, tag, and if necessary remove non-free images) when some minimal hazard analysis should have shown that the community-anger problem was in fact going to be much more serious.

That set off a pretty much inevitable cycle of a necessarily-hyperactive bot starting work, massive howls of anguish, too many of them turning into very unpleasant attacks ... leading to defensiveness and counter-attacks by a bot operator who must have felt with full justification that he was being savaged by a million mad dogs.

I have a huge amount of sympathy for Betacommand in this, who seems to have been ill-equipped and under-supported. He ended up in an appallingly exposed position, and it must have been absolutely horrible for him." - User:BrownHairedGirl - User talk:BrownHairedGirl - 7 March 2008

Do you see how this could end up applying to you? I already see the large template at the top of your talk page, and the attempt to answer people's anger before they post here. I support adding sources to images, just as we should for articles, but I think a better approach is to get a large group of people involved, to analyse the problem, identify areas that can be easily fixed, and then start fixing. Then delete what can't be fixed. It may take time, but it will be less stressful and more productive. How about it? Would you be willing to help out with something like this? Carcharoth (talk) 16:23, 2 April 2008 (UTC)

This is the most constructive thing to come of this yet. I don't have time to fully respond now, but I will. Expect a barnstar soon for this. MECUtalk 18:50, 2 April 2008 (UTC)
How long have you been doing this sort of tagging for? I've added more comments here. Could you also check my changes to some other images from the "bsr" category, such as this one (adding a pdf link) and this one (which link works better as a source?). On the other hand, these three ([3], [4], [5]) are difficult to fix because they are no longer in use in articles. From the look of the pictures and the uploader's name, they are probably old Chinese prints of geishas, but a look through the page history of that article failed. The current links for those images are either dead or to baidu, which is most unhelpful. I had to give up on those three images, but am working through the other old and B&W images in that category. Modern images leave me cold. Carcharoth (talk) 01:26, 3 April 2008 (UTC)
This is an interesting thread. I would be willing to help with the citation project if someone would like to take charge (I nominate Carcharoth), but I would also like to recommend a few changes in the deletion process, such as increasing the amount of time before deletion (from 7 to 30 days) and changing the procedure involved for flagging, maybe adding a few extra steps. Additionally, the email you send to posters seems antagonistic. You should rephrase it to be a little less impersonal. As it stands, it sounds like a robot is flushing their images down the toilet to a countdown. Also, I would suggest making a new board for images needing citations, dividing them by their tags, and including links to their articles to make it easier to find sources. But other than that, I'm definitely in. Jimmyjones22 (talk) 03:38, 3 April 2008 (UTC)
Image sources should always link to an HTML source. (Just) linking directly to the image (like the second link) would also end up getting a {{bsr}} tag. Part of the problem is, as Jimmyjones22 stated, lack of information about the image on the image description page. Unlike Commons, we typically don't ask folks to fill out the {{information}} template correctly; a license tag of "pd-self" or some (vague) source information is typically all we get. A description of what the image is, is a rarity. Perhaps we could designate a future month (like June or July?) as "Wikipedia Image Cleanup Month" and get it in the Wikinews and try to put the publicity out otherwise (that annoying header thing that I usually hide?) with directions on how to clean up a user's own images (i.e., better source, information template, good license, nominate images they don't need/want/use anymore for deletion, etc). While we won't solve the true problem, even if a few images get fixed by someone, it may be worth it. Perhaps we end up recruiting a few folks that end up going through thousands of images to clean them up?
But I think the true problem is that users are allowed to upload images (that's a joke). The real problem is that users are allowed to upload images with little education as to what they really need to do. Last year we (others mostly) revamped the upload system to try to provide more education, which is a good first step, and I do believe it helped, but there are users that will ignore it (can't teach someone who already "knows") and then there are the 100,000s of images already uploaded that need to be gone through. Few people are willing to ask for help at the many locations where it's available (or even read an image policy). I don't know the best solution ("Image Cleanup Month" is the best I've got for now). I've been going through images since ?December? 2006. Not as heavily as I have recently, but I've been going through them for quite a long time and this is the first brouhaha I've had over any of my work. I'd be fine with changing the CSD from 7 to 30 days. The length is largely irrelevant, and it'll just mean a 3 week break for admins that go through the categories, and they'll be working on 30 day old tagged images instead of 7. It will largely be the same.
Ask admins that work that area how many images that get marked "no source/license" end up getting deleted. I'd say it's about 90-95%. Most never get fixed by the user. I actually looked at some images in detail once to see if users were active, had seen the notice(s) and just ignored them, or what. 90-95% of the users were active but just chose to ignore it (that was last summer, so perhaps it's different now), and the age of the image (when it was uploaded) seemed not to matter. Perhaps we could do another study (there are lots of test cases for research, maybe get a bot to do the labor intensive work this time?) which may help us identify the real problem(s)? It's not hard to get the information, just labor intensive (ideal for a bot, and they could do it fast since they're just reading information, not writing).
But in the end, after June/July and any other steps we take, we're still going to end up having to go through images and fixing them. If the tags are ugly or rude or whatever (I'm largely immune to them and can recognize them on sight without having to read any text anymore), then that needs to be addressed outside of the scope since they are standard templates. MECUtalk 13:50, 3 April 2008 (UTC)

Summarising the above three discussions

Anyone want to pull out the important points from the above three discussions? Carcharoth (talk) 15:15, 3 April 2008 (UTC)

Wikipedia Image Cleanup Month (June)

I really think this idea could be very useful and helpful to the project. I'm therefore going to really push the idea. Why June? Because it's currently April, and I figure about 8 weeks should be plenty of time to develop and research all the required information to pull this off. It starts on a Sunday, meaning roll-out should be softer than on a weekday.

  • To do list:
    • Find out how to get a message on the top of every page (that users can dismiss).
      • See if possible to show only to logged in users (because only editors with accounts can upload, and that's the target, IPs CAN help, but aren't the target)
    • Define concrete (hopefully measurable) goal(s)
      • Moved images to Commons? (Commons mover can track images moved in any time period since early April; see commons:User:Magnus Manske/CommonsHelper stats)
        • Need to know number of current (free) images on Wikipedia at start of month, is this easy to calculate? (answered, 360,125 on April 8, see below for how)
      • number of images deleted/cleaned/processed?
    • find out how to get an article in The Signpost [6], and other wiki-related outlets (Wikipedia Weekly [7]) Wikipedia:NotTheWikipediaWeekly (just provide an mp3?)
      • Plan message in advance, get someone who writes well to create copy
      • can get message on main page?
      • tailor message to WikiProjects? Target individual needs of WikiProjects (music will need fair use, military more like PD-old, most probably general fair use/replaceability/etc)
    • Create landing page and central action point
      • Instruction pages about what the heck we want to do this month
      • Where to find bad images
      • Coordinating activities
      • How-to pages
    • Create directions, self-learning activities, question/answer/quiz for new (or new to images) people to learn
      • create multi levels:
        • beginner: (source, tags, permission (how-to OTRS), what's free, what's not, etc), What is Commons? Why Commons?
        • intermediate: (replaceable, how to tag images, how to search for bad ones, moving to Commons, etc)
        • advanced: (complex copyright cases (derivative works), freedom of panorama, what permissions are "good")
        • special administrator level: how to process IFD, no source, no license, WP:PUI, other speedy (I3) etc.
        • ?special OTRS level: what is a good permission, bad permission, things to look for in email, how to process (OTRS folks may help write this)
    • Central place to coordinate new activities that arise during the month (will be off Wikipedia:WikiProject Image Monitoring Group/June 2008 image cleanup project)
      • Someone will likely come up with something new that we should be welcoming to accomplish during the month (or at least a follow-on)
    • Need to find out how to recruit more people to help setup/plan the Month
    • Need to have some award structure (some people are motivated by awards, and they're cheap and easy, so why not?). Also will have a residual effect: people months/years later will see the awards and maybe find us and learn then
      • Moved 10/50/100/1000/most images to Commons [8]
      • "cleaned up" 10/50/100/1000/most images
      • "answered the most edits at X help place"
    • Activities to do during the month that help support images/using/finding and Commons
    • Commons involved as well
      • Categorize images there
      • Clean structure, pages
      • Process images there as well
      • Nominate featured images (what is this, how do you do it) and quality images
    • create "banner" to go on all project pages to help navigation and location of other things during the month Wikipedia:WikiProject Image Monitoring Group/June 2008 image cleanup project/banner
    • Beta test pages in late May before beginning.
      • find a new user, ask them to read through: does it all make sense? What doesn't? What else do they have questions about?

This list is by no means complete. Feel free to edit it or remove things that aren't needed per any discussion, or once the information is found (maybe leave it with a link of how to?). MECUtalk 22:52, 3 April 2008 (UTC)

Goal

Goal of project: (measurable item)

  1. Education - new users, old users, all users, admins (hard to define an easily traceable statistic, but long term there should be a reduction in free images here, fewer problems with deletions, etc., except for new user problems, but then we have a place to point to for education)
  2. Clean up the 300,000+ free images on Wikipedia. - Better source, better license, better information, rationales. (Number of image edits, compare to regular month, database dump?)
    1. Eliminate orphaned images. *Use it or lose it.* (Not just for fair use reasons, though we can include these as well.) (Number of images deleted. Need to compare "normal" deletion month with June. IFD & CSD. Hard to track "orphaned" as a reason sans fair use; used to be 100k+ orphaned images, unknown if still true.)
  3. Move free images to Commons. (Number of images moved to Commons in June compared to normal month, average ~10k/month)
Anyone have any other ideas or how to better state goals? Perhaps not include the measurement criteria when presenting the topic to the masses, this would just be for our verification to see what was good/bad/pointless. For repeat cleanup projects and to know what needs more help in the future. Should this be a subpage off WP:IMG like Wikipedia:WikiProject Image Monitoring Group/June 2008 image cleanup project or another location? MECUtalk 13:28, 7 April 2008 (UTC)

Actual goals presented:

  1. Educate.
    1. Read through information topics (beginner, intermediate, advanced, admins(, OTRS))
    2. All about images (Wikipedia, Commons, usage, free)
    3. Ask questions (if needed)
  2. Cleanup.
    1. Your own images (link), images in articles you care about.
      1. Source
      2. License
      3. Information
      4. Rationales (if needed)
    2. All other images.
      1. How-to find.
      2. Orphans
      3. What to do with them (tagging, intermediate/advanced topic).
      4. Admins: Work at PUI, IFD, CSD categories.
  3. Move.
    1. Free images to Commons

Memberlist

See Wikipedia:WikiProject Image Monitoring Group/June 2008 image cleanup project/members. MECUtalk 16:37, 8 April 2008 (UTC)

Responses

  • A lot to think about here. You might want to advertise it a bit more. I was hoping others might have responded before I got round to it, and even now I only have time to answer one question:
    • "Need to know number of current (free) images on Wikipedia at start of month, is this easy to calculate?" - Not really, is the short answer. The long answer is that an approximation can be arrived at if you consider the number of non-image media files (sounds, .ogg; and video clips, can't remember the filename extension) to be much smaller than the number of image media files (.jpg, .svg, .png, etc). I'm never sure whether .gifs count as proper video files or not. Anyway, the total number of media files (i.e. everything in the "Image:" namespace) is given at Special:Statistics. The current total is 772,759 (there is also a magic word that I can't find at the moment). This also includes dupes of Commons images that are kept here and not deleted (for various reasons, though most are in fact deleted, I think). The split between free and non-free can be assessed if you assume that (a) all images have license templates, and (b) the lists at User:BetacommandBot/Free Template Useage and User:BetacommandBot/Non-Free Template Useage cover all the relevant templates in use. I've pasted the data at those two pages into an Excel spreadsheet, and the totals, as of 8 April 2008, are 360,125 free images and 282,264 non-free images. Add those together, and you get a total of 642,389. I would like to know what the missing 130,370 media files are (whether they are all sounds and video clips, or whether there are lots of images knocking around without license tags or using license tags not tracked by BetacommandBot), but for your purposes, the total at User:BetacommandBot/Free Template Useage when you start the clean-up month should be sufficient. At the moment it is 360,125 and shouldn't be too much larger or smaller in a few months' time. Carcharoth (talk) 13:28, 8 April 2008 (UTC)
      • That does sound like a good estimate, at least for our purposes. Getting any more accurate likely won't be worth the effort. Thanks for doing the research. How/where should I advertise? (I thought "advertising" was bad.) MECUtalk 14:12, 8 April 2008 (UTC)
      • That 130,000 or so are probably orphaned images. Betacommand is likely using a database dump which reports images used in articles, though you said license tags. Several months ago I used to track the orphaned image count and it was well over 100k before the system broke. 130,000 sounds reasonable to me. All those orphans should be deleted or moved to Commons. Period. If usable, then move to Commons and use. Otherwise delete. Yes, it's that big of a problem. Many of the images were uploaded long ago (2004; I think someone cleared out all the 2003 images) and have plenty of bad sources and/or licenses. MECUtalk 16:59, 8 April 2008 (UTC)
        • Could you put a ballpark figure up next to the question? Advertising is not a problem in a case like this, as long as it is not excessive and is limited to places where people expect to find notices. WP:VP (pick the one that looks best), WP:AN (technically not needed, but worth it as admin tools may be needed), Wikipedia:Community Portal has a bulletin board, there is also WP:CENT. The widest of issues (once in a blue moon) end up on the site notice. When the clean-up month is ready to launch, you could try and end up on the site notice, but it is very hard to persuade people to do that. There are also some image places you could post to (WP:MCQ and WT:IMAGE for starters, and lots more), as while not always directly related, people already working on images might be interested in working specifically on sources. WT:SOURCE and WT:RS might be other good bets, as though people there are interested in article sources, they should, in theory, be interested in image sources as well, if maybe not very experienced in that. Limit your initial requests for input to get a core of interested people developing the plans, and then advertise it more to get people to sign up. Good luck and let me know if I can help with anything specific (I'll keep watching and helping when I can anyway). Carcharoth (talk) 14:47, 8 April 2008 (UTC)
          • I put the figure you found up there (you could have); it will have to be updated before we begin. I'll advertise at all those places shortly. I agree that for now it should be people willing to help plan, write pages and figure things out. Thanks for the tips and help. MECUtalk 16:33, 8 April 2008 (UTC)
          • Move all this to the project page/talk page? MECUtalk 16:37, 8 April 2008 (UTC)
            • Move what where? Some of the new details and plan can go on the front page, some of the talk can be archived before inviting other people in, yes, if that's what you mean. Carcharoth (talk) 18:39, 8 April 2008 (UTC)
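The free/non-free estimate worked out in the responses above can be re-checked with a few lines of arithmetic. The three input figures are the ones quoted for 8 April 2008; the variable names and the derived percentage are only illustrative, and the result is no more accurate than the assumptions stated in that reply (every image carries a license template, and the two BetacommandBot lists cover every template in use).

```python
# Figures quoted in the discussion for 8 April 2008.
total_media_files = 772_759  # everything in the "Image:" namespace (Special:Statistics)
free_images = 360_125        # User:BetacommandBot/Free Template Useage
non_free_images = 282_264    # User:BetacommandBot/Non-Free Template Useage

# Files accounted for by a tracked license template.
tagged = free_images + non_free_images

# Files not covered by either list (sounds, video, untagged or untracked images).
untracked = total_media_files - tagged

# Share of tagged files that are free (derived here, not stated in the thread).
free_share = free_images / tagged

print(f"tagged: {tagged:,}")
print(f"untracked: {untracked:,}")
print(f"free share of tagged files: {free_share:.1%}")
```

Re-running the same calculation with fresh figures at the start of the clean-up month would give the baseline the goals are measured against.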