User talk:GreenC bot
You can stop the bot by pushing the stop button. The bot sees this and immediately stops running. Unless it is an emergency, please consider reporting problems first to my talk page.
Flagging non-dead link as dead
This edit flagged this URL as dead even though it isn't. Jo-Jo Eumerus (talk) 11:17, 18 July 2022 (UTC)
- Same with these edits:
- I appreciate it probably has to do with some kind of automatic PDF link serving in Javascript that Academia.edu uses wouldn't be readily captured with a bot; I don't know how fixable it is, but the links noted are not dead at all; I reverted both edits that the bot flagged. Ifly6 (talk) 14:35, 18 July 2022 (UTC)
- The url that Editor Jo-Jo Eumerus linked:
- Both of the urls that Editor Ifly6 links:
- There was some discussion about these kinds of academia links at Wikipedia:Link rot/URL change requests § www.academia.edu/download/
- —Trappist the monk (talk)
14:43, 18 July 2022 (UTC)14:46, 18 July 2022 (UTC)
- Jo-Jo Eumerus & User:Ifly6 they are dead for me (USA). Example. Are you getting a redirect to a cloudfront URL? Wondering if there is some kind of location-aware policy that determines when to serve the cloudfront URL vs a 404. If the cloudfront URL was known, it would be possible to save it at the Wayback Machine, then use the Cloudfront-Wayback URL on Wikipedia, treated as a dead link (due to its &Expires self-destruct mechanism, see WP:AWSURL). However, I wonder about copyright if academia.edu is making them unavailable in the US and possibly elsewhere: why have that policy if not for a rights issue? -- GreenC 15:04, 18 July 2022 (UTC)
- I'm in the US and am getting the links promptly. The links I am getting are Cloudfront ones with an expiry; I used the Academia.edu links to avoid the known expiry. Ifly6 (talk) 15:41, 18 July 2022 (UTC)
- Ah I see you use British English so I assumed you are not US. What browser do you use? Do you have any plugins that might affect javascript? This is impacting archive providers as well, such as Wayback Machine and Ghostarchive (US-based); they also get a 404. At Archive.today it "works" (global IP pool) but they are unable to correctly save the PDF. -- GreenC 16:00, 18 July 2022 (UTC)
- I do get a "d1wqtxts1xzle7.cloudfront.net" sort of thing. Jo-Jo Eumerus (talk) 17:33, 18 July 2022 (UTC)
- Language heuristics are always right 99pc of the time haha. I've confirmed on Edge (Windows 10) and Safari (macOS) that the Academia.edu links work. I don't have any plugins installed other than ad blockers that would affect something like this. The specific link that got generated for me with Rafferty was https://d1wqtxts1xzle7.cloudfront.net/51344857/Iris-_Fall_of_the_Roman_Republic-with-cover-page-v2.pdf. There were then a pile of GET parameters that I've excerpted (they change every time anyway) but are necessary to get the file served properly. Ifly6 (talk) 19:24, 18 July 2022 (UTC)
- Jo-Jo Eumerus do you use Edge or Safari? -- GreenC 19:38, 18 July 2022 (UTC)
- Wikipedia:Village_pump_(technical)#academia.edu/download .. seeing if anything comes up here. -- GreenC 19:52, 18 July 2022 (UTC)
- Ifly6 in the above thread someone suggested perhaps you had signed up for account on academia.edu at some point? Or some old cookies that are giving permission. One way to test is try to access from a private window. -- GreenC 20:46, 18 July 2022 (UTC)
- Yea, that's probably it. I opened it in a private window and got the 404. Ifly6 (talk) 20:57, 18 July 2022 (UTC)
- Same for me (Firefox) Jo-Jo Eumerus (talk) 21:12, 18 July 2022 (UTC)
- Cool, glad it is figured out what is causing it. My thinking is to replace the academia.edu links with a Wayback version of the cloudfront URL so it's accessible for everyone. Or second option is to use |url-access=registration but that 404 page is confusing and will result in bots marking it dead. -- GreenC 21:30, 18 July 2022 (UTC)
User:Jo-Jo Eumerus|User:Ifly6|User:Biogeographist: Would like to propose this solution: Special:Diff/1098978075/1099315632. It's only for academia.edu/download links, which are about 1,000 on enwiki.
- academia.edu returns a 404 when a user is not registered and logged in, which is most users. It does not say "log in to access paper", rather a misleading 404 dead link page. This causes problems:
  - Archive bots will determine the links are dead (404) and mark with a {{dead link}}.
  - Users will be confused thinking the link is dead and not behind a registration wall.
  - Should the link ever actually die for real, there would be no archive available since the Wayback Machine sees only a dead 404 page - the Wayback Machine is not an academia.edu registered user.
- While possible to use |url-access=registration this does not solve the misleading 404 problems.
- The cloudfront link is an AWS container with an &Expires self-destruct mechanism. It's where the paper is actually located (not on academia.edu which redirects to cloudfront).
- The proposal is to determine the active cloudfront link via bot magic, immediately create a Wayback Machine save of the cloudfront URL, and change the citation to the Wayback-cloudfront link. eg. Special:Diff/1098978075/1099315632
This is what I can do somewhat easily right away. There are limits on what can be done, due to bot design and coding effort. -- GreenC 04:15, 20 July 2022 (UTC)
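The link handling in the proposal can be sketched roughly as below. This is only an illustration, not the bot's actual code: both function names are made up, the timestamp is a placeholder, and resolving the cloudfront redirect and saving it at the Wayback Machine are assumed to have happened elsewhere. The only thing taken as given is the usual `https://web.archive.org/web/<timestamp>/<url>` form of Wayback links.

```python
from urllib.parse import urlsplit


def is_academia_download(url: str) -> bool:
    """True for academia.edu/download/... links, the only kind the proposal covers."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_academia = host == "academia.edu" or host.endswith(".academia.edu")
    return on_academia and parts.path.startswith("/download/")


def wayback_url(target: str, timestamp: str) -> str:
    """Compose a Wayback Machine link for a snapshot that already exists.

    timestamp is a 14-digit YYYYMMDDhhmmss string (a placeholder in this sketch).
    """
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("expected a 14-digit timestamp")
    return f"https://web.archive.org/web/{timestamp}/{target}"
```

Only links for which `is_academia_download` is true would be touched; the citation would then carry the string returned by `wayback_url` for the saved cloudfront URL.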
- Hmm. It seems a bit complex and I wonder if people will be deleting the "expires" part of the link. Jo-Jo Eumerus (talk) 10:22, 20 July 2022 (UTC)
Unfortunately there is something preventing cloudfront pages from being saved at Wayback. Not all pages, but most. So we have a bad situation with academia.edu/download links - ideally they should be converted to non-/download/ links - but that can't be done by bot, it requires manual searching. The /download/ links are probably originating from Google Scholar, copy-pasting. -- GreenC 15:56, 23 July 2022 (UTC)
Backlinks report
User:Certes/Backlinks/Report seems to have stopped, but User:GoingBatty/Backlinks/Report is running normally. I've not added any new backlinks recently. Can you see anything else that I may have broken? Certes (talk) 11:17, 25 July 2022 (UTC)
- It aborted for unknown reasons. I increased the memory allocation by 10x in case that is the problem. The data may be messed up from the abort. I've restarted the process and will see what happens over the next hour or so if it can recover. Worst case will just delete all the data and it will rebuild from scratch, but that will result in a missed day. -- GreenC 15:34, 25 July 2022 (UTC)
- Thanks. Let me know if I'm checking too many targets or if some produce exceptionally big reports, and I'll remove the less productive ones. Certes (talk) 15:45, 25 July 2022 (UTC)
- It was crashing at "m" then after increasing memory made it to "v". Odd bc it should not run out of memory, and there are no error messages, system or program, to suggest why it's silently halting, so it might be something different. I added debug statements; it takes a while to replicate, an hour or more. Thanks for holding. -- GreenC 04:26, 26 July 2022 (UTC)
- Odd: "m" and "v" are early in my list, and neither they nor anything earlier have many incoming links. If it's taking an hour then we may need to remove the entries with lowest benefit per second. A few entries have never triggered a fix and could probably be removed, but I've already removed the resource-heavy ones. Maybe I need to rate them all by fixes done per 1000 incoming links or similar and chop those scoring lowest. "v" is an oddity because it can indicate that the editor failed to press Ctrl when pasting: easy to spot, but hard to fix as you need to guess what was in their clipboard. Certes (talk) 12:39, 26 July 2022 (UTC)
- The memory problem appears to be cumulative: if I run m or v in isolation they do fine, but when running the whole bunch there is a massive spike in memory claim that occurs at the same spot around v or x, and the others don't release their claims either, so it builds up. It could be related to the Sun Grid Engine caching for performance reasons. I've checked the program for errant global vars and it's fine, there is nothing holding onto data. I might try separating the backlinks retrieval portion to a different program so it exits between each item, clearing any memory claims. -- GreenC 16:48, 26 July 2022 (UTC)
- I think it is fixed. A combination of repetitive backlinks reported by the API and inefficiencies in the program magnifying those repetitions. It should never use more than about 25MB of ram, but with "V" (and "v") it was as high as 1 gigabyte. Why V? I suspect it's due to WP:V which is so commonly linked outside mainspace. V exposed the problem, but it was occurring at a smaller scale with everything else. (The API typically and erroneously reports 100s of the same backlink - I don't know why it's always done this.) "V" had 2.5 million non-unique occurrences. Add to this the program was inefficient in how it dealt with the repetitions, it added up and the Grid Engine was nope and dropped the job. Right now it's starting over rebuilding the database, it should be back to normal soon. -- GreenC 05:44, 27 July 2022 (UTC)
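The fix described in that comment amounts to collapsing repeated results as they arrive instead of accumulating them. A minimal sketch of that idea, not the actual program: `unique_titles` is a made-up name, and the input is any iterable of page titles standing in for whatever the API returns.

```python
from typing import Iterable, Iterator


def unique_titles(titles: Iterable[str]) -> Iterator[str]:
    """Yield each title once, in first-seen order.

    Memory grows with the number of distinct titles, not with the
    number of (possibly repeated) results fed in.
    """
    seen = set()
    for title in titles:
        if title not in seen:
            seen.add(title)
            yield title
```

Because it is a generator, repeats are discarded as they stream through and are never held in a list.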
- Thanks very much. The current version looks right, considering that it's for a few hours rather than the usual 24. Is it possible to add the namespace of the link target to the query? I'm not sure how you're extracting the data but, for example, Quarry would run its SQL much faster with "and pl_namespace=0". Certes (talk) 11:21, 27 July 2022 (UTC)
- API:Backlinks. When I first made this program (not your fork of it) around April 2015, Quarry was only about 6 months old I think, anyway I wasn't aware of it, and I wanted something that would run from anywhere which left the API. Speed is not an issue when running daily, unless it takes > 24hrs. Your job completes in about 2 hours, it is exceptionally big. The API behavior of multiple results is weird but can be adjusted for. If it continues to be a problem I can look into Quarry, getting a JSON file would be nice. -- GreenC 15:41, 27 July 2022 (UTC)
- In that case, blnamespace is what I meant, but I'm not clear what it should be set to: the several namespaces in which relevant links appear, or ns 0 to which relevant links lead. If my job is taking two hours then I should be checking fewer targets; any clues as to which entries take the most time would help with that. Certes (talk) 18:27, 27 July 2022 (UTC)
- Below is an 'ls' of the data files. The timestamps show how long each took to complete. The file size is misleading as the program filters out namespaces. "V" (and "v", they are identical to the API), for example, is not a very large file, but took almost 25 minutes to complete. It took about 85m to finish, not 120m, my mistake. V/v is about 50 minutes. U/u 20 minutes. N/n 10 minutes. Those are the big three and use 95% of the time (is that right?). Probably due to WP:V, WP:U and WP:N. -- GreenC 19:28, 27 July 2022 (UTC)
- Thanks. I'll take V/v, U/u and N/n out then. U and N rarely get a hit. V gets more but I'm less confident about fixing them as most of them require me to guess what article the editor was thinking of. Certes (talk) 20:57, 27 July 2022 (UTC)
- All working as normal today, and an hour faster than previously. Thanks again for your help. Certes (talk) 10:03, 28 July 2022 (UTC)
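On the `blnamespace` question raised in this thread: in API:Backlinks that parameter filters the pages that contain the link, while the target is given in full in `bltitle`. A sketch of the query parameters, assuming the report only cares about links coming from articles (namespace 0); `backlinks_params` and `backlinks_url` are made-up helpers, and no request is actually sent.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def backlinks_params(target, namespaces=(0,), cont=None):
    """Build list=backlinks parameters for one target title."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": target,
        # namespaces of the linking pages, pipe-separated
        "blnamespace": "|".join(str(n) for n in namespaces),
        "bllimit": "max",
        "format": "json",
    }
    if cont:
        # continuation token from the previous response, if any
        params["blcontinue"] = cont
    return params


def backlinks_url(target, namespaces=(0,)):
    """Full request URL, handy for trying a query in a browser."""
    return API + "?" + urlencode(backlinks_params(target, namespaces))
```

With `namespaces=(0,)` a target such as "V" would only return links from articles, so project-space links to WP:V are never enumerated.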
Extended content
|
---|
22930 Jul 27 09:11 0.new 127027 Jul 27 09:11 1.new 16924 Jul 27 09:11 2.new 15575 Jul 27 09:11 3.new 15540 Jul 27 09:11 4.new 14709 Jul 27 09:12 5.new 12741 Jul 27 09:12 6.new 17054 Jul 27 09:12 7.new 15220 Jul 27 09:12 8.new 14745 Jul 27 09:12 9.new 7476 Jul 27 09:13 10.new 6315 Jul 27 09:13 100.new 15741 Jul 27 09:13 A.new 13776 Jul 27 09:13 B.new 16104 Jul 27 09:13 C.new 13410 Jul 27 09:13 D.new 13301 Jul 27 09:14 E.new 12605 Jul 27 09:14 F.new 13550 Jul 27 09:14 G.new 13518 Jul 27 09:14 H.new 14387 Jul 27 09:14 I.new 13005 Jul 27 09:14 J.new 12845 Jul 27 09:14 K.new 14099 Jul 27 09:14 L.new 13174 Jul 27 09:14 M.new 39805 Jul 27 09:18 N.new 13668 Jul 27 09:19 O.new 13088 Jul 27 09:19 P.new 11858 Jul 27 09:19 Q.new 14160 Jul 27 09:19 R.new 14529 Jul 27 09:19 S.new 13146 Jul 27 09:19 T.new 15718 Jul 27 09:21 U.new 96856 Jul 27 09:45 V.new 12403 Jul 27 09:45 W.new 12797 Jul 27 09:45 X.new 13659 Jul 27 09:45 Y.new 13403 Jul 27 09:45 Z.new 15741 Jul 27 09:45 a.new 13776 Jul 27 09:45 b.new 16104 Jul 27 09:45 c.new 13410 Jul 27 09:46 d.new 13301 Jul 27 09:46 e.new 12605 Jul 27 09:46 f.new 13550 Jul 27 09:46 g.new 13518 Jul 27 09:46 h.new 14387 Jul 27 09:46 i.new 13005 Jul 27 09:46 j.new 12845 Jul 27 09:46 k.new 14099 Jul 27 09:46 l.new 13174 Jul 27 09:46 m.new 39805 Jul 27 09:51 n.new 13668 Jul 27 09:51 o.new 13088 Jul 27 09:51 p.new 11858 Jul 27 09:51 q.new 14160 Jul 27 09:51 r.new 14529 Jul 27 09:51 s.new 13146 Jul 27 09:51 t.new 15718 Jul 27 09:53 u.new 96856 Jul 27 10:16 v.new 12403 Jul 27 10:16 w.new 12797 Jul 27 10:16 x.new 13659 Jul 27 10:16 y.new 13403 Jul 27 10:16 z.new 217699 Jul 27 10:17 ABC 5951 Jul 27 10:17 Accolade.new 118095 Jul 27 10:17 Acre.new 89027 Jul 27 10:17 Admiral.new 22088 Jul 27 10:17 Alphabet.new 29758 Jul 27 10:17 Amber.new 4295 Jul 27 10:17 Amen.new 31785 Jul 27 10:17 Aperture.new 2643 Jul 27 10:17 Ash.new 2643 Jul 27 10:17 ash.new 44238 Jul 27 10:17 Atlantic.new 1375 Jul 27 10:17 Back.new 1375 Jul 27 10:17 back.new 36337 Jul 27 10:17 
Bay.new 36337 Jul 27 10:17 bay.new 53374 Jul 27 10:17 Bowling.new 53374 Jul 27 10:17 bowling.new 2048 Jul 27 10:17 Cabinet 36569 Jul 27 10:17 Captain.new 36569 Jul 27 10:17 captain.new 12368 Jul 27 10:17 Calvary.new 12368 Jul 27 10:17 calvary.new 26920 Jul 27 10:17 Caterpillar.new 28665 Jul 27 10:17 Chancellor.new 28665 Jul 27 10:17 chancellor.new 31754 Jul 27 10:17 Chestnut.new 31754 Jul 27 10:17 chestnut.new 4924 Jul 27 10:17 Chin.new 725 Jul 27 10:17 Clipboard.new 725 Jul 27 10:17 clipboard.new 44162 Jul 27 10:17 Colony.new 44162 Jul 27 10:18 colony.new 3070 Jul 27 10:18 Colonies.new 3070 Jul 27 10:18 colonies.new 55 Jul 27 10:18 Colors.new 55 Jul 27 10:18 colors.new 565 Jul 27 10:18 Colours.new 565 Jul 27 10:18 colours.new 138372 Jul 27 10:19 Company.new 138372 Jul 27 10:20 company.new 6611 Jul 27 10:20 Companies.new 6611 Jul 27 10:20 companies.new 14699 Jul 27 10:20 Consul.new 14699 Jul 27 10:20 consul.new 76725 Jul 27 10:20 Colorado 3180 Jul 27 10:21 Commonwealth.new 3180 Jul 27 10:21 commonwealth.new 30657 Jul 27 10:21 Conservative.new 1206 Jul 27 10:21 Conservatives.new 113900 Jul 27 10:21 Corvette.new 2005 Jul 27 10:21 Corvettes.new 28639 Jul 27 10:21 Delphi.new 48181 Jul 27 10:21 Family.new 48181 Jul 27 10:21 family.new 2257 Jul 27 10:21 Families.new 2257 Jul 27 10:21 families.new 61603 Jul 27 10:21 Icon.new 61603 Jul 27 10:21 icon.new 6665 Jul 27 10:21 Icons.new 6665 Jul 27 10:21 icons.new 5801 Jul 27 10:21 Interpreter.new 5801 Jul 27 10:21 interpreter.new 70977 Jul 27 10:21 Jupiter.new 12095 Jul 27 10:21 Knot.new 12095 Jul 27 10:21 knot.new 80891 Jul 27 10:21 Krishna.new 121459 Jul 27 10:21 Lead.new 121459 Jul 27 10:21 lead.new 127 Jul 27 10:21 Liberal 180 Jul 27 10:21 Libertarian 183969 Jul 27 10:22 Madonna.new 183969 Jul 27 10:22 madonna.new 65528 Jul 27 10:22 Mass.new 65528 Jul 27 10:22 mass.new 5378 Jul 27 10:22 Meta.new 770 Jul 27 10:22 Ministry 3160 Jul 27 10:22 Model.new 3160 Jul 27 10:22 model.new 176677 Jul 27 10:23 Moon.new 176677 Jul 27 10:23 
moon.new 214735 Jul 27 10:23 National 199067 Jul 27 10:23 Oxygen.new 76332 Jul 27 10:23 Primate.new 76332 Jul 27 10:23 primate.new 5462 Jul 27 10:23 Roland.new 346 Jul 27 10:24 Ronaldo.new 68973 Jul 27 10:24 Salt.new 68973 Jul 27 10:24 salt.new 16813 Jul 27 10:24 Season.new 16813 Jul 27 10:24 season.new 44306 Jul 27 10:24 Shiraz.new 44306 Jul 27 10:24 shiraz.new 53287 Jul 27 10:24 Spire.new 53287 Jul 27 10:24 spire.new 153867 Jul 27 10:24 Stream.new 153867 Jul 27 10:24 stream.new 11482 Jul 27 10:24 Telegram.new 3845 Jul 27 10:24 Thermal.new 3845 Jul 27 10:24 thermal.new 88519 Jul 27 10:24 Tree.new 88519 Jul 27 10:24 tree.new 3102 Jul 27 10:24 Trojan 3102 Jul 27 10:24 trojan 167 Jul 27 10:24 U.S. 2334 Jul 27 10:24 Victory.new 26424 Jul 27 10:24 Ardennes.new 19159 Jul 27 10:24 Aspen.new 1884 Jul 27 10:24 Baler.new 105737 Jul 27 10:25 Batman.new 20662 Jul 27 10:25 Battle.new 53364 Jul 27 10:25 Bethlehem.new 439921 Jul 27 10:25 Birmingham.new 11530 Jul 27 10:25 Boulder.new 54094 Jul 27 10:25 Brampton.new 14995 Jul 27 10:25 Calvados.new 208354 Jul 27 10:25 Cambridge.new 71179 Jul 27 10:25 Canterbury.new 15715 Jul 27 10:25 Caracal.new 203571 Jul 27 10:26 Christchurch.new 78460 Jul 27 10:26 Cicero.new 43543 Jul 27 10:26 Durango.new 18943 Jul 27 10:26 East 296629 Jul 27 10:26 Edmonton.new 12304 Jul 27 10:26 Esplanade.new 25247 Jul 27 10:26 Eye.new 32977 Jul 27 10:26 Flint.new 151 Jul 27 10:26 Gladstone.new 81116 Jul 27 10:26 Gloucester.new 56266 Jul 27 10:26 Greenwich.new 780 Jul 27 10:26 Guna.new 21889 Jul 27 10:26 Horsham.new 199436 Jul 27 10:26 Hyderabad.new 89915 Jul 27 10:26 Ipswich.new 15229 Jul 27 10:26 Ithaca.new 132579 Jul 27 10:27 Lagos.new 68478 Jul 27 10:27 La 18993 Jul 27 10:27 Leek.new 439197 Jul 27 10:27 Liverpool.new 26324 Jul 27 10:27 Loire.new 54 Jul 27 10:27 Loni.new 8106 Jul 27 10:27 Malmesbury.new 35538 Jul 27 10:27 Mansfield.new 7545 Jul 27 10:27 March.new 16434 Jul 27 10:27 Mold.new 25849 Jul 27 10:27 Moselle.new 33698 Jul 27 10:27 New 270789 Jul 27 
10:27 New 205009 Jul 27 10:28 Norfolk.new 112023 Jul 27 10:28 Norwich.new 28431 Jul 27 10:28 Ore.new 71930 Jul 27 10:28 Pali.new 83138 Jul 27 10:28 Panama 373705 Jul 27 10:28 Perth.new 99124 Jul 27 10:28 Piedmont.new 22133 Jul 27 10:28 Pueblo.new 73659 Jul 27 10:28 Punjab.new 30869 Jul 27 10:28 Reading.new 100419 Jul 27 10:29 Republic 19646 Jul 27 10:29 Rye.new 23084 Jul 27 10:29 Saga.new 6106 Jul 27 10:29 Saint 5866 Jul 27 10:29 St. 11630 Jul 27 10:29 Saint 5336 Jul 27 10:29 St. 97107 Jul 27 10:29 St. 22068 Jul 27 10:29 Stanford.new 255991 Jul 27 10:29 Surrey.new 93952 Jul 27 10:29 Tripoli.new 50366 Jul 27 10:29 Troy.new 38853 Jul 27 10:29 Van.new 18130 Jul 27 10:29 Vosges.new 21909 Jul 27 10:29 Warwick.new 15455 Jul 27 10:29 Angels.new 23662 Jul 27 10:29 Arsenal.new 38084 Jul 27 10:29 Avalanche.new 2391 Jul 27 10:29 Barbarians.new 1558 Jul 27 10:29 Bears.new 5145 Jul 27 10:29 Border 296 Jul 27 10:29 Broncos.new 463 Jul 27 10:29 Buccaneers.new 1063 Jul 27 10:29 Canadiens.new 15399 Jul 27 10:29 Cavaliers.new 751 Jul 27 10:29 Cheetahs.new 367 Jul 27 10:29 Corinthians.new 3529 Jul 27 10:29 Coyotes.new 9722 Jul 27 10:29 Crusaders.new 5268 Jul 27 10:29 Dolphins.new 3090 Jul 27 10:29 Dragons.new 4159 Jul 27 10:29 Ducks.new 160 Jul 27 10:29 Eagles.new 45 Jul 27 10:29 Flames.new 48481 Jul 27 10:29 Force.new 181 Jul 27 10:29 Griquas.new 2627 Jul 27 10:29 Hawks.new 27971 Jul 27 10:29 Heat.new 653 Jul 27 10:29 Hornets.new 5809 Jul 27 10:29 Hurricanes.new 949 Jul 27 10:29 Jaguars.new 223 Jul 27 10:29 Jays.new 1571 Jul 27 10:29 Leopards.new 43470 Jul 27 10:30 Lightning.new 2409 Jul 27 10:30 Lions.new 229 Jul 27 10:30 Ospreys.new 1981 Jul 27 10:30 Pelicans.new 2413 Jul 27 10:30 Penguins.new 9026 Jul 27 10:30 Pirates.new 4012 Jul 27 10:30 Predators.new 2731 Jul 27 10:30 Rockets.new 802 Jul 27 10:30 Rockies.new 7330 Jul 27 10:30 Saints.new 9918 Jul 27 10:30 Saracens.new 3954 Jul 27 10:30 Sharks.new 3306 Jul 27 10:30 Stars.new 6305 Jul 27 10:30 Thunder.new 2129 Jul 27 10:30 
Tigers.new 26592 Jul 27 10:30 Titans.new 3808 Jul 27 10:30 Twins.new 98682 Jul 27 10:30 Vikings.new 663 Jul 27 10:30 Warriors.new 3396 Jul 27 10:30 Wasps.new 5597 Jul 27 10:30 Wolves.new 6 Jul 27 10:30 Zunz.new 795 Jul 27 10:30 Orsini.new 226 Jul 27 10:30 Rockefeller.new 32 Jul 27 10:30 Paintal.new 483 Jul 27 10:30 Rothschild.new 8 Jul 27 10:30 Pevsner.new 4861 Jul 27 10:30 O'Reilly.new 62 Jul 27 10:30 Primo 18 Jul 27 10:30 Cimarosa.new 53 Jul 27 10:30 Narasimha 505 Jul 27 10:30 Caracciolo.new 155 Jul 27 10:30 Bakunin.new 665 Jul 27 10:30 Weber.new 26 Jul 27 10:30 Malevich.new 57 Jul 27 10:30 Korotayev.new 18 Jul 27 10:30 Krauser.new 186 Jul 27 10:30 Ghazali.new 266 Jul 27 10:30 Touré.new 190 Jul 27 10:30 Sadat.new 288 Jul 27 10:30 Rajguru.new 289 Jul 27 10:30 Maitland.new 83 Jul 27 10:30 Strozzi.new 90 Jul 27 10:30 Delacroix.new 167 Jul 27 10:30 Reuter.new 185 Jul 27 10:30 Baden 31 Jul 27 10:30 Lessing.new 129 Jul 27 10:30 Boyle.new 96 Jul 27 10:30 Aelian.new 48 Jul 27 10:30 Zichy.new 64 Jul 27 10:30 Nomura.new 204 Jul 27 10:30 Takeda.new 21 Jul 27 10:30 Gilbert 265 Jul 27 10:30 Batista.new 939 Jul 27 10:30 Andrássy.new 544 Jul 27 10:30 Prabhu.new 165 Jul 27 10:30 Tyszkiewicz.new 22 Jul 27 10:30 Mommsen.new 251 Jul 27 10:30 Köppen.new 492 Jul 27 10:30 Della 168 Jul 27 10:30 Bernstein.new 32 Jul 27 10:30 Tippett.new 380 Jul 27 10:30 Sanseverino.new 51 Jul 27 10:30 Pucci.new 377 Jul 27 10:30 Hieronymus 113 Jul 27 10:30 Ghirlandaio.new 65 Jul 27 10:30 Beckett.new 711 Jul 27 10:30 O'Ryan.new 273 Jul 27 10:30 Neumann.new 10 Jul 27 10:30 Matsushita.new 1276 Jul 27 10:30 Ferrero.new 114 Jul 27 10:30 Dietz.new 59 Jul 27 10:30 Amorim.new 29 Jul 27 10:30 Wankel.new 594 Jul 27 10:30 Uexküll.new 20 Jul 27 10:30 Stirner.new 80 Jul 27 10:30 Sridhar.new 234 Jul 27 10:30 Rossetti.new 150 Jul 27 10:30 Nassar.new 115 Jul 27 10:30 Morandi.new 160 Jul 27 10:30 Bulgakov.new 25 Jul 27 10:30 Barks.new 136 Jul 27 10:30 Agnelli.new 350 Jul 27 10:30 Teleki.new 134 Jul 27 10:30 
Tarnowski.new 574 Jul 27 10:30 Hamdan.new 93 Jul 27 10:30 Guicciardini.new 589 Jul 27 10:30 Clark.new 97 Jul 27 10:30 Borromeo.new 22 Jul 27 10:30 Bazzi.new 51 Jul 27 10:30 Wolf-Ferrari.new 357 Jul 27 10:30 Sylvester.new 26 Jul 27 10:30 Schichau.new 164 Jul 27 10:30 Scarlatti.new 67 Jul 27 10:30 Noriega.new 24 Jul 27 10:30 Bohlen.new 40 Jul 27 10:30 Boiardo.new 45 Jul 27 10:30 Bosman.new 446 Jul 27 10:30 Braun.new 9 Jul 27 10:30 Gabrielli.new 56 Jul 27 10:30 Haider.new 49 Jul 27 10:30 Jayachandran.new 72 Jul 27 10:30 Jellinek.new 332 Jul 27 10:30 Manning.new 28 Jul 27 10:30 Naryshkin.new 157 Jul 27 10:30 Sachs.new 118 Jul 27 10:30 Sacks.new 101 Jul 27 10:30 Saunders.new 159 Jul 27 10:30 Uccello.new 204 Jul 27 10:30 Velazquez.new 29 Jul 27 10:30 Wills.new 60 Jul 27 10:30 Bergman.new 759 Jul 27 10:30 Haim.new 18588 Jul 27 10:30 Agamemnon.new 3872 Jul 27 10:30 Antigone.new 33458 Jul 27 10:30 Bloomsbury.new 36678 Jul 27 10:30 Cabaret.new 494 Jul 27 10:30 Can-Can.new 23895 Jul 27 10:30 Carousel.new 7172 Jul 27 10:30 Cyrano 47072 Jul 27 10:30 Dune.new 13573 Jul 27 10:30 Euphoria.new 6460 Jul 27 10:30 Falstaff.new 13338 Jul 27 10:30 Faust.new 575 Jul 27 10:30 Fra 1650 Jul 27 10:30 Gidget.new 16873 Jul 27 10:31 Gladiator.new 85498 Jul 27 10:31 Julius 10409 Jul 27 10:31 Medea.new 7415 Jul 27 10:31 Mystic 536 Jul 27 10:31 Peaky 9674 Jul 27 10:31 Peer 16265 Jul 27 10:31 Pericles.new 60538 Jul 27 10:31 Quartz.new 9418 Jul 27 10:31 Salome.new 49778 Jul 27 10:31 St. 
84 Jul 27 10:31 The 9885 Jul 27 10:31 Ansible.new 20259 Jul 27 10:31 Arrow.new 57727 Jul 27 10:31 Daily 672758 Jul 27 10:31 The 8853 Jul 27 10:32 Decanter.new 11944 Jul 27 10:32 Dissent.new 13559 Jul 27 10:32 Germania.new 7858 Jul 27 10:32 Guernica.new 29403 Jul 27 10:32 Life.new 6739 Jul 27 10:32 The 809 Jul 27 10:32 The 195831 Jul 27 10:32 The 13864 Jul 27 10:32 Referee.new 2987 Jul 27 10:32 Sunday 24360 Jul 27 10:32 Sunday 154416 Jul 27 10:32 The 5692 Jul 27 10:32 Cage.new 872 Jul 27 10:32 Carpenters.new 2853 Jul 27 10:32 Chrysalis.new 133 Jul 27 10:32 Doors.new 324 Jul 27 10:32 Fernando.new 62059 Jul 27 10:32 Grenade.new 38621 Jul 27 10:32 Guru.new 125 Jul 27 10:32 Happy.new 970 Jul 27 10:32 Hello.new 190 Jul 27 10:32 Jojo.new 13288 Jul 27 10:32 Pink.new 84108 Jul 27 10:33 Sugar.new 16057 Jul 27 10:33 anchorage.new 25 Jul 27 10:33 barks.new 105737 Jul 27 10:33 batman.new 109392 Jul 27 10:33 derby.new 166471 Jul 27 10:33 jersey.new 107237 Jul 27 10:33 limerick.new 121643 Jul 27 10:33 louvre.new 332 Jul 27 10:33 manning.new 7545 Jul 27 10:33 march.new 99124 Jul 27 10:34 piedmont.new 118 Jul 27 10:34 sacks.new 1443 Jul 27 10:34 sandbanks.new 26151 Jul 27 10:34 slough.new 255991 Jul 27 10:34 surrey.new 50366 Jul 27 10:34 troy.new 29 Jul 27 10:34 wills.new 523 Jul 27 10:34 The.new 523 Jul 27 10:34 the.new 48 Jul 27 10:34 Is.new 48 Jul 27 10:34 is.new 337 Jul 27 10:34 were.new 199 Jul 27 10:34 That.new 199 Jul 27 10:34 that.new 370 Jul 27 10:34 said.new 1155 Jul 27 10:34 One.new 1155 Jul 27 10:34 one.new 5430 Jul 27 10:34 goes.new |
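Per-file timings can be read off the listing by differencing consecutive timestamps. A rough sketch, assuming each timestamp marks when that file was finished and that the files were written in the order listed; the sample is a short consecutive excerpt of the listing (T through W), and `minutes_per_file` is a made-up helper. Resolution is one minute, so small jobs show as 0.

```python
from datetime import datetime


def minutes_per_file(entries):
    """entries: list of (name, "HH:MM") in listing order, all on the same day.

    Returns (name, minutes since the previous entry); the first entry
    has nothing to compare against and is skipped.
    """
    out = []
    prev = None
    for name, hhmm in entries:
        t = datetime.strptime(hhmm, "%H:%M")
        if prev is not None:
            out.append((name, int((t - prev).total_seconds() // 60)))
        prev = t
    return out


sample = [("T.new", "09:19"), ("U.new", "09:21"), ("V.new", "09:45"), ("W.new", "09:45")]
print(minutes_per_file(sample))
```

On this excerpt V.new comes out at 24 minutes, in line with the "almost 25 minutes" mentioned in the thread.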
Bot updating Webarchive template is adding "url" same as existing "url2"
This bot made a group of WaybackMedic 2.5 edits in June where it "rescued" an archive link in the |url= parameter of {{Webarchive}}, replacing it with a link which was already in the |url2= parameter. Two examples of this are Grant Bramwell: revised 1 June 2022 and List of ICF Canoe Sprint World Championships medalists in men's kayak: revised 26 June 2022. Can the bot remove the duplicate url2/date2/title2 parameters and renumber any subsequent url3/date3/title3, etc.? I've fixed over 500 of these edits myself, but there are still over 700 remaining to be fixed. Thanks. -- Zyxw (talk) 03:54, 9 August 2022 (UTC)
- That was part of the deprecation of WebCite which is a dead archive provider. It didn't account for dups. It's complicated here because even though |url= and |url2= are the same, |title= and |title2= are different - which do you choose. I think the best course is to keep the |url= set and remove the |url2= set, at least based on two examples. In terms of renumbering that is not required as the webarchive template is designed to allow any numbers up to 10; having a |url= .. aka |url1= .. is the only requirement. I'll start looking at this today. -- GreenC 15:35, 9 August 2022 (UTC)
- @GreenC: I agree with keeping the |url= set and removing the |url2= set when there is a duplicate URL and that is what I did for the 500+ already fixed. I also thought {{Webarchive}} might automatically handle the missing |url2= set and display the |url3= set, but as per these tests that is not the case:
- archive with url/date/title, url2/date2/title2, and url3/date3/title3
- Medal Winners – Olympic Games and World Championships (1936–2007) – Part 1: flatwater (now sprint). CanoeICF.com. International Canoe Federation. at the Wayback Machine (archived 5 January 2010). Additional archives: Wayback Machine, BCU.org.uk.
- url2/date2/title2 removed with url3/date3/title3 remaining
- Medal Winners – Olympic Games and World Championships (1936–2007) – Part 1: flatwater (now sprint). CanoeICF.com. International Canoe Federation. at the Wayback Machine (archived 5 January 2010). Additional archives: BCU.org.uk.
- url2/date2/title2 removed and url3/date3/title3 renumbered
- Medal Winners – Olympic Games and World Championships (1936–2007) – Part 1: flatwater (now sprint). CanoeICF.com. International Canoe Federation. at the Wayback Machine (archived 5 January 2010). Additional archives: BCU.org.uk.
- -- Zyxw (talk) 16:15, 9 August 2022 (UTC)
- Reported at Template_talk:Webarchive#Gaps_in_argument_sequence. I wrote the template originally but Trappist did a major rewrite, so I'm not sure if that is my bug or his. I processed the first 500 articles and there are only 3 with a |url3=, suggesting 40 or 50 at most in the whole bunch. Anyway, it won't be difficult to renumber them. -- GreenC 16:26, 9 August 2022 (UTC)
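For illustration only, the cleanup discussed in this thread (keep the |url= set, drop a later set whose URL repeats it, and close the resulting gap in numbering) could be sketched as below. This is not the bot's actual code: the function name `dedupe_and_renumber` and the plain-dict representation of the template arguments are assumptions made for the example, and only the url/date/title fields named in the thread are handled.

```python
import re

MAX_SETS = 10  # per the thread, the template allows numbers up to 10
FIELDS = ("url", "date", "title")
SET_KEY_RE = re.compile(r"(url|date|title)\d*")


def dedupe_and_renumber(params):
    """Drop later url/date/title sets whose URL repeats an earlier one,
    then renumber the surviving sets so there are no gaps.

    `params` is a hypothetical dict of template arguments, e.g.
    {"url": ..., "date": ..., "url2": ..., "date2": ...}.
    """
    kept = []
    seen_urls = set()
    for n in range(1, MAX_SETS + 1):
        # The first set may be written as |url= or |url1=.
        suffixes = ("", "1") if n == 1 else (str(n),)
        group = {}
        for field in FIELDS:
            for suffix in suffixes:
                if field + suffix in params:
                    group[field] = params[field + suffix]
                    break
        url = group.get("url")
        if not url or url in seen_urls:
            continue  # empty slot, or duplicate of an earlier set
        seen_urls.add(url)
        kept.append(group)

    # Pass through any arguments that are not part of a numbered set.
    out = {k: v for k, v in params.items() if not SET_KEY_RE.fullmatch(k)}
    for i, group in enumerate(kept, start=1):
        suffix = "" if i == 1 else str(i)
        for field in FIELDS:
            if field in group:
                out[field + suffix] = group[field]
    return out
```

With a duplicate |url2= set and a distinct |url3= set, the sketch keeps the first set and moves the third set into the second slot, which matches the "removed and renumbered" test case listed above.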
Bad webcitation link replacement
So I've just found out that GreenC bot made edits like this, replacing a dead archive link with another dead archive link. Would it be possible to replace that archive link with, say, this one that actually works? Thanks very much! Graham87 11:48, 26 August 2022 (UTC)
- Bots are not 100% perfect. The bot relies on the Wayback API to determine live links, and that API is not perfect, so those errors depend on human intervention to correct. The alternative is not to use bots at all, in which case most links never get fixed due to the scale. It's boring back-end work people want bots to do, but there is no guarantee that bots, or for that matter people, will not make mistakes. The question is the scale of mistakes. -- GreenC 15:08, 26 August 2022 (UTC)
- Yeah fair enough, soft 404's and all. On re-reading my message I spectacularly failed at phrasing it clearly ... there are nearly a hundred more such links; could you instruct the bot to replace them with a working archive (i.e. the one linked above)? I thought that would be the easiest way to fix this problem. I tried changing the archive link on InternetArchiveBot's side and asking it to fix the affected articles, but that didn't do what I intended. Graham87 13:34, 27 August 2022 (UTC)
Avoid editing inside HTML comments
GreenC bot now edits inside HTML comments, e.g. Special:Diff/1107954452, but I suggest that it should not. Although the edit in this example happened to be harmless (even useful), in general, comments could be used for a wide range of reasons, so there is a higher risk that automatic edits could break their intentions. Wotheina (talk) 03:49, 2 September 2022 (UTC)
- That's true, but there is a positive trade-off, so for a couple of reasons I am OK fixing certain (not all) link rot in comments, as I have been doing for 7 years. If someone wants to preserve a block of immutable wikitext they should use the talk page, user page or offline; otherwise anyone can edit the comment or delete it entirely. Comments can be strangely formatted, so I take measures, automatic and manual, to check commented text before posting a live diff. -- GreenC 05:39, 2 September 2022 (UTC)
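As a rough illustration of the distinction being discussed, a tool could record whether each URL in a page sits inside an HTML comment before deciding how to treat it. The sketch below is an assumption-laden example, not how GreenC bot works: `classify_urls` is a hypothetical helper, and the URL pattern is deliberately simplistic (for instance, it would absorb the dashes of a `-->` that directly follows a URL with no space).

```python
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Simplistic URL pattern for the example; real wikitext needs more care.
URL_RE = re.compile(r"https?://[^\s|\]}<>]+")


def classify_urls(wikitext):
    """Return (url, inside_comment) pairs in order of appearance."""
    comment_spans = [m.span() for m in COMMENT_RE.finditer(wikitext)]
    results = []
    for m in URL_RE.finditer(wikitext):
        inside = any(start <= m.start() < end for start, end in comment_spans)
        results.append((m.group(), inside))
    return results
```

A caller could then apply stricter checks, or skip the edit, for any URL flagged as being inside a comment.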
Stopping backlinks report during wikibreak
Hello, and thanks again for the useful Backlinks reports. I'm currently taking a Wikibreak and have attempted to exclude my list from the bot's tasks thus but it still ran today. It's not a problem for me if the reports continue but, if you'd like to save some resources by stopping it properly, please go ahead. Certes (talk) 11:25, 5 September 2022 (UTC)
- Fixed, it was seeing Action=RUN in the "#" comment. First time this code has been tested :) Have a good break. -- GreenC 05:14, 6 September 2022 (UTC)
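The kind of bug described here (a setting being picked up from a commented-out line) can be avoided by stripping comments before looking for the key. The following is a minimal sketch under stated assumptions: only `Action=RUN` and the "#" comment marker come from the thread, while the function name `read_action` and the rest of the config format are hypothetical.

```python
def read_action(config_text):
    """Return the value of the first uncommented Action= line, or None.

    Everything after a '#' on a line is treated as a comment and ignored,
    so a commented-out 'Action=RUN' no longer counts.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0]  # discard the comment part
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "action":
            return value.strip()
    return None
```

For example, a config containing only `# Action=RUN` yields no action at all, whereas `Action=RUN # default` still yields `RUN`.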
Please Update the monthly list of Top 10000 wikipedia users by Article Count
Please update the monthly list of the top 10,000 Wikipedia users by article count, which changes on the 1st and 15th of each month. Abbasulu (talk) 07:52, 3 October 2022 (UTC)
- It's still running, but for some reason very slowly: in 3 days it has only completed 19%. -- GreenC 12:51, 3 October 2022 (UTC)
Exactly what purpose did this edit serve? Edit summary is misleading at best
https://en.wikipedia.org/w/index.php?title=Rodney_Marks&diff=1095741886&oldid=1091111369 108.246.204.20 (talk) 20:17, 3 October 2022 (UTC)
- Don't use {{dead link}} if the citation has a working |archive-url=. -- GreenC 20:46, 3 October 2022 (UTC)
- It doesn't. "this page is not available". 108.246.204.20 (talk) 04:15, 14 October 2022 (UTC)
A cookie for you!
Ulises12345678 (talk) 11:00, 9 October 2022 (UTC)