User talk:GreenC bot



Flagging non-dead link as dead

This edit flagged this URL as dead even though it isn't. Jo-Jo Eumerus (talk) 11:17, 18 July 2022 (UTC)

Same with these edits:
I appreciate it probably has to do with some kind of automatic PDF link serving in JavaScript that Academia.edu uses, which wouldn't be readily captured by a bot. I don't know how fixable it is, but the links noted are not dead at all; I reverted both of the bot's edits that flagged them. Ifly6 (talk) 14:35, 18 July 2022 (UTC)
The url that Editor Jo-Jo Eumerus linked:
Both of the urls that Editor Ifly6 links:
There was some discussion about these kinds of academia links at Wikipedia:Link rot/URL change requests § www.academia.edu/download/
Trappist the monk (talk) 14:43, 18 July 2022 (UTC) 14:46, 18 July 2022 (UTC)
  • Jo-Jo Eumerus & User:Ifly6 they are dead for me (USA). Example. Are you getting a redirect to a cloudfront URL? I'm wondering if there is some kind of location-aware policy that determines when to serve the cloudfront URL vs a 404. If the cloudfront URL were known, it would be possible to save it at the Wayback Machine, then use the Cloudfront-Wayback URL on Wikipedia, treated as a dead link (due to its &Expires self-destruct mechanism; see WP:AWSURL). However, I wonder about copyright if academia.edu is making them unavailable in the US and possibly elsewhere: why have that policy if not for a rights issue? -- GreenC 15:04, 18 July 2022 (UTC)
    I'm in the US and am getting the links promptly. The links I am getting are Cloudfront ones with an expiry; I used the Academia.edu links to avoid the known expiry. Ifly6 (talk) 15:41, 18 July 2022 (UTC)
    Ah, I see you use British English so I assumed you were not in the US. What browser do you use? Do you have any plugins that might affect JavaScript? This is impacting archive providers as well, such as Wayback Machine and Ghostarchive (US-based); they also get a 404. At Archive.today it "works" (global IP pool) but they are unable to correctly save the PDF. -- GreenC 16:00, 18 July 2022 (UTC)
    I do get a "d1wqtxts1xzle7.cloudfront.net" sort of thing. Jo-Jo Eumerus (talk) 17:33, 18 July 2022 (UTC)
    Language heuristics are always right 99pc of the time haha. I've confirmed on Edge (Windows 10) and Safari (macOS) that the Academia.edu links work. I don't have any plugins installed other than ad blockers that would affect something like this. The specific link that got generated for me with Rafferty was https://d1wqtxts1xzle7.cloudfront.net/51344857/Iris-_Fall_of_the_Roman_Republic-with-cover-page-v2.pdf. There was then a pile of GET parameters that I've omitted – they change every time anyway – but they are necessary to get the file served properly. Ifly6 (talk) 19:24, 18 July 2022 (UTC)
    Jo-Jo Eumerus do you use Edge or Safari? -- GreenC 19:38, 18 July 2022 (UTC)
    Wikipedia:Village_pump_(technical)#academia.edu/download .. seeing if anything comes up here. -- GreenC 19:52, 18 July 2022 (UTC)
    Ifly6 in the above thread someone suggested perhaps you had signed up for an account on academia.edu at some point? Or there are some old cookies that are giving permission. One way to test is to try to access it from a private window. -- GreenC 20:46, 18 July 2022 (UTC)
    Yea, that's probably it. I opened it in a private window and got the 404. Ifly6 (talk) 20:57, 18 July 2022 (UTC)
    Same for me (Firefox) Jo-Jo Eumerus (talk) 21:12, 18 July 2022 (UTC)
    Cool, glad it is figured out what is causing it. My thinking is to replace the academia.edu links with a Wayback version of the cloudfront URL so it's accessible for everyone. Or second option is to use |url-access=registration but that 404 page is confusing and will result in bots marking it dead. -- GreenC 21:30, 18 July 2022 (UTC)

User:Jo-Jo Eumerus|User:Ifly6|User:Biogeographist: Would like to propose this solution: Special:Diff/1098978075/1099315632. It's only for academia.edu/download links, which are about 1,000 on enwiki.

  • academia.edu returns a 404 when a user is not registered and logged in, which is most users. It does not say "log in to access paper", rather a misleading 404 dead link page. This causes problems:
    • Archive bots will determine the links are dead (404) and mark with a {{dead link}}.
    • Users will be confused thinking the link is dead and not behind a registration wall.
    • Should the link ever actually die for real, there would be no archive available since the Wayback Machine sees only a dead 404 page - the Wayback Machine is not an academia.edu registered user.
  • While possible to use |url-access=registration this does not solve the misleading 404 problems.
  • The cloudfront link is an AWS container with an &Expires self-destruct mechanism. It's where the paper is actually located (not on academia.edu which redirects to cloudfront).
  • The proposal is to determine the active cloudfront link via bot magic, immediately create a Wayback Machine save of the cloudfront URL, and change the citation to the Wayback-cloudfront link. eg. Special:Diff/1098978075/1099315632

This is what I can do somewhat easily right away. There are limits to what can be done, due to bot design and coding effort. -- GreenC 04:15, 20 July 2022 (UTC)
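
The &Expires behaviour described above can be illustrated with a minimal sketch. It assumes the parameter holds Unix epoch seconds, as in CloudFront canned-policy signed URLs; the function names and the query values in the usage note are hypothetical placeholders, not taken from the bot.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def cloudfront_expiry(url):
    """Return the time encoded in the URL's Expires parameter, or None.

    Assumption: Expires is Unix epoch seconds (CloudFront canned policy).
    """
    values = parse_qs(urlsplit(url).query).get("Expires")
    if not values:
        return None
    try:
        return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)
    except ValueError:
        # Expires was present but not an integer timestamp.
        return None


def is_expired(url, now):
    """True when the URL carries an Expires time that is not after `now`."""
    expiry = cloudfront_expiry(url)
    return expiry is not None and now >= expiry
```

For example, appending a made-up `?Expires=1658200000&Signature=EXAMPLE&Key-Pair-Id=EXAMPLE` to the cloudfront PDF link quoted earlier and calling `is_expired(url, datetime.now(timezone.utc))` shows whether the link has already self-destructed, which is why the archived copy, not the live URL, has to be the primary link.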

Hmm. It seems a bit complex and I wonder if people will be deleting the "expires" part of the link. Jo-Jo Eumerus (talk) 10:22, 20 July 2022 (UTC)
It's a complex situation. If they delete the &Expires the URL will break (404). It will break anyway, due to the Expires, that is why the archive URL version is made the primary. The archive URL is accessible to everyone - academia.edu account not required. -- GreenC 15:30, 20 July 2022 (UTC)

Unfortunately there is something preventing cloudfront pages from being saved at Wayback. Not all pages, but most. So we have a bad situation with academia.edu/download links - ideally they should be converted to non-/download/ links - but that can't be done by bot; it requires manual searching. The /download/ links probably originate from copy-pasting out of Google Scholar. -- GreenC 15:56, 23 July 2022 (UTC)

Backlinks report

User:Certes/Backlinks/Report seems to have stopped, but User:GoingBatty/Backlinks/Report is running normally. I've not added any new backlinks recently. Can you see anything else that I may have broken? Certes (talk) 11:17, 25 July 2022 (UTC)

It aborted for unknown reasons. I increased the memory allocation by 10x in case that is the problem. The data may be messed up from the abort. I've restarted the process and will see over the next hour or so whether it can recover. Worst case, I will just delete all the data and it will rebuild from scratch, but that will result in a missed day. -- GreenC 15:34, 25 July 2022 (UTC)
Thanks. Let me know if I'm checking too many targets or if some produce exceptionally big reports, and I'll remove the less productive ones. Certes (talk) 15:45, 25 July 2022 (UTC)
It was crashing at "m", then after increasing memory it made it to "v". Odd, because it should not run out of memory, and there are no error messages, system or program, to suggest why it's silently halting, so it might be something different. I added debug statements; it takes a while to replicate, an hour or more. Thanks for holding. -- GreenC 04:26, 26 July 2022 (UTC)
Odd: "m" and "v" are early in my list, and neither they nor anything earlier have many incoming links. If it's taking an hour then we may need to remove the entries with lowest benefit per second. A few entries have never triggered a fix and could probably be removed, but I've already removed the resource-heavy ones. Maybe I need to rate them all by fixes done per 1000 incoming links or similar and chop those scoring lowest. "v" is an oddity because it can indicate that the editor failed to press Ctrl when pasting: easy to spot, but hard to fix as you need to guess what was in their clipboard. Certes (talk) 12:39, 26 July 2022 (UTC)
The memory problem appears to be cumulative: if I run m or v in isolation they do fine, but when running the whole bunch there is a massive spike in memory claim that occurs at the same spot around v or x, and the others don't release their claims either, so it builds up. It could be related to the Sun Grid Engine caching for performance reasons. I've checked the program for errant global vars and it's fine; there is nothing holding onto data. I might try separating the backlinks retrieval portion into a different program so it exits between each item, clearing any memory claims. -- GreenC 16:48, 26 July 2022 (UTC)
I think it is fixed. It was a combination of repetitive backlinks reported by the API and inefficiencies in the program magnifying those repetitions. It should never use more than about 25MB of RAM, but with "V" (and "v") it was as high as 1 gigabyte. Why V? I suspect it's due to WP:V, which is so commonly linked outside mainspace. V exposed the problem, but it was occurring at a smaller scale with everything else. (The API typically and erroneously reports 100s of the same backlink - I don't know why it's always done this.) "V" had 2.5 million non-unique occurrences. Add to this that the program was inefficient in how it dealt with the repetitions: it added up, and the Grid Engine said nope and dropped the job. Right now it's starting over, rebuilding the database; it should be back to normal soon. -- GreenC 05:44, 27 July 2022 (UTC)
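
As an illustration only (not the program's actual code), the repetition problem described above can be handled by dropping duplicates as results stream in, so memory grows with the number of unique titles and not with the raw result count. The function name is hypothetical.

```python
def unique_backlinks(titles):
    """Yield each backlink title once, in first-seen order.

    `titles` can be any iterable, including a generator that pages
    through API results, so repeated entries are never accumulated.
    """
    seen = set()
    for title in titles:
        if title not in seen:
            seen.add(title)
            yield title
```

Calling `list(unique_backlinks(results))` on a stream with millions of repeats keeps only one copy of each title in memory.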
Thanks very much. The current version looks right, considering that it's for a few hours rather than the usual 24. Is it possible to add the namespace of the link target to the query? I'm not sure how you're extracting the data but, for example, Quarry would run its SQL much faster with "and pl_namespace=0". Certes (talk) 11:21, 27 July 2022 (UTC)
API:Backlinks. When I first made this program (not your fork of it) around April 2015, Quarry was only about 6 months old, I think; anyway I wasn't aware of it, and I wanted something that would run from anywhere, which left the API. Speed is not an issue when running daily, unless it takes > 24hrs. Your job completes in about 2 hours; it is exceptionally big. The API behavior of multiple results is weird but can be adjusted for. If it continues to be a problem I can look into Quarry; getting a JSON file would be nice. -- GreenC 15:41, 27 July 2022 (UTC)
In that case, blnamespace is what I meant, but I'm not clear what it should be set to: the several namespaces in which relevant links appear, or ns 0 to which relevant links lead. If my job is taking two hours then I should be checking fewer targets; any clues as to which entries take the most time would help with that. Certes (talk) 18:27, 27 July 2022 (UTC)
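
A minimal sketch of the query under discussion, assuming the standard MediaWiki `list=backlinks` module with its `bltitle`, `blnamespace`, `bllimit` and `blcontinue` parameters. The helper name `backlinks_query` is hypothetical and this is not the bot's code. As far as I understand the API documentation, `blnamespace` restricts the namespace of the linking pages that are returned, not the namespace of the target.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def backlinks_query(title, namespace=None, limit=500, cont=None):
    """Build a list=backlinks request URL for pages linking to `title`."""
    params = {
        "action": "query",
        "list": "backlinks",
        "format": "json",
        "bltitle": title,
        "bllimit": str(limit),
    }
    if namespace is not None:
        # Only return linking pages that live in this namespace.
        params["blnamespace"] = str(namespace)
    if cont:
        # Continuation token taken from the previous response.
        params["blcontinue"] = cont
    return API + "?" + urlencode(params)
```

`backlinks_query("V", namespace=0)` would ask only for mainspace pages that link to "V", which skips the many project-space links that follow WP:V.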
Below is an 'ls' of the data files. The timestamps show how long each took to complete. The file size is misleading as the program filters out namespaces. For example, "V" (and "v"; they are identical to the API) is not a very large file, but took almost 25 minutes to complete. It took about 85m to finish, not 120m, my mistake. V/v is about 50 minutes. U/u 20 minutes. N/n 10 minutes. Those are the big three and use 95% of the time (is that right?). Probably due to WP:V, WP:U and WP:N. -- GreenC 19:28, 27 July 2022 (UTC)
Thanks. I'll take V/v, U/u and N/n out then. U and N rarely get a hit. V gets more but I'm less confident about fixing them as most of them require me to guess what article the editor was thinking of. Certes (talk) 20:57, 27 July 2022 (UTC)
All working as normal today, and an hour faster than previously. Thanks again for your help. Certes (talk) 10:03, 28 July 2022 (UTC)
Yes, finished in 25 minutes. No single one took very long (or much memory!). You are welcome and thanks for reporting it because it uncovered a problem in the program that only became evident at scale. -- GreenC 15:52, 28 July 2022 (UTC)
Extended content
22930	Jul	27	09:11	0.new
127027	Jul	27	09:11	1.new
16924	Jul	27	09:11	2.new
15575	Jul	27	09:11	3.new
15540	Jul	27	09:11	4.new
14709	Jul	27	09:12	5.new
12741	Jul	27	09:12	6.new
17054	Jul	27	09:12	7.new
15220	Jul	27	09:12	8.new
14745	Jul	27	09:12	9.new
7476	Jul	27	09:13	10.new
6315	Jul	27	09:13	100.new
15741	Jul	27	09:13	A.new
13776	Jul	27	09:13	B.new
16104	Jul	27	09:13	C.new
13410	Jul	27	09:13	D.new
13301	Jul	27	09:14	E.new
12605	Jul	27	09:14	F.new
13550	Jul	27	09:14	G.new
13518	Jul	27	09:14	H.new
14387	Jul	27	09:14	I.new
13005	Jul	27	09:14	J.new
12845	Jul	27	09:14	K.new
14099	Jul	27	09:14	L.new
13174	Jul	27	09:14	M.new
39805	Jul	27	09:18	N.new
13668	Jul	27	09:19	O.new
13088	Jul	27	09:19	P.new
11858	Jul	27	09:19	Q.new
14160	Jul	27	09:19	R.new
14529	Jul	27	09:19	S.new
13146	Jul	27	09:19	T.new
15718	Jul	27	09:21	U.new
96856	Jul	27	09:45	V.new
12403	Jul	27	09:45	W.new
12797	Jul	27	09:45	X.new
13659	Jul	27	09:45	Y.new
13403	Jul	27	09:45	Z.new
15741	Jul	27	09:45	a.new
13776	Jul	27	09:45	b.new
16104	Jul	27	09:45	c.new
13410	Jul	27	09:46	d.new
13301	Jul	27	09:46	e.new
12605	Jul	27	09:46	f.new
13550	Jul	27	09:46	g.new
13518	Jul	27	09:46	h.new
14387	Jul	27	09:46	i.new
13005	Jul	27	09:46	j.new
12845	Jul	27	09:46	k.new
14099	Jul	27	09:46	l.new
13174	Jul	27	09:46	m.new
39805	Jul	27	09:51	n.new
13668	Jul	27	09:51	o.new
13088	Jul	27	09:51	p.new
11858	Jul	27	09:51	q.new
14160	Jul	27	09:51	r.new
14529	Jul	27	09:51	s.new
13146	Jul	27	09:51	t.new
15718	Jul	27	09:53	u.new
96856	Jul	27	10:16	v.new
12403	Jul	27	10:16	w.new
12797	Jul	27	10:16	x.new
13659	Jul	27	10:16	y.new
13403	Jul	27	10:16	z.new
217699	Jul	27	10:17	ABC
5951	Jul	27	10:17	Accolade.new
118095	Jul	27	10:17	Acre.new
89027	Jul	27	10:17	Admiral.new
22088	Jul	27	10:17	Alphabet.new
29758	Jul	27	10:17	Amber.new
4295	Jul	27	10:17	Amen.new
31785	Jul	27	10:17	Aperture.new
2643	Jul	27	10:17	Ash.new
2643	Jul	27	10:17	ash.new
44238	Jul	27	10:17	Atlantic.new
1375	Jul	27	10:17	Back.new
1375	Jul	27	10:17	back.new
36337	Jul	27	10:17	Bay.new
36337	Jul	27	10:17	bay.new
53374	Jul	27	10:17	Bowling.new
53374	Jul	27	10:17	bowling.new
2048	Jul	27	10:17	Cabinet
36569	Jul	27	10:17	Captain.new
36569	Jul	27	10:17	captain.new
12368	Jul	27	10:17	Calvary.new
12368	Jul	27	10:17	calvary.new
26920	Jul	27	10:17	Caterpillar.new
28665	Jul	27	10:17	Chancellor.new
28665	Jul	27	10:17	chancellor.new
31754	Jul	27	10:17	Chestnut.new
31754	Jul	27	10:17	chestnut.new
4924	Jul	27	10:17	Chin.new
725	Jul	27	10:17	Clipboard.new
725	Jul	27	10:17	clipboard.new
44162	Jul	27	10:17	Colony.new
44162	Jul	27	10:18	colony.new
3070	Jul	27	10:18	Colonies.new
3070	Jul	27	10:18	colonies.new
55	Jul	27	10:18	Colors.new
55	Jul	27	10:18	colors.new
565	Jul	27	10:18	Colours.new
565	Jul	27	10:18	colours.new
138372	Jul	27	10:19	Company.new
138372	Jul	27	10:20	company.new
6611	Jul	27	10:20	Companies.new
6611	Jul	27	10:20	companies.new
14699	Jul	27	10:20	Consul.new
14699	Jul	27	10:20	consul.new
76725	Jul	27	10:20	Colorado
3180	Jul	27	10:21	Commonwealth.new
3180	Jul	27	10:21	commonwealth.new
30657	Jul	27	10:21	Conservative.new
1206	Jul	27	10:21	Conservatives.new
113900	Jul	27	10:21	Corvette.new
2005	Jul	27	10:21	Corvettes.new
28639	Jul	27	10:21	Delphi.new
48181	Jul	27	10:21	Family.new
48181	Jul	27	10:21	family.new
2257	Jul	27	10:21	Families.new
2257	Jul	27	10:21	families.new
61603	Jul	27	10:21	Icon.new
61603	Jul	27	10:21	icon.new
6665	Jul	27	10:21	Icons.new
6665	Jul	27	10:21	icons.new
5801	Jul	27	10:21	Interpreter.new
5801	Jul	27	10:21	interpreter.new
70977	Jul	27	10:21	Jupiter.new
12095	Jul	27	10:21	Knot.new
12095	Jul	27	10:21	knot.new
80891	Jul	27	10:21	Krishna.new
121459	Jul	27	10:21	Lead.new
121459	Jul	27	10:21	lead.new
127	Jul	27	10:21	Liberal
180	Jul	27	10:21	Libertarian
183969	Jul	27	10:22	Madonna.new
183969	Jul	27	10:22	madonna.new
65528	Jul	27	10:22	Mass.new
65528	Jul	27	10:22	mass.new
5378	Jul	27	10:22	Meta.new
770	Jul	27	10:22	Ministry
3160	Jul	27	10:22	Model.new
3160	Jul	27	10:22	model.new
176677	Jul	27	10:23	Moon.new
176677	Jul	27	10:23	moon.new
214735	Jul	27	10:23	National
199067	Jul	27	10:23	Oxygen.new
76332	Jul	27	10:23	Primate.new
76332	Jul	27	10:23	primate.new
5462	Jul	27	10:23	Roland.new
346	Jul	27	10:24	Ronaldo.new
68973	Jul	27	10:24	Salt.new
68973	Jul	27	10:24	salt.new
16813	Jul	27	10:24	Season.new
16813	Jul	27	10:24	season.new
44306	Jul	27	10:24	Shiraz.new
44306	Jul	27	10:24	shiraz.new
53287	Jul	27	10:24	Spire.new
53287	Jul	27	10:24	spire.new
153867	Jul	27	10:24	Stream.new
153867	Jul	27	10:24	stream.new
11482	Jul	27	10:24	Telegram.new
3845	Jul	27	10:24	Thermal.new
3845	Jul	27	10:24	thermal.new
88519	Jul	27	10:24	Tree.new
88519	Jul	27	10:24	tree.new
3102	Jul	27	10:24	Trojan
3102	Jul	27	10:24	trojan
167	Jul	27	10:24	U.S.
2334	Jul	27	10:24	Victory.new
26424	Jul	27	10:24	Ardennes.new
19159	Jul	27	10:24	Aspen.new
1884	Jul	27	10:24	Baler.new
105737	Jul	27	10:25	Batman.new
20662	Jul	27	10:25	Battle.new
53364	Jul	27	10:25	Bethlehem.new
439921	Jul	27	10:25	Birmingham.new
11530	Jul	27	10:25	Boulder.new
54094	Jul	27	10:25	Brampton.new
14995	Jul	27	10:25	Calvados.new
208354	Jul	27	10:25	Cambridge.new
71179	Jul	27	10:25	Canterbury.new
15715	Jul	27	10:25	Caracal.new
203571	Jul	27	10:26	Christchurch.new
78460	Jul	27	10:26	Cicero.new
43543	Jul	27	10:26	Durango.new
18943	Jul	27	10:26	East
296629	Jul	27	10:26	Edmonton.new
12304	Jul	27	10:26	Esplanade.new
25247	Jul	27	10:26	Eye.new
32977	Jul	27	10:26	Flint.new
151	Jul	27	10:26	Gladstone.new
81116	Jul	27	10:26	Gloucester.new
56266	Jul	27	10:26	Greenwich.new
780	Jul	27	10:26	Guna.new
21889	Jul	27	10:26	Horsham.new
199436	Jul	27	10:26	Hyderabad.new
89915	Jul	27	10:26	Ipswich.new
15229	Jul	27	10:26	Ithaca.new
132579	Jul	27	10:27	Lagos.new
68478	Jul	27	10:27	La
18993	Jul	27	10:27	Leek.new
439197	Jul	27	10:27	Liverpool.new
26324	Jul	27	10:27	Loire.new
54	Jul	27	10:27	Loni.new
8106	Jul	27	10:27	Malmesbury.new
35538	Jul	27	10:27	Mansfield.new
7545	Jul	27	10:27	March.new
16434	Jul	27	10:27	Mold.new
25849	Jul	27	10:27	Moselle.new
33698	Jul	27	10:27	New
270789	Jul	27	10:27	New
205009	Jul	27	10:28	Norfolk.new
112023	Jul	27	10:28	Norwich.new
28431	Jul	27	10:28	Ore.new
71930	Jul	27	10:28	Pali.new
83138	Jul	27	10:28	Panama
373705	Jul	27	10:28	Perth.new
99124	Jul	27	10:28	Piedmont.new
22133	Jul	27	10:28	Pueblo.new
73659	Jul	27	10:28	Punjab.new
30869	Jul	27	10:28	Reading.new
100419	Jul	27	10:29	Republic
19646	Jul	27	10:29	Rye.new
23084	Jul	27	10:29	Saga.new
6106	Jul	27	10:29	Saint
5866	Jul	27	10:29	St.
11630	Jul	27	10:29	Saint
5336	Jul	27	10:29	St.
97107	Jul	27	10:29	St.
22068	Jul	27	10:29	Stanford.new
255991	Jul	27	10:29	Surrey.new
93952	Jul	27	10:29	Tripoli.new
50366	Jul	27	10:29	Troy.new
38853	Jul	27	10:29	Van.new
18130	Jul	27	10:29	Vosges.new
21909	Jul	27	10:29	Warwick.new
15455	Jul	27	10:29	Angels.new
23662	Jul	27	10:29	Arsenal.new
38084	Jul	27	10:29	Avalanche.new
2391	Jul	27	10:29	Barbarians.new
1558	Jul	27	10:29	Bears.new
5145	Jul	27	10:29	Border
296	Jul	27	10:29	Broncos.new
463	Jul	27	10:29	Buccaneers.new
1063	Jul	27	10:29	Canadiens.new
15399	Jul	27	10:29	Cavaliers.new
751	Jul	27	10:29	Cheetahs.new
367	Jul	27	10:29	Corinthians.new
3529	Jul	27	10:29	Coyotes.new
9722	Jul	27	10:29	Crusaders.new
5268	Jul	27	10:29	Dolphins.new
3090	Jul	27	10:29	Dragons.new
4159	Jul	27	10:29	Ducks.new
160	Jul	27	10:29	Eagles.new
45	Jul	27	10:29	Flames.new
48481	Jul	27	10:29	Force.new
181	Jul	27	10:29	Griquas.new
2627	Jul	27	10:29	Hawks.new
27971	Jul	27	10:29	Heat.new
653	Jul	27	10:29	Hornets.new
5809	Jul	27	10:29	Hurricanes.new
949	Jul	27	10:29	Jaguars.new
223	Jul	27	10:29	Jays.new
1571	Jul	27	10:29	Leopards.new
43470	Jul	27	10:30	Lightning.new
2409	Jul	27	10:30	Lions.new
229	Jul	27	10:30	Ospreys.new
1981	Jul	27	10:30	Pelicans.new
2413	Jul	27	10:30	Penguins.new
9026	Jul	27	10:30	Pirates.new
4012	Jul	27	10:30	Predators.new
2731	Jul	27	10:30	Rockets.new
802	Jul	27	10:30	Rockies.new
7330	Jul	27	10:30	Saints.new
9918	Jul	27	10:30	Saracens.new
3954	Jul	27	10:30	Sharks.new
3306	Jul	27	10:30	Stars.new
6305	Jul	27	10:30	Thunder.new
2129	Jul	27	10:30	Tigers.new
26592	Jul	27	10:30	Titans.new
3808	Jul	27	10:30	Twins.new
98682	Jul	27	10:30	Vikings.new
663	Jul	27	10:30	Warriors.new
3396	Jul	27	10:30	Wasps.new
5597	Jul	27	10:30	Wolves.new
6	Jul	27	10:30	Zunz.new
795	Jul	27	10:30	Orsini.new
226	Jul	27	10:30	Rockefeller.new
32	Jul	27	10:30	Paintal.new
483	Jul	27	10:30	Rothschild.new
8	Jul	27	10:30	Pevsner.new
4861	Jul	27	10:30	O'Reilly.new
62	Jul	27	10:30	Primo
18	Jul	27	10:30	Cimarosa.new
53	Jul	27	10:30	Narasimha
505	Jul	27	10:30	Caracciolo.new
155	Jul	27	10:30	Bakunin.new
665	Jul	27	10:30	Weber.new
26	Jul	27	10:30	Malevich.new
57	Jul	27	10:30	Korotayev.new
18	Jul	27	10:30	Krauser.new
186	Jul	27	10:30	Ghazali.new
266	Jul	27	10:30	Touré.new
190	Jul	27	10:30	Sadat.new
288	Jul	27	10:30	Rajguru.new
289	Jul	27	10:30	Maitland.new
83	Jul	27	10:30	Strozzi.new
90	Jul	27	10:30	Delacroix.new
167	Jul	27	10:30	Reuter.new
185	Jul	27	10:30	Baden
31	Jul	27	10:30	Lessing.new
129	Jul	27	10:30	Boyle.new
96	Jul	27	10:30	Aelian.new
48	Jul	27	10:30	Zichy.new
64	Jul	27	10:30	Nomura.new
204	Jul	27	10:30	Takeda.new
21	Jul	27	10:30	Gilbert
265	Jul	27	10:30	Batista.new
939	Jul	27	10:30	Andrássy.new
544	Jul	27	10:30	Prabhu.new
165	Jul	27	10:30	Tyszkiewicz.new
22	Jul	27	10:30	Mommsen.new
251	Jul	27	10:30	Köppen.new
492	Jul	27	10:30	Della
168	Jul	27	10:30	Bernstein.new
32	Jul	27	10:30	Tippett.new
380	Jul	27	10:30	Sanseverino.new
51	Jul	27	10:30	Pucci.new
377	Jul	27	10:30	Hieronymus
113	Jul	27	10:30	Ghirlandaio.new
65	Jul	27	10:30	Beckett.new
711	Jul	27	10:30	O'Ryan.new
273	Jul	27	10:30	Neumann.new
10	Jul	27	10:30	Matsushita.new
1276	Jul	27	10:30	Ferrero.new
114	Jul	27	10:30	Dietz.new
59	Jul	27	10:30	Amorim.new
29	Jul	27	10:30	Wankel.new
594	Jul	27	10:30	Uexküll.new
20	Jul	27	10:30	Stirner.new
80	Jul	27	10:30	Sridhar.new
234	Jul	27	10:30	Rossetti.new
150	Jul	27	10:30	Nassar.new
115	Jul	27	10:30	Morandi.new
160	Jul	27	10:30	Bulgakov.new
25	Jul	27	10:30	Barks.new
136	Jul	27	10:30	Agnelli.new
350	Jul	27	10:30	Teleki.new
134	Jul	27	10:30	Tarnowski.new
574	Jul	27	10:30	Hamdan.new
93	Jul	27	10:30	Guicciardini.new
589	Jul	27	10:30	Clark.new
97	Jul	27	10:30	Borromeo.new
22	Jul	27	10:30	Bazzi.new
51	Jul	27	10:30	Wolf-Ferrari.new
357	Jul	27	10:30	Sylvester.new
26	Jul	27	10:30	Schichau.new
164	Jul	27	10:30	Scarlatti.new
67	Jul	27	10:30	Noriega.new
24	Jul	27	10:30	Bohlen.new
40	Jul	27	10:30	Boiardo.new
45	Jul	27	10:30	Bosman.new
446	Jul	27	10:30	Braun.new
9	Jul	27	10:30	Gabrielli.new
56	Jul	27	10:30	Haider.new
49	Jul	27	10:30	Jayachandran.new
72	Jul	27	10:30	Jellinek.new
332	Jul	27	10:30	Manning.new
28	Jul	27	10:30	Naryshkin.new
157	Jul	27	10:30	Sachs.new
118	Jul	27	10:30	Sacks.new
101	Jul	27	10:30	Saunders.new
159	Jul	27	10:30	Uccello.new
204	Jul	27	10:30	Velazquez.new
29	Jul	27	10:30	Wills.new
60	Jul	27	10:30	Bergman.new
759	Jul	27	10:30	Haim.new
18588	Jul	27	10:30	Agamemnon.new
3872	Jul	27	10:30	Antigone.new
33458	Jul	27	10:30	Bloomsbury.new
36678	Jul	27	10:30	Cabaret.new
494	Jul	27	10:30	Can-Can.new
23895	Jul	27	10:30	Carousel.new
7172	Jul	27	10:30	Cyrano
47072	Jul	27	10:30	Dune.new
13573	Jul	27	10:30	Euphoria.new
6460	Jul	27	10:30	Falstaff.new
13338	Jul	27	10:30	Faust.new
575	Jul	27	10:30	Fra
1650	Jul	27	10:30	Gidget.new
16873	Jul	27	10:31	Gladiator.new
85498	Jul	27	10:31	Julius
10409	Jul	27	10:31	Medea.new
7415	Jul	27	10:31	Mystic
536	Jul	27	10:31	Peaky
9674	Jul	27	10:31	Peer
16265	Jul	27	10:31	Pericles.new
60538	Jul	27	10:31	Quartz.new
9418	Jul	27	10:31	Salome.new
49778	Jul	27	10:31	St.
84	Jul	27	10:31	The
9885	Jul	27	10:31	Ansible.new
20259	Jul	27	10:31	Arrow.new
57727	Jul	27	10:31	Daily
672758	Jul	27	10:31	The
8853	Jul	27	10:32	Decanter.new
11944	Jul	27	10:32	Dissent.new
13559	Jul	27	10:32	Germania.new
7858	Jul	27	10:32	Guernica.new
29403	Jul	27	10:32	Life.new
6739	Jul	27	10:32	The
809	Jul	27	10:32	The
195831	Jul	27	10:32	The
13864	Jul	27	10:32	Referee.new
2987	Jul	27	10:32	Sunday
24360	Jul	27	10:32	Sunday
154416	Jul	27	10:32	The
5692	Jul	27	10:32	Cage.new
872	Jul	27	10:32	Carpenters.new
2853	Jul	27	10:32	Chrysalis.new
133	Jul	27	10:32	Doors.new
324	Jul	27	10:32	Fernando.new
62059	Jul	27	10:32	Grenade.new
38621	Jul	27	10:32	Guru.new
125	Jul	27	10:32	Happy.new
970	Jul	27	10:32	Hello.new
190	Jul	27	10:32	Jojo.new
13288	Jul	27	10:32	Pink.new
84108	Jul	27	10:33	Sugar.new
16057	Jul	27	10:33	anchorage.new
25	Jul	27	10:33	barks.new
105737	Jul	27	10:33	batman.new
109392	Jul	27	10:33	derby.new
166471	Jul	27	10:33	jersey.new
107237	Jul	27	10:33	limerick.new
121643	Jul	27	10:33	louvre.new
332	Jul	27	10:33	manning.new
7545	Jul	27	10:33	march.new
99124	Jul	27	10:34	piedmont.new
118	Jul	27	10:34	sacks.new
1443	Jul	27	10:34	sandbanks.new
26151	Jul	27	10:34	slough.new
255991	Jul	27	10:34	surrey.new
50366	Jul	27	10:34	troy.new
29	Jul	27	10:34	wills.new
523	Jul	27	10:34	The.new
523	Jul	27	10:34	the.new
48	Jul	27	10:34	Is.new
48	Jul	27	10:34	is.new
337	Jul	27	10:34	were.new
199	Jul	27	10:34	That.new
199	Jul	27	10:34	that.new
370	Jul	27	10:34	said.new
1155	Jul	27	10:34	One.new
1155	Jul	27	10:34	one.new
5430	Jul	27	10:34	goes.new
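
The per-file times quoted in the thread come from the gaps between consecutive timestamps in the listing above. A small sketch of that arithmetic, assuming the five-column layout shown (size, month, day, time, name) and English month abbreviations; the function name is hypothetical. Because the timestamps only have minute resolution, the results are rough estimates.

```python
from datetime import datetime


def elapsed_minutes(listing):
    """Return (name, minutes since the previous file) for each listing line.

    The first line has no predecessor, so it is not reported.
    """
    results = []
    previous = None
    for line in listing.splitlines():
        if not line.strip():
            continue
        size, month, day, clock, name = line.split(None, 4)
        stamp = datetime.strptime(f"{month} {day} {clock}", "%b %d %H:%M")
        if previous is not None:
            results.append((name, int((stamp - previous).total_seconds() // 60)))
        previous = stamp
    return results
```

Run on the four consecutive lines from T.new to W.new, it reports 24 minutes for V.new, in line with the "almost 25 minutes" mentioned in the thread.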

Bot updating Webarchive template is adding "url" same as existing "url2"

This bot made a group of WaybackMedic 2.5 edits in June where it "rescued" an archive link in the |url= parameter of {{Webarchive}}, replacing it with a link which was already in the |url2= parameter. Two examples of this are Grant Bramwell: revised 1 June 2022 and List of ICF Canoe Sprint World Championships medalists in men's kayak: revised 26 June 2022. Can the bot remove the duplicate url2/date2/title2 parameters and renumber any subsequent url3/date3/title3, etc.? I've fixed over 500 of these edits myself, but there are still over 700 remaining to be fixed. Thanks. -- Zyxw (talk) 03:54, 9 August 2022 (UTC)

That was part of the deprecation of WebCite, which is a dead archive provider. It didn't account for dups. It's complicated here because even though |url= and |url2= are the same, |title= and |title2= are different - which do you choose? I think the best course is to keep the |url= set and remove the |url2= set, at least based on the two examples. In terms of renumbering, that is not required, as the webarchive template is designed to allow any numbers up to 10; a |url= (aka |url1=) is the only requirement. I'll start looking at this today. -- GreenC 15:35, 9 August 2022 (UTC)
@GreenC: I agree with keeping the |url= set and removing the |url2= set when there is a duplicate URL and that is what I did for the 500+ already fixed. I also thought {{Webarchive}} might automatically handle the missing |url2= set and display the |url3= set, but as per these tests that is not the case:
archive with url/date/title, url2/date2/title2, and url3/date3/title3
url2/date2/title2 removed with url3/date3/title3 remaining
url2/date2/title2 removed and url3/date3/title3 renumbered
-- Zyxw (talk) 16:15, 9 August 2022 (UTC)
Reported at Template_talk:Webarchive#Gaps_in_argument_sequence. I wrote the template originally but Trappist did a major rewrite so I'm not sure if that is my bug or his. I processed the first 500 articles and there are only 3 with a |url3=, suggesting 40 or 50 at most in the whole bunch. Anyway it won't be difficult to renumber them. -- GreenC 16:26, 9 August 2022 (UTC)
Ah, miscalculated: it's 733, not 7,330 :) It's done; if you see anything more let me know. -- GreenC 17:08, 9 August 2022 (UTC)
Fixed the webarchive bug. -- GreenC 18:06, 9 August 2022 (UTC)
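
The clean-up agreed in this thread (keep the first set, drop a later set whose URL repeats it, close the gap by renumbering) can be sketched as follows. This is an illustration under stated assumptions, not WaybackMedic's code: the template arguments are assumed to be already parsed into a dict, only the url/date/title fields are handled (any other parameters are ignored), and the function names are hypothetical.

```python
FIELDS = ("url", "date", "title")


def get_set(params, n):
    """Return the n-th url/date/title set; set 1 may be written with or without '1'."""
    suffixes = ("", "1") if n == 1 else (str(n),)
    found = {}
    for field in FIELDS:
        for suffix in suffixes:
            if field + suffix in params:
                found[field] = params[field + suffix]
                break
    return found


def dedupe_webarchive(params, max_sets=10):
    """Drop sets whose URL repeats an earlier one, then renumber without gaps."""
    kept, seen = [], set()
    for n in range(1, max_sets + 1):
        entry = get_set(params, n)
        url = entry.get("url")
        if url is None or url in seen:
            continue  # unused number, or a duplicate of an earlier URL
        seen.add(url)
        kept.append(entry)
    result = {}
    for index, entry in enumerate(kept, start=1):
        suffix = "" if index == 1 else str(index)
        for field, value in entry.items():
            result[field + suffix] = value
    return result
```

With a |url=, a duplicate |url2= and a distinct |url3=, the result keeps the |url= set with its own date and title and moves the old |url3= set to |url2=.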

Bad webcitation link replacement

So I've just found out that GreenC bot made edits like this, replacing a dead archive link with another dead archive link. Would it be possible to replace that archive link with, say, this one that actually works? Thanks very much! Graham87 11:48, 26 August 2022 (UTC)

Bots are not 100% perfect. It relies on the Wayback API to determine live links, and that is not perfect, so for those errors it depends on human intervention to correct. The alternative is not to use bots at all, in which case most links never get fixed due to the scale; it's back-end boring work people want bots to do, but there is no guarantee that bots, or for that matter people, will not make mistakes. The question is the scale of mistakes. -- GreenC 15:08, 26 August 2022 (UTC)
Yeah fair enough, soft 404s and all. On re-reading my message I spectacularly failed at phrasing it clearly ... there are nearly a hundred more such links; could you instruct the bot to replace them with a working archive (i.e. the one linked above)? I thought that would be the easiest way to fix this problem. I tried changing the archive link on InternetArchiveBot's side and asking it to fix the affected articles, but that didn't do what I intended. Graham87 13:34, 27 August 2022 (UTC)
OK it's done. Yeah, there's no way to automate replacement of one archive with another via IABot. That would be a good feature though when finding soft-404s. -- GreenC 16:16, 27 August 2022 (UTC)
Opened Phab T316438 .. no idea if or when. -- GreenC 16:34, 27 August 2022 (UTC)

Avoid editing inside HTML comments

GreenC bot now edits inside HTML comments, e.g. Special:Diff/1107954452, but I suggest it should not. Although the edit in this example happened to be harmless (even useful), in general comments could be used for a wide range of reasons, so there is a higher risk that automatic edits could break their intentions. Wotheina (talk) 03:49, 2 September 2022 (UTC)

That's true but there is a positive trade-off, so for a couple of reasons I am OK fixing certain (not all) link rot in comments, as I have been doing for 7 years. If someone wants to preserve a block of immutable wikitext they should use the talk page, user page or offline - otherwise anyone can edit the comment or delete it entirely. Comments can be strangely formatted; I take measures, auto and manual, to check commented text before posting a live diff. -- GreenC 05:39, 2 September 2022 (UTC)

Stopping backlinks report during wikibreak

Hello, and thanks again for the useful Backlinks reports. I'm currently taking a Wikibreak and have attempted to exclude my list from the bot's tasks thus but it still ran today. It's not a problem for me if the reports continue but, if you'd like to save some resources by stopping it properly, please go ahead. Certes (talk) 11:25, 5 September 2022 (UTC)

Fixed, it was seeing Action=RUN in the "#" comment. First time this code has been tested :) Have a good break. -- GreenC 05:14, 6 September 2022 (UTC)
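
The kind of bug described above, a setting being matched inside a "#" comment, can be avoided by stripping comments before looking for the key. A minimal sketch, assuming a simple one-setting-per-line `Key=Value` layout; the real configuration format is not shown in this thread, and the function name and the `STOP` value in the usage note are hypothetical.

```python
def read_action(config_text):
    """Return the value of the first uncommented Action setting, or None."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0]  # discard everything after '#'
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "action":
            return value.strip()
    return None
```

With the text `# Action=RUN` followed by `Action=STOP`, this returns `STOP`, whereas a plain substring search for `Action=RUN` would wrongly match the commented line.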

Please Update the monthly list of Top 10000 wikipedia users by Article Count

Please update the monthly list of the top 10000 Wikipedia users by article count, which changes on the 1st and 15th of each month. Abbasulu (talk) 07:52, 3 October 2022 (UTC)

It's still running, for some reason very slowly: in 3 days it has only completed 19%. -- GreenC 12:51, 3 October 2022 (UTC)

Exactly what purpose did this edit serve? Edit summary is misleading at best

https://en.wikipedia.org/w/index.php?title=Rodney_Marks&diff=1095741886&oldid=1091111369 108.246.204.20 (talk) 20:17, 3 October 2022 (UTC)

Don't use {{dead link}} if the citation has a working |archive-url=. -- GreenC 20:46, 3 October 2022 (UTC)
It doesn't: "this page is not available". 108.246.204.20 (talk) 04:15, 14 October 2022 (UTC)
Ah, soft-404. Removed. I also updated the IABot database. -- GreenC 04:24, 14 October 2022 (UTC)

A cookie for you!

Ulises12345678 (talk) 11:00, 9 October 2022 (UTC)
Thank you. For the Cookie. -- GreenC 14:12, 9 October 2022 (UTC)