Wikipedia:Link rot/URL change requests

From Wikipedia, the free encyclopedia

This page is for requesting modifications to URLs, such as marking them dead or changing them to a new domain. Some bots are designed to fix link rot and can be notified here; these include InternetArchiveBot and WaybackMedic. Bot operators from other language wikis can also monitor this page, since URL changes are universally applicable.

blackwell-synergy.com and gaylesbiantimes.com

These previously reputable domains were semi-recently replaced with spam and other nasty content. Blackwell-synergy.com has already been marked as dead in IABot, but I do not believe gaylesbiantimes.com has been. Both need to have |url-status=usurped set, as they are not fit to be linked to. --AntiCompositeNumber (talk) 04:59, 13 November 2019 (UTC)

Yup, usurped. Blackwell has a lot of links too. I've set GLT to Blacklisted in IABot for now until I can start on this project. -- GreenC 14:34, 13 November 2019 (UTC)
4453 globally at the moment, if you're curious. And that's after the global cleanup effort. --AntiCompositeNumber (talk) 15:39, 13 November 2019 (UTC)
@AntiCompositeNumber: What do you suggest we do with the Blackwell links: 1. try to convert them to doi.org URLs, or 2. treat them as dead links, set them to "usurped" and add an archive if available? Or Step 1, and if that fails, Step 2? For Step 2, it is possible that no archive can be found and the link exists outside a CS1|2 template, in which case the bot would normally add a {{dead link}}, but the spam link would then still be clickable. There was talk about creating a new template called {{usurped}} in which these free-floating usurped links could be embedded so they don't display, but nothing has happened. -- GreenC 16:46, 13 November 2019 (UTC)
@GreenC: The best option is to convert Blackwell links in citation templates to |doi= and convert bare links to {{DOI}}. When that can't be done (say, because they're used in a labeled link, or because that would take a lot of development effort), doi.org links are the best option for an automated fix. If there's no valid DOI and no valid archive, tagging the link dead and moving on is the best option at the moment. Where we go from there would depend on how many are unfixable. If it's fewer than ~100, humans can review the links and take appropriate action. --AntiCompositeNumber (talk) 17:05, 13 November 2019 (UTC)
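The order of preference above can be sketched as a small decision function. This is a minimal illustration, not the bot's actual code: the `Fix` labels and function names are hypothetical, and the assumption that old Blackwell URLs carry the DOI after `/doi/abs/`, `/doi/full/` or `/doi/pdf/` should be checked against real links before relying on it.

```python
import re
from enum import Enum
from typing import Optional

# Assumed URL shape: http://www.blackwell-synergy.com/doi/abs/10.xxxx/...
BLACKWELL_DOI = re.compile(
    r"blackwell-synergy\.com/doi/(?:abs|full|pdf)/(10\.\d{4,9}/[^?#\s\]|]+)"
)


class Fix(Enum):
    DOI_PARAM = "move the DOI into |doi= of the citation template"
    DOI_TEMPLATE = "replace the bare link with {{DOI}}"
    DOI_URL = "replace the URL with a doi.org URL"
    ARCHIVE = "add an archive URL and set |url-status=usurped"
    DEAD = "tag the link as dead"


def doi_from_blackwell(url: str) -> Optional[str]:
    """Return the DOI embedded in an old Blackwell Synergy URL, if any."""
    m = BLACKWELL_DOI.search(url)
    return m.group(1) if m else None


def choose_fix(url: str, in_cite_template: bool, is_bare_link: bool,
               archive_url: Optional[str]) -> Fix:
    """Pick the most preferred repair for one link, falling back step by step."""
    if doi_from_blackwell(url):
        if in_cite_template:
            return Fix.DOI_PARAM
        if is_bare_link:
            return Fix.DOI_TEMPLATE
        # Labeled links and other odd cases: a doi.org URL is the safe automated fix.
        return Fix.DOI_URL
    if archive_url:
        return Fix.ARCHIVE
    return Fix.DEAD
```

Keeping the fallbacks in one function makes it easy to count how many links end up in the `DEAD` bucket, which is the number that decides whether human review is practical.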

@AntiCompositeNumber: the bot ran for Blackwell and basically eliminated the domain from mainspace, replacing the URL with |doi= or a doi.org URL (examples: [1][2][3]). It can't detect {{doi}}, so there are a few duplicates ([4]), and in a few cases cite templates ended up with both a doi.org URL and |doi=. It edited about 550 pages. The spam filters won't allow the addition of new archive URLs, and for one reason or another the bot couldn't handle some cases; these remaining pages have a Blackwell domain that needs manual attention:

I'll take a look at GLT next. -- GreenC 16:41, 23 November 2019 (UTC)

@GreenC: Thanks. I've manually fixed those articles. --AntiCompositeNumber (talk) 21:04, 8 December 2019 (UTC)

@AntiCompositeNumber: GayLesbianTimes.com is only in 76 mainspace articles, so I set them manually: either with |url-status=usurped or, for square and bare links that have a {{webarchive}}, by moving the archive URL into the square or bare link (example). Those without an archive URL had to be deleted and replaced with a non-URL citation. There are still links outside mainspace; maybe they should just be blanked with a quick search-replace script unless someone wants to fix them manually, since it's not possible to add new archive URLs because of a blacklist filter. -- GreenC 01:38, 14 February 2020 (UTC)
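The manual fix for square links that already carry a {{webarchive}} can be sketched with a regular expression. This is a hypothetical helper, assuming the common `[URL title] {{webarchive |url=...}}` layout; bare links, other archive templates and unusual spacing would need extra handling, and the output should still be reviewed by a human.

```python
import re
from typing import Optional

# Assumed layout: [http://www.gaylesbiantimes.com/... Title] {{webarchive |url=... |date=...}}
SQUARE_WITH_ARCHIVE = re.compile(
    r"\[(https?://(?:www\.)?gaylesbiantimes\.com/[^\s\]]*)\s+([^\]]+)\]\s*"
    r"\{\{\s*[Ww]ebarchive\s*\|([^}]*)\}\}"
)


def template_param(params: str, name: str) -> Optional[str]:
    """Value of a named template parameter (split on the first '=' only)."""
    for piece in params.split("|"):
        key, sep, value = piece.partition("=")
        if sep and key.strip() == name:
            return value.strip()
    return None


def move_archive_into_link(wikitext: str) -> str:
    """Replace the usurped URL with the archive URL and drop the {{webarchive}}."""
    def repl(m: re.Match) -> str:
        archive = template_param(m.group(3), "url")
        if not archive:
            return m.group(0)  # no archive URL found: leave for manual handling
        return f"[{archive} {m.group(2)}]"
    return SQUARE_WITH_ARCHIVE.sub(repl, wikitext)
```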

comicbookdb.com shutting down on 16 December 2019

Web site comicbookdb.com has announced that it is shutting down as of 16 December 2019.

English-language Wikipedia has about 4,500 articles which include links to comicbookdb.com (mostly using the "comicbookdb" template).

Fabrickator (talk) 17:53, 20 November 2019 (UTC)

Looks like someone added a "one-size-fits-all" archive to the template. It's hard to know how many links it actually fits, but it's better than nothing. Ideally a bot would convert the templates to {{cite web}} with |archive-url= so the bots can search for custom-fit archives on a per-link basis. -- GreenC 04:43, 14 February 2020 (UTC)

NASA Image and Video Library

NASA's image library moved; see this change for an example of what could potentially be mass-fixed. It's likely mostly a Commons issue, but there might be some here.

There is a chance GRIN links can be changed to the NASA Image and Video Library as well (see this change or this change), but those can get weird since there are sometimes multiple IDs. A bot would have to check whether the GRIN ID number matches a non-404 URL on images.nasa.gov. Bonus points for using the NASA-image template on the image page, e.g. {{NASA-image|id=GPN-2000-001167|alternateid=S70-36485|center=JSC}}, to try multiple possible IDs for working links.

The first case seems simple; the second case is more difficult but still possible, I think. There are likely a lot more of the second issue than the first, but I'm not sure without a tool to check. Kees08 (Talk) 21:54, 22 December 2019 (UTC)

@GreenC: Any thoughts on this? Kees08 (Talk) 16:32, 12 February 2020 (UTC)
  • @Kees08: Thank you for your patience as I work through these one at a time (I should be going oldest to newest but somehow went the other direction); it's a lot of work to program the bot for each job. On Enwiki the first task is small enough that it might be done manually by someone. On Commons there are 300 or so; I could probably make a quick search-transform-replace script that would get most of them. However, on Commons, when modifying an image's "Source: " URL, some people complain that the new link may or may not be the original source of the Commons image. I can see the point, though at the same time maintaining a dead link for the source doesn't seem very useful unless an archive URL can be found. I'm not sure I understand the GRIN idea: how would one determine that http://grin.hq.nasa.gov/ABSTRACTS/GPN-2000-001167.html equates to http://images.nasa.gov/details-S70-36485? -- GreenC 21:50, 13 February 2020 (UTC)
    No worries on the timing; there is no rush on this. I was just seeing if there was a technical reason you had not responded to it. No problem! In this specific GRIN case, the workflow would be something like:
    1. Detect dead GRIN link
    2. Find ID numbers:
      1. http://grin.hq.nasa.gov/ABSTRACTS/GPN-2000-001167.html
      2. http://dayton.hq.nasa.gov/IMAGES/LARGE/GPN-2000-001167.jpg
      3. {{NASA-image|id=GPN-2000-001167|alternateid=S70-36485|center=JSC}}
    3. Use NASA Image Library API to determine if either ID number returns a valid page
    4. Replace GRIN link with NASA Image Library link
    Does that workflow make a little more sense? I would guess there are many dead GRIN links as that used to be NASA's main library before they moved it.
    On the note about replacing text, where people complain about removing the dead link: perhaps you could add a dead link template to the dead one and add the live link separately? Although I personally, in these cases, would prefer to just remove the dead link. Kees08 (Talk) 23:47, 13 February 2020 (UTC)
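Steps 2 and 3 of the workflow above can be sketched as follows. The regular expressions, the function names and the use of the `search?q=` form of the images-api.nasa.gov endpoint (taken from the example queries in the reply below) are assumptions. Per that reply, GRIN-style GPN IDs may not return results, so in practice only the alternate IDs are likely to help. The sketch only builds the query URLs; fetching them and checking the results is left out.

```python
import re
from urllib.parse import quote

# Hypothetical patterns for the ID sources listed in step 2.
GRIN_URL = re.compile(
    r"(?:grin|dayton)\.hq\.nasa\.gov/(?:ABSTRACTS|IMAGES/LARGE)/(GPN-\d{4}-\d{6})\.(?:html|jpg)"
)
NASA_IMAGE_TEMPLATE = re.compile(r"\{\{NASA-image(.*?)\}\}", re.S)
ID_PARAM = re.compile(r"\|\s*(id|alternateid)\s*=\s*([^|}]+)")


def candidate_ids(link: str, file_page_wikitext: str = "") -> list[str]:
    """Collect possible image IDs from a dead GRIN link and any {{NASA-image}} template."""
    ids = []
    m = GRIN_URL.search(link)
    if m:
        ids.append(m.group(1))
    for params in NASA_IMAGE_TEMPLATE.findall(file_page_wikitext):
        for _name, value in ID_PARAM.findall(params):
            ids.append(value.strip())
    return list(dict.fromkeys(ids))  # deduplicate, keep order


def api_search_urls(ids: list[str]) -> list[str]:
    """One images-api.nasa.gov search query per candidate ID."""
    return [f"http://images-api.nasa.gov/search?q={quote(i)}" for i in ids]
```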
Ok, I wasn't aware of an API, but I tried it without luck. For example, it works using the new NASA ID http://images-api.nasa.gov/search?q=S70-36485 (taken from the above details-S70-36485), but a search for a GRIN ID http://images-api.nasa.gov/search?q=2000-001167 (and variants) does not. I wonder if the API is aware of GRIN IDs? -- GreenC 03:04, 14 February 2020 (UTC)
Bummer. I sent an inquiry to NASA to see if the GRIN IDs are mapped to the NASA center specific IDs. Will let you know if I hear anything back. Kees08 (Talk) 02:01, 22 February 2020 (UTC)
@GreenC: Here is the response back I received: GRIN was created and maintained by HQ History Office but they since exported most of the imagery over to Flickr under “NASA Commons” They don’t use the GRIN numbers at all anymore. We only have 25 images in our database that cross reference a GPN #. I know if you put the GPN number in archive.org website (NON government site) it comes up with the image. Sometimes it will have a NASA center image ID too. http://archive.org/details/GPN-2000-001167
So sounds like we are out of luck when it comes to mapping the ID numbers. Do you have any idea how many links could be fixed with case 3 above, where we have a non-GRIN ID (such as S70-36485)? That would still be majorly helpful. Kees08 (Talk) 20:12, 25 February 2020 (UTC)
I'm not sure, sorry. This is a complex task and I'm not sure when, or if, I will be able to do it. I don't think the numbers are very large; you could refactor the request given what we learned and try Village Pump (technical), where there are some good programmers who might take it up. -- GreenC 16:19, 1 March 2020 (UTC)
No worries, I will try that, thanks. Kees08 (Talk) 17:33, 13 March 2020 (UTC)

500 obsolete usda links

We have nearly 500 links to ndb.nal.usda.gov/ndb/[5] which now return this redirect:

"As of October 1, 2019, this website (http://ndb.nal.usda.gov/ndb/) will no longer be available and users will be automatically redirected to FoodData Central..."

Picking two at random, this URL...

http://ndb.nal.usda.gov/ndb/foods/show/2950

...brings you to this page:

http://fdc.nal.usda.gov/fdc-app.html#/?query=ndbNumber:11197

and this URL...

http://ndb.nal.usda.gov/ndb/foods/show/105?fg=&man=&lfacet=&format=&count=&max=25&offset=25&sort=&qlookup=yogurt

...brings you to this page:

http://fdc.nal.usda.gov/fdc-app.html#/?query=ndbNumber:1116

Would this be a good candidate for an automated fix, or does someone have to manually fix all 500 before the original URL disappears? --Guy Macon (talk) 05:49, 6 January 2020 (UTC)

It doesn't seem like there is immediate danger of the redirects disappearing. They assured us the redirects would be in place as of Oct. 1, which is the case. But I agree it's a good idea to move URLs while the redirects still exist. I'm way behind on projects and will keep this one in the queue, but it might be a while before I can program it. Most of the time, cases like this are more complex than they appear (some URLs don't have redirects, some do but lead to 404 pages, etc.). -- GreenC 04:15, 7 January 2020 (UTC)
The number in the URL data is just a sequential identifier. For low numbers, it corresponds to the ndbNumber, but that number starts skipping values. That means we'd have to do some sort of lookup, and updating based on the redirects that are in place is probably easiest. --AntiCompositeNumber (talk) 04:18, 3 March 2020 (UTC)
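Because the old number is not the ndbNumber, the rewrite needs a lookup table rather than arithmetic. A minimal sketch, assuming the table is built from the live redirects; here it holds only the two pairs quoted above, and the helper name is hypothetical. Unknown IDs return None so they can be left for manual review instead of guessed.

```python
import re
from typing import Optional

# Old NDB sequential ID -> FoodData Central ndbNumber, as observed from redirects.
# Only the two pairs quoted above are filled in; a real run would collect the rest.
OBSERVED_REDIRECTS = {2950: 11197, 105: 1116}

OLD_NDB = re.compile(r"ndb\.nal\.usda\.gov/ndb/foods/show/(\d+)")


def fdc_url(old_url: str, table: dict = OBSERVED_REDIRECTS) -> Optional[str]:
    """Map an old NDB food URL to its FoodData Central URL, or None if unknown."""
    m = OLD_NDB.search(old_url)
    if not m:
        return None
    ndb_number = table.get(int(m.group(1)))
    if ndb_number is None:
        return None  # leave for manual review rather than guess
    return f"http://fdc.nal.usda.gov/fdc-app.html#/?query=ndbNumber:{ndb_number}"
```

Query strings on the old URL (such as the `?fg=...&qlookup=yogurt` example) are ignored, since only the numeric ID determines the target.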

kodak-worldREMOVETHIS.com now hosts malware

Kodak-worldREMOVETHIS.com was changed to www.officialkodakblack.com but URLs within the site do not necessarily map cleanly. The old URL now hosts malware (Signpost coverage, "Beware of malware", screen shot from Kaspersky).

Please change http://kodak-world.REMOVETHIScom/?page_id=24 (Biography of Kodak Black) to http://web.archive.org/web/20170103124913/http://kodak-world.com?page_id=24 and change the main URL where it appears by itself (such as in "Official web site" links) to www.officialkodakblack.com. Change any other uses to a non-recent, non-poisoned version on http://web.archive.org or a similar archive site, or on www.officialkodakblack.com if it exists, and flag the rest for manual handling.

I found only a few instances of this in a manual sweep of Kodak Black articles in 14 languages so this task may already be complete. ru:Kodak Black, uk:Kodak Black, and fr:Kodak Black are now clean. However, we do need to scan the entire project for other instances of the poisoned web site. Previous discussion which pointed me here is at Wikipedia:Village_pump_(technical)#Should we be checking for links to the Shlayer trojan horse?(permalink). davidwr/(talk)/(contribs) 15:13, 31 January 2020 (UTC)

@Davidwr: It exists in one article. This page is for custom bot (programming) help, like 100s or 1000s. -- GreenC 16:19, 31 January 2020 (UTC)
Thanks GreenC. I don't know how I missed the English version. In any case, is there an easy way to request that the entire wikimedia/wikipedia space, across all languages and projects, be scanned for this URL? More generally, is there an easy way to do a wikimedia/wikipedia-wide scan of URLs that are currently "toxic"? davidwr/(talk)/(contribs) 17:05, 31 January 2020 (UTC)
As for scanning all 300+ language wikis, this Google search has some results, though it is missing the Enwiki so may not be complete. It would be a good question for Village Pump Tech as there might be a tool for searching across all languages. -- GreenC 21:56, 31 January 2020 (UTC)

springerlink.com

Since a month or two ago, springerlink.com has stopped working. Now all 3500 links from articles are a 404 like this, served by a supposed "UltraDNS client redirection service" with "Copyright © 2001-2008 NeuStar".

The good news is that a request to the Internet Archive can reveal the current location, for instance [6] redirects to [7] (and then [8] which can be ignored). Because the new URLs contain the DOI, they can then be translated in a more permanent doi.org URL. Nemo 08:17, 6 February 2020 (UTC)

It's worth a shot: see what archive.org returns and, if it finds something, make the change. The hardest part will be the "Springer <whatever>" text that can appear in the title, work and publisher fields, in square brackets, or as free-floating text inside or outside a ref. I'll start in on this next. -- GreenC 05:13, 12 February 2020 (UTC)
Nemo, following the example, the URL is http://doi.org/10.1007%2Fs12132-009-9048-y, which redirects to link.springer.com. It looks like they replaced springerlink.com with link.springer.com. I'll leave the metadata alone since it ends up at Springer anyway, and just replace the springerlink.com URLs with doi.org URLs where possible. -- GreenC 15:47, 12 February 2020 (UTC)
Yes, changing the URL should be enough. One could replace springerlink.com + whatever with link.springer.com + DOI, but while we're at it better use the doi.org resolver so we don't have to do this again in 5 or 10 years from now. Nemo 19:25, 12 February 2020 (UTC)
OK, after some testing it seems there is no point adding a doi.org URL when an existing |doi= has the same DOI, so in those cases the net effect will be deletion of the |url= field (or |chapter-url=, or wherever the link is). -- GreenC 21:10, 12 February 2020 (UTC)
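The rule that came out of this exchange (use a doi.org URL, but drop the URL entirely when |doi= already holds the same DOI) can be sketched as below. The function names are hypothetical, and the assumption that the DOI appears verbatim (possibly percent-encoded) in the resolved URL, as in the doi.org and link.springer.com examples, should be checked per link. DOIs are compared case-insensitively, since DOI names are case-insensitive.

```python
import re
from typing import Optional
from urllib.parse import unquote

# A DOI is "10." + registrant code + "/" + suffix.
DOI_IN_URL = re.compile(r"(10\.\d{4,9}/[^?#\s]+)")


def doi_from_url(url: str) -> Optional[str]:
    """Pull a DOI out of a URL such as link.springer.com/article/<DOI>."""
    m = DOI_IN_URL.search(unquote(url))
    return m.group(1) if m else None


def new_url_for_citation(resolved_url: str, existing_doi: Optional[str]) -> Optional[str]:
    """Replacement URL, or None when the URL field should simply be dropped.

    Raises ValueError when no DOI is found, so the caller can fall back to an archive URL.
    """
    doi = doi_from_url(resolved_url)
    if doi is None:
        raise ValueError(f"no DOI in {resolved_url}")
    if existing_doi and existing_doi.strip().lower() == doi.lower():
        return None  # |doi= already links to the same target
    return f"https://doi.org/{doi}"
```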
That's fine! Citation bot can then easily finish the job. (Let me know if you're interested in running it yourself on those pages and you can use tips on how to do so.) Nemo 21:24, 12 February 2020 (UTC)
  • Done (I hope). Saved about 4,071 links. This includes deletions when |doi= already exists. Another 1,000 archive URLs were added when no DOI could be found, and archive URLs were removed when a doi.org URL could be found. [dead link] was added when no archive or DOI was discovered. Operations covered CS1|2 templates and square and bare links, in Mainspace, File:, Wikipedia: and Template:. -- GreenC 21:25, 13 February 2020 (UTC)
Yes, if a link is not in a template, the bot has little option but to archive it, because the alternative is to delete the URL, which can't be done safely since it could create smoking craters. The "Minsky moment" diff looks like an oversight in the code, but you are right that Citation bot should pick those up in time. As for the ISSN and ISBN links, it's hard to say why they were kept without seeing them in context. -- GreenC 04:30, 18 February 2020 (UTC)
The ISSN are usually ancient batch additions which serve no purpose whatsoever because there's usually another link to the current homepage, plus there's always a link via ISSN or (for articles) other identifiers. Some were links to an RSS function which no longer exists. I've removed them now (some remain in Wikidata, hopefully will be taken care of). Nemo 08:06, 18 February 2020 (UTC)

U.S. Census Bureau domain factfinder.census.gov shutting down on 31 March 2020

The domain factfinder.census.gov will be taken offline on 31 March 2020.

As per http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml:

Most data previously released on AFF are now being released on the U.S. Census Bureau's new dissemination platform, data.census.gov. For more information about the transition from American FactFinder to data.census.gov, see Transition From AFF. Included on this page are information on historic AFF data, documentation on updating AFF links, and resource materials, including tutorials, webinars, and how-tos on using data.census.gov. If you have questions or comments, please email: cedsci.feedback@census.gov.

There are over 4,600 Wikipedia articles directly referencing this domain, as well as several templates that reference it, and over 40,000 Wikipedia articles use those templates. — Preceding unsigned comment added by Fabrickator (talk • contribs)

This is hugely important, but also hugely complex, plus most links are hidden inside custom templates that would need to be parsed. There are two ways to approach it: 1) unwind the templates by converting them to {{cite web}}, treat the links as dead and add an archive URL; or 2) find the corresponding new URL at data.census.gov. The problem with technique #1 is that the FactFinder site uses web 2.0 type stuff that the Wayback Machine has trouble archiving, so it won't be much help. Archive.today does better, but most of the links are not saved there. #2 is the ideal solution, but mapping URLs between the old and new sites looks very complicated. There are two documents (ominously, two 20-page "deep linking guides"), one for the old site and one for the new site; the trick is to learn how to map between them and write software that can do it. -- GreenC 20:47, 8 February 2020 (UTC)

Discussion moved to WP:USCENSUS -- GreenC 03:31, 12 February 2020 (UTC)

Second shortcut WP:USCENSUSLINKS created, since USCENSUS is a confusing name for a shortcut. Will discuss on Wikipedia talk:US Census Migration shortly. davidwr/(talk)/(contribs) 19:16, 12 February 2020 (UTC)

rpc.ift.org.mx

Technical and legal authorizations from the Mexican Federal Telecommunications Institute's Registro Público de Concesiones (RPC) are cited in hundreds of articles about Mexican broadcasting. There are 1,290 citations from the domain rpc.ift.org.mx which hosts the PDF documents.

On January 31, 2020, the RPC began serving HTTPS only. In addition, they added a "v" to the URL path, so URLs that were formerly

http://rpc.ift.org.mx/rpc/pdfs/96255_181211120729_7489.pdf

changed to

https://rpc.ift.org.mx/vrpc/pdfs/96255_181211120729_7489.pdf

This will particularly be needed for Mexican radio and TV articles, as well as the lists that use them on eswiki (such as es:Anexo:Estaciones de radio en el estado de Michoacán). I am doing some high-link-count articles, like Imagen Televisión, manually. Raymie (tc) 02:13, 9 February 2020 (UTC)
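The RPC change is a pure string rewrite, so it can be sketched as one regular-expression substitution. This assumes every affected link follows the `/rpc/pdfs/` pattern shown above; the helper name is hypothetical.

```python
import re

# Old form: http(s)://rpc.ift.org.mx/rpc/pdfs/<file>.pdf
OLD_RPC = re.compile(r"https?://rpc\.ift\.org\.mx/rpc/pdfs/")


def fix_rpc_url(text: str) -> str:
    """Rewrite old RPC PDF links to the HTTPS-only /vrpc/ path; other text is unchanged."""
    return OLD_RPC.sub("https://rpc.ift.org.mx/vrpc/pdfs/", text)
```

The same call works on a single URL or on a whole page of wikitext, and because `/vrpc/` does not match the pattern, running it twice is harmless.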

I've done the above, so we've gone from 1,290 links out to 560 that need repair. Raymie (tc) 03:27, 9 February 2020 (UTC)
@Raymie: done in 430 articles. -- GreenC 05:03, 12 February 2020 (UTC)
Thank you GreenC for carrying out this continually important work for the project. Raymie (tc) 06:01, 12 February 2020 (UTC)

Thank you, @Raymie. Comments like that help me keep going. In case you want to pursue it further, there are 57 articles on eswiki with the links (listed). My bot doesn't have permissions there. Alternatively, we could make a bot request at [13], but I don't speak Spanish (well). -- GreenC 15:28, 12 February 2020 (UTC)

  • Durango
  • Santiago de Querétaro
  • Canal 5 (México)
  • Celaya
  • Canal 11 (México)
  • Canal 9 (México)
  • Universidad Autónoma del Estado de Hidalgo
  • Imagen Televisión
  • XEQ-TDT
  • Ciudad Mante
  • MVS TV
  • TV UNAM
  • A+ (canal de televisión)
  • Isla Socorro
  • Televisión Independiente de México
  • Sistema Público de Radiodifusión del Estado Mexicano
  • XHTVM-TDT
  • Excélsior TV
  • XHTRES-TDT
  • Ingenio TV
  • XHUDG-TDT
  • XHUNAM-TDT
  • XHCDM-TDT
  • Canal 44 El Canal de las Noticias
  • Canal 28 de Chihuahua
  • Villa Insurgentes
  • Radio Universidad (Chihuahua)
  • TV Azteca Chihuahua
  • XHHEM-FM
  • La Caliente 90.9
  • Expresa TV
  • D95
  • XEROK-AM
  • XHLO-FM
  • XHUS-TDT
  • XHY-TDT
  • XHCHI-FM
  • XHJCI-TDT
  • XEFI-AM
  • XHES-FM
  • Arnoldo Cabada de la O
  • XHSECE-TDT
  • XHHM-FM
  • XHTPG-TDT
  • Canal 13 (México)
  • XHTM-TDT
  • XHHES-FM
  • XHENB-TV
  • XHDT-FM
  • XHIPN-FM
  • XEPL-AM
  • XHBW-FM
  • XHQMGU-TDT
  • XHFAMX-TDT
  • XHK-TV
  • Canal 46 (Ciudad de México)
  • XEJP 1150

Request for change of (soon to be) broken links to LPSN

(thread moved from WP:BOTREQ by GreenC)

The old LPSN website at http://www.bacterio.net is frequently linked to from Wikipedia. Many of these links target LPSN entries for species. Because all species belong to a genus and because LPSN uses one HTML page per genus name, links to LPSN species names are links to anchors within the LPSN page for the corresponding genus name. For instance, on http://en.wikipedia.org/wiki/Acetobacter_aceti we find the link http://www.bacterio.net/acetobacter.html#aceti to the old LPSN page.

As part of an agreement between the old LPSN maintainer, Aidan C. Parte, and the Leibniz Institute DSMZ, LPSN has been taken over by DSMZ to ensure long-term maintenance (see also announcement here). In the course of this takeover, a new website was created. In contrast to the old LPSN website, the new LPSN website at http://lpsn.dsmz.de (currently http://lpsn-dev.dsmz.de) uses individual pages for species names. We will employ the following mapping:

(1) the domain http://www.bacterio.net is permanently redirected to http://lpsn.dsmz.de;

(2) the page address acetobacter.html is mapped to genus/acetobacter, which is the page for the genus Acetobacter on the new LPSN website.

This means, however, that http://www.bacterio.net/acetobacter.html#aceti is mapped to http://lpsn.dsmz.de/genus/acetobacter and not to http://lpsn.dsmz.de/species/acetobacter-aceti, the page for the species on the new LPSN website, where it ought to go. The reason for this limitation is that the browser does not even send the anchor aceti to the server, so the website cannot process it. While http://lpsn.dsmz.de/genus/acetobacter contains links that lead to http://lpsn.dsmz.de/species/acetobacter-aceti, it would be more convenient for the user if http://www.bacterio.net/acetobacter.html#aceti were converted to a link that leads directly to http://lpsn.dsmz.de/species/acetobacter-aceti.
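Since the server never sees the fragment, the species mapping has to be done on the wiki side, where the full link text is available. A minimal sketch, assuming every `bacterio.net/<name>.html` page is a genus page and the fragment is the species epithet; the header dump further down shows that some pages are orders instead, so a real run should confirm each new URL with a header check. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

NEW_SITE = "http://lpsn.dsmz.de"


def lpsn_url(old_url: str) -> Optional[str]:
    """Map bacterio.net/<genus>.html[#<epithet>] to the new genus or species page."""
    parts = urlsplit(old_url)
    if parts.hostname not in ("www.bacterio.net", "bacterio.net"):
        return None
    page = parts.path.strip("/")
    # Skip paths this simple rule does not cover (e.g. "/a/..." or "-number.html").
    if "/" in page or page.startswith("-") or not page.endswith(".html"):
        return None
    genus = page[: -len(".html")].lower()
    if parts.fragment:
        # The fragment holds the species epithet, e.g. acetobacter.html#aceti.
        return f"{NEW_SITE}/species/{genus}-{parts.fragment.lower()}"
    return f"{NEW_SITE}/genus/{genus}"
```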

As LPSN URLs are stored in Wikidata (LPSN), this change should be a doable task with the help of a bot. Therefore we are kindly asking for help to modify all Wikipedia links to LPSN species pages as described above. Tobias1984: you did a great job in the past, helping us with BacDive: is there a chance that you could help us again with this issue? --L.C.Reimer

@L.C.Reimer: I can help with this but wanted to get the request moved to the right place. -- GreenC 03:27, 14 February 2020 (UTC)

L.C.Reimer -- When would http://lpsn.dsmz.de be ready for the change? Seeing about 13,000 links. -- GreenC 04:18, 14 February 2020 (UTC)

@GreenC: We would appreciate your help very much. We will launch the new site and activate the redirect beginning next week. I will give here a note, when it is done.--L.C.Reimer

This is a very useful and thoughtful request for URL update, but I'd like to note that it ought to be possible for the target website to redirect the requests based on the fragment, if you use JavaScript. MediaWiki for instance rewrites some of its URLs when you're redirected. Nemo 09:43, 14 February 2020 (UTC)
Nemo, thank you for the hint. We have discussed this solution, but it would mean another redirect, and we already have two redirects. We believe this would negatively affect SEO. Clean links are preferable anyway, and I hope that with the aid of GreenC we are able to clean them up and maintain them. We have just launched the new site and the redirects are now active, which means we could start with the bot. @GreenC: perhaps we should discuss the details directly?--L.C.Reimer
L.C.Reimer, on closer look there are two types of links on Wikipedia. For example, Yersinia aldovae has two links to bacterio.net. One is in the "External links" section, a normal URL placed directly in the page. The other is in the bottom graphic labeled "Taxon identifiers"; this is the template {{taxonbar}}, which pulls the URL from Wikidata. I am able to fix the first type, but not the second. For Wikidata requests you could try [14]. The other problem is that my processes only update English Wikipedia (and Commons). Since there are about 300 language wikis, Wikipedia-wide changes are a challenge: each language wiki is its own organization, where tools customized for that language and the necessary permissions must be secured separately, e.g. ar.wikipedia.org requires tools customized for Arabic and permission from the Arabic community to make these changes with a bot. I would suggest, if you are able, creating and maintaining redirects. Nevertheless, if you would like me to convert the in-wiki links on Enwiki, I can do that. -- GreenC 23:23, 18 February 2020 (UTC)
On Enwiki, there are 6,487 links in 6,386 articles that might be converted. The rest are imported from Wikidata via templates like {{taxonbar}}. -- GreenC 00:53, 19 February 2020 (UTC)
GreenC Thank you for the explanations. We would be happy, if you could convert the links in Enwiki. We will deal with the links in wikidata separately, as we want to make sure to have clean URLs for future entries anyway. Regarding all the other language wikis we will have a closer look, what we can do.--L.C.Reimer

L.C.Reimer, a couple new issues.

  • 1. In this list, there are some links that 404: http://www.bacterio.net/a/acetoanaerobium.html has an extra "/a/" in the path (there is also "/m/" and other letters). Some links have a leading "-", like http://www.bacterio.net/-number.html. For now, I guess the bot will verify that the new URL works with a header check before making the change, and otherwise leave the link as-is; these look like low-volume exceptions.
For "/a/" it seems that simply removing it works; so http://www.bacterio.net/a/acetoanaerobium.html --> http://www.bacterio.net/acetoanaerobium.html --> http://lpsn.dsmz.de/genus/acetoanaerobium. -- GreenC 20:01, 19 February 2020 (UTC)
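The "/a/" fix amounts to dropping a one-letter directory directly under the domain before the normal redirect applies. A minimal sketch with a hypothetical helper name; it assumes the extra directory is always a single letter, as in the "/a/" and "/m/" examples.

```python
import re

# A one-letter directory right after the bacterio.net host, e.g. /a/ or /m/.
LETTER_DIR = re.compile(r"^(https?://(?:www\.)?bacterio\.net)/[a-z]/", re.IGNORECASE)


def drop_letter_dir(url: str) -> str:
    """Remove the extra one-letter directory; other URLs are returned unchanged."""
    return LETTER_DIR.sub(r"\1/", url, count=1)
```

A header check on the result is still worthwhile, since the cleaned URL only helps if the redirect behind it resolves.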
  HTTP/1.1 301 Moved Permanently
  Date: Wed, 19 Feb 2020 18:32:22 GMT
  Server: Apache
  Location: http://lpsn.dsmz.de/bacillales.html
  Content-Length: 244
  Content-Type: text/html; charset=iso-8859-1
  Via: 1.1 varnish (Varnish/6.3), 1.1 varnish (Varnish/6.3)
  X-Cache-Hits: 0
  X-Cache: MISS
  Age: 0
  Connection: keep-alive
  HTTP/1.1 301 Moved Permanently
  Date: Wed, 19 Feb 2020 18:32:23 GMT
  Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1c mod_fcgid/2.3.9
  X-Powered-By: PHP/7.3.5
  Location: /order/bacillales
  Content-Length: 0
  Content-Type: text/html; charset=UTF-8
  HTTP/1.1 200 OK
  Date: Wed, 19 Feb 2020 18:32:23 GMT
  Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1c mod_fcgid/2.3.9
  X-Powered-By: PHP/7.3.5
  Vary: Accept-Encoding
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=UTF-8

The second Location: line contains /order/bacillales which is added onto the domain name found in the first Location line. There are probably other paths besides /order/ we don't know about yet. -- GreenC 19:44, 19 February 2020 (UTC)
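A relative Location value is resolved against the URL that returned it (HTTP allows relative references in Location), and Python's `urllib.parse.urljoin` performs exactly that resolution. A sketch with hypothetical helper names; the starting bacterio.net URL is assumed, since the dump does not show the original request. Reading the first path segment of the final URL is one way to spot other paths besides /order/.

```python
from urllib.parse import urljoin, urlsplit


def follow_locations(start_url: str, locations: list[str]) -> str:
    """Resolve a chain of Location headers, each relative to the URL that returned it."""
    current = start_url
    for location in locations:
        current = urljoin(current, location)
    return current


def lpsn_page_type(url: str) -> str:
    """First path segment of an LPSN URL, e.g. 'genus', 'species' or 'order'."""
    return urlsplit(url).path.strip("/").split("/")[0]
```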

Results

@L.C.Reimer: The bot has completed. It converted 11,355 links in 5,718 articles (the previous link count of 6,487 was incorrect). All links were tested as working (header status code 200). Some typical diffs:

It was unable to convert 1,240 links because the new URL doesn't work (header status 404). Can provide a list of those if you want, most of them appear to be related to Streptomyces. -- GreenC 02:29, 20 February 2020 (UTC)

www.bacterio.cict.fr

Found these: [19] -- GreenC 14:47, 20 February 2020 (UTC)

It converted 371 links in 343 articles. Examples: [20][21]. It was unable to convert 260 links; a list of these is available on request. -- GreenC 15:32, 20 February 2020 (UTC)

SR/Olympics will die soon

Hello! Please archive SR/Olympics (link) before 1 March, when the site is going to die. I am posting this request here per GreenC's comment on Cyberpower678's talk page. Please note that this problem affects every wiki. Thanks in advance! Bencemac (talk) 18:36, 17 February 2020 (UTC)

Update on progress: I was able to determine that there are about 156,000 unique URLs of www.sports-reference.com/olympics across all wiki projects and languages. I've set up a script to check each against Wayback and, if it's not there, issue a save command. It takes about 2 seconds per URL, so that is about 3.5 days; it should be done by mid-day Monday the 24th, assuming no problems. Once the saves are done, the next step will be to update the IABot database with the new archive URLs. -- GreenC 03:58, 21 February 2020 (UTC)
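The check-then-save loop can be sketched against two public Internet Archive endpoints: the Wayback availability API (`archive.org/wayback/available?url=`) and Save Page Now (`web.archive.org/save/<url>`). This is not GreenC's script; the function names are hypothetical, and the HTTP fetch is passed in as a function so the logic can be shown without network access. The last helper reproduces the runtime estimate: 156,000 URLs at 2 seconds each is 312,000 seconds, roughly three and a half days.

```python
import json
from typing import Callable, Iterable
from urllib.parse import quote

AVAILABILITY_API = "https://archive.org/wayback/available?url="
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def availability_query(url: str) -> str:
    """Availability API request for one URL (percent-encoded as a query value)."""
    return AVAILABILITY_API + quote(url, safe="")


def has_snapshot(availability_json: str) -> bool:
    """True if the availability API response lists an available closest snapshot."""
    data = json.loads(availability_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def save_requests(urls: Iterable[str], fetch: Callable[[str], str]) -> list[str]:
    """Save Page Now requests for every URL that has no snapshot yet.

    `fetch` is any function that performs an HTTP GET and returns the body as text.
    """
    return [SAVE_PAGE_NOW + u for u in urls
            if not has_snapshot(fetch(availability_query(u)))]


def estimated_days(n_urls: int, seconds_each: float = 2.0) -> float:
    """Sequential runtime estimate, in days."""
    return n_urls * seconds_each / 86_400
```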

Did the first part of this go well? Is it time for phase 2? Kees08 (Talk) 20:14, 25 February 2020 (UTC)
Yes, everything is saved at Wayback; it could go offline today and we have everything needed. There are over a quarter million links (not the 156,000 I thought), so every step takes a long time. 1. Find all links across all Wiki projects (a day). 2. Save those links at Wayback (4 days). 3. Find all links in the IABot database by downloading the entire database (5 days). 4. Find which links exist in the Wikis but not in the IABot database (2 days). 5. Add those missing links to the IABot database (3 days). 6. Upload archive URLs into the IABot database (TBD). 7. Run IABot on all articles (TBD). I'm in the middle of step 5. Step 6 will take a long time, many days, and so will step 7. There might be a step 8, running WaybackMedic, but that would take weeks; I have not decided whether it will be needed, depending on the step 7 results. -- GreenC 21:46, 25 February 2020 (UTC)
GreenC, thank you very much! :) Bencemac (talk) 08:01, 27 February 2020 (UTC)
This is great news, thanks a lot for your hard work! Teemeah 편지 (letter) 09:29, 27 February 2020 (UTC)
It turns out the IABot database has separate records for http and https; it sees them as totally different URLs. And, oddly, sports-reference.com sometimes has pages available under both, or only under one or the other. I will need to redo steps 5 and 6 for those. -- GreenC 15:06, 27 February 2020 (UTC)
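Step 4 is a set difference, and the http/https issue means it has to compare full URLs with the scheme included; normalizing the scheme away would hide exactly the missing records described here. A minimal sketch with hypothetical names; the optional flag also requests the other-scheme variant of each URL, on the assumption that the database should hold a record for both forms.

```python
from urllib.parse import urlsplit, urlunsplit


def other_scheme(url: str) -> str:
    """Swap http <-> https, leaving the rest of the URL untouched."""
    parts = urlsplit(url)
    swapped = "https" if parts.scheme == "http" else "http"
    return urlunsplit(parts._replace(scheme=swapped))


def missing_from_db(wiki_urls: set[str], db_urls: set[str],
                    include_other_scheme: bool = False) -> set[str]:
    """URLs used on the wikis that have no record in the database.

    Comparison is on the exact URL, scheme included, because the database
    treats the http:// and https:// forms as different records.
    """
    wanted = set(wiki_urls)
    if include_other_scheme:
        wanted |= {other_scheme(u) for u in wiki_urls}
    return wanted - db_urls
```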
  • There are some links that don't work, example: [22] -> [23] -> [24] .. these links worked at one time, went dead before they were archived, and now we have no archives for them. Not sure the percentages but there will be some {{dead link}} once IABot starts running. -- GreenC 16:13, 1 March 2020 (UTC)
  • @Bencemac: I have run into a problem. See [25]. Notice the log at the bottom. It says "Kleivas(lvwiki) Set all links in the domain www.sports-reference.com to Alive". This overwrote the work I did to change the olympics links to Dead (about a quarter million links) which took about 8 days to process. So we are back to ground zero -- all links are now Live. Basically, I can't set the olympics links to dead until the site is actually dead, otherwise anyone will change the entire domain to Live at any time nullifying the work to set them dead. @Cyberpower678: if you have any thoughts. -- GreenC 16:23, 5 March 2020 (UTC)
I see, although only admin and root permissions give you access to changedomaindata. Does it mean that Kleivas is one of them? Few clicks and your work is gone, very unfortunate; I am sorry about it. Bencemac (talk) 17:57, 5 March 2020 (UTC)
Looking more closely, Kleivas is a user on lvwiki and has never logged into the interface. It was done by IABot through some internal logic that looks at reverts of IABot edits (on wiki), as done by Kleivas in this case, and makes a decision to change the domain status ie. if a user says the domain is working by way of reverting IABot, then it resets the domain status to live. Hrmph. - GreenC 18:12, 5 March 2020 (UTC)

Added another Phab related to this: T248641 .. -- GreenC 23:24, 26 March 2020 (UTC)

Spaceflight101 may die

Sorry to copy the title of the previous section :). At an unknown date Spaceflight101.com will go away. The notice on the homepage says it is paid through to the end of 2019. Though the Twitter says On Hiatus. Revival planned ... late 2020. It may be prudent to archive all of the Spaceflight101.com links just in case the revival does not happen. If you think it is a waste of time we can just see what happens, not a big deal. Kees08 (Talk) 17:40, 13 March 2020 (UTC)

IABot and/or nomore404 should have automatically archived every link by now, a check of a few shows so. There are a couple hundred total. -- GreenC 17:52, 13 March 2020 (UTC)

ECOSecretariat.org usurped

ECOSecretariat.org has been usurped by an unrelated website. There were 24 cites/ELs that I recovered and marked, and one that I was unable to recover and marked dead. —[AlanM1 (talk)]— 21:24, 18 March 2020 (UTC)

I blacklisted it in the IABot interface. -- GreenC 20:51, 19 March 2020 (UTC)