User:SchlurcherBot

SchlurcherBot

Function overview: Convert links from http:// to https://

Rationale:

Programming language: C#

Source code available: Main C# script: commons:User:SchlurcherBot/LinkChecker

Namespaces: This bot edits only in namespaces 0 (Main) and 6 (File).

Function details: The link checking algorithm is as follows:

  1. The bot extracts all http-links from the parsed html code of a page.
    • It searches for all href elements and extracts the links.
    • It does not search the wikitext, and thus does not rely on any Regex.
    • This is also to avoid any problems with templates that modify links (like archiving templates).
    • Links that are subsets of other links are filtered out to minimize search and replace errors.
  2. The bot checks if the identified http-links also occur in the wikitext, otherwise they are skipped.
  3. The bot checks if both the http-link and the corresponding https-link are accessible.
    • This step also uses a blacklist of domains that were previously identified as not accessible.
  4. If both links redirect to the same page, the http-link will be replaced by the https-link (the link is not changed to the redirect target; the original link path is kept).
  5. If both links are accessible and return a success code (2xx), it will be checked if the content is identical.
    1. If the content is identical and the link points directly to the host, then the http-link will be replaced by the https-link.
    2. If the content is identical but the link does not point directly to the host, the content is also compared with the content of the host link. The http-link will be replaced by the https-link only if the two differ.
      • This step is added as some hosts return the same content for all their pages (like most domain sellers, some news sites or pages in ongoing maintenance).
    3. If the content is not identical, it will be checked if the content is at least 99.9% identical (calculated via the en:Levenshtein distance).
      • This step is added as most homepages use dynamic IDs for certain elements, such as ad containers, to circumvent ad blockers.
    4. If the content is at least 99.9% identical, the same host check as before will be performed.
    5. If any of the checked links fails (for example with code 404), the link is left unchanged.
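
Steps 3 to 5 can be sketched as a pure decision function. The following Python sketch is an illustration only and is not the bot's C# code from commons:User:SchlurcherBot/LinkChecker. The names `Fetch`, `similarity`, `is_host_link` and `should_replace` are hypothetical, and the sketch assumes that both responses (and, for links that do not point to the host, the host page) were already fetched with redirects followed. Reading "redirect to the same page" as "same final URL" and comparing the http body with the host body are interpretations of the description above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Fetch:
    """Hypothetical result of requesting one URL with redirects followed."""
    status: int     # final HTTP status code
    final_url: str  # URL reached after redirects
    body: str       # response body as text


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Return 1.0 for identical strings, lower values for larger distances."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def is_host_link(url: str) -> bool:
    """True if the URL points at the host itself, with no path or query."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query


def should_replace(url: str, http: Fetch, https: Fetch,
                   host_body: Optional[str] = None,
                   threshold: float = 0.999) -> bool:
    """Decide whether the http-link `url` may be rewritten to https."""
    # Step 5.5: if either request failed, nothing is changed.
    if not (200 <= http.status < 300 and 200 <= https.status < 300):
        return False
    # Step 4 (assumed reading): both requests ended at the same final URL.
    if http.final_url == https.final_url:
        return True
    # Steps 5.1 and 5.3: identical content, or at least 99.9% identical.
    if similarity(http.body, https.body) < threshold:
        return False
    # Step 5.1: a link directly to the host needs no further check.
    if is_host_link(url):
        return True
    # Steps 5.2 and 5.4: reject hosts that serve the same content everywhere.
    return host_body is not None and http.body != host_body
```

The pure-Python edit distance is quadratic in the page length, so this form is only suitable for illustrating the order of the checks, not for processing large pages.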

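The link extraction in steps 1 and 2 can be illustrated in the same spirit. This Python sketch is not the bot's C# code; `HrefCollector` and `candidate_links` are hypothetical names, and reading "subsets of other links" as "substrings of another extracted link" is an assumption. It collects href values from the parsed HTML, drops links contained in a longer link, and keeps only links that also occur literally in the wikitext.

```python
from html.parser import HTMLParser
from typing import List


class HrefCollector(HTMLParser):
    """Collects the href value of every tag that carries one."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def candidate_links(parsed_html: str, wikitext: str) -> List[str]:
    """Return the http-links that are safe to search and replace."""
    collector = HrefCollector()
    collector.feed(parsed_html)
    # Keep each http-link once, in order of first appearance.
    http_links = list(dict.fromkeys(
        href for href in collector.hrefs if href.startswith("http://")))
    # Assumed meaning of "subset": drop a link that is contained in a
    # longer link, because replacing it would also alter the longer one.
    independent = [
        link for link in http_links
        if not any(link != other and link in other for other in http_links)
    ]
    # Step 2: skip links that do not occur literally in the wikitext,
    # for example links generated by templates.
    return [link for link in independent if link in wikitext]
```

For a page whose HTML links to both `http://example.org/a` and `http://example.org/a/b`, only the longer link is kept, since a plain search for the shorter one would also match inside the longer one.
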
Source for pages: The bot works through the list of pages identified from the external links SQL dump. The list was shuffled so that consecutive edits are not clustered in one specific area.

Further comments: The bot follows the API etiquette by sending a user-agent header and respecting the maxlag parameter.
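
As a rough illustration of these two points, the following Python sketch builds an API request that carries a user-agent header and a `maxlag` parameter, and reads the suggested wait time from a maxlag error response. It is not the bot's C# code. The function names, the bot name in the user-agent string, and the default of 5 seconds are assumptions made for the example.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_api_request(api_url: str, params: Mapping[str, str],
                      user_agent: str, maxlag: int = 5) -> Request:
    """Build a POST request with a User-Agent header and a maxlag parameter."""
    query = dict(params, format="json", maxlag=str(maxlag))
    data = urlencode(query).encode("utf-8")
    return Request(api_url, data=data, headers={"User-Agent": user_agent})


def maxlag_delay(payload: Mapping, headers: Mapping[str, str],
                 default: float = 5.0) -> Optional[float]:
    """Return the seconds to wait if the API reported a maxlag error."""
    error = payload.get("error", {})
    if error.get("code") != "maxlag":
        return None  # no lag error, the request can be processed normally
    try:
        return float(headers.get("Retry-After", default))
    except ValueError:
        return default


# Hypothetical bot name and contact address, for illustration only.
request = build_api_request(
    "https://commons.wikimedia.org/w/api.php",
    {"action": "query", "meta": "siteinfo"},
    "ExampleBot/0.1 (contact: example@example.org)",
)
```

A caller would send `request`, and if `maxlag_delay` returns a number, sleep for that long and retry instead of continuing with further edits.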

Status: (CentralAuth)

Approved as global bot (per this request) and thus flagged as bot on all projects that did not opt out (per this list).

| Project | Request | Pages | Edit description used | Status |
|---|---|---|---|---|
| commonswiki | Approved | 31'145'089 | Fix http to https | Working Waiting |
| dewiki | Approved | 1'888'381 | Bot: http → https | Working Waiting |
| enwiki | Approved | 8'570'327 | Bot: http → https | Working Waiting |
| eswiki | Approved | 2'191'542 | Bot: http → https | Working Waiting |
| frwiki | Approved | 2'970'187 | Bot: http → https | Working Waiting |
| itwiki | Approved | 2'359'233 | Bot: http → https | Working Waiting |
| jawiki | Allows global bots | 994'375 | Bot: http → https | Running… |
| plwiki | Approved | 1'527'763 | Bot: http → https | Running… |
| ptwiki | Approved | 1'214'889 | Bot: http → https | Working Waiting |
| ruwiki | Allows global bots | 1'797'992 | Bot: http → https | Working Waiting |
| zhwiki | Allows global bots | 1'105'051 | Bot: http → https | Working Waiting |
| dewikinews | Approved | 17'280 | Bot: http → https | Done |
| dewikiquote | Pending | 5'673 | Bot: http → https | On hold |
| dewikisource | Approved | 97'284 | Bot: http → https | Done |
| dewikiversity | Approved | 9'301 | Bot: http → https | Done |
| dewikivoyage | Approved | 19'094 | Bot: http → https | Done |
| dewiktionary | Approved | 145'334 | Bot: http → https | Done |
| altwiki | Approved (Village Pump) | 864 | Bot: http → https | Working Waiting |
| arywiki | [1] | 8'953 | Bot: http → https | |
| bnwiki | [2] | 172'842 | বট: http থেকে https-তে পরিবর্তন করছে | |
| bnwikibooks | Approved (Village Pump) | 745 | বট: http থেকে https-তে পরিবর্তন করছে | Working Waiting |
| bswiki | [3] | 79'281 | Bot: http → https | |
| cswikibooks | [4] | 601 | Bot: http → https | |
| cswikisource | [5] | 46'231 | Bot: http → https | |
| cswikiversity | [6] | 1'946 | Bot: http → https | |
| dvwiki | [7] | 1'088 | Bot: http → https | |
| enwikisource | [8] | 117'079 | Bot: http → https | |
| enwiktionary | [9] | 443'242 | Bot: http → https | |
| eswikibooks | [10] | 3'417 | Bot: http → https | |
| eswikinews | [11] | 14'339 | Bot: http → https | |
| eswikisource | Approved (Village Pump) | 5'826 | Bot: http → https | Done |
| frwikibooks | [12] | 7'632 | Bot: http → https | |
| frwikinews | [13] | 19'721 | Bot: http → https | |
| frwikisource | Pending | 42'309 | Bot: http → https | On hold |
| frwikiversity | [14] | 4'126 | Bot: http → https | |
| frwikivoyage | [15] | 8'536 | Bot: http → https | |
| frwiktionary | [16] | 532'493 | Bot: http → https | |
| fywiki | Approved (Village Pump) | 30'516 | Bot: http → https | Running… |
| glwiki | [17] | 213'696 | Bot: http → https | |
| hewikibooks | Pending (Village Pump) | 1'660 | Bot: http → https | On hold |
| hewikisource | Pending | 98'820 | Bot: http → https | On hold |
| hewikivoyage | Allows global bots | 2'038 | Bot: http → https | |
| hewiktionary | Pending (Village Pump) | 6'559 | Bot: http → https | On hold |
| hiwiktionary | Pending (Village Pump) | 4'970 | Bot: http → https | On hold |
| hrwikibooks | Pending (Village Pump) | 428 | Bot: http → https | On hold |
| hrwikiquote | Pending (Village Pump) | 1'254 | Bot: http → https | On hold |
| huwikibooks | Pending (Village Pump) | 18'488 | Bot: http → https | On hold |
| huwikisource | Pending (Village Pump) | 7'222 | Bot: http → https | On hold |
| idwiki | [18] | 673'383 | Bot: http → https | |
| iswiki | Approved | 30'026 | Bot: http → https | On hold |
| iswikisource | Pending (Village Pump) | 38 | Bot: http → https | On hold |
| iswiktionary | Pending (Village Pump) | 17'145 | Bot: http → https | On hold |
| itwikinews | [19] | 12'880 | Bot: http → https | |
| itwikivoyage | [20] | 8'563 | Bot: http → https | |
| itwiktionary | [21] | 80'610 | Bot: http → https | |
| jawikibooks | Approved | 1'873 | Bot: http → https | Done |
| jawiktionary | Pending (Village Pump) | 8'834 | Bot: http → https | On hold |
| kshwiki | Approved (Village Pump) | 1'364 | Bot: http → https | Working Waiting |
| lawikisource | Pending (Village Pump) | 9'453 | Bot: http → https | On hold |
| liwikisource | Pending (Village Pump) | 1'080 | Bot: http → https | On hold |
| liwiktionary | [22] | 86 | Bot: http → https | |
| mnwwiki | [23] | 1'010 | Bot: http → https | |
| mrwiki | [24] | 59'852 | Bot: http → https | |
| mrwikisource | Pending (Village Pump) | 1'372 | Bot: http → https | On hold |
| mtwiki | [25] | 4'626 | Bot: http → https | |
| ndswiki | [26] | 24'342 | Bot: http → https | |
| nlwikivoyage | [27] | 2'385 | Bot: http → https | |
| nnwiki | Approved (Village Pump) | 144'877 | Bot: http → https | Working Waiting |
| outreachwiki | [28] | 6'136 | Bot: http → https | |
| plwikiquote | Approved (Temporarily) | 10'443 | Bot: http → https | Done |
| plwiktionary | Pending | 92'252 | Bot: http → https | On hold |
| ptwikibooks | Pending (Village Pump) | 4'917 | Bot: http → https | On hold |
| rowiki | [29] | 608'015 | Bot: http → https | |
| rowiktionary | Approved (Temporarily) (Village Pump) | 82'646 | Bot: http → https | On hold |
| ruwikinews | Pending | 833'044 | Bot: http → https | On hold |
| ruwikisource | [30] | 213'855 | Bot: http → https | |
| ruwiktionary | [31] | 19'274 | Bot: http → https | |
| slwiki | [32] | 135'056 | Bot: http → https | |
| slwikisource | [33] | 16'902 | Bot: http → https | |
| sourceswiki | Pending (Village Pump) | 26'478 | Bot: http → https | On hold |
| specieswiki | Pending (Village Pump) | 640'405 | Bot: http → https | On hold |
| srwiki | [34] | 659'443 | Bot: http → https | |
| srwikibooks | Pending (Village Pump) | 935 | Bot: http → https | On hold |
| svwikisource | [35] | 1'791 | Bot: http → https | |
| svwikiversity | Pending (Village Pump) | 374 | Bot: http → https | On hold |
| svwikivoyage | [36] | 1'327 | Bot: http → https | |
| ukwiki | [37] | 1'280'019 | Bot: http → https | |
| urwiki | [38] | 148'606 | Bot: http → https | |
| vecwikisource | Pending (Village Pump) | 4'875 | Bot: http → https | On hold |
| viwiki | [39] | 1'350'516 | Bot: http → https | |
| wuuwiki | [40] | 6'016 | Bot: http → https | |
| yuewiktionary | Pending (Village Pump) | 401 | Bot: http → https | On hold |