Update our Gitea robots.txt from gitea.com's
We've experienced runaway growth of Gitea archive cache files on one of our backends. According to upstream, this is often caused by web crawlers indexing the archive URLs. They recommended updating our robots.txt to the current state of https://gitea.com/robots.txt to help mitigate the issue.

I've left the entries we had expressly commented out before still commented out, along with anything that seems similar to them, on the assumption that the same reasons would carry over. After some discussion on IRC, we also decided it would make sense to disallow /avatars and /user/* like they do.

Change-Id: I2b43b89de08c9a9d170e1ecbd14b1e6336fd2c84
parent 8734fa7c6e
commit 79103e1a35
@@ -3,6 +3,7 @@
 # and
 # https://github.com/robots.txt
 # at 2020-07-01
+# and https://gitea.com/robots.txt on 2024-01-05
 #
 # Some commented out items are left to indicate we have considered
 # them and would like to explicitly allow them for indexing while they
@@ -10,26 +11,82 @@
 
 User-agent: *
 
-# Disallow: /avatars
-# Disallow: /user/*
+Disallow: /api/*
+Disallow: /avatars
+Disallow: /user/*
 
 # Disallow: /*/*/src/commit/*
 # Disallow: /*/*/commit/*
+# Disallow: /*/*/*/refs/*
+
+Disallow: /*/*/*/star
+Disallow: /*/*/*/watch
+Disallow: /*/*/labels
 Disallow: /*/*/activity/*
-Disallow: /vendor/librejs.html
-Disallow: /api/swagger
+Disallow: /vendor/*
 Disallow: /swagger.*.json
 
 # Language spam
 Disallow: /*?lang=
 
-# From github
-Disallow: */archive/
-Disallow: */blame/
+# from Github, to be cleaned
+Allow: /*/*/tree/master
+Allow: /*/*/blob/master
+Disallow: /*/*/pulse
+Disallow: /*/*/tree/*
+Disallow: /*/*/blob/*
+Disallow: /*/*/wiki/*/*
+Disallow: /gist/*/*/*
+Disallow: /oembed
+Disallow: /*/forks
+Disallow: /*/stars
+Disallow: /*/download
+Disallow: /*/revisions
+Disallow: /*/*/issues/new
+Disallow: /*/*/issues/search
+Disallow: /*/*/commits/*/*
+Disallow: /*/*/commits/*?author
+Disallow: /*/*/commits/*?path
+Disallow: /*/*/branches
+Disallow: /*/*/tags
+Disallow: /*/*/contributors
+Disallow: /*/*/comments
+Disallow: /*/*/stargazers
+Disallow: /*/*/search
+Disallow: /*/tarball/
+Disallow: /*/zipball/
+Disallow: /*/*/archive/
+
 # Disallow: /raw/*
+
+Disallow: /*/followers
+Disallow: /*/following
+Disallow: /stars/*
+Disallow: /*/blame/
+Disallow: /*/watchers
+Disallow: /*/network
+Disallow: /*/graphs
+
+# Disallow: /*/raw/
+
+Disallow: /*/compare/
+Disallow: /*/cache/
+Disallow: /*/*/blame/
+Disallow: /*/*/watchers
+Disallow: /*/*/network
+Disallow: /*/*/graphs
+
+# Disallow: /*/*/raw/
+
+Disallow: /*/*/compare/
+Disallow: /*/*/cache/
 Disallow: /.git/
-Disallow: */.git/
+Disallow: /*/.git/
 Disallow: /*.git$
+Disallow: /*/sitemap.xml
+Disallow: /search/advanced
+Disallow: /search
 Disallow: /*q=
+Disallow: /*.atom
 
 Crawl-delay: 2
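To sanity-check how a few of the new rules interact, here is a minimal Python sketch of wildcard matching with longest-match precedence, as described in RFC 9309. It is an illustration under stated assumptions, not how any particular crawler is implemented: the helper names `rule_to_regex` and `is_allowed` are made up for this example, the sample paths such as `/org/repo/...` are hypothetical, and only a small subset of the rules from the diff is included. Python's standard `urllib.robotparser` is not used because it does not interpret `*` wildcards.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    Assumes longest-match precedence: the matching rule with the longest
    pattern wins, Allow wins a tie, and no match at all means allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if rule_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "Allow"):
                best_len, allowed = length, kind == "Allow"
    return allowed

# A subset of the rules introduced or kept by this change.
RULES = [
    ("Allow", "/*/*/tree/master"),
    ("Disallow", "/*/*/tree/*"),
    ("Disallow", "/*/*/archive/"),
    ("Disallow", "/*.git$"),
    ("Disallow", "/user/*"),
]

if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise each rule.
    for path in (
        "/org/repo/archive/main.tar.gz",
        "/org/repo/tree/master",
        "/org/repo/tree/feature",
        "/org/repo.git",
        "/org/repo",
    ):
        print(path, "allowed" if is_allowed(RULES, path) else "disallowed")
```

Under these assumptions the archive URL is disallowed, while `/*/*/tree/master` stays crawlable because the `Allow` pattern is longer than the broader `Disallow: /*/*/tree/*`. Crawlers that do not follow longest-match precedence may behave differently.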