Block access to Gitea's archive feature

Crawlers have been hitting the archive URLs in Gitea, which can
result in massive cached archive files filling the disk faster than
the daily cron clears them out. This feature is an attractive
nuisance anyway for many projects, particularly Python-based source
repositories for which users mistakenly assume that a tarball of the
worktree is a suitable substitute for an sdist package, which leads
to a lot of confusion if build backends like PBR or setuptools-scm
are relied on.

Fortunately, Gitea now has a way to turn off this functionality. Add
a test to make sure these URLs return a 404 in order to prevent any
accidental future regression. Disable the archive cleanup cron as
well, since it's just a no-op at this point.

Change-Id: I0912243f40f2101bf1f3133fbf306def10aa5f83
Jeremy Stanley
2025-06-05 15:25:16 +00:00
parent 8e5b2a072a
commit 36e1de0d5c
2 changed files with 25 additions and 8 deletions

@@ -39,6 +39,7 @@ ROOT = /data/git/repositories
 DISABLED_REPO_UNITS = repo.issues,repo.pulls,repo.wiki,repo.projects,repo.actions
 DISABLE_STARS = true
 DISABLE_MIGRATIONS = true
+DISABLE_DOWNLOAD_SOURCE_ARCHIVES = true
 
 [git]
 ; Implemented in 1.16 but broke older git clients. Now expected to work
@@ -128,16 +129,15 @@ STORAGE_TYPE = local
 PATH = /data/git/lfs
 
 ; This is an undocumented gitea cron job that will delete all
-; repo archives once daily at midnight. Repo archives are
+; repo archives periodically if enabled. Repo archives are
 ; tarballs/zips/etc of repository state generate for things like
-; tags. This helps ensure we don't run out of disk.
+; tags. We used to rely on it, but some crawlers are so aggressive
+; they manage to fill up our filesystems between scheduled cleanups
+; so instead we've blocked access to the feature entirely. This
+; defaults to disabled, but keep it explicit in here as a reminder
+; in case we ever revert the change and restore archive access.
 [cron.delete_repo_archives]
-ENABLED = true
-RUN_AT_START = false
-NOTICE_ON_SUCCESS = false
-; Note we run this several hours after 0000 (midnight) to avoid conflict
-; with default cron jobs run by gitea at that time.
-SCHEDULE = 0 0 3 * * *
+ENABLED = false
 
 ; We don't need gitea phoning out to check versions. We stay on
 ; top of new releases using github release notifications over email.
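
The two settings changed above could be sanity-checked outside of Gitea with a small script. The following is only a sketch: it assumes that DISABLE_DOWNLOAD_SOURCE_ARCHIVES lives in the [repository] section (the hunk does not show the section header), the SAMPLE text and the archives_blocked() helper are hypothetical, and a real app.ini with keys ahead of the first section header would need extra handling before configparser accepts it.

```python
import configparser

# Hypothetical excerpt mirroring the settings touched by this change.
# The [repository] section name is an assumption; the diff hunk only
# shows the keys, not the header they sit under.
SAMPLE = """
[repository]
DISABLE_DOWNLOAD_SOURCE_ARCHIVES = true

; Kept explicit as a reminder even though it defaults to disabled.
[cron.delete_repo_archives]
ENABLED = false
"""


def archives_blocked(ini_text):
    """Return True if archives are disabled and the cleanup cron is off."""
    # interpolation=None keeps any literal '%' in values from being
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    disabled = parser.getboolean(
        'repository', 'DISABLE_DOWNLOAD_SOURCE_ARCHIVES', fallback=False)
    cron_enabled = parser.getboolean(
        'cron.delete_repo_archives', 'ENABLED', fallback=False)
    return disabled and not cron_enabled


if __name__ == '__main__':
    print(archives_blocked(SAMPLE))
```

Missing sections or keys fall back to False, so a file that omits the archive setting entirely is reported as not blocked.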

@@ -71,6 +71,23 @@ def test_proxy_ua_blacklist(host):
                    'https://gitea99.opendev.org:3081/')
     assert '403 Forbidden' in cmd.stdout
 
+def test_disable_archives(host):
+    cmd = host.run('curl --insecure '
+                   '--resolve gitea99.opendev.org:3081:127.0.0.1 '
+                   'https://gitea99.opendev.org:3081/'
+                   'opendev/system-config/archive/master.bundle')
+    assert cmd.stdout == 'Not Found\n'
+    cmd = host.run('curl --insecure '
+                   '--resolve gitea99.opendev.org:3081:127.0.0.1 '
+                   'https://gitea99.opendev.org:3081/'
+                   'opendev/system-config/archive/master.tar.gz')
+    assert cmd.stdout == 'Not Found\n'
+    cmd = host.run('curl --insecure '
+                   '--resolve gitea99.opendev.org:3081:127.0.0.1 '
+                   'https://gitea99.opendev.org:3081/'
+                   'opendev/system-config/archive/master.zip')
+    assert cmd.stdout == 'Not Found\n'
+
 def test_ondisk_logs(host):
     mariadb_log = host.file('/var/log/containers/docker-mariadb.log')
     assert mariadb_log.exists
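
The commit message asks for the archive URLs to return a 404, while the test in this change matches on the response body. A possible variant, shown only as a sketch, would have curl print the HTTP status code and compare that instead. The helper names archive_status_command() and check_archives_disabled() are hypothetical and not part of this change; the host argument is assumed to behave like the testinfra fixture used above, with a run() method whose result exposes stdout.

```python
# Archive suffixes exercised by the test in this change.
ARCHIVE_SUFFIXES = ('bundle', 'tar.gz', 'zip')
BASE_URL = 'https://gitea99.opendev.org:3081/'


def archive_status_command(suffix):
    """Build a curl command that prints only the HTTP status code."""
    return ('curl --insecure --silent --output /dev/null '
            "--write-out '%{http_code}' "
            '--resolve gitea99.opendev.org:3081:127.0.0.1 '
            f'{BASE_URL}opendev/system-config/archive/master.{suffix}')


def check_archives_disabled(host):
    """Assert that every archive URL answers with a 404 status."""
    for suffix in ARCHIVE_SUFFIXES:
        cmd = host.run(archive_status_command(suffix))
        # Include the suffix in the failure message so a regression
        # points at the archive format that became reachable again.
        assert cmd.stdout.strip() == '404', suffix
```

Checking the status code rather than the literal 'Not Found\n' body would keep the check stable if a later Gitea release changes the wording of its error page; looping over the suffixes also avoids repeating the same curl invocation three times.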