
NetBSD: README.md,v 1.1 2022/08/20 13:32:06 rillig Exp

Introduction

Distlint ensures that the distfiles on the TNF (The NetBSD Foundation) servers conform to the license requirements of the packages they belong to.

Distfiles distributed under the GPL must be kept available for as long as a binary package based on this distfile is distributed, plus 3 years.[citation needed]
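The retention rule above amounts to simple date arithmetic. The following is a minimal sketch, assuming the 3-year period stated above (which is itself still marked as needing a citation); the function name `keep_until` and its parameters are hypothetical and not taken from distlint.py.

```python
from datetime import date


def keep_until(last_distributed: date, years: int = 3) -> date:
    """Return the date until which a GPL distfile has to stay available.

    last_distributed is the last day on which a binary package based on
    the distfile was distributed (hypothetical input, not a distlint field).
    """
    target_year = last_distributed.year + years
    try:
        return last_distributed.replace(year=target_year)
    except ValueError:
        # February 29 does not exist in a non-leap target year,
        # so fall back to February 28.
        return last_distributed.replace(year=target_year, day=28)
```

As long as the binary package is still being distributed, the end date keeps moving, so the value would have to be recomputed on every run.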

Distfiles from packages with NO_SRC_ON_FTP must not be available at all.

Edge case: A package may set NO_SRC_ON_FTP and have multiple distfiles, of which some must not be available while the others are licensed under the GPL and must therefore be kept.

Configuration

Distlint is configured by the distlint.conf file, which contains one or more distdir sections. Each section describes how a single distdir relates to the pkgsrc trees that populate it and to the binary package directories that are built from it:

# Each distdir can be populated by several pkgsrc versions, such as 
# pkgsrc-current and the quarterly branches.
# Each distdir can be the source for multiple distributions of binary
# packages, for example for different platforms. 

distdir /usr/pkgsrc/distfiles
        database /var/db/distlint/main 
        pkgsrc /usr/pkgsrc-current
        pkgsrc /usr/pkgsrc-2022Q2
        pkgsrc /usr/pkgsrc-2022Q1
        packages /usr/pkgsrc/packages
        packages /usr/pkgsrc/current-packages

distdir /pub/pkgsrc-archive/distfiles
        database /var/db/distlint/archive                
        pkgsrc /pub/pkgsrc-archive/pkgsrc
        packages /pub/pkgsrc-archive/packages       
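To illustrate the structure of this file, here is a minimal parser sketch. It assumes that every non-comment line has the form "keyword value", that "#" starts a comment, and that indentation is not significant; none of this is confirmed by distlint.py, and the names `DistdirSection` and `parse_conf` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DistdirSection:
    """One 'distdir' section of distlint.conf (hypothetical model)."""
    distdir: str
    database: str = ""
    pkgsrc: List[str] = field(default_factory=list)
    packages: List[str] = field(default_factory=list)


def parse_conf(text: str) -> List[DistdirSection]:
    sections: List[DistdirSection] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Assumption: '#' always starts a comment, so paths cannot contain it.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'keyword value'")
        keyword, value = parts
        if keyword == "distdir":
            # A distdir line starts a new section.
            sections.append(DistdirSection(distdir=value))
        elif not sections:
            raise ValueError(f"line {lineno}: '{keyword}' outside a distdir section")
        elif keyword == "database":
            sections[-1].database = value
        elif keyword in ("pkgsrc", "packages"):
            # These keywords may be repeated within one section.
            getattr(sections[-1], keyword).append(value)
        else:
            raise ValueError(f"line {lineno}: unknown keyword '{keyword}'")
    return sections
```

Applied to the example above, this would yield two sections, the first with three pkgsrc trees and two package directories.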

Infrastructure overview

Approach

Distlint maintains a database of distfile requirements. The requirements are collected from all pkgsrc branches that are either current or in the archive.

Examples of database entries

$distfile must not be in distfiles, because on $updated_at, it belonged to package $pkgname in $pkgpath, which was marked as NO_SRC_ON_FTP because $no_src_on_ftp.

$distfile must be kept in distfiles until $keep_until, because on $updated_at, it belonged to package $pkgname, which is published at $publish_url and licensed under $license.
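The two entry forms above can be rendered directly with Python's `string.Template`, since they already use `$name` placeholders. This is only a sketch: storing an entry as a dictionary and telling the two kinds apart by the presence of a `no_src_on_ftp` key are assumptions, and `explain` is a hypothetical helper.

```python
from string import Template

# The two message forms from the examples above, used verbatim as templates.
FORBIDDEN = Template(
    "$distfile must not be in distfiles, because on $updated_at, "
    "it belonged to package $pkgname in $pkgpath, "
    "which was marked as NO_SRC_ON_FTP because $no_src_on_ftp."
)
REQUIRED = Template(
    "$distfile must be kept in distfiles until $keep_until, "
    "because on $updated_at, it belonged to package $pkgname, "
    "which is published at $publish_url and licensed under $license."
)


def explain(entry: dict) -> str:
    """Render a database entry as a human-readable requirement."""
    # Assumption: only 'forbidden' entries carry a no_src_on_ftp reason.
    template = FORBIDDEN if "no_src_on_ftp" in entry else REQUIRED
    # substitute() raises KeyError if a placeholder has no value.
    return template.substitute(entry)
```

For the edge case of a package with both kinds of distfiles, each distfile would get its own entry, so the two requirements never conflict within a single entry.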

Implementation details

NO_SRC_ON_FTP

To find out whether a binary package has NO_SRC_ON_FTP, look at its +BUILD_INFO.
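A minimal sketch of that check, assuming that +BUILD_INFO consists of NAME=value lines and that NO_SRC_ON_FTP is either absent or empty when the restriction does not apply. The function names are hypothetical, and extracting +BUILD_INFO from the binary package is left out.

```python
def parse_build_info(text: str) -> dict:
    """Parse the contents of +BUILD_INFO into a dictionary."""
    info = {}
    for line in text.splitlines():
        # Assumption: each line has the form NAME=value.
        name, sep, value = line.partition("=")
        if sep:
            info[name.strip()] = value.strip()
    return info


def has_no_src_on_ftp(build_info_text: str) -> bool:
    """Tell whether the package was built with a non-empty NO_SRC_ON_FTP."""
    return bool(parse_build_info(build_info_text).get("NO_SRC_ON_FTP"))
```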

Find out the distfiles of a binary package

For most binary packages, the file +BUILD_VERSION contains the CVS revision information of the distinfo file.
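The revision can be extracted with a regular expression. The sketch below assumes that each line of +BUILD_VERSION has the form "path: $NetBSD: file,v revision date time author state $"; this line format and the function name `distinfo_revision` are assumptions, not taken from distlint.py.

```python
import re

# Matches the line for a file named distinfo and captures its path and
# CVS revision. The leading '$' of the RCS keyword is treated as optional.
_DISTINFO_RE = re.compile(
    r"^(?P<path>\S*distinfo):\s*\$?NetBSD: distinfo,v (?P<rev>\d+(?:\.\d+)+) ",
    re.MULTILINE,
)


def distinfo_revision(build_version_text: str):
    """Return (path, revision) of the distinfo file, or None if not listed."""
    match = _DISTINFO_RE.search(build_version_text)
    if match is None:
        return None
    return match.group("path"), match.group("rev")
```

A result of None does not say why the entry is missing: the package may use DISTINFO_FILE, or it may have no distinfo file at all.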

Some packages use DISTINFO_FILE to refer to a distinfo file outside their PKGPATH. The CVS revision information for these distinfo files is not recorded anywhere.

Some packages have no distinfo file at all because they are self-contained. Example: pkgtools/lintpkgsrc.

Whether a binary package had a distinfo file cannot be determined from the binary package alone.

Using the CVS revision information of the distinfo file, its file list can be retrieved from CVS.
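Once the text of that distinfo revision is at hand, the file list can be extracted from its "Size" lines, in the same way as the sed command in the quick hack further down. This sketch leaves the CVS retrieval itself out; the function name `distfiles_from_distinfo` is hypothetical.

```python
import re

# Each distfile has exactly one "Size (name) = N bytes" line, while patches
# are listed with a checksum only, so matching on Size skips the patches.
_SIZE_RE = re.compile(r"^Size \((?P<name>.*)\) = \d+ bytes$", re.MULTILINE)


def distfiles_from_distinfo(distinfo_text: str) -> list:
    """Return the distfile names listed in the given distinfo contents."""
    return [match.group("name") for match in _SIZE_RE.finditer(distinfo_text)]
```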

Quick hacks

Find distfiles with NO_SRC_ON_FTP

This program finds most distfiles with NO_SRC_ON_FTP that are referenced from the current pkgsrc tree.

Shortcomings:

  • It does not find distfiles from stable pkgsrc branches.
  • It does not find distfiles from previous versions of the packages.
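  • It does not find distfiles from packages with DISTINFO_FILE.

# Run on ftp.netbsd.org, from the top of the current pkgsrc tree.
ssh ftp.netbsd.org
cd /pub/pkgsrc/current/pkgsrc

# List each directory that mentions NO_SRC_ON_FTP once, as category/package.
for pkgpath in $(grep -rl NO_SRC_ON_FTP . 2>/dev/null | cut -d/ -f2-3 | sort -u); do

  # Skip packages without a distinfo file and those using MASTER_SITE_LOCAL.
  if [ -f "$pkgpath/distinfo" ] &&
    ! grep -r MASTER_SITE_LOCAL "$pkgpath" >/dev/null 2>&1; then

    # Each distfile has exactly one "Size (name) = N bytes" line in distinfo.
    sed -n 's,^Size (\(.*\)) =.*$,\1,p' "$pkgpath/distinfo" |
      while read -r distfile; do
        # Report only the distfiles that are actually present on the server.
        if [ -f "/pub/pkgsrc/distfiles/$distfile" ]; then
          echo "$distfile"
        fi
      done
  fi
done | sort -u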