Bug 40558

Summary: [FR] task add replay taskid
Product: Infrastructure Reporter: Dmitry V. Levin <ldv>
Component: girar Assignee: placeholder <placeholder>
Status: NEW --- QA Contact: Andrey Cherepanov <cas>
Severity: enhancement    
Priority: P5 CC: glebfm, imz, ldv, obirvalger
Version: unspecified   
Hardware: all   
OS: Linux   

Description Dmitry V. Levin 2021-07-21 20:13:34 MSK
I need a girar command that could be used to replay committed tasks.
The main purpose is to copy packages from branch to branch by whole tasks.
Comment 1 Ivan Zakharyaschev 2021-07-27 15:34:36 MSK
I had a script that partially solved this from the client side.

https://www.altlinux.org/Git.alt/FAQ#Q:_%D0%9A%D0%B0%D0%BA_%D0%BF%D1%80%D0%BE%D1%81%D1%82%D0%BE_%D1%81%D0%BA%D0%BE%D0%BF%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D1%82%D1%8C_%D0%BD%D0%B5%D1%83%D0%B4%D0%B0%D0%B2%D1%88%D0%B5%D0%B5%D1%81%D1%8F_%D1%87%D1%83%D0%B6%D0%BE%D0%B5_%D0%B7%D0%B0%D0%B4%D0%B0%D0%BD%D0%B8%D0%B5?

http://git.altlinux.org/people/imz/public/girar-build-args-from-girar-task-show.git

It's implemented as a sed-script transforming the output of "task show" into arguments for "build" command.

It was not a complete solution and had a few drawbacks:

1. DONE tasks probably couldn't be handled this way, because they couldn't be "shown". (This should already be solved by https://bugzilla.altlinux.org/37537 )

2. The sources could be fetched either from the same source as the original task or from the task structure (represented in FS in Girar). (The way to fetch sources could be chosen by modifying the sed-script, uncommenting different rules.) Hence:

2a. Fetching from the same sources as the original task gave no guarantee that they still exist at the moment of the "replay" and haven't been modified (and no check for identity between the new and original sources was made).

2b. Fetching the sources from the task structure might not work for DONE tasks, where some elements of a committed task could have been already cleaned up at the moment of the "replay".

2c. Fetching the sources from the committed repo is not implemented. (It must be possible to implement in one way or another in theory: to generate paths leading to the committed repo.)

3. srpms are not handled. To add an srpm to a task from the client side, one needs to re-upload it to the homedir at Girar, and my script only constructed arguments for the "build" command, without invoking additional commands. The restriction on where an srpm may come from is enforced in girar-task-add:

validate_srpm_build()
{
	[ -n "${srpm##*/*}" -a "$srpm" != "${srpm%.src.rpm}" ] ||
		fatal "$srpm: Invalid path"
...

i.e., this condition requires that $srpm has no path components separated by "/" (so there is no way to point it to a location other than the homedir), and that $srpm ends with ".src.rpm".
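
To illustrate how the two parameter expansions in that condition behave, here is a minimal sketch. The function name srpm_name_ok and the sample file names are made up for this example and are not part of girar; the test is also split into two "[ ]" commands joined by "&&", which should be equivalent to the quoted "-a" form.

```shell
#!/bin/sh
# Hypothetical helper (not girar code) that mirrors the quoted condition.
srpm_name_ok()
{
	srpm="$1"
	# ${srpm##*/*} is empty when $srpm contains "/", because the pattern
	# */* then matches the whole string; otherwise it is $srpm unchanged.
	# ${srpm%.src.rpm} differs from $srpm only when the suffix is present.
	[ -n "${srpm##*/*}" ] && [ "$srpm" != "${srpm%.src.rpm}" ]
}

# Made-up sample names: a plain srpm, one with a path, one with a wrong suffix.
for name in foo-1.0-alt1.src.rpm dir/foo-1.0-alt1.src.rpm foo-1.0-alt1.rpm; do
	if srpm_name_ok "$name"; then
		echo "$name: accepted"
	else
		echo "$name: rejected"
	fi
done
```

Only the first name passes; the second contains a "/" and the third lacks the ".src.rpm" suffix.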

The script could be extended (in theory) to write the prerequisite re-upload commands (like rsync) into a temporary script that needs to be run before the "build" command that would "replay" a task.

4 (a feature). All fields (properties of a task) in the output of "task show" must be known to the script; otherwise, it fails. (That's not a drawback. I just want to mention this planned feature: that it must be decided for every piece of information how to replay it or whether to discard it.)

5 (an issue/detail to reconsider). It was obvious that depending on the use case, some properties of a task need to be handled differently: e.g., the target branch. So, previously, I commented/uncommented the rules in the sed script that either kept the branch as an argument ("build -b BRANCH") or discarded it, allowing the user to add his own "-b" argument. Now I see that it's not a big problem: the script could just keep all such arguments and let the user override them with subsequent arguments of the same class (e.g., another "-b" argument appended to the end of the command); girar-build uses getopt and a loop to read the arguments, and later arguments would override earlier ones:

TEMP="$(getopt -n "$PROG" -o b:m: -l commit,deps:,fail-early,fail-late,test-only,help -- "$@")" ||
	show_usage
eval set -- "$TEMP"

deps=
repo=
fail_mode=
test_mode=
task_msg=
while :; do
	case "$1" in
		--) shift; break ;;
		-b) shift; repo="$1" ;;
		-m) shift; task_msg="$1" ;;
		--deps) shift; deps="$1" ;;
		--fail-early|--fail-late) fail_mode="$1" ;;
		--commit|--test-only) test_mode="$1" ;;
		--help) show_help ;;
		*) show_usage "unrecognized option: $1" ;;
	esac
	shift
done

[ $# -ge 1 ] ||
	show_usage 'not enough arguments.'

Note that --deps cannot be merged this way, only totally overridden.
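
The override behaviour can be checked in isolation with a small sketch. This is not girar code: pick_repo is a hypothetical function, and it uses the shell builtin getopts instead of getopt(1) to stay self-contained, so long options such as --deps are left out. The point it shows is the same as in the loop above: each occurrence of an option reassigns the same variable, so the last one wins.

```shell
#!/bin/sh
# Hypothetical demo (not girar code): the last -b wins because every
# occurrence simply reassigns $repo.
pick_repo()
{
	repo=
	OPTIND=1
	while getopts b:m: opt; do
		case "$opt" in
			b) repo="$OPTARG" ;;
			m) : ;;  # the message is parsed but ignored in this demo
			*) return 1 ;;
		esac
	done
	printf '%s\n' "$repo"
}

# Arguments as a replay script might emit them, followed by the user's
# own -b appended at the end.
pick_repo -b sisyphus -m 'replayed task' -b p10
```

The call prints the branch given by the appended "-b", not the one emitted first.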

6 (wanted feature). If replaying several tasks from one branch in a different branch, it'd be good to constrain their order: so that a task is attempted only after all logically preceding tasks have been committed.

"Logically preceding" is hard to define.

One approach can be as simple as taking the real order in which tasks were committed into the original branch, and then mapping this order onto the new "replaying" tasks through --deps. It gets more complex because some tasks will be skipped and some will be added to the queue after the first batch of tasks, perhaps even including some tasks initially skipped (in the first batch). This speaks in favor of a persistent mapping of original tasks onto the new "replaying" tasks, so that the deps of all the "replaying" tasks can be easily adapted. Even better would be a global mapping, i.e., one for all users. (This looks like an argument in favor of a server-side implementation, so that the information about which task replayed/is replaying which other task is saved in Girar.)
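
A rough sketch of the simple approach, to make the idea of the mapping concrete. Everything here is an assumption made for illustration: the function name replay_deps, the input format (one line per original task in commit order, "ORIG_ID REPLAY_ID", with "-" for a task that was skipped), the comma as a separator in the dependency list, and the task ids themselves. It does not call girar at all; it only computes, for each "replaying" task, the list of "replaying" tasks that correspond to earlier original tasks.

```shell
#!/bin/sh
# Hypothetical sketch; input format, separator and ids are assumptions.
# stdin:  "ORIG_ID REPLAY_ID" per line, in the original commit order;
#         REPLAY_ID is "-" for an original task that is not replayed.
# stdout: "REPLAY_ID DEPS", where DEPS lists the replaying tasks of all
#         earlier original tasks, or "-" if there are none.
replay_deps()
{
	awk '
		$2 != "-" {
			print $2, (deps == "" ? "-" : deps)
			deps = (deps == "" ? $2 : deps "," $2)
		}
	'
}

# Made-up mapping: original task 1002 was skipped in this batch.
printf '%s\n' '1001 2001' '1002 -' '1003 2002' '1004 2003' | replay_deps
```

If a skipped task is replayed later, rerunning the computation over the updated mapping gives the adjusted lists, which is where a persistent (and ideally shared) mapping would help.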