Add ParallelFileTask for parallel file processing#11

Merged: jlegrand62 merged 1 commit into dev from feature/parallel_file_task on May 7, 2026

@jlegrand62
Member

Summary
Introduces ParallelFileTask, a new RomiTask subclass that processes files in parallel using a thread pool, with progress tracking.

Key Changes

  • Imported concurrent.futures in task.py.
  • Added ParallelFileTask with new Luigi parameters:
    • query: pattern to locate input files.
    • n_workers: number of worker threads (-1 uses the default pool size).
    • parallel: flag to enable/disable parallel execution.
  • Defined an abstract method f for subclasses to implement the per‑file operation.
  • Implemented run method that:
    • Retrieves input files via query.
    • Copies metadata from each input file to the corresponding output.
    • Executes processing sequentially when parallel=False.
    • Executes processing in parallel with ThreadPoolExecutor and tqdm when parallel=True.
  • Added helper _process_file to encapsulate per‑file logic and metadata handling.

- Import `concurrent.futures` in `task.py`
- Introduce new class `ParallelFileTask` extending `RomiTask`
  - New luigi parameters: `query`, `n_workers`, `parallel`
  - Abstract method `f` to be implemented by subclasses
  - `run` method:
    - Retrieves input files with `query`
    - Copies metadata from input to output files
    - Supports sequential processing when `parallel` is disabled
    - Enables parallel execution using `ThreadPoolExecutor` and `tqdm` when enabled
    - Handles `n_workers` = -1 to use default thread pool size
- Uses helper `_process_file` to encapsulate per‑file logic and metadata handling
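
The sequential and parallel branches of `run` described above could look roughly like the standalone sketch below. This is not the merged code. `process_files` and `word_count` are hypothetical names, plain strings stand in for input files, and the Luigi parameters, metadata copying, and `tqdm` progress bar are left out. It also assumes that `n_workers=-1` is mapped to `max_workers=None`, which makes `ThreadPoolExecutor` pick its default pool size.

```python
import concurrent.futures


def process_files(files, f, n_workers=-1, parallel=True):
    """Apply ``f`` to every item of ``files`` and return results in input order.

    Hypothetical stand-in for the run logic; not the actual implementation.
    """
    if not parallel:
        # Sequential path, mirroring parallel=False.
        return [f(item) for item in files]

    # Assumption: -1 means "let the executor choose its default pool size".
    max_workers = None if n_workers == -1 else n_workers
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(f, item) for item in files]
        # Collect in submission order; result() re-raises worker exceptions.
        return [future.result() for future in futures]


def word_count(text):
    """Toy per-item operation playing the role of the abstract method ``f``."""
    return len(text.split())


docs = ["a b c", "d e", "f"]
sequential = process_files(docs, word_count, parallel=False)
threaded = process_files(docs, word_count, n_workers=2)
```

Both calls return results in the same order as the inputs, so a subclass could switch `parallel` off for debugging without changing what is produced. Threads mainly help when the per-file operation is I/O-bound or releases the GIL.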
@jlegrand62 jlegrand62 requested a review from ArthurLuciani2 May 7, 2026 08:04
@jlegrand62 jlegrand62 self-assigned this May 7, 2026
@jlegrand62 jlegrand62 added the enhancement New feature or request label May 7, 2026
@jlegrand62 jlegrand62 merged commit 06006b3 into dev May 7, 2026
1 check passed
@jlegrand62 jlegrand62 deleted the feature/parallel_file_task branch May 7, 2026 15:12