fix(email): evict oldest half of dedup set instead of clearing entirely

When _processed_uids exceeded 100k entries, the entire set was cleared
with .clear(), so every previously seen email could be re-processed.

Now evicts the oldest 50% of entries, keeping recent UIDs to prevent
duplicate processing while still bounding memory usage.

Fixes #890
andienguyen-ecoligo
2026-02-21 12:36:04 -05:00
parent 0040c62b74
commit ba66c64750

@@ -304,7 +304,9 @@ class EmailChannel(BaseChannel):
 self._processed_uids.add(uid)
 # mark_seen is the primary dedup; this set is a safety net
 if len(self._processed_uids) > self._MAX_PROCESSED_UIDS:
-    self._processed_uids.clear()
+    # Evict oldest half instead of clearing entirely
+    to_keep = list(self._processed_uids)[len(self._processed_uids) // 2:]
+    self._processed_uids = set(to_keep)
 if mark_seen:
     client.store(imap_id, "+FLAGS", "\\Seen")
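
Review note: a built-in Python set does not record insertion order, so list(self._processed_uids)[n // 2:] keeps an arbitrary half of the UIDs, not necessarily the most recent ones. If "oldest half" is meant literally, one option is to store the UIDs as dict keys, which preserve insertion order. The following is a minimal sketch under that assumption, not the code in this commit; BoundedSeenSet and max_size are hypothetical names introduced only for illustration.

```python
from itertools import islice


class BoundedSeenSet:
    """Hypothetical helper (not part of the commit): a bounded dedup
    collection that evicts the oldest half of its entries when full."""

    def __init__(self, max_size):
        self._max_size = max_size
        # dict keys keep insertion order (guaranteed since Python 3.7),
        # so the first keys are always the oldest UIDs.
        self._items = {}

    def add(self, uid):
        # Re-adding a known UID leaves it at its original position.
        self._items[uid] = None
        if len(self._items) > self._max_size:
            self._evict_oldest_half()

    def _evict_oldest_half(self):
        drop = len(self._items) // 2
        # Materialize the keys first; a dict must not change size
        # while it is being iterated.
        for uid in list(islice(self._items, drop)):
            del self._items[uid]

    def __contains__(self, uid):
        return uid in self._items

    def __len__(self):
        return len(self._items)


# Usage: with max_size=4, adding a fifth UID evicts the two oldest.
seen = BoundedSeenSet(max_size=4)
for uid in ["1", "2", "3", "4", "5"]:
    seen.add(uid)
```

After the loop, "1" and "2" have been evicted while "3", "4", and "5" remain. Membership checks stay O(1), as with a set, and eviction costs time proportional to the number of entries dropped.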