Savane - Tasks: task #4126, filter spam with spamassassin

 
 

task #4126: filter spam with spamassassin

Submitted by:  Mathieu Roy <yeupou>
Submitted on:  Mon 13 Nov 2006 06:54:28 PM UTC  
 
Should Start On: Sun 12 Nov 2006 11:00:00 PM UTC
Should be Finished on: Mon 27 Nov 2006 11:00:00 PM UTC
Category: Transversal
Status: Done
Priority: 3 - Normal
Planned Release: 3.0
Assigned to: None
Open/Closed: Closed
Privacy: Public
For/By: CERN


Tue 21 Nov 2006 05:13:12 PM UTC, SVN revision 6426:

Add the last part of task #4126: the script that uses sa-learn so that spamd can use Bayesian filters

(Browse SVN revision 6426)

Mathieu Roy <yeupou>
Project Administrator
Tue 21 Nov 2006 05:12:02 PM UTC, SVN revision 6425:

Add the last part of task #4126: the script that uses sa-learn so that spamd can use Bayesian filters

(Browse SVN revision 6425)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 11:14:17 PM UTC, SVN revision 6409:

If content is unflagged as spam by an admin, peon skips the spamcheck and sends the notification directly (task #4126)

(Browse SVN revision 6409)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 10:54:27 PM UTC, comment #13:

The part that is already done can be tested now. The remaining learn script for spamassassin Bayesian filtering is not essential.

I won't test it at Gna! because so far we have no spam problems here, so there is no reason to delay posts from anonymous users and increase the spamd load. If you run an installation affected by spam, you should try it.

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 10:49:14 PM UTC, SVN revision 6407:

What remains is the last step of task #4126: the backend script that will make spamassassin learn from flagged spam

(Browse SVN revision 6407)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 10:31:20 PM UTC, SVN revision 6406:

Add SendTrackersDelayedNotification() to deal with delayed notifications (task #4126): it is able to send notifications exactly as the frontend would have sent them (FIXME: MailSend() is unable to force the msg-id)

(Browse SVN revision 6406)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 09:39:43 PM UTC, SVN revision 6405:

Store delayed mail notif (spamcheck) in a table, so the backend can send clean notifs without duplicating code (task #4126)

(Browse SVN revision 6405)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 06:54:15 PM UTC, SVN revision 6400:

Add second part of the backend regarding task #4126: monitor the queue, remove stuff in there for too long (> 15 minutes)
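
The monitoring step described in this commit message could be sketched roughly as follows. This is an illustration only, not the Savane script itself: the table name trackers_spamcheckqueue comes from comment #1, while the column names item_id and queued_at, the use of SQLite, and the function names are assumptions made for the example.

```python
import sqlite3
import time

MAX_AGE_SECONDS = 15 * 60  # "too long" threshold quoted in the commit message


def expire_stale_items(conn, now=None):
    """Delete queue entries older than MAX_AGE_SECONDS and return their ids."""
    now = time.time() if now is None else now
    cutoff = now - MAX_AGE_SECONDS
    rows = conn.execute(
        "SELECT item_id FROM trackers_spamcheckqueue "
        "WHERE queued_at < ? ORDER BY item_id",
        (cutoff,),
    ).fetchall()
    conn.execute(
        "DELETE FROM trackers_spamcheckqueue WHERE queued_at < ?", (cutoff,)
    )
    conn.commit()
    return [row[0] for row in rows]


def demo():
    """Build a tiny in-memory queue (hypothetical schema) and expire it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trackers_spamcheckqueue "
        "(item_id INTEGER PRIMARY KEY, queued_at REAL)"
    )
    now = 1_000_000.0
    conn.executemany(
        "INSERT INTO trackers_spamcheckqueue VALUES (?, ?)",
        [(1, now - 20 * 60), (2, now - 5 * 60)],  # 20 and 5 minutes old
    )
    stale = expire_stale_items(conn, now)
    remaining = [
        row[0]
        for row in conn.execute("SELECT item_id FROM trackers_spamcheckqueue")
    ]
    return stale, remaining
```

A caller would then drop the temporary spamscore of every returned id, as comment #1 describes, so that posts are not held back indefinitely when spamd cannot keep up.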

(Browse SVN revision 6400)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 06:41:34 PM UTC, SVN revision 6399:

Still on the backend regarding task #4126: remove the item from the queue when beginning to process it, so that the monitor does not think it is not about to be handled soon; this also makes it possible to check whether the monitor had already removed it from the queue at the exact time we start processing it
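
One way to picture the "remove first, then process" idea is to treat the delete as a claim: whoever actually deletes the row owns the item. The sketch below assumes a hypothetical queue table with an item_id column and uses SQLite for illustration; it does not reproduce the Savane code.

```python
import sqlite3


def claim_item(conn, item_id):
    """Remove the item from the queue; True only if this call removed it."""
    cur = conn.execute(
        "DELETE FROM trackers_spamcheckqueue WHERE item_id = ?", (item_id,)
    )
    conn.commit()
    # rowcount is 0 when the row was already gone, for instance because
    # the monitor expired it just before we started processing.
    return cur.rowcount == 1


def demo():
    """Claim the same item twice against an in-memory queue."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trackers_spamcheckqueue (item_id INTEGER PRIMARY KEY)"
    )
    conn.execute("INSERT INTO trackers_spamcheckqueue VALUES (42)")
    first = claim_item(conn, 42)   # the checker claims the item
    second = claim_item(conn, 42)  # a late attempt finds nothing to claim
    return first, second
```

If the claim fails, the checker can simply skip the item, since the monitor has already released it.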

(Browse SVN revision 6399)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 06:22:57 PM UTC, SVN revision 6398:

Add first part of the backend regarding task #4126: it now handles items put in the queue by the frontend and passes them to spamc. Map the spamassassin score to something sensible with regard to Savane's own scores:
# Find a reasonable score for the item, considering that the current
# score equals default + 5.
# if the score is below 0, score = 0 (-8)
# if the score is between 0 and 2, unchanged (-5)
# if the score is between 2 and 3, increment of one (-4)
# if the score is between 3 and 5, increment of two (-3)
# if the score is between 5 and 7, increment of three (-2)
# if the score is between 7 and 9, increment of four (-1)
# above 9, increment of five (untouched)
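
The mapping in these comments can be written out as a small function. This is a sketch under stated assumptions: the function names are invented for the example, each range is treated as including its lower bound (the comments do not say which side a boundary value falls on), and the default score of 3 used in the usage note is only inferred from the "(-8)" figure.

```python
from bisect import bisect_right

TEMPORARY_PENALTY = 5           # the temporary +5 added by the frontend
_THRESHOLDS = [2, 3, 5, 7, 9]   # spamassassin score boundaries from the comments


def savane_score(default_score, sa_score):
    """Final Savane spamscore for an item, given its spamassassin score."""
    if sa_score < 0:
        return 0
    # The number of thresholds at or below sa_score is the increment:
    # [0, 2) -> +0, [2, 3) -> +1, [3, 5) -> +2, ..., 9 and above -> +5.
    return default_score + bisect_right(_THRESHOLDS, sa_score)


def delta_from_temporary(default_score, sa_score):
    """Change relative to the current (default + 5) temporary score."""
    current = default_score + TEMPORARY_PENALTY
    return savane_score(default_score, sa_score) - current
```

With an assumed default of 3, delta_from_temporary(3, -1) gives -8 and delta_from_temporary(3, 10) gives 0, which matches the figures in parentheses above.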

(Browse SVN revision 6398)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 04:36:51 PM UTC, comment #6:

> guess we need to make configurable to host for spamc to
> connect) content that is in the queue, starting
> from anonymous pos


Actually, it could make more sense to process logged-in posts first; however, as spam is more likely to be found in anonymous posts, we start with those.

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 03:59:32 PM UTC, SVN revision 6397:

Do the frontend part of task #4126: put newly posted items in the spamcheck queue, according to conf (all or anonymous post - project members are always excluded from this process)
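
The queueing rule in this commit message boils down to a small decision. The sketch below is illustrative only: the function name and the mode strings "all" and "anonymous" are assumptions, not the actual Savane configuration values.

```python
def needs_spamcheck(mode, is_anonymous, is_project_member):
    """Decide whether a newly posted item goes into the spamcheck queue.

    mode is a hypothetical configuration value: "anonymous" checks only
    anonymous posts, "all" also checks posts from logged-in users.
    """
    if is_project_member:
        return False  # project members are always excluded
    if mode == "all":
        return True
    if mode == "anonymous":
        return is_anonymous
    return False  # any other value is treated as "checking disabled"
```

For example, needs_spamcheck("anonymous", is_anonymous=False, is_project_member=False) is False, while the same post under "all" would be queued.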

(Browse SVN revision 6397)

Mathieu Roy <yeupou>
Project Administrator
Mon 20 Nov 2006 03:58:20 PM UTC, SVN revision 6396:

Do the frontend part of task #4126: put newly posted items in the spamcheck queue, according to conf (all or anonymous post - project members are always excluded from this process)

(Browse SVN revision 6396)

Mathieu Roy <yeupou>
Project Administrator
Fri 17 Nov 2006 06:00:56 PM UTC, SVN revision 6362:

Add new spamcheck queue table (task #4126)
Wondering about:
-- should we put the summary in this database or extract it when checking?
-- * duplicating data means we do bigger inserts
-- * extracting when checking means doing more SQL requests

(Browse SVN revision 6362)

Mathieu Roy <yeupou>
Project Administrator
Fri 17 Nov 2006 05:12:02 PM UTC, SVN revision 6361:

Add spamassassin conf options (task #4126)

(Browse SVN revision 6361)

Mathieu Roy <yeupou>
Project Administrator
Fri 17 Nov 2006 08:18:51 AM UTC, comment #1:

So I thought yesterday about this. The way I see it:

  • The site admin configures what kind of content should be checked: anonymous posts only, or anonymous + logged-in posts
  • On the frontend, each time content that must be checked is posted:
    • it gets an additional (temporary) +5 in spamscore. So no notifications are sent and the comment is virtually invisible. Project admins can however already unflag it as spam, if it is legitimate content.
    • it is added to a new table trackers_spamcheckqueue that records the timestamp of the addition
  • On the backend, there is
    • a script (sv_spamcheck_checker ?) that will pass to spamc (I guess we need to make the host that spamc connects to configurable) the content that is in the queue, starting with anonymous posts. Depending on the spamscore, it will remove the temporary +5 and add additional points of its own (also filling the trackers_spamscore table, unlike with the temporary spamscore; otherwise, the added score would be lost the next time the item is flagged)
    • a script (sv_spamcheck_monitor) that will check the queue every 10 minutes. If it finds items that have been in the queue for more than, say, 20 minutes, it will remove their temporary spamscore and remove them from the queue, assuming that the server cannot handle the load and needs some air.
    • a script (sv_spamcheck_learn) that will check the list of caught spam (spamscore > 4) on a regular basis and learn from it for the Bayesian checks. It will store a timestamp in /var/cache/savane so it won't recheck the same marked spam, unless /var/cache/savane is missing (which would be the case if it was moved to another box)
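
The timestamp bookkeeping proposed for sv_spamcheck_learn could look roughly like the sketch below. It is not the Savane script: the function names, the item tuples, and the learn callback (which in a real setup might hand the text to sa-learn) are assumptions for illustration, and a temporary directory stands in for /var/cache/savane.

```python
import os
import tempfile


def load_last_run(stamp_path):
    """Return the stored timestamp, or 0.0 if the cache file is missing."""
    try:
        with open(stamp_path) as fh:
            return float(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return 0.0  # no usable cache: everything will be learned again


def learn_new_spam(items, stamp_path, learn, now):
    """Feed newly flagged spam to learn() and record the run time.

    items is an iterable of (flagged_at, spamscore, text) tuples.
    """
    last_run = load_last_run(stamp_path)
    learned = 0
    for flagged_at, spamscore, text in items:
        # Only caught spam (spamscore > 4) flagged since the last run.
        if spamscore > 4 and flagged_at > last_run:
            learn(text)
            learned += 1
    with open(stamp_path, "w") as fh:
        fh.write(str(now))
    return learned


def demo():
    """Run the learner three times against a throwaway cache directory."""
    seen = []
    with tempfile.TemporaryDirectory() as cache_dir:
        stamp = os.path.join(cache_dir, "spamcheck_learn.timestamp")
        items = [(100.0, 6, "spam A"), (110.0, 3, "ham"), (120.0, 5, "spam B")]
        first = learn_new_spam(items, stamp, seen.append, now=200.0)
        second = learn_new_spam(items, stamp, seen.append, now=300.0)
        items.append((350.0, 8, "spam C"))
        third = learn_new_spam(items, stamp, seen.append, now=400.0)
    return first, second, third, seen
```

Passing the learn step in as a callback keeps the bookkeeping testable without a spamassassin installation.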
Mathieu Roy <yeupou>
Project Administrator
Mon 13 Nov 2006 06:54:28 PM UTC, original submission:

We intend to provide a wrapper to spamassassin:

  • learn with Bayesian filters from items marked as spam by users
  • run in the background on a regular basis and increase scores by itself

When this is achieved:

  • put anonymous posts in a temporary state, have spamassassin precheck them, and then actually show them only if they pass the test, otherwise mark them directly as spam; send notifications only when the test is passed
Mathieu Roy <yeupou>
Project Administrator

 

No files currently attached

 

Depends on the following items: None found


 

Carbon-Copy List
  • -unavailable- added by yeupou (Submitted the item)

    7 latest changes:

    Date                             Changed By  Updated Field    Previous Value => Replaced By
    Tue 21 Nov 2006 05:16:12 PM UTC  yeupou      Status           Ready For Test => Done
                                                 Open/Closed      -Automatic update due to transitions settings- => Closed
    Mon 20 Nov 2006 10:54:43 PM UTC  yeupou      Carbon-Copy      Removed savane-dev => -
    Mon 20 Nov 2006 10:54:27 PM UTC  yeupou      Status           None => Ready For Test
                                                 Carbon-Copy      - => Added savane-dev
    Fri 17 Nov 2006 08:18:51 AM UTC  yeupou      Planned Release  2.1 => 3.0
    Mon 13 Nov 2006 06:58:46 PM UTC  yeupou      Dependencies     - => task #3776 is dependent