CiviCRM GSoC issues (https://lab.civicrm.org/community/gsoc/-/issues), updated 2021-04-11T20:50:28Z

**Issue #11: Machine Learning Tool for Email Spam Score Calculator**
Author: Pratik10100, updated 2021-04-11T20:50:28Z, https://lab.civicrm.org/community/gsoc/-/issues/11

Hello @JoeMcLaughlin and potential mentors,
This project aims to revolutionize the way we engage and interact with email. Email marketing, which involves sending multiple emails as part of a campaign, is challenging enough already. By some estimates, about one-fifth of permission-based emails sent by legitimate marketers land in recipients' spam folders, and emails that are never checked with a spam-testing tool before sending are even more likely to be flagged as spam.
I am therefore proposing a tool that assigns a spam score to the text supplied to the model. If the score exceeds 5 on a scale of 0 to 10, the email is flagged as likely spam; knowing this before sending helps the mass mailer make the necessary changes to the text.
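As a rough illustration of the scoring idea, the sketch below trains a tiny multinomial Naive Bayes classifier (assumed here only as one candidate technique; the proposal has not chosen a model yet) on a handful of made-up messages and maps the resulting spam probability onto the 0 to 10 scale, flagging anything above 5. The training data, class name, and the "probability times ten" mapping are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamScorer:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Add one labelled message; label is "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior: share of training messages carrying this label.
            log_p = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the product.
                count = self.word_counts[label][word] + 1
                log_p += math.log(count / (total_words + vocab_size))
            log_scores[label] = log_p
        # Normalise the two log scores into P(spam | text), subtracting the
        # maximum first for numerical stability.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)

    def score(self, text):
        """Spam score on the proposal's 0-10 scale."""
        return round(10 * self.spam_probability(text), 1)


THRESHOLD = 5.0  # scores above 5 are treated as likely spam

# Tiny made-up training set, for illustration only.
scorer = NaiveBayesSpamScorer()
for msg in ["WIN a FREE prize now", "Claim your free money now", "Cheap pills, buy now"]:
    scorer.train(msg, "spam")
for msg in ["Agenda for the volunteer meeting", "Thank you for your donation",
            "Newsletter: our spring fundraising update"]:
    scorer.train(msg, "ham")

for draft in ["Free prize, claim now", "Thank you for volunteering"]:
    s = scorer.score(draft)
    print(f"{draft!r}: {s} -> {'likely spam' if s > THRESHOLD else 'ok'}")
```

A real model would of course be trained on a large labelled corpus; the point here is only how a probability turns into the proposed 0 to 10 score and threshold.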
On the CiviCRM Stack Exchange, I have seen users struggling with low open rates in the final stage of A/B testing and suspecting that spamminess is one of the causes. A tool that handles spam classification would therefore be a helpful addition for mass mailers.
I believe a CRM holds a large amount of data, so it makes sense to use that data to train a machine learning tool that helps users with email marketing.
The two significant steps involved in building a new open-source email spam score calculator are:
- Experimenting with various spam classification techniques to figure out which one provides the required balance of precision (the fraction of emails classified as spam that are indeed spam) and recall (the fraction of all spam emails that were detected).
- Providing an independent web service (like ORES) that can handle requests to calculate the spam score of an email.
I look forward to hearing your opinion on this project, and I will soon follow up with a detailed proposal covering the algorithmic and implementation details.
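Both metrics from the first step can be computed directly from a classifier's predictions. A minimal sketch, assuming binary labels where 1 means spam; the function name and the sample labels are illustrative only.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for binary labels.

    precision = TP / (TP + FP): of the emails flagged as spam, how many are spam.
    recall    = TP / (TP + FN): of the actual spam emails, how many were flagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 1 = spam, 0 = legitimate.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 0, 0, 1]
print(precision_recall(actual, predicted))
```

With a score-based model, moving the spam threshold tends to trade one metric against the other, which is why the experiments need to look at both rather than at accuracy alone.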
Thanks 😊

**Issue #6: Search Builder Level Up**
Author: sarvesh21, updated 2021-03-30T18:06:28Z, https://lab.civicrm.org/community/gsoc/-/issues/6

Search Builder Level Up ([Link-to-Idea](https://lab.civicrm.org/community/gsoc/blob/master/projects.md#search-builder-level-up))
1) Easy-to-use interface (with proper tooltips and instructions)
- A demo video, similar to https://www.youtube.com/watch?v=SZAo6UBqi8Y
- A tutorial built with [this](https://civicrm.org/extensions/civitutorial) extension
2) All the basic queries
3) Nested AND/OR grouping of queries
- Currently, conditions can only be grouped with AND/OR at a single level; groups cannot be nested. For example:
Currently possible: (condition1 AND condition2) OR (condition3 AND condition4)
To work on: (condition1 AND (condition2 OR condition3)) OR (condition4 AND condition5)
4) If time permits, the ability to specify joins
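The nesting described in item 3 is naturally a tree: each group holds an operator and a list of children that are either plain conditions or further groups. The sketch below is a hypothetical data structure (not CiviCRM's actual Search Builder code), and the column names are invented; it renders such a tree into a parenthesised, parameterised WHERE clause.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Condition:
    column: str
    operator: str  # e.g. "=", ">", "LIKE"
    value: object


@dataclass
class Group:
    conjunction: str  # "AND" or "OR"
    children: List[Union[Condition, "Group"]] = field(default_factory=list)


def to_sql(node, params):
    """Render a Condition/Group tree as a WHERE fragment.

    Values are appended to params and replaced by %s placeholders
    (DB-API "format" style), so they are never spliced into the SQL text.
    """
    if isinstance(node, Condition):
        params.append(node.value)
        return f"{node.column} {node.operator} %s"
    if node.conjunction not in ("AND", "OR") or not node.children:
        raise ValueError("a group needs AND/OR and at least one child")
    parts = [to_sql(child, params) for child in node.children]
    return "(" + f" {node.conjunction} ".join(parts) + ")"


# (condition1 AND (condition2 OR condition3)) OR (condition4 AND condition5)
query = Group("OR", [
    Group("AND", [
        Condition("contact_type", "=", "Individual"),
        Group("OR", [
            Condition("city", "=", "Leeds"),
            Condition("city", "=", "York"),
        ]),
    ]),
    Group("AND", [
        Condition("contact_type", "=", "Organization"),
        Condition("num_employees", ">", 100),
    ]),
])

params = []
print(to_sql(query, params))
print(params)
```

Because groups can contain groups, arbitrary nesting falls out of the recursion for free. Only values are parameterised here, so a real builder would still need to validate column names and operators against a whitelist.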
Thanks
Sarvesh

**Issue #4: Gitlab to Github integration + Github issues.**
Author: rexsteroxy, updated 2021-03-30T18:05:37Z, https://lab.civicrm.org/community/gsoc/-/issues/4

I have a keen interest in working on the above project,
but I have the following questions:
1. Does the organization have any existing Probot forks? If so, how can a student get access to them?
2. Could you give a more detailed clarification of "Adding specific comments to a Github Pull Request will result in certain tags being added or removed"?

**Issue #2: Machine Learning for Fraud Detection - Proposal**
Author: saurabhbatra, updated 2019-03-14T11:10:08Z, https://lab.civicrm.org/community/gsoc/-/issues/2

### Profile Information
* **Name**: Saurabh Batra
* **Mattermost nick**: saurabh
* **Web Page**: http://saurabhbatra96.github.io/
* **Resume**: http://saurabhbatra96.github.io/public/cv.pdf
* **Location**: India
* **Typical working hours**: 12 PM - 10 PM UTC+5:30
### Synopsis
The project aims to build a new open-source fraud detection system. The 2 major steps involved are:
- experimenting with various anomaly detection techniques (see the ML section at the end) to figure out which one provides the required balance of precision (% of detected frauds which are actually fraudulent) and recall (% of all frauds detected);
- providing the technique as an independent web service (like https://www.mediawiki.org/wiki/ORES) which can handle requests to assess the authenticity of transactions.
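To make the "independent web service" idea concrete, here is a minimal sketch using only Python's standard-library `http.server`. The `/score` endpoint, the JSON field names (`amount`, `card_country`, `ip_country`), and the placeholder scoring rule are all hypothetical stand-ins for whichever model wins the experimentation phase; a production service would also need authentication, stricter input validation, and model versioning.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_transaction(txn):
    """Placeholder scorer returning a fraud probability in [0, 1].

    Hypothetical hand-written rule standing in for the trained model.
    """
    risk = 0.0
    if txn.get("amount", 0) > 1000:
        risk += 0.5
    if txn.get("card_country") != txn.get("ip_country"):
        risk += 0.4
    return min(risk, 1.0)


def handle_request(body, threshold=0.5):
    """Parse a JSON request body and build the JSON-serialisable response."""
    txn = json.loads(body)
    probability = score_transaction(txn)
    return {"fraud_probability": probability, "is_fraud": probability >= threshold}


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = handle_request(self.rfile.read(length))
        except (ValueError, AttributeError, TypeError):
            # Malformed JSON, a non-object payload, or wrongly typed fields.
            self.send_error(400, "Expected a JSON object describing a transaction")
            return
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), ScoreHandler).serve_forever()
```

Keeping the request handling in a plain function (`handle_request`) separate from the HTTP plumbing makes it easy to unit-test and to swap in the real model later. Once running, a client would POST a JSON transaction to `http://127.0.0.1:8000/score` and receive the probability and decision back.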
**Stretch Goals**
- The web service uses the feedback from its decisions (new correct detection/wrong detection corrected by a human) to train the underlying model, improving its accuracy in the future.
- Use something like LIME (https://github.com/marcotcr/lime) to provide a justification as to why our classifier chose to mark a transaction as fraud.
- CiviCRM extension to interface directly with the web service.
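The first stretch goal (learning from human-corrected decisions) can be illustrated with online logistic regression, where each reviewer-confirmed transaction triggers one stochastic gradient descent step. This is a hypothetical sketch: the class name, the two features, the learning rate, and the feedback stream are invented, and the model actually chosen in the experimentation phase may call for a different update scheme or for periodic batch retraining instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Logistic regression updated one labelled example at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Estimated probability that feature vector x is fraudulent."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x, label):
        """One gradient step on the log-loss for a reviewer-confirmed example.

        label is 1 (fraud) or 0 (legitimate).
        """
        error = self.predict_proba(x) - label  # d(log-loss)/dz
        self.bias -= self.learning_rate * error
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]


# Hypothetical features: [normalised amount, card/IP country mismatch].
model = OnlineLogisticRegression(n_features=2)
feedback = [([0.9, 1.0], 1), ([0.1, 0.0], 0), ([0.8, 1.0], 1), ([0.2, 0.0], 0)]
for x, label in feedback * 25:  # repeated to mimic a steady stream of corrections
    model.update(x, label)

fraud_like = model.predict_proba([0.85, 1.0])
normal_like = model.predict_proba([0.15, 0.0])
print(round(fraud_like, 3), round(normal_like, 3))
```

Per-example updates keep the service responsive, but they also make it easy for a burst of mislabelled feedback to skew the model, so logging every correction for later batch retraining is a sensible complement.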
**Previous experience** I worked with Eileen for about a year in 2016, including a GSoC project for CiviCRM, and have discussed this proposal with Adam.
**Possible Mentor(s)** Eileen McNaughton, Adam Wight
### Timeline
I’m going to divide the work into 2 major phases:
**Experimentation phase (May - mid June)**
The experimentation phase will mainly consist of trying out the proposed techniques on the current dataset and comparing how they perform against each other and against the current fraud detection system. Tentative tasks include:
- **(Week 1)** Dataset procurement and cleaning
- **(Week 1-2)** Reading up on and applying feature selection to the dataset
- **(Week 2-5)** Reading up on and applying anomaly detection techniques; comparing precision and recall scores; deciding on the best technique for the web service
**Architectural phase (June - August)**
The architectural phase involves integrating the best-performing technique with a web service. Tentative tasks include:
- **(Week 6)** API design for the web service
- **(Week 6-7)** Setting up the bare-bones architecture for the web service
- **(Week 7-8)** Implementing the API (or at least the important parts of it)
- **(Week 9-10)** Integrating the API into the WMF transaction workflow
### About Me
I'm currently a final-year B.Tech. student in Computer Science & Engineering at IIT Guwahati, India. I started contributing to CiviCRM in 2015 and went on to do a GSoC project with Eileen in 2016. This project will be my top priority during my summer break, as I don't have any pressing commitments during that time.
### Past Experience
For the past year I've been working on a thesis project on data science and information retrieval which involves machine learning techniques similar to the ones I want to use here. In addition to that I have considerable experience working with open source organizations - I was an active contributor to CiviCRM and a GSoC participant back in 2016. Also, I'm comfortable adapting to new tech stacks and getting "code-ready" in a short period of time thanks to my internship at Google in 2017.
### Other Info
#### Machine Learning Techniques for Anomaly Detection
* **Autoencoders**: Autoencoders are neural networks that learn, in an unsupervised way, to compress and then reconstruct the data, capturing its underlying patterns. Transactions that deviate from these patterns are reconstructed poorly (high reconstruction error) and are flagged as anomalies. More details: https://shiring.github.io/machine_learning/2017/05/01/fraud.
* **Logistic Regression**: Logistic regression fits the best (yet reasonably simple) model of the relationship between a dependent variable (fraud/not fraud) and a set of independent variables (features). Unlike an autoencoder it is supervised: it outputs the probability that a transaction is fraudulent, and transactions above a chosen probability threshold are flagged.
* **Supervised Learning using Classifiers**: The problem with supervised learning here is class imbalance: an SVM that simply predicted that no transaction was ever fraudulent would have been correct ~99.6% of the time on WMF’s 2017 transactions. One workaround is to under-sample normal transactions so that frauds are not vastly outnumbered by normal transactions. An ensemble of classifiers (something that combines the outputs of multiple classifiers and then classifies the transaction as fraud/not fraud) should work even better than individual classifiers.
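The accuracy trap and the under-sampling workaround for supervised classifiers can be shown in a few lines. The sketch below uses synthetic labels with a 0.4% fraud rate, chosen only to mirror the ~99.6% figure quoted for WMF's 2017 data (the real data is not reproduced here), and randomly under-samples the normal class to a hypothetical target ratio.

```python
import random


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts 'not fraud' (0)."""
    return labels.count(0) / len(labels)


def undersample(rows, labels, ratio=1.0, seed=42):
    """Keep every fraud row and a random subset of normal rows.

    ratio is the desired number of normal rows per fraud row
    (a hypothetical tuning parameter).
    """
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    normal = [i for i, y in enumerate(labels) if y == 0]
    keep_normal = rng.sample(normal, min(len(normal), int(len(fraud) * ratio)))
    keep = sorted(fraud + keep_normal)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Synthetic data: 10,000 transactions, 40 of them fraudulent (0.4%).
labels = [1] * 40 + [0] * 9960
rows = [{"id": i} for i in range(len(labels))]

print(majority_baseline_accuracy(labels))
balanced_rows, balanced_labels = undersample(rows, labels, ratio=3.0)
print(len(balanced_labels), sum(balanced_labels))
```

Under-sampling should be applied to the training split only; evaluating on an under-sampled test set would make precision look better than it will be on real, heavily imbalanced traffic.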
#### Additional Links
- https://blog.codecentric.de/en/2017/09/data-science-fraud-detection/
- http://ieeexplore.ieee.org/document/8123782/?reload=true
- An interesting one (just read the dataset description and conclusions if you don’t want to go through the entirety of it): http://www.wipro.com/documents/comparative-analysis-of-machine-learning-techniques-for-detecting-insurance-claims-fraud.pdf
- Radar is proprietary software that does exactly what we’re trying to achieve: https://stripe.com/radar
- https://iwringer.wordpress.com/2015/11/17/anomaly-detection-concepts-and-techniques/