January 28, 2022

Association Rule Mining Support and Confidence Example

The `Iris` data set is loaded with the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. As you can see, the ExampleSet has real attributes. The FP-Growth operator therefore cannot be applied to it directly, because FP-Growth requires all attributes to be binominal. Some preprocessing is needed to bring the ExampleSet into the required form. The Discretize by Frequency operator is applied to replace the real attributes with nominal attributes. Then the Nominal to Binominal operator is applied to replace these nominal attributes with binominal attributes. Finally, the FP-Growth operator is applied to generate frequent itemsets. The frequent itemsets generated by FP-Growth are passed to the Create Association Rules operator. The resulting association rules can be viewed in the Results workspace. Run this process with different parameter values to get a better understanding of this operator.

Although the concepts behind association rules date back earlier, association rule mining was defined in the 1990s, when computer scientists Rakesh Agrawal, Tomasz Imieliński and Arun Swami developed an algorithm-based method for finding relationships between items using data from point-of-sale (POS) systems.
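The preprocessing chain described for the Iris process (discretize by frequency, then nominal to binominal) can be sketched in plain Python. This is only an illustration of the idea, not the operators' actual implementation: the function names, the `range1`-style labels and the sample measurements are all assumptions made for this example.

```python
from typing import Dict, List, Sequence


def discretize_by_frequency(values: Sequence[float], n_bins: int) -> List[str]:
    """Assign each value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [""] * len(values)
    for rank, idx in enumerate(order):
        # Rank-based binning: each bin receives roughly len(values) / n_bins rows.
        # Simplification: tied values may end up in different bins.
        labels[idx] = f"range{rank * n_bins // len(values) + 1}"
    return labels


def nominal_to_binominal(column: str, labels: Sequence[str]) -> Dict[str, List[bool]]:
    """Expand one nominal column into one true/false column per distinct label."""
    return {
        f"{column}={label}": [value == label for value in labels]
        for label in sorted(set(labels))
    }


# Hypothetical measurements, not taken from the Iris data.
petal_length = [1.4, 4.7, 5.9, 1.3, 4.5, 6.1]
bins = discretize_by_frequency(petal_length, n_bins=3)
binary_columns = nominal_to_binominal("petal_length", bins)

for name, flags in binary_columns.items():
    print(name, flags)
```

Each resulting true/false column plays the role of an item, which is the form a frequent itemset miner expects.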

By applying the algorithms to supermarket data, the scientists were able to discover links between different purchased items, called association rules, and eventually use this information to predict the likelihood of different products being purchased together.

Keep in mind that one drawback of the confidence measure is that it tends to misrepresent the importance of an association. To demonstrate this, we'll go back to the main dataset to select 3 association rules that contain beer:

If you're using association rules, you're probably just using support and confidence. This means, however, that a rule must satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Rule generation is typically split into two separate steps: first, a minimum support threshold is applied to find all frequent itemsets in the database; second, a minimum confidence constraint is applied to these frequent itemsets to form rules.

K-optimal pattern discovery offers an alternative to the standard approach to association rule learning, which requires each pattern to appear frequently in the data.

Searching for association rules can be computationally intensive. It essentially involves finding all covered attribute sets A and then testing whether the rule "A implies B" holds with sufficient confidence for some attribute set B distinct from A. Efficiency gains can be achieved by a combinatorial analysis of information from previous passes, which removes unnecessary rules from the list of candidate rules. Another very effective approach is to use random samples of the database to estimate whether or not an attribute set is covered.
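The two-step procedure (frequent itemsets first, rules second) can be sketched as follows. This is a brute-force illustration that enumerates every possible itemset, so it is only practical for tiny inputs; the baskets, thresholds and function names are assumptions made for this example.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions: List[Itemset], min_support: float) -> List[Itemset]:
    """Step 1: keep every itemset whose support reaches min_support (brute force)."""
    items = sorted(set().union(*transactions))
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
        if support(frozenset(combo), transactions) >= min_support
    ]


def generate_rules(
    transactions: List[Itemset], min_support: float, min_confidence: float
) -> List[Rule]:
    """Step 2: split each frequent itemset into antecedent => consequent."""
    rules = []
    for itemset in frequent_itemsets(transactions, min_support):
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                antecedent = frozenset(lhs)
                consequent = itemset - antecedent
                confidence = support(itemset, transactions) / support(antecedent, transactions)
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


# Hypothetical baskets, invented for illustration.
BASKETS = [
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "diapers", "milk"}),
    frozenset({"beer", "bread"}),
    frozenset({"diapers", "milk"}),
    frozenset({"bread", "milk"}),
]

found = generate_rules(BASKETS, min_support=0.4, min_confidence=0.6)
for antecedent, consequent, confidence in found:
    print(sorted(antecedent), "=>", sorted(consequent), round(confidence, 2))
```

Raising either threshold shrinks the rule list, which is a quick way to see how the two constraints interact.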

In a large data set, it may be necessary to narrow down which rules are of interest to the user. One approach is to use templates to describe the form of interesting rules.

An early use (circa 1989) of minimum support and confidence to find all association rules is the Feature Based Modeling framework, which found all rules with $\mathrm{supp}(X)$ and $\mathrm{conf}(X \Rightarrow Y)$ greater than user-defined constraints. [24]

Scan the database and calculate the support of each candidate for frequent itemsets.

Programmers use association rules to build programs capable of machine learning. Machine learning is a type of artificial intelligence (AI) that seeks to build programs that can become more effective without being explicitly programmed.

If you find that sales of items beyond a certain proportion tend to have a significant impact on your profits, you can use that proportion as your support threshold. You can then identify itemsets with support values above this threshold as significant itemsets.

The analysis of items that frequently occur together is supported by two measures that indicate how valuable a pattern is and whether it should be studied further. Given a database containing information on asset replacement, the analysis can determine which groups of assets are typically replaced together and which assets should be present in a group. Assets missing from the system can be detected by querying the GIS for the assets at each physical location and comparing that list with the group's asset list.

The two measures that support this analysis for determining asset groups are confidence and support. For example, a confidence value of 70% indicates that if asset A is replaced, there is a 70% probability that asset B will also be replaced. The support measure indicates the percentage of transactions in the database that support this statement. For example, a support value of 20% means that in 20% of all transactions in the database, assets A and B were replaced together. Typically, thresholds are set so that only the most meaningful patterns are shown, rather than every pattern, including those with low confidence and support values.

Here are some real use cases for association rules:

To illustrate the concepts, let's use a small example from the supermarket domain. Table 2 shows a small database of items, where a value of 1 in an entry means the item is present in the corresponding transaction and a value of 0 means the item is absent from that transaction. The itemset $Y = \{\mathrm{milk}, \mathrm{bread}, \mathrm{butter}\}$ has a support of $1/5 = 0.2$, since it appears in 20% of all transactions.
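Support and confidence can be computed directly from a 0/1 table like the one described. The five rows below are hypothetical: they were chosen only so that {milk, bread, butter} appears in exactly one of five transactions, matching the 0.2 support quoted above. The helper names are likewise assumptions made for this sketch.

```python
from typing import Iterable, List

ITEMS: List[str] = ["milk", "bread", "butter", "beer", "diapers", "eggs", "fruit"]

# Hypothetical transactions: 1 = item present, 0 = item absent.
ROWS: List[List[int]] = [
    [1, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 0, 0],
]


def count(itemset: Iterable[str]) -> int:
    """Number of transactions that contain every item in the itemset."""
    columns = [ITEMS.index(item) for item in itemset]
    return sum(all(row[c] == 1 for c in columns) for row in ROWS)


def support(itemset: Iterable[str]) -> float:
    """Share of all transactions that contain the itemset."""
    return count(itemset) / len(ROWS)


def confidence(antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Share of transactions containing the antecedent that also contain the consequent."""
    antecedent = set(antecedent)
    return count(antecedent | set(consequent)) / count(antecedent)


print(support({"milk", "bread", "butter"}))       # 1 of 5 transactions
print(confidence({"milk", "bread"}, {"butter"}))  # 1 of the 2 milk-and-bread transactions
```

Read this way, a confidence of 0.5 for {milk, bread} ⇒ {butter} says that half of the transactions containing milk and bread also contain butter.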

The set of items is $I = \{\mathrm{milk}, \mathrm{bread}, \mathrm{butter}, \mathrm{beer}, \mathrm{diapers}, \mathrm{eggs}, \mathrm{fruit}\}$.

Such information can be used as a basis for decisions about marketing activities such as promotional pricing or product placements. In addition to the market basket analysis example above, association rules are now used in many application areas, including web usage mining, intrusion detection, and bioinformatics.

Shaded rows represent itemsets whose support count does not meet the threshold requirement.

Association rules that predict multiple consequents must be interpreted with some care. For example, in the weather data in Table 1.2, we saw this rule:

Add the itemsets that meet the minimum support requirement to $F_{k+1}$.

An association rule relates the values of a given set of attributes to the value of another attribute outside that set. In addition, the rule can carry information about how often the attribute values are associated with each other.
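The step of adding itemsets that meet the minimum support requirement to F(k+1) is the core of a level-wise (Apriori-style) search. The sketch below is a simplified version of that idea, assuming a minimum support count rather than a fraction; the baskets and the function name are invented for illustration.

```python
from itertools import combinations
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def level_wise_frequent_itemsets(
    transactions: List[Itemset], min_count: int
) -> List[List[Itemset]]:
    """Return [F1, F2, ...], where Fk holds the frequent itemsets of size k."""

    def support_count(itemset: Itemset) -> int:
        return sum(itemset <= t for t in transactions)

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items if support_count(frozenset([i])) >= min_count]
    levels: List[List[Itemset]] = []
    k = 1
    while current:
        levels.append(current)
        known = set(current)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in known for sub in combinations(c, k))
        }
        # Count step: itemsets meeting the minimum support go into F(k+1).
        current = sorted(
            (c for c in candidates if support_count(c) >= min_count), key=sorted
        )
        k += 1
    return levels


# Hypothetical baskets, invented for illustration.
BASKETS = [
    frozenset({"beer", "diapers", "milk"}),
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "diapers", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"beer", "milk", "diapers"}),
]

levels = level_wise_frequent_itemsets(BASKETS, min_count=3)
for size, level in enumerate(levels, start=1):
    print(size, [sorted(itemset) for itemset in level])
```

The prune step is what saves work: a candidate is never counted against the database if one of its subsets already failed the threshold.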

For example, such a rule might state that "75% of men aged 50 to 55 in management positions take out supplementary pension plans."

Let $X, Y$ be itemsets, $X \Rightarrow Y$ an association rule, and $T$ a set of transactions of a given database.

To rank the most interesting rules, the lift index is also used to measure the (symmetric) correlation between the antecedent and the consequent of the extracted rules. The lift of an association rule is $\mathrm{lift}(X \to Y) = c(X \to Y)/s(Y) = s(X \to Y)/(s(X)\,s(Y))$, where $s(X \to Y)$ and $c(X \to Y)$ are the support and confidence of the rule, and $s(X)$ and $s(Y)$ are the supports of the rule's antecedent and consequent. If $\mathrm{lift}(X, Y) = 1$, the itemsets X and Y are not correlated, that is, they are statistically independent. Lift values below 1 show a negative correlation between the itemsets X and Y, while values above 1 indicate a positive correlation. Rules with a lift value close to 1 may be of marginal interest.

A classic example of association rule mining concerns a relationship between diapers and beer. The example, which appears to be fictitious, claims that men who go to a store to buy diapers are also likely to buy beer. The data that suggests this might look like this:

Association rules are "if-then" statements that help show the probability of relationships between data items within large data sets in various types of databases. Association rule mining has a number of applications and is widely used to detect sales correlations in transactional data or in medical data sets.
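The lift formula can be checked with a few lines of code. The counts below are hypothetical numbers for the diapers and beer example, picked so the arithmetic is easy to follow; the function name and parameters are assumptions made for this sketch.

```python
import math


def lift(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """lift(X -> Y) = s(X -> Y) / (s(X) * s(Y)), computed from raw transaction counts."""
    # Multiplying out the fractions avoids dividing three times:
    # (n_xy / n) / ((n_x / n) * (n_y / n)) == n_xy * n / (n_x * n_y)
    return (n_xy * n_total) / (n_x * n_y)


# Hypothetical store: 100 transactions, 20 with diapers, 25 with beer.
N_TOTAL, N_DIAPERS, N_BEER = 100, 20, 25

positive = lift(10, N_DIAPERS, N_BEER, N_TOTAL)     # bought together more often than chance
independent = lift(5, N_DIAPERS, N_BEER, N_TOTAL)   # exactly what independence predicts
negative = lift(2, N_DIAPERS, N_BEER, N_TOTAL)      # bought together less often than chance

# The same value follows from the confidence form c(X -> Y) / s(Y).
confidence = 10 / N_DIAPERS
support_beer = N_BEER / N_TOTAL
assert math.isclose(confidence / support_beer, positive)

print(positive, independent, negative)
```

With 20% of baskets containing diapers and 25% containing beer, independence predicts 5 joint purchases out of 100, so 10 joint purchases give a lift of 2 and 2 joint purchases give a lift below 1.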

The mining of frequent closed itemsets was proposed in Pasquier, Bastide, Taouil and Lakhal [PBTL99], where an Apriori-based algorithm called A-Close was presented for such mining. CLOSET, an efficient closed itemset mining algorithm based on the frequent pattern growth method, was proposed by Pei, Han and Mao [PHM00]. CHARM, by Zaki and Hsiao [ZH02], introduced a compact vertical TID list structure called diffset, which records only the difference between a candidate pattern's TID list and that of its prefix pattern.
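The diffset idea can be illustrated with ordinary Python sets. This is a minimal sketch of the bookkeeping only, not the CHARM algorithm itself; the TID lists and function names are invented for this example.

```python
from typing import Set


def diffset(prefix_tids: Set[int], extended_tids: Set[int]) -> Set[int]:
    """TIDs that contain the prefix pattern but not the extended pattern."""
    return prefix_tids - extended_tids


def support_from_diffset(prefix_support: int, diff: Set[int]) -> int:
    """Support of the extended pattern, recovered without storing its full TID list."""
    return prefix_support - len(diff)


# Hypothetical vertical layout: item -> IDs of the transactions containing it.
tids_a = {1, 2, 3, 4, 5}
tids_b = {1, 2, 3, 5, 6}

tids_ab = tids_a & tids_b          # full TID list of the pattern AB
diff_ab = diffset(tids_a, tids_ab)  # what a diffset stores instead

print(diff_ab)
print(support_from_diffset(len(tids_a), diff_ab))
```

Here the diffset holds one TID where the full list holds four, which is the kind of saving the structure aims for when a pattern and its prefix share most of their transactions.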

Published by: gianni57
