Mining Your Business

32 | What is conformance checking? With Boudewijn Van Dongen, professor at Eindhoven University of Technology

March 30, 2022 Mining Your Business Episode 32

Conformance checking is, regrettably, very often an omitted process mining technique. Is that rightfully so, though? To tell us more about why conformance checking is on the rise and why you should pay attention, we host Boudewijn van Dongen, professor at Eindhoven University of Technology and co-author of the book Conformance Checking - Relating Processes and Models.

00:00

Patrick:

We are Patrick and Jakub and this is the Mining Your Business podcast - a show all about process mining, data science and advanced business analytics. Jakub, how are you doing today?

 

00:09

Jakub:

I'm doing quite well, Patrick. Thank you.

 

00:11

Patrick:

Boudewijn van Dongen, a professor at the University of Eindhoven and chair of the IEEE Task Force, joins us on the podcast today to finally explain to us what conformance checking is, what it isn't, and the various ways it can be used to discover hidden errors in your process. Let's get into it.

 

00:37

Jakub:

I will start this episode off with the foreword by Wil van der Aalst, one of our previous guests on the show, to the book Conformance Checking - Relating Processes and Models by authors Josep Carmona, Boudewijn van Dongen, Andreas Solti, and Matthias Weidlich. Here we go. Conformance checking is an important but also challenging topic in process mining. Most people who see process mining for the first time are dazzled by the process discovery capabilities of today's process mining tools. However, when people really start to use process mining, more detailed questions emerge and it is no longer sufficient to look at fancy process diagrams composed of boxes and arrows. Conformance checking will be a focal point of today's discussion, but not the only one, with today's guest, Professor Boudewijn van Dongen from the University of Eindhoven. Boudewijn, welcome to our show.

 

01:30

Boudewijn van Dongen:

Thank you.

 

01:31

Jakub:

It's really a pleasure to have you here. And I guess my first question really would be you are currently a full professor at the university in Eindhoven. And my question would be what does it actually look like and what is your area of expertise?

 

01:47

Boudewijn van Dongen:

Thank you, Jakub, for inviting me and, Patrick, for hosting me today. I'm very happy to be here. So indeed, I'm at Eindhoven University and I'm the exception, I'm the odd one out here. I came to this university in 1998 as a student and I never left. From an academic perspective, that is sort of the worst career path you can have, but I'm really enjoying myself in Eindhoven. My group is a small research group focusing on all aspects of process analytics, both on the foundational theoretical results that we are developing, for example in the area of conformance checking, and on specific application areas where industry in the Eindhoven area is interested in actually co-developing and co-creating the technology of the future in process mining.

 

 

02:53

Patrick:

So can I ask what subjects do you actually teach at Eindhoven?

 

02:57

Boudewijn van Dongen:

So we have a couple of courses. One of the courses that we are responsible for in our research group is a foundational course on data analytics that starts in the first year of the bachelor of all our university programs. So over 2000 students a year participate in that. That's the first time our students get a glimpse of what data science is actually about. And the key point is that we try to teach our students that data science is not just about changing or improving the error by a small margin, but it's actually about solving a real problem for a company. And we have various courses that have this kind of background, all the way into the master, where we teach advanced process mining topics such as real-time process mining, so doing process mining but then in an online setting. There are courses specifically on discovery, specifically on conformance checking, and on health care applications, which is a very dedicated topic. So these are the sort of topics that come back in our course program.

 

04:10

Jakub:

So we are a show dedicated to process mining, but neither Patrick nor I really studied process mining. We had two different career paths and somehow they just spat us out here at process mining. Can students actually apply in Eindhoven for a career path as, let's say, a process mining architect, or is it just some kind of a little deviation from the normal curriculum?

 

04:36

Boudewijn van Dongen:

No. Process mining is an important trajectory within our masters. We have a dedicated master on data science and artificial intelligence. Within that master, process mining is part of the core curriculum for all the students, and there is really a specialized track preparing students to become maybe not necessarily a process mining architect in the sense of software, but a process mining expert in translating questions from companies, from businesses, into questions that you can answer with process mining knowledge.

 

05:16

Patrick:

So how long have you been exposed to or when did you first learn about process mining?

 

05:22

Boudewijn van Dongen:

That's a funny story. So this happened in 2000-2001. I was a third-year student and I was not the most active student. At the time I was supposed to do an internship, and to do an internship you normally would have to reach out to a company, sign contracts and spend a couple of months inside. But I just woke up one day and realized I did not take care of doing an internship, and I had to start one next week. So my girlfriend at the time, she had been in contact with Wil van der Aalst for a very long time about doing a project with him. And she told me: Wil is looking for a student to implement the Alpha algorithm. He must have told you about that in the podcast. And then I went to him and said, look, I can do programming. If you want me to program an algorithm, I can do that for you. So he gave me this assignment, and Wil has no idea about programming, so he expected this to last for a couple of months. But then I came back after a few weeks and said, look, I implemented this algorithm. So now what? And that was actually the point where I learned that implementation requires testing, because there were still quite a few bugs in the actual implementation. But at the same time, Wil challenged me to come up with new ideas, new extensions to the Alpha algorithm. That's when I first started sort of getting acquainted with doing research. I had no idea what it was at the time. And slowly, in the course of these couple of months, I stuck with that research group, so I kept on doing some implementation work. And finally, at some point, Wil suggested, can you come and do a PhD with me? And that's how I entered into the subject.

 

07:14

Jakub:

That sounds amazing. We did have Wil, I think the only thing he was mentioning was, you know, ideas coming up while having a beer. I'm not sure whether this was the case.

 

07:25

Boudewijn van Dongen:

At some point. Sure.

 

07:29

Jakub:

So, you know, one of my other questions would be this. We have a lot of, I would say, students listening who are at this point of their careers and are considering, do I go to work in business or do I actually pursue an academic career? Since you mentioned that you were inspired by Wil and started to work on an algorithm with him, can our listeners, or students generally, also reach out to you or some acquaintances of yours to pursue academic research in the process mining field? And where would you point them to?

 

08:06

Boudewijn van Dongen:

I think every researcher in every university would be happy to talk to students who are interested in doing some sort of research oriented task or research assignments. So we have various opportunities for that in our university. But also, I'm sure all my colleagues share this. I understand it's a very difficult choice to make when you're 22-23 years old and you need to decide whether you stay at this university at a typically lower salary, spending four years writing a book or do you go to industry, especially in process mining, where you get a car and house and a salary that is three times that of your colleagues. That is a very tough decision to make. I too, I really understand that. I think it's worth staying at the university when you're young because the moment you step away, it will be far more difficult to ever go back. And you might regret that decision.

 

09:18

Jakub:

So can they reach out also at some point, maybe even to you and your team, if they are interested in some of the topics that we are going to discuss today?

 

09:25

Boudewijn van Dongen:

Of course. Of course. Yes.

 

09:27

Jakub:

Sounds great. So you heard it: if you're interested in pursuing a career in process mining in academia, you are always welcome. And, you know, there can never be enough researchers on the topic. Maybe one comment from my side. Unfortunately, we don't get cars or houses, even though we do work in business.

 

09:47

Boudewijn van Dongen:

That's. Yeah, that's unfortunate.

 

09:50

Jakub:

Maybe we picked the wrong company, haha.

 

09:51

Patrick:

Haha, maybe, Jakub, we should think about this.

 

09:55

Jakub:

Anyhow, the main topic for today should be conformance checking and, Boudewijn, I know you wrote a book on this. Could you maybe give us a little introduction into what the book is really about? Later on, we'll get into conformance checking itself, because this is a topic that we haven't properly discussed on the show yet. And it feels, from what I've been studying as a part of preparation for this episode, that this is really something that we should devote more of our energy to.

 

10:26

Boudewijn van Dongen:

On the topic of conformance checking: when we started doing process mining, the idea was always that if you take an event log and you look at it with the right algorithm, you get a perfect picture of the process. But in practice, we soon realized that there is actually a difference between the model and the process itself. Every single model that is ever made of a process, and that can be a formal model in terms of a Petri net, or a direct succession graph, or even a handbook, is always an abstraction of the reality of the process that's actually taking place. And in any practical situation, you have exceptions. You have cases where you have to work around the system, where you have to make sure that something just works because it has to work. You have an important customer, and for that customer you're willing to bend the rules a bit, right? So that is part of the process. It's not a part of the model. And in the early process mining research, the model was always seen as a synonym for the process. Very soon it was Anne Rozinat who, in her PhD, looked at this relation between a process model and an event log and said, let me try to capture and quantify how well these two fit together. She came up with metrics analogous to the data mining community, where you talk about recall and precision. She talked about fitness and precision: how much of the behavior that I've observed is actually explained by the model, and how much more behavior does this model allow with respect to the observed behavior? Later it was Arya, a PhD student of Wil and myself, who developed this further and made this sort of computationally sound technique that works in the general case.
So, if you have a process model and an event log, I can tell you where and when there are deviations, and we can explain exactly where this model does not agree with the data that was observed and how we can fix that, either by labeling something in the log as "this probably didn't happen, or it should not have happened", or by adding something to the log and saying, well, here is a step that probably happened, but it probably wasn't recorded. A very simple concept, and he did the groundwork on that. And that eventually led to the book that we wrote. I did not write this alone, not by a long shot. So I think Josep and Matthias and Andreas deserve all the credit for doing that together. But then we tried to take this one step further. So we said, now let's look at process modeling in relation to this notion of a process, this abstract, conceptual thing that everybody knows exists but nobody knows how to exactly explain, versus a model, which in our book is a Petri net but can also be a BPMN model, and an event log. And that event log is the thing that of course we study in process mining extensively, but there, too, the idea of what an event log is, is shifting.
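To make the fitness and precision idea a bit more concrete, here is a minimal Python sketch. It is a deliberate simplification, not the metrics from the book: the model is assumed to be just a finite set of allowed activity sequences (a real model would be a Petri net or similar), and the activity names and the example log are made up for illustration.

```python
# Hypothetical model: the finite set of activity sequences it allows.
MODEL_TRACES = {
    ("submit", "offer_created", "offer_sent_back", "accept"),
    ("submit", "offer_created", "offer_sent_back", "decline"),
    ("submit", "reject"),
}

# Hypothetical event log: one tuple of activities per case.
EVENT_LOG = [
    ("submit", "offer_created", "offer_sent_back", "accept"),
    ("submit", "offer_created", "offer_sent_back", "accept"),
    ("submit", "offer_created", "accept"),  # deviates: offer never sent back
    ("submit", "reject"),
]


def fitness(log, model):
    """Share of observed cases whose trace the model can reproduce."""
    return sum(1 for trace in log if trace in model) / len(log)


def precision(log, model):
    """Share of the modeled traces that were actually observed in the log."""
    observed = set(log)
    return len(model & observed) / len(model)


log_fitness = fitness(EVENT_LOG, MODEL_TRACES)
model_precision = precision(EVENT_LOG, MODEL_TRACES)
```

In this toy setup, low fitness means the log contains behavior the model does not explain, while low precision means the model allows much more than was ever observed. Real techniques work on partial matches rather than whole traces, which this sketch does not attempt.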

 

14:09

Jakub:

So basically, when it comes to conformance checking, and I will quote your book now, conformance checking quantifies the relationship between data and a model in order to say something about the process. What can conformance checking actually say about the process?

 

14:31

Boudewijn van Dongen:

Well, maybe it's good to explain this bit in terms of an example. I don't know if you and your listeners have heard of the BPI challenge, but this is a challenge where, already for quite some time, we publish a data set coming from a company, a real-life dataset, and we ask the research community, but also the professional community, to show what you can do with your techniques on that data. Can you answer the questions of this business log? Back in 2012, we published a dataset that came from a small financial institute in the Netherlands where consumers can ask for a small consumer credit. A small consumer credit for the bank means a loan of between €5,000 and €50,000. And this is a dataset that actually has a very structured process behind it. It has call agents, so people, mainly students, who call customers after they fill in the website, to make offers, to ask for details, to come to an agreement on the terms of every loan. If we look at that event log, we can actually extract the model by hand. There's not a single process mining tool that can do this fully automatically, although one of our PhD students is getting close. But we have a model that describes quite accurately what that process should look like. And we validated that model with the business owner. Is this indeed how you work? And he said, yeah, that pretty much describes how we think the process should go. Now you can relate the event data back to that process model, and then we see two categories of deviations. One category is around the making of offers. There is an activity in that log that says "offer sent back". That means the customer has sent back an offer, because either the customer accepts the offer or the customer says, I don't want it, I want another one. And interestingly enough, that step is very often missing from the data. So the process continues and new offers are being created, but the old one was never sent back. Sometimes the loan is even accepted without this offer being sent back. That's very confusing, and this happens a lot, really a lot of times.
So we talked about that with the business owner and also with some of the call agents, and they soon realized that this was actually a nice example where the call agents had figured out that they should work around the system, because the system was implemented in such a way that you could only have one offer live per application. And what customers want is for you to make two offers that they can compare. So what they did in practice: they printed an offer, they canceled it without it being sent back, they created a new one and printed that too, and then sent the two offers in one envelope to the customer. And if one of them came back and it was coincidentally the first one, which was already canceled, they would have to go through this process again. So cancel the outstanding offer, recreate the original one, and send that through. That's a clear example where, by looking at the difference between data and model, you find out that the process is actually executed correctly, but that the system is not supporting it properly.
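The missing "offer sent back" step can be illustrated with a small alignment sketch in Python. This is only a toy under stated assumptions: the model is reduced to a single expected activity sequence, every deviation costs 1, and the activity names are hypothetical rather than the real labels of the BPI 2012 log. A "model_only" move marks a step the model expects but the log lacks; a "log_only" move marks a recorded step the model cannot explain.

```python
def align(trace, model_trace):
    """Cheapest alignment of an observed trace with one modeled trace.

    Synchronous moves cost 0; log-only and model-only moves cost 1.
    Returns (total_cost, list of (move_type, activity)).
    """
    n, m = len(trace), len(model_trace)
    # cost[i][j] = cheapest cost of aligning trace[i:] with model_trace[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue  # both sequences consumed: cost stays 0
            options = []
            if i < n:
                options.append(cost[i + 1][j] + 1)  # log-only move
            if j < m:
                options.append(cost[i][j + 1] + 1)  # model-only move
            if i < n and j < m and trace[i] == model_trace[j]:
                options.append(cost[i + 1][j + 1])  # synchronous move
            cost[i][j] = min(options)

    # Walk forward through the table to recover one optimal alignment.
    moves, i, j = [], 0, 0
    while i < n or j < m:
        if (i < n and j < m and trace[i] == model_trace[j]
                and cost[i][j] == cost[i + 1][j + 1]):
            moves.append(("sync", trace[i]))
            i, j = i + 1, j + 1
        elif j < m and cost[i][j] == cost[i][j + 1] + 1:
            moves.append(("model_only", model_trace[j]))
            j += 1
        else:
            moves.append(("log_only", trace[i]))
            i += 1
    return cost[0][0], moves


# Hypothetical activity names, loosely inspired by the loan example.
modeled = ["offer_created", "offer_sent", "offer_sent_back", "loan_accepted"]
observed = ["offer_created", "offer_sent", "loan_accepted"]
deviation_cost, alignment = align(observed, modeled)
```

For this case the alignment contains one model-only move on `offer_sent_back`, which is the kind of finding that would then be discussed with the business owner. Alignments against a full process model have to search over all paths the model allows, which is where the computational difficulty comes from.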

 

18:24

Patrick:

So the model that you build with the business people is then an as-should-be model of reality. And then you compare it to what is actually happening in the event log. And that is what is known as conformance checking?

 

18:39

Boudewijn van Dongen:

You could call it an as-should-be model. When you ask somebody to develop a model of a process, you typically will be presented with it as if it's an as-is model. So they will say, this is how we do things. But indeed, you're right, it's more of an as-should-be model. And the differences that you identify clearly show, well, this is how it should be, but this is how things are going.

 

19:07

Patrick:

So is it always about finding the gaps or is it also about identifying activities that are happening too many times? Or what type of conformance can you actually measure?

 

19:18

Boudewijn van Dongen:

Of course. So, typical examples would also be ping-pong behavior. It's something that you may recall especially from incident ticketing systems, where you immediately reply back to the customer saying, oh, please give me more information. Any model of that organization would show a loop. And looking at how often you actually go through that loop is also conformance checking. You replay the data on this model in order to identify how often you go through a specific part of the process, and you would be able to identify this type of behavior. Similarly, you could identify parts of the model that are never used. That also happens. We sometimes have very complicated processes defined that are not necessary. One of the BPI challenges, I'm not afraid to say that, was the data from our own university. And there was an approval step of declarations in there that was essentially never rejected. So the question is why the approval step is there anyway.
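Both checks mentioned here, how often a loop is taken and which modeled parts are never used, can be sketched in a few lines of Python. The ticketing log, the activity names and the assumption that the loop is entered through a single activity are all invented for illustration; a real replay would run the log against the full model.

```python
# Hypothetical set of activities that appear in the process model.
MODEL_ACTIVITIES = {
    "open_ticket", "request_info", "receive_info", "escalate", "resolve",
}

# Hypothetical event log: case id -> ordered list of activities.
TICKET_LOG = {
    "case-1": ["open_ticket", "request_info", "receive_info",
               "request_info", "receive_info", "resolve"],
    "case-2": ["open_ticket", "resolve"],
    "case-3": ["open_ticket", "request_info", "receive_info", "resolve"],
}


def loop_iterations(log, loop_entry):
    """Count, per case, how often the activity that enters the loop occurs."""
    return {case: trace.count(loop_entry) for case, trace in log.items()}


def unused_activities(log, model_activities):
    """Modeled activities that never show up in any recorded case."""
    seen = {activity for trace in log.values() for activity in trace}
    return model_activities - seen


ping_pong = loop_iterations(TICKET_LOG, "request_info")
never_used = unused_activities(TICKET_LOG, MODEL_ACTIVITIES)
```

Cases with a high count are candidates for ping-pong behavior, and the never-used set points at parts of the model that may be unnecessary, much like the approval step that never rejected anything.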

 

20:27

Jakub:

What if, what if?

 

20:29

Boudewijn van Dongen:

Well, indeed, in this case it wasn't a "what if", because the person having to do this approval step is technically responsible.

 

20:39

Jakub:

So what it seems to me like, from just listening to you: we've had a lot of guests from the process mining world, and as we started the podcast, we realized that business process management is more than process mining. So we also had a lot of people who are on the modeling end of the spectrum. Does this mean that conformance checking kind of closes the gap between process mining itself and business process modeling?

 

21:12

Boudewijn van Dongen:

Yes, to some extent, yes. Because business process modeling is traditionally done by people who are experts in the process or in similar processes. If you talk about consultants, they do this by talking to experts and stakeholders. They draw a picture of the process, but by talking to stakeholders, you will not see the dirty sides of a process, if you will. This is not a bad thing. The models should exist and they should be made in a comprehensive and comprehensible way by people who understand what the right level of abstraction is at which you want to talk about the process. But you cannot assume that when a business process management expert has drawn a picture of your organization, that picture accurately represents the truth. It's an abstraction, and the deviations matter.

 

22:16

Jakub:

So maybe a next question would be: if I am an organization and, let's say, I have all the tools in place, both process mining and business process modeling tools, how do I engage in conformance checking activities? What should I do and how should I address that?

 

22:34

Boudewijn van Dongen:

Well, I think the first thing you should do is to reach out to your process mining vendor and tell them how important it is that this topic is included in their tooling. At the moment, not too many commercial vendors have already made a solution. There are some, and it is also gaining momentum on their side. But I think in general, the attitude towards the results you get out of your process mining tools should be to always be a bit wary of the picture you're presented with, and especially of the things that do not fit in this picture. Everything that you see when you look at a picture made by Celonis, UiPath, or any other commercial or noncommercial product, every arc, every node, is probably explainable in terms of the data that underlies it. What you don't see is what part of the data is not explained. And that's where conformance checking comes in. So make sure that you always ask your vendor, what am I not looking at?

 

 

 

23:44

Patrick:

So what do you envision proper tooling to look like? Because we have seen this in practice, Jakub and I, and essentially it boils down to a list of activities and sequences asking you, is this okay? Yes, no. Is this fine? Yes, no. And then you just kind of sit and mark things as, yes, this is fine, and this is not fine. And then it kind of gives you a list of what's conforming and what's not conforming. In your opinion, what should this actually look like? Is it fine the way it is?

 

24:18

Boudewijn van Dongen:

That's an interesting question, because this is very much a research challenge. There is not an off-the-shelf solution that provides you with a complete insight into the deviations, into conformance checking results. The very rudimentary question, is this deviation okay or not, that's a very first approach. The visualization capabilities of our tooling and also commercial tooling are quite powerful. So you can actually present the picture of the process that you discovered and, in the same image, maybe with a different color, maybe with a different shading, I'm not a visualization expert here, you can also present the parts of the data that were left out of this picture. And maybe you can even add things. Especially if we're talking about man-made models, models that are put in there not by a process discovery tool but by a person saying this is how we work, this is even more important. But I think a projection on the level of the model is definitely important.

 

25:31

Patrick:

So when you look at this in research, I'm not sure about the size of the data that you're looking at, but when we look at it on an enterprise level, we often see processes that are massive and data models that are massive, and we're talking about millions up to billions of activities and things like that. How does conformance checking scale in terms of billions of activities?

 

25:53

Boudewijn van Dongen:

Yes, that's a nasty question to ask an academic, but there are two sides to the answer. On the one hand, if you look at what we presented in the book and the work of Arya, essentially that is a solid mathematical foundation for conformance checking that doesn't scale properly. Theoretically, it's a very complex problem, and if you construct the right examples, you can break the software; it simply will never finish. At the same time, there are not many real-life scenarios where the deviations are precisely of that form. So you can actually make fast and efficient tooling that is able to find deviations in practical scenarios relatively quickly. I think there is a role there both for academia, trying to properly define heuristics that work in specific cases, and for the commercial vendors, to implement heuristics, fast versions that might not always give perfect answers in every scenario, but that are spot on 9 times out of 10, or 99 times out of 100.

 

 

27:27

Patrick:

So if we just also talk about the size of the event log, like what about just the different types of activities? If you have just a thousand activities, but they're all different. And how does conformance look in those cases?

 

27:42

Boudewijn van Dongen:

The scaling of the technology is quite bad in the size of the models, not so much in the size of the log necessarily. But experimentation that we've done shows that you're worst off if you have a fitness of around 80%. That means that 20% of the data is actually not properly represented in the model. And if that 20% is generally distributed over your events, then it's tough to identify precisely where this happens. In practice, if you have a process model that is so far off, you probably should trim it down to smaller chunks and say, let's not look at one monolithic model of our process, but let's look at smaller parts and see if we can already identify deviations in specific sub-parts of the process before going into the whole thing. But again, the challenges, I think, are twofold. On the one hand, there's a computational challenge, and I think it is mainly the responsibility of vendors to think about scalable heuristics. At the same time, there are also academic challenges: how to deal with this complexity from a fundamental point of view?

 

29:03

Jakub:

Mm hmm. Speaking of complexity from the fundamental view, at some point these analyses are performed by either business users or data analysts such as ourselves. How do we conduct a proper conformance checking analysis? What would be the key drivers, and what should we focus on when we're doing it? Maybe the first question would be, should we start with requesting an as-should-be model from the business and then comparing the data to that? Or should we just derive, let's say, a happy path and then keep going from there?

 

29:41

Boudewijn van Dongen:

I think both are possible, but just taking a discovered happy path is probably not really the way to go. What you would be able to do is to use discovery techniques to build an understanding and build the model yourself. So you use the output of the discovery technology to get a process model as it would also be made by, maybe, a consultant. You get an as-should-be model with input from the discovery algorithms, and you validate that with your business owner. So the process mining tool says this; if I translate that into a model and I combine a couple of variants that I found in the data, then I get this model. Is this indeed properly describing your main process? And then you can use that to relate it to your data as well. Now we find deviations, and sometimes these deviations give rise to updating the model. Sometimes they give rise to updating your information systems, because your model is correct and your data is correctly showing what should be done, but the system simply doesn't support this. And sometimes it really gives rise to questions in the organization. Look, I'm sorry, but what happened here? To refer back to the BPI 2012 example with the financial institute, we also found three cases where there was no signature recorded, but loans were paid out. So that means the customer was paid out a loan that he or she did not accept. I'm quite confident that this missing signature explains the hundreds of "call the customer" events that we saw after paying out the loan. I would also not answer the phone anymore. Would you?

 

31:35

Patrick:

Not likely. And so in terms of conformance checking, how does it compare to other process discovery methods? And in what cases do these other process discovery methods fall short where conformance checking can pick up the pieces?

 

31:53

Boudewijn van Dongen:

So it's not a process discovery method, right? Where I think it's a nice addition to the broad field of process discovery is that with discovery, you get a particular view on the process, very often represented as a direct succession graph, or multiple direct succession graphs when you look at different variants. What you can do with conformance checking is exactly what I just described before. You take these different views together, you make your own model of what is the as-is and what is the as-should-be process, and then you use that and compare it with the data that is there to identify where there are still deviations. It's not something you can do instead of discovery. It's complementary to these discovery algorithms. And it basically allows you to take your discovery results and focus on where there are still improvements to be made in the organization.

 

33:13

Jakub:

Mm hmm. So I remember when I was starting in the process mining business, one of the first things I asked my colleague was, what is this conformance checking step and what does it do? And he told me, yeah, don't go down that road. Just don't use this report, because it is just challenging to explain to everyone. And basically I have never used it since, which I am now very sorry to even admit, but it's the truth. The question for you would be, why do you think that this practice specifically is so neglected or, let's say, excluded during professional implementations? And why aren't customers and people who are conducting process mining activities looking into conformance more closely? What is preventing us from that?

 

34:04

Boudewijn van Dongen:

So I think the reason why this was the case, and I think it's slowly changing, is that with simple process discovery, you could already identify quite a few improvement points in your processes by looking at this direct succession graph. With some timings on the arcs, you could say, look, here in the process we have a lead time of four days between these two activities, and that in our mind should be one day. So you already have very clear points where you can start improving your process. At some point that well dries up. You will not be able to only look at discovery graphs. You will need to start looking at things that are not in the graph, or things that are happening and are filtered out somehow, in order to really reach improvement steps. So it will be there in commercial tooling more and more. It just wasn't ready yet. And yes, it's more complicated to explain than just a direct succession graph.

 

35:13

Jakub:

Now, maybe if you did conformance checking on our implementations, you would find out that we usually take the easiest road, unfortunately, because everything we do, our processes, that's what we learn. Anyhow, what are some of the key, let's say, applications where you could think of conformance checking as a go-to method in your processes? Where does it make the most sense? And could you maybe give us a couple more examples where you applied conformance checking and the findings were pretty vital and important?

 

35:53

Boudewijn van Dongen:

So one of the obvious areas is really the area of compliance. A very simple example is GDPR, which we're confronted with on a daily basis. Even in the current COVID pandemic, the Dutch government gives us these QR codes for going into a restaurant or for international travel. And the Dutch QR codes show less information than the international ones, but they're part of the same app. And all of this is because of GDPR. So GDPR describes processes. GDPR describes that you can record and store data if you need it for a purpose: you were given the data for a specific purpose and you also need it for that specific purpose. Now, think of an interesting scenario where a call agent in the health organization that sends out QR codes is calling you because you had a positive PCR test. That call agent then uses your data for the correct purpose. So that's fine. And now that same call agent accesses your data and sends it by WhatsApp to his neighbor or friend. The data access itself is not a problem. So there will not be any system in place that flags the data access here. At the same time, the data is not being used for the purpose that it was collected for. So piggybacking this purpose in the process itself and then using conformance checking techniques to identify when a certain data access was appropriate for that particular purpose can help you to identify these kinds of problems. And they are not trivial questions in health care. A nurse can look at patient records only if the nurse is actually treating the patient. But sometimes the nurse is not treating the patient, and the doctor on the other side of the room asks, "Can you quickly look what the value of this test is?" and the nurse can do that. But how do you deal with that in terms of a process? If you look at conformance checking, you will be able to identify this. You identify that a specific activity was executed by person X out of the context in which this activity should be executed.
What is very challenging in the area of conformance checking is to correctly translate this to the process level, because it might be a violation of the model, where the model also includes the rules and regulations, but it's not a violation in the process; it is perfectly explainable. So there, I think, there is still a big challenge to overcome to make this tooling more useful in practical scenarios: to really make this clear distinction between the low-level deviations you might find between the model and the dataset and the actual implications on the level of the process, the environment, the context.
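
Illustrative sketch: the call-agent scenario can be pictured as a check of each logged event against the purpose of the case it belongs to. This is a minimal Python sketch under stated assumptions, not a technique described in the episode: the `Event` fields, the `ALLOWED_BY_PURPOSE` table, the activity names and the `assignments` set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    case_id: str   # case the event is logged under (e.g. one contact-tracing call)
    activity: str  # what was done
    resource: str  # who did it
    subject: str   # whose personal data was touched


# Hypothetical model: activities that are allowed for a given case purpose.
ALLOWED_BY_PURPOSE = {
    "contact_tracing": {"open_record", "call_person", "close_record"},
}


def check_purpose(events, case_purpose, assignments):
    """Return the events that fall outside the purpose or context of their case.

    case_purpose: dict mapping case id -> purpose the data was collected for.
    assignments:  set of (resource, subject) pairs that are legitimate,
                  e.g. the agent who is actually handling that person's case.
    """
    deviations = []
    for e in events:
        allowed = ALLOWED_BY_PURPOSE.get(case_purpose.get(e.case_id), set())
        wrong_activity = e.activity not in allowed
        wrong_context = (e.resource, e.subject) not in assignments
        if wrong_activity or wrong_context:
            deviations.append(e)
    return deviations
```

A flagged event is only a low-level deviation from the model. As in the nurse example, whether it is a real violation still has to be judged in the context of the process.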

 

39:20

Patrick:

So in a way, the conformance checking is only ever as good as the model that it relies on. And in that sense, how good must the model be, specifically in compliance, where it must capture all the things that can happen and all the things that can't happen? Because, as you said, there are a lot of ways that users find of subverting the system and going around it. And is it even possible to capture all this behavior?

 

39:50

Boudewijn van Dongen:

Sometimes, yes. So there is, for example, the work of Elham Ramezani, who managed to translate a large collection of compliance rules into fragments of process models, and then use that to do checking and identify deviations. And building on that, one of our PhD students is also trying to do something similar in the context of GDPR. So yes, you're absolutely right that the model is vital here to do this correctly. But the notion of a model should also be seen a bit broader than just the process model. So that can be a translation of a set of rules, sometimes a translation of a set of business rules, legislation and anything that is essentially restricting what should be done in the process.
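
Illustrative sketch: one way to picture a single rule as a small checkable fragment is a "response" rule, where every occurrence of one activity must eventually be followed by another. This Python sketch is a simplified stand-in and not the method from the work mentioned above; the function name and the activity names are hypothetical.

```python
def violates_response(trace, trigger, response):
    """Return True if some `trigger` in the trace is never followed by `response`.

    trace: list of activity names for one case, in the order they occurred.
    """
    pending = False
    for activity in trace:
        if activity == trigger:
            pending = True   # an obligation is now open
        elif activity == response:
            pending = False  # the obligation is fulfilled
    return pending
```

For example, with `trigger="collect_data"` and `response="delete_data"`, a trace that collects data but never deletes it is reported as a violation.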

 

40:55

Jakub:

There is a broad consensus among process mining experts that conformance checking will become more important. And you also mentioned that it is becoming more and more used in implementations and also in practice. What will drive this adoption of conformance checking, and how can we as, let's say, implementation partners for these corporations help incorporate conformance checking into our activities and, you know, get the interest from them as well?

 

41:31

Boudewijn van Dongen:

I agree that conformance checking will become more important, and I think it will also be partly due to the fact that the low-hanging fruit that you can get with process mining or process discovery technology is sort of already gone, especially if a company does this in a continuous cycle. They've gotten used to the charts that are produced by commercial tools. Somehow, in the background, alerting the end user of these process mining solutions to the fact that something is deviating from the picture that they've been looking at all this time will be helpful to understand why the picture is going to change next week. And so if you purely do discovery and you look from one week to the next and there is a real fundamental change in the process, then you will look at different pictures and you have to start wondering why your model changed. With conformance checking, and especially if you do this online, you would be able to signal: look, there is something changing here, because there are deviations popping up with respect to the original process. So you have some sort of early warning. What also is, I think, a challenge in process mining research is the fact that most of the tools and techniques assume that a process is a monolithic thing that exists only in the context of that one process or that one organization. So when you as a user look at a model, you think that's your world, but in practice it never is. And there are always 100 other processes that touch each other. They touch each other because the same person is involved, the same employee is working on these two cases. So that employee can't do two things at once. They touch each other because this is coming from the same customer, two different requests to the same organization.
And they touch each other because a backlog incurred in a particular process at a particular point in time causes delays two weeks later in a totally different part of the organization, because suddenly there is an increased workload and nobody saw that coming. So this multidimensionality, this notion of networks of processes being interconnected, that is also where I think conformance checking techniques will start helping to identify properly where the touchpoints actually are.
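
Illustrative sketch: the early-warning idea can be pictured as checking each incoming event against a reference model and raising an alert when the share of deviating events in a recent window gets too high. This Python sketch makes strong simplifying assumptions: the model is just a hypothetical set of allowed directly-follows pairs (`ALLOWED`), and `OnlineChecker`, its window size and its threshold are made up for illustration. Real conformance checking techniques such as alignments are more involved.

```python
from collections import deque

# Hypothetical reference model: allowed (previous activity, next activity) pairs.
ALLOWED = {
    ("start", "register"),
    ("register", "check"),
    ("check", "approve"),
    ("check", "reject"),
}


class OnlineChecker:
    def __init__(self, allowed, window=100, threshold=0.2):
        self.allowed = allowed
        self.last = {}                      # case id -> last activity seen
        self.recent = deque(maxlen=window)  # 1 = deviating event, 0 = conforming
        self.threshold = threshold

    def observe(self, case_id, activity):
        """Process one event from the stream; return True if it deviates."""
        prev = self.last.get(case_id, "start")
        deviating = (prev, activity) not in self.allowed
        self.last[case_id] = activity
        self.recent.append(1 if deviating else 0)
        return deviating

    def alert(self):
        """True if the deviation rate in the recent window exceeds the threshold."""
        if not self.recent:
            return False
        return sum(self.recent) / len(self.recent) > self.threshold
```

Calling `observe` for every new event and polling `alert` gives a crude signal that behaviour is drifting away from the original process before a newly discovered picture makes the change visible.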

 

44:25

Patrick:

So this is kind of a root cause analysis in a sense, right? Because we can see, for example, that a previous activity causes another activity, right? So that's a direct relationship. But if we look a little bit more deeply, like you said, things that happened two weeks ago can have some sort of effect sometime down the road, like increased workload. Now, how does that relate to conformance checking and the future of this?

 

44:49

Boudewijn van Dongen:

I have two examples that I can think of. One is from a project that we're running with a production company, really a factory, they make stuff. And the production line, you can consider this as a process where products are being produced one after another. And if you look at these products in isolation, you will find that there are minor deviations with respect to the defined process. But if you put all the products together and try to do the conformance checking on a much broader scale, so it's the same production process, but now you're not considering each product individually, you're considering all the products as they actually went through according to the data, then you find that the way to explain the deviations changes when you look at all of them together. For example, because you see in the data that the resource that you expected to do something is working somewhere else. So that resource cannot have been doing this activity that's not logged. So you need to come up with a better explanation for that. And this is not necessarily a root cause analysis, but it is allowing you to identify differences between models and logs at a much larger scale than an individual product. When you really think of root cause analysis, I should mention the work that Dirk is doing with Vanderlande Industries, where he really tries to make images of how objects, in his case suitcases, go through a system, a baggage handling system at the airport, and how small deviations in the routing of a suitcase propagate. He presents this visually, by the way, so you can actually see where the whole problem started. It's not a process model as such. It's a different view, but it definitely is something where a small deviation at some point translates and propagates through a network of interconnected systems. And that is definitely root cause analysis.
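
Illustrative sketch: the factory example can be pictured as a cross-case lookup. Before explaining a deviation as "the activity happened but was not logged", check whether the expected resource was logged as busy on another product at that time. This is a minimal Python sketch; the log layout (tuples of resource, case id, start, end) and the function name are assumptions, not taken from the project described.

```python
def busy_elsewhere(log, resource, case_id, start, end):
    """Return logged intervals where `resource` worked on another case during [start, end).

    log: iterable of (resource, case_id, start, end) tuples with start < end.
    A non-empty result means "the resource did it, unlogged" is not a
    plausible explanation for a missing activity in `case_id`.
    """
    return [
        (r, c, s, e)
        for (r, c, s, e) in log
        if r == resource and c != case_id and s < end and start < e
    ]
```

Looking at one product in isolation, this check is impossible, which is why the explanation of a deviation can change once all products are considered together.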

 

47:15

Jakub:

This is really fascinating. And I would just say that if you're interested more in the topic of conformance checking, just please go and buy the book about conformance checking and read more into that. There's a lot of math as well, so I need to warn you, but it's mostly fun.

 

47:33

Boudewijn van Dongen:

Well, the first part of the book should be accessible, right?

 

47:35

Jakub:

Yeah, I can confirm that. However, let us switch the gears here a bit. And I also wanted to kind of ask you about ICPM, which both myself and Patrick attended, which was in Eindhoven in 2021. We discussed it briefly also on our show but since we have you here, could you tell us what ICPM is and what is your relation to that?

 

48:00

Boudewijn van Dongen:

Yes, of course. So it was very nice to see you in Eindhoven in November, in the middle of a pandemic, where we had a very successful live conference. So ICPM is the International Conference on Process Mining. Formally it's an academic conference, right? So it's about the research of process mining, where researchers meet up and talk about their latest discoveries on process mining. They publish their latest achievements there. But at the same time, it's a conference that also aims to bring together the industrial partners on process mining. So one day of this three-day event, and next year it will even be a four-day event, is really dedicated to our industrial partners. We ask companies to present their use cases, the applications of process mining in real life, and the idea is that this is a venue once a year where we meet, we talk to you guys directly, we come up with new ideas. You can tell us what we should do research on, and we can tell you what you should be implementing. And then we come away with slightly changed viewpoints, but at least it ensures that we keep doing relevant research and that you guys are also aware of what is going on in the academic world. And, yeah, so back to ICPM. It's a conference, it's rather independent. That also means that the vendors use this as an opportunity to meet each other. So there's always a lot of interaction. After this year's edition, I got a comment from a company that said: "Tell the vendors next time to send more engineers because yeah, I talk to the marketing people and that was fun, but I would also like to talk to the engineers of my competitor." But this is something that I think characterizes this open community. It's an annual event. So we've been organizing this since 2019. Wil was the first to host us, in Aachen.
In 2020 we were supposed to go to Padova, but unfortunately COVID turned that around, and Massimiliano de Leoni did a fantastic job in making this an online conference. And in 2021 we met in Eindhoven the first week of November. Our government locked down the country in the second week of November again. So we were very, very lucky to see everybody live. And then next year it's late October in the beautiful city of Bolzano, if you've got a chance.

 

50:49

Patrick:

We'll be there.

 

50:51

Jakub:

We will be there for sure.

 

50:53

Patrick:

Yeah. So let me ask you, what is your role in the ICPM? You already mentioned that you're organizing, but what is your role here?

 

51:00

Boudewijn van Dongen:

So at the moment I'm chair of the task force on process mining, the IEEE Task Force on Process Mining. I took over the position as chair from Wil because, well, he did that for a few years, but he felt that it was difficult to combine with his position at Celonis. And we also want to rotate this position frequently. And as chair of this task force, well, it's my duty to ensure that the ICPM gets organized, but I don't have to do it myself, fortunately. So I'm very happy with Marco Montali mostly organizing it this year. And I'm very much looking for volunteers to host us in the years to come. The task force itself is backed by the IEEE and is a group of enthusiastic process mining researchers as well as people from industry who use or apply process mining in their companies. The Task Force Steering Committee does not have vendors, so we're also vendor independent. And the aim is to raise awareness of process mining in academia, to raise awareness of process mining in industry, and to bring the two together. So we really want our research to be close to what is necessary from a business point of view, and we also want to be fed with data from you guys, to work with really challenging datasets, because in the end that's what brings the research ahead.

 

52:46

Jakub:

So hopefully we can contribute to that a bit during the podcast and bring these worlds together.

 

52:52

Boudewijn van Dongen:

Yes, absolutely. So if any of your listeners feel that they have a dataset they are willing to share publicly with an organization or with this community, for example in the context of the BPI Challenge, of which I mentioned a couple of examples already, please reach out to me and I'm sure we can make that work.

 

53:13

Jakub:

Sounds great. Speaking of which, where can people find you to reach out to you or get to know any of your content, your published papers and so on?

 

53:25

Boudewijn van Dongen:

Well, there's the conformance checking website, right? There's also my home page that should be updated, but you can find me on the university's website. There's the IEEE task force, also processmining.org, where you will find links to a lot of things. So I don't have a decent home page. I'm very sorry. I'm very bad at websites. That's just not my cup of coffee.

 

53:50

Jakub:

Either way, you will find a direct link to any of those websites that Boudewijn just mentioned in our show description. And for you, dear listeners, this will be the end of today's episode. So Boudewijn, thank you very much for coming. It was a pleasure to listen to topics of conformance checking, and I'm kind of sorry that we didn't get to this earlier, but I'm still happy that we got to it eventually.

 

54:13

Boudewijn van Dongen:

It was my pleasure to be here today.

 

54:16

Jakub:

All right. So thank you very much. Thank you for listening. As usual, if you have any questions, any recommendations on future episodes, future guests, or someone who you just would like to hear from, just reach out to us on LinkedIn, we are pretty active there. You can find us on Mining Your Business. We also have a website where you can listen to all of our episodes, miningyourbusinesspodcast.com, and we are also available by email. If you want to write us an email with any questions whatsoever, it's miningyourbusinesspodcast@gmail.com. Leave us a rating. It's always nice to hear back from you. And talk to you, Patrick, in the next episode. Thank you very much. Bye bye.