OUR MISSION:
To continually innovate & improve the process of Feature Engineering
A Lean Six Sigma approach
Feature engineering is one of the most important parts of the model-development process. It’s a critical-to-success component of data science: the simple process of transforming raw data into useful features. Like all processes, there is always room for improvement and innovation. That’s what we are laser-focused on.
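As a simple illustration (a minimal sketch with hypothetical column names, using pandas; not our production code), raw transaction records can be aggregated into model-ready, per-customer features:

    # A minimal sketch of feature engineering (hypothetical columns):
    # turning raw transaction records into model-ready features.
    import pandas as pd

    raw = pd.DataFrame({
        "customer_id": [1, 1, 2],
        "amount": [120.0, 80.0, 40.0],
        "timestamp": pd.to_datetime(["2024-01-05", "2024-03-01", "2024-02-10"]),
    })

    # Aggregate raw rows into per-customer features
    features = raw.groupby("customer_id").agg(
        total_spend=("amount", "sum"),
        avg_spend=("amount", "mean"),
        last_purchase=("timestamp", "max"),
    )
    # Derived feature: days since the customer's last purchase
    features["days_since_last"] = (pd.Timestamp("2024-04-01")
                                   - features["last_purchase"]).dt.days
    print(features)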
How the Pareto principle applies to the feature-variable selection process.
The Pareto principle, often over-simplified in business settings as “the 80/20 rule,” or the law of the “vital few and trivial many,” is an empirical observation that, within virtually any process, most of the desired outcome -- approximately 70% to 80% -- is produced by only approximately 20% to 30% of the factors in the process (e.g., 80% of a company’s sales are generated by 20% of the salesforce). For example, if you select α = log₄5 ≈ 1.16 as the Pareto index, you may find roughly 80% of effects coming from roughly 20% of your feature variables.
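As a quick sanity check, here is a minimal sketch (an illustration written for this post, not production code) that draws simulated effect sizes from a Pareto distribution with index α = log₄5 and measures how much of the total the top 20% of factors contribute:

    # Sample "effect sizes" from a Pareto distribution with index
    # alpha = log_4(5) ~= 1.16 and check what share of the total
    # the top 20% of factors contribute.
    import math
    import random

    alpha = math.log(5, 4)          # Pareto index, ~1.16
    random.seed(42)

    # random.paretovariate(alpha) draws from a Pareto distribution
    effects = sorted((random.paretovariate(alpha) for _ in range(100_000)),
                     reverse=True)

    top_20 = effects[: len(effects) // 5]      # the "vital few"
    share = sum(top_20) / sum(effects)
    print(f"Top 20% of factors produce {share:.0%} of the total effect")
    # Typically close to 80%, matching the 80/20 rule (the heavy tail
    # means individual runs fluctuate by a few percentage points)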
In other words, only a vital few data points/feature variables in the process matter. This phenomenon is even more pronounced in the data-mining aspect of the model-development process. Why? Because you start with a big data set containing thousands or millions of feature variables (the trivial many). However, you end up using only a small percentage of those feature variables in your model (the 10, 15, or 20 vital few that are actually useful).
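To make that concrete, here is a minimal sketch of one common approach (a hypothetical example built on scikit-learn, not our actual pipeline): rank candidate features by model importance and keep only the smallest set that covers 80% of the total importance.

    # Rank features by importance and keep the "vital few" that
    # account for 80% of the total importance.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic data: 500 candidate features, only 15 truly informative
    X, y = make_regression(n_samples=2_000, n_features=500,
                           n_informative=15, noise=0.1, random_state=0)

    model = RandomForestRegressor(n_estimators=50, max_features="sqrt",
                                  random_state=0).fit(X, y)

    # Sort features by importance, then take the smallest prefix whose
    # cumulative importance reaches 80% -- the Pareto cutoff
    order = np.argsort(model.feature_importances_)[::-1]
    cumulative = np.cumsum(model.feature_importances_[order])
    vital_few = order[: np.searchsorted(cumulative, 0.80) + 1]

    print(f"{len(vital_few)} of {X.shape[1]} features cover 80% of importance")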
Realizing this, it’s easy to see why a BIGGER universe of features -- one that is constantly expanding -- is better than a finite set of BIG DATA.
Want proof? Request a free demo today and see for yourself.
Generative A.I. is at the forefront of my customer conversations. This drives a renewed emphasis on data strategy in preparation for these new technologies. You’ve heard the team say it many times: there is no A.I. strategy without a data strategy.