Automatic Feature Engineering for Large Scale Time Series Data Using tsfresh and Dask
[This talk was delivered as part of the PyData Montreal January 2021 meetup and the BelPy 2021 conference]
Time series data is different from cross-sectional data. In a time series, the observation at any instant depends on past observations, according to the underlying process. The data often contains noise and redundant information. To make things more complex, most traditional Machine Learning algorithms were developed for non-temporal data. Thus, extracting meaningful features from the raw time series plays a major role.
The first half of the presentation covers a Python library called tsfresh. tsfresh accelerates the feature engineering process by automatically generating hundreds of features for time series data. The second half describes the challenges encountered when the data is large and how they can be addressed by running tsfresh on top of a parallel computing framework, Dask.
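To make the idea of automatic feature extraction concrete, here is a minimal hand-rolled sketch in plain Python. It is not tsfresh's implementation or API: the function name `extract_basic_features`, the feature names, and the sample data are assumptions made for illustration. It shows only the general shape of the transformation, in which long-format rows of (series id, time, value) become one row of summary features per series. tsfresh performs a similar transformation through its `extract_features` function, with a far larger set of feature calculators.

```python
from collections import defaultdict
from statistics import mean, pstdev


def extract_basic_features(rows):
    """Turn long-format (series_id, time, value) rows into one feature dict per series.

    Hypothetical helper for illustration only; not part of tsfresh.
    """
    # Group the observations by series id.
    series = defaultdict(list)
    for series_id, t, value in rows:
        series[series_id].append((t, value))

    features = {}
    for series_id, points in series.items():
        # Order each series by time, since some features depend on ordering.
        points.sort(key=lambda p: p[0])
        values = [v for _, v in points]
        features[series_id] = {
            "mean": mean(values),
            "std": pstdev(values),  # population standard deviation
            "minimum": min(values),
            "maximum": max(values),
            # Sum of absolute differences between consecutive observations.
            "abs_sum_of_changes": sum(
                abs(b - a) for a, b in zip(values, values[1:])
            ),
        }
    return features


# Made-up sample data: two short series, the second given out of time order.
rows = [
    ("a", 0, 1.0), ("a", 1, 2.0), ("a", 2, 3.0),
    ("b", 0, 5.0), ("b", 2, 5.0), ("b", 1, 7.0),
]
features = extract_basic_features(rows)
```

Each series is reduced independently of the others, which is the property that lets this kind of work be split across workers when the data grows.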
- Recording (Longer Version, Shorter Version)
Scaling Up Data Science Work Flow Using Dask
[This talk was delivered as part of the December 2020 meetup of BanPypers, the Bangalore Python Users Group]
As Data Scientists, we face a few challenges when dealing with large volumes of data:
- Popular libraries like NumPy & Pandas are not designed to scale beyond a single core/processor
- NumPy, Pandas, Scikit-Learn are not designed to scale beyond a single machine
- If data is bigger than RAM, these libraries can’t be used
In the talk, I discuss how these challenges can be addressed using the parallel computing library Dask.
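The core idea behind addressing these challenges can be sketched with the standard library alone: split the data into chunks, compute a small partial result per chunk, and combine the partials. This is not Dask's API; the helper names (`iter_chunks`, `partial_stats`, `chunked_mean`) and the chunk size are assumptions for illustration. A thread pool stands in for a real scheduler, so the sketch shows the structure of the computation rather than an actual speed-up. Note also that `Executor.map` submits every chunk up front, so this version does not bound memory the way a real out-of-core scheduler would.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    it = iter(values)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def partial_stats(chunk):
    """Reduce one chunk to a small summary: (sum, count)."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=1000, max_workers=4):
    """Compute a mean by combining per-chunk partial results.

    Hypothetical helper for illustration; threads are used only to keep
    the sketch simple and portable.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_stats, iter_chunks(values, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("cannot compute the mean of an empty input")
    return total / count


# The integers 1..10000, split into ten chunks of 1000.
result = chunked_mean(range(1, 10_001), chunk_size=1000)
```

Because each partial result depends only on its own chunk, the same pattern extends from threads to multiple cores or multiple machines, which is the kind of scheduling a library like Dask takes care of.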
Machine Learning for Predictive Maintenance of HVAC Assets
In this session of TWIMLfest 2020, Kai Lichtenberg and I explored how AI is being used in industrial applications. In the first part, we talked about our experience of applying AI in traditional industry and how the approach differs from that of the IT industry. In the second part, I spoke about applying Machine Learning for Predictive Maintenance of HVAC Assets and Kai talked about Predictive Quality.
Basics Of Machine Learning & Its Application
This talk was delivered in a webinar organized by the IETF chapter of University of Engineering & Management, Jaipur, in August 2020.
In the first half of the talk, I “intuitively” explained Machine Learning, its applications, the difference between AI/ML/DL, etc. In the second half, I described how Machine Learning is being used in the predictive maintenance industry.