Efficient Data Mining and Machine Learning for CRM
As the Chief Scientist,CTO and co-founder of Paramark, Sanjay conceptualized, designed and developed a real-time, parallel and fault-tolerant data mining based optimization service called PILOT. PILOT served more than 10 million optimized decisions a day in 2001 with a 99.99% uptime. Sanjay, along with his other co-founders, raised more than $10 million in venture capital. Technologies for direct marketing in 1999 were based on collecting consumer responses on multiple advertisements in a test campaign, using the data collected to determine the best advertisement and then using that for the main campaign. In practice, the best advertisement changed as consumer preference changed with time (e.g. due to change in economic conditions) resulting in significant underperformance. PILOT used active learning along with data mining techniques to in real-time update the best advertisement based on changing conditions and/or consumer profiles. This resulted in 20-100% improvement in revenue generated for advertisement campaigns of Capitol One, American Express and Chase Bank. Because of PILOT based products, Paramark was recognized by VentureWire/Technologic Partners as a top 100 Internet technology company in 2001 and 2002 and was acquired in 2002. Sanjay was the lead inventor of many patents. Real-time optimization pioneered by Paramark is present today in marketing/CRM solutions provided by companies such as Adobe-Omniture and SAP.
- United States patent number 8,386,315, Arvind Bala, Richard Chatwin, Brian Jones, Ahmet Nalcacioglu, Sanjay Ranka, Yield management system and method for advertising inventory, February 26, 2013.
- United States patent number 8,260,663, Sanjay Ranka, Diane Chang, and Daniel Veiner, Method, algorithm, and computer program for targeting messages including advertisements in an interactive measurable medium, September 4, 2012.
- United States patent number 7,415,423, Sanjay Ranka, Jason Lenderman, and James Weisinger, Method, algorithm, and computer program for optimizing the performance of messages including advertisements in an interactive measurable medium, August 19, 2008.
- United States patent number 7,406,434, Diane Chang, Richard Chatwin, Sachin Kumar, Sanjay Ranka, James Weisinger, Jason Lenderman, System and method for improving the performance of electronic media advertising campaigns through multi-attribute analysis and optimization, July 29, 2008.
Energy Aware Computing
Multi-core processors are poised to dominate the landscape of next generation computing. However, lack of generally applicable methods and tools for allocating tasks to cores while economizing energy remains a key challenge for many application environments. We are conducting research to develop a new theoretical and experimental framework called multi-element and multi-objective optimization that will simultaneously and flexibly optimize the goals of energy minimization and performance maximization while taking into account constraints due to multiple architectural elements such as cores, caches, and buses of current and emerging multi-core processors. We have focused on the following tasks:
- We have investigated radically new but effective methods that have the ability to efficiently derive pareto-optimal (or near optimal) solutions, with dynamically varying objectives and scalability.
- We have developed innovative low complexity static and dynamic algorithms for scheduling tasks while considering important factors such as thermal constraints, leakage currents etc.
- Our scheduling algorithms address key elements of energy and performance of multi-core processors. Multiple elements, such as L1, L2, and L3 caches have to operate in close harmony with the processing cores, requiring multi-element optimization.
We have benchmarked our algorithms on a diverse suite of scientific and multimedia applications.
- Hengxing Tan and Sanjay Ranka, Thermal-aware Scheduling for Data Parallel Workloads on Multi-Core Processors, Proceedings of 2014 IEEE ISCC 2014.
- Yifan Wang and Sanjay Ranka, Task Scheduling for Energy Optimization and Temperature Improvements, Proceedings of 2014 IEEE ISCC 2014.
- Arslan Munir, Ann Gordon-Ross, Sanjay Ranka: Multi-Core Embedded Wireless Sensor Networks: Architecture and Applications. IEEE Transactions on Parallel Distributed Systems. Vol. 25(6): 1553-1562 (2014)
- Arslan Munir, Ann Gordon-Ross, Sanjay Ranka, Farinaz Koushanfar: A queueing theoretic approach for performance evaluation of low-power multi-core embedded systems. Journal of Parallel Distributed Computing. 74(1): 1872-1890 (2014)
- Hafiz Fahad Sheikh, Hengxing Tan, Ishfaq Ahmad, Sanjay Ranka, and Phanisekhar Bv. 2012. Energy- and performance-aware scheduling of tasks on parallel and distributed systems. Journal of Emerging Technology in Computing Syst. 8, 4, Article 32 (November 2012).
- Arslan Munir, Sanjay Ranka, Ann Gordon-Ross, “High-Performance Energy-Efficient Multi-Core Embedded Computing,” IEEE Transactions on Parallel and Distributed Systems, Volume 23(4), 2012, pp. 684-700.
- Weixun Wang, Prabhat Mishra and Sanjay Ranka, Dynamic Reconfiguration in Real-Time Systems: Energy, Performance, and Thermal Perspectives, June 2012, Springer Verlag.
- Hafiz Fahad Sheikh, Hengxing Tan, Ishfaq Ahmad, Sanjay Ranka, Phanisekhar BV, Energy and Performance Aware Scheduling of Tasks on Parallel and Distributed Systems, Journal of Emerging Technology Computing Systems.
- Hafiz Fahad Sheikh, Ishfaq Ahmad, Zhe Wang, Sanjay Ranka, An overview and classification of thermal-aware scheduling techniques for multi-core processing systems, Sustainable Computing: Informatics and Systems.
- Weixun Wang, Sanjay Ranka and Prabhat Mishra, Energy-Aware Dynamic Reconfiguration Algorithms for Real-Time Multitasking Systems, Sustainable Computing: Informatics and Systems (SUSCOM), 2011, Volume 1, pages 35-45.
- Jaeyeon Kang and Sanjay Ranka, Slack allocation algorithms for parallel machines. Journal of Parallel and Distributed Computing, Vol. 70(1), pp. 23-24, 2010.
- Jaeyeon Kang and Sanjay Ranka: Dynamic slack allocation algorithms for energy minimization on parallel machines. Journal of Parallel and Distributed Computing, Vol. 70(5): 417-430, 2010.
- Arslan Munir, Ann Gordon-Ross, Sanjay Ranka: A queueing theoretic approach for performance evaluation of low-power multi-core embedded systems. ICCD 2011: 198-205.
- Zhe Wang, Prabhat Mishra and Sanjay Ranka Temperature-aware Task Partitioning for Real-Time Scheduling in Embedded Systems, Proceedings of VLSI 2012.
- Weixun Wang, Prabhat Mishra and Sanjay Ranka, Dynamic Cache Reconfiguration and Partitioning for Energy Optimization in Real-Time Multi-Core Systems, ACM/IEEE Design Automation Conference (DAC), pages – , Sand Diego, California, USA, June 5-10, 2011.
- Weixun Wang, Sanjay Ranka and Prabhat Mishra, A General Algorithm for Energy-Aware Dynamic Reconfiguration in Multitasking Systems, International Conference on VLSI Design, pages -, Chennai, India, January 2-7, 2011.
- Jaeyeon Kang and Sanjay Ranka, DVS based Energy Minimization Algorithm for Parallel Machines, Proceedings of IEEE International Parallel and Distributed Processing Symposium, 2008, pp. 1-12.
- Jaeyeon Kang and Sanjay Ranka, Dynamic Algorithms for Energy Minimization on Parallel Machines., Proceeding of Euromicro International Conference on Parallel, Distributed and network-based Processing (PDP), 2008, pp. 399-406.
- Jaeyeon Kang and Sanjay Ranka, Assignment Algorithm for Energy Minimization on Parallel Machines, Proceedings of the Third International Workshop on Advanced Distributed and Parallel Network Applications, pp. 484-491, 2009.
Data Mining Algorithms for Bioinformatics
Data driven research has become an important avenue for novel discoveries in a number of bioinfromatics applications. We have developed novel approaches for mining metabolic pathways, deriving genes, modeling microbial resistance and modeling CGH data. The current arsenal of antimicrobial or antibiotic drugs for treating bacterial infection is one of the most important public health tools available, but it is not an inexhaustible resource. The more haphazardly antimicrobial drugs are used, the more the targeted pathogens develop resistance. Once a pathogen develops resistance to all of the available drugs, treating an infected patient may become difficult or impossible. We developed data mining tools for discovering when and why antimicrobial resistance appears in nosocomial (hospital acquired) infections.
- Saad I. Sheikh, Tamer Kahveci, Sanjay Ranka, J. Gordon Burleigh, Stability analysis of phylogenetic trees. Bioinformatics 29(2): 166-174 (2013)
- Nirmalya Bandyopadhyay, Manas Somaiya, Tamer Kahveci, Sanjay Ranka, Analyzing Differential Gene Regulation in Two Group Perturbation Experiments, BMC Genomics, 2012, 13(Suppl 2):S2.
- Nirmalya Bandyopadhyay, Tamer Kahveci, Sanjay Ranka, Yijun Sun and Steve Goodison, Pathway based Feature Selection Algorithm for Cancer Microarray Data” Advances in Bioinformatics, to appear.
- Bin Song, Esra Buyuktahtakin, Tamer Kahveci and Sanjay Ranka, Manipulating the steady state of metabolic pathways, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 8, pp. 732-747, 2011.
- Bin Song, Padmavati Sridhar, Tamer Kahveci and Sanjay Ranka, Double Iterative Optimization for Metabolic Network-Based Drug Target Identification, International Journal of Data Mining and Bioinformatics, 3:2, pages 124-144, 2009.
- Nirmalya Bandyopadhyay, Manas Somaiya, Sanjay Ranka, Tamer Kahveci, Modeling Perturbations using Gene Networks, International Conference on Computational Systems Biology (CSB), 2010, to appear.
- Bin Song, Sanjay Ranka, Tamer Kahveci, Enzymatic target identification with dynamic states, International Conference on Bioinformatics and Computational Biology (ACM-BCB), 2010. (Best student paper award).
- Jun Liu, Nirmalya Bandyopadhyay, Sanjay Ranka, Michael Baudis, Tamer Kahveci. Inferring Progression Models for CGH data, Journal of Bioinformatics, Vol. 25:17, pp. 2208-2215, 2009.
- Padmavati Sridhar, Bin Song, Tamer Kahveci and Sanjay Ranka, Mining metabolic networks for optimal drug targets, Proceedings of Pacific Symposium on Biocomputing (PSB) 2008, pp. 291-302.
- Mingxi Wu, Chris Jermaine, Sanjay Ranka, Xiuyao Song, John Gums: A Model-Agnostic Framework for Fast Spatial Anomaly Detection. IEEE/ACM Transactions on Knowledge Discovery and Data Mining, TKDD 4(4): 20 (2010)
- Christopher M. Jermaine, Subramanian Arumugam, Abhijit Pol, Alin Dobra. “Scalable approximate query processing with the DBO engine,” Proceedings of the ACM SIGMOD International Conference on Management of Data, 2007, p. 725.
- John Gums, Sanjay Ranka, Chris Jermaine. “Heterogenetiy in Resistance Trends Greatest in Large Hospitals: Results of the Antimicrobial Resistance Management Program,” 47th Annual Interscience Conference on Antimicrobial Agents and Chemotherapy (ICAAC), v.47, 2007.
- John Gums, Sanjay Ranka, Christopher Jermaine. “Significant Heterogeneity Found in Resistance Trends Between Hospitals: Results of the Antimicrobial Resistance Management Program,” 47th Annual Meeting of the Infectious Diseases Society of America (IDSA 2007), v.47, 2007.
- Manas Somaiya, Christopher M. Jermaine, Sanjay Ranka. “Learning correlations using the mixture-of-subsets model,” ACM Transaction on Knowledge Discovery in Data, v.1, 2008.
- Mingxi Wu, Christopher M. Jermaine. “A Bayesian Method for Guessing the Extreme Values in a Data Set,” Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2008), v.33, 2008, p. 471.
- S.M. Smith, J.G. Gums, C. Jermaine, S. Ranka. “The Implications of Phenotypic Clustering of Antimicrobial Resistance Patterns on Predicting Future Trends,” 48th Annual ICAAC/46th Annual IDSA Meeting, Washington DC, 2008.
- X. Song, Chris Jermaine, Sanjay Ranka, John Gums. “A Beyesian Mixture Model with Linear Regression Mixing Proportions,” Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, v.14, 2008.
- Xiuyao Song, Mingxi Wu, Christopher M. Jermaine, Sanjay Ranka. “Conditional Anomaly Detection,” IEEE Trans. Knowl. Data Eng, v.19 (5), 2007, p. 631.
- Xiuyao Song, Mingxi Wu, Christopher M. Jermaine, Sanjay Ranka. “Statistical Change Detection for Multi-Dimensional Data,” Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), v.13, 2007, p. 667.
Data Intensive Computing and Communications for Grid and Cloud Computing
Grid and Cloud Scheduling: For many applications, data is distributed across geographically distributed storage systems and owned by multiple organizations. Grid computing allows multiple organizations to share resources. It requires concurrent utilization of a large number of heterogeneous and dynamic resources by multiple applications. With support from a NSF ITR grant and Intel Corporation, I have developed SPHINX, a middleware for managing grid resources that uses novel linear programming based algorithms for resource allocation. These algorithms guarantee application quality of service requirements and organizational policies simultaneously. The SPHINX middleware, was successfully demonstrated at Supercomputing 2003 and 2004.
Control Plane Provisioning for Grid and Could Computing: Communication is crucial portion of scaling data mining and large scale applications on grids. Large scale grids are deploying optical networks to support the high bandwidth requirements of these applications. Realizing these high bandwidths requires careful control of the underlying resources and a good understanding of the underlying topologies. In joint work with another NSF supported project (Ultralight), we are currently developing and analyzing control plane scheduling algorithms for maximum concurrent file transfer problem (MCFTP) in optical networks. We have several linear programming based approaches to solve this problem – a novel aspect of this work is in incorporating spatial and temporal policy constraints. Our preliminary results show that our algorithms can schedule for 100 node networks (with 3-7 edges per node) in several minutes or less.
Fault Tolerant Aggregation in Large Scale Systems: Communication is crucial portion of scaling data mining Our interest is in designing distributed algorithms for aggregation in sensor networks that are reliable, fault tolerant and have good, scalable performance. We have performed in-depth analysis of the techniques proposed for improving the fault tolerance of tree aggregation, such as multiple trees and local fixes. We have also worked on a methodology that optimizes the performance of a hybrid of tree and gossip-based aggregation.
- Yan Li, Sanjay Ranka, Sartaj Sahni: In-advance path reservation for file transfers in e-science applications. The Journal of Supercomputing, Vol. 59(3): 1167-1187, 2012.
- Eun-Sung Jung, Sanjay Ranka, Sartaj Sahni: Topology Aggregation for e-Science Networks. International Journal of Next-General Computing, volume 1, 2010, pp 1- .
- Eun-Sung Jung, Sanjay Ranka and Sartaj Sahni, Topology Aggregation for e-Science Networks, Proceeding of the 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 530-533.
- Eun-Sung Jung, Sanjay Ranka and Sartaj Sahni, Bandwidth Allocation for Iterative Data-dependent e-Science Applications, Proceedings of the 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 233-242.
- Yan Li, Sanjay Ranka, Sartaj Sahni: First-Slot scheduling with wavelength conversion for distributed file transfers. ISSPIT 2010: 42-47.
- Laukik Chitnis, Alin Dobra and Sanjay Ranka. Analyzing the techniques that improve fault tolerance of aggregation trees in sensor networks. Journal of Parallel and Distributed Computing, Vol. 69 (12), pp. 950-960, 2009.
- Laukik Chitnis, Alin Dobra, and Sanjay Ranka, Fault tolerant aggregation in heterogeneous sensor networks. Journal of Parallel and Distributed Computing. Vol 69(2): 210-219 (2009)
- Laukik Chitnis, Alin Dobra and Sanjay Ranka. Aggregation Methods for Large Scale Sensor Networks. ACM Transactions on Sensor Networks, Volume 4 , Issue 2 (March 2008).
- Kannan Rajah, Sanjay Ranka, Ye Xia, Advance Reservation and Scheduling for Bulk Transfers in Research Networks, IEEE Transactions on Parallel and Distributed Systems, December 2008
- Kannan Rajah, Sanjay Ranka and Ye Xia, Scheduling Bulk File Transfer with Start and End Times, Scheduling Bulk File Transfers with Start and End Times, Computer Networks, Vol 52(5), April 2008, pp. 1105-1122.
- Eun-Sung Jung, Yan Li, Sanjay Ranka and Sartaj Sahni, An Evaluation of In-Advance Bandwidth Scheduling Algorithms for Connection-Oriented Networks, Proceedings of International Symposium on Parallel Architectures, Algorithms, and Networks, 2008, pp. 62-67.
- Eun-Sung Jung, Yan Li, Sanjay Ranka and Sartaj Sahni, Performance Evaluation of Routing and Wavelength Assignment Algorithms For Optical Networks, IEEE Symposium on Computers and Communications, 2008.
- Kannan Rajah, Sanjay Ranka and Ye Xia, Scheduling Bulk File Transfer with Start and End Times, Proceedings of NCA 2007, pp. 295-298.
- Zhe Wang, Sanjay Ranka and Ye Xia, Slotted Wavelength Scheduling for Bulk Transfers in Research Networks, 2009 International Conference on Parallel Processing to appear
- Sartaj Sahni, Nageshwara Rao, Sanjay Ranka, Yan Li, Eun-Sung Jung, Nara Kamath, Bandwidth Scheduling and Path Computation Algorithms for Connection-Oriented Networks, Proceedings of ICN 2007 (best paper award).
Efficient Algorithms for Radiation Therapy
Dynamic multileaf collimator intensity modulated radiation therapy is used to deliver intensity modulated beams on a collimator with the multiple leaves in motion. In joint work with Sahni, with support from a NIH grant, we have shown that optimal leaf sequencing based on unidirectional movement of the collimator leaves is as monitor unit (MU) efficient as bi-directional movement of these leaves. We have also developed efficient algorithms for segmental collimator beam delivery that completely eliminates areas of under dosages due to practical tongue-and-groove effect between adjacent leafs. Our methods results in 10% to 20% decrease in total MU as compared to field splitting techniques used in commercial planning systems.
- Srijit Kamath, Sartaj Sahni, Sanjay Ranka, Jonathan Li, and Jatinder Palta, Generalized field-splitting algorithms for optimal IMRT delivery efficiency, Physics in Medicine and Biology,52, 2007, 5483-5496 (nominated for the 2008 Robbins Prize, given to the best paper in the journal of Physics in Medicine and Biology).
- Srijit Kamath, Sartaj Sahni, Jatinder Palta, and Sanjay Ranka, Algorithms for optimal sequencing of dynamic multileaf collimators, Physics in Medicine and Biology, 49, 1, 2004, 33-54.
- Srijit Kamath, Sartaj Sahni, Jatinder Palta, Sanjay Ranka, and Jonathan Li, Optimal leaf sequencing with elimination of tongue-and-groove underdosage, Physics in Medicine and Biology, 49, 3, 2004, N7-N19.
- Srijit Kamath, Sartaj Sahni, Sanjay Ranka, Jonathan Li, and Jatinder Palta, A comparison of step-and-shoot leaf sequencing algorithms that eliminate tongue-and-groove effect,Physics in Medicine and Biology, 49, 2004, 3137-3143.
- Srijit Kamath, Sartaj Sahni, Sanjay Ranka, Jonathan Li, and Jatinder Palta, Optimal field splitting for large intensity-modulated fields, Medical Physics, 31, 12, 2004, 3314-3323.
- Srijit Kamath, Sartaj Sahni, Jonathan Li, Jatinder Palta, and Sanjay Ranka, Leaf sequencing algorithms for segmented multileaf collimation, Physics in Medicine and Biology, 48, 3, 2003, 307-324.
Fortran90D/HPF compiler and runtime support
Development of Fortran 90D compiler, one of the first compilers that demonstrated that scalable performance can be automatically achieved on large scale distributed memory machines and that has had lasting impact on design of high performance parallel languages. Fortran90D allowed the programmer to view distributed memory as a single global shared space and automatically generate the message passing code required for data movement on distributed memory machines. This compiler demonstrated that the data decomposition directives and the explicit data parallel constructs in Fortran90D can be used to generate codes that are comparable to hand optimized code and led to the development of High Performance Fortran (HPF), a language supported by many vendors in the late 1990s. Key features in Fortran90D to support data parallelism and data distribution are present in more recent vendor-supported HPC languages (Chapel, Fortress, X10). They are also present in NVIDIA CUDA environment for achieving high performance on GPUs.
- I. Al-furaih, S. Aluru and S. Ranka: Parallel Construction of Multidimensional Binary Search Trees, IEEE Transactions on Parallel and Distributed Systems, February 2000, pp. 136-148.
- I. Al-furaih, S. Aluru and S. Ranka, Practical Algorithms for Selection on Coarse Grain Parallel Computers, IEEE Transactions on Parallel and Distributed Systems, August 1997, pp. 803-810.
- C. W. Ou and S. Ranka, Parallel Incremental Graph Partitioning, IEEE Transactions on Parallel and Distributed Systems, August 1997, pp. 884-896.
- R. V. Shankar and S. Ranka. Random Data Accesses on a Coarse-grained Parallel Machine I. One-to-one Mappings, Journal of Parallel and Distributed Computing, July 1997, pp. 14-23.
- R. V. Shankar and S. Ranka. Random Data Accesses on a Coarse-grained Parallel Machine II. One-to-many and Many-to-one Mappings, Journal of Parallel and Distributed Computing, July 1997, pp. 24-34.
- M. Kaddoura and S. Ranka. Runtime Support for Parallelization of Data-Parallel Applications on Adaptive and Nonuniform Environments, Journal of Parallel and Distributed Computing (Special Issue on Workstation Clusters and Network-based Computing), June 1997, pp. 163-168.
- C. W. Ou and S. Ranka. Parallel Remapping Algorithms for Adaptive Problems, Journal of Parallel and Distributed Computing, May 1997, pp. 1010-108.
- S. Bae and S. Ranka. A Comparison of Different Message-Passing Paradigms for the Parallelization of Two Irregular Applications, The Journal of Supercomputing, 1996, vol. 10, pp. 55-85.
- C. W. Ou, S. Ranka, and G. Fox. Fast and Parallel Mapping Algorithms for Irregular Problems, Journal of Supercomputing, October 1996, Vol. 10, No. 2, pp. 119-140.
- M. Kaddoura, S. Ranka and A. Wang. Array Decompositions for Non-uniform Computational Environment, Journal of Parallel and Distributed Computing, August 1996, pp. 91-105.
- K. Dincer, Z. Bozkus, S. Ranka, and G. Fox. Benchmarking the computation and communication performance of the CM-5, Concurrency: Practice and Experience, Jan-Feb 1996, vol. 8(1), pp. 47-69.
- J. C. Wang, T. H. Lin, and S. Ranka. Distributed Scheduling of Unstructured Collective Communication on the CM-5, Parallel Processing Letters (Special Issue on Partitioning and Scheduling), December 1995, pp. 647-658.
- S. Ranka, J. C. Wang and M. Kumar. Irregular Personalized Communication on Distributed Memory Machines, Journal of Parallel and Distributed Computing, 25 (1995), pp. 58-71.
- S. Ranka, J. C. Wang, and G. C. Fox. Static and Runtime Scheduling of All-to-Many Personalized Communication on Permutation Networks, IEEE Transactions on Parallel and Distributed Systems, December 1994, vol. 5, No. 12, pp. 1266-1274.
- A. N. Choudhary, G. C. Fox, S. Hiranandani, K. Kennedy, C. Koelbel, S. Ranka, and C-W. Tseng. Unified Compilation of Fortran 77D and 90D, ACM Letters on Programming Languages and Systems, March–December 1993, 2(1-4), pp. 95-114.
- Z. Bozkus, A. Choudhary, G. Fox, T. Haupt, and S. Ranka. Compiling Fortran 90D/HPF for Distributed Memory MIMD Computers, Journal of Parallel and Distributed computing, April 1994, pp. 15-26.