CDF Plots

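A cumulative distribution function (CDF) plot shows, for every value x, the probability that the variable is less than or equal to x, F(x) = P(X ≤ x). For a sample of data, this is the fraction of observations at or below x, so the curve rises from 0 to 1 and any percentile can be read directly from it. This makes CDFs useful both for understanding a single distribution and for comparing several datasets on the same axes.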
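For a sample, the plotted curve is typically the empirical CDF: sort the values and give each one a height equal to the fraction of observations at or below it. The NumPy sketch below shows that computation. `empirical_cdf` is a hypothetical helper written for illustration, not part of rekha, and rekha's own implementation may differ in details such as how tied values are drawn.

```python
import numpy as np


def empirical_cdf(values):
    """Return the (x, y) points of the empirical CDF of `values`.

    x is the sorted sample; y[i] is the fraction of observations
    less than or equal to x[i], so tied values share the same height.
    """
    x = np.sort(np.asarray(values, dtype=float))
    # Count of observations <= each x[i], divided by the sample size
    y = np.searchsorted(x, x, side='right') / x.size
    return x, y


x, y = empirical_cdf([3, 1, 2, 2])
print(x.tolist())  # [1.0, 2.0, 2.0, 3.0]
print(y.tolist())  # [0.25, 0.75, 0.75, 1.0]
```

Drawing y against x as a step function (for example with matplotlib's `plt.step(x, y, where='post')`) gives the familiar staircase shape of a CDF plot.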

Basic CDF Plot

import rekha as rk
import pandas as pd
import numpy as np

# Generate sample data
data = np.random.normal(100, 15, 1000)
df = pd.DataFrame({'values': data})

fig = rk.cdf(df, x='values',
             title='Cumulative Distribution Function',
             labels={'values': 'Value', 'y': 'Cumulative Probability'})
fig.show()

Comparing Distributions

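CDFs are well suited to comparing several distributions: unlike histograms, they need no bin width, and overlaid curves stay readable. Passing a categorical column as `color` draws one CDF per group: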

# Compare different distributions
df = pd.DataFrame({
    'value': np.concatenate([
        np.random.normal(100, 15, 1000),      # Normal
        np.random.lognormal(4.5, 0.5, 1000),  # Log-Normal
        np.random.uniform(50, 150, 1000)      # Uniform
    ]),
    'distribution': ['Normal'] * 1000 + ['Log-Normal'] * 1000 + ['Uniform'] * 1000
})

fig = rk.cdf(df, x='value', color='distribution',
             title='Distribution Comparison using CDFs',
             labels={'value': 'Value', 'y': 'Cumulative Probability'})
fig.show()
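Reading the overlaid curves at a fixed x-value compares the groups directly: each curve's height there is the fraction of that group at or below the value. A minimal pandas sketch of the same read-out, using a small made-up frame with the `value` and `distribution` columns from the example above; the threshold of 100 is an arbitrary choice:

```python
import pandas as pd

# Small made-up frame with the same columns as the comparison example
demo = pd.DataFrame({
    'value': [80, 95, 105, 120, 60, 90, 98, 200],
    'distribution': ['Normal'] * 4 + ['Log-Normal'] * 4,
})

threshold = 100  # arbitrary x-value at which to read the curves

# Height of each group's empirical CDF at x = threshold:
# the fraction of that group's values that are <= threshold
cdf_at_threshold = (demo['value'] <= threshold).groupby(demo['distribution']).mean()
print(cdf_at_threshold)
# Normal -> 0.5 (2 of 4 values), Log-Normal -> 0.75 (3 of 4 values)
```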

Percentile Analysis

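Because the y-axis is a cumulative probability, percentiles can be read straight off the curve: the 95th percentile (P95) is the x-value where the CDF reaches 0.95. This makes CDFs a natural fit for performance metrics such as latency, where tail percentiles matter more than the mean: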

# Response time analysis
response_times = np.concatenate([
    np.random.lognormal(3.0, 0.3, 800),  # Fast responses
    np.random.lognormal(4.0, 0.5, 150),  # Medium responses
    np.random.lognormal(5.0, 0.7, 50),   # Slow responses
])

df = pd.DataFrame({'response_time_ms': response_times})

fig = rk.cdf(df, x='response_time_ms',
             title='Response Time Percentile Analysis',
             labels={'response_time_ms': 'Response Time (ms)',
                     'y': 'Cumulative Probability'})

# Add percentile markers: a horizontal line at each cumulative probability p/100,
# a vertical line at the matching response time, and a text label
ax = fig.get_axes()[0]
for p in [50, 90, 95, 99]:
    value = np.percentile(response_times, p)
    ax.axhline(y=p/100, color='red', linestyle='--', alpha=0.3)
    ax.axvline(x=value, color='red', linestyle='--', alpha=0.3)
    ax.text(value, p/100, f'P{p}: {value:.0f}ms',
            fontsize=8, ha='left', va='bottom')

fig.show()
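The markers above come from `np.percentile` with its default linear interpolation, so a marker can sit slightly off the steps of an empirical CDF. To read a percentile straight off a step curve, take the smallest observation whose cumulative probability reaches p/100; NumPy 1.22 and later expose the same definition as `method='inverted_cdf'`. This is a sketch assuming the plotted curve is a step empirical CDF; `percentile_from_ecdf` is a hypothetical helper, not a rekha function.

```python
import numpy as np


def percentile_from_ecdf(values, p):
    """Smallest observation x with ECDF(x) >= p / 100 (hypothetical helper)."""
    x = np.sort(np.asarray(values, dtype=float))
    # ECDF height k/n at the k-th smallest observation
    y = np.arange(1, x.size + 1) / x.size
    # First position where the cumulative probability reaches p / 100
    i = np.searchsorted(y, p / 100, side='left')
    return x[i]


data = [5.0, 1.0, 4.0, 2.0, 3.0]
print(percentile_from_ecdf(data, 50))                  # 3.0
print(np.percentile(data, 50, method='inverted_cdf'))  # 3.0
print(percentile_from_ecdf(data, 90))                  # 5.0
print(np.percentile(data, 90))  # interpolates between 4.0 and 5.0
```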

Grouped CDFs

The same `color` parameter works with any categorical column, for example to compare the sales distribution across regions:

# Create sample sales data
sales_df = pd.DataFrame({
    'sales': np.random.exponential(50, 1000),
    'region': np.random.choice(['North', 'South', 'East', 'West'], 1000)
})

# Sales distribution by region
fig = rk.cdf(sales_df, x='sales', color='region',
             title='Sales Distribution by Region',
             labels={'sales': 'Sales ($k)', 'y': 'Cumulative Probability'})
fig.show()
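To put numbers next to a grouped CDF plot, per-group quantiles give the horizontal read-out: the x-values at which each group's curve reaches a chosen cumulative probability. A minimal pandas sketch on a small made-up frame with the `sales` and `region` columns from the example above:

```python
import pandas as pd

# Small made-up frame with the same columns as the sales example
sales = pd.DataFrame({
    'sales': [10, 20, 30, 40, 50, 5, 15, 25, 35, 45],
    'region': ['North'] * 5 + ['South'] * 5,
})

# Median and 90th percentile per region: roughly the x-values where each
# region's CDF reaches 0.5 and 0.9 (pandas interpolates linearly by default)
summary = sales.groupby('region')['sales'].quantile([0.5, 0.9]).unstack()
print(summary)
```

The result has one row per region and one column per requested quantile, which makes it easy to compare medians and tails across groups.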

Customization Options

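Custom Colors

Pass `color_mapping`, a dict from each value of the `color` column to a color, to control which color each group is drawn in: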

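# Sample data whose 'category' levels match the color_mapping keys
df = pd.DataFrame({
    'value': np.concatenate([np.random.normal(m, 15, 300) for m in (80, 100, 120)]),
    'category': ['A'] * 300 + ['B'] * 300 + ['C'] * 300
})
fig = rk.cdf(df, x='value', color='category',
             color_mapping={'A': '#FF6B6B', 'B': '#4ECDC4', 'C': '#45B7D1'})
fig.show()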
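With many groups, writing `color_mapping` by hand gets tedious. A minimal sketch of generating it from a palette, assuming `color_mapping` accepts any dict from group value to color string as in the example above; `build_color_mapping` is a hypothetical helper, not part of rekha:

```python
# Hypothetical helper (not part of rekha): assign palette colors to categories
# in sorted order, cycling through the palette if there are more categories
def build_color_mapping(categories, palette):
    levels = sorted(set(categories))
    return {level: palette[i % len(palette)] for i, level in enumerate(levels)}


palette = ['#FF6B6B', '#4ECDC4', '#45B7D1']
mapping = build_color_mapping(['B', 'A', 'C', 'A', 'D'], palette)
print(mapping)
# {'A': '#FF6B6B', 'B': '#4ECDC4', 'C': '#45B7D1', 'D': '#FF6B6B'}
```

Sorting keeps each category's color stable from run to run. Cycling reuses colors once the palette runs out, as with `'D'` here, so supply at least as many colors as there are groups when they must all be distinguishable. A pandas column works too, e.g. `build_color_mapping(df['category'], palette)`.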

Parameters

See the API Reference for complete parameter documentation.