Astrophysics & AI with Python: Hunting for Hidden Wobbles in Starlight

When we look up at the night sky, the stars appear perfectly still, frozen points of light. The reality, however, is a cosmic dance. The universe is rarely static; in fact, a large fraction of stars are not solitary like our Sun. Instead, they exist in gravitational partnerships, the most common of which are binary star systems.

For astrophysicists, these systems are the Rosetta Stone of the cosmos. They provide the most direct method for measuring a star's most fundamental property: its mass. But how do we decode the physics of these distant pairs? We look for their shadows.

This guide explores how to detect the subtle "wobbles" in light curves—the periodic dips in brightness caused by eclipsing binaries—using Python and advanced signal processing techniques like the Lomb-Scargle Periodogram.

The Physics of the Eclipse and the Data Challenge

When two stars orbit each other in a plane seen nearly edge-on from Earth, they periodically block each other's light. We record these events in a light curve: a plot of the system's brightness over time.

A perfect light curve would show two distinct dips:

1. The Primary Minimum: The deeper dip, occurring when the hotter star is eclipsed by its cooler companion.
2. The Secondary Minimum: The shallower dip, occurring when the cooler star is eclipsed in turn. For a circular orbit this happens half an orbit after the primary minimum.

The time between these primary dips reveals the orbital period (\(P\))—the heartbeat of the system. However, astronomers face a massive hurdle: Uneven Sampling.
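
The link between eclipse timings and the period can be written as a linear ephemeris, \(T_n = T_0 + nP\), where \(n\) counts orbital cycles. The sketch below is a minimal illustration of that idea, assuming a handful of hypothetical mid-eclipse times (invented for this example, not real measurements) and a rough initial guess for the period. The helper name `fit_linear_ephemeris` is our own.

```python
import numpy as np

def fit_linear_ephemeris(minima_times, rough_period):
    """Fit T_n = T0 + n * P to observed mid-eclipse times.

    rough_period only needs to be accurate enough to assign each
    eclipse to the correct integer cycle number n.
    """
    minima_times = np.sort(np.asarray(minima_times, dtype=float))
    cycles = np.round((minima_times - minima_times[0]) / rough_period)
    # Straight-line fit: the slope is the period, the intercept is T0
    period, t0 = np.polyfit(cycles, minima_times, 1)
    return period, t0

# Hypothetical mid-eclipse times in days; cycles 2 and 5 were not observed
observed_minima = [0.75, 2.25, 5.25, 6.75, 9.75]
period, t0 = fit_linear_ephemeris(observed_minima, rough_period=1.4)
print(f"P = {period:.3f} d, T0 = {t0:.3f} d")  # P = 1.500 d, T0 = 0.750 d
```

Because the fit uses cycle numbers rather than consecutive differences, missing eclipses do not bias the result, as long as the rough guess assigns each eclipse to the right cycle.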

Unlike a heartbeat monitor in a hospital, we cannot observe a star 24/7. The Earth rotates, clouds roll in, and telescopes have scheduling constraints. This leaves us with data full of massive, irregular gaps. Standard analysis tools, like the Fast Fourier Transform (FFT), cannot be applied directly here because they assume evenly spaced samples.
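
To get a feel for what such sampling looks like, here is a rough sketch that builds timestamps for a hypothetical ground-based campaign. Every parameter (night length, cadence, which nights are cloudy, the fraction of lost frames) is an illustrative assumption, and `simulate_observation_times` is a name made up for this example.

```python
import numpy as np

def simulate_observation_times(n_nights=10, night_length=8 / 24, cadence=10 / 1440,
                               cloudy_nights=(2, 5, 6), drop_fraction=0.1, seed=0):
    """Return sorted timestamps (in days) for a hypothetical observing campaign."""
    rng = np.random.default_rng(seed)
    frames_per_night = int(round(night_length / cadence))
    chunks = []
    for night in range(n_nights):
        if night in cloudy_nights:
            continue  # the whole night is lost to weather
        start = night + rng.uniform(0.0, 0.05)  # observing begins at a slightly different time
        t = start + cadence * np.arange(frames_per_night)
        keep = rng.random(frames_per_night) > drop_fraction  # randomly lose some frames
        chunks.append(t[keep])
    return np.concatenate(chunks)

times = simulate_observation_times()
gaps = np.diff(times)
print(f"{times.size} points, median gap {np.median(gaps) * 24 * 60:.1f} min, "
      f"largest gap {gaps.max():.2f} d")
```

The spacing between consecutive points ranges from minutes within a night to days across cloudy stretches, which is exactly the situation an FFT is not designed for.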

The Solution: The Lomb-Scargle Periodogram

To solve the "Lost Commuter Train Schedule" problem (finding a regular rhythm in a logbook with huge gaps), astronomers rely on the Lomb-Scargle Periodogram (LSP).

The LSP is a modified Fourier analysis designed specifically for irregularly sampled data. Instead of assuming a constant time step, it tests thousands of potential frequencies. For each frequency, it calculates a Power value, which measures how well a sine wave of that frequency fits the scattered data points.
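
The "fit a sine wave at each trial frequency" idea can be sketched directly with a least-squares fit. The code below is a simplified illustration that is closely related to the standard Lomb-Scargle power. It is not astropy's implementation, and it is far slower than the optimized algorithms a library would use. The function name `sine_fit_power` and the test signal are assumptions made for this example.

```python
import numpy as np

def sine_fit_power(t, y, frequencies):
    """Fraction of the variance explained by the best-fit sinusoid at each trial frequency."""
    y = y - y.mean()
    total = np.sum(y ** 2)
    power = np.empty(len(frequencies))
    for i, f in enumerate(frequencies):
        # Linear model: y ~ A * sin(2*pi*f*t) + B * cos(2*pi*f*t)
        X = np.column_stack([np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)])
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
        residual = y - X @ coeffs
        power[i] = 1.0 - np.sum(residual ** 2) / total
    return power

# Irregularly sampled sinusoid with a 1.5-day period plus Gaussian noise
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(2 * np.pi * t / 1.5) + rng.normal(0.0, 0.1, t.size)

freqs = np.linspace(0.05, 5.0, 1000)
power = sine_fit_power(t, y, freqs)
best_period = 1.0 / freqs[np.argmax(power)]
print(f"Best period: {best_period:.3f} d")
```

A power near 1 means a sinusoid at that frequency accounts for almost all of the variation in the data; a power near 0 means it explains almost nothing.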

The output is a periodogram. If there is a true periodic signal (like an eclipse), it appears as a distinct peak in the plot. We must also calculate the False Alarm Probability (FAP): the probability that noise alone, with no periodic signal present, would produce a peak at least this high. A low FAP (e.g., \(10^{-5}\)) gives us high confidence that the peak is real.
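
One intuitive way to estimate a FAP is a permutation test: shuffle the fluxes among the timestamps to destroy any periodicity, recompute the highest peak, and count how often the shuffled data beats the real peak. The sketch below assumes the same simple sine-fit power as a stand-in for a full Lomb-Scargle implementation. The names `max_sine_fit_power` and `permutation_fap` are hypothetical, and analytic approximations such as the Baluev method are much cheaper in practice.

```python
import numpy as np

def max_sine_fit_power(t, y, frequencies):
    """Highest fraction of variance explained by a sinusoid over the frequency grid."""
    y = y - y.mean()
    total = np.sum(y ** 2)
    best = 0.0
    for f in frequencies:
        X = np.column_stack([np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)])
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
        best = max(best, 1.0 - np.sum((y - X @ coeffs) ** 2) / total)
    return best

def permutation_fap(t, y, frequencies, n_trials=99, seed=0):
    """Fraction of shuffled datasets whose highest peak reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = max_sine_fit_power(t, y, frequencies)
    exceed = sum(
        max_sine_fit_power(t, rng.permutation(y), frequencies) >= observed
        for _ in range(n_trials)
    )
    # The +1 terms keep the estimate from ever being exactly zero
    return (exceed + 1) / (n_trials + 1)

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0.0, 10.0, 80))
y = np.sin(2 * np.pi * t / 1.5) + rng.normal(0.0, 0.3, t.size)
freqs = np.linspace(0.1, 3.0, 100)

fap = permutation_fap(t, y, freqs)
print(f"Permutation FAP: {fap:.3f}")
```

Note the built-in floor: with 99 shuffles the smallest value this estimate can report is 0.01, so confirming a FAP near \(10^{-5}\) this way would need on the order of \(10^{5}\) trials.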

Phase Folding: Unmasking the Signal

Once we have a candidate period from the LSP, we perform Phase Folding. This technique collapses the entire time series onto a single timeline, aligning every observation based on where it falls in the orbital cycle.

The formula for phase (\(\phi\)) is:

\[ \phi = \frac{t - T_0}{P} \pmod{1} \]

Where \(t\) is the observation time, \(T_0\) is the time of a reference minimum, and \(P\) is the period.

If the period is correct, the scattered data points align perfectly to reveal the smooth, underlying shape of the eclipse. If the period is wrong, the data remains a chaotic smear.
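
The difference between a clean fold and a "chaotic smear" can be quantified. The sketch below folds a synthetic sinusoidal signal at a correct and an incorrect period and compares a simple point-to-point roughness statistic, in the spirit of string-length methods. The statistic, the function names, and the test signal are all assumptions chosen for illustration.

```python
import numpy as np

def phase_fold(t, period, t0=0.0):
    """Phase in [0, 1): the fractional part of (t - t0) / period."""
    return ((np.asarray(t) - t0) / period) % 1.0

def fold_roughness(t, flux, period, t0=0.0):
    """Point-to-point scatter of the folded curve, normalized by the overall variance.

    Small values mean that points adjacent in phase have similar fluxes,
    which is the signature of a coherent fold.
    """
    order = np.argsort(phase_fold(t, period, t0))
    f = np.asarray(flux)[order]
    return np.sum(np.diff(f) ** 2) / np.sum((f - f.mean()) ** 2)

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 30.0, 300))
flux = 1.0 + 0.1 * np.sin(2 * np.pi * t / 1.5) + rng.normal(0.0, 0.005, t.size)

right = fold_roughness(t, flux, 1.5)
wrong = fold_roughness(t, flux, 1.37)
print(f"roughness at the true period (1.5 d):  {right:.3f}")
print(f"roughness at a wrong period (1.37 d): {wrong:.3f}")
```

Folding at the true period places points with similar flux next to each other in phase, so the roughness drops sharply compared with the wrong period.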

Python Workflow: Simulating and Analyzing a Light Curve

Let's put theory into practice. Below is a Python workflow that simulates a noisy light curve and uses the Lomb-Scargle method to recover the hidden period.

1. Simulating the Data

First, we generate synthetic data representing a binary star with a period of 1.5 days, adding Gaussian noise to mimic atmospheric and instrumental errors.

import numpy as np
import matplotlib.pyplot as plt
from astropy.timeseries import LombScargle

# --- Simulation Parameters ---
TOTAL_TIME = 10.0       # Days of observation
NUM_POINTS = 500        # Number of data points
PERIOD = 1.5            # True orbital period (days)
ECLIPSE_DEPTH = 0.1     # 10% dip in brightness
NOISE_LEVEL = 0.005     # Standard deviation of noise

# Generate irregularly spaced observation times by drawing them at random
# and sorting, so the sampling mimics an uneven observing schedule
time = np.sort(np.random.uniform(0, TOTAL_TIME, NUM_POINTS))

# Create the eclipse signal (a simple box model for clarity)
phase = (time % PERIOD) / PERIOD
is_eclipsed = (phase > 0.45) & (phase < 0.55)
eclipse_signal = np.where(is_eclipsed, -ECLIPSE_DEPTH, 0)

# Add noise
noise = np.random.normal(0, NOISE_LEVEL, NUM_POINTS)
flux = 1.0 + eclipse_signal + noise

# Normalize the flux to unit mean (this trend-free simulation needs no detrending)
flux = flux / np.mean(flux)

# --- Visualization ---
plt.figure(figsize=(12, 5))
plt.plot(time, flux, 'k.', alpha=0.5, label='Noisy Observations')
plt.plot(time, 1.0 + eclipse_signal, 'r-', label='True Signal')
plt.title(f"Simulated Eclipsing Binary Light Curve (P={PERIOD}d)")
plt.xlabel("Time (Days)")
plt.ylabel("Normalized Flux")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

2. Detecting the Period with Lomb-Scargle

Now we apply the LombScargle class from astropy to find the period hidden in the noise.

# --- Frequency Grid Setup ---
# We define a range of periods to search (e.g., 0.1 to 20 days)
min_period = 0.1
max_period = 20.0
# Convert periods to frequencies (frequency = 1 / period)
frequencies = np.linspace(1.0 / max_period, 1.0 / min_period, 10000)

# --- Calculate Power Spectrum ---
ls = LombScargle(time, flux)
power = ls.power(frequencies)

# --- Find the Best Period ---
best_frequency = frequencies[np.argmax(power)]
best_period = 1.0 / best_frequency

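# --- Calculate False Alarm Probability ---
# Pass the searched frequency range so the estimate matches the grid used above
fap = ls.false_alarm_probability(
    power.max(), method='baluev',
    minimum_frequency=frequencies.min(), maximum_frequency=frequencies.max()
)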

print(f"True Period: {PERIOD} days")
print(f"Recovered Period: {best_period:.4f} days")
print(f"False Alarm Probability: {fap:.2e}")

# --- Plot the Periodogram ---
plt.figure(figsize=(12, 5))
plt.plot(1.0 / frequencies, power, 'b-')
plt.axvline(best_period, alpha=0.4, color='r', linestyle='--')
plt.ylim(0, power.max() * 1.1)
plt.title("Lomb-Scargle Periodogram")
plt.xlabel("Period (Days)")
plt.ylabel("Power")
plt.show()
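
One caveat worth checking at this stage: because the periodogram fits sinusoids, a sharp, box-like eclipse spreads its power across harmonics, and the highest peak can land on a fraction of the true period (often half of it when the two eclipses look alike). A rough way to test for this is to fold the data at small integer multiples of the candidate period and keep the shortest one that folds cleanly. The sketch below is one possible heuristic, not a standard library routine. The function names, the bin count, and the `tolerance` threshold are all assumptions that would need tuning on real data.

```python
import numpy as np

def binned_scatter(t, flux, period, n_bins=40):
    """Average flux variance inside phase bins after folding at `period`."""
    phase = (t / period) % 1.0
    bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    variances = [flux[bins == b].var() for b in range(n_bins)
                 if np.count_nonzero(bins == b) > 1]
    return float(np.mean(variances))

def resolve_harmonic(t, flux, candidate_period, multiples=(1, 2, 3), tolerance=1.5):
    """Return the shortest multiple of the candidate whose fold is nearly as tight as the best."""
    scatter = {m: binned_scatter(t, flux, m * candidate_period) for m in multiples}
    best = min(scatter.values())
    for m in sorted(multiples):
        if scatter[m] <= tolerance * best:
            return m * candidate_period

# Synthetic box-shaped eclipse with a true period of 1.5 days
rng = np.random.default_rng(3)
true_period = 1.5
t = np.sort(rng.uniform(0.0, 30.0, 1500))
phase = (t / true_period) % 1.0
in_eclipse = (phase > 0.45) & (phase < 0.55)
flux = 1.0 - 0.1 * in_eclipse + rng.normal(0.0, 0.005, t.size)

print(resolve_harmonic(t, flux, candidate_period=0.75))  # candidate at half the true period
print(resolve_harmonic(t, flux, candidate_period=1.5))   # candidate already correct
```

Folding at half the true period mixes in-eclipse and out-of-eclipse points in the same phase bins, which inflates the scatter, while folding at the true period (or an integer multiple of it) keeps them separate.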

3. Phase Folding and Validation

Finally, we use the recovered period to fold the data. This validates the result by collapsing the light curve into a single cycle.

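# --- Phase Folding ---
# Use the faintest observation as the reference time T0, then compute the
# phase of every data point with the recovered period. With this choice the
# eclipse sits at phase 0 and wraps around to phase 1.
t0 = time[np.argmin(flux)]
phase_folded = ((time - t0) % best_period) / best_period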

# Sort data for clean plotting
sort_idx = np.argsort(phase_folded)
phase_sorted = phase_folded[sort_idx]
flux_sorted = flux[sort_idx]

# --- Plot Phase-Folded Light Curve ---
plt.figure(figsize=(8, 6))
plt.plot(phase_sorted, flux_sorted, 'k.', alpha=0.4, label='Observed Data')
plt.title(f"Phase-Folded Light Curve (P = {best_period:.4f} days)")
plt.xlabel("Orbital Phase")
plt.ylabel("Normalized Flux")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
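
Once the data is folded, a common next step is to bin it in phase so the eclipse shape stands out from the noise and its depth can be read off. The sketch below shows one simple way to do that with median bins. It uses synthetic stand-ins for the phase and flux arrays, and the helper names and the bin count are assumptions made for this example.

```python
import numpy as np

def bin_folded_curve(phase, flux, n_bins=50):
    """Median flux in equal-width phase bins; empty bins are returned as NaN."""
    bins = np.minimum((np.asarray(phase) * n_bins).astype(int), n_bins - 1)
    centers = (np.arange(n_bins) + 0.5) / n_bins
    medians = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            medians[b] = np.median(flux[in_bin])
    return centers, medians

def estimate_eclipse_depth(binned_flux):
    """Depth = out-of-eclipse baseline (median of all bins) minus the faintest bin."""
    baseline = np.nanmedian(binned_flux)
    return baseline - np.nanmin(binned_flux)

# Synthetic folded data: a 10% box eclipse centered on phase 0.5
rng = np.random.default_rng(4)
phase = rng.uniform(0.0, 1.0, 2000)
in_eclipse = (phase > 0.45) & (phase < 0.55)
flux = 1.0 - 0.1 * in_eclipse + rng.normal(0.0, 0.005, phase.size)

centers, medians = bin_folded_curve(phase, flux)
depth = estimate_eclipse_depth(medians)
print(f"Estimated depth: {depth:.3f}")
```

Using the median of the bins as the baseline works here because the eclipse covers only a small fraction of the orbit. A system with long or overlapping eclipses would need a more careful baseline estimate.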

Conclusion

Detecting the "wobbles" in starlight is a classic signal processing challenge. By moving beyond standard Fourier transforms and utilizing the Lomb-Scargle Periodogram, we can extract precise orbital periods from messy, incomplete astronomical data.

The process—Simulate \(\rightarrow\) Detect \(\rightarrow\) Fold—forms the backbone of modern variable star astronomy. It allows us to peel back the layers of noise and reveal the gravitational dances that govern the universe.

Let's Discuss

If you were analyzing data from a telescope with significant instrumental drift (a slow trend up or down), how might you modify the data cleaning step before applying the Lomb-Scargle algorithm?
Beyond eclipsing binaries, what other real-world time-series datasets (e.g., finance, heart rate monitoring, climate data) could benefit from the Lomb-Scargle method, and why?

The concepts and code demonstrated here are drawn directly from the ebook Astrophysics & AI: Building Research Agents for Astronomy, Cosmology, and SETI, available on Leanpub and Amazon.

Code License: All code examples are released under the MIT License. Github repo.
