Definition of the CUDA GPU kernel for the simulation of x ray tracks in a voxelized geometry. More...

Defines
#define	LEAP_DISTANCE 256
	Upper limit of the number of random values sampled in a single track.
#define	a1_RANECU 40014
	Multipliers and moduli for the two MLCG in RANECU.
#define	m1_RANECU 2147483563
#define	a2_RANECU 40692
#define	m2_RANECU 2147483399
Functions
void	track_particles (int history_batch, int histories_per_thread, int num_p, int seed_input, unsigned long long int image, ulonglong2 voxels_Edep, float2 voxel_mat_dens, float2 mfp_Woodcock_table, float3 mfp_table_a, float3 mfp_table_b, struct rayleigh_struct rayleigh_table, struct compton_struct compton_table, struct detector_struct detector_data_array, struct source_struct source_data_array, ulonglong2 *materials_dose)
	Initialize the image array, ie, set all pixels to zero Essentially, this function has the same effect as the command: "cutilSafeCall(cudaMemcpy(image_device, image, image_bytes, cudaMemcpyHostToDevice))";.
void	tally_voxel_energy_deposition (float Edep, short3 voxel_coord, ulonglong2 *voxels_Edep)
	Tally the dose deposited in the voxels.
void	tally_image (float energy, float3 position, float3 direction, signed char scatter_state, unsigned long long int image, struct source_struct source_data_SHARED, struct detector_struct *detector_data_SHARED)
	Tally a radiographic projection image.
void	source (float3 position, float3 direction, float energy, int2 seed, int absvox, struct source_struct source_data_SHARED, struct detector_struct *detector_data_SHARED)
	Source that creates primary x rays, according to the defined source model.
void	move_to_bbox (float3 position, float3 direction, float3 size_bbox, int *intersection_flag)
	Functions that moves a particle inside the voxelized geometry bounding box.
void	init_PRNG (int history_batch, int histories_per_thread, int seed_input, int2 *seed)
	Initialize the pseudo-random number generator (PRNG) RANECU to a position far away from the previous history (leap frog technique).
int	abMODm (int m, int a, int s)
	Calculate "(a1*a2) MOD m" with 32-bit integers and avoiding the possible overflow, using the Russian Peasant approach modulo m and the approximate factoring method, as described in: L'Ecuyer and Cote, ACM Trans.
float	ranecu (int2 *seed)
	Pseudo-random number generator (PRNG) RANECU returning a float value (single precision version).
double	ranecu_double (int2 *seed)
	Pseudo-random number generator (PRNG) RANECU returning a double value.
int	locate_voxel (float3 position, short3 voxel_coord)
	Find the voxel that contains the current position.
void	rotate_double (float3 *direction, double costh, double phi)
	Rotates a vector; the rotation is specified by giving the polar and azimuthal angles in the "self-frame", as determined by the vector to be rotated.
void	GRAa (float energy, double costh_Rayleigh, int mat, float pmax_current, int2 seed, struct rayleigh_struct cgra)
	Sample a Rayleigh interaction using the sampling algorithm used in PENELOPE 2006.
void	GCOa (float energy, double costh_Compton, int mat, int2 seed, struct compton_struct *cgco_SHARED)
	Random sampling of incoherent (Compton) scattering of photons, using the sampling algorithm from PENELOPE 2006: Relativistic impulse approximation with analytical one-electron Compton profiles.
void	tally_materials_dose (float Edep, int material, ulonglong2 *materials_dose)
	Tally the depose deposited inside each material.

Detailed Description

Definition of the CUDA GPU kernel for the simulation of x ray tracks in a voxelized geometry.

This kernel has been optimized to yield a good performance in the GPU but can still be compiled in the CPU without problems. All the CUDA especific commands are enclosed in pre-processor directives that are skipped if the parameter "USING_CUDA" is not defined at compilation time.

Author:: Andreu Badal (Andreu.Badal-Soler@fda.hhs.gov)

Date:: 2012/12/12

Definition in file MC-GPU_kernel_v1.3.cu.

Define Documentation

#define a1_RANECU 40014

Multipliers and moduli for the two MLCG in RANECU.

Definition at line 814 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG(), and update_seed_PRNG().

#define a2_RANECU 40692

Definition at line 816 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG().

#define LEAP_DISTANCE 256

Upper limit of the number of random values sampled in a single track.

Definition at line 812 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG(), and update_seed_PRNG().

#define m1_RANECU 2147483563

Definition at line 815 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG(), and update_seed_PRNG().

#define m2_RANECU 2147483399

Definition at line 817 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG().

Function Documentation

int abMODm	(	int	m,
		int	a,
		int	s
	)		`[inline]`

Calculate "(a1*a2) MOD m" with 32-bit integers and avoiding the possible overflow, using the Russian Peasant approach modulo m and the approximate factoring method, as described in: L'Ecuyer and Cote, ACM Trans.

Math. Soft. 17 (1991).

This function has been adapted from "seedsMLCG.f", see: Badal and Sempau, Computer Physics Communications 175 (2006)

Parameters:

[in] m,a,s MLCG parameters

Returns:: (a1*a2) MOD m

Definition at line 919 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG(), and update_seed_PRNG().

void GCOa	(	float *	energy,
		double *	costh_Compton,
		int *	mat,
		int2 *	seed,
		struct compton_struct *	cgco_SHARED
	)		`[inline]`

Random sampling of incoherent (Compton) scattering of photons, using the sampling algorithm from PENELOPE 2006: Relativistic impulse approximation with analytical one-electron Compton profiles.

Parameters:

[in,out]	energy	incident and final photon energy (eV)
[out]	costh_Compton	cosine of the polar scattering angle
[in]	material	Current voxel material
[in]	seed	RANECU PRNG seed

Definition at line 1287 of file MC-GPU_kernel_v1.3.cu.

References compton_struct::fco, compton_struct::fj0, MAX_MATERIALS, MAX_SHELLS, max_value, min_value, compton_struct::noscco, ranecu(), and compton_struct::uico.

Referenced by track_particles().

void GRAa	(	float *	energy,
		double *	costh_Rayleigh,
		int *	mat,
		float *	pmax_current,
		int2 *	seed,
		struct rayleigh_struct *	cgra
	)		`[inline]`

Sample a Rayleigh interaction using the sampling algorithm used in PENELOPE 2006.

Parameters:

[in]	energy	Particle energy (not modified with Rayleigh)
[out]	costh_Rayleigh	Cosine of the angular deflection
[in]	material	Current voxel material

Definition at line 1181 of file MC-GPU_kernel_v1.3.cu.

References rayleigh_struct::aco, rayleigh_struct::bco, rayleigh_struct::itlco, rayleigh_struct::ituco, min_value, NP_RAYLEIGH, rayleigh_struct::pco, ranecu_double(), and rayleigh_struct::xco.

Referenced by track_particles().

void init_PRNG	(	int	history_batch,
		int	histories_per_thread,
		int	seed_input,
		int2 *	seed
	)		`[inline]`

Initialize the pseudo-random number generator (PRNG) RANECU to a position far away from the previous history (leap frog technique).

Each calculated seed initiates a consecutive and disjoint sequence of pseudo-random numbers with length LEAP_DISTANCE, that can be used to in a parallel simulation (Sequence Splitting parallelization method). The basic equation behind the algorithm is: S(i+j) = (a**j * S(i)) MOD m = [(a**j MOD m)*S(i)] MOD m , which is described in: P L'Ecuyer, Commun. ACM 31 (1988) p.742

This function has been adapted from "seedsMLCG.f", see: A Badal and J Sempau, Computer Physics Communications 175 (2006) p. 440-450

Parameters:

[in]	history	Particle bach number.
[in]	seed_input	Initial PRNG seed input (used to initiate both MLCGs in RANECU).
[out]	seed	Initial PRNG seeds for the present history.

Definition at line 841 of file MC-GPU_kernel_v1.3.cu.

References a1_RANECU, a2_RANECU, abMODm(), LEAP_DISTANCE, m1_RANECU, m2_RANECU, int2::x, and int2::y.

Referenced by track_particles().

int locate_voxel	(	float3 *	position,
		short3 *	voxel_coord
	)		`[inline]`

Find the voxel that contains the current position.

Report the voxel absolute index and the x,y,z indices. The structure containing the voxel number and size is read from CONSTANT memory.

Parameters:

[in]	position	Particle position
[out]	voxel_coord	Pointer to three integer values (short3*) that will store the x,y and z voxel indices.

Returns:: Returns "absvox", the voxel number where the particle is located (negative if position outside the voxel bbox).

Definition at line 1033 of file MC-GPU_kernel_v1.3.cu.

References EPS_SOURCE, voxel_struct::inv_voxel_size, voxel_struct::num_voxels, voxel_struct::size_bbox, voxel_data_CONST, int3::x, float3::x, short3::x, int3::y, float3::y, short3::y, float3::z, and short3::z.

Referenced by track_particles().

void move_to_bbox	(	float3 *	position,
		float3 *	direction,
		float3	size_bbox,
		int *	intersection_flag
	)		`[inline]`

Functions that moves a particle inside the voxelized geometry bounding box.

An EPSILON distance is added to make sure the particles will be clearly inside the bbox, not exactly on the surface.

This algorithm makes the following assumtions:

The back lower vertex of the voxel bounding box is always located at the origin: (x0,y0,z0)=(0,0,0).
The initial value of "position" corresponds to the focal spot location.
When a ray is not pointing towards the bbox plane that it should cross according to the sign of the direction, I assign a distance to the intersection =0 instead of the real negative distance. The wall that will be crossed to enter the bbox is always the furthest and therefore a 0 distance will never be used except in the case of a ray starting inside the bbox or outside the bbox and not pointing to any of the 3 planes. In this situation the ray will be transported a 0 distance, meaning that it will stay at the focal spot.

(Interesting information on ray-box intersection: http://tog.acm.org/resources/GraphicsGems/gems/RayBox.c)

Parameters:

[in,out]	position	Particle position: initially set to the focal spot, returned transported inside the voxel bbox.
[out]	direction	Sampled particle direction (cosine vectors).
[out]	intersection_flag	Set to <0 if particle outside bbox and will not cross the voxels, not changed otherwise.
[out]	size_bbox	Size of the bounding box.

Definition at line 714 of file MC-GPU_kernel_v1.3.cu.

References EPS_SOURCE, NEG_EPS_SOURCE, NEG_INF, float3::x, float3::y, and float3::z.

Referenced by source().

float ranecu ( int2 * seed ) [inline]

Pseudo-random number generator (PRNG) RANECU returning a float value (single precision version).

Parameters:

[in,out] seed PRNG seed (seed kept in the calling function and updated here).

Returns:: PRN double value in the open interval (0,1)

Definition at line 965 of file MC-GPU_kernel_v1.3.cu.

References int2::x, and int2::y.

Referenced by GCOa(), source(), and track_particles().

double ranecu_double ( int2 * seed ) [inline]

Pseudo-random number generator (PRNG) RANECU returning a double value.

Definition at line 995 of file MC-GPU_kernel_v1.3.cu.

References int2::x, and int2::y.

Referenced by GRAa(), and track_particles().

void rotate_double	(	float3 *	direction,
		double	costh,
		double	phi
	)		`[inline]`

Rotates a vector; the rotation is specified by giving the polar and azimuthal angles in the "self-frame", as determined by the vector to be rotated.

This function is a literal translation from Fortran to C of PENELOPE (v. 2006) subroutine "DIRECT".

Parameters:

[in,out]	(u,v,w)	input vector (=d) in the lab. frame; returns the rotated vector components in the lab. frame
[in]	costh	cos(theta), angle between d before and after turn
[in]	phi	azimuthal angle (rad) turned by d in its self-frame

Definition at line 1103 of file MC-GPU_kernel_v1.3.cu.

References float3::x, float3::y, and float3::z.

Referenced by track_particles().

void source	(	float3 *	position,
		float3 *	direction,
		float *	energy,
		int2 *	seed,
		int *	absvox,
		struct source_struct *	source_data_SHARED,
		struct detector_struct *	detector_data_SHARED
	)		`[inline]`

Source that creates primary x rays, according to the defined source model.

The particles are automatically moved to the surface of the voxel bounding box, to start the tracking inside a real material. If the sampled particle do not enter the voxels, it is init in the focal spot and the main program will check if it arrives at the detector or not.

Parameters:

[in]	source_data	Structure describing the source.
[in]	source_energy_data_CONST	Global variable in constant memory space describing the source energy spectrum.
[out]	position	Initial particle position (particle transported inside the voxel bbox).
[out]	direction	Sampled particle direction (cosine vectors).
[out]	energy	Sampled energy of the new x ray.
[in]	seed	Current seed of the random number generator, requiered to sample the movement direction.
[out]	absvox	Set to <0 if primary particle will not cross the voxels, not changed otherwise (>0).

Definition at line 626 of file MC-GPU_kernel_v1.3.cu.

References source_struct::cos_theta_low, source_struct::D_cos_theta, source_struct::D_phi, source_energy_struct::espc, source_energy_struct::espc_alias, source_energy_struct::espc_cutoff, source_struct::max_height_at_y1cm, move_to_bbox(), source_energy_struct::num_bins_espc, source_struct::phi_low, source_struct::position, ranecu(), source_struct::rot_fan, detector_struct::rotation_flag, voxel_struct::size_bbox, source_energy_data_CONST, voxel_data_CONST, float3::x, float3::y, and float3::z.

Referenced by track_particles().

void tally_image	(	float *	energy,
		float3 *	position,
		float3 *	direction,
		signed char *	scatter_state,
		unsigned long long int *	image,
		struct source_struct *	source_data_SHARED,
		struct detector_struct *	detector_data_SHARED
	)		`[inline]`

Tally a radiographic projection image.

This function is called whenever a particle escapes the voxelized volume. The code checks if the particle would arrive at the detector if it kept moving in a straight line after exiting the voxels (assuming vacuum enclosure). An ideal image formation model is implemented: each pixel counts the total energy of the x rays that enter the pixel (100% detection efficiency for any energy). The image due to primaries and different kinds of scatter is tallied separately.

In the GPU, and atomicAdd() function is used to make sure that multiple threads do not update the same pixel at the same time, which would result in a lose of information. Since the atomicAdd function is only available for 'unsigned long long int' data, the float pixel values are scaled by a factor "SCALE_eV" defined in the header file (eg, #define SCALE_eV 10000.0f) and stored as unsigned long long integers in main memory.

WARNING! If the total tallied signal (for all particles) is larger than "1.8e19/SCALE_eV", there will be a bit overflow and the value will be reset to 0 giving bogus results.

WARNING! The detector plane should be located outside the voxels bounding box. However, since the particles are moved outside the bbox in the last step, they could cross the detector plane anyway. If the particles are less than 2.0 cm behind the detector, they are moved back and detected. Therefore the detector can be a few cm inside the bbox and still work. If the Woodcock mean free path is larger than the distance from the bbox to the detector, we may lose some particles behind the detector!

Parameters:

[in]	energy	X-ray energy
[in]	position	Particle position
[in]	direction	Particle direction (cosine vectors)
[in]	scatter_state	Flag marking primaries, single Compton, single Rayleigh or multiple scattered radiation
[out]	image	Integer array containing the image, ie, the pixel values (in tenths of meV)

Definition at line 482 of file MC-GPU_kernel_v1.3.cu.

References detector_struct::center, detector_struct::corner_min_rotated_to_Y, source_struct::direction, detector_struct::inv_pixel_size_X, detector_struct::inv_pixel_size_Z, detector_struct::num_pixels, detector_struct::rot_inv, detector_struct::rotation_flag, SCALE_eV, detector_struct::total_num_pixels, int2::x, float3::x, int2::y, float3::y, and float3::z.

Referenced by track_particles().

void tally_materials_dose	(	float *	Edep,
		int *	material,
		ulonglong2 *	materials_dose
	)		`[inline]`

Tally the depose deposited inside each material.

This function is called whenever a particle suffers a Compton or photoelectric interaction. The energy released in each interaction is added and later in the report function the total deposited energy is divided by the total mass of the material in the voxelized object to get the dose. This naturally accounts for multiple densities for voxels with the same material (not all voxels have same mass). Electrons are not transported in MC-GPU and therefore we are approximating that the dose is equal to the KERMA (energy released by the photons alone). This approximation is acceptable when there is electronic equilibrium and when the range of the secondary electrons is shorter than the organ size.

The function uses atomic functions for a thread-safe access to the GPU memory. We can check if this tally was disabled in the input file checking if the array materials_dose was allocated in the GPU (disabled if pointer = NULL).

Parameters:

[in]	Edep	Energy deposited in the interaction
[in]	material	Current material id number
[out]	materials_dose	ulonglong2 array storing the mateials dose [in eV/g] and dose^2 (ie, uncertainty).

Definition at line 1547 of file MC-GPU_kernel_v1.3.cu.

References SCALE_eV, ulonglong2::x, and ulonglong2::y.

Referenced by track_particles().

void tally_voxel_energy_deposition	(	float *	Edep,
		short3 *	voxel_coord,
		ulonglong2 *	voxels_Edep
	)		`[inline]`

Tally the dose deposited in the voxels.

This function is called whenever a particle suffers a Compton or photoelectric interaction. It is not necessary to call this function if the dose tally was disabled in the input file (ie, dose_ROI_x_max_CONST < 0). Electrons are not transported in MC-GPU and therefore we are approximating that the dose is equal to the KERMA (energy released by the photons alone). This approximation is acceptable when there is electronic equilibrium and when the range of the secondary electrons is shorter than the voxel size. Usually the doses will be acceptable for photon energies below 1 MeV. The dose estimates may not be accurate at the interface of low density volumes.

We need to use atomicAdd() in the GPU to prevent that multiple threads update the same voxel at the same time, which would result in a lose of information. This is very improbable when using a large number of voxels but gives troubles with a simple geometries with few voxels (in this case the atomicAdd will slow down the code because threads will update the voxel dose secuentially).

Parameters:

[in]	Edep	Energy deposited in the interaction
[in]	voxel_coord	Voxel coordinates, needed to check if particle located inside the input region of interest (ROI)
[out]	voxels_Edep	ulonglong2 array containing the 3D voxel dose and dose^2 (ie, uncertainty) as unsigned integers scaled by SCALE_eV.

Definition at line 418 of file MC-GPU_kernel_v1.3.cu.

References dose_ROI_x_max_CONST, dose_ROI_x_min_CONST, dose_ROI_y_max_CONST, dose_ROI_y_min_CONST, dose_ROI_z_max_CONST, dose_ROI_z_min_CONST, SCALE_eV, short3::x, ulonglong2::x, short3::y, ulonglong2::y, and short3::z.

Referenced by track_particles().

void track_particles	(	int	history_batch,
		int	histories_per_thread,
		int	num_p,
		int	seed_input,
		unsigned long long int *	image,
		ulonglong2 *	voxels_Edep,
		float2 *	voxel_mat_dens,
		float2 *	mfp_Woodcock_table,
		float3 *	mfp_table_a,
		float3 *	mfp_table_b,
		struct rayleigh_struct *	rayleigh_table,
		struct compton_struct *	compton_table,
		struct detector_struct *	detector_data_array,
		struct source_struct *	source_data_array,
		ulonglong2 *	materials_dose
	)

Initialize the image array, ie, set all pixels to zero Essentially, this function has the same effect as the command: "cutilSafeCall(cudaMemcpy(image_device, image, image_bytes, cudaMemcpyHostToDevice))";.

CUDA performs some initialization work the first time a GPU kernel is called. Therefore, calling a short kernel before the real particle tracking is performed may improve the accuracy of the timing measurements in the relevant kernel.

Parameters:

[in,out]	image	Pointer to the image array.
[in]	pixels_per_image	Number of pixels in the image (ie, elements in the array). Main function to simulate x-ray tracks inside a voxelized geometry. Secondary electrons are not simulated (in photoelectric and Compton events the energy is locally deposited).

The following global variables, in the GPU __constant__ memory are used: voxel_data_CONST, source_energy_data_CONST, detector_data_CONST, mfp_table_data_CONST.

Parameters:

[in]	history_batch	Particle batch number (only used in the CPU version when CUDA is disabled!, the GPU uses the built-in variable threadIdx)
[in]	num_p	Projection number in the CT simulation. This variable defines a specific angle and the corresponding source and detector will be used.
[in]	histories_per_thread	Number of histories to simulate for each call to this function (ie, for GPU thread).
[in]	seed_input	Random number generator seed (the same seed is used to initialize the two MLCGs of RANECU).
[in]	voxel_mat_dens	Pointer to the voxel densities and material vector (the voxelized geometry), stored in GPU glbal memory.
[in]	mfp_Woodcock_table	Two parameter table for the linear interpolation of the Woodcock mean free path (MFP) (stored in GPU global memory).
[in]	mfp_table_a	First element for the linear interpolation of the interaction mean free paths (stored in GPU global memory).
[in]	mfp_table_b	Second element for the linear interpolation of the interaction mean free paths (stored in GPU global memory).
[in]	rayleigh_table	Pointer to the table with the data required by the Rayleigh interaction sampling, stored in GPU global memory.
[in]	compton_table	Pointer to the table with the data required by the Compton interaction sampling, stored in GPU global memory.
[in,out]	image	Pointer to the image vector in the GPU global memory.
[in,out]	dose	Pointer to the array containing the 3D voxel dose (and its uncertainty) in the GPU global memory.

Definition at line 135 of file MC-GPU_kernel_v1.3.cu.

References dose_ROI_x_max_CONST, linear_interp::e0, GCOa(), GRAa(), linear_interp::ide, init_PRNG(), locate_voxel(), MAX_MATERIALS, mfp_table_data_CONST, rayleigh_struct::pmax, ranecu(), ranecu_double(), rotate_double(), source(), tally_image(), tally_materials_dose(), tally_voxel_energy_deposition(), float2::x, float3::x, float2::y, float3::y, and float3::z.

Referenced by main().

Defines

Functions

Detailed Description

Define Documentation

Function Documentation