rm Protein type, Chain -> ProteinChain
AntonOresten committed Dec 14, 2023
1 parent be6cb39 commit 8b046f7
Showing 10 changed files with 55 additions and 91 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -20,24 +20,24 @@ Pkg.add("Backboner")

 ## Types and functions

-The `Protein` type wraps a vector of `Chain`s, which in turn wraps the `Backbone{4}` type (4, because it stores the positions of 4 atoms per residue: N, CA, C, O). The `Backbone{N}` type has the `N` type parameter in order to remain flexible. It allows one pass only the N, CA, and C atoms of a backbone, such that the O atom positions can added in using the `add_oxygens` function.
+Proteins are represented as vectors of `ProteinChain`s, which in turn wrap the `Backbone{3}` type to store the coordinates of N, Ca, and C atoms.

-The secondary structure of an entire chain is described by a `Vector{Char}`, where '-' stands for coil/loop, 'H' for helix, and 'E' for strand. For assignment of secondary structure, this package uses the [AssigningSecondaryStructure.jl](https://github.com/MurrellGroup/AssigningSecondaryStructure.jl) package, which implements a simplified version of the DSSP algorithm.
+The secondary structures of a chain are described by a `Vector{Char}`, where '-' stands for coil/loop, 'H' for helix, and 'E' for strand. For assignment of secondary structure, this package uses the [AssigningSecondaryStructure.jl](https://github.com/MurrellGroup/AssigningSecondaryStructure.jl) package, which implements a simplified version of the DSSP algorithm.

-Protein backbones can be loaded from a PDB file using the `pdb_to_protein` function, which returns a `Protein` instance. Inversely, a `Protein` instance can be written to a PDB file using the `protein_to_pdb` function.
+Proteins can be loaded from a PDB file using the `pdb_to_protein` function, which returns a `Vector{ProteinChain}` instance. Inversely, a `Vector{ProteinChain}` instance can be written to a PDB file using the `protein_to_pdb` function.

 ## Example

 ```julia
 julia> using Backboner

 julia> protein = pdb_to_protein("test/data/1ZAK.pdb")
-2-element Protein{Float32}:
-Chain A with 220 residues
-Chain B with 220 residues
+2-element Vector{ProteinChain}:
+ProteinChain A with 220 residues
+ProteinChain B with 220 residues

 julia> chain = protein["A"]
-Chain A with 220 residues
+ProteinChain A with 220 residues

 julia> chain.backbone
 3×4×220 Backbone{4, Float32}:
16 changes: 7 additions & 9 deletions docs/src/index.md
@@ -33,25 +33,23 @@ The `Protein` type wraps a vector of `Chain`s.
 julia> using Backboner
 julia> protein = pdb_to_protein("test/data/1ZAK.pdb")
-2-element Protein{Float32}:
-Chain A with 220 residues
-Chain B with 220 residues
+2-element Vector{ProteinChain}:
+ProteinChain A with 220 residues
+ProteinChain B with 220 residues
 julia> chain = protein["A"] # chains can be accessed by name
-Chain A with 220 residues
+ProteinChain A with 220 residues
 julia> protein["A"] == protein[1] # numeric indexing also works
 true
-julia> new_protein = Protein([protein["A"]]) # create a new protein with a single chain
-1-element Protein{Float32}:
-Chain A with 220 residues
+julia> new_protein = [protein["A"]] # create a new protein with a single chain
+1-element Vector{ProteinChain}:
+ProteinChain A with 220 residues
 julia> protein_to_pdb(new_protein, "test/data/1ZAK_A.pdb");
 ```

-The `Chain` type wraps the `Backbone{4}` type (4, because it stores the positions of 4 atoms per residue: N, CA, C, O).

 ## API Reference

 ```@autodocs
8 changes: 4 additions & 4 deletions docs/src/oxygen.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,12 @@ The `Backbone` type has a type parameter `N` to represent the number of atoms pe
julia> using Backboner
julia> protein = pdb_to_protein("test/data/1ZAK.pdb")
2-element Protein{Float32}:
Chain A with 220 residues
Chain B with 220 residues
2-element Vector{ProteinChain}:
ProteinChain A with 220 residues
ProteinChain B with 220 residues
julia> chain = protein["A"]
Chain A with 220 residues
ProteinChain A with 220 residues
julia> backbone4 = chain.backbone
3×4×220 Backbone{4, Float32}:
Expand Down
14 changes: 4 additions & 10 deletions docs/src/types.md
@@ -6,23 +6,17 @@ The `Backbone` type is designed to efficiently store and manipulate the three-di

 `Backbone{N, T}` is a wrapper around a 3xNxL array, where:
 - **3** are the three spatial dimensions for the coordinates.
-- **N** is the number of atoms per residue.
+- **N** is the number of atoms in the backbone per residue.
 - **L** is the number of residues in the backbone.
 - **T** is the element type of the coordinate array.

 ## Chain

-A `Chain` represents a protein chain, and holds an identifier (usually a single letter), backbone atom coordinates, the amino acid sequence, and secondary structure information.
+A `ProteinChain` represents a protein chain, and holds an identifier (usually a single letter), backbone atom coordinates, the amino acid sequence, and secondary structure information.

 - `id`: A string identifier for the chain.
 - `backbone`: An instance of `Backbone{4}`, storing the coordinates of backbone atoms.
 - `aavector`: A vector for storing the amino acid sequence.
 - `ssvector`: A vector for storing the secondary structure.

-The `Chain` type is designed to provide a comprehensive and consistent representation of a protein chain, ensuring that the backbone coordinates align with the corresponding amino acid sequences and secondary structures.
-
-## Protein
-
-The `Protein` type holds multiple `Chain` instances, representing complete protein structures.
-
-- Stores a collection of `Chain` objects.
-- Includes a dictionary for quick access to chains via their identifiers.
+The `ProteinChain` type is designed to provide a comprehensive and consistent representation of a protein chain, ensuring that the backbone coordinates align with the corresponding amino acid sequences and secondary structures.
4 changes: 2 additions & 2 deletions src/assign.jl
@@ -7,7 +7,7 @@ import AssigningSecondaryStructure: assign_secondary_structure!, assign_secondar
 Uses a simplified version of DSSP to fill the secondary structure vector of each chain with '-' (coil/loop), 'H' (helix), and 'E' (strand).
 """
-function assign_secondary_structure!(protein::Protein)
+function assign_secondary_structure!(protein::Vector{ProteinChain})
 ss_vectors = assign_secondary_structure([chain.backbone.coords for chain in protein])
 for (chain, ssvector) in zip(protein, ss_vectors)
 @assert length(chain.ssvector) == length(ssvector)
@@ -21,7 +21,7 @@ end
 Returns a new protein with secondary structure assigned.
 """
-function assign_secondary_structure(protein::Protein)
+function assign_secondary_structure(protein::Vector{ProteinChain})
 new_protein = deepcopy(protein)
 assign_secondary_structure!(new_protein)
 return new_protein
24 changes: 12 additions & 12 deletions src/chain.jl
@@ -1,17 +1,17 @@
-export Chain
+export ProteinChain

 """
-Chain <: AbstractVector{Residue}
+ProteinChain <: AbstractVector{Residue}
 A chain has an identifier (usually a single letter) and holds the backbone atom coordinates, amino acid sequence, and secondary structures of a protein chain.
 """
-struct Chain <: AbstractVector{Residue}
+struct ProteinChain <: AbstractVector{Residue}
 id::AbstractString
 backbone::Backbone{4}
 aavector::Vector{Char}
 ssvector::Vector{Char}

-function Chain(
+function ProteinChain(
 id::AbstractString,
 backbone::Backbone{N};
 aavector::Vector{Char} = fill('G', length(backbone)),
@@ -26,15 +26,15 @@ struct Chain <: AbstractVector{Residue}
 return new(id, backbone, aavector, ssvector)
 end

-Chain(backbone::Backbone; kwargs...) = Chain("_", backbone; kwargs...)
+ProteinChain(backbone::Backbone; kwargs...) = ProteinChain("_", backbone; kwargs...)
 end

-@inline Base.:(==)(chain1::Chain, chain2::Chain) = chain1.id == chain2.id && chain1.backbone == chain2.backbone && chain1.ssvector == chain2.ssvector
-@inline Base.length(chain::Chain) = length(chain.backbone)
-@inline Base.size(chain::Chain) = (length(chain),)
-@inline Base.getindex(chain::Chain, i::Integer) = Residue(i, chain.backbone, chain.aavector[i], chain.ssvector[i])
+@inline Base.:(==)(chain1::ProteinChain, chain2::ProteinChain) = chain1.id == chain2.id && chain1.backbone == chain2.backbone && chain1.ssvector == chain2.ssvector
+@inline Base.length(chain::ProteinChain) = length(chain.backbone)
+@inline Base.size(chain::ProteinChain) = (length(chain),)
+@inline Base.getindex(chain::ProteinChain, i::Integer) = Residue(i, chain.backbone, chain.aavector[i], chain.ssvector[i])

-Base.summary(chain::Chain) = "Chain $(chain.id) with $(length(chain)) residue$(length(chain) == 1 ? "" : "s")"
-Base.show(io::IO, chain::Chain) = print(io, summary(chain))
+Base.summary(chain::ProteinChain) = "ProteinChain $(chain.id) with $(length(chain)) residue$(length(chain) == 1 ? "" : "s")"
+Base.show(io::IO, chain::ProteinChain) = print(io, summary(chain))

-has_assigned_ss(chain::Chain) = has_assigned_ss(chain.ssvector)
+has_assigned_ss(chain::ProteinChain) = has_assigned_ss(chain.ssvector)
21 changes: 10 additions & 11 deletions src/io.jl
@@ -37,29 +37,28 @@ function Backbone(atoms::Vector{PDBTools.Atom})
 return Backbone(coords)
 end

-function Chain(atoms::Vector{PDBTools.Atom})
+function ProteinChain(atoms::Vector{PDBTools.Atom})
 id = PDBTools.chain(atoms[1])
 @assert allequal(PDBTools.chain.(atoms)) "atoms must be from the same chain"
 backbone = Backbone(atoms)
 aavector = [get(ONE_LETTER_AA_CODES, atom.resname, 'X') for atom in atoms if atom.name == "CA"]
-return Chain(id, backbone, aavector=aavector)
-end
-
-function Protein(atoms::Vector{PDBTools.Atom})
-filter!(a -> a.name in ["N", "CA", "C", "O"], atoms)
-ids = PDBTools.chain.(atoms)
-chains = [Chain(atoms[ids .== id]) for id in unique(ids)]
-return Protein(chains)
+return ProteinChain(id, backbone, aavector=aavector)
 end

 """
 pdb_to_protein(filename::String)
 Assumes that each residue starts with four atoms: N, CA, C, O.
 """
-pdb_to_protein(filename::String) = Protein(PDBTools.readPDB(filename))
+function pdb_to_protein(filename::String)
+atoms = PDBTools.readPDB(filename)
+filter!(a -> a.name in ["N", "CA", "C", "O"], atoms)
+ids = PDBTools.chain.(atoms)
+chains = [ProteinChain(atoms[ids .== id]) for id in unique(ids)]
+return chains
+end

-function protein_to_pdb(protein::Protein, filename, header=:auto, footer=:auto)
+function protein_to_pdb(protein::Vector{ProteinChain}, filename, header=:auto, footer=:auto)
 atoms = PDBTools.Atom[]
 index = 0
 residue_index = 0
29 changes: 2 additions & 27 deletions src/protein.jl
@@ -1,28 +1,3 @@
-export Protein
+@inline Base.getindex(protein::AbstractVector{ProteinChain}, id::AbstractString) = protein[findfirst(c -> c.id == id, protein)]

-"""
-Protein <: AbstractVector{Chain}
-A wrapper for a vector of chains.
-Chains can be accessed by index or by ID.
-"""
-struct Protein <: AbstractVector{Chain}
-chains::Vector{Chain}
-id_dict::Dict{AbstractString, Chain}
-
-function Protein(chains::Vector{Chain})
-@assert length(unique([chain.id for chain in chains])) == length(chains)
-id_dict = Dict{AbstractString, Chain}(chain.id => chain for chain in chains)
-return new(chains, id_dict)
-end
-end
-
-@inline Base.:(==)(protein1::Protein, protein2::Protein) = protein1.chains == protein2.chains
-@inline Base.size(protein::Protein) = size(protein.chains)
-@inline Base.length(protein::Protein) = length(protein.chains)
-@inline Base.getindex(protein::Protein, i) = protein.chains[i]
-@inline Base.getindex(protein::Protein, id::AbstractString) = protein.id_dict[String(id)]
-
-Base.summary(protein::Protein) = "Protein with $(length(protein)) chain$(length(protein) == 1 ? "" : "s")"
-
-has_assigned_ss(protein::Protein) = all(has_assigned_ss, protein.chains)
+has_assigned_ss(protein::AbstractVector{ProteinChain}) = all(has_assigned_ss, protein)
8 changes: 4 additions & 4 deletions test/chain.jl
@@ -4,20 +4,20 @@

 coords = randn(3, 4, 5)
 backbone = Backbone(coords)
-chain = Chain("A", backbone)
+chain = ProteinChain("A", backbone)
 @test chain.id == "A"
 @test chain.backbone.coords == coords
 @test chain.aavector == fill('G', length(chain))
 @test chain.ssvector == fill(' ', length(chain))
 @test !has_assigned_ss(chain)
 @test length(chain) == 5
 @test size(chain) == (5,)
-@test Chain(remove_column(backbone, 4)).backbone == add_oxygens(remove_column(backbone, 4))
-@test Chain(backbone).id == "_"
+@test ProteinChain(remove_column(backbone, 4)).backbone == add_oxygens(remove_column(backbone, 4))
+@test ProteinChain(backbone).id == "_"

 @test chain[1] == Residue(1, backbone, 'G', ' ')

-@test summary(chain) == "Chain A with 5 residues"
+@test summary(chain) == "ProteinChain A with 5 residues"

 io = IOBuffer()
 show(io, chain)
8 changes: 3 additions & 5 deletions test/protein.jl
@@ -1,14 +1,12 @@
 @testset "protein.jl" begin

 @testset "Protein" begin
-A = Chain("A", Backbone(randn(3, 4, 5)))
-B = Chain("B", Backbone(randn(3, 4, 6)))
-protein = Protein([A, B])
+A = ProteinChain("A", Backbone(randn(3, 4, 5)))
+B = ProteinChain("B", Backbone(randn(3, 4, 6)))
+protein = [A, B]
 @test protein[1] == protein["A"] == A
 @test protein[2] == protein["B"] == B
 @test length(protein) == 2
 @test length.(protein) == [5, 6]
-@test summary(protein) == "Protein with 2 chains"
 @test !has_assigned_ss(protein)
 end

2 comments on commit 8b046f7

@AntonOresten
Member Author

@JuliaRegistrator register

Release notes:

  • Add functions for calculating displacements and distances between sets of particular atoms, e.g. for getting carbonyl and nitrogen bond vectors/lengths.
  • Add functions for working with dihedral angles
  • Rename Chain type to ProteinChain
  • Removed the Protein type in favor of simply representing proteins as Vector{ProteinChain}, and overloading some functions like getindex, to allow for indexing with chain IDs.

Further rewrite and restructuring of types soon, see #10.
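
The chain-ID indexing described in the release notes can be sketched in isolation. This is a minimal illustration under stated assumptions, not Backboner's code: `DemoChain` is a hypothetical stand-in for `ProteinChain`, and the explicit `KeyError` for unknown IDs is an added assumption (the one-line method in src/protein.jl indexes directly with the result of `findfirst`, which is `nothing` when no chain matches).

```julia
# Hypothetical stand-in for ProteinChain; only the `id` field matters here.
struct DemoChain
    id::String
    nresidues::Int
end

# Allow `chains["A"]` on a plain vector of chains by searching for a matching ID.
function Base.getindex(chains::AbstractVector{DemoChain}, id::AbstractString)
    i = findfirst(c -> c.id == id, chains)
    # Assumption for this sketch: report unknown IDs explicitly.
    i === nothing && throw(KeyError(id))
    return chains[i]
end

protein = [DemoChain("A", 220), DemoChain("B", 220)]

chain_a = protein["A"]   # lookup by chain ID
first_chain = protein[1] # ordinary integer indexing is unaffected
```

Compared with the removed `Protein` wrapper, which kept an ID dictionary, a lookup like this is a linear scan over the chains. That is likely negligible for the handful of chains in a typical structure, and the vector needs no extra bookkeeping when chains are added or removed.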

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/97109

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.4.0 -m "<description of version>" 8b046f7b9e551b707291c58df4ce1d8961d127c7
git push origin v0.4.0
